Gradient Descent Convergence


I'm a double major in Math and CS, interested in machine learning, and I'm currently taking the popular Coursera course by Prof. Andrew Ng. He explains gradient descent, but I can't help noticing a few things. With my math background, I know that if I want to find the global minimum or maximum of a function, I must first find all of its critical points. The course talks about the convergence of gradient descent, but is it really guaranteed to converge to the global minimum? How do I know it won't get stuck at a saddle point? Wouldn't it be safer to apply a second-derivative test? If the function is differentiable, it seems reasonable that gradient descent converges to a local minimum, but not to the global minimum. I have tried looking for a better explanation, but everyone seems to take this for granted without questioning it. Can someone point me in the right direction?
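A small numerical experiment makes the concern concrete. The toy objective below is my own illustrative choice (not from the course): it has one global and one merely local minimum, and plain gradient descent simply ends up in whichever basin it starts in:

```python
def grad(x):
    # derivative of f(x) = (x**2 - 1)**2 + 0.3*x, a toy objective with a
    # global minimum near x = -1.04 and a merely local minimum near x = 0.96
    return 4 * x * (x**2 - 1) + 0.3

def gd(x, lr=0.01, steps=5000):
    # plain gradient descent with a fixed learning rate
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(gd(-1.2))  # ends near the global minimum, x ≈ -1.04
print(gd(1.2))   # stuck at the local minimum, x ≈ 0.96
```

Both runs satisfy the usual stopping criterion (the gradient goes to zero), yet only one finds the global minimum; in higher dimensions the analogous issue also arises at saddle points.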










      machine-learning regression gradient-descent






      asked Mar 26 at 3:24









blade

          2 Answers






Gradient descent does not always converge to the global minimum. It is only guaranteed to do so when the function is convex and the learning rate is appropriate.

For most real-life problems, the function has multiple local minima, and we need to run training several times from different initializations; one reason for this is to avoid getting stuck in a poor local minimum.
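The restart strategy described above can be sketched as follows; the quartic objective here is a hypothetical stand-in for a non-convex loss:

```python
import random

def f(x):
    # hypothetical non-convex loss with its global minimum near x = -1.04
    return (x**2 - 1)**2 + 0.3 * x

def grad(x):
    return 4 * x * (x**2 - 1) + 0.3

def gd(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

random.seed(0)
# run gradient descent from several random initializations and
# keep the run that ends at the lowest loss
candidates = [gd(random.uniform(-2, 2)) for _ in range(10)]
best = min(candidates, key=f)
print(best)  # near the global minimum, x ≈ -1.04
```

With enough restarts, at least one initialization is likely to land in the global minimum's basin of attraction, although there is no hard guarantee for a general non-convex function.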

















                  answered Mar 26 at 4:36









Shamit Verma

If you use a variant called Backtracking Gradient Descent, then convergence to a single local minimum can be proven in most cases, for most functions, including all Morse functions. Under the same assumptions, you can also prove convergence for the backtracking versions of Momentum and NAG. More details, together with the cited paper and links to the source code on GitHub, can be found in my answer at this link:

Link
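As a rough illustration of the general idea, here is a generic textbook version of gradient descent with Armijo backtracking line search (this is my own sketch, not necessarily the exact scheme analyzed in the cited paper):

```python
def backtracking_gd(f, grad, x, lr0=1.0, beta=0.5, c=1e-4, steps=100):
    # gradient descent where each step size is chosen by backtracking:
    # shrink lr until the step gives a sufficient decrease in f (Armijo rule)
    for _ in range(steps):
        g = grad(x)
        lr = lr0
        while f(x - lr * g) > f(x) - c * lr * g * g:
            lr *= beta
        x -= lr * g
    return x

# a toy non-convex quartic objective
f = lambda x: (x**2 - 1)**2 + 0.3 * x
grad = lambda x: 4 * x * (x**2 - 1) + 0.3
print(backtracking_gd(f, grad, x=-1.2))  # converges to the stationary point near -1.04
```

Because the step size adapts to the local curvature, the objective decreases at every iteration, which is the property the convergence proofs build on.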

















                          answered Apr 9 at 3:31









Tuyen

                              Thanks for contributing an answer to Data Science Stack Exchange!

