Gradient Descent Convergence


I'm a double major in Math and CS interested in Machine Learning. I'm currently taking the popular Coursera course by Prof. Andrew Ng. He explains Gradient Descent, but I can't help noticing a few things. With my math background, I know that if I want to find the global min/max of a function, I must first find all of its critical points. The course talks about the convergence of GD, but is it really guaranteed to converge to the global minimum? How do I know it won't get stuck at a saddle point? Wouldn't it be safer to do a second-derivative test? If my function is differentiable, it seems reasonable that GD converges to a local minimum, but not necessarily to the global one. I have tried looking for a better explanation, but everyone seems to take it for granted without questioning it. Can someone point me in the right direction?
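
To make my worry concrete, here is a minimal sketch (my own toy example, not something from the course) of plain gradient descent on a simple non-convex function. Which critical point the iterates approach depends entirely on the starting point, and only one of the two minima is global:

    import numpy as np

    # f(x) = x^4 - 3x^2 + x has two local minima (near x = -1.30 and x = 1.13)
    # and a local maximum in between; only the left minimum is global.
    def f(x):
        return x**4 - 3 * x**2 + x

    def grad_f(x):
        return 4 * x**3 - 6 * x + 1

    def gradient_descent(x0, lr=0.01, steps=1000):
        x = x0
        for _ in range(steps):
            x = x - lr * grad_f(x)
        return x

    # Two different starting points end up at two different minima.
    for x0 in (-2.0, 2.0):
        x_final = gradient_descent(x0)
        print(f"start = {x0:+.1f}  ->  x = {x_final:+.4f}, f(x) = {f(x_final):+.4f}")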










Tags: machine-learning, regression, gradient-descent






Asked Mar 26 at 3:24 by blade




















2 Answers


















Answer from Shamit Verma (answered Mar 26 at 4:36):
Gradient descent does not always converge to the global minimum. It is guaranteed to do so only when the function is convex and the learning rate is appropriate.

For most real-life problems the function has local minima, so we need to run training multiple times; one reason for doing so is to avoid getting stuck in a bad local minimum.
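
A rough sketch of that multiple-runs idea (the toy objective, learning rate, and number of restarts below are illustrative assumptions, not a prescription): run plain gradient descent from several random starting points and keep the run with the lowest objective value.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative non-convex objective with many local minima.
    def f(x):
        return 0.1 * x**2 + np.sin(3 * x)

    def grad_f(x):
        return 0.2 * x + 3 * np.cos(3 * x)

    def gradient_descent(x0, lr=0.05, steps=1000):
        x = x0
        for _ in range(steps):
            x = x - lr * grad_f(x)
        return x

    # Restart from several random points and keep the best result found.
    candidates = [gradient_descent(x0) for x0 in rng.uniform(-5.0, 5.0, size=10)]
    best = min(candidates, key=f)
    print(f"best x = {best:.4f}, f(x) = {f(best):.4f}")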






Answer from Tuyen (answered Apr 9 at 3:31):
If you use a variant called Backtracking Gradient Descent, then convergence to a single local minimum can be proven in most cases, for most functions, including all Morse functions. Under the same assumptions, you can also prove convergence for the backtracking versions of Momentum and NAG. More details, the cited paper, and a link to the source code on GitHub can be found in my answer at this link:

Link
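
For illustration, here is a minimal sketch of gradient descent with an Armijo-style backtracking line search (a generic textbook formulation; the precise conditions and update rules analysed in the cited paper may differ):

    import numpy as np

    def backtracking_gd(f, grad_f, x0, lr0=1.0, beta=0.5, c=1e-4, steps=200):
        """Gradient descent with an Armijo backtracking line search."""
        x = np.asarray(x0, dtype=float)
        for _ in range(steps):
            g = grad_f(x)
            lr = lr0
            # Shrink the step until the sufficient-decrease (Armijo) condition holds:
            # f(x - lr * g) <= f(x) - c * lr * ||g||^2.
            while f(x - lr * g) > f(x) - c * lr * np.dot(g, g):
                lr *= beta
            x = x - lr * g
        return x

    # Example: a smooth non-convex function with minima at (1, 0) and (-1, 0)
    # and a saddle point at the origin.
    def f(x):
        return (x[0]**2 - 1.0)**2 + x[1]**2

    def grad_f(x):
        return np.array([4.0 * x[0] * (x[0]**2 - 1.0), 2.0 * x[1]])

    print(backtracking_gd(f, grad_f, x0=[0.5, 2.0]))  # ends near (1, 0)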





