Gradient Descent Convergence
I'm a double major in Math and CS interested in machine learning, and I'm currently taking the popular Coursera course by Prof. Andrew Ng. He explains gradient descent, but I can't help noticing a few things. With my math background, I know that if I want to find the global min/max of a function, I must first find all of its critical points. The course talks about the convergence of gradient descent, but is it really guaranteed to converge to the global minimum? How do I know it won't get stuck at a saddle point? Wouldn't it be safer to apply a second-derivative test as well? If my function is differentiable, it seems reasonable that gradient descent converges to a local minimum, but not necessarily to the global one. I have looked for a better explanation, but everyone seems to take this for granted without questioning it. Can someone point me in the right direction?
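For concreteness, here is a tiny NumPy sketch I put together (my own toy example, not from the course) in which plain gradient descent converges to a saddle point rather than a minimum:

```python
import numpy as np

# f(x, y) = x^2 - y^2: its only critical point (0, 0) is a saddle, not a minimum.
def f(p):
    x, y = p
    return x**2 - y**2

def grad_f(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

p = np.array([1.0, 0.0])   # start exactly on the saddle's stable direction (y = 0)
lr = 0.1
for _ in range(1000):
    p = p - lr * grad_f(p)

print(p)      # ~ [0, 0]: the iterates converge to the saddle point
print(f(p))   # ~ 0, even though f is unbounded below along the y-axis
```

Of course, any tiny perturbation in y would eventually escape this saddle, but the example shows that "the gradient is zero" by itself says nothing about minimality.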
machine-learning regression gradient-descent
asked Mar 26 at 3:24
blade
2 Answers
Gradient descent does not always converge to the global minimum. It is only guaranteed to do so when the objective is convex and the learning rate is appropriate (for example, small enough relative to the function's smoothness).

For most real-life problems the loss surface is non-convex and has multiple local minima, so training is typically run several times from different random initializations; one reason for doing this is to avoid ending up in a poor local minimum.
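As a minimal illustration of the restart idea (a toy 1-D example of my own, not from any particular library): gradient descent is run several times from different random starting points and the run with the lowest loss is kept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-convex 1-D loss with two minima of different depth (the deeper one near w = -1).
def loss(w):
    return (w**2 - 1.0)**2 + 0.3 * w

def grad(w):
    return 4.0 * w * (w**2 - 1.0) + 0.3

def run_gd(w0, lr=0.01, steps=500):
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Several independent runs from random initializations; keep the best one.
candidates = [run_gd(w0) for w0 in rng.uniform(-2.0, 2.0, size=10)]
best = min(candidates, key=loss)
print(best, loss(best))   # with several restarts, some run typically lands near w = -1,
                          # the deeper minimum; runs started on the other side get stuck
                          # in the shallower one
```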
answered Mar 26 at 4:36
Shamit Verma
If you use a variant called backtracking gradient descent, then convergence to a single local minimum can be proven in most cases, for most functions (including all Morse functions). Under the same assumptions, you can also prove convergence for backtracking versions of Momentum and NAG. More details can be found in my answer and the cited paper, together with links to source code on GitHub, at this link:

Link
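As a rough illustration only (my own sketch of the standard Armijo backtracking rule, which is the usual meaning of backtracking gradient descent; see the cited paper for the exact variant and its convergence statements):

```python
import numpy as np

def backtracking_gd(f, grad_f, x0, alpha0=1.0, beta=0.5, c=1e-4,
                    max_iter=20000, tol=1e-8):
    """Gradient descent with Armijo backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        alpha = alpha0
        # Shrink the step until the Armijo sufficient-decrease condition holds:
        #   f(x - alpha * g) <= f(x) - c * alpha * ||g||^2
        while f(x - alpha * g) > f(x) - c * alpha * np.dot(g, g) and alpha > 1e-16:
            alpha *= beta
        x = x - alpha * g
    return x

# Example: the Rosenbrock function, a standard non-convex test problem.
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad_f = lambda x: np.array([
    -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
    200 * (x[1] - x[0]**2),
])
print(backtracking_gd(f, grad_f, [-1.2, 1.0]))  # slowly approaches the minimizer [1, 1]
```

By construction every accepted step satisfies a sufficient-decrease condition, so the objective value decreases monotonically, which is the key ingredient behind the convergence results.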
answered Apr 9 at 3:31
Tuyen