

Newton's method and the vanishing gradient problem


I read an article on the vanishing gradient problem which states that the problem can be mitigated by using a ReLU-based activation function.

What I do not understand is this: if a ReLU-based activation function solves the problem, why are there so many research papers suggesting Newton's-method-based optimization algorithms for deep learning instead of gradient descent? While reading those papers I had the strong impression that the vanishing gradient problem was the core reason for such suggestions, but now I am confused about whether Newton's method is really needed, given that gradient descent can apparently be modified to fix the problems faced during training.
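To make my understanding of the ReLU argument concrete, here is a small numpy sketch I put together (my own illustration, not from the article); it only compares per-layer derivative factors, which I assume is the relevant mechanism:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        s = sigmoid(x)
        return s * (1.0 - s)          # peaks at 0.25 when x = 0

    def relu_grad(x):
        return 1.0 if x > 0 else 0.0  # exactly 1 for any positive input

    # Backpropagation multiplies one activation derivative per layer.
    # Even in the best case for the sigmoid (pre-activation at 0), each
    # factor is 0.25, so the product shrinks geometrically with depth;
    # for ReLU units that stay active, each factor is 1.
    for depth in [5, 10, 20, 50]:
        sig_chain = sigmoid_grad(0.0) ** depth
        relu_chain = relu_grad(1.0) ** depth
        print(f"depth={depth:3d}  sigmoid chain={sig_chain:.3e}  ReLU chain={relu_chain:.1f}")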










Tags: machine-learning, neural-network, optimization






asked Mar 20 at 14:41 by Aman, edited Mar 20 at 20:50 by Esmailian




1 Answer






"Why are there so many research papers suggesting the use of Newton's-method-based optimization algorithms for deep learning instead of gradient descent?"




Newton's method has a faster convergence rate than gradient descent, and this is the main reason it may be suggested as a replacement for gradient descent.
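As a rough sketch of the convergence-rate point (a toy example of my own, not from any particular paper), consider an ill-conditioned quadratic: Newton's method rescales the step by the inverse Hessian and reaches the minimum of a quadratic in one step, while plain gradient descent is throttled by the largest curvature direction.

    import numpy as np

    # Toy objective f(w) = 0.5 * w^T A w with an ill-conditioned Hessian A.
    A = np.diag([1.0, 100.0])

    def grad(w):
        return A @ w

    # Newton step: w <- w - H^{-1} grad(w). Exact for a quadratic.
    w_newton = np.array([10.0, 1.0])
    w_newton = w_newton - np.linalg.solve(A, grad(w_newton))

    # Gradient descent: step size must stay below 2 / lambda_max = 0.02.
    w_gd = np.array([10.0, 1.0])
    steps = 0
    while np.linalg.norm(grad(w_gd)) > 1e-8 and steps < 10_000:
        w_gd = w_gd - 0.015 * grad(w_gd)
        steps += 1

    print("Newton after 1 step :", w_newton)              # [0. 0.]
    print(f"Gradient descent needed {steps} steps:", w_gd)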




"Is Newton's method really needed if gradient descent can be modified to rectify all the problems faced during machine learning?"




Whether the vanishing gradient problem appears depends on the choice of activation function and on the depth of the network. Newton's method and gradient descent would both face the problem for an activation like the sigmoid, since in the flat extremes of the sigmoid both the first- and second-order derivatives are small and vanish exponentially with depth. In other words, the problem is solved for both methods by the choice of activation function.
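A quick way to see the depth dependence (a toy sketch of my own, with the simplifying assumption that every layer's pre-activation sits at x = 4, well inside the sigmoid's flat region): backpropagation picks up one factor of sigma'(4), roughly 0.018, per layer, so the gradient reaching the early layers decays geometrically, and the second-order terms a Newton-type method would use are built from the same near-zero factors.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def d_sigmoid(x):
        s = sigmoid(x)
        return s * (1.0 - s)

    # Assume every layer's pre-activation sits in the flat region of the sigmoid.
    x_saturated = 4.0
    per_layer = d_sigmoid(x_saturated)   # roughly 0.018

    # Backprop multiplies one such factor per layer, so the gradient signal
    # reaching the early layers shrinks geometrically with depth.
    for depth in [2, 5, 10, 20]:
        print(f"depth={depth:2d}  gradient factor={per_layer ** depth:.3e}")

    # Differentiating this chain a second time (as a Newton-type method would)
    # only produces more products of these same tiny factors, so the curvature
    # information vanishes as well.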



As a side note, the first- and second-order derivatives of the sigmoid go to zero at essentially the same rate; plotting the sigmoid and its derivatives and zooming into the extremes makes this visible.
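Numerically, using the closed forms sigma'(x) = sigma(x)(1 - sigma(x)) and sigma''(x) = sigma'(x)(1 - 2*sigma(x)) (a small check of my own): in the flat region both derivatives decay roughly like exp(-x) and their ratio tends to -1, i.e. they vanish at essentially the same rate.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for x in [2.0, 5.0, 10.0, 20.0]:
        s = sigmoid(x)
        d1 = s * (1.0 - s)           # first derivative of the sigmoid
        d2 = d1 * (1.0 - 2.0 * s)    # second derivative of the sigmoid
        print(f"x={x:5.1f}  sigma'={d1:.3e}  sigma''={d2:.3e}  ratio={d2 / d1:+.4f}")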



Historical note: Newton's method predates the vanishing gradient problem (which was encountered after the introduction of backpropagation in the 1960s) by centuries.






answered Mar 20 at 15:21 by Esmailian, edited Mar 20 at 16:17











Comments:

• Aman (Mar 20 at 15:35): In this paper, section 2.1 says that scale invariance is important because it eliminates the need to tune the learning rate; why, then, is the scalability of gradient descent preferred over the scale invariance of Newton's method?

• Esmailian (Mar 20 at 15:51): @Aman removed the controversial parts.

• Aman (Mar 20 at 15:53): Which is worse in terms of efficiency: a vanishing gradient with Newton's method or with gradient descent? I'm new to machine learning, so I would appreciate any guidance on this. Thank you.















