Different learning rate for each of the layers?
I noticed that some popular deep learning frameworks, like Keras or PyTorch, allow you to set a different learning rate for each layer.
What are the benefits of that approach?
machine-learning neural-network deep-learning keras pytorch
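For concreteness, here is a minimal sketch of what this looks like in PyTorch, using optimizer parameter groups; the toy model and the sub-module names `features` and `classifier` are assumptions for illustration, not something from the question:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy two-part model; the sub-module names are purely illustrative.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(10, 32)    # e.g. a pretrained backbone
        self.classifier = nn.Linear(32, 2)   # e.g. a freshly initialised head

    def forward(self, x):
        return self.classifier(torch.relu(self.features(x)))

model = Net()

# One parameter group per "layer", each with its own learning rate.
optimizer = optim.SGD(
    [
        {"params": model.features.parameters(), "lr": 1e-4},
        {"params": model.classifier.parameters(), "lr": 1e-2},
    ],
    momentum=0.9,
)
```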
edited Feb 27 at 8:10 by Vaalizaadeh
asked Feb 27 at 8:00 by Daniel Chepenko
1 Answer
In simple update rules like vanilla gradient descent, the learning rate specifies how fast you move downhill, and a single rate is shared by every parameter. Adaptive methods such as Adam, and RMSProp (which became popular through a lecture rather than a published paper), were motivated by the observation that the loss surface can be much steeper along some dimensions than along others, so you may want to take larger steps in some directions than in others. Consequently, these methods scale each parameter's update using statistics of that parameter's own gradient, giving each parameter an effective learning rate that adapts independently of the other dimensions. That is the motivation. As far as I know, you just need to set a single base learning rate for your optimiser, and it is then adapted per parameter automatically.
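To make that concrete, here is a minimal NumPy sketch of the RMSProp update; the function name and the hyperparameter values are common defaults chosen for illustration, not taken from the answer:

```python
import numpy as np

def rmsprop_step(param, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
    """One RMSProp update. `cache` holds a running average of squared
    gradients per parameter, so every coordinate gets its own step size."""
    cache = decay * cache + (1 - decay) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache

param = np.array([1.0, 1.0])
cache = np.zeros_like(param)
grad = np.array([10.0, 0.1])  # very different slopes along the two dimensions
param, cache = rmsprop_step(param, grad, cache)
print(param)  # both coordinates move by roughly the same amount
```

On the first step both coordinates move by about lr / sqrt(1 - decay), no matter how steep each direction is, which is exactly the per-dimension adaptation described above.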
answered Feb 27 at 8:09 by Vaalizaadeh