Different learning rate for each of the layers?



I noticed that some popular deep learning frameworks, such as Keras and PyTorch, allow you to set a different learning rate for each layer.



What are the benefits of that approach?
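
For concreteness, here is a minimal sketch of what per-layer learning rates look like in PyTorch, using optimizer parameter groups; the model, the layer split, and the rate values are illustrative choices, not taken from the original post:

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(784, 128),  # early layer
        nn.ReLU(),
        nn.Linear(128, 10),   # output layer
    )

    # Each dict is a parameter group with its own settings; a group
    # without an explicit "lr" falls back to the default given below.
    optimizer = torch.optim.SGD(
        [
            {"params": model[0].parameters(), "lr": 1e-4},  # small steps early
            {"params": model[2].parameters()},              # default lr (1e-2)
        ],
        lr=1e-2,
        momentum=0.9,
    )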










      machine-learning neural-network deep-learning keras pytorch






edited Feb 27 at 8:10 by Vaalizaadeh














asked Feb 27 at 8:00 by Daniel Chepenko




















1 Answer































In simple update rules such as vanilla gradient descent, the learning rate is a single scalar that essentially controls how fast you move downhill on the loss surface. Adaptive methods such as Adam (introduced in a published paper) and RMSProp (a popular method that was popularised through a lecture rather than a formal paper) start from the observation that the loss surface can be much steeper along some parameter directions than along others, so in some directions you can safely take larger steps. These optimisers therefore scale each parameter's update by statistics of that parameter's own gradient, giving every parameter its own effective learning rate, largely independently of the other dimensions. That is the motivation. In practice, as far as I know, you just set one global learning rate for the optimiser and the per-parameter step sizes are adapted automatically.
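
To illustrate the per-parameter scaling described above, here is a toy NumPy sketch of an RMSProp-style update; the decay rate, epsilon, base learning rate, and example gradients are illustrative values, not taken from the answer:

    import numpy as np

    def rmsprop_step(params, grads, cache, lr=1e-3, rho=0.9, eps=1e-8):
        # Running average of squared gradients, one entry per parameter.
        cache = rho * cache + (1 - rho) * grads ** 2
        # Each step is divided by the RMS of that parameter's own
        # gradients: steep directions get a smaller effective learning
        # rate, shallow directions a larger one.
        params = params - lr * grads / (np.sqrt(cache) + eps)
        return params, cache

    # Two parameters with very different gradient scales end up taking
    # comparably sized steps after the normalisation.
    params = np.array([1.0, 1.0])
    cache = np.zeros_like(params)
    grads = np.array([100.0, 0.01])
    params, cache = rmsprop_step(params, grads, cache)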






answered Feb 27 at 8:09 by Vaalizaadeh


























