What should I observe when choosing which optimizer suits my Deep Neural Network model?



I have trained my neural network model with several optimizers: RMSProp, AdaGrad, Momentum, and Adam.

After running the code, I print the train and test accuracy at every epoch (50 epochs in my case). How should I determine which of these optimizers performs best?

Does a higher train accuracy at the last epoch decide it, or a higher test accuracy? I also observed that with the Momentum optimizer, train accuracy reached its peak of about 0.91 at the 16th epoch, earlier than with the other optimizers.

Would that mean the Momentum optimizer performs best in this case?
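To make the comparison concrete, here is a minimal sketch of the kind of experiment being described: train the same model once per optimizer and record validation accuracy at every epoch. The dataset, model (hand-rolled logistic regression), learning rate, and hyper-parameters below are all illustrative assumptions, not the asker's actual setup; they are chosen so the example runs with NumPy alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (hypothetical; substitute your own dataset).
X = rng.normal(size=(600, 5))
w_true = rng.normal(size=5)
y = (X @ w_true + 0.3 * rng.normal(size=600) > 0).astype(float)
X_train, y_train = X[:400], y[:400]
X_val, y_val = X[400:], y[400:]

def accuracy(w, X, y):
    """Fraction of examples where the linear score crosses the 0/1 threshold correctly."""
    return float(np.mean(((X @ w) > 0) == y))

def train(optimizer, epochs=50, lr=0.1):
    """Train logistic regression with one optimizer; return per-epoch validation accuracy."""
    w = np.zeros(X_train.shape[1])
    v = np.zeros_like(w)   # momentum buffer
    m = np.zeros_like(w)   # Adam first moment
    s = np.zeros_like(w)   # Adam second moment / AdaGrad accumulator
    history = []
    for t in range(1, epochs + 1):
        p = 1.0 / (1.0 + np.exp(-(X_train @ w)))          # sigmoid probabilities
        grad = X_train.T @ (p - y_train) / len(y_train)   # logistic-loss gradient
        if optimizer == "sgd":
            w -= lr * grad
        elif optimizer == "momentum":
            v = 0.9 * v - lr * grad
            w += v
        elif optimizer == "adagrad":
            s += grad ** 2
            w -= lr * grad / (np.sqrt(s) + 1e-8)
        elif optimizer == "adam":
            m = 0.9 * m + 0.1 * grad
            s = 0.999 * s + 0.001 * grad ** 2
            m_hat = m / (1 - 0.9 ** t)      # bias correction
            s_hat = s / (1 - 0.999 ** t)
            w -= lr * m_hat / (np.sqrt(s_hat) + 1e-8)
        history.append(accuracy(w, X_val, y_val))
    return history

for name in ["sgd", "momentum", "adagrad", "adam"]:
    hist = train(name)
    best = int(np.argmax(hist))
    print(f"{name:>8}: best val acc {max(hist):.3f} at epoch {best + 1}")
```

Comparing both the best held-out accuracy and the epoch at which it is reached captures the two criteria discussed below: final quality and speed of convergence.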










machine-learning neural-network deep-learning tensorflow

edited Mar 29 at 20:58 by Ethan
asked Mar 29 at 16:40 by Maxxx
1 Answer
A high training score is not an indication of model performance; a high test score is. Faster convergence to the same or a better test score is also an indication of optimizer performance.

Therefore, if the Momentum optimizer reaches a better test score faster, it is the best of the optimizers you tested.

As a side note, be careful about the choice of "score". For example, accuracy is a poor choice for imbalanced classes, because it weighs every error equally regardless of class size: it equates a 1% error rate in a class with 100 members to a 10% error rate in a class with 10 members (one misclassified example in each). If the classes are equally important, i.e. a 1% error rate is equally bad for every class, macro-F1 or AUC is a better replacement.

An important note: when we use a score to select a hyper-parameter or an optimizer, that choice affects the final model, so the score should be called a validation score, not a test score.
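The point about accuracy on imbalanced classes can be sketched with a small NumPy example. The class sizes (100 vs. 10) are the hypothetical numbers from the answer, and the "model" below is a deliberately lazy one that always predicts the majority class: accuracy stays high while macro-F1 exposes the failure.

```python
import numpy as np

# Imbalanced ground truth: 100 examples of class 0, 10 of class 1.
y_true = np.array([0] * 100 + [1] * 10)
# A lazy model that always predicts the majority class:
y_pred = np.zeros_like(y_true)

accuracy = float(np.mean(y_true == y_pred))  # 100/110 ~ 0.909, despite missing class 1 entirely

def f1(y_true, y_pred, cls):
    """Per-class F1: harmonic mean of precision and recall for class `cls`."""
    tp = np.sum((y_pred == cls) & (y_true == cls))
    fp = np.sum((y_pred == cls) & (y_true != cls))
    fn = np.sum((y_pred != cls) & (y_true == cls))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Macro-F1 averages per-class F1, so each class counts equally.
macro_f1 = float(np.mean([f1(y_true, y_pred, c) for c in (0, 1)]))

print(f"accuracy: {accuracy:.3f}")   # high
print(f"macro-F1: {macro_f1:.3f}")   # low: class 1's F1 is 0
```

The same macro-averaged F1 is available as `sklearn.metrics.f1_score(y_true, y_pred, average="macro")`; the hand-rolled version is shown only to keep the example dependency-free.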






edited Mar 31 at 19:56
answered Mar 29 at 17:21 by Esmailian