Scaling neural networks



While using neural networks (TensorFlow: Deep Neural Regressor), when you increase the training data from a sample to the whole dataset (say, a 10x larger dataset), what changes should you make to the model architecture (deeper/wider), the learning rate, and the hyperparameters in general?

How much of this is trial and error, and how much is heuristic reasoning?

Tags: machine-learning, neural-network, deep-learning, tensorflow, hyperparameter-tuning

asked Mar 21 at 5:05 by Sharan (edited Mar 21 at 9:00)

3 Answers

A good rule of thumb is the following:

Your network should be capable of overfitting your training data. If it cannot overfit the training data, you should increase its depth and/or width. How much to increase it is hard to say; it is sometimes more an art than a science.

Of course, this does not mean that the model you end up with should overfit your data.

answered Mar 21 at 9:17 by Andreas Look

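The rule of thumb above can be tried out without a deep-learning framework. The sketch below is an analogy, not the answer author's method: it assumes polynomial degree as the capacity knob standing in for depth/width, and raises capacity until a least-squares fit drives training error to (almost) zero. The function names, the tolerance and the toy data are hypothetical. With a real DNN, the equivalent check would be to train on a small subset of the data and see whether the training loss approaches zero.

```python
# Capacity sweep: raise model capacity until it can (over)fit the training set.
# Assumption: polynomial degree plays the role of network depth/width here.


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first).

    Solves the normal equations (A^T A) c = A^T y, where A is the
    Vandermonde matrix, by Gaussian elimination with partial pivoting.
    """
    n = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    m = [row + [b] for row, b in zip(ata, aty)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        rest = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - rest) / m[r][r]
    return coeffs


def train_mse(xs, ys, coeffs):
    """Mean squared error of the fitted polynomial on the training points."""
    preds = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def smallest_capacity_that_overfits(xs, ys, max_degree, tol=1e-8):
    """Return the first degree whose training MSE is below tol, else None."""
    for degree in range(max_degree + 1):
        if train_mse(xs, ys, fit_polynomial(xs, ys, degree)) < tol:
            return degree
    return None  # could not overfit: increase capacity


if __name__ == "__main__":
    xs = [0.0, 0.25, 0.5, 0.75, 1.0]
    ys = [0.0, 1.0, 0.0, 1.0, 0.0]  # toy data that needs a quartic
    print(smallest_capacity_that_overfits(xs, ys, max_degree=6))  # 4
    print(smallest_capacity_that_overfits(xs, ys, max_degree=2))  # None
```

The max_degree=2 call is the "cannot overfit, add capacity" case. As the answer stresses, reaching near-zero training error only shows that capacity is sufficient; the model you keep should still be chosen on validation data.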

I don't think you need to change much in the model definition.

You should, however, consider how long it takes to train on the complete dataset. If it takes too long and you are still in the experimentation phase, you may want to reduce the number of epochs so you get results faster, and adjust the model accordingly.

I suggest plotting all the metrics and trying to understand whether each trend is positive or negative. If it is positive, the changes you are making are going in the right direction.

Then, once you are happy with the hyperparameters, set something like epochs=100 and leave the model to train overnight. Afterwards, plot the learning curves again and decide where training should have stopped, or use early_stopping.

answered Mar 21 at 9:17 by Francesco Pegoraro

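The answer's two steps, checking whether a metric's trend is positive and deciding where a long run such as epochs=100 should have stopped, can be expressed as small functions over per-epoch validation losses. This is a framework-free sketch, not the answer author's code: trend_slope and early_stopping_epoch are hypothetical names, and the loss values are made up. In Keras, the built-in tf.keras.callbacks.EarlyStopping callback (with monitor, patience, min_delta and restore_best_weights arguments) applies a similar patience rule during training.

```python
# Hypothetical helpers for reading learning curves (names are illustrative).


def trend_slope(values):
    """Least-squares slope of a metric against epoch index.

    For a loss, a negative slope means the curve is still going down.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two epochs")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 0-based.

    Stops once the validation loss has not improved by more than
    min_delta for `patience` consecutive epochs; best_epoch is the
    epoch whose weights you would restore.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


if __name__ == "__main__":
    losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6]  # made-up curve
    print(trend_slope(losses[:3]) < 0)               # True: still improving
    print(early_stopping_epoch(losses, patience=3))  # (2, 5)
    print(early_stopping_epoch(losses, patience=4))  # (6, 6)
```

With patience=3 the run stops at epoch 5 and never sees the later dip at epoch 6; patience=4 catches it. That is the usual trade-off: more patience tolerates noisy plateaus at the cost of extra epochs.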

The depth and width of your DNN are meant to model the complexity of your data, not its size. So, if you already have enough data to train your model sufficiently, increasing the size of the training data does not require you to change anything, except perhaps reducing the number of epochs. For example, to model the complexity of the MNIST dataset you will not need hundreds of layers, even if you had billions of images to train on.

However, there is one situation in which increasing depth and width can make sense: if you initially did not have much data and therefore built a small DNN to prevent overfitting, one that does not sufficiently model the complexity of your data, and you then get a large amount of additional data, it makes sense to increase the depth and/or width of your DNN.

answered Mar 21 at 9:32 by georg_un

• In the case you mentioned, how much should be changed (mainly the learning rate)? Is it trial and error?
  – Sharan, Mar 22 at 12:56

• I don't really see a way to express the learning rate as a function of training data size. There should be no causal relationship between the two values (e.g., if you lower your learning rate because you get new data, then perhaps the learning rate should not have been so high in the first place). So it is mostly trial and error and analysis of learning curves. Also, in general, I would recommend combining the learning rate with a learning rate decay, since it usually yields more stable results.
  – georg_un, Mar 22 at 14:22
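Two points from this thread, that more data mainly changes how many epochs you need and that a learning rate decay tends to give more stable results, can be made concrete with a little arithmetic. The sketch below is framework-free and assumes fully connected layers with biases, a fixed batch size, and a decay schedule counted in gradient updates; the helper names and all sizes (784 inputs, 128 hidden units, 60,000 vs. 600,000 examples, batch size 32) are illustrative assumptions. exponential_decay mirrors the formula documented for tf.keras.optimizers.schedules.ExponentialDecay, initial_learning_rate * decay_rate ^ (step / decay_steps), with integer division when staircase=True.

```python
import math

# Illustrative arithmetic (hypothetical helper names, made-up sizes).


def dense_param_count(input_dim, hidden_sizes, output_dim):
    """Weights plus biases of a fully connected network.

    Depends only on the layer sizes (depth and width), not on how
    many training examples there are.
    """
    sizes = [input_dim, *hidden_sizes, output_dim]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def steps_per_epoch(num_examples, batch_size):
    """Gradient updates in one pass over the data (last partial batch included)."""
    return math.ceil(num_examples / batch_size)


def exponential_decay(initial_lr, step, decay_steps, decay_rate, staircase=False):
    """initial_lr * decay_rate ** (step / decay_steps); step counts updates."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent


if __name__ == "__main__":
    print(dense_param_count(784, [128], 10))  # 101770
    sample, full, batch = 60_000, 600_000, 32
    print(steps_per_epoch(sample, batch), steps_per_epoch(full, batch))  # 1875 18750
    # The schedule advances per update, so one epoch on the 10x data
    # decays the learning rate as much as ten epochs on the sample.
    lr_sample = exponential_decay(0.01, 10 * steps_per_epoch(sample, batch), 1875, 0.9)
    lr_full = exponential_decay(0.01, steps_per_epoch(full, batch), 1875, 0.9)
    print(math.isclose(lr_sample, lr_full))  # True
```

Neither the parameter count nor the decay formula involves the dataset size, which fits the answer's point that more data alone does not call for a different architecture. What does change is the number of updates per epoch: fewer epochs may reach the same point, and a step-based decay schedule advances ten times as far per epoch, so the learning curves are worth re-reading rather than carrying the old epoch count over.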










