How to balance Keras loss functions of different magnitudes



I'm currently trying to do transfer learning using VGG16 with ImageNet weights in Keras. I've taken the model up to the 5th block, and then built a two-headed network on top of this to both classify images (the class branch, with 4 classes) and localise the item of interest in the image (the loc branch, with 4 outputs). Here's a quick overview of the top of my network:

[Image: Fully connected network]



I'm using categorical_crossentropy for the classification loss and mean_squared_error for the localisation loss. However, these are of very different magnitudes (around 1 and around 1000, respectively), so I'm unsure how to set my loss weights, and I haven't been able to find much in the Keras manual on this. In this example, the weights are 0.05 for the classification loss and 0.95 for the localisation loss, producing a combined loss that is dominated by the localisation loss:



[Image: Model training summary]
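As a rough illustration of why the combined loss is dominated by localisation: Keras minimises the weighted sum of the per-output losses, using the weights as given rather than normalising them. The sketch below reproduces that arithmetic in plain Python with the approximate magnitudes quoted above (about 1 and about 1000). The function name `combined_loss` and the keys `class` and `loc` are illustrative assumptions, not taken from the original model.

```python
def combined_loss(losses, weights):
    """Weighted sum of per-output losses (the quantity Keras minimises)."""
    return sum(weights[name] * losses[name] for name in losses)


# Approximate magnitudes from the question (assumed, not measured here).
losses = {"class": 1.0, "loc": 1000.0}
weights = {"class": 0.05, "loc": 0.95}

total = combined_loss(losses, weights)

# Fraction of the combined loss contributed by each branch.
share = {name: weights[name] * losses[name] / total for name in losses}

print(f"combined loss: {total:.2f}")
for name, fraction in share.items():
    print(f"{name}: {fraction:.5%} of the combined loss")
```

With these numbers the classification term contributes well under 0.1% of the total, so the reported combined loss tracks the localisation loss almost exactly.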



I haven't had much luck googling this or looking in the Keras manual, so I wondered if anyone had any suggestions on how to set the loss weights for a case like this. So far, things have trained OK regardless of how I set them, but I'm curious whether best practice is to weight them so they contribute equally (so perhaps 1 for the classification loss and 0.001 for the localisation loss), or whether my weights should instead tell Keras how much I care about minimising each loss. The localisation problem is harder than the classification problem, so perhaps I should weight localisation more, but then the two branches/losses don't actually share any layers (as the 5 blocks of VGG16 are frozen), so I'm still confused.
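The "contribute equally" option mentioned above can be sketched as follows: scale each loss by the inverse of its typical magnitude so that every weighted term lands near the same reference value. This is only one possible heuristic, not an established Keras recommendation. The helper `balancing_weights` is hypothetical, and the magnitudes are the approximate ones from the question.

```python
def balancing_weights(typical_losses, reference=None):
    """Return weights that bring each weighted loss to a common magnitude.

    typical_losses: mapping of output name -> typical unweighted loss value.
    reference: target magnitude; defaults to the smallest typical loss.
    """
    if reference is None:
        reference = min(typical_losses.values())
    return {name: reference / value for name, value in typical_losses.items()}


# Approximate magnitudes from the question (assumed, not measured here).
typical = {"class": 1.0, "loc": 1000.0}
weights = balancing_weights(typical)

# After weighting, both terms contribute about the same amount.
contributions = {name: weights[name] * typical[name] for name in typical}

print(weights)
print(contributions)
```

A dictionary like this could be passed as the `loss_weights` argument of `model.compile`, provided its keys match the model's output layer names (assumed here to be `class` and `loc`). Since the two heads share no trainable layers in the setup described, each weight only rescales the gradient reaching its own head, so it behaves more like a per-head step-size factor than a trade-off between the two tasks.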



Any input on this would be gratefully received.


























Tags: keras, loss-function






asked Mar 25 at 10:53 by Matthew



