CNN Back Propagation without Sigmoid Derivative




I'm new to CNNs and am studying some MATLAB sample code (because I need to understand the internal calculations). I recently noticed that the sample code I'm using doesn't multiply the error by the sigmoid's derivative in back propagation. The feed-forward pass uses sigmoid as the last layer's activation function, so from my understanding the back-propagated error should be (outputs - target) * sigmoid's derivative(outputs). However, the author intentionally disabled this multiplication with the following code:



if cnn.loss_func == 'cros'
    if cnn.layers{cnn.no_of_layers}.act_func == 'soft'
        cnn.CalcLastLayerActDerivative = 0;
    elseif cnn.layers{cnn.no_of_layers}.act_func == 'sigm'
        cnn.CalcLastLayerActDerivative = 0;
    end
end

My reference code: https://www.mathworks.com/matlabcentral/fileexchange/59223-convolution-neural-network-simple-code-simple-to-use



When cnn.CalcLastLayerActDerivative = 0, the error is defined simply as (outputs - target). I tried initializing cnn.CalcLastLayerActDerivative = 1 so that the sigmoid's derivative is included in back propagation, but then I got a worse error rate. I'm not sure whether this is just because the sigmoid's derivative lies in the range (0, 0.25] or because I'm not understanding back propagation correctly. Does anyone know why this is happening and whether I should include the sigmoid's derivative in my calculation?

Thanks!







      cnn backpropagation














asked Mar 24 at 0:50 by Sylvia
edited Mar 24 at 1:40 by Siong Thye Goh
























2 Answers




> error is defined just as (outputs - target)




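This is the correct gradient, taken with respect to the input of the last layer's activation, for the cross-entropy loss function with Sigmoid as the last layer.

Write $f(x)$ for the Sigmoid output, where $x$ is the input to the last-layer activation, and $y$ for the target. For the squared (quadratic) loss $$(y-f(x))^2,$$ the gradient with respect to $x$ is, as you said, $$(f(x)-y)f'(x)$$ (the constant $2$ is removed), but for the binary cross-entropy loss $$-\left[y\log f(x) + (1-y)\log(1-f(x))\right],$$ the gradient is $$-\frac{y f'(x)}{f(x)} + \frac{(1-y) f'(x)}{1-f(x)}.$$
Since for Sigmoid we have $f'(x)=f(x)(1-f(x))$, by substitution the gradient becomes
$$-y(1-f(x)) + (1-y)f(x) = f(x)-y.$$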
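The substitution can also be checked numerically. The following is a minimal MATLAB sketch, not taken from the referenced File Exchange code: `sigm`, `bce` and `half_sq` are illustrative helper names, the cross-entropy is written with its leading minus sign (the quantity that is minimized), and the squared loss carries a factor 1/2 so that the constant 2 drops out. It compares central finite differences of each loss with the closed forms f(x) - y and (f(x) - y) f'(x).

```matlab
% Illustrative helpers (assumed names, not part of the referenced code)
sigm    = @(x) 1 ./ (1 + exp(-x));
bce     = @(x, y) -(y .* log(sigm(x)) + (1 - y) .* log(1 - sigm(x)));
half_sq = @(x, y) 0.5 .* (sigm(x) - y).^2;

x = [-2.0, -0.5, 0.3, 1.7];   % made-up inputs to the last-layer activation
y = [ 0,    1,   1,   0  ];   % made-up binary targets
h = 1e-6;                     % step size for the central difference

f = sigm(x);

% Numerical derivatives of each loss with respect to x
num_bce = (bce(x + h, y) - bce(x - h, y)) ./ (2 * h);
num_sq  = (half_sq(x + h, y) - half_sq(x - h, y)) ./ (2 * h);

% Largest deviation from the closed-form gradients
err_bce = max(abs(num_bce - (f - y)));
err_sq  = max(abs(num_sq  - (f - y) .* f .* (1 - f)));

fprintf('max |numeric - analytic|: cross-entropy %.2e, squared %.2e\n', err_bce, err_sq);
```

Both deviations should be tiny, limited only by the finite-difference step and floating-point rounding.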
To distinguish between these two gradients, the author sets cnn.CalcLastLayerActDerivative = 0, which is checked later in an if statement in the bpcnn.m file as follows (the comments don't exist in the original code):



...
else
    % error = (f(x) - y)
    er = (cnn.layers{cnn.no_of_layers}.outputs - yy);
...
if cnn.CalcLastLayerActDerivative == 1
    % change the error from (f(x) - y) to f'(x)(f(x) - y)
    er = applyactfunccnn(cnn.layers{cnn.no_of_layers}.outputs, cnn.layers{cnn.no_of_layers}.act_func, 1, er);
end


which means the error that is propagated back is $(f(x)-y)f'(x)$ for quad and $(f(x)-y)$ for cros (bad variable name!).
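A small numeric illustration of the two error signals, using made-up outputs and targets (the values and the names `delta_cros` and `delta_quad` are hypothetical, not from bpcnn.m). Because the Sigmoid derivative f(x)(1 - f(x)) never exceeds 0.25, turning the extra multiplication on scales every component of the error to at most a quarter of its size. That may be one reason the error rate got worse when the flag was set to 1 with cross-entropy; the more basic issue is that the result is then no longer the gradient of the cross-entropy loss.

```matlab
% Hypothetical last-layer outputs and targets for three output units
outputs = [0.9, 0.2, 0.6];
yy      = [1,   0,   0  ];

er = outputs - yy;                              % (f(x) - y)

delta_cros = er;                                % flag = 0: used as is
delta_quad = er .* outputs .* (1 - outputs);    % flag = 1: times f'(x) = f(x)(1 - f(x))

% Element-wise shrink factor introduced by the Sigmoid derivative
ratio = abs(delta_quad) ./ abs(delta_cros);

disp([delta_cros; delta_quad; ratio]);
```

The third row printed by `disp` is the shrink factor for each output unit.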



As a side note, the author only allows Sigmoid for cross-entropy, which means only binary classification is supported (a multi-class classifier requires SoftMax).

error('cross entropy is implemented only when last layer is sigmoid');
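For comparison, the same simplification holds for SoftMax combined with the multi-class cross-entropy: the gradient with respect to the SoftMax inputs is again outputs minus targets. This is a standard result, and it would be consistent with the flag also being set to 0 for 'soft' in the snippet from the question. A minimal finite-difference sketch under those assumptions (`softmax_col` and `ce` are illustrative helpers, not functions from the referenced code):

```matlab
% Illustrative helpers for a single column vector of SoftMax inputs
softmax_col = @(z) exp(z - max(z)) ./ sum(exp(z - max(z)));
ce          = @(z, y) -sum(y .* log(softmax_col(z)));

z = [1.0; -0.5; 2.0];   % made-up inputs to the SoftMax layer
y = [0; 0; 1];          % one-hot target
h = 1e-6;               % step size for the central difference

% Perturb one input at a time to get the numerical gradient
num = zeros(size(z));
for k = 1:numel(z)
    e = zeros(size(z));
    e(k) = h;
    num(k) = (ce(z + e, y) - ce(z - e, y)) / (2 * h);
end

% Compare with the closed form: SoftMax outputs minus targets
err_soft = max(abs(num - (softmax_col(z) - y)));
fprintf('max |numeric - analytic| for SoftMax: %.2e\n', err_soft);
```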


EDIT

Thanks to @Edison for pointing out that the error and gradient are not handled the same way as the loss values in the code, which substantially changed the final answer.















answered Mar 24 at 6:20 by Esmailian
edited Mar 25 at 13:04
























Thank you so much for your answer, @Esmailian. I agree with you that the author distinguishes the two losses through the setting cnn.CalcLastLayerActDerivative = 0/1.

However, in the original code, the calculation of the gradient for cross-entropy, yf′(x)/f(x) − (1−y)f′(x)/(1−f(x)), is not provided in bpcnn.m. Only the cross-entropy error y log f(x) + (1−y) log(1−f(x)) is computed, and it is assigned to er1 only for plotting the losses:



> if cnn.loss_func == 'cros' %cross_entropy'
>     if cnn.layers{cnn.no_of_layers}.act_func == 'sigm'
>         er1 = -1.*sum((yy.*log(cnn.layers{cnn.no_of_layers}.outputs) + (1-yy).*log(1-cnn.layers{cnn.no_of_layers}.outputs)), 1);
>     else
>         ...
>     end
>     cnn.loss = sum(er1(:))/size(er1,2); %loss over all examples
>
> else
>     er1 = er.^2;
>     cnn.loss = sum(er1(:))/(2*size(er1,2)); %loss over all examples
>
> end


                          Thus, could you provide more detailed answer regarding to this?




                          Thanks to @Esmailian! All the questions I had are now resolved.
















                          edited Mar 25 at 18:11






























                          answered Mar 25 at 2:02









                          Edison




















