CNN Back Propagation without Sigmoid Derivative

I'm new to CNNs and am studying some MATLAB sample code, because I need to understand the internal calculations. I recently noticed that the sample code I'm using does not multiply the error by the sigmoid's derivative during back propagation. The feed-forward pass uses a sigmoid as the last layer's activation function, so from my understanding the back-propagated error should be (outputs - target) * sigmoid's derivative(outputs). However, the author intentionally disabled this multiplication with the following code:

    if cnn.loss_func == 'cros'
        if cnn.layers{cnn.no_of_layers}.act_func == 'soft'
            cnn.CalcLastLayerActDerivative = 0;
        elseif cnn.layers{cnn.no_of_layers}.act_func == 'sigm'
            cnn.CalcLastLayerActDerivative = 0;
        end
    end

My reference code: https://www.mathworks.com/matlabcentral/fileexchange/59223-convolution-neural-network-simple-code-simple-to-use

When cnn.CalcLastLayerActDerivative = 0, the error is defined simply as (outputs - target). I tried initializing cnn.CalcLastLayerActDerivative = 1 so that the sigmoid's derivative is included in back propagation, but the error rate got worse. I'm not sure whether that is just because the sigmoid's derivative lies in the range [0, 0.25], or because I don't understand back propagation correctly. Does anyone know why this happens, and whether I should include the sigmoid's derivative in my calculation?

Thanks!

asked Mar 24 at 0:50 by Sylvia, edited Mar 24 at 1:40 by Siong Thye Goh


cnn backpropagation

2 Answers

> error is defined just as (outputs - target)

This is the correct gradient for the cross-entropy loss with a sigmoid as the last layer's activation.

Let $f(x)$ be the last layer's output for pre-activation $x$, and $y$ the target. For the squared (quadratic) loss $$(y-f(x))^2,$$ the gradient with respect to $x$ is, as you said, $$(f(x)-y)f'(x)$$ (the constant $2$ is dropped), but for the binary cross-entropy loss $$-\left[y\log f(x) + (1-y)\log(1-f(x))\right],$$ the gradient is $$-y\frac{f'(x)}{f(x)} + (1-y)\frac{f'(x)}{1-f(x)}.$$
Since the sigmoid satisfies $f'(x)=f(x)(1-f(x))$, substitution gives
$$-y(1-f(x)) + (1-y)f(x)=f(x)-y.$$
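The cancellation can be checked numerically. The following is a minimal sketch in plain Python, not part of the reference MATLAB code; the function names `sigmoid`, `bce_loss`, `numeric_grad` and `analytic_grad` are made up for illustration. It compares a finite-difference estimate of the derivative of the binary cross-entropy loss with respect to the pre-activation $x$ against the closed form $f(x)-y$.

```python
import math


def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_loss(x, y):
    """Binary cross-entropy of the sigmoid output f(x) against target y."""
    f = sigmoid(x)
    return -(y * math.log(f) + (1.0 - y) * math.log(1.0 - f))


def numeric_grad(loss, x, y, h=1e-6):
    """Central finite-difference estimate of d loss / d x."""
    return (loss(x + h, y) - loss(x - h, y)) / (2.0 * h)


def analytic_grad(x, y):
    """Closed form f(x) - y: the sigmoid derivative has cancelled out."""
    return sigmoid(x) - y


if __name__ == "__main__":
    # Print both estimates side by side for a few (x, y) pairs.
    for x, y in [(-2.0, 1.0), (0.5, 0.0), (3.0, 1.0)]:
        print(x, y, numeric_grad(bce_loss, x, y), analytic_grad(x, y))
```

The two columns should agree up to finite-difference error, which is why no explicit multiplication by $f'(x)$ is needed for this loss.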
To distinguish between these two gradients, the author sets cnn.CalcLastLayerActDerivative = 0, which is checked later in an if statement in the bpcnn.m file as follows (the comments are not in the original code):

    ...
    else
        % error = (f(x) - y)
        er = ( cnn.layers{cnn.no_of_layers}.outputs - yy);
    ...
    if cnn.CalcLastLayerActDerivative ==1
        % change the error from (f(x) - y) to f'(x)(f(x) - y)
        er = applyactfunccnn(cnn.layers{cnn.no_of_layers}.outputs, cnn.layers{cnn.no_of_layers}.act_func, 1, er);
    end

which means the gradient is $(f(x)-y)f'(x)$ for quad and $(f(x)-y)$ for cros (bad variable name!).
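To make the effect of the flag concrete, here is a small Python sketch of the two error terms. It is an illustration only: `output_delta` and its `calc_act_derivative` argument are hypothetical stand-ins for the role that cnn.CalcLastLayerActDerivative plays in the reference code, not the author's implementation.

```python
import math


def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def output_delta(x, y, calc_act_derivative):
    """Error term at the last layer for pre-activation x and target y.

    calc_act_derivative is a hypothetical flag mirroring the role of
    cnn.CalcLastLayerActDerivative in the reference code.
    """
    out = sigmoid(x)
    delta = out - y                     # f(x) - y
    if calc_act_derivative:
        delta *= out * (1.0 - out)      # times f'(x) = f(x)(1 - f(x))
    return delta


if __name__ == "__main__":
    # A confidently wrong output: the target is 1, the pre-activation is very negative.
    x, y = -6.0, 1.0
    print("flag = 0 (cross-entropy):", output_delta(x, y, False))
    print("flag = 1 (quadratic):    ", output_delta(x, y, True))
```

With the flag set to 1 the error term is scaled by $f'(x) \le 0.25$, and by a much smaller factor when the sigmoid saturates. Forcing the flag to 1 while the loss is cross-entropy also means the update no longer follows the gradient of that loss. This may contribute to the worse error rate reported in the question, although the source does not confirm the cause.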

As a side note, the author only allows a sigmoid with cross-entropy, which means only binary classification is supported (a multi-class classifier requires softmax):

    error('cross entropy is implemented only when last layer is sigmoid');

EDIT

Thanks to @Edison for pointing out that error and gradient were not handled the same as loss values in the code, which substantially changed the final answer.

answered Mar 24 at 6:20 by Esmailian, edited Mar 25 at 13:04

Thank you so much for your answer, @Esmailian. I agree that the author distinguishes the two losses through the setting cnn.CalcLastLayerActDerivative = 0/1.

However, in the original code, the gradient calculation for cross-entropy, $-y f'(x)/f(x) + (1-y) f'(x)/(1-f(x))$, is not provided in bpcnn.m. Only the cross-entropy error $-[y\log f(x) + (1-y)\log(1-f(x))]$ is computed, and it is stored in er1 only for plotting the losses:

> if cnn.loss_func == 'cros' %cross_entropy'
>     if cnn.layers{cnn.no_of_layers}.act_func == 'sigm'
>         er1 = -1.*sum((yy.*log(cnn.layers{cnn.no_of_layers}.outputs) + (1-yy).*log(1-cnn.layers{cnn.no_of_layers}.outputs)), 1);
>     else
>         ...
>     end
>     cnn.loss = sum(er1(:))/size(er1,2); %loss over all examples
>
> else
>     er1 = er.^2;
>     cnn.loss = sum(er1(:))/(2*size(er1,2)); %loss over all examples
>
> end

Could you therefore give a more detailed answer on this point?

Thanks to @Esmailian! All the questions I had are now resolved.
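The distinction raised here, between the loss value that is only reported and the error term that is actually back-propagated, can be sketched in Python as follows. This is an assumption-laden illustration rather than a port of bpcnn.m: `report_loss` and `backprop_error` are hypothetical names, and outputs and targets are taken to be lists of per-example lists.

```python
import math


def report_loss(outputs, targets, loss_func):
    """Scalar loss averaged over examples, used only for monitoring.

    outputs and targets are lists of per-example lists; for "cros" the
    outputs must lie strictly between 0 and 1.
    """
    n = len(outputs)
    total = 0.0
    for out, tgt in zip(outputs, targets):
        for o, t in zip(out, tgt):
            if loss_func == "cros":
                # binary cross-entropy per output unit
                total += -(t * math.log(o) + (1.0 - t) * math.log(1.0 - o))
            else:
                # quadratic loss with the conventional factor 1/2
                total += 0.5 * (o - t) ** 2
    return total / n


def backprop_error(outputs, targets):
    """Error term for cross-entropy with a sigmoid output: outputs - targets."""
    return [[o - t for o, t in zip(out, tgt)] for out, tgt in zip(outputs, targets)]


if __name__ == "__main__":
    outputs = [[0.5], [0.5]]
    targets = [[1.0], [0.0]]
    print("cross-entropy loss:", report_loss(outputs, targets, "cros"))
    print("quadratic loss:    ", report_loss(outputs, targets, "quad"))
    print("error term:        ", backprop_error(outputs, targets))
```

The logarithms appear only in the reported loss. The back-propagated term needs no logarithm or explicit derivative, because the derivation in the accepted answer reduces it to outputs - targets.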

answered Mar 25 at 2:02 by Edison, edited Mar 25 at 18:11
