CNN Back Propagation without Sigmoid Derivative
I'm new to CNNs and am studying some MATLAB sample code (because I need to understand the internal calculations). I recently realized that the sample code I'm using doesn't multiply the error by the sigmoid's derivative in back propagation. The feed-forward pass uses a sigmoid as the last layer's activation function, so from my understanding, the back-propagated error should be (outputs - target) * sigmoid's derivative(outputs). However, the author intentionally disabled this multiplication with the following code:
    if cnn.loss_func == 'cros'
        if cnn.layers{cnn.no_of_layers}.act_func == 'soft'
            cnn.CalcLastLayerActDerivative = 0;
        elseif cnn.layers{cnn.no_of_layers}.act_func == 'sigm'
            cnn.CalcLastLayerActDerivative = 0;
        end
    end
My reference code: https://www.mathworks.com/matlabcentral/fileexchange/59223-convolution-neural-network-simple-code-simple-to-use
When cnn.CalcLastLayerActDerivative = 0, the error is defined simply as (outputs - target). I tried initializing cnn.CalcLastLayerActDerivative = 1 so that the sigmoid's derivative is included in back propagation, but then I got a worse error rate. I'm not sure whether that is just because the sigmoid's derivative lies in the range [0, 0.25], or whether I'm misunderstanding back propagation. Does anyone know why this happens, and whether I should include the sigmoid's derivative in my calculation?
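To make the range mentioned above concrete, here is a minimal Python sketch (not taken from the MATLAB package; the function names are illustrative assumptions). It scans the sigmoid's derivative over a range of pre-activations and shows how much the extra factor shrinks the error signal for a confidently wrong output.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative_from_output(a):
    """Sigmoid derivative written in terms of its output a = sigmoid(z)."""
    return a * (1.0 - a)

# Scan pre-activations from -10 to 10 and record the derivative at each point.
zs = [i / 10.0 for i in range(-100, 101)]
derivs = [sigmoid_derivative_from_output(sigmoid(z)) for z in zs]

max_deriv = max(derivs)  # attained at z = 0, where sigmoid(z) = 0.5
min_deriv = min(derivs)  # close to 0 where the sigmoid saturates

# A confidently wrong output: the target is 1 but the output is close to 0.
target = 1.0
output = sigmoid(-6.0)
plain_error = output - target
scaled_error = plain_error * sigmoid_derivative_from_output(output)

print(max_deriv, min_deriv)
print(plain_error, scaled_error)
```

In this sketch the scaled error is far smaller in magnitude than the plain error exactly where the output is most wrong, which is one plausible reason the extra factor could slow learning.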
Thanks!
cnn backpropagation
edited Mar 24 at 1:40 by Siong Thye Goh
asked Mar 24 at 0:50 by Sylvia
2 Answers
"error is defined just as (outputs - target)"
This is the correct gradient for the cross-entropy loss function with a sigmoid as the last layer.
For the squared (quadratic) loss $$(y-f(x))^2,$$ the gradient is, as you said, $$(y-f(x))f'(x)$$ (up to sign, with the constant $2$ removed), but for the binary cross-entropy loss, written here as the log-likelihood $$y\log f(x) + (1-y)\log(1-f(x)),$$ the gradient is $$y\frac{f'(x)}{f(x)} - (1-y)\frac{f'(x)}{1-f(x)}.$$
Since for the sigmoid we have $f'(x)=f(x)(1-f(x))$, by substitution the gradient becomes
$$y(1-f(x)) - (1-y)f(x)=y-f(x).$$
The loss is the negative of this log-likelihood, so its gradient is $f(x)-y$, which is the form the code uses.
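The cancellation can be checked numerically. The following is a small Python sketch, not part of the MATLAB package: the helper names are assumptions, and `x` stands for the pre-activation of the sigmoid output. It compares the closed form f(x) - y for the (negated log-likelihood) loss with a central finite-difference estimate.

```python
import math

def sigmoid(x):
    """Logistic sigmoid f(x)."""
    return 1.0 / (1.0 + math.exp(-x))

def bce_loss(x, y):
    """Binary cross-entropy loss of the sigmoid output for pre-activation x."""
    f = sigmoid(x)
    return -(y * math.log(f) + (1.0 - y) * math.log(1.0 - f))

def numeric_grad(x, y, h=1e-6):
    """Central finite-difference estimate of d(loss)/dx."""
    return (bce_loss(x + h, y) - bce_loss(x - h, y)) / (2.0 * h)

checks = []
for x in (-2.0, -0.5, 0.0, 1.5):
    for y in (0.0, 1.0):
        analytic = sigmoid(x) - y  # f(x) - y, with no extra f'(x) factor
        checks.append((x, y, analytic, numeric_grad(x, y)))

for x, y, analytic, numeric in checks:
    print(f"x={x:+.1f} y={y:.0f} analytic={analytic:+.6f} numeric={numeric:+.6f}")
```

If the two columns agree, the f'(x) factor has indeed cancelled and multiplying by it again would change the gradient.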
To distinguish between these two gradients, the author sets cnn.CalcLastLayerActDerivative = 0, which is checked later in an if statement in the bpcnn.m file as follows (the comments do not exist in the original code):
    ...
    else
        % error = (f(x) - y)
        er = (cnn.layers{cnn.no_of_layers}.outputs - yy);
    ...
    if cnn.CalcLastLayerActDerivative == 1
        % change the error from (f(x) - y) to f'(x)(f(x) - y)
        er = applyactfunccnn(cnn.layers{cnn.no_of_layers}.outputs, cnn.layers{cnn.no_of_layers}.act_func, 1, er);
    end
which means the gradient is $(y-f(x))f'(x)$ for quad and $(y-f(x))$ for cros (bad variable name!).
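The two cases can be put side by side in a short Python sketch. This is an illustration under assumptions, not the package's code: `output_delta` is a hypothetical helper, only a sigmoid output layer is handled, and the 'quad' and 'cros' strings simply mirror the options quoted above.

```python
def output_delta(outputs, targets, loss_func):
    """Error signal at the pre-activation of a sigmoid output layer.

    loss_func is 'quad' (squared loss) or 'cros' (binary cross-entropy),
    mirroring the option strings quoted from the MATLAB code.
    """
    if loss_func not in ("quad", "cros"):
        raise ValueError("loss_func must be 'quad' or 'cros'")
    # Plays the role of CalcLastLayerActDerivative being 1 or 0.
    use_act_derivative = loss_func == "quad"
    deltas = []
    for f, y in zip(outputs, targets):
        er = f - y                  # (f(x) - y)
        if use_act_derivative:
            er *= f * (1.0 - f)     # times f'(x) = f(x)(1 - f(x))
        deltas.append(er)
    return deltas

outputs = [0.9, 0.2, 0.5]
targets = [1.0, 0.0, 1.0]
delta_quad = output_delta(outputs, targets, "quad")
delta_cros = output_delta(outputs, targets, "cros")
print(delta_quad)
print(delta_cros)
```

Both deltas point in the same direction; the quadratic one is just scaled down by the sigmoid's derivative at each output.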
As a side note, the author only allows a sigmoid for cross-entropy, which means only a binary classifier is supported (a multi-class classifier requires softmax):
    error('cross entropy is implemented only when last layer is sigmoid');
EDIT
Thanks to @Edison for pointing out that error and gradient were not handled the same as loss values in the code, which substantially changed the final answer.
edited Mar 25 at 13:04
answered Mar 24 at 6:20 by Esmailian
Thank you so much for your answer, Esmailian. I agree with you that the author distinguishes the two losses by setting cnn.CalcLastLayerActDerivative to 0 or 1.
However, in the original code, the calculation of the gradient for cross-entropy, $yf'(x)/f(x)-(1-y)f'(x)/(1-f(x))$, is not provided in bpcnn.m. Only the cross-entropy error $y\log f(x)+(1-y)\log(1-f(x))$ is provided, but it is sent to er1 only for plotting the losses:
    if cnn.loss_func == 'cros' % cross_entropy
        if cnn.layers{cnn.no_of_layers}.act_func == 'sigm'
            er1 = -1.*sum((yy.*log(cnn.layers{cnn.no_of_layers}.outputs) + (1-yy).*log(1-cnn.layers{cnn.no_of_layers}.outputs)), 1);
        else
            ...
        end
        cnn.loss = sum(er1(:))/size(er1,2); % loss over all examples
    else
        er1 = er.^2;
        cnn.loss = sum(er1(:))/(2*size(er1,2)); % loss over all examples
    end
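The loss bookkeeping in the snippet above can be sketched in Python as follows. This is an illustrative translation under assumptions: the function names are made up, and each batch is a list with one inner list of output units per example (a column in the MATLAB code). These values are only reported; the gradient itself comes from er.

```python
import math

def cross_entropy_loss(outputs, targets):
    """Mean over examples of the binary cross-entropy summed over output units."""
    per_example = [
        -sum(y * math.log(f) + (1.0 - y) * math.log(1.0 - f)
             for f, y in zip(out, tgt))
        for out, tgt in zip(outputs, targets)
    ]
    return sum(per_example) / len(per_example)

def quadratic_loss(outputs, targets):
    """Sum of squared errors divided by twice the number of examples."""
    squared = [(f - y) ** 2
               for out, tgt in zip(outputs, targets)
               for f, y in zip(out, tgt)]
    return sum(squared) / (2.0 * len(outputs))

# Two examples with two sigmoid output units each.
outputs = [[0.9, 0.1], [0.2, 0.8]]
targets = [[1.0, 0.0], [0.0, 1.0]]
ce = cross_entropy_loss(outputs, targets)
quad = quadratic_loss(outputs, targets)
print(ce, quad)
```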
Thus, could you provide a more detailed answer regarding this?
Thanks to @Esmailian! All the questions I had are now resolved.
edited Mar 25 at 18:11
answered Mar 25 at 2:02 by Edison