Partial derivative of prediction (sigmoid applied) with respect to weight


I am very confused about why a seemingly "extra" term appears in the above-mentioned calculation in my Udacity course.

[Image: calculation from the Udacity gradient descent intro]

The above is taking the derivative of a sigmoid, so why isn't it just

$$=\sigma(Wx+b)(1-\sigma(Wx+b))$$
but rather has $\frac{\partial}{\partial w_j}(Wx+b)$ tacked on the tail?

edited yesterday

asked yesterday

flexitarian33

gradient-descent


1 Answer

Recall that the chain rule gives $$\frac{d}{dw}h(g(w))=h'(g(w))\,g'(w)$$

In the context of your question, $h(t)=\sigma(t)$ and $g(w)=Wx+b$,

which is why there is one more term: the factor $\frac{\partial}{\partial w_j}(Wx+b)$ is $g'(w)$.
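As a quick sanity check, the chain-rule result can be compared against a numerical derivative. The sketch below is only illustrative: the function names (`predict`, `grad_chain_rule`, `grad_finite_difference`) and the sample values of `w`, `x`, and `b` are assumptions made up for this example, and it treats $W$ as a single weight vector so that $\frac{\partial}{\partial w_j}(Wx+b)=x_j$.

```python
import math


def sigmoid(t):
    """Logistic sigmoid, sigma(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def predict(w, x, b):
    """Prediction y_hat = sigma(w . x + b) for one weight vector w."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


def grad_chain_rule(w, x, b, j):
    """d(y_hat)/d(w_j) via the chain rule: sigma'(z) * dz/dw_j.

    sigma'(z) = sigma(z) * (1 - sigma(z)) and dz/dw_j = x_j.
    """
    y_hat = predict(w, x, b)
    return y_hat * (1.0 - y_hat) * x[j]


def grad_finite_difference(w, x, b, j, eps=1e-6):
    """Central-difference approximation of d(y_hat)/d(w_j)."""
    w_plus = list(w)
    w_minus = list(w)
    w_plus[j] += eps
    w_minus[j] -= eps
    return (predict(w_plus, x, b) - predict(w_minus, x, b)) / (2.0 * eps)


# Hypothetical sample values, chosen only for illustration.
w = [0.5, -1.2, 0.3]
x = [2.0, 1.0, -3.0]
b = 0.1

for j in range(len(w)):
    analytic = grad_chain_rule(w, x, b, j)
    numeric = grad_finite_difference(w, x, b, j)
    print(j, analytic, numeric)
```

The two columns should agree closely for every $j$. Dropping the `x[j]` factor, which is what using $\sigma(Wx+b)(1-\sigma(Wx+b))$ alone would amount to, would match the numerical derivative only when $x_j = 1$.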

answered yesterday

Siong Thye Goh


I can see now that it's a composite function since sigmoid itself is a function :)
– flexitarian33 yesterday
