Partial derivative of prediction (sigmoid applied) with respect to weight

I am very confused about where a seemingly "extra" term comes from in the above-mentioned calculation in my Udacity course.

[Image: from Udacity gradient descent intro]

The above takes the derivative of a sigmoid, so why isn't it just

$$=\sigma(Wx+b)(1-\sigma(Wx+b))$$

but rather has $\frac{\partial}{\partial w_j}(Wx+b)$ tacked on the tail?

Tags: gradient-descent














asked yesterday by flexitarian33 (edited yesterday)





















1 Answer






Recall the chain rule:

$$\frac{d}{dw}h(g(w))=h'(g(w))\,g'(w)$$

In the context of your question, $h(t)=\sigma(t)$ and $g(w)=Wx+b$, so the derivative of the inner function $g$ appears as one more factor. That is why there is an extra term.

















answered yesterday by Siong Thye Goh
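As an illustration of the chain rule argument, the inner derivative is $\frac{\partial}{\partial w_j}(Wx+b)=x_j$, so the full expression becomes

$$\frac{\partial}{\partial w_j}\sigma(Wx+b)=\sigma(Wx+b)\,(1-\sigma(Wx+b))\,x_j$$

The following is a minimal numerical sketch, not part of the course material, that compares this formula with a central finite difference. The function names (`predict`, `analytic_grad`, `numeric_grad`) and the values of `w`, `x`, and `b` are assumptions chosen only for the example, and a single input vector with a scalar output is assumed.

```python
import math


def sigmoid(t):
    """Logistic sigmoid, sigma(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def predict(w, x, b):
    """Prediction sigma(Wx + b) for one input vector x (scalar output)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


def analytic_grad(w, x, b, j):
    """Chain rule: sigma'(z) times d(Wx + b)/dw_j, which equals x_j."""
    y = predict(w, x, b)
    return y * (1.0 - y) * x[j]


def numeric_grad(w, x, b, j, eps=1e-6):
    """Central finite difference of the prediction with respect to w_j."""
    w_plus = list(w)
    w_plus[j] += eps
    w_minus = list(w)
    w_minus[j] -= eps
    return (predict(w_plus, x, b) - predict(w_minus, x, b)) / (2.0 * eps)


# Hypothetical values, chosen only for illustration.
w = [0.5, -1.2, 0.3]
x = [2.0, 1.0, -3.0]
b = 0.1

for j in range(len(w)):
    print(j, analytic_grad(w, x, b, j), numeric_grad(w, x, b, j))
```

The two columns should agree to many decimal places. Dropping the `x[j]` factor from `analytic_grad` breaks the agreement whenever $x_j \neq 1$, which is the "extra" term the question asks about.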












• I can see now that it's a composite function since sigmoid itself is a function :) – flexitarian33, yesterday















