An ambiguity in SVM equations about misclassified data

I have encountered an ambiguity in the SVM equations.
As stated in Chris Bishop's machine learning book, the optimization goal of the soft-margin SVM is to minimize this function:



$$C\sum_{n=1}^{N} \xi_n + \frac{1}{2}\left\|\mathbf{w}\right\|^2$$



subject to the constraints (*):



$$\xi_n \ge 0$$
$$t_n y(\mathbf{x}_n) \ge 1 - \xi_n$$



where:



$$y(\mathbf{x}_n) = \mathbf{w}^T\mathbf{x}_n + b$$
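For a fixed $\mathbf{w}$ and $b$, the smallest slack values satisfying (*) are $\xi_n = \max(0, 1 - t_n y(\mathbf{x}_n))$, because the objective grows with every $\xi_n$ when $C > 0$. A minimal Python sketch (plain lists, the linear $y$ above; the function names and toy data are illustrative, not from the book) makes this concrete, including the fact that a point on the wrong side ($t_n y(\mathbf{x}_n) \le 0$) needs $\xi_n \ge 1$:

```python
def decision_value(w, b, x):
    """y(x) = w^T x + b for plain Python sequences."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def min_slacks(w, b, X, t):
    """Smallest xi_n with xi_n >= 0 and t_n * y(x_n) >= 1 - xi_n."""
    return [max(0.0, 1.0 - tn * decision_value(w, b, xn)) for xn, tn in zip(X, t)]


def primal_objective(w, b, X, t, C):
    """C * sum(xi_n) + 0.5 * ||w||^2, using the minimal feasible slacks."""
    xi = min_slacks(w, b, X, t)
    return C * sum(xi) + 0.5 * sum(wi * wi for wi in w)


# Toy data (hypothetical): the last point has t = -1 but y = 1, i.e. wrong side.
X = [(2.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
t = [1, 1, -1]
w, b = (1.0, 0.0), 0.0
print(min_slacks(w, b, X, t))               # [0.0, 0.5, 2.0]
print(primal_objective(w, b, X, t, C=1.0))  # 3.0
```

The misclassified point gets $\xi_n = 2 \ge 1$, the point inside the margin gets $0 < \xi_n < 1$, and the point beyond the margin gets $\xi_n = 0$.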



so the corresponding Lagrangian function for this problem is:



$$L(\mathbf{w}, b, \mathbf{a}) = C\sum_{n=1}^{N} \xi_n + \frac{1}{2}\left\|\mathbf{w}\right\|^2 - \sum_{n=1}^{N} a_n\left\{t_n y(\mathbf{x}_n) - 1 + \xi_n\right\} - \sum_{n=1}^{N} \mu_n \xi_n$$



and the corresponding KKT conditions are given by (**):



$$a_n \ge 0$$
$$t_n y(\mathbf{x}_n) - 1 + \xi_n \ge 0$$
$$a_n\left\{t_n y(\mathbf{x}_n) - 1 + \xi_n\right\} = 0$$
$$\xi_n \ge 0$$
$$\mu_n \ge 0$$
$$\mu_n \xi_n = 0$$
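The conditions (**), together with (***), can be checked one data point at a time. A minimal sketch, assuming scalar values for a single point and writing `margin` for $t_n y(\mathbf{x}_n)$ (the function name and tolerance are illustrative), reports which conditions fail, so the case discussed below can be tested directly:

```python
def kkt_violations(a, mu, xi, margin, C, tol=1e-9):
    """Return the conditions in (**) and (***) violated by one data point.

    margin stands for t_n * y(x_n); a, mu, xi stand for a_n, mu_n, xi_n.
    """
    checks = {
        "a_n >= 0": a >= -tol,
        "t_n y(x_n) - 1 + xi_n >= 0": margin - 1 + xi >= -tol,
        "a_n (t_n y(x_n) - 1 + xi_n) = 0": abs(a * (margin - 1 + xi)) <= tol,
        "xi_n >= 0": xi >= -tol,
        "mu_n >= 0": mu >= -tol,
        "mu_n xi_n = 0": abs(mu * xi) <= tol,
        "a_n = C - mu_n": abs(a - (C - mu)) <= tol,
    }
    return [name for name, ok in checks.items() if not ok]


C = 1.0
# A misclassified point (margin -0.5) with a_n = 0, hence mu_n = C:
print(kkt_violations(a=0.0, mu=C, xi=0.0, margin=-0.5, C=C))  # ['t_n y(x_n) - 1 + xi_n >= 0']
print(kkt_violations(a=0.0, mu=C, xi=1.5, margin=-0.5, C=C))  # ['mu_n xi_n = 0']
# The same point with a_n = C, mu_n = 0, xi_n = 1.5 satisfies every condition:
print(kkt_violations(a=C, mu=0.0, xi=1.5, margin=-0.5, C=C))  # []
```

With $a_n = 0$ no choice of $\xi_n$ works for a misclassified point; only $a_n = C$ is consistent.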



And if we set

$$\frac{\partial L}{\partial \xi_n} = 0$$

we get (***)

$$a_n = C - \mu_n$$
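The differentiation behind (***) is one line, because $\xi_n$ appears in $L$ only in the three sums; combining the result with $a_n \ge 0$ and $\mu_n \ge 0$ from (**) also bounds $a_n$ from above:

$$\frac{\partial L}{\partial \xi_n} = C - a_n - \mu_n = 0 \quad\Longrightarrow\quad a_n = C - \mu_n, \qquad \mu_n \ge 0,\; a_n \ge 0 \;\Longrightarrow\; 0 \le a_n \le C$$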



As we know, data points with

$$a_n = 0$$

are not support vectors. But for these data points we have (from ***)

$$\mu_n = C$$



and therefore, since $C > 0$ and $\mu_n \xi_n = 0$ (from **),

$$\xi_n = 0$$



So here lies the problem. If a data point from this subset is on the wrong side of the decision boundary, then

$$t_n y(\mathbf{x}_n) \le 0$$

and (from *) we will have

$$\xi_n \ge 1$$

which is in obvious conflict with

$$\xi_n = 0$$










Tags: machine-learning, svm, theory

asked Mar 31 at 20:08 by pythinker, edited Apr 1 at 9:12 by Esmailian

1 Answer

Good point! Interesting consequence!

The problem is the $a_n = 0$ assumption, i.e. assuming that misclassified points are not support vectors.



Here is the flow. The slack variable is $\xi_n = 0$ for points on the correct side of their margin boundary ($t_n y(\mathbf{x}_n) \ge 1$), and

$$\xi_n := |t_n - y(\mathbf{x}_n)|$$

for all other points, where $t_n \in \{+1, -1\}$ is the true label and $y(\mathbf{x}_n)$ is the prediction. Therefore, for a misclassified point (on the wrong side of the decision boundary) we have, by definition,

$$\xi_n > 1$$

Given $\mu_n \xi_n = 0$, it follows that

$$\mu_n = 0$$

and given $a_n = C - \mu_n$,

$$a_n = C > 0$$

which means (since $a_n > 0$ only for support vectors):

Every misclassified point is a support vector.
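This consequence can be checked numerically. The sketch below is not Bishop's formulation: to stay short it assumes a linear kernel and drops the bias $b$ (so the dual keeps only the box constraint $0 \le a_n \le C$ and loses $\sum_n a_n t_n = 0$, which would otherwise require pairwise updates as in SMO), and it solves the dual by plain coordinate ascent with clipping. Dropping $b$ does not affect the argument above, since $a_n = C - \mu_n$ and $\mu_n \xi_n = 0$ still hold. The toy 1-D data with one deliberately mislabeled point is made up for illustration:

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def fit_svm_no_bias(X, t, C, sweeps=1000):
    """Soft-margin linear SVM WITHOUT bias b (simplifying assumption), dual form.

    Maximizes sum(a) - 0.5 * ||sum_n a_n t_n x_n||^2 subject to 0 <= a_n <= C
    by exact coordinate ascent: each a_n takes a Newton step, then is clipped.
    """
    a = [0.0] * len(X)
    w = [0.0] * len(X[0])  # w = sum_n a_n t_n x_n, kept up to date
    for _ in range(sweeps):
        for n, (xn, tn) in enumerate(zip(X, t)):
            knn = dot(xn, xn)
            if knn == 0.0:
                continue
            grad = 1.0 - tn * dot(w, xn)  # dual gradient = 1 - t_n y(x_n)
            new_an = min(max(a[n] + grad / knn, 0.0), C)
            delta = new_an - a[n]
            if delta != 0.0:
                w = [wj + delta * tn * xj for wj, xj in zip(w, xn)]
                a[n] = new_an
    return a, w


X = [(1.0,), (2.0,), (3.0,), (-1.0,), (-2.0,), (0.5,)]
t = [1, 1, 1, -1, -1, -1]  # the last point is "mislabeled"
C = 1.0
a, w = fit_svm_no_bias(X, t, C)
for xn, tn, an in zip(X, t, a):
    margin = tn * dot(w, xn)
    print(f"x={xn[0]:5.1f}  t={tn:+d}  t*y={margin:5.2f}  a={an:.2f}")
```

On this data the solver reaches $w = 1$; the mislabeled point ends with $t_n y(\mathbf{x}_n) = -0.5$ and $a_n = C$, while the points with $t_n y(\mathbf{x}_n) > 1$ end with $a_n = 0$.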




This is a nice consequence and should have been stated in the book.

A loosely related point, however, is stated in the book:

> Points with $a_n = C$ can lie inside the margin and can either be correctly classified if $\xi_n \leq 1$ or misclassified if $\xi_n > 1$.
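Putting (***) together with the complementary-slackness conditions gives the usual case split by the value of $a_n$. A small helper (hypothetical name, scalar inputs, arbitrary tolerance) that mirrors the book's wording:

```python
def categorize(a_n, xi_n, C, tol=1e-9):
    """Classify a training point from its multiplier a_n and slack xi_n."""
    if a_n <= tol:
        # a_n = 0 => mu_n = C > 0 => xi_n = 0: on or beyond the correct margin
        return "not a support vector"
    if a_n < C - tol:
        # 0 < a_n < C => mu_n > 0 => xi_n = 0, and t_n y(x_n) = 1
        return "support vector on the margin"
    # a_n = C => mu_n = 0, so xi_n is free to be positive
    if xi_n <= tol:
        return "support vector on the margin"
    if xi_n <= 1 + tol:
        return "support vector inside the margin, correctly classified"
    return "support vector, misclassified"
```

Read in reverse, the last branch restates the answer: $\xi_n > 1$ is reachable only through $a_n = C$.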







answered Mar 31 at 21:38 by Esmailian (edited Mar 31 at 22:33)