How will Occam's Razor principle work in Machine learning?


The following question (shown in the image) was asked in a recent exam. I am not sure whether I have understood the Occam's Razor principle correctly. Given the distributions and decision boundaries in the question, and following Occam's Razor, decision boundary B should be the answer in both cases, because Occam's Razor says to choose the simpler classifier that does a decent job over the complex one.

Can someone please confirm whether my understanding is correct and whether the chosen answer is appropriate? Please help, as I am a beginner in machine learning.

[image: the question]

  • 3.328 "If a sign is not necessary then it is meaningless. That is the meaning of Occam's Razor." From the Tractatus Logico-Philosophicus by Wittgenstein
    – Jorge Barrios, Mar 7 at 11:25















machine-learning classification






asked Mar 7 at 5:26 by user1479198
4 Answers






Occam's razor principle:

Given two hypotheses (here, decision boundaries) that have the same empirical risk (here, training error), the shorter explanation (here, the boundary with fewer parameters) tends to be more valid than the longer one.

In your example, both A and B have zero training error, so B (the shorter explanation) is preferred.

What if the training errors are not the same?

If boundary A had a smaller training error than B, selection becomes tricky. We need to quantify "explanation size" on the same footing as "empirical risk", combine the two into a single scoring function, and then compare A and B. One example is the Akaike Information Criterion (AIC), which combines empirical risk (measured as negative log-likelihood) and explanation size (measured as the number of parameters) into one score.

As a side note, AIC cannot be used for all models, and there are many alternatives to AIC as well.
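
The AIC trade-off can be sketched in a few lines; the likelihood values and parameter counts below are invented purely for illustration:

```python
def aic(neg_log_likelihood, n_params):
    """Akaike Information Criterion: 2*k + 2*NLL (lower is better)."""
    return 2 * n_params + 2 * neg_log_likelihood

# Hypothetical comparison: boundary A fits slightly better (lower NLL)
# but uses many more parameters than the linear boundary B.
aic_complex = aic(neg_log_likelihood=3.0, n_params=10)  # boundary A
aic_simple = aic(neg_log_likelihood=4.0, n_params=2)    # boundary B
# aic_complex = 26.0, aic_simple = 12.0: B wins despite higher training error
```

With these made-up numbers, the penalty on parameter count outweighs A's small advantage in fit, so AIC selects the simpler boundary.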



Relation to the validation set

In many practical cases, as a model progresses toward greater complexity (a longer explanation) to reach a lower training error, AIC and the like can be replaced with a validation set (a set on which the model is not trained). We stop when the validation error (the model's error on the validation set) starts to increase. This way, we strike a balance between low training error and a short explanation.
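
This validation-based stopping can be sketched with synthetic data, using polynomial degree as a stand-in for model complexity (all data and degrees below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=60)
x_tr, y_tr = x[:40], y[:40]    # training split
x_val, y_val = x[40:], y[40:]  # validation split: never used for fitting

best_deg, best_err = 1, float("inf")
for deg in range(1, 12):
    coeffs = np.polyfit(x_tr, y_tr, deg)  # training error falls as deg grows
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    if val_err < best_err:
        best_deg, best_err = deg, val_err
# best_deg is where validation error stops improving: the balance point
# between low training error and a short explanation.
```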






– Esmailian, answered Mar 7 at 7:54, edited Mar 13 at 20:35 (13 votes)

Occam's razor is essentially a synonym for the principle of parsimony (KISS: keep it simple, stupid). Most algorithms work on this principle.

In the question above, one has to think about designing the simplest separating boundary.

In the first picture, D1, the answer is B, as it defines the best line separating the two samples; A is a polynomial and may end up over-fitting. (If an SVM had been used, that line is what would have come out.)

Similarly, in the second figure, D2, the answer is B.
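
To make the SVM remark concrete, here is a minimal sketch (assuming scikit-learn is available; the cluster positions are made up) showing that a linear-kernel SVM recovers a straight-line boundary for two well-separated clusters:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
left = rng.normal(loc=[-2, 0], scale=0.5, size=(50, 2))   # cluster 1
right = rng.normal(loc=[2, 0], scale=0.5, size=(50, 2))   # cluster 2
X = np.vstack([left, right])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear").fit(X, y)
# The learned boundary is the line w.x + b = 0: a "short explanation"
# with only three parameters in 2-D.
print(clf.coef_, clf.intercept_)
```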






– Gaurav Dogra, answered Mar 7 at 7:58 (3 votes)

Occam's razor in data-fitting tasks:

1. First try a linear equation.
2. If (1) doesn't help much, choose a non-linear one with fewer terms and/or smaller degrees of the variables.

D2

B clearly wins, because it is a linear boundary that nicely separates the data. (What "nicely" means I can't define precisely; you have to develop that feeling with experience.) Boundary A is highly non-linear and looks like a jittered sine wave.

D1

However, I am not sure about this one. Boundary A is like a circle and B is strictly linear. In my opinion, the boundary is neither a circle segment nor a line segment but a parabola-like curve:

[image]

So I opt for C :-)






– Agnius Vasiliauskas, answered Mar 7 at 13:53 (2 votes)

• I'm still unsure why you want an in-between line for D1. Occam's Razor says to use the simple solution that works. Absent more data, B is a perfectly valid division that fits the data. If we received more data suggesting more of a curve to B's data set, then I could see your argument, but requesting C goes against your point (1), since B is a linear boundary that works.
  – Delioth, Mar 7 at 20:36

• Because there is a lot of empty space from line B toward the left circular cluster of points. This means that any new random point has a very high chance of being assigned to the circular cluster on the left and a very small chance of being assigned to the cluster on the right. Thus, line B is not an optimal boundary for new random points in the plane. And you can't ignore the randomness of the data, because there is usually some random displacement of the points.
  – Agnius Vasiliauskas, Mar 8 at 9:39


















"I am not sure if I have correctly understood the Occam's Razor principle or not."

Let's first address Occam's razor:

"Occam's razor [..] states that simpler solutions are more likely to be correct than complex ones." - Wikipedia

Next, let's address your answer:

"Because as per Occam's Razor, choose the simpler classifier which does a decent job rather than the complex one."

This is correct because, in machine learning, overfitting is a problem. If you choose a more complex model, you are more likely to fit the noise in your training data rather than the actual behavior of your problem. This means that when you use the complex classifier to make predictions on new data, it will tend to do worse than the simple one.
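
The point about complex models doing worse on new data can be illustrated with a toy experiment (synthetic data; the degrees and noise level are chosen arbitrarily): fit a degree-1 and a degree-9 polynomial to noisy linear data, then compare their error against the noise-free truth.

```python
import numpy as np

rng = np.random.default_rng(42)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(scale=0.1, size=10)  # truly linear + noise
x_new = np.linspace(0, 1, 100)
y_new = 2 * x_new                                       # noise-free "new data"

simple = np.polyfit(x_train, y_train, 1)       # short explanation
complex_fit = np.polyfit(x_train, y_train, 9)  # interpolates the noise

err_simple = np.mean((np.polyval(simple, x_new) - y_new) ** 2)
err_complex = np.mean((np.polyval(complex_fit, x_new) - y_new) ** 2)
# Typically err_complex > err_simple: the degree-9 fit reproduces the
# training noise, not the underlying linear behavior.
```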






        4 Answers
        4






        active

        oldest

        votes








        4 Answers
        4






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        13












        $begingroup$

        Occam’s razor principle:




        Having two hypotheses (here, decision boundaries) that has the same empirical risk (here, training error), a short explanation (here, a boundary with fewer parameters) tends to be more valid than a long explanation.




        In your example, both A and B have zero training error, thus B (shorter explanation) is preferred.



        What if training error is not the same?



        If boundary A had a smaller training error than B, selecting becomes tricky. We need to quantify "explanation size" the same as "empirical risk" and combine the two in one scoring function, then proceed to compare A and B. An example would be Akaike Information Criterion (AIC) that combines empirical risk (measured with negative log-likelihood) and explanation size (measured with the number of parameters) in one score.



        As a side note, AIC cannot be used for all models, there are many alternatives to AIC too.



        Relation to validation set



        In many practical cases, when model progresses toward more complexity (larger explanation) to reach a lower training error, AIC and the like can be replaced with a validation set (a set on which the model is not trained). We stop the progress when validation error (error of model on validation set) starts to increase. This way, we strike a balance between low training error and short explanation.






        share|improve this answer











        $endgroup$

















          13












          $begingroup$

          Occam’s razor principle:




          Having two hypotheses (here, decision boundaries) that has the same empirical risk (here, training error), a short explanation (here, a boundary with fewer parameters) tends to be more valid than a long explanation.




          In your example, both A and B have zero training error, thus B (shorter explanation) is preferred.



          What if training error is not the same?



          If boundary A had a smaller training error than B, selecting becomes tricky. We need to quantify "explanation size" the same as "empirical risk" and combine the two in one scoring function, then proceed to compare A and B. An example would be Akaike Information Criterion (AIC) that combines empirical risk (measured with negative log-likelihood) and explanation size (measured with the number of parameters) in one score.



          As a side note, AIC cannot be used for all models, there are many alternatives to AIC too.



          Relation to validation set



          In many practical cases, when model progresses toward more complexity (larger explanation) to reach a lower training error, AIC and the like can be replaced with a validation set (a set on which the model is not trained). We stop the progress when validation error (error of model on validation set) starts to increase. This way, we strike a balance between low training error and short explanation.






          share|improve this answer











          $endgroup$















            13












            13








            13





            $begingroup$

            Occam’s razor principle:




            Having two hypotheses (here, decision boundaries) that has the same empirical risk (here, training error), a short explanation (here, a boundary with fewer parameters) tends to be more valid than a long explanation.




            In your example, both A and B have zero training error, thus B (shorter explanation) is preferred.



            What if training error is not the same?



            If boundary A had a smaller training error than B, selecting becomes tricky. We need to quantify "explanation size" the same as "empirical risk" and combine the two in one scoring function, then proceed to compare A and B. An example would be Akaike Information Criterion (AIC) that combines empirical risk (measured with negative log-likelihood) and explanation size (measured with the number of parameters) in one score.



            As a side note, AIC cannot be used for all models, there are many alternatives to AIC too.



            Relation to validation set



            In many practical cases, when model progresses toward more complexity (larger explanation) to reach a lower training error, AIC and the like can be replaced with a validation set (a set on which the model is not trained). We stop the progress when validation error (error of model on validation set) starts to increase. This way, we strike a balance between low training error and short explanation.






            share|improve this answer











            $endgroup$



            Occam’s razor principle:




            Having two hypotheses (here, decision boundaries) that has the same empirical risk (here, training error), a short explanation (here, a boundary with fewer parameters) tends to be more valid than a long explanation.




            In your example, both A and B have zero training error, thus B (shorter explanation) is preferred.



            What if training error is not the same?



            If boundary A had a smaller training error than B, selecting becomes tricky. We need to quantify "explanation size" the same as "empirical risk" and combine the two in one scoring function, then proceed to compare A and B. An example would be Akaike Information Criterion (AIC) that combines empirical risk (measured with negative log-likelihood) and explanation size (measured with the number of parameters) in one score.



            As a side note, AIC cannot be used for all models, there are many alternatives to AIC too.



            Relation to validation set



            In many practical cases, when model progresses toward more complexity (larger explanation) to reach a lower training error, AIC and the like can be replaced with a validation set (a set on which the model is not trained). We stop the progress when validation error (error of model on validation set) starts to increase. This way, we strike a balance between low training error and short explanation.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Mar 13 at 20:35

























            answered Mar 7 at 7:54









            EsmailianEsmailian

            3,546420




            3,546420





















                3












                $begingroup$

                Occam Razor is just a synonym to Parsimony principal. (KISS, Keep it simple and stupid.)
                Most algos work in this principal.



                In above question one has to think in designing the simple separable boundaries,



                like in first picture D1 answer is B.
                As it define the best line separating 2 samples, as a is polynomial and may end up in over-fitting. (if I would have used SVM that line would have come)



                similarly in figure 2 D2 answer is B.






                share|improve this answer









                $endgroup$

















                  3












                  $begingroup$

                  Occam Razor is just a synonym to Parsimony principal. (KISS, Keep it simple and stupid.)
                  Most algos work in this principal.



                  In above question one has to think in designing the simple separable boundaries,



                  like in first picture D1 answer is B.
                  As it define the best line separating 2 samples, as a is polynomial and may end up in over-fitting. (if I would have used SVM that line would have come)



                  similarly in figure 2 D2 answer is B.






                  share|improve this answer









                  $endgroup$















                    3












                    3








                    3





                    $begingroup$

                    Occam Razor is just a synonym to Parsimony principal. (KISS, Keep it simple and stupid.)
                    Most algos work in this principal.



                    In above question one has to think in designing the simple separable boundaries,



                    like in first picture D1 answer is B.
                    As it define the best line separating 2 samples, as a is polynomial and may end up in over-fitting. (if I would have used SVM that line would have come)



                    similarly in figure 2 D2 answer is B.






                    share|improve this answer









                    $endgroup$



                    Occam Razor is just a synonym to Parsimony principal. (KISS, Keep it simple and stupid.)
                    Most algos work in this principal.



                    In above question one has to think in designing the simple separable boundaries,



                    like in first picture D1 answer is B.
                    As it define the best line separating 2 samples, as a is polynomial and may end up in over-fitting. (if I would have used SVM that line would have come)



                    similarly in figure 2 D2 answer is B.







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Mar 7 at 7:58









                    Gaurav DograGaurav Dogra

                    312




                    312





















                        2












                        $begingroup$

                        Occam’s razor in data-fitting tasks :



                        1. First try linear equation

                        2. If (1) don't helps much - choose a non-linear one with less terms and/or smaller degrees of variables.

                        D2



                        B clearly wins, because it's linear boundary which nicely separates data. (What is "nicely" I can't currently define. You have to develop this feeling with experience). A boundary is highly non-linear which seems like a jittered sine wave.



                        D1



                        However I am not sure about this one. A boundary is like a circle and B is strictly linear. IMHO, for me - boundary line is neither circle segment nor a line segment,- it's parabola-like curve :



                        enter image description here



                        So I opt for a C :-)






                        share|improve this answer









                        $endgroup$












                        • $begingroup$
                          I'm still unsure of why you want an in-between line for D1. Occam's Razor says to use the simple solution that works. Absent more data, B is a perfectly valid division that fits the data. If we received more data that suggests more of a curve to B's data set then I could see your argument, but requesting C goes against your point (1), since it's a linear boundary that works.
                          $endgroup$
                          – Delioth
                          Mar 7 at 20:36










                        • $begingroup$
                          Because there is a lot of empty space from B line towards the left circular cluster of points. This means that any new random point arriving has a very high chance being assigned to circular cluster on the left and a very small chance for being assigned to the cluster in the right. Thus, B line is not an optimal boundary in case of new random points on plane. And you can't ignore randomness of data, because usually there is always a random displacement of points
                          $endgroup$
                          – Agnius Vasiliauskas
                          Mar 8 at 9:39















                        2












                        $begingroup$

                        Occam’s razor in data-fitting tasks :



                        1. First try linear equation

                        2. If (1) don't helps much - choose a non-linear one with less terms and/or smaller degrees of variables.

                        D2



                        B clearly wins, because it's linear boundary which nicely separates data. (What is "nicely" I can't currently define. You have to develop this feeling with experience). A boundary is highly non-linear which seems like a jittered sine wave.



                        D1



                        However I am not sure about this one. A boundary is like a circle and B is strictly linear. IMHO, for me - boundary line is neither circle segment nor a line segment,- it's parabola-like curve :



                        enter image description here



                        So I opt for a C :-)






                        share|improve this answer









                        $endgroup$












                        • $begingroup$
                          I'm still unsure of why you want an in-between line for D1. Occam's Razor says to use the simple solution that works. Absent more data, B is a perfectly valid division that fits the data. If we received more data that suggests more of a curve to B's data set then I could see your argument, but requesting C goes against your point (1), since it's a linear boundary that works.
                          $endgroup$
                          – Delioth
                          Mar 7 at 20:36










                        • $begingroup$
                          Because there is a lot of empty space from B line towards the left circular cluster of points. This means that any new random point arriving has a very high chance being assigned to circular cluster on the left and a very small chance for being assigned to the cluster in the right. Thus, B line is not an optimal boundary in case of new random points on plane. And you can't ignore randomness of data, because usually there is always a random displacement of points
                          $endgroup$
                          – Agnius Vasiliauskas
                          Mar 8 at 9:39













                        2












                        2








                        2





                        $begingroup$

                        Occam’s razor in data-fitting tasks :



                        1. First try linear equation

                        2. If (1) don't helps much - choose a non-linear one with less terms and/or smaller degrees of variables.

                        D2



                        B clearly wins, because it's linear boundary which nicely separates data. (What is "nicely" I can't currently define. You have to develop this feeling with experience). A boundary is highly non-linear which seems like a jittered sine wave.



                        D1



                        However I am not sure about this one. A boundary is like a circle and B is strictly linear. IMHO, for me - boundary line is neither circle segment nor a line segment,- it's parabola-like curve :



                        enter image description here



                        So I opt for a C :-)






                        share|improve this answer









                        $endgroup$



                        Occam’s razor in data-fitting tasks :



                        1. First try linear equation

                        2. If (1) don't helps much - choose a non-linear one with less terms and/or smaller degrees of variables.

                        D2



                        B clearly wins, because it's linear boundary which nicely separates data. (What is "nicely" I can't currently define. You have to develop this feeling with experience). A boundary is highly non-linear which seems like a jittered sine wave.



                        D1



                        However I am not sure about this one. A boundary is like a circle and B is strictly linear. IMHO, for me - boundary line is neither circle segment nor a line segment,- it's parabola-like curve :



                        enter image description here



                        So I opt for a C :-)







                        share|improve this answer












                        share|improve this answer



                        share|improve this answer










                        answered Mar 7 at 13:53









                        Agnius VasiliauskasAgnius Vasiliauskas

                        1213




                        1213











                        • $begingroup$
                          I'm still unsure of why you want an in-between line for D1. Occam's Razor says to use the simple solution that works. Absent more data, B is a perfectly valid division that fits the data. If we received more data that suggests more of a curve to B's data set then I could see your argument, but requesting C goes against your point (1), since it's a linear boundary that works.
                          $endgroup$
                          – Delioth
                          Mar 7 at 20:36










• Because there is a lot of empty space between line B and the left circular cluster of points. This means that any new random point has a very high chance of being assigned to the circular cluster on the left and a very small chance of being assigned to the cluster on the right. Thus, line B is not an optimal boundary for new random points on the plane. And you can't ignore the randomness of the data, because there is usually some random displacement of the points. – Agnius Vasiliauskas, Mar 8 at 9:39
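The "empty space" argument in this comment is the margin intuition behind max-margin classifiers such as SVMs. It can be sketched numerically with made-up 1-D cluster positions (the cluster means, spreads, and candidate lines below are illustrative assumptions, not values from the question): among lines that both separate the clusters, the one placed mid-gap maximizes the minimum distance to any point.

```python
import numpy as np

rng = np.random.default_rng(2)
left = rng.normal(-2.0, 0.2, 30)   # x-coordinates of the left cluster
right = rng.normal(+2.0, 0.2, 30)  # x-coordinates of the right cluster

def min_margin(boundary_x):
    """Smallest distance from any data point to a vertical boundary line."""
    return min(np.min(np.abs(left - boundary_x)),
               np.min(np.abs(right - boundary_x)))

line_B = -1.0  # a valid separator, but it hugs the left cluster
line_C = 0.0   # a separator placed in the middle of the empty space

print("margin of B:", min_margin(line_B))
print("margin of C:", min_margin(line_C))
```

Both lines classify the training points perfectly, but the mid-gap line leaves more room for randomly displaced new points, which is the comment's objection to B.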









































You wrote: "I am not sure if I have correctly understood the Occam's Razor principle or not."




                        Let's first address Occam's razor:




Occam's razor [...] states that "simpler solutions are more likely to be correct than complex ones." – Wikipedia




                        Next, let's address your answer:




                        Because as per Occam's Razor, choose the simpler classifier which does
                        a decent job rather than the complex one.




This is correct because, in machine learning, overfitting is a problem. If you choose a more complex model, you are more likely to fit the noise in your training data rather than the actual behavior of your problem. This means that, when you use the complex classifier to make predictions on new data, it will likely do worse than the simple classifier.
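This overfitting effect can be demonstrated with a small sketch (the data and the two classifiers below are illustrative choices of mine, not from the answer). A 1-nearest-neighbor rule memorizes the training set perfectly, while a simple nearest-centroid rule (a single linear boundary) cannot; yet when class overlap makes perfect training accuracy meaningless, the simple rule typically generalizes as well or better.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two overlapping Gaussian clusters: the labels carry irreducible noise
def make_data(n):
    x0 = rng.normal([-1.0, 0.0], 1.0, (n, 2))
    x1 = rng.normal([+1.0, 0.0], 1.0, (n, 2))
    return np.vstack([x0, x1]), np.array([0] * n + [1] * n)

X_tr, y_tr = make_data(50)
X_te, y_te = make_data(200)

def one_nn_predict(X):
    """Maximally complex classifier: memorize every training point."""
    d = np.linalg.norm(X[:, None, :] - X_tr[None, :, :], axis=2)
    return y_tr[np.argmin(d, axis=1)]

def centroid_predict(X):
    """Maximally simple classifier: one linear boundary between class means."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(int)

acc = lambda pred, y: float(np.mean(pred == y))
print("1-NN     train/test:", acc(one_nn_predict(X_tr), y_tr), acc(one_nn_predict(X_te), y_te))
print("centroid train/test:", acc(centroid_predict(X_tr), y_tr), acc(centroid_predict(X_te), y_te))
```

The 1-NN rule scores 100% on the training set by construction, but that apparent perfection comes from fitting label noise, which is exactly why the simpler classifier is the better Occam's-razor choice here.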






answered Apr 3 at 10:51 by Little Helper