How will Occam's Razor principle work in Machine learning?


The following question (shown in the image) was asked in a recent exam. I am not sure whether I have understood the Occam's Razor principle correctly. Given the distributions and decision boundaries in the question, and following Occam's Razor, decision boundary B should be the answer in both cases, because Occam's Razor says to choose the simpler classifier that does a decent job over the complex one.

Can someone please confirm whether my understanding is correct and whether the chosen answer is appropriate? Please help, as I am a beginner in machine learning.

[image: the question]

  • 3.328 "If a sign is not necessary then it is meaningless. That is the meaning of Occam's Razor." From the Tractatus Logico-Philosophicus by Wittgenstein
    – Jorge Barrios, Mar 7 at 11:25















machine-learning classification






asked Mar 7 at 5:26 by user1479198
4 Answers






Occam's razor principle:

Given two hypotheses (here, decision boundaries) that have the same empirical risk (here, training error), the shorter explanation (here, the boundary with fewer parameters) tends to be more valid than the longer one.

In your example, both A and B have zero training error, so B (the shorter explanation) is preferred.

What if the training errors are not the same?

If boundary A had a smaller training error than B, selection becomes tricky. We need to quantify "explanation size" on the same footing as "empirical risk", combine the two into a single scoring function, and then compare A and B. One example is the Akaike Information Criterion (AIC), which combines empirical risk (measured as negative log-likelihood) and explanation size (measured as the number of parameters) into one score.

As a side note, AIC cannot be used for all models, and there are many alternatives to AIC as well.
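
The AIC trade-off can be sketched in a few lines; the likelihood values and parameter counts below are invented purely for illustration:

```python
def aic(neg_log_likelihood, n_params):
    """Akaike Information Criterion: 2*k + 2*NLL (lower is better)."""
    return 2 * n_params + 2 * neg_log_likelihood

# Hypothetical comparison: boundary A fits slightly better (lower NLL)
# but uses many more parameters than the linear boundary B.
aic_complex = aic(neg_log_likelihood=3.0, n_params=10)  # boundary A
aic_simple = aic(neg_log_likelihood=4.0, n_params=2)    # boundary B
# aic_complex = 26.0, aic_simple = 12.0: B wins despite higher training error
```

With these made-up numbers, the penalty on parameter count outweighs A's small advantage in fit, so AIC selects the simpler boundary.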



Relation to the validation set

In many practical cases, as a model progresses toward greater complexity (a longer explanation) to reach a lower training error, AIC and the like can be replaced with a validation set (a set on which the model is not trained). We stop when the validation error (the model's error on the validation set) starts to increase. This way, we strike a balance between low training error and a short explanation.
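
This validation-based stopping can be sketched with synthetic data, using polynomial degree as a stand-in for model complexity (all data and degrees below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=60)
x_tr, y_tr = x[:40], y[:40]    # training split
x_val, y_val = x[40:], y[40:]  # validation split: never used for fitting

best_deg, best_err = 1, float("inf")
for deg in range(1, 12):
    coeffs = np.polyfit(x_tr, y_tr, deg)  # training error falls as deg grows
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    if val_err < best_err:
        best_deg, best_err = deg, val_err
# best_deg is where validation error stops improving: the balance point
# between low training error and a short explanation.
```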






– Esmailian, answered Mar 7 at 7:54, edited Mar 13 at 20:35 (13 votes)

Occam's razor is essentially a synonym for the principle of parsimony (KISS: keep it simple, stupid). Most algorithms work on this principle.

In the question above, one has to think about designing the simplest separating boundary.

In the first picture, D1, the answer is B, as it defines the best line separating the two samples; A is a polynomial and may end up over-fitting. (If an SVM had been used, that line is what would have come out.)

Similarly, in the second figure, D2, the answer is B.
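
To make the SVM remark concrete, here is a minimal sketch (assuming scikit-learn is available; the cluster positions are made up) showing that a linear-kernel SVM recovers a straight-line boundary for two well-separated clusters:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
left = rng.normal(loc=[-2, 0], scale=0.5, size=(50, 2))   # cluster 1
right = rng.normal(loc=[2, 0], scale=0.5, size=(50, 2))   # cluster 2
X = np.vstack([left, right])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear").fit(X, y)
# The learned boundary is the line w.x + b = 0: a "short explanation"
# with only three parameters in 2-D.
print(clf.coef_, clf.intercept_)
```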






– Gaurav Dogra, answered Mar 7 at 7:58 (3 votes)

Occam's razor in data-fitting tasks:

1. First try a linear equation.
2. If (1) doesn't help much, choose a non-linear one with fewer terms and/or smaller degrees of the variables.

D2

B clearly wins, because it is a linear boundary that nicely separates the data. (What "nicely" means I can't define precisely; you have to develop that feeling with experience.) Boundary A is highly non-linear and looks like a jittered sine wave.

D1

However, I am not sure about this one. Boundary A is like a circle and B is strictly linear. In my opinion, the boundary is neither a circle segment nor a line segment but a parabola-like curve:

[image]

So I opt for C :-)






– Agnius Vasiliauskas, answered Mar 7 at 13:53 (2 votes)

• I'm still unsure why you want an in-between line for D1. Occam's Razor says to use the simple solution that works. Absent more data, B is a perfectly valid division that fits the data. If we received more data suggesting more of a curve to B's data set, then I could see your argument, but requesting C goes against your point (1), since B is a linear boundary that works.
  – Delioth, Mar 7 at 20:36

• Because there is a lot of empty space from line B toward the left circular cluster of points. This means that any new random point has a very high chance of being assigned to the circular cluster on the left and a very small chance of being assigned to the cluster on the right. Thus, line B is not an optimal boundary for new random points in the plane. And you can't ignore the randomness of the data, because there is usually some random displacement of the points.
  – Agnius Vasiliauskas, Mar 8 at 9:39


















"I am not sure if I have correctly understood the Occam's Razor principle or not."

Let's first address Occam's razor:

"Occam's razor [..] states that simpler solutions are more likely to be correct than complex ones." - Wikipedia

Next, let's address your answer:

"Because as per Occam's Razor, choose the simpler classifier which does a decent job rather than the complex one."

This is correct because, in machine learning, overfitting is a problem. If you choose a more complex model, you are more likely to fit the noise in your training data rather than the actual behavior of your problem. This means that when you use the complex classifier to make predictions on new data, it will tend to do worse than the simple one.
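
The point about complex models doing worse on new data can be illustrated with a toy experiment (synthetic data; the degrees and noise level are chosen arbitrarily): fit a degree-1 and a degree-9 polynomial to noisy linear data, then compare their error against the noise-free truth.

```python
import numpy as np

rng = np.random.default_rng(42)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(scale=0.1, size=10)  # truly linear + noise
x_new = np.linspace(0, 1, 100)
y_new = 2 * x_new                                       # noise-free "new data"

simple = np.polyfit(x_train, y_train, 1)       # short explanation
complex_fit = np.polyfit(x_train, y_train, 9)  # interpolates the noise

err_simple = np.mean((np.polyval(simple, x_new) - y_new) ** 2)
err_complex = np.mean((np.polyval(complex_fit, x_new) - y_new) ** 2)
# Typically err_complex > err_simple: the degree-9 fit reproduces the
# training noise, not the underlying linear behavior.
```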






        4 Answers
        4






        active

        oldest

        votes








        4 Answers
        4






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        13












        $begingroup$

        Occam’s razor principle:




        Having two hypotheses (here, decision boundaries) that has the same empirical risk (here, training error), a short explanation (here, a boundary with fewer parameters) tends to be more valid than a long explanation.




        In your example, both A and B have zero training error, thus B (shorter explanation) is preferred.



        What if training error is not the same?



        If boundary A had a smaller training error than B, selecting becomes tricky. We need to quantify "explanation size" the same as "empirical risk" and combine the two in one scoring function, then proceed to compare A and B. An example would be Akaike Information Criterion (AIC) that combines empirical risk (measured with negative log-likelihood) and explanation size (measured with the number of parameters) in one score.



        As a side note, AIC cannot be used for all models, there are many alternatives to AIC too.



        Relation to validation set



        In many practical cases, when model progresses toward more complexity (larger explanation) to reach a lower training error, AIC and the like can be replaced with a validation set (a set on which the model is not trained). We stop the progress when validation error (error of model on validation set) starts to increase. This way, we strike a balance between low training error and short explanation.






        share|improve this answer











        $endgroup$

















          13












          $begingroup$

          Occam’s razor principle:




          Having two hypotheses (here, decision boundaries) that has the same empirical risk (here, training error), a short explanation (here, a boundary with fewer parameters) tends to be more valid than a long explanation.




          In your example, both A and B have zero training error, thus B (shorter explanation) is preferred.



          What if training error is not the same?



          If boundary A had a smaller training error than B, selecting becomes tricky. We need to quantify "explanation size" the same as "empirical risk" and combine the two in one scoring function, then proceed to compare A and B. An example would be Akaike Information Criterion (AIC) that combines empirical risk (measured with negative log-likelihood) and explanation size (measured with the number of parameters) in one score.



          As a side note, AIC cannot be used for all models, there are many alternatives to AIC too.



          Relation to validation set



          In many practical cases, when model progresses toward more complexity (larger explanation) to reach a lower training error, AIC and the like can be replaced with a validation set (a set on which the model is not trained). We stop the progress when validation error (error of model on validation set) starts to increase. This way, we strike a balance between low training error and short explanation.






          share|improve this answer











          $endgroup$















            13












            13








            13





            $begingroup$

            Occam’s razor principle:




            Having two hypotheses (here, decision boundaries) that has the same empirical risk (here, training error), a short explanation (here, a boundary with fewer parameters) tends to be more valid than a long explanation.




            In your example, both A and B have zero training error, thus B (shorter explanation) is preferred.



            What if training error is not the same?



            If boundary A had a smaller training error than B, selecting becomes tricky. We need to quantify "explanation size" the same as "empirical risk" and combine the two in one scoring function, then proceed to compare A and B. An example would be Akaike Information Criterion (AIC) that combines empirical risk (measured with negative log-likelihood) and explanation size (measured with the number of parameters) in one score.



            As a side note, AIC cannot be used for all models, there are many alternatives to AIC too.



            Relation to validation set



            In many practical cases, when model progresses toward more complexity (larger explanation) to reach a lower training error, AIC and the like can be replaced with a validation set (a set on which the model is not trained). We stop the progress when validation error (error of model on validation set) starts to increase. This way, we strike a balance between low training error and short explanation.






            share|improve this answer











            $endgroup$



            Occam’s razor principle:




            Having two hypotheses (here, decision boundaries) that has the same empirical risk (here, training error), a short explanation (here, a boundary with fewer parameters) tends to be more valid than a long explanation.




            In your example, both A and B have zero training error, thus B (shorter explanation) is preferred.



            What if training error is not the same?



            If boundary A had a smaller training error than B, selecting becomes tricky. We need to quantify "explanation size" the same as "empirical risk" and combine the two in one scoring function, then proceed to compare A and B. An example would be Akaike Information Criterion (AIC) that combines empirical risk (measured with negative log-likelihood) and explanation size (measured with the number of parameters) in one score.



            As a side note, AIC cannot be used for all models, there are many alternatives to AIC too.



            Relation to validation set



            In many practical cases, when model progresses toward more complexity (larger explanation) to reach a lower training error, AIC and the like can be replaced with a validation set (a set on which the model is not trained). We stop the progress when validation error (error of model on validation set) starts to increase. This way, we strike a balance between low training error and short explanation.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Mar 13 at 20:35

























            answered Mar 7 at 7:54









            EsmailianEsmailian

            3,546420




            3,546420





















                3












                $begingroup$

                Occam Razor is just a synonym to Parsimony principal. (KISS, Keep it simple and stupid.)
                Most algos work in this principal.



                In above question one has to think in designing the simple separable boundaries,



                like in first picture D1 answer is B.
                As it define the best line separating 2 samples, as a is polynomial and may end up in over-fitting. (if I would have used SVM that line would have come)



                similarly in figure 2 D2 answer is B.






                share|improve this answer









                $endgroup$

















                  3












                  $begingroup$

                  Occam Razor is just a synonym to Parsimony principal. (KISS, Keep it simple and stupid.)
                  Most algos work in this principal.



                  In above question one has to think in designing the simple separable boundaries,



                  like in first picture D1 answer is B.
                  As it define the best line separating 2 samples, as a is polynomial and may end up in over-fitting. (if I would have used SVM that line would have come)



                  similarly in figure 2 D2 answer is B.






                  share|improve this answer









                  $endgroup$















                    3












                    3








                    3





                    $begingroup$

                    Occam Razor is just a synonym to Parsimony principal. (KISS, Keep it simple and stupid.)
                    Most algos work in this principal.



                    In above question one has to think in designing the simple separable boundaries,



                    like in first picture D1 answer is B.
                    As it define the best line separating 2 samples, as a is polynomial and may end up in over-fitting. (if I would have used SVM that line would have come)



                    similarly in figure 2 D2 answer is B.






                    share|improve this answer









                    $endgroup$



                    Occam Razor is just a synonym to Parsimony principal. (KISS, Keep it simple and stupid.)
                    Most algos work in this principal.



                    In above question one has to think in designing the simple separable boundaries,



                    like in first picture D1 answer is B.
                    As it define the best line separating 2 samples, as a is polynomial and may end up in over-fitting. (if I would have used SVM that line would have come)



                    similarly in figure 2 D2 answer is B.







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Mar 7 at 7:58









                    Gaurav DograGaurav Dogra

                    312




                    312





















                        2












                        $begingroup$

                        Occam’s razor in data-fitting tasks :



                        1. First try linear equation

                        2. If (1) don't helps much - choose a non-linear one with less terms and/or smaller degrees of variables.

                        D2



                        B clearly wins, because it's linear boundary which nicely separates data. (What is "nicely" I can't currently define. You have to develop this feeling with experience). A boundary is highly non-linear which seems like a jittered sine wave.



                        D1



                        However I am not sure about this one. A boundary is like a circle and B is strictly linear. IMHO, for me - boundary line is neither circle segment nor a line segment,- it's parabola-like curve :



                        enter image description here



                        So I opt for a C :-)






                        share|improve this answer









                        $endgroup$












                        • $begingroup$
                          I'm still unsure of why you want an in-between line for D1. Occam's Razor says to use the simple solution that works. Absent more data, B is a perfectly valid division that fits the data. If we received more data that suggests more of a curve to B's data set then I could see your argument, but requesting C goes against your point (1), since it's a linear boundary that works.
                          $endgroup$
                          – Delioth
                          Mar 7 at 20:36










                        • $begingroup$
                          Because there is a lot of empty space from B line towards the left circular cluster of points. This means that any new random point arriving has a very high chance being assigned to circular cluster on the left and a very small chance for being assigned to the cluster in the right. Thus, B line is not an optimal boundary in case of new random points on plane. And you can't ignore randomness of data, because usually there is always a random displacement of points
                          $endgroup$
                          – Agnius Vasiliauskas
                          Mar 8 at 9:39















                        2












                        $begingroup$

                        Occam’s razor in data-fitting tasks :



                        1. First try linear equation

                        2. If (1) don't helps much - choose a non-linear one with less terms and/or smaller degrees of variables.

                        D2



                        B clearly wins, because it's linear boundary which nicely separates data. (What is "nicely" I can't currently define. You have to develop this feeling with experience). A boundary is highly non-linear which seems like a jittered sine wave.



                        D1



                        However I am not sure about this one. A boundary is like a circle and B is strictly linear. IMHO, for me - boundary line is neither circle segment nor a line segment,- it's parabola-like curve :



                        enter image description here



                        So I opt for a C :-)






                        share|improve this answer









                        $endgroup$












                        • $begingroup$
                          I'm still unsure of why you want an in-between line for D1. Occam's Razor says to use the simple solution that works. Absent more data, B is a perfectly valid division that fits the data. If we received more data that suggests more of a curve to B's data set then I could see your argument, but requesting C goes against your point (1), since it's a linear boundary that works.
                          $endgroup$
                          – Delioth
                          Mar 7 at 20:36










                        • $begingroup$
                          Because there is a lot of empty space from B line towards the left circular cluster of points. This means that any new random point arriving has a very high chance being assigned to circular cluster on the left and a very small chance for being assigned to the cluster in the right. Thus, B line is not an optimal boundary in case of new random points on plane. And you can't ignore randomness of data, because usually there is always a random displacement of points
                          $endgroup$
                          – Agnius Vasiliauskas
                          Mar 8 at 9:39













                        2












                        2








                        2





                        $begingroup$

                        Occam’s razor in data-fitting tasks :



                        1. First try linear equation

                        2. If (1) don't helps much - choose a non-linear one with less terms and/or smaller degrees of variables.

                        D2



                        B clearly wins, because it's linear boundary which nicely separates data. (What is "nicely" I can't currently define. You have to develop this feeling with experience). A boundary is highly non-linear which seems like a jittered sine wave.



                        D1



                        However I am not sure about this one. A boundary is like a circle and B is strictly linear. IMHO, for me - boundary line is neither circle segment nor a line segment,- it's parabola-like curve :



                        enter image description here



                        So I opt for a C :-)






                        share|improve this answer









                        $endgroup$



                        Occam’s razor in data-fitting tasks :



                        1. First try linear equation

                        2. If (1) don't helps much - choose a non-linear one with less terms and/or smaller degrees of variables.

                        D2



                        B clearly wins, because it's linear boundary which nicely separates data. (What is "nicely" I can't currently define. You have to develop this feeling with experience). A boundary is highly non-linear which seems like a jittered sine wave.



                        D1



                        However I am not sure about this one. A boundary is like a circle and B is strictly linear. IMHO, for me - boundary line is neither circle segment nor a line segment,- it's parabola-like curve :



                        enter image description here



                        So I opt for a C :-)







                        share|improve this answer












                        share|improve this answer



                        share|improve this answer










                        answered Mar 7 at 13:53









                        Agnius VasiliauskasAgnius Vasiliauskas

                        1213




                        1213











                        • $begingroup$
                          I'm still unsure of why you want an in-between line for D1. Occam's Razor says to use the simple solution that works. Absent more data, B is a perfectly valid division that fits the data. If we received more data that suggests more of a curve to B's data set then I could see your argument, but requesting C goes against your point (1), since it's a linear boundary that works.
                          $endgroup$
                          – Delioth
                          Mar 7 at 20:36










• Because there is a lot of empty space between line B and the left circular cluster of points. This means that any new random point has a very high chance of being assigned to the circular cluster on the left and a very small chance of being assigned to the cluster on the right. Thus, line B is not an optimal boundary for new random points on the plane. And you can't ignore the randomness of the data, because there is usually some random displacement of the points. – Agnius Vasiliauskas, Mar 8 at 9:39
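The "empty space" argument in this comment is the margin intuition behind max-margin classifiers such as SVMs. It can be sketched numerically with made-up 1-D cluster positions (the cluster means, spreads, and candidate lines below are illustrative assumptions, not values from the question): among lines that both separate the clusters, the one placed mid-gap maximizes the minimum distance to any point.

```python
import numpy as np

rng = np.random.default_rng(2)
left = rng.normal(-2.0, 0.2, 30)   # x-coordinates of the left cluster
right = rng.normal(+2.0, 0.2, 30)  # x-coordinates of the right cluster

def min_margin(boundary_x):
    """Smallest distance from any data point to a vertical boundary line."""
    return min(np.min(np.abs(left - boundary_x)),
               np.min(np.abs(right - boundary_x)))

line_B = -1.0  # a valid separator, but it hugs the left cluster
line_C = 0.0   # a separator placed in the middle of the empty space

print("margin of B:", min_margin(line_B))
print("margin of C:", min_margin(line_C))
```

Both lines classify the training points perfectly, but the mid-gap line leaves more room for randomly displaced new points, which is the comment's objection to B.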









































You wrote: "I am not sure if I have correctly understood the Occam's Razor principle or not."




                        Let's first address Occam's razor:




Occam's razor [...] states that "simpler solutions are more likely to be correct than complex ones." – Wikipedia




                        Next, let's address your answer:




                        Because as per Occam's Razor, choose the simpler classifier which does
                        a decent job rather than the complex one.




This is correct because, in machine learning, overfitting is a problem. If you choose a more complex model, you are more likely to fit the noise in your training data rather than the actual behavior of your problem. This means that, when you use the complex classifier to make predictions on new data, it will likely do worse than the simple classifier.
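This overfitting effect can be demonstrated with a small sketch (the data and the two classifiers below are illustrative choices of mine, not from the answer). A 1-nearest-neighbor rule memorizes the training set perfectly, while a simple nearest-centroid rule (a single linear boundary) cannot; yet when class overlap makes perfect training accuracy meaningless, the simple rule typically generalizes as well or better.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two overlapping Gaussian clusters: the labels carry irreducible noise
def make_data(n):
    x0 = rng.normal([-1.0, 0.0], 1.0, (n, 2))
    x1 = rng.normal([+1.0, 0.0], 1.0, (n, 2))
    return np.vstack([x0, x1]), np.array([0] * n + [1] * n)

X_tr, y_tr = make_data(50)
X_te, y_te = make_data(200)

def one_nn_predict(X):
    """Maximally complex classifier: memorize every training point."""
    d = np.linalg.norm(X[:, None, :] - X_tr[None, :, :], axis=2)
    return y_tr[np.argmin(d, axis=1)]

def centroid_predict(X):
    """Maximally simple classifier: one linear boundary between class means."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(int)

acc = lambda pred, y: float(np.mean(pred == y))
print("1-NN     train/test:", acc(one_nn_predict(X_tr), y_tr), acc(one_nn_predict(X_te), y_te))
print("centroid train/test:", acc(centroid_predict(X_tr), y_tr), acc(centroid_predict(X_te), y_te))
```

The 1-NN rule scores 100% on the training set by construction, but that apparent perfection comes from fitting label noise, which is exactly why the simpler classifier is the better Occam's-razor choice here.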






answered Apr 3 at 10:51 by Little Helper