How will Occam's Razor principle work in machine learning?
The following question, shown in the image, was asked during a recent exam. I am not sure whether I have understood the Occam's Razor principle correctly. Given the distributions and decision boundaries in the question, and following Occam's Razor, decision boundary B should be the answer in both cases, because Occam's Razor says to choose the simpler classifier that does a decent job over the complex one.

Can someone please confirm whether my understanding is correct and whether the chosen answer is appropriate? Please help, as I am just a beginner in machine learning.

machine-learning classification
3.328 "If a sign is not necessary then it is meaningless. That is the meaning of Occam's Razor." From the Tractatus Logico-Philosophicus by Wittgenstein.
– Jorge Barrios, Mar 7 at 11:25
asked Mar 7 at 5:26 by user1479198
4 Answers
Occam's razor principle: given two hypotheses (here, decision boundaries) that have the same empirical risk (here, training error), the shorter explanation (here, the boundary with fewer parameters) tends to be more valid than the longer one.

In your example, both A and B have zero training error, thus B (the shorter explanation) is preferred.

What if the training errors are not the same?

If boundary A had a smaller training error than B, selecting one becomes tricky. We need to quantify "explanation size" on the same footing as "empirical risk", combine the two into one scoring function, and then compare A and B. An example is the Akaike Information Criterion (AIC), which combines empirical risk (measured by negative log-likelihood) and explanation size (measured by the number of parameters) into a single score, where lower is better.

As a side note, AIC cannot be used for all models, and there are many alternatives to it.
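As a minimal sketch of the trade-off AIC encodes, consider the following (the likelihood values and parameter counts below are hypothetical, purely to illustrate how the score behaves):

```python
# Toy AIC comparison: AIC = 2k - 2*ln(L)
#   = 2*(number of parameters) + 2*(negative log-likelihood)

def aic(neg_log_likelihood: float, n_params: int) -> float:
    """Lower AIC is better: it trades off fit quality against explanation size."""
    return 2 * n_params + 2 * neg_log_likelihood

# Suppose complex boundary A fits slightly better but needs 10 parameters,
# while linear boundary B needs only 2 (slope and intercept).
aic_a = aic(neg_log_likelihood=4.0, n_params=10)  # 2*10 + 2*4 = 28.0
aic_b = aic(neg_log_likelihood=6.0, n_params=2)   # 2*2  + 2*6 = 16.0

# B wins despite its worse fit: A's fit advantage does not pay for
# its extra parameters.
assert aic_b < aic_a
```

With these numbers, the simpler boundary is selected even though it fits the training data slightly worse, which is exactly the Occam's-razor behavior described above.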
Relation to the validation set

In many practical cases, when a model progresses toward more complexity (a longer explanation) to reach a lower training error, AIC and the like can be replaced with a validation set (a set on which the model is not trained). We stop the progression when the validation error (the model's error on the validation set) starts to increase. This way, we strike a balance between low training error and a short explanation.
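This stopping rule can be sketched in a few lines; the error curves below are hypothetical, chosen only to show the typical shape (training error keeps falling while validation error bottoms out and rises again once the model overfits):

```python
def select_complexity(val_err):
    """Return the complexity level (index) just before validation error
    first increases; val_err is ordered by growing model complexity."""
    best = 0
    for k in range(1, len(val_err)):
        if val_err[k] > val_err[k - 1]:
            break  # validation error turned up: stop growing the model
        best = k
    return best

# Hypothetical curves indexed by increasing complexity:
train_err = [0.30, 0.20, 0.12, 0.05, 0.01]  # always improves
val_err   = [0.32, 0.24, 0.18, 0.21, 0.30]  # improves, then overfits

assert select_complexity(val_err) == 2  # stop at the validation minimum
```

Note that the training errors never enter the decision: the rule relies entirely on held-out data, which is what lets it stand in for an explicit complexity penalty like AIC.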
answered Mar 7 at 7:54 (edited Mar 13 at 20:35) by Esmailian
Occam's razor is just a synonym for the principle of parsimony (KISS: keep it simple, stupid). Most algorithms work on this principle.

In the question above, one has to think about the simplest separating boundary. In the first figure, the answer for D1 is B, as it is the line that best separates the two samples, while A is a polynomial curve that may end up overfitting. (A linear SVM would produce a similar line to B.) Similarly, in the second figure, the answer for D2 is B.

answered Mar 7 at 7:58 by Gaurav Dogra
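To make "the line that best separates the two samples" concrete, here is a tiny perceptron finding a linear boundary for two separable clusters (a stand-in for the linear SVM mentioned above; the data points are made up for illustration):

```python
import numpy as np

def perceptron(X, y, epochs=50):
    """Learn a linear boundary w for separable data; y must be in {-1, +1}."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:  # misclassified point: nudge the boundary
                w += yi * xi
    return w

# Two linearly separable clusters, roughly like the D1 setting:
X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

w = perceptron(X, y)
preds = np.sign(np.hstack([X, np.ones((6, 1))]) @ w)
assert (preds == y).all()  # a single line classifies every point correctly
```

Since a straight line already achieves zero training error here, Occam's razor gives no reason to reach for a more flexible (polynomial) boundary.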
Occam's razor in data-fitting tasks:

1. First try a linear equation.
2. If (1) doesn't help much, choose a non-linear one with fewer terms and/or smaller degrees of the variables.

D2: B clearly wins, because it is a linear boundary that nicely separates the data. (What "nicely" means I can't currently define; you have to develop this feeling with experience.) The A boundary is highly non-linear and looks like a jittered sine wave.

D1: However, I am not sure about this one. The A boundary is like a circle and B is strictly linear. In my opinion, the boundary line is neither a circle segment nor a line segment, but a parabola-like curve, so I would opt for a C :-)

answered Mar 7 at 13:53 by Agnius Vasiliauskas
I'm still unsure why you want an in-between line for D1. Occam's Razor says to use the simple solution that works. Absent more data, B is a perfectly valid division that fits the data. If we received more data suggesting more of a curve to B's data set, then I could see your argument, but requesting C goes against your point (1), since B is a linear boundary that works.
– Delioth, Mar 7 at 20:36

Because there is a lot of empty space from the B line toward the left circular cluster of points. This means that any new random point has a very high chance of being assigned to the circular cluster on the left and a very small chance of being assigned to the cluster on the right. Thus, the B line is not an optimal boundary for new random points on the plane. And you can't ignore the randomness of the data, because there is usually some random displacement of the points.
– Agnius Vasiliauskas, Mar 8 at 9:39
You asked: "I am not sure if I have correctly understood the Occam's Razor principle or not."

Let's first address Occam's razor. It "states that 'simpler solutions are more likely to be correct than complex ones'" (Wikipedia).

Next, let's address your answer: "as per Occam's Razor, choose the simpler classifier which does a decent job rather than the complex one."

This is correct, because in machine learning overfitting is a problem. If you choose a more complex model, you are more likely to fit the peculiarities of the training data rather than the actual behavior of your problem. This means that when you use your complex classifier to make predictions on new data, it is likely to do worse than the simple classifier.

answered Apr 3 at 10:51 by Little Helper
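The overfitting argument above can be demonstrated on a few lines of synthetic data (everything below is made up for illustration: the true relationship is a simple line, y = x plus noise, and we compare a linear fit against a needlessly flexible polynomial):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
y_train = x_train + rng.normal(0.0, 0.2, 10)   # noisy samples of y = x
x_test = np.linspace(0.05, 0.95, 10)
y_test = x_test + rng.normal(0.0, 0.2, 10)

def fit_mse(degree):
    """Fit a polynomial of the given degree to the training data and
    return (training MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    tr = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    te = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return tr, te

tr_simple, te_simple = fit_mse(1)    # simple (linear) model
tr_complex, te_complex = fit_mse(5)  # needlessly complex model

# The complex model always fits the training data at least as well,
# because the degree-5 family contains every degree-1 fit...
assert tr_complex <= tr_simple
# ...but part of what it fits is noise, so on fresh test data it
# typically does worse than the simple model.
```

This is the gap the answer describes: training error rewards complexity unconditionally, while performance on new data is what the simpler classifier tends to win.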