Overfitting in an unsupervised technique
I am trying to understand whether overfitting can happen in an unsupervised technique like k-means clustering. Could someone help me understand if and how this would happen?
Thanks!
Tags: clustering, overfitting
asked Jul 10 '17 at 5:12
Indi
Comments:
Mostly, if you allow a model to have too many parameters, it will appear to fit the data well.
– Anony-Mousse, Jul 10 '17 at 23:56
How do you test your results for overfitting in a k-means run? Some people have said to use a training set. I have about 1500 records and about 20 fields.
– guest, Mar 26 at 19:33
2 Answers
I'm not sure if this is valid, but how about two trivial clustering examples:
- Every object belongs to a cluster that contains only that object. For example, if you cluster N cars, there will be N clusters, one for each car.
- On the other hand, the algorithm could pick a single cluster that contains every element: one cluster with all N cars.
Both are valid clusterings, but obviously neither gives you any useful information.
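A minimal sketch of these two extremes, assuming plain Python and a handful of made-up two-feature "cars" (the data, the helper name `wcss`, and the choice of within-cluster sum of squares as the fit measure are all illustrative assumptions, not something from the answer). It suggests why the one-cluster-per-object case looks like a perfect fit even though it says nothing about the data:

```python
def wcss(points, labels):
    """Within-cluster sum of squared distances to each cluster's mean."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster's members.
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total

# Hypothetical data: N = 5 "cars" described by two made-up features.
cars = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (8.0, 8.0), (9.0, 11.0)]

one_per_car = list(range(len(cars)))  # N clusters, one for each car
all_together = [0] * len(cars)        # one cluster with all N cars

wcss_singletons = wcss(cars, one_per_car)
wcss_single = wcss(cars, all_together)
print("N clusters:", wcss_singletons)
print("1 cluster: ", wcss_single)
```

With one cluster per car, every point sits on its own centroid, so the error is zero; with a single cluster the error is the total scatter of the data. Neither number tells you anything about real structure.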
answered Jul 10 '17 at 6:58
Damian Melniczuk
Yes, overfitting occurs in unsupervised learning as well
Overfitting means your algorithm is finding patterns that exist only in this dataset and don't generalize to new, unseen data. In addition to the real patterns, an overfitting algorithm also finds "patterns" that are only stochastic noise.
Example for clustering
For clustering, this means the clusters you find exist only in your dataset and can't be seen in new data.
Your algorithm might find two clusters in the dataset that don't exist in new data, because both are actually subsets of one bigger cluster. The algorithm is overfitting: your clustering is too fine (e.g. your k is too large for k-means) because you are finding groupings that are only noise.
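One way to see a too-fine clustering in practice is to fit on part of the data and measure the error on the rest. The sketch below is only an illustration under stated assumptions: it uses a small hand-written Lloyd's-style k-means in plain Python (not any particular library's implementation), synthetic data drawn from a single Gaussian blob so that no real clusters exist, and hypothetical helper names (`kmeans`, `mean_sq_error`).

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's algorithm; returns the list of centroids."""
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        updated = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def mean_sq_error(points, centroids):
    """Average squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points) / len(points)

rng = random.Random(0)
# Hypothetical data: ONE Gaussian blob, so there is no real cluster structure.
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
train, held_out = data[:100], data[100:]

results = {}
for k in (1, 2, 5, 20, 100):
    centroids = kmeans(train, k, rng)
    results[k] = (mean_sq_error(train, centroids), mean_sq_error(held_out, centroids))
    print(k, results[k])
```

At k = 100 every training point is its own centroid, so the training error is exactly zero while the held-out error stays positive. That gap between the two numbers is the sign that the "clusters" describe this particular sample rather than anything that carries over to new data.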
answered Jul 10 '17 at 7:37
Simon Böhm