Overfitting in an unsupervised technique



I am trying to understand whether overfitting can happen in an unsupervised technique like k-means clustering. Could someone help me understand if, and how, this would happen?

Thanks!







clustering overfitting
















asked Jul 10 '17 at 5:12









Indi

Mostly if you allow a model to have too many parameters, then it will appear to fit the data well. – Anony-Mousse, Jul 10 '17 at 23:56
















How do you test your results for overfitting in a k-means run? Some people have suggested using a training set. I have about 1500 records and about 20 fields. – guest, Mar 26 at 19:33
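A rough worked count for the "too many parameters" remark in the first comment: k-means estimates one $d$-dimensional centroid per cluster, so with $k$ clusters it has $p(k) = k\,d$ free parameters ($p$ is notation introduced here for illustration). Plugging in the data size from the comment above, and assuming all 20 fields are numeric and used directly:

```latex
\begin{aligned}
p(k) &= k\,d, \qquad N = 1500,\; d = 20 \\
p(1) &= 1 \cdot 20 = 20 \\
p(N) &= 1500 \cdot 20 = 30\,000 = N\,d
\end{aligned}
```

At $k = N$ the model has exactly as many parameters as the dataset has values: every centroid sits on one record, the within-cluster sum of squares is 0, and nothing is learned that carries over to new records.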














2 Answers



I'm not sure if this is valid, but consider two trivial clusterings:

• Every object belongs to its own cluster, which contains only that object. For example, if you want to cluster N cars, you get N clusters, one per car.

• At the other extreme, the algorithm picks a single cluster that contains every element: one cluster with all N cars.

Both are valid clusterings, but neither gives you any useful information. The first is overfitting taken to its limit (k = N: the clustering simply memorizes the data); the second is underfitting (k = 1: no structure is captured at all).
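The two extremes above can be scored with the k-means objective, the within-cluster sum of squares (SSE). This is a minimal pure-Python sketch; the function name `within_cluster_sse` and the car coordinates are hypothetical and exist only for illustration. It shows why an algorithm that only minimizes this training score is drawn toward the one-cluster-per-car solution.

```python
def within_cluster_sse(points, labels):
    """Within-cluster sum of squares: squared distance of every point to its cluster mean."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centre = [sum(coords) / len(members) for coords in zip(*members)]
        for point in members:
            total += sum((x - c) ** 2 for x, c in zip(point, centre))
    return total


# Hypothetical 2-D features (e.g. engine size, price) for N = 5 cars.
cars = [(1.2, 20.0), (1.4, 22.0), (2.0, 35.0), (3.0, 60.0), (1.6, 25.0)]
n = len(cars)

one_cluster_per_car = list(range(n))  # N clusters, one per car
one_cluster_for_all = [0] * n         # a single cluster holding all N cars

print(within_cluster_sse(cars, one_cluster_per_car))  # 0.0: a "perfect" fit on the training data
print(within_cluster_sse(cars, one_cluster_for_all))  # the largest SSE any partition of these cars can have
```

Splitting one of a clustering's clusters never increases this score (the k = 1 value is the total sum of squares, and each split only moves part of it into between-cluster variation), so the training objective alone cannot tell you when k is too large; that needs a check on data the clustering did not see.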

















          answered Jul 10 '17 at 6:58









Damian Melniczuk




Yes, overfitting occurs in unsupervised learning as well

Overfitting means your algorithm is finding patterns in attributes that exist only in this dataset and don't generalize to new, unseen data. When overfitting, the algorithm finds the real patterns but also "patterns" that are only stochastic noise.

Example for clustering

For clustering, this means the clusters you are finding exist only in your dataset and can't be seen in new data.

Your algorithm might find two clusters in the dataset that don't appear in new data, because both are actually subsets of one bigger cluster. In that case your algorithm is overfitting: your clustering is too fine (e.g. your k is too large for k-means), because you are finding groupings that are only noise.
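One way to test the "can't be seen in new data" criterion, which is roughly what the Mar 26 comment asks about, is a split-half stability check: cluster two disjoint random halves independently, label the second half with both sets of centroids, and measure how well the two labelings agree using the adjusted Rand index (ARI). Clusters that are only noise tend to be drawn differently in each half, so agreement often drops once k is too large. This is a minimal pure-Python sketch of that heuristic, not the answerer's method; the function names, the 50-iteration cap and the random-sample initialisation are assumptions. For real data, a library implementation (e.g. scikit-learn's `KMeans` and `adjusted_rand_score`) would be the practical choice.

```python
import math
import random
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm with random-sample initialisation; returns the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # New centroid = mean of its cluster; an emptied cluster keeps its old centroid.
        updated = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def _pairs(counts):
    """Number of point pairs that share a group, given the group sizes."""
    return sum(math.comb(c, 2) for c in counts)


def adjusted_rand_index(a, b):
    """Chance-corrected pair agreement of two labelings (1.0 = same partition up to relabelling)."""
    index = _pairs(Counter(zip(a, b)).values())
    sum_a, sum_b = _pairs(Counter(a).values()), _pairs(Counter(b).values())
    expected = sum_a * sum_b / math.comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings put everything in one cluster
        return 1.0
    return (index - expected) / (max_index - expected)


def split_half_stability(points, k, seed=0):
    """Cluster two random halves separately, then compare how both models label the second half."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:]
    centroids_first = kmeans(first, k, seed=seed)
    centroids_second = kmeans(second, k, seed=seed + 1)
    labels_from_first = [nearest(p, centroids_first) for p in second]
    labels_from_second = [nearest(p, centroids_second) for p in second]
    return adjusted_rand_index(labels_from_first, labels_from_second)


if __name__ == "__main__":
    rng = random.Random(42)
    # Three well-separated synthetic blobs (illustrative data, not from the question).
    data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(100)]
    for k in (2, 3, 6, 10):
        print(k, round(split_half_stability(data, k), 3))
```

On data like the three synthetic blobs, one would expect scores near 1 around k = 3 and lower scores for larger k, where the extra clusters cut real groups at arbitrary places. With this simple initialisation a single run can land in a poor local optimum, so averaging over several seeds is safer.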

















                  answered Jul 10 '17 at 7:37









Simon Böhm
