k modes: optimal k


I have categorical data and I'm trying to run k-modes using the GitHub package available here. I want to split my (large) dataset into clusters of, say, 5-7 records each, grouping the most similar records together.

However, I currently have no way to select the optimal k, ideally the one that maximises the silhouette score. Silhouette seems appropriate because k-modes uses a dissimilarity measure as its distance, so I would expect the silhouette score to measure how close or far apart the clusters are under that same dissimilarity. I have not been able to find an implementation of this.

Could I use the elbow method here? If so, I don't understand how to determine the elbow programmatically, without looking at a graph, since I have to repeat this process a large number of times. My current idea is to find the k at which the cost drops substantially and then check whether the next few values of k yield only a small further drop. If they do, choose that k; if not... then what? I'm a little confused at this point.

I was looking online and also found this, which I'm not able to interpret in terms of k-modes. I'm looking for any code or suggestions to start me off on the right path.







machine-learning python clustering k-means






asked 2 days ago by user2816215 (new contributor), edited 2 days ago
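The heuristic described in the question (a substantial drop in cost, followed by a few values of k that add only a small drop) could be automated roughly as follows. This is only a sketch under stated assumptions: the function name `pick_elbow`, the `window` and `tol` parameters, and the rule that "small" means below 10% of the largest drop are all made up for illustration and do not come from any k-modes package. It expects a dict mapping consecutive values of k to the clustering cost obtained for each.

```python
from typing import Dict, Optional


def pick_elbow(costs: Dict[int, float], window: int = 2, tol: float = 0.1) -> Optional[int]:
    """Return the first k with a substantial cost drop followed by `window` small drops.

    Returns None when no k satisfies the rule.
    """
    ks = sorted(costs)
    # drops[i] is the cost reduction obtained by moving from ks[i] to ks[i + 1]
    drops = [costs[ks[i]] - costs[ks[i + 1]] for i in range(len(ks) - 1)]
    if not drops or max(drops) <= 0:
        return None  # cost never decreases, so there is no elbow to find
    # Assumed cut-off between a "substantial" and a "small" drop
    threshold = tol * max(drops)
    for i in range(1, len(ks) - window):
        substantial_drop_into_k = drops[i - 1] >= threshold
        small_drops_after_k = all(d < threshold for d in drops[i:i + window])
        if substantial_drop_into_k and small_drops_after_k:
            return ks[i]
    return None
```

Returning `None` for the "if not... then what?" case is a deliberate choice here: it lets the caller fall back to another criterion, such as the silhouette score, rather than forcing an elbow onto a cost curve that does not have one.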







  • Please don't cross-post duplicates: stackoverflow.com/q/55188965/1060350
    – Anony-Mousse, yesterday

1 Answer

Instead of trying to find a place to download some source code, why don't you just implement, e.g., Silhouette yourself?

Plenty of the code you find online in blogs and repos is broken.

I've seen so many GitHub repositories with bad code, and people like you wondering why it doesn't work. Relying on anonymous others not to have made mistakes is a bad idea. At some point you are better off writing the code yourself!

Of course it is okay to rely on large open-source projects like sklearn, R, ELKI, and Weka. These have code reviews and discussed pull requests, and dozens of people look at the code, use it, and try to find and fix bugs (but even there, the code contains errors).












answered yesterday by Anony-Mousse
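To illustrate what "implementing Silhouette yourself" might look like for categorical data, here is a minimal sketch. It is not the answerer's code, and it assumes the simple matching dissimilarity (the number of attributes on which two records differ) that k-modes is usually described with. The names `matching_dissimilarity` and `categorical_silhouette` are hypothetical. The labels are assumed to come from whichever k-modes implementation you use.

```python
from typing import Dict, Hashable, List, Sequence

Record = Sequence[Hashable]


def matching_dissimilarity(a: Record, b: Record) -> int:
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def categorical_silhouette(records: Sequence[Record], labels: Sequence[int]) -> float:
    """Mean silhouette over all records, using matching dissimilarity."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, record in enumerate(records):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # usual convention: a singleton cluster contributes 0
        # a: mean dissimilarity to the other members of the record's own cluster
        a = sum(matching_dissimilarity(record, records[j])
                for j in own if j != i) / (len(own) - 1)
        # b: smallest mean dissimilarity to the members of any other cluster
        b = min(
            sum(matching_dissimilarity(record, records[j]) for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(records)
```

Note that this compares every pair of records, so its cost grows quadratically with the number of records. On a large dataset it may be more practical to evaluate it on a random sample.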



















