Clustering based on distance between points [on hold]

I am trying to cluster geographical locations so that all the locations within a cluster are at most 25 miles from each other. I am using agglomerative clustering with a custom distance function that calculates the distance between two locations. I do not want to specify the number of clusters. Instead, I want the model to keep clustering until every pair of locations within each cluster is within 25 miles. I have tried this in both SciPy and scikit-learn but haven't made any progress. My attempt is below; it only gives me one cluster. Please help. Thanks in advance.

    from scipy.cluster.hierarchy import fclusterdata

    max_dist = 25
    # dist is a custom function that calculates the distance (in miles)
    # between two locations using the geographical coordinates
    fclusterdata(locations_in_RI[['Latitude', 'Longitude']].values,
                 t=max_dist, metric=dist, criterion='distance')
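One plausible reason for the single cluster, offered as a sketch rather than a diagnosis: `fclusterdata` uses `method='single'` unless told otherwise, and single linkage joins clusters through chains of close neighbours. The toy points below are hypothetical and use plain Euclidean distance instead of the `dist` function from the question.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

# Hypothetical toy data: five points on a line, 20 units apart.
# Neighbours are within 25 of each other, but the end points are 80 apart.
points = np.array([[0.0, 0.0], [20.0, 0.0], [40.0, 0.0], [60.0, 0.0], [80.0, 0.0]])
max_dist = 25

# Default method='single': clusters grow through chains of close neighbours.
single = fclusterdata(points, t=max_dist, criterion='distance', method='single')

# method='complete': each merge height is the largest pairwise distance
# between the two clusters, so a cut at max_dist bounds the cluster width.
complete = fclusterdata(points, t=max_dist, criterion='distance', method='complete')

print(len(set(single)))    # 1
print(len(set(complete)))  # 3
```

With single linkage every point is chained into one cluster even though the end points are 80 units apart; complete linkage splits the same points into three clusters, none wider than `max_dist`.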



put on hold as off-topic by Anony-Mousse, Ethan, Siong Thye Goh, Mark.F, tuomastik yesterday



  • This question does not appear to be about data science, within the scope defined in the help center.
If this question can be reworded to fit the rules in the help center, please edit the question.











  • Please don't cross-post duplicates! stats.stackexchange.com/q/398336/7828 – Anony-Mousse, 2 days ago

  • If anyone is interested in theoretical solutions, one reference is Clark, Colbourn and Johnson's paper "Unit Disk Graphs": core.ac.uk/download/pdf/82543588.pdf – Valentas, 2 days ago















python clustering unsupervised-learning






asked 2 days ago by Karthik Katragadda

1 Answer
For HAC (hierarchical agglomerative clustering), I think it's always helpful to obtain the linkage matrix first, since it gives you some insight into how the clusters are formed iteratively. SciPy also provides a dendrogram function for visualizing the cluster formation, which can help you avoid treating the clustering process as a "black box".

    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

    # generate the linkage matrix
    X = locations_in_RI[['Latitude', 'Longitude']].values
    Z = linkage(
        X,
        method='complete',   # linkage criterion: max distance across all pairs
                             # of records between two clusters
        metric='euclidean',
    )  # you can peek into the Z matrix to see how clusters are
       # merged at each iteration of the algorithm

    # calculate the full dendrogram and visualize it
    plt.figure(figsize=(30, 10))
    dendrogram(Z)
    plt.show()

    # retrieve clusters with `max_d`
    max_d = 25  # I assume that your `Latitude` and `Longitude` columns are both
                # in units of miles; raw degrees would need a geographic metric
    clusters = fcluster(Z, max_d, criterion='distance')

`clusters` is an array of cluster IDs, one per location, which is what you want. With complete linkage, each merge height is the largest pairwise distance in the merged cluster, so cutting at `max_d` keeps every pair of locations within a cluster at most `max_d` apart.

There is a very helpful (if rather long) post on HAC worth reading.
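If the coordinates are raw degrees rather than miles, the same approach can be sketched with a great-circle metric. Everything named here is an assumption for illustration: `haversine_miles` stands in for the asker's `dist`, `EARTH_RADIUS_MILES` is an approximate mean radius, and `coords` holds approximate, made-up sample rows in place of `locations_in_RI`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumption)


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = np.radians([a[0], a[1], b[0], b[1]])
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(h))


# Hypothetical (Latitude, Longitude) rows, approximate and for illustration only.
coords = np.array([
    [41.8240, -71.4128],
    [41.7001, -71.4162],
    [41.4901, -71.3128],
    [41.3776, -71.8273],
    [42.0029, -71.5148],
])

max_dist = 25  # miles

# Condensed pairwise distance vector computed with the custom metric.
condensed = pdist(coords, metric=haversine_miles)

# Complete linkage on the precomputed distances, then cut at max_dist.
Z = linkage(condensed, method='complete')
labels = fcluster(Z, t=max_dist, criterion='distance')

# Widest pairwise distance inside each cluster, to check the constraint.
square = squareform(condensed)
widest = {}
for label in np.unique(labels):
    members = np.where(labels == label)[0]
    widest[int(label)] = square[np.ix_(members, members)].max()

print(labels)
print(widest)
```

Passing the condensed vector from `pdist` to `linkage` means the custom metric is evaluated once per pair, and the `widest` dictionary gives a direct check that no cluster exceeds `max_dist`.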



















answered 2 days ago by XiUpsilon
