Clustering based on distance between points [on hold]
I am trying to cluster geographical locations so that all locations within each cluster are at most 25 miles from each other. I am using agglomerative clustering with a custom distance function that computes the distance between two locations. I do not want to specify the number of clusters; instead, I want the algorithm to keep merging only as long as every pair of locations within a cluster stays within 25 miles. I have tried this in both SciPy and scikit-learn without success. The approach below gives me only one cluster. Please help. Thanks in advance.

from scipy.cluster.hierarchy import fclusterdata

max_dist = 25
# dist is a custom function that returns the distance (in miles) between
# two locations, given their geographical coordinates
fclusterdata(locations_in_RI[['Latitude', 'Longitude']].values,
             t=max_dist, metric=dist, criterion='distance')

python clustering unsupervised-learning

asked 2 days ago by Karthik Katragadda (new contributor)
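One likely cause of the single cluster is that `fclusterdata` defaults to `method='single'`, which merges clusters by their closest pair of points, so a chain of nearby locations can end up in one cluster even when its endpoints are far apart. The following is only a sketch under assumptions: `haversine_miles` stands in for the question's `dist`, and the Earth radius constant and the four coordinates are made up for illustration. It compares the default with `method='complete'`, which merges by the farthest pair.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees.

    Hypothetical stand-in for the question's `dist` function.
    """
    lat1, lon1, lat2, lon2 = np.radians([a[0], a[1], b[0], b[1]])
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(h))


# Made-up chain of four locations on one latitude; neighbours are roughly
# 18 to 22 miles apart, while the two endpoints are roughly 59 miles apart.
coords = np.array([
    [41.7, -72.40],
    [41.7, -72.02],
    [41.7, -71.60],
    [41.7, -71.25],
])

# Default linkage (method='single'): cluster distance is the closest pair.
single = fclusterdata(coords, t=25, criterion='distance',
                      metric=haversine_miles)

# Complete linkage: cluster distance is the farthest pair, so cutting at
# t=25 bounds every pairwise distance inside a cluster by 25 miles.
complete = fclusterdata(coords, t=25, criterion='distance',
                        metric=haversine_miles, method='complete')

print("single linkage clusters:  ", len(set(single)))
print("complete linkage clusters:", len(set(complete)))
```

On this toy chain the default puts all four points into one cluster, while complete linkage splits them into two. Whether chaining is what happens with the real data depends on how `dist` is defined and on the units of the coordinates.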
put on hold as off-topic by Anony-Mousse, Ethan, Siong Thye Goh, Mark.F, tuomastik yesterday
- This question does not appear to be about data science, within the scope defined in the help center.
Please don't cross-post duplicates! stats.stackexchange.com/q/398336/7828
– Anony-Mousse, 2 days ago

If anyone is interested in theoretical solutions, one reference is Clark, Colbourn and Johnson's paper "Unit Disk Graphs": core.ac.uk/download/pdf/82543588.pdf
– Valentas, 2 days ago
1 Answer
For HAC (hierarchical agglomerative clustering), it helps to obtain the linkage matrix first, since it shows how the clusters are formed at each iteration. SciPy also provides a dendrogram function for visualizing the cluster formation, which helps you avoid treating the clustering process as a "black box".

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Generate the linkage matrix.
X = locations_in_RI[['Latitude', 'Longitude']].values
Z = linkage(
    X,
    method='complete',   # linkage criterion: the distance between two clusters is
                         # the maximum distance over all pairs of records
    metric='euclidean',  # only yields miles if X is in miles; see the note on max_d
)
# Each row of Z records which two clusters were merged at that iteration
# and at what distance.

# Compute the full dendrogram and visualize it.
plt.figure(figsize=(30, 10))
dendrogram(Z)
plt.show()

# Retrieve flat clusters by cutting the tree at max_d.
max_d = 25  # assumes the `Latitude` and `Longitude` columns are both in units of
            # miles; for coordinates in degrees, pass the question's custom
            # `dist` function as `metric` instead of 'euclidean'
clusters = fcluster(Z, max_d, criterion='distance')

`clusters` is an array of cluster IDs, one per row of X, which is what you want.

There is a very helpful (yet rather long) post on HAC worth reading.

answered 2 days ago by XiUpsilon (new contributor)
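To check that a cut really satisfies the "everything within 25 miles" requirement, one option is to measure the largest pairwise distance inside each flat cluster. This is a rough sketch, not part of the answer above: `max_cluster_diameter` is a hypothetical helper, and the random points are assumed to be planar coordinates already expressed in miles. For real latitude/longitude data, the condensed distances would come from `pdist` with a custom metric instead.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def max_cluster_diameter(condensed, labels):
    """Largest pairwise distance found inside any single flat cluster."""
    square = squareform(condensed)  # condensed vector -> full n x n matrix
    worst = 0.0
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        worst = max(worst, square[np.ix_(members, members)].max())
    return worst


# Made-up planar coordinates in miles (illustration only).
rng = np.random.default_rng(0)
points = rng.uniform(0, 100, size=(200, 2))

# Precompute the pairwise distances once and reuse them for both linkages.
condensed = pdist(points)

labels_complete = fcluster(linkage(condensed, method='complete'),
                           t=25, criterion='distance')
labels_single = fcluster(linkage(condensed, method='single'),
                         t=25, criterion='distance')

print("complete:", max_cluster_diameter(condensed, labels_complete))
print("single:  ", max_cluster_diameter(condensed, labels_single))
```

With complete linkage the merge height equals the largest pairwise distance in the merged cluster, so cutting at 25 keeps every cluster's diameter at or below 25. Single linkage gives no such bound. Note that complete linkage is a greedy heuristic: it satisfies the constraint but does not necessarily produce the fewest possible clusters.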