Are there algorithms for clustering objects with pairwise distances, without computing all pairwise distances?2019 Community Moderator ElectionClustering pair-wise distance datasetAlgorithms for text clusteringHow to deal with time series which change in seasonality or other patterns?R: Comparing dissimilarity between metabolic models with discrete wavelet transformationAgglomerative Clustering Stopping CriteriaWhat methods exist for distance calculation in clustering? when we should use each of them?Clustering objects defined by vectorWhat is stored in heap structure in the following example?Multidimensional Dynamic Time Warping Implementation in Python - confirm?Clustering with cosine similarity

Can I convert a rim brake wheel to a disc brake wheel?

What defines a dissertation?

At which point does a character regain all their Hit Dice?

Go Pregnant or Go Home

Everything Bob says is false. How does he get people to trust him?

Can I Retrieve Email Addresses from BCC?

How do I rename a LINUX host without needing to reboot for the rename to take effect?

How does it work when somebody invests in my business?

Can somebody explain Brexit in a few child-proof sentences?

Star/Wye electrical connection math symbol

Personal Teleportation as a Weapon

Is there any reason not to eat food that's been dropped on the surface of the moon?

If a character can use a +X magic weapon as a spellcasting focus, does it add the bonus to spell attacks or spell save DCs?

Print name if parameter passed to function

Why Were Madagascar and New Zealand Discovered So Late?

What would happen if the UK refused to take part in EU Parliamentary elections?

Minimal reference content

Cynical novel that describes an America ruled by the media, arms manufacturers, and ethnic figureheads

Bash method for viewing beginning and end of file

Valid Badminton Score?

Time travel short story where a man arrives in the late 19th century in a time machine and then sends the machine back into the past

Finding all intervals that match predicate in vector

Applicability of Single Responsibility Principle

Why does John Bercow say “unlock” after reading out the results of a vote?

Are there algorithms for clustering objects with pairwise distances, without computing all pairwise distances?

2019 Community Moderator ElectionClustering pair-wise distance datasetAlgorithms for text clusteringHow to deal with time series which change in seasonality or other patterns?R: Comparing dissimilarity between metabolic models with discrete wavelet transformationAgglomerative Clustering Stopping CriteriaWhat methods exist for distance calculation in clustering? when we should use each of them?Clustering objects defined by vectorWhat is stored in heap structure in the following example?Multidimensional Dynamic Time Warping Implementation in Python - confirm?Clustering with cosine similarity

I'm looking for a clustering algorithm that clusters objects, by using their pairwise distances, without needing to calculate all pairwise distances.

Normally pairwise clustering is done like this: (see here)

Compute full distance matrix between all pairwise combination of objects

Assuming that the distances there are non-euclidean, one might use Spectral Clustering or Affinity propagation on the distance matrix and retrieve the clustering results.

Here comes the however:

Computing the full distance matrix for all pairwise combination of objects is computationally very expensive. So my though was, whether there are some clustering algorithms that only do lookups on a subset of the pairwise distances, so it is not necessary to compute the full matrix?

I know Spectral Clustering works also on sparse matrices, but since it is theoretically possible to compute all pairwise distances, which ones should be left out?

Exited to hear your ideas, thanks!

asked Mar 8 at 16:34

Taizame

132

add a comment |

I'm looking for a clustering algorithm that clusters objects, by using their pairwise distances, without needing to calculate all pairwise distances.

Normally pairwise clustering is done like this: (see here)

Compute full distance matrix between all pairwise combination of objects

Assuming that the distances there are non-euclidean, one might use Spectral Clustering or Affinity propagation on the distance matrix and retrieve the clustering results.

Here comes the however:

I know Spectral Clustering works also on sparse matrices, but since it is theoretically possible to compute all pairwise distances, which ones should be left out?

Exited to hear your ideas, thanks!

asked Mar 8 at 16:34

Taizame

132

add a comment |

I'm looking for a clustering algorithm that clusters objects, by using their pairwise distances, without needing to calculate all pairwise distances.

Normally pairwise clustering is done like this: (see here)

Compute full distance matrix between all pairwise combination of objects

Assuming that the distances there are non-euclidean, one might use Spectral Clustering or Affinity propagation on the distance matrix and retrieve the clustering results.

Here comes the however:

I know Spectral Clustering works also on sparse matrices, but since it is theoretically possible to compute all pairwise distances, which ones should be left out?

Exited to hear your ideas, thanks!

asked Mar 8 at 16:34

Taizame

132

I'm looking for a clustering algorithm that clusters objects, by using their pairwise distances, without needing to calculate all pairwise distances.

Normally pairwise clustering is done like this: (see here)

Compute full distance matrix between all pairwise combination of objects

Assuming that the distances there are non-euclidean, one might use Spectral Clustering or Affinity propagation on the distance matrix and retrieve the clustering results.

Here comes the however:

I know Spectral Clustering works also on sparse matrices, but since it is theoretically possible to compute all pairwise distances, which ones should be left out?

Exited to hear your ideas, thanks!

clustering similarity graphs distance

asked Mar 8 at 16:34

Taizame

132

asked Mar 8 at 16:34

Taizame

132

asked Mar 8 at 16:34

Taizame

132

asked Mar 8 at 16:34

Taizame

132

asked Mar 8 at 16:34

Taizame

132

add a comment |

3 Answers
3

active

oldest

votes

Well, one may argue that DBSCAN is based on all pairwise distances, but it uses data indexing to avoid computing all of them using geometric bounds.

And there are other examples if you browse through literature.

For example, the classic CLARA method is an approximation to PAM that avoids computing all pairwise distances.

And there are many more such techniques.

edited Mar 21 at 18:06

answered Mar 9 at 8:43

Anony-Mousse

5,030625

add a comment |

Quadtree can be used for this purpose.

enter image description here

This algo divides 2 D space into clusters. In this example ; we can exclude point 'C' from comparison with 'E' and 'F'

http://wiki.gis.com/wiki/index.php/Quadtree

DBSCAN is also useful in some scenarios : https://en.wikipedia.org/wiki/DBSCAN

answered Mar 8 at 18:26

Shamit Verma

1,00929

add a comment |

You can use Locality Sensitive Hashing technique Wiki article

With this, you can estimate either the Jaccard Similarity (MinHash) or Cosine Similarity (SimHash) between two documents and then apply clustering on the documents collection.

There is a great example with Python code for MinHash. What I get from the article is the bellow quote

In the example code, we have a collection of 10,000 articles which contain, on average, 250 shingles each. Computing the Jaccard similarities directly for all pairs takes 20 minutes on my PC, while generating and comparing the MinHash signatures takes only about 2 minutes and 45 seconds.

MinHash explanation with Python code

answered Mar 8 at 19:46

Tasos

1,005731

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "557"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46950%2fare-there-algorithms-for-clustering-objects-with-pairwise-distances-without-com%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Well, one may argue that DBSCAN is based on all pairwise distances, but it uses data indexing to avoid computing all of them using geometric bounds.

And there are other examples if you browse through literature.

For example, the classic CLARA method is an approximation to PAM that avoids computing all pairwise distances.

And there are many more such techniques.

edited Mar 21 at 18:06

answered Mar 9 at 8:43

Anony-Mousse

5,030625

add a comment |

Well, one may argue that DBSCAN is based on all pairwise distances, but it uses data indexing to avoid computing all of them using geometric bounds.

And there are other examples if you browse through literature.

For example, the classic CLARA method is an approximation to PAM that avoids computing all pairwise distances.

And there are many more such techniques.

edited Mar 21 at 18:06

answered Mar 9 at 8:43

Anony-Mousse

5,030625

add a comment |

Well, one may argue that DBSCAN is based on all pairwise distances, but it uses data indexing to avoid computing all of them using geometric bounds.

And there are other examples if you browse through literature.

For example, the classic CLARA method is an approximation to PAM that avoids computing all pairwise distances.

And there are many more such techniques.

edited Mar 21 at 18:06

answered Mar 9 at 8:43

Anony-Mousse

5,030625

Well, one may argue that DBSCAN is based on all pairwise distances, but it uses data indexing to avoid computing all of them using geometric bounds.

And there are other examples if you browse through literature.

For example, the classic CLARA method is an approximation to PAM that avoids computing all pairwise distances.

And there are many more such techniques.

edited Mar 21 at 18:06

answered Mar 9 at 8:43

Anony-Mousse

5,030625

edited Mar 21 at 18:06

answered Mar 9 at 8:43

Anony-Mousse

5,030625

answered Mar 9 at 8:43

Anony-Mousse

5,030625

answered Mar 9 at 8:43

Anony-Mousse

5,030625

add a comment |

Quadtree can be used for this purpose.

enter image description here

This algo divides 2 D space into clusters. In this example ; we can exclude point 'C' from comparison with 'E' and 'F'

http://wiki.gis.com/wiki/index.php/Quadtree

DBSCAN is also useful in some scenarios : https://en.wikipedia.org/wiki/DBSCAN

answered Mar 8 at 18:26

Shamit Verma

1,00929

add a comment |

Quadtree can be used for this purpose.

enter image description here

This algo divides 2 D space into clusters. In this example ; we can exclude point 'C' from comparison with 'E' and 'F'

http://wiki.gis.com/wiki/index.php/Quadtree

DBSCAN is also useful in some scenarios : https://en.wikipedia.org/wiki/DBSCAN

answered Mar 8 at 18:26

Shamit Verma

1,00929

add a comment |

Quadtree can be used for this purpose.

enter image description here

This algo divides 2 D space into clusters. In this example ; we can exclude point 'C' from comparison with 'E' and 'F'

http://wiki.gis.com/wiki/index.php/Quadtree

DBSCAN is also useful in some scenarios : https://en.wikipedia.org/wiki/DBSCAN

answered Mar 8 at 18:26

Shamit Verma

1,00929

Quadtree can be used for this purpose.

enter image description here

This algo divides 2 D space into clusters. In this example ; we can exclude point 'C' from comparison with 'E' and 'F'

http://wiki.gis.com/wiki/index.php/Quadtree

DBSCAN is also useful in some scenarios : https://en.wikipedia.org/wiki/DBSCAN

answered Mar 8 at 18:26

Shamit Verma

1,00929

answered Mar 8 at 18:26

Shamit Verma

1,00929

answered Mar 8 at 18:26

Shamit Verma

1,00929

answered Mar 8 at 18:26

Shamit Verma

1,00929

add a comment |

You can use Locality Sensitive Hashing technique Wiki article

With this, you can estimate either the Jaccard Similarity (MinHash) or Cosine Similarity (SimHash) between two documents and then apply clustering on the documents collection.

There is a great example with Python code for MinHash. What I get from the article is the bellow quote

In the example code, we have a collection of 10,000 articles which contain, on average, 250 shingles each. Computing the Jaccard similarities directly for all pairs takes 20 minutes on my PC, while generating and comparing the MinHash signatures takes only about 2 minutes and 45 seconds.

MinHash explanation with Python code

answered Mar 8 at 19:46

Tasos

1,005731

add a comment |

You can use Locality Sensitive Hashing technique Wiki article

With this, you can estimate either the Jaccard Similarity (MinHash) or Cosine Similarity (SimHash) between two documents and then apply clustering on the documents collection.

There is a great example with Python code for MinHash. What I get from the article is the bellow quote

In the example code, we have a collection of 10,000 articles which contain, on average, 250 shingles each. Computing the Jaccard similarities directly for all pairs takes 20 minutes on my PC, while generating and comparing the MinHash signatures takes only about 2 minutes and 45 seconds.

MinHash explanation with Python code

answered Mar 8 at 19:46

Tasos

1,005731

add a comment |

You can use Locality Sensitive Hashing technique Wiki article

With this, you can estimate either the Jaccard Similarity (MinHash) or Cosine Similarity (SimHash) between two documents and then apply clustering on the documents collection.

There is a great example with Python code for MinHash. What I get from the article is the bellow quote

In the example code, we have a collection of 10,000 articles which contain, on average, 250 shingles each. Computing the Jaccard similarities directly for all pairs takes 20 minutes on my PC, while generating and comparing the MinHash signatures takes only about 2 minutes and 45 seconds.

MinHash explanation with Python code

answered Mar 8 at 19:46

Tasos

1,005731

You can use Locality Sensitive Hashing technique Wiki article

With this, you can estimate either the Jaccard Similarity (MinHash) or Cosine Similarity (SimHash) between two documents and then apply clustering on the documents collection.

There is a great example with Python code for MinHash. What I get from the article is the bellow quote

In the example code, we have a collection of 10,000 articles which contain, on average, 250 shingles each. Computing the Jaccard similarities directly for all pairs takes 20 minutes on my PC, while generating and comparing the MinHash signatures takes only about 2 minutes and 45 seconds.

MinHash explanation with Python code

answered Mar 8 at 19:46

Tasos

1,005731

answered Mar 8 at 19:46

Tasos

1,005731

answered Mar 8 at 19:46

Tasos

1,005731

answered Mar 8 at 19:46

Tasos

1,005731

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Data Science Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Trjtdtk

3 Answers
3

Your Answer

Post as a guest

3 Answers
3

3 Answers
3

Post as a guest

Popular posts from this blog

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

3 Answers 3

3 Answers 3

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

3 Answers
3

3 Answers
3

3 Answers
3