


Is there any clustering algorithm to find longest continuous subsequences?



























I have data that records the access history of some items.

Example: t0~t4 are time slots; 1 means the item was accessed during that slot, 0 means it was not.

ID,t0,t1,t2,t3,t4
0,0,0,1,1,1
1,0,1,1,1,1
2,0,1,1,0,0
3,1,1,0,0,1
4,1,1,0,0,1

In the example above, IDs 0 and 1 form the kind of group I want, because their accesses overlap in a long contiguous run of time slots.

IDs 3 and 4 should not form such a group: their distance is small (the rows are identical), but their access pattern is not contiguous.

I tried KMeans and DBSCAN; both cluster IDs 3 and 4 together, which makes sense given the distance measure, but it is not what I want.

Is there any preprocessing of the data that would achieve this? Or should I use a different analytic tool?
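For concreteness, here is a small reproduction of the behaviour described above (a sketch; the eps and min_samples values are arbitrary choices of mine, not taken from the question): because rows 3 and 4 are identical bit vectors, any clustering based on Euclidean distance will put them in the same cluster.

    import numpy as np
    from sklearn.cluster import DBSCAN

    X = np.array([
        [0, 0, 1, 1, 1],  # ID 0
        [0, 1, 1, 1, 1],  # ID 1
        [0, 1, 1, 0, 0],  # ID 2
        [1, 1, 0, 0, 1],  # ID 3
        [1, 1, 0, 0, 1],  # ID 4
    ])

    # Plain DBSCAN on the raw rows: IDs 3 and 4 are at distance 0,
    # so they are always grouped together, as observed in the question.
    labels = DBSCAN(eps=1.1, min_samples=2).fit_predict(X)
    print(labels)  # e.g. [ 0  0 -1  1  1]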





































machine-learning python clustering

asked Mar 29 at 9:13 by code_worker, edited Apr 1 at 7:53




















2 Answers






























What might help is a custom distance computation as input to the clustering algorithm. These algorithms usually take Euclidean distance as the measure of dissimilarity.

You can try DBSCAN (in Python scikit-learn) with metric='precomputed' and X as a custom distance matrix. You can construct this distance matrix to conform to your requirement, e.g. specify that rows 3 and 4 have a large distance even though they are equal.

– raghu, answered Mar 29 at 17:34
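A minimal sketch of this idea, assuming one particular custom distance (my own illustrative choice, not prescribed by the answer): two rows are considered close when they share a long contiguous run of active time slots, so identical but non-contiguous rows such as 3 and 4 end up farther apart than rows 0 and 1.

    import numpy as np
    from sklearn.cluster import DBSCAN

    X = np.array([
        [0, 0, 1, 1, 1],  # ID 0
        [0, 1, 1, 1, 1],  # ID 1
        [0, 1, 1, 0, 0],  # ID 2
        [1, 1, 0, 0, 1],  # ID 3
        [1, 1, 0, 0, 1],  # ID 4
    ])

    def longest_common_run(a, b):
        # Length of the longest contiguous stretch where both rows are 1.
        best = cur = 0
        for x, y in zip(a, b):
            cur = cur + 1 if (x == 1 and y == 1) else 0
            best = max(best, cur)
        return best

    n, T = X.shape
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # The shorter the contiguous overlap, the larger the distance.
            D[i, j] = T - longest_common_run(X[i], X[j])
    np.fill_diagonal(D, 0.0)  # a point is at distance 0 from itself

    # eps/min_samples are illustrative; with this matrix IDs 0 and 1
    # (contiguous overlap of 3 slots) cluster together, while IDs 3 and 4 do not.
    labels = DBSCAN(eps=2.0, min_samples=2, metric='precomputed').fit_predict(D)
    print(labels)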



























• What does "custom distance matrix" mean? thx – code_worker, Mar 30 at 14:45

• This is the matrix of pairwise distances that you will compute in the way that you want. E.g. 3 and 4 will have a large distance. You then pass this as input to the clustering algorithm. – raghu, Mar 31 at 4:16

• So I need to define the matrix myself. thx. I originally thought there's a common way to define it. – code_worker, Mar 31 at 8:26






























It is probably worth transforming the data so that each record is "continuous" (better to call it something else, for example "contiguous", since "continuous" has a well-known mathematical meaning), making multiple copies of a record if necessary.

K-means minimizes the sum of squares. I don't see how that is beneficial here.

There is generalized DBSCAN. You can define arbitrary neighbor predicates for it. For example, you could define that neighbors (candidates for merging into the same cluster) must have a contiguous overlap of at least two active time points. Then check whether the resulting clusters still satisfy your notion of a cluster, because DBSCAN merges neighbors transitively.

My guess is that you will rather want to extract all contiguous subsequences of some minimum length, say 2, from all records and simply count them to identify the most frequent subsequences. If you implement this with an efficient bit representation, it will be very fast.

– Anony-Mousse, answered Mar 30 at 9:59
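A rough sketch of that last suggestion (the minimum length of 2 follows the example in the answer; the function and variable names are mine, and no bit-level optimisation is attempted): enumerate every fully active contiguous window of each record and count how often each window occurs across records.

    from collections import Counter

    rows = {
        0: [0, 0, 1, 1, 1],
        1: [0, 1, 1, 1, 1],
        2: [0, 1, 1, 0, 0],
        3: [1, 1, 0, 0, 1],
        4: [1, 1, 0, 0, 1],
    }

    MIN_LEN = 2
    counts = Counter()
    for rid, bits in rows.items():
        T = len(bits)
        for start in range(T):
            for end in range(start + MIN_LEN, T + 1):
                if all(bits[start:end]):          # the whole window is active
                    counts[(start, end)] += 1     # identify a window by its slot range

    # Most frequent contiguous windows across all records; the longest window
    # shared by at least two records is t2~t4 (IDs 0 and 1).
    for (start, end), c in counts.most_common():
        print(f"t{start}~t{end - 1}: {c} record(s)")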



























• Now I don't know how to transform data and cluster it to reach what I want. – code_worker, Apr 1 at 1:59










