Is there any clustering algorithm to find longest continuous subsequences?
I have data that records when each item was accessed.
Example:
t0 to t4 are the access time slots: 1 means the item was accessed in that slot, 0 means it wasn't.
ID,t0,t1,t2,t3,t4
0,0,0,1,1,1
1,0,1,1,1,1
2,0,1,1,0,0
3,1,1,0,0,1
4,1,1,0,0,1
In the above example, the group ID=0,1 is what I want. ID=3,4 is not: the distance between those two records is small, but their accesses are not continuous.
I tried KMeans and DBSCAN. Both put ID=3,4 into one cluster, which makes sense, but it isn't what I want.
Is there any pre-processing of the data that would get me what I want? Or should I use another analytic tool?
Tags: machine-learning python clustering
asked Mar 29 at 9:13 by code_worker (edited Apr 1 at 7:53)
2 Answers
A custom distance computation, passed as input to the clustering algorithm, might help. By default these algorithms usually use Euclidean distance as the measure of dissimilarity.
You can try DBSCAN (in Python's scikit-learn) with metric='precomputed' and pass a custom distance matrix as X. Construct the matrix so that it matches your requirement, e.g. give records 3 and 4 a large distance even though their rows are identical.
answered Mar 29 at 17:34 by raghu
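One possible way to build such a matrix is sketched below. The distance definition is an assumption made for illustration and is not given in the answer: two records are close when the slots in which both are active contain a long unbroken run. The helper names (`longest_shared_run`, `contiguity_distance_matrix`, `cluster_records`) and the `eps` and `min_samples` values are hypothetical; only `DBSCAN(..., metric="precomputed")` comes from scikit-learn.

```python
def longest_shared_run(a, b):
    """Length of the longest unbroken stretch where both records are 1."""
    best = current = 0
    for x, y in zip(a, b):
        current = current + 1 if (x and y) else 0
        best = max(best, current)
    return best


def contiguity_distance_matrix(records):
    """Pairwise distances: 1 - (longest shared run / number of slots).

    This definition is an assumption for illustration; any symmetric,
    non-negative matrix with a zero diagonal could be used instead.
    """
    n = len(records)
    width = len(records[0])
    dist = [[0.0] * n for _ in range(n)]  # diagonal stays 0
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - longest_shared_run(records[i], records[j]) / width
            dist[i][j] = dist[j][i] = d
    return dist


def cluster_records(records, eps=0.5, min_samples=2):
    """Run DBSCAN on the precomputed matrix (eps/min_samples are guesses)."""
    # Imported here so the distance helpers work without scikit-learn.
    import numpy as np
    from sklearn.cluster import DBSCAN

    dist = np.asarray(contiguity_distance_matrix(records))
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
    return model.fit_predict(dist)


# Rows of the example table, in ID order 0..4.
records = [
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
]
```

With these assumed settings, `cluster_records(records)` should group IDs 0 and 1 (shared run of 3 slots, distance 0.4) and leave the others as noise, since IDs 3 and 4 share only a run of 2 (distance 0.6). The threshold needs tuning on real data.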
What does "custom distance matrix" mean? thx – code_worker Mar 30 at 14:45
This is the matrix of pairwise distances that you will compute in the way that you want. Eg: 3 and 4 will have a large distance. You then pass this as input to the clustering algorithm. – raghu Mar 31 at 4:16
So I need to define the matrix myself. thx. I originally thought there's a common way to define it. – code_worker Mar 31 at 8:26
It is probably worth transforming the data so that each record is "contiguous" (better to use that word, because "continuous" has a widely known mathematical meaning), making multiple copies of a record if necessary.
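One reading of this transformation, sketched under the assumption that "multiple copies" means one copy per unbroken run of 1s (the answer does not spell this out; `split_into_runs` is a hypothetical helper name):

```python
def split_into_runs(record):
    """Return one copy of `record` per maximal run of 1s, zero elsewhere."""
    copies = []
    start = None
    # A trailing 0 acts as a sentinel that closes a run ending at the last slot.
    for k, v in enumerate(list(record) + [0]):
        if v and start is None:
            start = k  # a new run begins here
        elif not v and start is not None:
            copy = [0] * len(record)
            copy[start:k] = [1] * (k - start)
            copies.append(copy)
            start = None
    return copies


# ID=3 from the example has two separate runs, so it yields two copies.
print(split_into_runs([1, 1, 0, 0, 1]))
```

After this step every row is contiguous, so a record like ID=3 no longer looks similar to another row merely because two unrelated runs happen to line up.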
K-means minimizes the sum of squares. I don't see how that is beneficial here.
There is Generalized DBSCAN, for which you can define arbitrary neighbor predicates. For example, you could define that neighbors (candidates for merging into the same cluster) must have a contiguous overlap of at least two active timepoints. Then check whether the result matches your notion of clusters, because DBSCAN merges neighbors transitively.
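To see what the transitivity means on the example data, here is a simplified sketch. It is not Generalized DBSCAN itself: it only applies the neighbor predicate and merges records linked by a chain of neighbors (connected components via union-find), with no core-point test. All function names are hypothetical.

```python
def longest_shared_run(a, b):
    """Length of the longest unbroken stretch where both records are 1."""
    best = current = 0
    for x, y in zip(a, b):
        current = current + 1 if (x and y) else 0
        best = max(best, current)
    return best


def are_neighbors(a, b, min_overlap=2):
    """Neighbor predicate: contiguous overlap of at least `min_overlap` slots."""
    return longest_shared_run(a, b) >= min_overlap


def transitive_groups(records, min_overlap=2):
    """Group record IDs that are connected through a chain of neighbors."""
    ids = list(records)
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for pos, i in enumerate(ids):
        for j in ids[pos + 1:]:
            if are_neighbors(records[i], records[j], min_overlap):
                parent[find(i)] = find(j)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


records = {
    0: [0, 0, 1, 1, 1],
    1: [0, 1, 1, 1, 1],
    2: [0, 1, 1, 0, 0],
    3: [1, 1, 0, 0, 1],
    4: [1, 1, 0, 0, 1],
}
print(transitive_groups(records, min_overlap=2))  # [[0, 1, 2], [3, 4]]
print(transitive_groups(records, min_overlap=3))  # [[0, 1], [2], [3], [4]]
```

With an overlap of 2, ID=2 ends up with ID=0 through ID=1 although 0 and 2 share only one slot, and IDs 3 and 4 are merged as well. Requiring an overlap of 3 isolates the pair 0,1 on this small example.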
My guess is that you would rather extract all contiguous subsequences of some minimum length (say, 2) from all records and simply count them to identify the most frequent ones. Implemented with an efficient bit representation, this will be very fast.
answered Mar 30 at 9:59 by Anony-Mousse
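The counting idea from the last paragraph can be sketched as follows. This is a minimal illustration, not the answerer's implementation: each record is packed into an integer, every window of consecutive slots becomes a bit mask, and a record supports a window when all of the window's bits are set. The names `to_bits` and `shared_windows` and the `min_support` parameter are assumptions.

```python
def to_bits(record):
    """Pack a 0/1 record into an int; bit k is set when slot t_k is active."""
    return sum(1 << k for k, v in enumerate(record) if v)


def shared_windows(records, n_slots, min_len=2, min_support=2):
    """List (start, end, ids) for windows fully covered by enough records.

    Windows are tried from longest to shortest, so the first entry is a
    longest contiguous stretch shared by at least `min_support` records.
    """
    bits = {rid: to_bits(rec) for rid, rec in records.items()}
    found = []
    for length in range(n_slots, min_len - 1, -1):
        for start in range(n_slots - length + 1):
            mask = ((1 << length) - 1) << start  # `length` ones from `start`
            ids = [rid for rid, b in bits.items() if (b & mask) == mask]
            if len(ids) >= min_support:
                found.append((start, start + length - 1, ids))
    return found


records = {
    0: [0, 0, 1, 1, 1],
    1: [0, 1, 1, 1, 1],
    2: [0, 1, 1, 0, 0],
    3: [1, 1, 0, 0, 1],
    4: [1, 1, 0, 0, 1],
}
for start, end, ids in shared_windows(records, n_slots=5):
    print(f"t{start}..t{end}: IDs {ids}")
```

On the example data the first window reported is t2..t4 with IDs 0 and 1, which is the group the question asks for. IDs 3 and 4 appear only for the shorter window t0..t1.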
Now I don't know how to transform data and cluster it to reach what I want. – code_worker Apr 1 at 1:59