How to create new feature based on clustering result


I'm still a beginner in machine learning, and I want to know how to code the following situation in Python using clustering.

I have data like:

id  Column1  duration(seconde)  column3
1   aaa      20                 bbb
2   ccc      01                 ddd
3   eee      150                fff
4   ggg      25                 hhh

I want to group my data according to the value of the duration column and create a new column containing a category name based on the duration cluster. I want to get this result:

id  Column1  duration(seconde)  column3  NewColCategorie
1   aaa      20                 bbb      Cat2
2   ccc      01                 ddd      Cat1
3   eee      150                fff      Cat3
4   ggg      25                 hhh      Cat2
5   iii      175                jjj      Cat3

edited Mar 20 at 9:39

Blenzus

446

asked Mar 13 at 19:20

Nirmine

276

Are you applying a clustering model or just making clusters based on specific ranges of values?
– bkshi
Mar 14 at 4:46

I want to apply clustering, but I don't know how to program it. The number of centroids should be 3.
– Nirmine
Mar 14 at 10:17

python pandas numpy


1 Answer

To do the clustering, you can use scikit-learn's k-means implementation, sklearn.cluster.KMeans, with n_clusters=3 and the other parameters left at their defaults. This gives you 3 clusters. After fitting, the model's .labels_ attribute holds the cluster index assigned to each training example. The basic usage looks like this (a toy example with two clusters; for your data, set n_clusters=3):

>>> from sklearn.cluster import KMeans
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
... [10, 2], [10, 4], [10, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
>>> kmeans.labels_
array([1, 1, 1, 0, 0, 0], dtype=int32)
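
The example above uses two-dimensional toy points and two clusters. For the case in the question (a single numeric column and three clusters), a minimal sketch might look like the following. The duration values are copied from the question; the variable names and the explicit n_init=10 (set only to keep behaviour consistent across scikit-learn versions) are my own assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Durations (in seconds) from the question's sample rows.
durations = np.array([20, 1, 150, 25, 175], dtype=float)

# KMeans expects a 2-D array of shape (n_samples, n_features),
# so a single column has to be reshaped to (n_samples, 1).
X = durations.reshape(-1, 1)

kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(X)

labels = kmeans.labels_                    # one integer cluster id per row
centers = kmeans.cluster_centers_.ravel()  # one centroid (mean duration) per cluster

# A fitted model can also assign a cluster to a value it has not seen.
new_label = kmeans.predict(np.array([[160.0]]))[0]
```

The integer ids in labels are arbitrary: cluster 0 is not necessarily the one with the shortest durations, so it is worth looking at centers before giving the clusters names.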

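To create the new column, assign the kmeans.labels_ array to a column of your original dataframe. This assumes the model was fitted on that dataframe's rows (here, its duration column), so that labels_ has one entry per row, in the same order: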

>>> df['categories'] = kmeans.labels_
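
Putting this together for the dataframe in the question, here is a hedged sketch rather than a definitive implementation. The column is renamed to duration for convenience, and ranking the clusters by centroid to obtain the names Cat1 to Cat3 is my own choice, since KMeans itself only returns arbitrary integer ids.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Sample rows copied from the question (duration column name simplified).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Column1": ["aaa", "ccc", "eee", "ggg", "iii"],
    "duration": [20, 1, 150, 25, 175],
    "column3": ["bbb", "ddd", "fff", "hhh", "jjj"],
})

# Fit on the same rows that will receive the label column.
kmeans = KMeans(n_clusters=3, random_state=0, n_init=10)
labels = kmeans.fit_predict(df[["duration"]])

# Cluster ids are arbitrary, so rank the clusters by centroid:
# the cluster with the smallest mean duration becomes Cat1.
order = np.argsort(kmeans.cluster_centers_.ravel())
names = {int(cluster_id): f"Cat{rank + 1}" for rank, cluster_id in enumerate(order)}

df["NewColCategorie"] = pd.Series(labels, index=df.index).map(names)
print(df)
```

Note that k-means minimises the within-cluster sum of squared distances, so on these five raw values it may not reproduce the hand-assigned categories in the question: grouping 1, 20 and 25 together and leaving 150 and 175 as singletons has a lower total (about 320.7) than the grouping shown in the question (325). Check the resulting categories on your real data before relying on them.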

answered Mar 17 at 6:18

bkshi

638111

Thank you very much, this is helpful.
– Nirmine
Mar 17 at 19:11
