How to create new feature based on clustering result


I'm still a beginner in machine learning, and I want to know how to code this situation in Python using clustering.

I have data like:

id  Column1  duration(seconde)  column3
1   aaa      20                 bbb
2   ccc      01                 ddd
3   eee      150                fff
4   ggg      25                 hhh

I want to group my data according to the value of the duration column and create a new column containing a category name based on the duration cluster. I want to get this result:

id  Column1  duration(seconde)  column3  NewColCategorie
1   aaa      20                 bbb      Cat2
2   ccc      01                 ddd      Cat1
3   eee      150                fff      Cat3
4   ggg      25                 hhh      Cat2
5   iii      175                jjj      Cat3











python pandas numpy

asked Mar 13 at 19:20 by Nirmine
edited Mar 20 at 9:39 by Blenzus

  • Are you applying a clustering model, or just making clusters based on specific ranges of values?
    – bkshi
    Mar 14 at 4:46

  • I want to apply clustering with 3 centroids, but I don't know how to program it.
    – Nirmine
    Mar 14 at 10:17
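
For comparison, the other option raised in the first comment (fixed value ranges instead of a fitted model) can be sketched with pandas.cut. The bin edges below are hypothetical and chosen only for illustration; they are not taken from the question. The column names come from the question's sample data.

```python
import pandas as pd

# Durations from the question's sample rows (row 5 appears only in the desired output).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "duration(seconde)": [20, 1, 150, 25, 175],
})

# Hypothetical bin edges: (0, 10], (10, 60], (60, inf].
bins = [0, 10, 60, float("inf")]
labels = ["Cat1", "Cat2", "Cat3"]

# pd.cut assigns each duration to the interval it falls into.
df["NewColCategorie"] = pd.cut(df["duration(seconde)"], bins=bins, labels=labels)
print(df)
```

With these particular edges the new column matches the one in the question's desired result, but the edges have to be picked by hand, which is exactly what a clustering model avoids.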

















1 Answer
To cluster the data you can use scikit-learn's KMeans class, sklearn.cluster.KMeans, with n_clusters=3 and the other parameters left at their defaults. This will give you 3 clusters. After you have fitted the model, its .labels_ attribute holds the cluster index assigned to each training sample. The toy example below uses n_clusters=2, but the pattern is the same:

>>> from sklearn.cluster import KMeans
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
... [10, 2], [10, 4], [10, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
>>> kmeans.labels_
array([1, 1, 1, 0, 0, 0], dtype=int32)

To create a new column based on the cluster, you can add the kmeans.labels_ array as a column to your original dataframe. This assumes the model was fitted on that dataframe's rows, so the labels line up with it row by row:

>>> df['categories'] = kmeans.labels_
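
Applied to the data in the question, the same steps might look like the sketch below. It assumes the sample rows are loaded into a dataframe named df, and it ranks the cluster centers so that Cat1 is the shortest-duration group; that naming step is an addition for illustration, not part of the answer above. n_init and random_state are set explicitly only to make runs repeatable.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Sample rows from the question (row 5 appears only in the desired output).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Column1": ["aaa", "ccc", "eee", "ggg", "iii"],
    "duration(seconde)": [20, 1, 150, 25, 175],
    "column3": ["bbb", "ddd", "fff", "hhh", "jjj"],
})

# KMeans expects a 2-D array, so select the column with a list of names.
X = df[["duration(seconde)"]].to_numpy()

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Cluster indices are arbitrary, so rank the centers from smallest to largest
# and name the clusters Cat1..Cat3 in that order.
order = kmeans.cluster_centers_.ravel().argsort()
names = {int(cluster): f"Cat{rank + 1}" for rank, cluster in enumerate(order)}
df["NewColCategorie"] = [names[int(label)] for label in kmeans.labels_]
print(df)
```

Keep in mind that KMeans minimizes within-cluster variance, so the groups it finds on these five values are not guaranteed to match the categories shown in the question's desired table.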












answered Mar 17 at 6:18 by bkshi

  • Thank you very much, it is helpful.
    – Nirmine
    Mar 17 at 19:11















