How to create new feature based on clustering result
I'm still a beginner in machine learning, and I want to know how to code the following in Python using clustering.
I have data like:
id  Column1  duration(seconde)  column3
1   aaa      20                 bbb
2   ccc      01                 ddd
3   eee      150                fff
4   ggg      25                 hhh
I want to group my data according to the value of the duration column and create a new column containing a category name based on the duration cluster. I want to get this result:
id  Column1  duration(seconde)  column3  NewColCategorie
1   aaa      20                 bbb      Cat2
2   ccc      01                 ddd      Cat1
3   eee      150                fff      Cat3
4   ggg      25                 hhh      Cat2
5   iii      175                jjj      Cat3
python pandas numpy
asked Mar 13 at 19:20 by Nirmine
edited Mar 20 at 9:39 by Blenzus
Are you applying a clustering model, or just making clusters based on specific ranges of values?
– bkshi, Mar 14 at 4:46
I want to apply clustering, with the number of centroids = 3, but I don't know how to program it.
– Nirmine, Mar 14 at 10:17
1 Answer
For clustering you can use scikit-learn's k-means implementation, sklearn.cluster.KMeans, with n_clusters=3 and the other parameters left at their defaults. This will give you 3 clusters. Once the model is fitted, its .labels_ attribute holds the cluster index assigned to each training example. The pattern looks like this (toy data, here with n_clusters=2):
>>> from sklearn.cluster import KMeans
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
>>> kmeans.labels_
array([1, 1, 1, 0, 0, 0], dtype=int32)
To create a new column from the cluster assignments, add the kmeans.labels_ array as a column to your original dataframe. This assumes the model was fitted on that dataframe's rows, so that the labels line up with them one-to-one:
>>> df['categories'] = kmeans.labels_
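Applied to the data in the question, the same idea might look like the sketch below. It is only an illustration under a few assumptions: the rows are copied from the question's desired-result table, n_init=10 and random_state=0 are arbitrary choices, and naming the clusters Cat1 to Cat3 in order of increasing centroid is one possible convention (k-means itself returns arbitrary integer labels). On so few rows, the grouping k-means finds may differ from the hand-assigned categories in the question.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Rows copied from the question's desired-result table.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Column1": ["aaa", "ccc", "eee", "ggg", "iii"],
    "duration(seconde)": [20, 1, 150, 25, 175],
    "column3": ["bbb", "ddd", "fff", "hhh", "jjj"],
})

# KMeans expects 2-D input, so select the single column with double brackets.
# n_init and random_state are illustrative choices, not requirements.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(df[["duration(seconde)"]])

# Cluster ids are arbitrary, so rank the centroids: the cluster with the
# smallest mean duration becomes Cat1, the largest becomes Cat3.
order = np.argsort(kmeans.cluster_centers_.ravel())
names = {int(cluster_id): f"Cat{rank + 1}" for rank, cluster_id in enumerate(order)}

# labels_ is aligned with the rows the model was fitted on.
df["NewColCategorie"] = [names[int(label)] for label in kmeans.labels_]
print(df)
```

Because the category names follow the centroid order, they stay stable across runs even if the raw integer labels are permuted.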
answered Mar 17 at 6:18 by bkshi
Thank you very much, it is helpful.
– Nirmine, Mar 17 at 19:11