Randomstate and kmeans issues
I am trying to cluster a dataframe of 227 rows into 5 clusters using the k-means algorithm. Each time I run my code I get different labels and different clusters, which makes my subsequent analysis a bit tricky.

Someone told me to use the random_state parameter to make my results reproducible. I did. I now get the same clusters, but still not the same labels. Is this normal? Is there a way to get the same labels?

My code is below:

    from sklearn.cluster import KMeans

    # Test with 5 clusters
    # Data: every column except the first
    X = df.iloc[:, 1:]
    myseed = 10
    # k-means model with 5 clusters
    km = KMeans(n_clusters=5, random_state=myseed, n_init=30)
    # Fit the model to the points
    km = km.fit(X)
    y_km = km.predict(X)

python k-means
asked Mar 23 at 11:57
Aurelie Giraud
1 Answer
The code below produces the same labels and centers on multiple runs. Are the order and the values of your rows exactly the same between runs?

    import numpy as np
    import sklearn
    from sklearn.cluster import KMeans

    # Three features with 38 observations each
    feature1 = np.array([
        0.06899715, 0.06241017, 0.05136961, 0.08888344, 0.02369817, 0.05132511,
        0.07644885, 0.05571872, 0.1181635, 0.11287314, 0.15657083, 0.02658089,
        0.09810791, 0.16733219, 0.0374563, 0.08576906, 0.09522029, 0.04036745,
        0.1771768, 0.02325055, 0.13287777, 0.17448146, 0.07643926, 0.11694316,
        0.05478085, 0.17871513, 0.12706873, 0.13088636, 0.04807535, 0.15287181,
        0.05939004, 0.11667131, 0.15096193, 0.08683943, 0.02983505, 0.16516065,
        0.13741847, 0.08085856])
    feature2 = np.array([
        0.10912874, 0.18179051, 0.06677442, 0.11514302, 0.13528425, 0.05294313,
        0.104772, 0.12043084, 0.08678998, 0.13244747, 0.11542028, 0.18976266,
        0.09423382, 0.1131851, 0.08747229, 0.11630518, 0.13750788, 0.16403124,
        0.16001422, 0.15831517, 0.16077575, 0.12676131, 0.08902124, 0.16560226,
        0.12596398, 0.10481269, 0.07881513, 0.07465646, 0.06645936, 0.15950977,
        0.13438658, 0.18380235, 0.07926124, 0.18421547, 0.05638499, 0.11649947,
        0.18400138, 0.15033764])
    feature3 = np.array([
        0.14816871, 0.1242456, 0.05020879, 0.12977452, 0.11865668, 0.1240002,
        0.16643243, 0.14401847, 0.17220796, 0.1708265, 0.04874987, 0.13442849,
        0.1375112, 0.15013606, 0.16671397, 0.13733997, 0.0516441, 0.16258701,
        0.13466661, 0.05516904, 0.14082673, 0.10032826, 0.13947572, 0.16405601,
        0.04752982, 0.15857467, 0.11730741, 0.15302504, 0.0404311, 0.03593672,
        0.07661769, 0.07276992, 0.08319156, 0.14247431, 0.1514434, 0.08060953,
        0.06952104, 0.17438457])

    # Stack the features as columns: X has shape (38, 3)
    X = np.vstack([feature1, feature2, feature3]).T

    # A fixed random_state makes the centroid initialization reproducible
    kmeans = KMeans(n_clusters=10, random_state=0).fit(X)

    print('sklearn version', sklearn.__version__)
    print('labels\n', kmeans.labels_)
    print('centers\n', kmeans.cluster_centers_)

Output:

    sklearn version 0.19.1
    labels
     [1 3 9 1 3 7 1 1 0 6 4 3 1 8 7 1 2 3 8 2 6 4 1 6 2 8 0 0 9 5 2 5 4 6 7 4 5
     6]
    centers
     [[0.12537286 0.08008719 0.14751347]
     [0.07862348 0.10700498 0.14324586]
     [0.05816043 0.1390434 0.05774016]
     [0.03826417 0.16771717 0.13497945]
     [0.16179372 0.10948558 0.07821981]
     [0.13565386 0.17577117 0.05940923]
     [0.10607841 0.15867572 0.15851362]
     [0.03953882 0.06560014 0.14738586]
     [0.17440804 0.126004 0.14779245]
     [0.04972248 0.06661689 0.04531995]]
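If the clusters themselves are the same between runs and only their numbering differs, one possible workaround is to renumber the clusters after fitting using a rule that does not depend on the initialization. The sketch below is only an illustration under that assumption: `canonical_labels` is a hypothetical helper (not part of scikit-learn) that orders clusters by their center coordinates, and the synthetic data and seeds are made up for the demo. `adjusted_rand_score` compares two clusterings while ignoring how the clusters are numbered.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def canonical_labels(km):
    """Renumber clusters by sorting their centers lexicographically.

    Hypothetical helper: label 0 goes to the center with the smallest
    first coordinate (ties broken by the following coordinates).
    """
    centers = km.cluster_centers_
    # np.lexsort uses the LAST key as the primary one, so reverse the columns
    order = np.lexsort(centers.T[::-1])      # order[new_label] = old_label
    mapping = np.empty_like(order)
    mapping[order] = np.arange(len(order))   # mapping[old_label] = new_label
    return mapping[km.labels_], centers[order]


# Made-up, well-separated data so that both fits find the same partition
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in true_centers])

# Two fits with different seeds: the numbering of the clusters may differ
km_a = KMeans(n_clusters=3, random_state=1, n_init=10).fit(X)
km_b = KMeans(n_clusters=3, random_state=2, n_init=10).fit(X)

# The adjusted Rand index is 1 when the partitions match up to renumbering
ari = adjusted_rand_score(km_a.labels_, km_b.labels_)

labels_a, centers_a = canonical_labels(km_a)
labels_b, centers_b = canonical_labels(km_b)

print('adjusted Rand index:', ari)
print('same labels after renumbering:', np.array_equal(labels_a, labels_b))
```

This only stabilizes the names of the clusters. If the adjusted Rand index is below 1, the partitions themselves differ, and then it is worth checking whether the rows of the input are in the same order and hold the same values on every run.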
edited Mar 23 at 13:49
answered Mar 23 at 12:16
Esmailian