Randomstate and kmeans issues


I am trying to cluster a dataframe of 227 rows into 5 clusters using the k-means algorithm. Each time I run my code I get different labels and different clusters, which makes my subsequent analysis a bit tricky.

Someone told me to use the random_state parameter to make my results reproducible. I did, and I now get the same clusters but still not the same labels. Is this normal? Is there a way to get the same labels?

My code is below:

Test with 5 clusters

from sklearn.cluster import KMeans

# Data (df is the dataframe of 227 rows; the first column is excluded)
X = df.iloc[:, 1:]
myseed = 10

# KMeans model with 5 clusters
km = KMeans(n_clusters=5, random_state=myseed, n_init=30)

# Fit the model to the points
km = km.fit(X)
y_km = km.predict(X)

asked Mar 23 at 11:57

Aurelie Giraud

python k-means


1 Answer

The code below produces the same labels and centers on multiple runs. Are the order and the values of your rows exactly the same on every run?

import numpy as np
import sklearn
from sklearn.cluster import KMeans

feature1 = np.array([0.06899715, 0.06241017, 0.05136961, 0.08888344, 0.02369817, 0.05132511
, 0.07644885, 0.05571872, 0.1181635, 0.11287314, 0.15657083, 0.02658089
, 0.09810791, 0.16733219, 0.0374563, 0.08576906, 0.09522029, 0.04036745
, 0.1771768, 0.02325055, 0.13287777, 0.17448146, 0.07643926, 0.11694316
, 0.05478085, 0.17871513, 0.12706873, 0.13088636, 0.04807535, 0.15287181
, 0.05939004, 0.11667131, 0.15096193, 0.08683943, 0.02983505, 0.16516065
, 0.13741847, 0.08085856])

feature2 = np.array([0.10912874,0.18179051,0.06677442,0.11514302,0.13528425,0.05294313
,0.104772,0.12043084,0.08678998,0.13244747,0.11542028,0.18976266
,0.09423382,0.1131851,0.08747229,0.11630518,0.13750788,0.16403124
,0.16001422,0.15831517,0.16077575,0.12676131,0.08902124,0.16560226
,0.12596398,0.10481269,0.07881513,0.07465646,0.06645936,0.15950977
,0.13438658,0.18380235,0.07926124,0.18421547,0.05638499,0.11649947
,0.18400138,0.15033764])

feature3 = np.array([0.14816871, 0.1242456, 0.05020879, 0.12977452, 0.11865668, 0.1240002
, 0.16643243, 0.14401847, 0.17220796, 0.1708265, 0.04874987, 0.13442849
, 0.1375112, 0.15013606, 0.16671397, 0.13733997, 0.0516441, 0.16258701
, 0.13466661, 0.05516904, 0.14082673, 0.10032826, 0.13947572, 0.16405601
, 0.04752982, 0.15857467, 0.11730741, 0.15302504, 0.0404311, 0.03593672
, 0.07661769, 0.07276992, 0.08319156, 0.14247431, 0.1514434, 0.08060953
, 0.06952104, 0.17438457])

X = np.vstack([feature1, feature2, feature3]).T

kmeans = KMeans(n_clusters=10, random_state=0).fit(X)

print('sklearn version', sklearn.__version__)
print('labels\n', kmeans.labels_)
print('centers\n', kmeans.cluster_centers_)

Output:

sklearn version 0.19.1
labels
 [1 3 9 1 3 7 1 1 0 6 4 3 1 8 7 1 2 3 8 2 6 4 1 6 2 8 0 0 9 5 2 5 4 6 7 4 5
 6]
centers
 [[0.12537286 0.08008719 0.14751347]
 [0.07862348 0.10700498 0.14324586]
 [0.05816043 0.1390434 0.05774016]
 [0.03826417 0.16771717 0.13497945]
 [0.16179372 0.10948558 0.07821981]
 [0.13565386 0.17577117 0.05940923]
 [0.10607841 0.15867572 0.15851362]
 [0.03953882 0.06560014 0.14738586]
 [0.17440804 0.126004 0.14779245]
 [0.04972248 0.06661689 0.04531995]]
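
If the partition is the same but the integer labels still differ between runs, it may help to remember that k-means label numbers are arbitrary identifiers: cluster "0" in one fit can be cluster "3" in another. The following is only a minimal sketch, not part of the answer above. It uses hypothetical toy data and a hypothetical helper, canonical_labels, and assumes that sorting the centroids lexicographically is an acceptable way to fix the numbering. It first checks with adjusted_rand_score whether two fits agree up to a renaming, then renumbers both so the labels match.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def canonical_labels(model):
    """Hypothetical helper: renumber clusters by lexicographic order of their centroids."""
    centers = model.cluster_centers_
    # np.lexsort uses the LAST key as the primary key, so reverse the
    # columns to sort by column 0 first, then column 1, and so on.
    order = np.lexsort(centers.T[::-1])
    # mapping[old_label] = new_label
    mapping = np.empty_like(order)
    mapping[order] = np.arange(len(order))
    return mapping[model.labels_], centers[order]


# Toy data (an assumption for illustration): three well-separated blobs.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=c, scale=0.05, size=(30, 2)) for c in (0.0, 1.0, 2.0)])

# Two fits with different seeds: the partitions should match, but the
# label numbers are free to differ.
km_a = KMeans(n_clusters=3, random_state=1, n_init=10).fit(X)
km_b = KMeans(n_clusters=3, random_state=2, n_init=10).fit(X)

# An adjusted Rand index of 1.0 means the two partitions are identical
# up to a renaming of the labels.
ari = adjusted_rand_score(km_a.labels_, km_b.labels_)
print('adjusted Rand index:', ari)

labels_a, centers_a = canonical_labels(km_a)
labels_b, centers_b = canonical_labels(km_b)
print('same labels after renumbering:', np.array_equal(labels_a, labels_b))
```

Sorting by centroid is just one convention; any deterministic rule (for example, ordering clusters by size) would work as long as it is applied to every run. If the adjusted Rand index is below 1.0, the clusters themselves differ and renumbering alone will not make the labels agree.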

edited Mar 23 at 13:49

answered Mar 23 at 12:16

Esmailian
