Uniformly quantize the dependent variable into 4, 8, and 16 levels / R Programming

I have a question about quantizing my dependent variable into 4, 8, and 16 levels. I am attempting to use kmeans to do this, and here is where I am lost. After quantizing the dependent variable at each of the three level counts, I need to perform these steps: predict the numerical variable "wife_age" using the independent variables, numerical "num_child" and categorical "cmc"; then compute the MSE (mean squared error), the confusion table, and the probability of misclassification. The data set is loaded from H5.txt in the code below.

Here is what I have thus far:

# Install packages (tidyverse includes dplyr)
install.packages("tidyverse")
install.packages("factoextra")
install.packages("cluster")

# Load the libraries (MASS provides polr)
library(tidyverse)
library(cluster)
library(factoextra)
library(MASS)

# Load dataset
A <- read.table("H5.txt", header = TRUE)
# List variable names
names(A)
# View the first rows of the data
head(A)

Output:

> names(A)
 [1] "wife_age"  "wife_edu"  "hus_edu"   "num_child" "wife_rel"  "wife_work" "hus_occu"  "sli"       "media_exp"
[10] "cmc"
> # view the data
> head(A)
  wife_age wife_edu hus_edu num_child wife_rel wife_work hus_occu sli media_exp cmc
1       24        2       3         3        1         1        2   3         0   1
2       45        1       3        10        1         1        3   4         0   1
3       43        2       3         7        1         1        3   4         0   1
4       42        3       2         9        1         1        3   3         0   1
5       36        3       3         8        1         1        3   2         0   1
6       19        4       4         0        1         1        3   3         0   1


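# Quantize wife_age into 4 clusters
# (kmeans starts from random centers, so results can vary between runs)
L4 <- kmeans(A$wife_age, 4)
# Total within-cluster sum of squares
L4$tot.withinss
L4
# Replace each age by the center of its cluster
B <- L4$centers[L4$cluster]
head(B, 4)
hist(B)
mean(B)
sd(B)

# Treat the quantized ages as a factor response
B <- factor(B)

A$cmc <- factor(A$cmc)
# Proportional-odds logistic regression on the quantized age
C <- polr(B ~ num_child + cmc, data = A, Hess = TRUE)
summary(C)
# Predicted level (a factor) for each row
predict(C)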

Output:

> summary(C)
Call:
polr(formula = B ~ num_child + cmc, data = A, Hess = TRUE)

Coefficients:
            Value Std. Error t value
num_child  0.5272    0.02618  20.140
cmc2      -0.1603    0.12776  -1.254
cmc3      -1.0119    0.11673  -8.669

Intercepts:
                                  Value   Std. Error t value
23.5144628099174|30.8827930174564  0.3663  0.1040     3.5218
30.8827930174564|37.5348101265823  1.8538  0.1167    15.8816
37.5348101265823|45.2316176470588  3.2798  0.1377    23.8142

Residual Deviance: 3446.081
AIC: 3458.081
> predict(C)
  [1] 30.8827930174564 45.2316176470588 45.2316176470588 45.2316176470588 45.2316176470588 23.5144628099174

[985] 23.5144628099174 30.8827930174564 23.5144628099174 23.5144628099174 30.8827930174564 30.8827930174564
[991] 23.5144628099174 23.5144628099174 23.5144628099174 45.2316176470588 23.5144628099174 23.5144628099174
[997] 30.8827930174564 37.5348101265823 45.2316176470588 45.2316176470588
 [ reached getOption("max.print") -- omitted 473 entries ]
Levels: 23.5144628099174 30.8827930174564 37.5348101265823 45.2316176470588

What I want is a prediction. Does predict(C) actually predict the wife's age? Is there a better way to do this?
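For reference, the evaluation step described above (confusion table, probability of misclassification, MSE) might be sketched roughly as follows. The six ages and the six predicted levels are the ones printed by head(A) and predict(C) above; everything else is an assumption made for illustration. In particular, the observed level of each age is taken to be the nearest cluster center, which stands in for L4$cluster, and the variable names are hypothetical.

```r
# Level labels exactly as printed by predict(C); the numeric centers follow from them
lev <- c("23.5144628099174", "30.8827930174564",
         "37.5348101265823", "45.2316176470588")
centers <- as.numeric(lev)

# First six ages from head(A) and first six predictions from predict(C)
age  <- c(24, 45, 43, 42, 36, 19)
pred <- factor(lev[c(2, 4, 4, 4, 4, 1)], levels = lev)

# Assumption: the observed level is the nearest center (stand-in for L4$cluster)
obs_idx <- vapply(age, function(a) which.min(abs(a - centers)), integer(1))
obs     <- factor(lev[obs_idx], levels = lev)

# Confusion table and probability of misclassification
conf     <- table(observed = obs, predicted = pred)
misclass <- mean(obs != pred)   # 2 of the 6 rows differ

# predict() returns a factor; as.numeric(pred) alone would give the level
# codes (1..4), so go through as.character() to recover the center values
pred_age <- as.numeric(as.character(pred))

# MSE against the quantized ages and against the raw ages
mse_level <- mean((centers[obs_idx] - pred_age)^2)
mse_age   <- mean((age - pred_age)^2)

print(conf)
print(c(misclass = misclass, mse_level = mse_level, mse_age = mse_age))
```

Read this way, predict(C) yields the most probable quantized level for each row, not the age itself; converting the level labels back to numbers is what makes an MSE against wife_age possible.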

asked Apr 1 at 13:58 by Chris Kehl (edited Apr 2 at 23:10)

Define "uniform". I don't see how kmeans could be "uniform" in any standard meaning of the word.
– Anony-Mousse, Apr 1 at 18:19

I mean broken into levels; if there is a better way, please share.
– Chris Kehl, Apr 1 at 18:52

@ChrisKehl Please do not post pictures of the data analysis; they are difficult to reproduce. Also, do not bold whole sentences; in netspeak it indicates shouting. I suggest revising the question to state clearly what you have tried and what the problem is, adding bullet points wherever applicable. See here on how to create a minimum reproducible example.
– mnm, Apr 2 at 7:46
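On the meaning of "uniform" raised in the comments: one common reading is equal-width bins over the range of the variable, which kmeans does not generally produce. A minimal sketch under that assumption is given below. quantize_uniform is a hypothetical helper written for this illustration, not a function from any package, and the six ages are the ones shown by head(A).

```r
# Equal-width ("uniform") quantization of a numeric vector into k levels.
# Hypothetical helper; assumes x has at least two distinct values.
quantize_uniform <- function(x, k) {
  # k + 1 equally spaced break points spanning the range of x
  breaks <- seq(min(x), max(x), length.out = k + 1)
  # Integer bin index for each value; include.lowest keeps min(x) in bin 1
  bin <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  # Represent each bin by its midpoint
  mids <- (head(breaks, -1) + tail(breaks, -1)) / 2
  list(level = bin, value = mids[bin], breaks = breaks)
}

age <- c(24, 45, 43, 42, 36, 19)   # first six ages from head(A)

q4 <- quantize_uniform(age, 4)
q4$breaks   # 19.0 25.5 32.0 38.5 45.0
q4$level    # 1 4 4 4 3 1
q4$value    # bin midpoints standing in for each age

# The same helper covers the other requested resolutions
for (k in c(4, 8, 16)) {
  print(table(quantize_uniform(age, k)$level))
}
```

The midpoints in value could then replace the kmeans centers as the quantized response, with the rest of the modelling left unchanged.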

Tags: clustering, k-means, rstudio
