Uniformly quantize the dependent variable into 4, 8, and 16 levels / R Programming

$begingroup$


I have a question about quantizing my dependent variable into 4, 8, and 16 levels. I will attempt to use kmeans to do this. Here is where I am lost: after I break my dependent variable into these levels, I need to perform the following steps. Predict the numerical variable "wife_age" using the independent variables: numerical "num_child" and categorical "cmc". Compute the MSE (mean-square error), the confusion table, and the probability of misclassification. Here is my data set.



Here is what I have thus far:



# Install packages (note: tidyverse also installs dplyr;
# MASS is a recommended package bundled with standard R installations)
install.packages("tidyverse")
install.packages("factoextra")
install.packages("cluster")

# Load the libraries
library(tidyverse)
library(cluster)
library(factoextra)
library(MASS)  # provides polr(); loaded after tidyverse, so it masks dplyr::select()

# Load the dataset
A <- read.table("H5.txt", header = TRUE)
# List the variable names
names(A)
# View the first rows of the data
head(A)


output



> names(A)
[1] "wife_age" "wife_edu" "hus_edu" "num_child" "wife_rel" "wife_work" "hus_occu" "sli" "media_exp"
[10] "cmc"
> # view the data
> head(A)
  wife_age wife_edu hus_edu num_child wife_rel wife_work hus_occu sli media_exp cmc
1       24        2       3         3        1         1        2   3         0   1
2       45        1       3        10        1         1        3   4         0   1
3       43        2       3         7        1         1        3   4         0   1
4       42        3       2         9        1         1        3   3         0   1
5       36        3       3         8        1         1        3   2         0   1
6       19        4       4         0        1         1        3   3         0   1


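# Quantize wife_age into 4 clusters with k-means
# (the starting centers are random, so results can vary between runs)
L4 <- kmeans(A$wife_age, 4)
# Total within-cluster sum of squares
L4$tot.withinss
L4
# Replace each age by the center of its cluster
B <- L4$centers[L4$cluster]
head(B, 4)
hist(B)
mean(B)
sd(B)

# Treat the quantized ages as a factor (levels sort in increasing numeric order)
B <- factor(B)

A$cmc <- factor(A$cmc)
# Ordinal logistic regression of the quantized age on num_child and cmc
C <- polr(B ~ num_child + cmc, data = A, Hess = TRUE)
summary(C)
# Predicted level (a cluster-center label) for each row
predict(C)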


Output



> summary(C)
Call:
polr(formula = B ~ num_child + cmc, data = A, Hess = TRUE)

Coefficients:
            Value Std. Error t value
num_child  0.5272    0.02618  20.140
cmc2      -0.1603    0.12776  -1.254
cmc3      -1.0119    0.11673  -8.669

Intercepts:
                                  Value   Std. Error t value
23.5144628099174|30.8827930174564  0.3663  0.1040     3.5218
30.8827930174564|37.5348101265823  1.8538  0.1167    15.8816
37.5348101265823|45.2316176470588  3.2798  0.1377    23.8142

Residual Deviance: 3446.081
AIC: 3458.081
> predict(C)
[1] 30.8827930174564 45.2316176470588 45.2316176470588 45.2316176470588 45.2316176470588 23.5144628099174

[985] 23.5144628099174 30.8827930174564 23.5144628099174 23.5144628099174 30.8827930174564 30.8827930174564
[991] 23.5144628099174 23.5144628099174 23.5144628099174 45.2316176470588 23.5144628099174 23.5144628099174
[997] 30.8827930174564 37.5348101265823 45.2316176470588 45.2316176470588
[ reached getOption("max.print") -- omitted 473 entries ]
Levels: 23.5144628099174 30.8827930174564 37.5348101265823 45.2316176470588


What I want to do is a prediction. From predict(C), am I actually predicting the wife's age? Is there a better way to do this?

$endgroup$
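
The three quantities asked for above (confusion table, probability of misclassification, MSE) could be computed along the following lines. This is only a sketch under stated assumptions: the vectors `age`, `truth`, and `pred` are small toy stand-ins for `A$wife_age`, `B`, and `predict(C)`, and the level labels are rounded versions of the cluster centers printed in the question. As the `Levels:` line in the output suggests, `predict(C)` returns one of the quantized levels for each row rather than a continuous age, so the labels have to be converted back to numbers before any squared error is taken.

```r
# Toy stand-ins (assumptions, not the real H5.txt data):
# `age` plays the role of A$wife_age, `truth` the role of B (quantized age),
# and `pred` the role of predict(C).
age   <- c(24, 45, 43, 42, 36, 19)
lev   <- c("23.5", "30.9", "37.5", "45.2")  # rounded cluster centers
truth <- factor(c("23.5", "45.2", "45.2", "45.2", "37.5", "23.5"), levels = lev)
pred  <- factor(c("30.9", "45.2", "45.2", "45.2", "45.2", "23.5"), levels = lev)

# Confusion table: rows are the actual levels, columns the predicted levels
conf <- table(actual = truth, predicted = pred)
print(conf)

# Probability of misclassification: share of rows whose predicted level differs
p_mis <- mean(pred != truth)

# Map factor labels back to numbers before computing squared errors;
# as.numeric() on the factor alone would return the level index, not the center
pred_num  <- as.numeric(as.character(pred))
truth_num <- as.numeric(as.character(truth))

# MSE against the quantized ages, and against the original ages
mse_quantized <- mean((pred_num - truth_num)^2)
mse_age       <- mean((pred_num - age)^2)

cat("misclassification:", p_mis, "\n")
cat("MSE vs quantized age:", mse_quantized, "\n")
cat("MSE vs original age:", mse_age, "\n")
```

With the real data, the same lines should apply after replacing the toy vectors with `A$wife_age`, `B`, and `predict(C)`, and repeating the procedure for 8 and 16 levels.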











  • $begingroup$
    Define "uniform". I don't see how kmeans could be "uniform" in any standard meaning of the word.
    $endgroup$
    – Anony-Mousse
    Apr 1 at 18:19
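
To illustrate the point of this comment: "uniform" quantization usually means levels that are equally spaced over the range of the variable, whereas k-means places its centers wherever the data are dense. A minimal sketch of an equal-width quantizer in base R follows. The helper name `quantize_uniform` and the toy `age` vector are assumptions made for illustration; they do not come from the question.

```r
# Toy ages standing in for A$wife_age (assumption: not the real data)
age <- c(24, 45, 43, 42, 36, 19, 16, 49, 30, 27)

# Hypothetical helper: split the observed range into k bins of equal width
# and replace each value by the midpoint of its bin
quantize_uniform <- function(x, k) {
  breaks <- seq(min(x), max(x), length.out = k + 1)
  bin    <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  mids   <- (head(breaks, -1) + tail(breaks, -1)) / 2
  mids[bin]
}

# Quantize into 4, 8, and 16 levels and report the quantization error
for (k in c(4, 8, 16)) {
  q <- quantize_uniform(age, k)
  cat(k, "levels, quantization MSE:", mean((age - q)^2), "\n")
}
```

The quantized vector can then be turned into an ordered factor and used as the response in the same way as the k-means centers in the question.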











  • $begingroup$
    Broken into levels. If there is a better way, please share.
    $endgroup$
    – Chris Kehl
    Apr 1 at 18:52










  • $begingroup$
    @ChrisKehl please do not post pictures of the data analysis; they are difficult to reproduce. Also, do not bold the sentences; in netspeak it indicates shouting. I suggest revising the question to state clearly what you have tried and what the problem is, adding bullet points wherever applicable. See here on how to create a minimal reproducible example.
    $endgroup$
    – mnm
    Apr 2 at 7:46
















clustering k-means rstudio






edited Apr 2 at 23:10
Chris Kehl

asked Apr 1 at 13:58
Chris Kehl










