


Uniformly quantize the dependent variable into 4, 8, and 16 levels / R Programming















$begingroup$


I have a question about quantizing my dependent variable into 4, 8, and 16 levels. I am attempting to use kmeans to do this, and here is where I am lost. For each of the three quantizations (4, 8, and 16 levels), I need to predict the numerical variable "wife_age" using the independent variables numerical "num_child" and categorical "cmc", and then compute the mean-square error (MSE), the confusion table, and the probability of misclassification. Here is my data set.



Here is what I have thus far:



# install packages (note: tidyverse also installs dplyr; MASS ships with R)
install.packages("tidyverse")
install.packages("factoextra")
install.packages("cluster")

# load the libraries
library(tidyverse)
library(cluster)
library(factoextra)
library(MASS)  # provides polr(); loading it after dplyr masks dplyr::select()

# load the dataset
A <- read.table("H5.txt", header = TRUE)
# list variable names
names(A)
# view the data
head(A)


Output:

> names(A)
 [1] "wife_age"  "wife_edu"  "hus_edu"   "num_child" "wife_rel"  "wife_work" "hus_occu"  "sli"       "media_exp"
[10] "cmc"
> # view the data
> head(A)
  wife_age wife_edu hus_edu num_child wife_rel wife_work hus_occu sli media_exp cmc
1       24        2       3         3        1         1        2   3         0   1
2       45        1       3        10        1         1        3   4         0   1
3       43        2       3         7        1         1        3   4         0   1
4       42        3       2         9        1         1        3   3         0   1
5       36        3       3         8        1         1        3   2         0   1
6       19        4       4         0        1         1        3   3         0   1


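# quantize wife_age into 4 levels with k-means
# (kmeans() starts from random centers, so results can vary between runs)
L4 <- kmeans(A$wife_age, 4)
# total within-cluster sum of squares
L4$tot.withinss
L4
# replace each age with the center of its cluster
B <- L4$centers[L4$cluster]
head(B, 4)
hist(B)
mean(B)
sd(B)

# treat the quantized ages as categories; levels are sorted by center value
B <- factor(B)

A$cmc <- factor(A$cmc)
# ordinal logistic regression of the quantized age on num_child and cmc
C <- polr(B ~ num_child + cmc, data = A, Hess = TRUE)
summary(C)
# predicted level (class) for each row
predict(C)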


Output:

> summary(C)
Call:
polr(formula = B ~ num_child + cmc, data = A, Hess = TRUE)

Coefficients:
            Value Std. Error t value
num_child  0.5272    0.02618  20.140
cmc2      -0.1603    0.12776  -1.254
cmc3      -1.0119    0.11673  -8.669

Intercepts:
                                   Value   Std. Error t value
23.5144628099174|30.8827930174564  0.3663  0.1040      3.5218
30.8827930174564|37.5348101265823  1.8538  0.1167     15.8816
37.5348101265823|45.2316176470588  3.2798  0.1377     23.8142

Residual Deviance: 3446.081
AIC: 3458.081
> predict(C)
  [1] 30.8827930174564 45.2316176470588 45.2316176470588 45.2316176470588 45.2316176470588 23.5144628099174

[985] 23.5144628099174 30.8827930174564 23.5144628099174 23.5144628099174 30.8827930174564 30.8827930174564
[991] 23.5144628099174 23.5144628099174 23.5144628099174 45.2316176470588 23.5144628099174 23.5144628099174
[997] 30.8827930174564 37.5348101265823 45.2316176470588 45.2316176470588
 [ reached getOption("max.print") -- omitted 473 entries ]
Levels: 23.5144628099174 30.8827930174564 37.5348101265823 45.2316176470588


What I want to do is make a prediction. With predict(C), am I actually predicting the wife's age? Is there a better way to do this?
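For the three quantities asked about (confusion table, probability of misclassification, MSE), a minimal sketch follows. It assumes the observed and predicted levels are factors whose labels are numbers, as with the cluster centers above. The helper name level_metrics and the toy levels 20, 30, 40 are made up for illustration and are not part of the original code.

```r
# Hypothetical helper (not from the original post): compares observed and
# predicted quantization levels stored as factors with numeric labels.
level_metrics <- function(observed, predicted) {
  # Use the same level set for both so the confusion table is square
  lv <- levels(observed)
  predicted <- factor(as.character(predicted), levels = lv)

  # Rows are observed levels, columns are predicted levels
  confusion <- table(observed = observed, predicted = predicted)

  # Share of rows whose predicted level differs from the observed level
  misclass <- mean(as.character(observed) != as.character(predicted))

  # Convert level labels back to numbers; as.numeric() alone on a factor
  # would return the integer codes 1, 2, 3, ... instead of the labels
  obs_num <- as.numeric(as.character(observed))
  pred_num <- as.numeric(as.character(predicted))
  mse <- mean((obs_num - pred_num)^2)

  list(confusion = confusion, misclass = misclass, mse = mse)
}

# Toy example with made-up levels 20, 30, 40
observed  <- factor(c(20, 20, 30, 30, 40, 40))
predicted <- factor(c(20, 30, 30, 30, 40, 20), levels = levels(observed))
m <- level_metrics(observed, predicted)
m$confusion
m$misclass
m$mse
```

With the objects from the code above, this could presumably be called as level_metrics(B, predict(C)). Note that this MSE is measured against the quantized ages; to measure error against the raw ages, the numeric predictions would be compared with A$wife_age instead.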





















$endgroup$











  • Define "uniform". I don't see how kmeans could be "uniform" in any standard meaning of the word.
    – Anony-Mousse, Apr 1 at 18:19











  • Broken into levels. If there is a better way, please share.
    – Chris Kehl, Apr 1 at 18:52










  • @ChrisKehl Please do not post pictures of your data analysis; they are difficult to reproduce. Also, do not bold whole sentences: in netspeak it indicates shouting. I suggest revising the question to state clearly what you have tried and what the problem is, adding bullet points wherever applicable. See here on how to create a minimal reproducible example.
    – mnm, Apr 2 at 7:46
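On the question of what "uniform" means: one common reading is equal-width bins over the range of the variable, which differs from the data-dependent centers that kmeans returns. Below is a minimal sketch under that assumption, representing each bin by its midpoint. The function name uniform_quantize is made up for illustration, and the six ages are the wife_age values from the head(A) output in the question.

```r
# Hypothetical helper: equal-width ("uniform") quantization into k levels
uniform_quantize <- function(x, k) {
  # k + 1 equally spaced break points spanning the observed range
  breaks <- seq(min(x), max(x), length.out = k + 1)
  # Bin index 1..k for each value; include.lowest keeps min(x) in bin 1
  bin <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  # Represent each bin by its midpoint
  mids <- (breaks[-1] + breaks[-(k + 1)]) / 2
  mids[bin]
}

# wife_age values from the first six rows shown in the question
ages <- c(24, 45, 43, 42, 36, 19)
q4 <- uniform_quantize(ages, 4)
q4
```

The same call with k = 8 or k = 16 would give the other two quantizations. Whether equal-width bins or k-means centers are wanted depends on what the assignment means by "uniformly".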























clustering k-means rstudio














asked Apr 1 at 13:58 by Chris Kehl
edited Apr 2 at 23:10 by Chris Kehl










