Uniformly quantize the dependent variable into 4, 8, and 16 levels / R Programming
$begingroup$
I have a question about quantizing my dependent variable into 4, 8, and 16 levels. I am attempting to use kmeans to do this, and here is where I am lost. After quantizing the dependent variable at each of the three settings, I need to predict the numerical variable "wife_age" using the independent variables numerical "num_child" and categorical "cmc", and then compute the MSE (mean squared error), the confusion table, and the probability of misclassification. Here is my data set.
Here is what I have thus far:
# Install packages (the tidyverse includes dplyr)
install.packages("tidyverse")
install.packages("factoextra")
install.packages("cluster")
# Load the libraries (MASS provides polr)
library(tidyverse)
library(cluster)
library(factoextra)
library(MASS)
# Load the dataset
A <- read.table("H5.txt", header = TRUE)
# List the variable names
names(A)
# View the first rows
head(A)
output
> names(A)
[1] "wife_age" "wife_edu" "hus_edu" "num_child" "wife_rel" "wife_work" "hus_occu" "sli" "media_exp"
[10] "cmc"
> # view the data
> head(A)
  wife_age wife_edu hus_edu num_child wife_rel wife_work hus_occu sli media_exp cmc
1       24        2       3         3        1         1        2   3         0   1
2       45        1       3        10        1         1        3   4         0   1
3       43        2       3         7        1         1        3   4         0   1
4       42        3       2         9        1         1        3   3         0   1
5       36        3       3         8        1         1        3   2         0   1
6       19        4       4         0        1         1        3   3         0   1
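# Quantize wife_age into 4 clusters with k-means
# (kmeans() uses random starting centres; call set.seed() first for reproducible results)
L4 <- kmeans(A$wife_age, 4)
# Total within-cluster sum of squares
L4$tot.withinss
L4
# Replace each observation by the centre of its cluster
B <- L4$centers[L4$cluster]
head(B, 4)
hist(B)
mean(B)
sd(B)
# Treat the quantized ages as factor levels (sorted by centre value)
B <- factor(B)
A$cmc <- factor(A$cmc)
# Proportional-odds logistic regression of the quantized age on num_child and cmc
C <- polr(B ~ num_child + cmc, data = A, Hess = TRUE)
summary(C)
# Predicted level (class) for each observation
predict(C)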
Output
> summary(C)
Call:
polr(formula = B ~ num_child + cmc, data = A, Hess = TRUE)
Coefficients:
            Value Std. Error t value
num_child  0.5272    0.02618  20.140
cmc2      -0.1603    0.12776  -1.254
cmc3      -1.0119    0.11673  -8.669
Intercepts:
                                    Value Std. Error t value
23.5144628099174|30.8827930174564  0.3663     0.1040  3.5218
30.8827930174564|37.5348101265823  1.8538     0.1167 15.8816
37.5348101265823|45.2316176470588  3.2798     0.1377 23.8142
Residual Deviance: 3446.081
AIC: 3458.081
> predict(C)
[1] 30.8827930174564 45.2316176470588 45.2316176470588 45.2316176470588 45.2316176470588 23.5144628099174
[985] 23.5144628099174 30.8827930174564 23.5144628099174 23.5144628099174 30.8827930174564 30.8827930174564
[991] 23.5144628099174 23.5144628099174 23.5144628099174 45.2316176470588 23.5144628099174 23.5144628099174
[997] 30.8827930174564 37.5348101265823 45.2316176470588 45.2316176470588
[ reached getOption("max.print") -- omitted 473 entries ]
Levels: 23.5144628099174 30.8827930174564 37.5348101265823 45.2316176470588
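The code above covers only the 4-level case. As a rough sketch, the same k-means step could be wrapped in a function and repeated for 8 and 16 levels. The helper name `quantize_kmeans`, the `seed` and `nstart` values, and the simulated ages are assumptions made for illustration; they are not part of the original code or of H5.txt.

```r
# Hypothetical helper: quantize x into k levels with k-means and return
# an ordered factor whose labels are the cluster centres.
quantize_kmeans <- function(x, k, seed = 1) {
  set.seed(seed)                      # kmeans() starts from random centres
  fit <- kmeans(x, centers = k, nstart = 25)
  centre <- fit$centers[fit$cluster]  # centre assigned to each observation
  factor(centre, levels = sort(unique(centre)), ordered = TRUE)
}

# Simulated ages (made up, not taken from H5.txt)
set.seed(42)
age <- sample(16:49, 200, replace = TRUE)

# One quantized version of the variable per requested number of levels
levels_by_k <- lapply(c(4, 8, 16), function(k) quantize_kmeans(age, k))
sapply(levels_by_k, nlevels)
```

Returning an ordered factor makes the ordering of the levels explicit, which is what `polr` expects of its response.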
What I want is a prediction. Does predict(C) actually predict the wife's age? Is there a better way to do this?
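For context, `predict()` on a `polr` fit returns the predicted class by default, so the values shown above are predicted quantization levels (cluster centres) rather than continuous ages. A minimal sketch of how the MSE, the confusion table, and the misclassification probability might be computed from such factors follows. The helper name `evaluate_levels` and the toy vectors are assumptions made for illustration, and the MSE here is measured against the quantized ages, not the raw ones.

```r
# Hypothetical helper: compare predicted levels with observed levels.
# `truth` and `pred` are factors whose level labels are numeric centres,
# as produced by factor(B) and predict(C) in the question.
evaluate_levels <- function(truth, pred) {
  pred <- factor(pred, levels = levels(truth))
  # Convert labels back to numbers; as.numeric() alone would return the
  # integer codes 1, 2, 3, ... instead of the centre values.
  truth_num <- as.numeric(as.character(truth))
  pred_num  <- as.numeric(as.character(pred))
  confusion <- table(observed = truth, predicted = pred)
  list(
    mse       = mean((truth_num - pred_num)^2),
    confusion = confusion,
    misclass  = 1 - sum(diag(confusion)) / sum(confusion)
  )
}

# Toy data (made up, not taken from H5.txt)
truth <- factor(c(20, 20, 30, 30, 40, 40), levels = c(20, 30, 40))
pred  <- factor(c(20, 30, 30, 30, 40, 20), levels = c(20, 30, 40))
res <- evaluate_levels(truth, pred)
res$mse        # mean squared error between level values
res$misclass   # share of observations assigned to the wrong level
res$confusion  # observed vs. predicted counts
```

With the objects from the question, the call would presumably be `evaluate_levels(B, predict(C))`.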
clustering k-means rstudio
$endgroup$
asked Apr 1 at 13:58 by Chris Kehl; edited Apr 2 at 23:10 by Chris Kehl
Define "uniform". I don't see how kmeans could be "uniform" in any standard meaning of the word.
– Anony-Mousse, Apr 1 at 18:19
Broken into levels. If there is a better way, please share.
– Chris Kehl, Apr 1 at 18:52
@ChrisKehl Please do not post pictures of your analysis; they are difficult to reproduce. Also, do not bold your sentences; in netspeak it indicates shouting. I suggest revising the question to state clearly what you have tried and what the problem is, adding bullet points wherever applicable. See here on how to create a minimal reproducible example.
– mnm, Apr 2 at 7:46