Categorizing a variable turns it from insignificant to significant
I have a numeric variable which turns out not to be significant in a multivariate logistic regression model.
However, when I categorize it into groups, it suddenly becomes significant.
This is very counter-intuitive to me: when categorizing a variable, we give up some information.
How can this be?
regression logistic statistical-significance multivariate-analysis
asked Mar 19 at 5:58 by Omry Atia
edited Mar 19 at 9:53 by kjetil b halvorsen
2 Answers
One possible explanation would be nonlinearities in the relationship between your outcome and the predictor.
Here is a little example. We use a predictor that is uniform on $[-1,1]$. The outcome, however, does not linearly depend on the predictor, but on the square of the predictor: TRUE is more likely for both $x\approx -1$ and $x\approx 1$, but less likely for $x\approx 0$. In this case, a linear model will come up insignificant, but cutting the predictor into intervals makes it significant.
> set.seed(1)
> nn <- 1e3
> xx <- runif(nn,-1,1)
> yy <- runif(nn)<1/(1+exp(-xx^2))
>
> library(lmtest)
>
> model_0 <- glm(yy~1,family="binomial")
> model_1 <- glm(yy~xx,family="binomial")
> lrtest(model_1,model_0)
Likelihood ratio test
Model 1: yy ~ xx
Model 2: yy ~ 1
#Df LogLik Df Chisq Pr(>Chisq)
1 2 -676.72
2 1 -677.22 -1 0.9914 0.3194
>
> xx_cut <- cut(xx,c(-1,-0.3,0.3,1))
> model_2 <- glm(yy~xx_cut,family="binomial")
> lrtest(model_2,model_0)
Likelihood ratio test
Model 1: yy ~ xx_cut
Model 2: yy ~ 1
#Df LogLik Df Chisq Pr(>Chisq)
1 3 -673.65
2 1 -677.22 -2 7.1362 0.02821 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
However, this does not mean that discretizing the predictor is the best approach. (It almost never is.) It is much better to model the nonlinearity using splines or similar.
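As a minimal sketch of the spline suggestion, the simulated data from above can be refit with a smooth term instead of bins. The choice of `ns()` from the `splines` package with `df = 3`, the quadratic comparison model, and the use of `anova(..., test = "Chisq")` in place of `lrtest()` are assumptions made for illustration only.

```r
library(splines)  # ships with R; provides ns()

# Same simulation as in the example above
set.seed(1)
nn <- 1e3
xx <- runif(nn, -1, 1)
yy <- runif(nn) < 1/(1 + exp(-xx^2))

model_0      <- glm(yy ~ 1, family = "binomial")
# Natural cubic spline basis; df = 3 is an arbitrary illustrative choice
model_spline <- glm(yy ~ ns(xx, df = 3), family = "binomial")
# Quadratic term, matching how the data were actually generated
model_quad   <- glm(yy ~ poly(xx, 2), family = "binomial")

# Likelihood ratio tests against the intercept-only model
lrt_spline <- anova(model_0, model_spline, test = "Chisq")
lrt_quad   <- anova(model_0, model_quad, test = "Chisq")
print(lrt_spline)
print(lrt_quad)
```

The smooth fits avoid having to pick bin boundaries, although the spline still requires choosing its degrees of freedom.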
answered Mar 19 at 6:22 by Stephan Kolassa
Are there some examples where discretizing might be sensible? For example, if you have a specific threshold (e.g. age 18) at which a binary switch in outcomes occurs. Numeric age in the 18+ range might not be significant, but binary age >18 might be significant?
– ajrwhite
Mar 19 at 18:40
@ajrwhite: it depends on the field. Anywhere that thresholds are codified in law, discretization might make sense. E.g., if you model voting behavior, it makes sense to check whether someone is actually eligible to vote at age 18. Similarly, in Germany, your vehicle tax depends on your engine displacement and jumps at 1700, 1800, 1900, ... ccm, so pretty much all cars have displacements of 1699, 1799, ... ccm (kind of self-discretizing). In the natural sciences like biology, medicine, psychology etc., I struggle to find an example where discretization makes sense.
– Stephan Kolassa
Mar 20 at 6:03
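A rough sketch of the threshold case discussed in these comments, using entirely made-up data: the outcome probability jumps at age 18 and is flat on either side. The variable names, the age range, and the probabilities 0.2 and 0.6 are hypothetical, and the indicator uses `>= 18` on the assumption that eligibility starts at 18.

```r
# Hypothetical simulation: the outcome probability jumps at age 18
set.seed(42)
n        <- 2000
age      <- runif(n, 10, 30)
eligible <- age >= 18                 # threshold indicator
p        <- ifelse(eligible, 0.6, 0.2)  # made-up probabilities
y        <- runif(n) < p

model_age  <- glm(y ~ age, family = "binomial")       # numeric age
model_step <- glm(y ~ eligible, family = "binomial")  # indicator only

# Compare the two codings; a lower AIC indicates the better trade-off
aic_table <- AIC(model_age, model_step)
print(aic_table)
```

Here the indicator matches the data-generating mechanism by construction, which is the situation where a cut at a known, externally fixed threshold can be defended.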
One possible way this can happen is if the relationship is distinctly nonlinear. It's not possible to tell (given the lack of detail) whether this really explains what's going on.
You can check for yourself. First, you could do an added variable plot for the variable in its original numeric form, and you could also plot the fitted effects in the factor version of the model. If the explanation is right, both should show a distinctly nonlinear pattern.
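The second check, plotting the fitted effects of the factor version, could be sketched as follows. This reuses the simulated data from the other answer as a stand-in for the real data; the bin midpoints and the overlay of the linear fit are illustrative assumptions, and the sketch does not produce an added variable plot.

```r
# Stand-in data: same simulation as in the other answer
set.seed(1)
nn <- 1e3
xx <- runif(nn, -1, 1)
yy <- runif(nn) < 1/(1 + exp(-xx^2))

xx_cut    <- cut(xx, c(-1, -0.3, 0.3, 1))
model_lin <- glm(yy ~ xx, family = "binomial")
model_cut <- glm(yy ~ xx_cut, family = "binomial")

# Fitted log-odds for each bin from the factor model
bins      <- data.frame(xx_cut = factor(levels(xx_cut), levels = levels(xx_cut)))
bin_logit <- predict(model_cut, newdata = bins, type = "link")
bin_mid   <- c(-0.65, 0, 0.65)  # midpoints of the three bins

# Fitted log-odds from the linear model at the same points
lin_logit <- predict(model_lin, newdata = data.frame(xx = bin_mid), type = "link")

effects <- data.frame(bin = levels(xx_cut), midpoint = bin_mid,
                      factor_fit = bin_logit, linear_fit = lin_logit)
print(effects)

# Per-bin effects (solid) against the straight-line fit (dashed)
plot(bin_mid, bin_logit, type = "b",
     ylim = range(c(bin_logit, lin_logit)),
     xlab = "xx (bin midpoint)", ylab = "fitted log-odds")
abline(coef = coef(model_lin), lty = 2)
```

If the per-bin log-odds do not line up along the dashed line, that is consistent with the nonlinearity explanation.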
answered Mar 19 at 6:23 by Glen_b
edited Mar 19 at 14:58