In which cases shouldn't we drop the first level of categorical variables?
I'm a beginner in machine learning and I'm looking into one-hot encoding.
Unlike in statistics, where you always want to drop the first level to get k-1 dummies (as discussed here on SE), it seems that some models need to keep it and use k dummies.
I know that having k levels could lead to collinearity problems, but I'm not aware of any problem caused by having k-1 levels.
But since pandas.get_dummies() has its drop_first argument set to False by default, keeping all k levels must be useful sometimes.
In which cases (algorithms, parameters...) would I want to keep the 1st level and fit with k levels for each categorical variable?
EDIT: @EliasStrehle's comment on the above-mentioned link states that this is only true if the model has an intercept. Is this rule generalizable? What about algorithms like KNN or trees, which are not exactly models in the statistical sense?
machine-learning algorithms encoding dummy-variables
asked 2 days ago by Dan Chaltiel (edited yesterday)
Do you have any specific algorithms you're interested in? I can see this question being answered differently depending on the algorithm (e.g. regression vs. decision tree).
– Alex L, yesterday
Actually, this is my point. How can I know whether a given algorithm needs the first level of its categorical variables dropped?
– Dan Chaltiel, yesterday
1 Answer
First, if your data has missing values, get_dummies by default encodes each of them as all zeros across that variable's dummies, so perfect multicollinearity doesn't actually hold. Also, from a data manipulation standpoint (without regard for modeling), it makes some sense to keep the symmetry of having a dummy for every value of the categorical variable.
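To make the missing-value point concrete, here is a minimal pure-Python sketch. The `one_hot` helper and the toy levels are hypothetical; the helper only mimics the behaviour described above (a missing value becomes a row of zeros) and is not the pandas implementation.

```python
def one_hot(values, levels, drop_first=False):
    """Encode each value as a list of 0/1 dummies, one per kept level.

    A missing value (None) matches no level, so its row is all zeros.
    """
    kept = levels[1:] if drop_first else levels
    return [[int(v == level) for level in kept] for v in values]


levels = ["a", "b", "c"]
values = ["a", "b", None, "c"]

# With all k dummies, the rows do not all sum to 1, so the dummy columns
# no longer add up to a constant (intercept-like) column.
full = one_hot(values, levels)
row_sums = [sum(row) for row in full]
print(row_sums)  # [1, 1, 0, 1]

# With k-1 dummies, the dropped level "a" and the missing value look identical.
reduced = one_hot(values, levels, drop_first=True)
print(reduced[0] == reduced[2])  # True
```

Under this encoding, dropping the first level also merges "missing" with the reference level, which may or may not be what you want.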
In a decision tree (and various ensembles thereof), keeping all the dummies is beneficial: if you remove the first dummy, the model can only select that level by selecting "not this dummy" for every other dummy, which takes several steps in the tree and is rather unlikely.
Then again, it's probably better not to one-hot encode at all for decision trees, but for now some packages don't handle categorical variables natively.
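The tree argument can be checked by brute force. The sketch below counts how many single-column tests (each one standing in for one split along a tree path) are needed to separate a level from all the others. The helper names `encode` and `min_tests_to_isolate` and the colour levels are made up for illustration.

```python
from itertools import combinations


def encode(level, columns):
    """One-hot row for `level`, with one 0/1 entry per dummy column."""
    return tuple(int(level == c) for c in columns)


def min_tests_to_isolate(target, levels, columns):
    """Smallest number of columns that must be tested to separate
    `target` from every other level."""
    target_row = encode(target, columns)
    others = [encode(l, columns) for l in levels if l != target]
    for size in range(1, len(columns) + 1):
        for idx in combinations(range(len(columns)), size):
            # Every other level must differ from the target on a tested column.
            if all(any(o[i] != target_row[i] for i in idx) for o in others):
                return size
    return None


levels = ["red", "green", "blue", "yellow"]

print(min_tests_to_isolate("red", levels, levels))        # 1
print(min_tests_to_isolate("red", levels, levels[1:]))    # 3
print(min_tests_to_isolate("green", levels, levels[1:]))  # 1
```

With all k dummies every level is one split away; with the first dummy removed, the dropped level needs k-1 consecutive splits while the others still need one.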
K-nearest neighbors seems like it would also benefit from keeping all levels. With the first dummy removed, the taxicab distance (restricted to the dummies of one feature) between two points with different values is 1 if one of the values is the removed level and 2 otherwise, so the removed level sits artificially close to every other level.
But again, it seems like KNN would be better off without one-hot encoding, using instead some more informed measure of distance between the category's values, if you can come up with one.
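The distance asymmetry is easy to verify by hand. A small sketch, with hypothetical helper names and a three-level toy variable:

```python
from itertools import combinations


def encode(level, columns):
    """One-hot row for `level`, with one 0/1 entry per dummy column."""
    return [int(level == c) for c in columns]


def taxicab(u, v):
    """Manhattan (L1) distance between two equally long vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


levels = ["a", "b", "c"]


def pairwise(columns):
    """Taxicab distance between every pair of distinct levels."""
    return {
        (p, q): taxicab(encode(p, columns), encode(q, columns))
        for p, q in combinations(levels, 2)
    }


all_levels = pairwise(levels)          # k dummies
dropped_first = pairwise(levels[1:])   # k-1 dummies, "a" dropped

print(all_levels)     # {('a', 'b'): 2, ('a', 'c'): 2, ('b', 'c'): 2}
print(dropped_first)  # {('a', 'b'): 1, ('a', 'c'): 1, ('b', 'c'): 2}
```

With all dummies kept, every pair of distinct levels is equally far apart; after dropping "a", it becomes the nearest neighbour of every other level.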
See also https://stats.stackexchange.com/questions/231285/dropping-one-of-the-columns-when-using-one-hot-encoding
(In particular, when using regularization in a linear model, it may be worth keeping all dummies.)
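One way to see why regularization changes the picture, sketched for ridge regression with an unpenalized intercept and no missing values (this setup and the symbols $\beta_0$, $\beta_j$, $\lambda$, $c$ are assumptions for illustration, not taken from the linked discussion). With all k dummies $d_1, \dots, d_k$:

$$\hat y = \beta_0 + \sum_{j=1}^{k} \beta_j d_j, \qquad \sum_{j=1}^{k} d_j = 1 .$$

Replacing $(\beta_0, \beta_j)$ with $(\beta_0 + c,\ \beta_j - c)$ leaves every prediction unchanged, which is exactly the collinearity problem in the unpenalized case. The ridge penalty breaks the tie:

$$\frac{\partial}{\partial c}\, \lambda \sum_{j=1}^{k} (\beta_j - c)^2 = 0 \quad\Longrightarrow\quad c = \frac{1}{k} \sum_{j=1}^{k} \beta_j ,$$

so the penalized solution is unique and satisfies $\sum_j \beta_j = 0$: every level is shrunk toward a common baseline. With one dummy dropped, the remaining coefficients are instead shrunk toward the dropped level, so the fitted model depends on which level happened to be first.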
answered yesterday by Ben Reiniger (edited yesterday)
Very interesting, but you only addressed my examples. If there is no general rule on this matter, what concepts should I learn to be able to tell?
– Dan Chaltiel, yesterday
Additionally, I'm using Python's scikit, which apparently needs one-hot encoding beforehand.
– Dan Chaltiel, yesterday
I'm not sure of a general rule. I suspect keeping all dummies is generally better except when the model assumes that there is no multicollinearity. As another example, neural networks are linear before activations, so they can use the multicollinearity to recover the removed dummy internally; but I don't think leaving the dummy there will hurt the model.
– Ben Reiniger, yesterday
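The "recover the removed dummy" remark amounts to an affine map with bias 1 and weight -1 on each remaining dummy, which a single linear unit can represent. A minimal check, assuming no missing values and using made-up helper names:

```python
levels = ["a", "b", "c"]


def encode(level, columns):
    """One-hot row for `level`, with one 0/1 entry per dummy column."""
    return [int(level == c) for c in columns]


def recover_dropped(reduced_row):
    """Affine map: bias 1, weight -1 on every remaining dummy."""
    return 1 - sum(reduced_row)


recovered = []
for level in levels:
    full = encode(level, levels)         # k dummies
    reduced = encode(level, levels[1:])  # first dummy removed
    recovered.append(recover_dropped(reduced))
    # The recovered value matches the dummy that was removed.
    assert recovered[-1] == full[0]

print(recovered)  # [1, 0, 0]
```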
+1 Thanks, your answer was definitely helpful.
– Dan Chaltiel, yesterday