In which cases shouldn't we drop the first level of categorical variables?

As a beginner in machine learning, I'm looking into the concept of one-hot encoding.

Unlike in statistics, where you always want to drop the first level to get k-1 dummies (as discussed here on SE), it seems that some models need to keep it and have k dummies.

I know that having k levels could lead to collinearity problems, but I'm not aware of any problem caused by having k-1 levels.

But since pandas.get_dummies() has its drop_first argument set to False by default, keeping all k levels must be useful sometimes.

In which cases (algorithms, parameters...) would I want to keep the first level and fit with k levels for each categorical variable?

EDIT: @EliasStrehle's comment on the above-mentioned link states that this is only true if the model has an intercept. Is this rule generalizable? What about algorithms like KNN or trees, which are not exactly models in the statistical sense?
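A minimal NumPy sketch of the rule quoted in the EDIT, using a made-up three-level feature (the data and variable names are illustrative assumptions). The intercept column equals the sum of all k dummies, so an intercept plus k dummies is rank-deficient, while k dummies alone, or an intercept with k-1 dummies, have full column rank.

```python
import numpy as np

# Made-up three-level feature (illustrative data only)
levels = np.array(["a", "b", "c", "a", "b", "c"])
dummies = np.column_stack([(levels == lv).astype(float) for lv in ["a", "b", "c"]])
intercept = np.ones((len(levels), 1))

# k dummies, no intercept: 3 columns, rank 3 (full column rank)
print(np.linalg.matrix_rank(dummies))
# intercept + k dummies: 4 columns but rank 3, since intercept = a + b + c
print(np.linalg.matrix_rank(np.hstack([intercept, dummies])))
# intercept + k-1 dummies (first level dropped): 3 columns, rank 3
print(np.linalg.matrix_rank(np.hstack([intercept, dummies[:, 1:]])))
```

This only shows exact collinearity in a linear design matrix; it does not by itself settle the question for KNN or trees.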

edited yesterday

asked 2 days ago

Dan Chaltiel

1335

New contributor

Do you have any specific algorithms you're interested in? I can see this question being answered differently depending on the algorithm (e.g. regression vs. decision tree).
– Alex L, yesterday

Actually, this is my point. How could I know whether a given algorithm needs the first level of its categorical variables dropped or not?
– Dan Chaltiel, yesterday
machine-learning algorithms encoding dummy-variables
1 Answer

First, if your data has missing values, get_dummies by default produces all zeros for those rows (its dummy_na argument defaults to False), so perfect multicollinearity doesn't actually hold. Also, from a data-manipulation standpoint (without regard for modeling), it makes some sense to keep the symmetry of having a dummy for every value of the categorical variable.
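A small pandas sketch of the missing-value point, on a made-up Series. With the default dummy_na=False, the NaN row gets no 1 anywhere, so the dummies no longer sum to 1 on every row.

```python
import numpy as np
import pandas as pd

# Made-up example with one missing value
s = pd.Series(["red", "green", np.nan, "blue"])

# Default: one column per observed level; NaN gets no column of its own
full = pd.get_dummies(s)
print(list(full.columns))          # ['blue', 'green', 'red']
print(full.sum(axis=1).tolist())   # [1, 1, 0, 1]: the NaN row is all zeros

# drop_first=True removes the first level ('blue')
dropped = pd.get_dummies(s, drop_first=True)
# The NaN row and the 'blue' row are now identical (all zeros)
print(dropped.iloc[2].tolist() == dropped.iloc[3].tolist())  # True
```

Note the side effect in the second half: with drop_first=True, a missing value and the reference level become indistinguishable, unless you also pass dummy_na=True.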

In a decision tree (and the various ensembles built from trees), keeping all the dummies is beneficial: if you remove the first dummy, the model can only isolate that level by splitting on "not this other dummy" for each of the remaining dummies, which takes several steps in the tree and is rather unlikely to happen.
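A hand-rolled sketch of this effect, assuming a made-up target that depends only on level "a". It uses a depth-1 tree (a single split on one 0/1 column, majority vote on each side), not scikit-learn; the helper names are hypothetical.

```python
from collections import Counter

def encode(values, categories):
    # One 0/1 column per entry in `categories` (hypothetical helper)
    return [[1 if v == c else 0 for c in categories] for v in values]

def best_stump_accuracy(X, y):
    # Best training accuracy of one split on a single dummy column,
    # predicting the majority label on each side of the split.
    best = 0.0
    for j in range(len(X[0])):
        correct = 0
        for side in (0, 1):
            labels = [yi for xi, yi in zip(X, y) if xi[j] == side]
            if labels:
                correct += Counter(labels).most_common(1)[0][1]
        best = max(best, correct / len(y))
    return best

values = ["a"] * 10 + ["b"] * 10 + ["c"] * 10
y = [1 if v == "a" else 0 for v in values]  # only level "a" matters

X_full = encode(values, ["a", "b", "c"])  # all k dummies
X_drop = encode(values, ["b", "c"])       # first level "a" dropped

print(best_stump_accuracy(X_full, y))  # 1.0: one split on the "a" column
print(best_stump_accuracy(X_drop, y))  # 20/30: no single split isolates "a"
```

With "a" dropped, a deeper tree can still recover it ("not b" and then "not c"), but that takes two splits that each look only partly useful on their own, which is the point made above.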

Then again, it's probably better not to one-hot encode at all for decision trees, but for now some packages don't handle categorical variables natively.

K-nearest neighbors seems like it would also benefit from keeping all levels. If the first level is dropped, the taxicab distance between two points with different values, restricted to the dummies of one feature, is 1 if one of the two values is the dropped level and 2 otherwise; with all k dummies it is always 2.
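A quick check of those distances for a made-up three-level feature (levels a, b, c, with "a" as the dropped first level):

```python
def taxicab(u, v):
    # Manhattan (L1) distance between two equal-length vectors
    return sum(abs(a - b) for a, b in zip(u, v))

full = {"a": [1, 0, 0], "b": [0, 1, 0], "c": [0, 0, 1]}  # all k dummies
dropped = {"a": [0, 0], "b": [1, 0], "c": [0, 1]}        # "a" dropped

for p, q in [("a", "b"), ("a", "c"), ("b", "c")]:
    print(p, q, taxicab(full[p], full[q]), taxicab(dropped[p], dropped[q]))
# a b 2 1
# a c 2 1
# b c 2 2
```

With drop_first, the reference level ends up artificially closer to every other level, purely because of which level happened to be first.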

Again, though, KNN would probably be better off without one-hot encoding, using instead some more informed measure of distance between the category's values, if you can come up with one.

See also https://stats.stackexchange.com/questions/231285/dropping-one-of-the-columns-when-using-one-hot-encoding

(In particular, when using regularization in a linear model, it may be worth keeping all dummies.)
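One way to see this is with a hand-rolled closed-form ridge regression (unpenalized intercept) on made-up data. This is an illustrative sketch, not scikit-learn's Ridge. With all k dummies, X.T @ X is singular, but the penalty makes the system solvable. With k-1 dummies, the fitted values depend on which level was chosen as the reference.

```python
import numpy as np

def ridge_predict(X, y, lam):
    # Closed-form ridge regression; column 0 is the intercept and is not penalized
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    beta = np.linalg.solve(X.T @ X + penalty, X.T @ y)
    return X @ beta

# Made-up data: three levels, five rows each, group means 1, 2 and 6
levels = np.repeat(["a", "b", "c"], 5)
y = np.repeat([1.0, 2.0, 6.0], 5)
dummies = np.column_stack([(levels == lv).astype(float) for lv in ["a", "b", "c"]])
ones = np.ones((len(y), 1))

# All k dummies plus intercept: rank-deficient design, still solvable with the penalty
pred_full = ridge_predict(np.hstack([ones, dummies]), y, lam=1.0)
# k-1 dummies, using either "a" or "c" as the dropped reference level
pred_drop_a = ridge_predict(np.hstack([ones, dummies[:, 1:]]), y, lam=1.0)
pred_drop_c = ridge_predict(np.hstack([ones, dummies[:, :2]]), y, lam=1.0)

for name, pred in [("all k", pred_full), ("drop a", pred_drop_a), ("drop c", pred_drop_c)]:
    # One fitted value per level (rows 0, 5, 10 are a, b, c)
    print(name, np.round(pred[[0, 5, 10]], 3))
```

In this setup the k-dummy fit shrinks every level symmetrically toward the grand mean (3 here). The two k-1 fits shrink toward different reference levels and give different predictions for the same data.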

edited yesterday

answered yesterday

Ben Reiniger

30319

Very interesting, but you only addressed my examples. If there is no general rule on this matter, what concepts should I learn to be able to tell?
– Dan Chaltiel, yesterday

Additionally, I'm using Python's scikit-learn, which apparently needs categorical variables to be one-hot encoded beforehand.
– Dan Chaltiel, yesterday

I'm not sure of a general rule. I suspect keeping all dummies is generally better, except when the model assumes there is no multicollinearity. As another example, neural networks are linear before activations, so they can use the multicollinearity to recover the removed dummy internally; but I don't think leaving the dummy in will hurt the model.
– Ben Reiniger, yesterday
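A tiny NumPy illustration of the neural-network remark: a single linear unit, applied before any activation, can rebuild the dropped dummy as 1 minus the sum of the others. The weights here are set by hand as an assumption; a trained network would have to learn something equivalent.

```python
import numpy as np

# drop_first encoding of a three-level feature; "a" is the dropped level
X_drop = np.array([[0, 0],   # a
                   [1, 0],   # b
                   [0, 1]])  # c

# One linear unit before activation: w . x + bias, with hand-picked weights
w = np.array([-1.0, -1.0])
bias = 1.0
recovered_a = X_drop @ w + bias
print(recovered_a)  # [1. 0. 0.]: the indicator for the dropped level "a"
```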

+1 Thanks, your answer was definitely helpful.
– Dan Chaltiel, yesterday
