Target Encoding: missing value imputation before or after encoding
I want to target-encode my categorical features, but I am not sure when to impute missing values if any of those features have them.
Let's say I have a few continuous features, Cnt1-Cnt5 (without NAs), and two categorical features, Cat1 and Cat2, where Cat2 has missing values. Let's also assume that I want to use a Random Forest (RF) as the imputation method. Which approach would be the correct one?
1. Impute Cat2 with an RF that uses Cat1 and Cnt1-Cnt5 as predictors, and then target-encode the categorical variables.
2. Target-encode Cat1 and the non-missing values of Cat2, then build an RF and impute the missing values of Cat2 (which is now numeric rather than categorical).
3. Any other approach?
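To make the order of operations in the first option concrete, here is a minimal sketch of what I have in mind. It is only an illustration under several assumptions: the rows, the binary target `y`, and both helper functions are hypothetical, a per-Cat1 most-frequent-value imputer stands in for the Random Forest, and the encoding uses plain in-sample target means with no smoothing or cross-validation.

```python
from collections import Counter, defaultdict


def impute_by_group_mode(rows, group_key, target_key):
    """Fill missing rows[target_key] with the most frequent value seen
    among rows sharing the same group_key (a simple stand-in for the RF)."""
    per_group = defaultdict(Counter)
    overall = Counter()
    for r in rows:
        if r[target_key] is not None:
            per_group[r[group_key]][r[target_key]] += 1
            overall[r[target_key]] += 1
    fallback = overall.most_common(1)[0][0]  # used if a group has no observed value
    imputed = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left untouched
        if r[target_key] is None:
            group = per_group.get(r[group_key])
            r[target_key] = group.most_common(1)[0][0] if group else fallback
        imputed.append(r)
    return imputed


def target_encode(rows, col, y_key="y"):
    """Replace each level of `col` with the mean of the target for that level."""
    sums, counts = defaultdict(float), Counter()
    for r in rows:
        sums[r[col]] += r[y_key]
        counts[r[col]] += 1
    means = {level: sums[level] / counts[level] for level in counts}
    return [means[r[col]] for r in rows]


# Hypothetical toy data: Cat2 is missing in two rows.
rows = [
    {"Cat1": "a", "Cat2": "x", "y": 1},
    {"Cat1": "a", "Cat2": "x", "y": 0},
    {"Cat1": "a", "Cat2": None, "y": 1},
    {"Cat1": "b", "Cat2": "z", "y": 0},
    {"Cat1": "b", "Cat2": None, "y": 0},
]

imputed = impute_by_group_mode(rows, "Cat1", "Cat2")  # step 1: impute
cat2_encoded = target_encode(imputed, "Cat2")         # step 2: target-encode
```

Note that in this ordering the imputed rows also contribute to the per-level target means, which is one of the things I am unsure about.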
More generally: should missing values in any kind of variable (including continuous ones) be imputed before or after target encoding?
I see at least one benefit of imputing after target encoding: if the test data contain levels of a categorical variable that were not seen in training (these become NAs in the test set once target encoding is applied), they can be imputed by the RF built on the training data, without any error caused by the new levels.
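The unseen-level case can be sketched as follows. This is a hedged illustration only: the function names, the toy levels, and the binary target are hypothetical, and the encoding is a plain per-level mean fitted on the training data.

```python
from collections import defaultdict


def fit_target_encoding(levels, targets):
    """Learn level -> mean target from training data, skipping missing levels."""
    sums, counts = defaultdict(float), defaultdict(int)
    for level, t in zip(levels, targets):
        if level is None:
            continue
        sums[level] += t
        counts[level] += 1
    return {level: sums[level] / counts[level] for level in counts}


def apply_target_encoding(levels, mapping):
    """Encode levels; missing and unseen levels both come out as None."""
    return [mapping.get(level) for level in levels]


# Hypothetical training data for Cat2, with one missing value.
train_cat2 = ["x", "x", "z", None]
train_y = [1, 0, 0, 1]
mapping = fit_target_encoding(train_cat2, train_y)

# Test data with an unseen level "w" and a missing value.
test_cat2 = ["x", "w", None]
test_encoded = apply_target_encoding(test_cat2, mapping)
```

After encoding, the unseen level and the original missing value are indistinguishable: both are just missing numeric entries, so a single imputer fitted on the training data could fill them in the same way.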
feature-engineering encoding data-imputation
asked yesterday by MarkSt (new contributor), edited yesterday