EDA for analysis of nominal variable with high cardinality
I have a nominal variable (car model) with very high cardinality (~8500 labels), and I would like to analyse its relationship with a binary target variable. I can create logical groups and compare the distribution of the target variable across those groups, but can anyone suggest better techniques or visualization tools for this type of analysis?
categorical-data data-analysis
asked Mar 1 at 6:06
Rohit Gavval
1 Answer
You can calculate the mean of the target for each level of the categorical variable and compare the values.
In pandas this can be done easily: df.groupby('categorical_feature').target.mean()
You can then plot a histogram of these per-category means to compare them. seaborn also has catplot, which (with kind='bar') does the same as above in bar plot form, showing the mean of the target variable for each category.
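As a rough sketch of the groupby approach above, the snippet below computes the per-category mean of a 0/1 target together with the number of rows behind each mean. The helper name `target_rate_by_category`, the `min_count` filter, the column names, and the toy data are all assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd


def target_rate_by_category(df, cat_col, target_col, min_count=1):
    """Return the per-category mean of a 0/1 target and the row count.

    `min_count` is a hypothetical knob: categories with fewer rows are
    dropped, since a mean over one or two rows says very little.
    """
    summary = (
        df.groupby(cat_col)[target_col]
        .agg(rate="mean", n="count")  # named aggregation on the target column
        .sort_values("rate", ascending=False)
    )
    return summary[summary["n"] >= min_count]


# Toy data: column names and values are made up for illustration.
df = pd.DataFrame(
    {
        "car_model": ["A", "A", "A", "B", "B", "C"],
        "target": [1, 0, 0, 1, 1, 0],
    }
)

summary = target_rate_by_category(df, "car_model", "target")
frequent = target_rate_by_category(df, "car_model", "target", min_count=2)
print(summary)
```

With thousands of levels, the `n` column is probably worth reading alongside `rate`, because many levels are likely to have only a handful of rows. If matplotlib is installed, `summary["rate"].plot.hist(bins=20)` should give the histogram of per-category means mentioned above.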
answered Mar 1 at 13:09
Victor Oliveira
My target variable is dichotomous, so taking the mean is not an option. Maybe I can take a count, but the real problem is that I have around 8000 levels in one categorical attribute. How can I study that?
– Rohit Gavval
Mar 7 at 9:43
@RohitGavval, if you have a binary variable, you can still calculate the mean. It will be something like 0.333 or 0.67; that is the point. Look at my answer to this question, where I put links with more explanation of the methods mentioned: datascience.stackexchange.com/questions/46780/…
– Victor Oliveira
Mar 7 at 11:23
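To make the point in the last comment concrete: when the target is coded as 0/1, its mean is just the fraction of rows equal to 1, so it stays meaningful for a dichotomous variable. The helper name `positive_rate` and the toy lists below are hypothetical and chosen only to reproduce values like those mentioned in the comment.

```python
def positive_rate(values):
    """Mean of a 0/1 sequence, i.e. the fraction of entries equal to 1."""
    return sum(values) / len(values)


# One positive out of three rows, then two positives out of three rows.
one_in_three = positive_rate([1, 0, 0])
two_in_three = positive_rate([1, 1, 0])

print(round(one_in_three, 3), round(two_in_three, 2))  # prints: 0.333 0.67
```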