EDA for analysis of nominal variable with high cardinality

I have a nominal variable (car model) with very high cardinality (~8500 labels), and I would like to analyse its relation to a binary target variable. I can create logical groups and compare the distribution of the target variable across those groups, but are there better techniques or visualization tools for this type of analysis?

asked Mar 1 at 6:06

Rohit Gavval

658

categorical-data data-analysis

1 Answer

You can calculate the mean of the target for each level of the categorical variable and compare the values.
In pandas this can be done easily: df.groupby('categorical_feature').target.mean()

Then you can plot a histogram of these means to compare the levels. Seaborn also has catplot, which does the same as above in bar plot form, showing the mean of the target variable for each category.
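As a rough illustration of the groupby approach above, here is a minimal sketch on toy data. The column names `car_model` and `target` and the data itself are assumptions made up for the example, and the per-level row count is an extra column added here so that means based on very few rows are easy to spot.

```python
import pandas as pd

# Hypothetical toy data: 'car_model' stands in for the high-cardinality
# nominal feature and 'target' for the binary (0/1) outcome.
df = pd.DataFrame({
    "car_model": ["a", "a", "a", "b", "b", "c"],
    "target":    [1, 0, 0, 1, 1, 0],
})

# Mean target and number of rows per level, highest mean first.
summary = (
    df.groupby("car_model")["target"]
      .agg(mean_target="mean", n="count")
      .sort_values("mean_target", ascending=False)
)
print(summary)
```

With thousands of levels, the `mean_target` column of a table like this is what the histogram would be drawn from, for example with `summary["mean_target"].hist()` if matplotlib is installed.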

answered Mar 1 at 13:09

Victor Oliveira

3657

My target variable is dichotomous, so taking the mean is not an option. Maybe I can take counts, but the real problem is that I have around 8000 levels in one categorical attribute. How can I study that?
– Rohit Gavval
Mar 7 at 9:43

@RohitGavval, if you have a binary variable, you can still calculate the mean. It will be something like 0.333 or 0.67; that is the point. Look at my answer to this question, where I put links with more explanation of the mentioned methods: datascience.stackexchange.com/questions/46780/…
– Victor Oliveira
Mar 7 at 11:23
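To make the point in the comment above concrete, here is a small sketch, assuming the binary target is coded as 0/1 (the values are invented for illustration). Under that assumption the mean of the target within a level is simply the share of rows equal to 1.

```python
import pandas as pd

# Hypothetical binary target values for the rows of one category level.
target = pd.Series([1, 0, 0])

# Mean of a 0/1 variable ...
rate = target.mean()

# ... equals the share of rows where the target is 1.
share = (target == 1).sum() / len(target)

print(rate, share)
```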
