EDA for analysis of nominal variable with high cardinality



I have a nominal variable (car model) with very high cardinality (~8500 labels), and I would like to analyse its relationship with a binary target variable. I can create logical groups and compare the distribution of the target variable across those groups, but are there better techniques or visualization tools for this kind of analysis?
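The grouping approach described above can be sketched in pandas as follows. This is a minimal illustration under assumptions: the column names `car_model` and `target`, the hand-made mapping `model_to_group` (for example car model to make or segment) and the toy rows are all hypothetical. `pd.crosstab` with `normalize="index"` gives, for each group, the share of each target class.

```python
import pandas as pd

def target_distribution_by_group(df: pd.DataFrame, model_col: str, target_col: str,
                                 model_to_group: dict) -> pd.DataFrame:
    """Map each car model to a coarser, hand-defined group and return the
    share of each target class (0/1) within every group."""
    # Models missing from the (hypothetical) mapping go to an explicit "other" bucket
    groups = df[model_col].map(model_to_group).fillna("other")
    # normalize="index" turns the counts into row-wise proportions
    return pd.crosstab(groups, df[target_col], normalize="index")

# Toy data and mapping, for illustration only
df = pd.DataFrame({
    "car_model": ["civic", "accord", "corolla", "camry", "f150", "model_x"],
    "target":    [1,       0,        1,         1,       0,      1],
})
model_to_group = {"civic": "honda", "accord": "honda",
                  "corolla": "toyota", "camry": "toyota", "f150": "ford"}

result = target_distribution_by_group(df, "car_model", "target", model_to_group)
print(result)
```

Each row sums to 1, so groups of very different sizes can be compared directly; the quality of the comparison depends entirely on how sensible the hand-made grouping is.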










      categorical-data data-analysis






asked Mar 1 at 6:06 by Rohit Gavval




















1 Answer
You can calculate the mean of the target for each category of the categorical variable and compare these values.
In pandas this is easy: df.groupby('categorical_feature').target.mean()
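A minimal sketch of that one-liner on toy data, using the column names `categorical_feature` and `target` from the answer (the rows themselves are made up). Adding a row count next to each mean is an extra suggestion, not part of the original answer: with thousands of levels, many categories have only a handful of rows, and their means are correspondingly unreliable.

```python
import pandas as pd

# Toy data; the column names follow the answer, the values are illustrative
df = pd.DataFrame({
    "categorical_feature": ["a", "a", "a", "b", "b", "c"],
    "target":              [1,   0,   1,   0,   0,   1],
})

# Mean of a 0/1 target per category = share of positives in that category.
# The count shows how many rows each mean is based on.
summary = (
    df.groupby("categorical_feature")["target"]
      .agg(["mean", "count"])
      .sort_values("count", ascending=False)
)
print(summary)
```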



Then you can plot a histogram of these per-category means to compare them. Also, seaborn has catplot, which does the same as above in bar-plot form (with kind='bar'), showing the mean of the target variable for each category.
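With ~8500 levels, one bar per category is unreadable, so the histogram of per-category means is the more practical summary: one value per category, binned. The sketch below computes the bin counts with `numpy.histogram`; the helper name `category_rate_histogram`, the `min_count` threshold of 30 and the synthetic data are hypothetical choices for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def category_rate_histogram(df, cat_col, target_col, min_count=30, bins=10):
    """Histogram of per-category positive rates, one value per category.

    Categories with fewer than `min_count` rows are dropped because their
    rates are too noisy to compare (the threshold is an arbitrary choice).
    """
    stats = df.groupby(cat_col)[target_col].agg(["mean", "count"])
    rates = stats.loc[stats["count"] >= min_count, "mean"]
    # Fixed [0, 1] range, since every rate of a 0/1 target lies in it
    counts, edges = np.histogram(rates, bins=bins, range=(0.0, 1.0))
    return rates, counts, edges

# Synthetic data: 200 hypothetical categories with different true rates
rng = np.random.default_rng(0)
n_cats, rows_per_cat = 200, 50
true_rate = rng.uniform(0.1, 0.9, size=n_cats)
df = pd.DataFrame({
    "categorical_feature": np.repeat(np.arange(n_cats), rows_per_cat),
    "target": rng.binomial(1, np.repeat(true_rate, rows_per_cat)),
})

rates, counts, edges = category_rate_histogram(df, "categorical_feature", "target")
print(counts)
```

With matplotlib installed, `rates.plot.hist(bins=10, range=(0, 1))` draws the same histogram; a seaborn bar plot is more readable if restricted to the most frequent categories.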






answered Mar 1 at 13:09 by Victor Oliveira











• My target variable is dichotomous, so taking the mean is not an option. Maybe I can take counts, but the real problem is that I have around 8000 levels in one categorical attribute. How can I study that?
– Rohit Gavval, Mar 7 at 9:43

• @RohitGavval, if you have a binary variable, you can calculate the mean. It will be something like 0.333 or 0.67, which is the share of positive cases in each category; that is the point. Look at my answer to this question, where I put links with more explanation of the methods mentioned: datascience.stackexchange.com/questions/46780/…
– Victor Oliveira, Mar 7 at 11:23
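The point about binary targets can be checked directly: coding the target as 0/1, its mean equals the proportion of 1s. A tiny illustration with made-up values (2 positives out of 6 rows):

```python
import pandas as pd

# A 0/1 target with 2 positives out of 6 rows (illustrative values)
target = pd.Series([1, 0, 0, 1, 0, 0])

# Share of positive cases, computed explicitly
share_of_positives = (target == 1).sum() / len(target)

# The mean of a 0/1 column is exactly that share
print(target.mean())  # prints 0.3333333333333333
print(share_of_positives)
```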















