How to use a one-hot encoded nominal feature in a classifier in Scikit Learn?













I'm working on a genre classification problem on a songs dataset. Since genre is a nominal feature, I used sklearn's LabelBinarizer to get the one-hot encoding of this feature for every row in the dataset. I'm then left with a DataFrame (df_train_num) with two columns, both numeric, and a Series object (df_train_genre) in which every row value is a numpy array: the one-hot encoding of the genre.

I now want to fit a classifier on this data. What I did was:

    from sklearn.svm import LinearSVC

    svm_classifier = LinearSVC()
    svm_classifier.fit(df_train_num, df_train_genre)

This gives me a ValueError: Unknown label type: 'unknown'

What exactly is causing this error? Am I not allowed to use a Series object together with a DataFrame object to fit a classifier? Replacing df_train_genre with df_train_genre.values, so as to pass the numpy array directly to the fit method, doesn't change anything either; I get the same error.

Here is a view of the two pandas objects:

df_train_num.head(5)

        Unique_Word_Count  Sentiment Polarity
157277                126            0.027766
90109                 114           -0.199545
106224                 16            0.000000
221087                103           -0.058025
247082                409           -0.170143

df_train_genre.head(5)

157277    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
90109     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ...
106224    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
221087    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
247082    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
Name: Genre_Encoded, dtype: object
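To make the setup concrete, here is a minimal sketch of the objects described in the question, built from made-up data with hypothetical genre names. It assumes genre is the target being predicted (as in the fit call), and shows one way an object-dtype Series of one-hot arrays can be stacked into a 2-D array and collapsed back to one class label per row, which is the 1-D form LinearSVC accepts for y. This is an illustration under those assumptions, not necessarily the right fix for the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import LinearSVC

# Made-up stand-in for the songs dataset (genre names and values are hypothetical).
genres = pd.Series(["rock", "pop", "jazz", "pop", "rock", "jazz"])
df_train_num = pd.DataFrame({
    "Unique_Word_Count": [126, 114, 16, 103, 409, 57],
    "Sentiment Polarity": [0.03, -0.20, 0.00, -0.06, -0.17, 0.11],
})

# One-hot encode, then store one array per row, mirroring the question's Series.
binarizer = LabelBinarizer()
one_hot = binarizer.fit_transform(genres)
df_train_genre = pd.Series(list(one_hot), name="Genre_Encoded")
print(df_train_genre.dtype)

# Stack the per-row arrays into a 2-D array, then map each row back to its label.
stacked = np.vstack(df_train_genre.tolist())
labels = binarizer.inverse_transform(stacked)

# Fit with a plain 1-D array of class labels as the target.
svm_classifier = LinearSVC()
svm_classifier.fit(df_train_num, labels)
print(svm_classifier.classes_)
```

With so few rows the fit may emit a convergence warning; the point is only the shape of the target passed to fit.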






















      machine-learning scikit-learn nlp pandas















asked Mar 25 at 20:33 by Mudit Jha
























1 Answer


















I think you should try pd.get_dummies to encode the categories. It creates a new column in the DataFrame for each category, and you can then pass that DataFrame to the classifier.
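A minimal sketch of the pd.get_dummies suggestion, assuming the nominal variable is an input feature rather than the target. The `Language` column and all values below are hypothetical and are not part of the original data; the target is kept as a plain 1-D column of class labels.

```python
import pandas as pd
from sklearn.svm import LinearSVC

# Made-up data; the "Language" column and all values are hypothetical.
df = pd.DataFrame({
    "Unique_Word_Count": [126, 114, 16, 103, 409, 57],
    "Language": ["en", "es", "en", "fr", "es", "en"],
    "Genre": ["rock", "pop", "jazz", "pop", "rock", "jazz"],
})

# pd.get_dummies expands the nominal column into one 0/1 column per category
# and leaves the numeric column as it is. dtype=int keeps the output numeric.
X = pd.get_dummies(
    df[["Unique_Word_Count", "Language"]], columns=["Language"], dtype=int
)
print(list(X.columns))

# The target stays a 1-D Series of class labels.
y = df["Genre"]

svm_classifier = LinearSVC()
svm_classifier.fit(X, y)
print(svm_classifier.predict(X.head(2)))
```

Note that pd.get_dummies derives its columns from the categories present in the frame it is given, so training and new data need the same set of columns before prediction.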























answered Mar 26 at 6:16 by Cini09



















