How to use a one-hot encoded nominal feature in a classifier in Scikit Learn?

I'm working on a genre classification problem on a songs dataset. Since genre is a nominal feature, I used sklearn's LabelBinarizer to get the one-hot encoding of this feature for every row in the dataset. I'm then left with a DataFrame (df_train_num) with two columns, both numeric, and a Series object (df_train_genre) in which every row value is a NumPy array: the one-hot encoding of the genre.

I now want to fit a classifier on this data. What I did was:

from sklearn.svm import LinearSVC

svm_classifier = LinearSVC()
svm_classifier.fit(df_train_num, df_train_genre)

This gives me a ValueError: Unknown label type: 'unknown'.
What exactly is causing this error? Am I not allowed to use a Series object together with a DataFrame object to fit a classifier? Replacing df_train_genre with df_train_genre.values, so as to pass the NumPy array directly to the fit method, doesn't change anything either: I get the same error.

Here is a view of the two pandas objects:

df_train_num.head(5)


        Unique_Word_Count  Sentiment Polarity
157277                126            0.027766
90109                 114           -0.199545
106224                 16            0.000000
221087                103           -0.058025
247082                409           -0.170143

df_train_genre.head(5)

157277 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
90109 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ...
106224 [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
221087 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
247082 [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
Name: Genre_Encoded, dtype: object

asked Mar 25 at 20:33

Mudit Jha

machine-learning scikit-learn nlp pandas

1 Answer

I think you should try pd.get_dummies to encode the categories. It creates new indicator columns in the DataFrame, and you can then pass that DataFrame to the classifier.
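
A minimal sketch of this suggestion, under stated assumptions: the toy values, the nominal column Language, the underscore in Sentiment_Polarity, and the plain string genre labels are all hypothetical and do not come from the question's data. It only illustrates how pd.get_dummies turns a nominal column into indicator columns that sit next to the numeric ones. The class labels are kept as a single 1-D column with one value per row, which, as far as I know, is the shape LinearSVC expects for its labels.

```python
import pandas as pd
from sklearn.svm import LinearSVC

# Hypothetical toy data: two numeric columns resembling those in the question,
# plus an assumed nominal feature column ("Language") that is NOT in the original.
df = pd.DataFrame(
    {
        "Unique_Word_Count": [126, 114, 16, 103, 409, 87],
        "Sentiment_Polarity": [0.03, -0.20, 0.00, -0.06, -0.17, 0.11],
        "Language": ["en", "es", "en", "fr", "es", "fr"],
    }
)

# Hypothetical class labels: one plain value per row (1-D), not one array per row.
y = pd.Series(["Rock", "Pop", "Rock", "Jazz", "Pop", "Jazz"], name="Genre")

# Expand the nominal column into one indicator column per category.
# dtype=float keeps every feature column numeric.
X = pd.get_dummies(df, columns=["Language"], dtype=float)

# Fit on the expanded feature frame and the 1-D labels.
clf = LinearSVC()
clf.fit(X, y)
predictions = clf.predict(X)

print(list(X.columns))
print(predictions)
```

On data this small and unscaled, LinearSVC may emit a convergence warning; scaling the numeric columns first would be a reasonable next step.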

answered Mar 26 at 6:16

Cini09
