How to use a one-hot encoded nominal feature in a classifier in Scikit Learn?
I'm working on a genre classification problem on a songs dataset. Since genre is a nominal feature, I used sklearn's LabelBinarizer to get the one-hot encoding of this feature for every row in the dataset. I'm then left with a DataFrame (df_train_num) with two columns, both numeric, and a Series object in which every row value is a numpy array: the one-hot encoding of the genre.

I now want to fit a classifier on this data. What I did was:

    from sklearn.svm import LinearSVC

    svm_classifier = LinearSVC()
    svm_classifier.fit(df_train_num, df_train_genre)

This gives me a ValueError: Unknown label type: 'unknown'

What exactly is causing this error? Am I not allowed to use a Series object together with a DataFrame object to fit a classifier? Replacing df_train_genre with df_train_genre.values, so as to pass the numpy array directly to the fit method, doesn't change anything either: same error.

Here is a view of the two pandas objects:

    df_train_num.head(5)
            Unique_Word_Count  Sentiment Polarity
    157277                126            0.027766
    90109                 114           -0.199545
    106224                 16            0.000000
    221087                103           -0.058025
    247082                409           -0.170143

    df_train_genre.head(5)
    157277    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
    90109     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ...
    106224    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
    221087    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
    247082    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
    Name: Genre_Encoded, dtype: object

machine-learning scikit-learn nlp pandas
asked Mar 25 at 20:33
Mudit Jha
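The setup described in the question can be approximated in a short sketch. The four-row frame and the genre names below are invented for illustration; only LabelBinarizer, LinearSVC, and the object-dtype Series of per-row arrays come from the question. scikit-learn's `type_of_target` helper shows how a target is interpreted before fitting, and in the versions I am aware of it reports the Series of arrays as `'unknown'`, which matches the wording of the error message. The last lines assume genre is the label being predicted, in which case the plain 1-D labels can be passed to `fit` without one-hot encoding.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import LinearSVC
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in data; the genre names are illustrative only.
df_train_num = pd.DataFrame(
    {
        "Unique_Word_Count": [126, 114, 16, 103],
        "Sentiment Polarity": [0.027766, -0.199545, 0.0, -0.058025],
    }
)
genres = np.array(["rock", "pop", "jazz", "rock"])

# One-hot rows stored as one numpy array per cell, as in the question.
one_hot = LabelBinarizer().fit_transform(genres)
df_train_genre = pd.Series(list(one_hot), name="Genre_Encoded")

# How scikit-learn classifies each candidate target.
encoded_target_type = type_of_target(df_train_genre)  # Series of arrays
plain_target_type = type_of_target(genres)            # 1-D array of labels

# Assuming genre is the target: fit on the un-encoded 1-D labels.
svm_classifier = LinearSVC()
svm_classifier.fit(df_train_num, genres)
predictions = svm_classifier.predict(df_train_num)
```

On data this small and unscaled, LinearSVC may emit a convergence warning; that does not affect the point about the target's shape.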
1 Answer
I think you should try pd.get_dummies to encode the categories. It creates new columns in the DataFrame, and you can then pass that DataFrame to the classifier.
answered Mar 26 at 6:16
Cini09
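A minimal sketch of the pd.get_dummies suggestion, under stated assumptions: the frame below is invented, and the `Key` column is a hypothetical nominal input feature that does not appear in the question. It shows get_dummies expanding a nominal column into indicator columns that sit alongside the numeric ones, while the target stays a single column of labels.

```python
import pandas as pd
from sklearn.svm import LinearSVC

# Hypothetical frame: "Key" is an invented nominal feature, "Genre" the target.
df = pd.DataFrame(
    {
        "Unique_Word_Count": [126, 114, 16, 103],
        "Key": ["C", "G", "C", "A"],
        "Genre": ["rock", "pop", "jazz", "rock"],
    }
)

# Replace the nominal column with one indicator column per category.
# dtype=float keeps the whole feature frame numeric.
X = pd.get_dummies(df[["Unique_Word_Count", "Key"]], columns=["Key"], dtype=float)

# The target stays 1-D; scikit-learn classifiers accept string labels.
y = df["Genre"].to_numpy()

clf = LinearSVC()
clf.fit(X, y)
predicted = clf.predict(X)
```

Here `X` has one row per song and one column per numeric feature or category, which is the 2-D numeric layout `fit` expects for features. Whether this applies to the question's data depends on whether genre is an input or the label being predicted.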