Very large number of classes for text classification in Keras (and not multi-label classification)
I am trying to apply text classification with Keras. Previously, I used Random Forest and LightGBM and got an accuracy of around 65%. Then I made some first attempts with neural networks to improve my score, but the results are really bad (accuracy below 20%).
After searching, I found approaches for multi-label classification, which predict multiple labels for each input text, such as the following:
multi-label text classification
But this is not my case at all!
My dataset is not extremely unbalanced, but I have hundreds of classes to predict. I want to predict only one class per text.
After many attempts to find the right values to tune the neural network, I wonder whether neural networks simply do not perform well in this case. Why does Random Forest outperform them with minimal hyperparameter tuning? Maybe I should transform my problem into a multi-label binary classification?
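To make the single-label versus multi-label distinction concrete, here is a minimal sketch in plain Python. The logits are made-up values for four hypothetical classes, and the 0.5 threshold is an assumption for illustration. A softmax output makes the classes compete, so exactly one class is predicted per text; a sigmoid output scores each class independently, so any number of classes can pass the threshold.

```python
import math

def softmax(logits):
    """Single-label case: scores compete and the probabilities sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Multi-label case: each class gets an independent probability."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Hypothetical raw scores for 4 classes (illustrative values only).
logits = [2.0, 1.0, 0.1, -1.0]

single_label_probs = softmax(logits)
multi_label_probs = sigmoid(logits)

# Single-label prediction: exactly one class, the argmax.
predicted_class = max(range(len(logits)), key=lambda i: single_label_probs[i])

# Multi-label prediction: every class whose probability exceeds the threshold.
predicted_set = [i for i, p in enumerate(multi_label_probs) if p > 0.5]

print(predicted_class, predicted_set)
```

With one class per text, the softmax form already matches the problem, so recasting it as multi-label binary classification would drop the constraint that exactly one class is correct.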
This is a code example, taken from Kaggle:
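# Imports assumed from the names used below (pandas, scikit-learn, standalone Keras)
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, SpatialDropout1D, LSTM, Dense
from keras.callbacks import EarlyStopping
data = pd.read_csv('multiClassExample.csv', delimiter=';', encoding="latin-1")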
enc=LabelEncoder()
enc.fit(data['typeChord'])
data['typeChord']=enc.transform(data['typeChord'])
labels = to_categorical(data['typeChord'], num_classes=len(data.typeChord.unique()))
n_most_common_words = 8000
max_len = 130
tokenizer = Tokenizer(num_words=n_most_common_words, lower=True)
tokenizer.fit_on_texts(data['text'].values)
sequences = tokenizer.texts_to_sequences(data['text'].values)
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))
X = pad_sequences(sequences, maxlen=max_len)
X_train, X_test, y_train, y_test = train_test_split(X , labels, test_size=0.25, random_state=42)
epochs = 20
emb_dim = 128
batch_size = 256
print(labels[:2])
print((X_train.shape, y_train.shape, X_test.shape, y_test.shape))
model = Sequential()
model.add(Embedding(n_most_common_words, emb_dim, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.7))
model.add(LSTM(64, dropout=0.7, recurrent_dropout=0.7))
model.add(Dense(len(data.typeChord.unique()), activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
print(model.summary())
history = model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, validation_split=0.2, callbacks=[EarlyStopping(monitor='val_loss', patience=7, min_delta=0.0001)])
accr = model.evaluate(X_test, y_test)
print('Test set\n  Loss: {:0.3f}\n  Accuracy: {:0.3f}'.format(accr[0], accr[1]))
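As a side note on the loss used above, the following sketch (plain Python, with made-up probabilities for three hypothetical classes) shows that categorical cross-entropy against a one-hot target reduces to the negative log-probability of the true class. Keras exposes this form as the `sparse_categorical_crossentropy` loss, which takes the integer labels from `LabelEncoder` directly, so with hundreds of classes the `to_categorical` step could probably be skipped. This is an illustration of the arithmetic, not a claim that it explains the low accuracy.

```python
import math

def categorical_crossentropy(one_hot, probs):
    """Cross-entropy against a one-hot target vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(class_index, probs):
    """Same loss, computed directly from the integer class index."""
    return -math.log(probs[class_index])

# Hypothetical softmax output for 3 classes (illustrative values only).
probs = [0.7, 0.2, 0.1]
class_index = 1  # integer label, as LabelEncoder would produce
one_hot = [1.0 if i == class_index else 0.0 for i in range(len(probs))]

dense_loss = categorical_crossentropy(one_hot, probs)
sparse_loss = sparse_categorical_crossentropy(class_index, probs)

# Both values are -log(0.2); only the target representation differs.
print(dense_loss, sparse_loss)
```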
neural-network keras random-forest multiclass-classification
asked Mar 26 at 16:26
user2307229