How to correctly assign weights to the minority class or samples in an ANN?


I have an imbalanced dataset in which the abnormal class makes up 5% of the samples. To handle the imbalance I gave extra weight to the abnormal class, but it did not change anything. Here is my code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_class_weight, compute_sample_weight
from keras import optimizers
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.layers import Dense
from keras.models import Sequential
from keras.utils import to_categorical


def GenerateData(w, t, normal_size, abnormal_size):
    # w: window length
    # t: parameter of abnormal pattern (t=0.6/separable, t=0.06/partially separable, t=0.006/inseparable)
    data1 = []
    data2 = []
    mu, sigma = 0, 1

    # normal samples: pure Gaussian noise
    for i in range(normal_size):
        x = np.random.normal(mu, sigma, w)
        data1.append(x)

    # abnormal samples: Gaussian noise plus a linear trend with slope t
    for i in range(abnormal_size):
        y = np.random.normal(mu, sigma, w) + t * (np.arange(w) + 1)
        data2.append(y)

    data1 = np.array(data1)
    data2 = np.array(data2)

    data = np.concatenate((data1, data2), axis=0)

    # label 1 = normal, label 0 = abnormal
    labels = np.concatenate((np.ones(normal_size), np.zeros(abnormal_size)), axis=0)
    labels = labels.reshape(-1, 1)

    Final_Data = np.concatenate((data, labels), axis=1)
    return Final_Data


Final_Data = GenerateData(20, 0.06, 950, 50)
df = pd.DataFrame(Final_Data)

# shuffle the rows
df = df.sample(frac=1).reset_index(drop=True)

X = df.iloc[:, :-1]
y = df.iloc[:, -1]
y = to_categorical(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Use a new name so the result does not shadow the imported function.
# Keras expects class_weight as a dict mapping class index to weight.
weights = compute_class_weight(class_weight='balanced', classes=np.unique(y[:, -1]), y=y[:, -1])
class_weights = dict(enumerate(weights))  # {0: abnormal weight, 1: normal weight}
#sample_weight = compute_sample_weight(class_weight='balanced', y=np.argmax(y_train, axis=1))

model = Sequential()
model.add(Dense(8, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(y_train.shape[1], activation='softmax'))
opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=1e-3, amsgrad=False)
model.compile(loss='categorical_crossentropy', optimizer=opt)
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-8, patience=20, verbose=1, mode='auto')
checkpointer = ModelCheckpoint(filepath="best_weights.hdf5", verbose=0, save_best_only=True)
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), verbose=2, class_weight=class_weights, callbacks=[monitor, checkpointer], epochs=2000)  # classes are weighted
#history = model.fit(X_train, y_train, validation_data=(X_test, y_test), verbose=2, sample_weight=sample_weight, callbacks=[monitor, checkpointer], epochs=2000)  # samples are weighted
#history = model.fit(X_train, y_train, validation_data=(X_test, y_test), verbose=2, callbacks=[monitor, checkpointer], epochs=2000)  # no weighting

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

model.load_weights('best_weights.hdf5')  # load weights from best model

# Calculate accuracy
pred = model.predict(X_test)
pred = np.argmax(pred, axis=1)

y_compare = np.argmax(y_test, axis=1)
score = metrics.accuracy_score(y_compare, pred)
print("Accuracy score: {}".format(score))

cnf_matrix = confusion_matrix(y_compare, pred)


Based on the class_weight function, the class weights are 10 and 0.52 for the abnormal and normal classes, respectively.
Whether or not I apply these weights, the performance of the model does not change. I have also tried giving the abnormal class a much larger weight (1e+6), but nothing changed. The model is not able to learn.

Instead of the class_weight method, I have also tried compute_sample_weight, but nothing changed.

So, what am I doing wrong, or why is the weighting strategy not working properly in my case?











  • Running the code gives this error: ValueError: Found a sample_weight array with shape (2,) for an input with shape (700, 2). sample_weight cannot be broadcast.
    – Esmailian
    Mar 18 at 10:28

  • I have fixed the error.
    – Ram
    Mar 18 at 11:19
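
As a rough check on the numbers quoted in the question, the 'balanced' heuristic in scikit-learn gives each class the weight n_samples / (n_classes * class_count). The sketch below reproduces that arithmetic in plain Python for the 950 normal and 50 abnormal samples generated in the question, and returns the result as a dict keyed by class index, which is the form Keras documents for the class_weight argument of fit. Passing a bare array of two weights there is consistent with the shape (2,) error quoted in the comment above. The helper name balanced_class_weights is made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Sort by class so the dict keys come out as 0, 1, ...
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


# Same label layout as the question: 1 = normal (950 samples), 0 = abnormal (50 samples).
labels = [1] * 950 + [0] * 50
weights = balanced_class_weights(labels)

# Abnormal class 0 gets 10.0, normal class 1 gets roughly 0.526.
print(weights)
```

A dict like this can be passed as class_weight=weights, assuming the class indices match the columns of the one-hot labels.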















python classification keras class-imbalance






edited Mar 18 at 11:26 by Ram
asked Mar 18 at 8:19 by Ram





















2 Answers
Although giving extra weight to the minority class is often suggested for handling an imbalanced dataset, it is not a good approach. I suggest you use a loss function that is appropriate for imbalanced data instead of giving weight to the abnormal class.

Many useful metrics have been introduced for evaluating the performance of classification methods on imbalanced datasets. Some of them are Kappa, CEN, MCEN, MCC, and DP.

Disclaimer:

If you use Python, the PyCM module can help you compute these metrics.

Here is a simple example that gets the recommended parameters from this module:

>>> from pycm import *
>>> cm = ConfusionMatrix(matrix={"Class1": {"Class1": 1, "Class2": 2}, "Class2": {"Class1": 0, "Class2": 5}})
>>> print(cm.recommended_list)
["Kappa", "SOA1(Landis & Koch)", "SOA2(Fleiss)", "SOA3(Altman)", "SOA4(Cicchetti)", "CEN", "MCEN", "MCC", "J", "Overall J", "Overall MCC", "Overall CEN", "Overall MCEN", "AUC", "AUCI", "G", "DP", "DPI", "GI"]

After that, whichever of these parameters you want to use as the loss function can be obtained as follows:

>>> y_pred = model.predict(data.data)  # the predictions of the implemented model (assumes `data` is a scikit-learn-style dataset and the model returns class labels)
>>> y_actu = data.target  # data labels
>>> cm = ConfusionMatrix(actual_vector=y_actu, predict_vector=y_pred)
>>> loss = cm.Kappa  # or any other parameter (example: cm.SOA1)
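
For a two-class problem, two of the metrics named above, Cohen's kappa and the Matthews correlation coefficient (MCC), can be computed by hand from the confusion matrix. The following is a minimal sketch in plain Python, not PyCM's implementation. The helper names are hypothetical, and the example reuses the counts from the 2x2 matrix in the answer (1, 2, 0, 5), treating Class1 as the positive class.

```python
import math


def cohen_kappa(tp, fn, fp, tn):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement from the row (actual) and column (predicted) totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


def matthews_corrcoef(tp, fn, fp, tn):
    """MCC for a binary confusion matrix; returns 0.0 when the denominator is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Counts from the answer's example: actual Class1 -> (1, 2), actual Class2 -> (0, 5).
tp, fn, fp, tn = 1, 2, 0, 5
accuracy = (tp + tn) / (tp + fn + fp + tn)
kappa = cohen_kappa(tp, fn, fp, tn)
mcc = matthews_corrcoef(tp, fn, fp, tn)

# Accuracy is 0.75, while kappa (5/13, about 0.385) and MCC (about 0.488) are much lower.
print(accuracy, kappa, mcc)
```

The gap between accuracy and the other two values illustrates why such metrics are preferred for judging a classifier on imbalanced data.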












  • Thanks for the detailed explanation. I am going to try what you said.
    – Ram
    Mar 18 at 17:45

  • Please do not hesitate to contact me if you need any further information.
    – Alireza Zolanvari
    Mar 18 at 18:16






























Note that the errors fluctuate considerably from one run to the next.

I changed

class_weight = class_weight.compute_class_weight('balanced', np.unique(y[:,-1]),y[:,-1])

to

class_weight = np.array([1000, 1])

which resulted in a val_loss in the range (0.09, 0.15),

and to

class_weight = np.array([1, 1000])

which resulted in a val_loss in the range (0.06, 0.1).

So class weighting is working correctly and does affect the final result, but the fluctuation is high, so it is better to average over multiple runs. The negligible difference in test error simply means that weighting is not that important for this particular task.
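
To see why the weights change the loss at all, recall that a class weight multiplies the per-sample loss of every sample belonging to that class. The sketch below applies this to categorical cross-entropy in plain Python. The predicted probabilities are made-up illustrative inputs, not outputs of the model in the question; the function name is hypothetical; and averaging over the batch size is an assumption, since the exact normalization Keras applies has varied between versions.

```python
import math


def weighted_cross_entropy(true_classes, predicted_probs, class_weight):
    """Mean of class_weight[c] * -log(p_c) over the batch (illustrative only)."""
    total = 0.0
    for cls, probs in zip(true_classes, predicted_probs):
        # Each sample's cross-entropy is scaled by the weight of its true class.
        total += class_weight[cls] * -math.log(probs[cls])
    return total / len(true_classes)


# One abnormal sample (class 0) that is predicted poorly, three normal samples (class 1).
true_classes = [0, 1, 1, 1]
predicted_probs = [[0.3, 0.7], [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]]

unweighted = weighted_cross_entropy(true_classes, predicted_probs, {0: 1.0, 1: 1.0})
weighted = weighted_cross_entropy(true_classes, predicted_probs, {0: 10.0, 1: 0.526})

print(unweighted, weighted)
```

With the larger weight on class 0, the single misclassified abnormal sample dominates the loss, so the optimizer is pushed harder to fix it. Whether that changes the final validation loss noticeably still depends on the data, which is the point made in the answer.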












  • Please correct me if I missed something from your comment. But the point is to get a high accuracy from an imbalanced dataset. There is no problem if the data is balanced.
    – Ram
    Mar 18 at 12:08

  • @Ram updated the answer
    – Esmailian
    Mar 18 at 14:59

  • Thanks. So, I will run it multiple times and see how the results change.
    – Ram
    Mar 18 at 17:54










Your Answer





StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "557"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f47502%2fhow-correctly-assign-weights-to-minority-class-or-samples-in-ann%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









2












$begingroup$

Although giving extra weight for handling imbalanced data-set is suggested, it's not a good way. I suggest you use an appropriate loss function for handling imbalanced data-set instead of giving weight to the abnormal class.



There are many useful metrics which were introduced for evaluating the performance of classification methods for imbalanced data-sets. Some of them are Kappa, CEN, MCEN, MCC, and DP.



Disclaimer:



If you use python, PyCM module can help you to find out these metrics.



Here is a simple code to get the recommended parameters from this module:



>>> from pycm import *

>>> cm = ConfusionMatrix(matrix="Class1": "Class1": 1, "Class2":2, "Class2": "Class1": 0, "Class2": 5)

>>> print(cm.recommended_list)
["Kappa", "SOA1(Landis & Koch)", "SOA2(Fleiss)", "SOA3(Altman)", "SOA4(Cicchetti)", "CEN", "MCEN", "MCC", "J", "Overall J", "Overall MCC", "Overall CEN", "Overall MCEN", "AUC", "AUCI", "G", "DP", "DPI", "GI"]


After that, each of these parameters you want to use as the loss function can be used as follows:



>>> y_pred = model.predict #the prediction of the implemented model

>>> y_actu = data.target #data labels

>>> cm = ConfusionMatrix(y_actu, y_pred)

>>> loss = cm.Kappa #or any other parameter (Example: cm.SOA1)





share|improve this answer









$endgroup$












  • $begingroup$
    Thank for the detailed explanation. I am going to try what you said.
    $endgroup$
    – Ram
    Mar 18 at 17:45










  • $begingroup$
    Please do not hesitate to contact me if you need any further information.
    $endgroup$
    – Alireza Zolanvari
    Mar 18 at 18:16















2












$begingroup$

Although giving extra weight for handling imbalanced data-set is suggested, it's not a good way. I suggest you use an appropriate loss function for handling imbalanced data-set instead of giving weight to the abnormal class.



There are many useful metrics which were introduced for evaluating the performance of classification methods for imbalanced data-sets. Some of them are Kappa, CEN, MCEN, MCC, and DP.



Disclaimer:



If you use python, PyCM module can help you to find out these metrics.



Here is a simple code to get the recommended parameters from this module:



>>> from pycm import *

>>> cm = ConfusionMatrix(matrix="Class1": "Class1": 1, "Class2":2, "Class2": "Class1": 0, "Class2": 5)

>>> print(cm.recommended_list)
["Kappa", "SOA1(Landis & Koch)", "SOA2(Fleiss)", "SOA3(Altman)", "SOA4(Cicchetti)", "CEN", "MCEN", "MCC", "J", "Overall J", "Overall MCC", "Overall CEN", "Overall MCEN", "AUC", "AUCI", "G", "DP", "DPI", "GI"]


After that, each of these parameters you want to use as the loss function can be used as follows:



>>> y_pred = model.predict #the prediction of the implemented model

>>> y_actu = data.target #data labels

>>> cm = ConfusionMatrix(y_actu, y_pred)

>>> loss = cm.Kappa #or any other parameter (Example: cm.SOA1)





share|improve this answer









$endgroup$












  • $begingroup$
    Thank for the detailed explanation. I am going to try what you said.
    $endgroup$
    – Ram
    Mar 18 at 17:45










  • $begingroup$
    Please do not hesitate to contact me if you need any further information.
    $endgroup$
    – Alireza Zolanvari
    Mar 18 at 18:16


























answered Mar 18 at 15:09









Alireza Zolanvari


































$begingroup$

Note that the errors fluctuate considerably from run to run.

I changed

class_weight = class_weight.compute_class_weight('balanced', classes=np.unique(y[:, -1]), y=y[:, -1])

to

class_weight = np.array([1000, 1])

which resulted in a val_loss roughly between 0.09 and 0.15,

and to

class_weight = np.array([1, 1000])

which resulted in a val_loss roughly between 0.06 and 0.1.

So class weighting is working correctly and does affect the final result, but the run-to-run fluctuation is high, so it is better to average over multiple runs. The negligible difference in test error simply means that weighting is not that important for this particular task.
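Averaging over multiple runs could look roughly like the sketch below. The helper `average_over_runs` and the `simulated_run` stand-in are hypothetical names, and the numbers the stand-in produces are random placeholders, not real results; in practice you would build and fit the model with the given seed and return its val_loss.

```python
import random
import statistics


def average_over_runs(run_once, n_runs=10):
    """Call run_once(seed) n_runs times and return (mean, sample std) of the results."""
    results = [run_once(seed) for seed in range(n_runs)]
    return statistics.mean(results), statistics.stdev(results)


def simulated_run(seed):
    # Placeholder for one real training run: returns a noisy fake "val_loss".
    # Replace the body with your own training and evaluation code.
    return 0.10 + random.Random(seed).uniform(-0.03, 0.03)


mean_loss, std_loss = average_over_runs(simulated_run, n_runs=10)
print(f"val_loss = {mean_loss:.3f} +/- {std_loss:.3f} over 10 runs")
```

Comparing the mean and spread for each class_weight setting, rather than a single run, makes it easier to tell a real effect of the weighting from run-to-run noise.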

















$endgroup$












  • Please correct me if I missed something from your comment, but the whole point is to get high accuracy from an imbalanced dataset. There is no problem if the data is balanced.
    – Ram
    Mar 18 at 12:08

  • @Ram updated the answer
    – Esmailian
    Mar 18 at 14:59

  • Thanks. So, I will run it multiple times and see how the results change.
    – Ram
    Mar 18 at 17:54























edited Mar 18 at 16:10

























answered Mar 18 at 11:51









Esmailian



























