How to correctly assign weights to minority class or samples in ANN?


I have an imbalanced dataset: the abnormal class makes up 5% of the samples. To handle the problem, I gave extra weight to the abnormal class. However, it did not change anything. Here is my code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils import class_weight
from sklearn.utils.class_weight import compute_sample_weight
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.utils import to_categorical
from keras import optimizers


def GenerateData(w, t, normal_size, abnormal_size):
    # w: window length
    # t: parameter of the abnormal pattern (t=0.6: separable, t=0.06: partially separable, t=0.006: inseparable)
    data1 = []
    data2 = []
    mu, sigma = 0, 1

    # normal samples: pure Gaussian noise
    for i in range(normal_size):
        x = np.random.normal(mu, sigma, w)
        data1.append(x)

    # abnormal samples: Gaussian noise plus a linear trend t*1, t*2, ..., t*w
    for i in range(abnormal_size):
        y = np.random.normal(mu, sigma, w) + t * (np.arange(w) + 1)
        data2.append(y)

    data1 = np.array(data1)
    data2 = np.array(data2)

    data = np.concatenate((data1, data2), axis=0)

    # label 1 = normal, label 0 = abnormal
    labels = np.concatenate((np.ones(normal_size), np.zeros(abnormal_size)), axis=0)
    labels = labels.reshape(-1, 1)

    Final_Data = np.concatenate((data, labels), axis=1)
    return Final_Data


Final_Data = GenerateData(20, 0.06, 950, 50)
df = pd.DataFrame(Final_Data)

# shuffle the rows
df = df.sample(frac=1).reset_index(drop=True)

X = df.iloc[:, :-1]
y = df.iloc[:, -1]
y = to_categorical(y)  # one-hot: column 0 = abnormal, column 1 = normal
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

# fit the scaler on the training split only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# named class_weights so the result does not shadow the sklearn.utils.class_weight module
class_weights = class_weight.compute_class_weight(class_weight='balanced', classes=np.unique(y[:, -1]), y=y[:, -1])
#sample_weight = compute_sample_weight(class_weight='balanced', y=y_train)

model = Sequential()
model.add(Dense(8, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(y_train.shape[1], activation='softmax'))
opt = optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=1e-3, amsgrad=False)
model.compile(loss='categorical_crossentropy', optimizer=opt)
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-8, patience=20, verbose=1, mode='auto')
checkpointer = ModelCheckpoint(filepath="best_weights.hdf5", verbose=0, save_best_only=True)
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), verbose=2, class_weight=class_weights, callbacks=[monitor, checkpointer], epochs=2000)  # classes are weighted
#history = model.fit(X_train, y_train, validation_data=(X_test, y_test), verbose=2, sample_weight=sample_weight, callbacks=[monitor, checkpointer], epochs=2000)  # samples are weighted
#history = model.fit(X_train, y_train, validation_data=(X_test, y_test), verbose=2, callbacks=[monitor, checkpointer], epochs=2000)  # no weighting

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()


model.load_weights('best_weights.hdf5')  # load weights from the best model


# Calculate accuracy
pred = model.predict(X_test)
pred = np.argmax(pred, axis=1)

y_compare = np.argmax(y_test, axis=1)
score = metrics.accuracy_score(y_compare, pred)
print("Accuracy score: {}".format(score))

cnf_matrix = confusion_matrix(y_compare, pred)

Based on the class_weight function, the class weights are 10 and 0.52 for the abnormal and normal class, respectively.
Whether or not I apply different weights, the performance of the model does not change. I also tried giving a much larger weight (1e+6) to the abnormal class, but nothing changed: the model is not able to learn.
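The 10 and 0.52 above match scikit-learn's documented 'balanced' heuristic, n_samples / (n_classes * np.bincount(y)): with 1000 samples, 1000 / (2 * 50) = 10 for the abnormal class and 1000 / (2 * 950), about 0.526, for the normal class. A minimal NumPy sketch reproducing that arithmetic follows; `balanced_class_weights` is a hypothetical helper, not a library function. It returns a dict because Keras documents `class_weight` as a dictionary mapping class indices to weights, whereas `compute_class_weight` returns a plain array, so it may be worth checking which form your Keras version actually honours.

```python
import numpy as np

def balanced_class_weights(labels):
    """Hypothetical helper mirroring sklearn's 'balanced' rule:
    weight_c = n_samples / (n_classes * count_c)."""
    labels = np.asarray(labels, dtype=int)
    counts = np.bincount(labels)                 # samples per class index 0, 1, ...
    weights = len(labels) / (len(counts) * counts)
    # Keras documents class_weight as a {class_index: weight} dictionary
    return {cls: float(w) for cls, w in enumerate(weights)}

# Same class sizes as GenerateData(20, 0.06, 950, 50): label 1 = normal, label 0 = abnormal
labels = np.concatenate((np.ones(950), np.zeros(50)))
weights = balanced_class_weights(labels)
print(weights)  # roughly {0: 10.0, 1: 0.526}
```

With the question's variables, such a dict would be passed as `model.fit(..., class_weight=weights)`.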

Instead of the class_weight method, I also tried compute_sample_weight, but nothing changed.
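Class weights and sample weights act on the loss in the same way: each sample's loss term is multiplied by a weight before averaging. Below is a minimal NumPy sketch of that idea; the helper names are hypothetical, and the exact normalisation Keras applies may differ between versions. Note that `sample_weight` is expected to hold one weight per training sample, i.e. shape (n_samples,); the error quoted in the comments comes from a length-2, per-class array being passed for 700 samples. Also, scikit-learn documents that for a 2-D `y`, `compute_sample_weight` multiplies the weights computed for each column, so passing the one-hot `y_train` is not the same as passing integer labels such as `np.argmax(y_train, axis=1)`.

```python
import numpy as np

def per_sample_weights(y_onehot, class_weights):
    """Expand a {class_index: weight} dict into one weight per sample."""
    labels = np.argmax(y_onehot, axis=1)  # one-hot rows -> integer class indices
    lookup = np.array([class_weights[c] for c in range(y_onehot.shape[1])])
    return lookup[labels]                 # 1-D, shape (n_samples,)

def weighted_categorical_crossentropy(y_onehot, probs, sample_weights, eps=1e-7):
    """Per-sample cross-entropy, scaled by each sample's weight, then averaged."""
    probs = np.clip(probs, eps, 1.0)
    per_sample = -np.sum(y_onehot * np.log(probs), axis=1)
    return float(np.mean(per_sample * sample_weights))

# Toy batch: three normal samples (class 1) and one abnormal sample (class 0)
y_onehot = np.array([[0, 1], [0, 1], [1, 0], [0, 1]], dtype=float)
# A model that always predicts "normal" with 90% confidence
probs = np.tile([0.1, 0.9], (4, 1))

sw = per_sample_weights(y_onehot, {0: 10.0, 1: 0.52})
print(sw.shape)  # (4,)
print(weighted_categorical_crossentropy(y_onehot, probs, np.ones(4)))  # unweighted
print(weighted_categorical_crossentropy(y_onehot, probs, sw))          # missed abnormal sample dominates
```

The weighted loss is much larger for the same predictions, which is what should push the optimiser towards the minority class.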

So, what am I doing wrong, or why is the weighting strategy not working properly in my case?

edited Mar 18 at 11:26

asked Mar 18 at 8:19

Ram

214

Running the code gives this error: ValueError: Found a sample_weight array with shape (2,) for an input with shape (700, 2). sample_weight cannot be broadcast.
– Esmailian
Mar 18 at 10:28

I have fixed the error.
– Ram
Mar 18 at 11:19

python classification keras class-imbalance


2 Answers

Although giving extra weight to the minority class is often suggested for handling an imbalanced dataset, it is not a good approach. I suggest using an appropriate loss function instead of giving extra weight to the abnormal class.

Many metrics have been introduced for evaluating the performance of classification methods on imbalanced datasets; some of them are Kappa, CEN, MCEN, MCC, and DP.

Disclaimer:

If you use Python, the PyCM module can help you compute these metrics.

Here is a short snippet that gets the recommended parameters from this module:

>>> from pycm import *

>>> cm = ConfusionMatrix(matrix={"Class1": {"Class1": 1, "Class2": 2}, "Class2": {"Class1": 0, "Class2": 5}})

>>> print(cm.recommended_list)
["Kappa", "SOA1(Landis & Koch)", "SOA2(Fleiss)", "SOA3(Altman)", "SOA4(Cicchetti)", "CEN", "MCEN", "MCC", "J", "Overall J", "Overall MCC", "Overall CEN", "Overall MCEN", "AUC", "AUCI", "G", "DP", "DPI", "GI"]

You can then compute whichever of these parameters you want to use as the loss function, as follows:

>>> y_pred = np.argmax(model.predict(X_test), axis=1) # the predicted labels of the implemented model

>>> y_actu = np.argmax(y_test, axis=1) # the true data labels

>>> cm = ConfusionMatrix(y_actu, y_pred)

>>> loss = cm.Kappa #or any other parameter (Example: cm.SOA1)
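To make a few of these measures concrete, here is a minimal NumPy sketch (not PyCM's implementation) that computes accuracy, balanced accuracy, Cohen's kappa and MCC from a 2x2 confusion matrix, assuming rows are actual classes and columns are predicted classes, as in the dict layout of the PyCM example. It also shows why plain accuracy, which the question reports, is misleading at a 95/5 split. Note that these metrics are computed from hard class predictions and are not differentiable, so in practice they are typically used to evaluate or select models (for example, comparing weighted and unweighted runs) rather than as a loss that Keras backpropagates through directly.

```python
import numpy as np

def scores_from_confusion(cm):
    """Accuracy, balanced accuracy, Cohen's kappa and MCC for a 2x2 matrix
    (rows = actual class, columns = predicted class, class 0 treated as positive)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    accuracy = np.trace(cm) / n
    recalls = np.diag(cm) / cm.sum(axis=1)                # per-class recall
    balanced_accuracy = recalls.mean()
    expected = (cm.sum(axis=1) @ cm.sum(axis=0)) / n**2   # agreement expected by chance
    kappa = (accuracy - expected) / (1 - expected)
    tp, fn, fp, tn = cm[0, 0], cm[0, 1], cm[1, 0], cm[1, 1]
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0  # treated as 0 when undefined
    return float(accuracy), float(balanced_accuracy), float(kappa), float(mcc)

# The matrix from the PyCM example: Class1 -> [1, 2], Class2 -> [0, 5]
print(scores_from_confusion([[1, 2], [0, 5]]))
# Always predicting "normal" on 50 abnormal / 950 normal samples
print(scores_from_confusion([[0, 50], [0, 950]]))  # (0.95, 0.5, 0.0, 0.0)
```

The trivial classifier reaches 95% accuracy while kappa and MCC are 0, which is why these measures are preferred for judging imbalanced problems.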

answered Mar 18 at 15:09

Alireza Zolanvari

35716

Thanks for the detailed explanation. I am going to try what you said.
– Ram
Mar 18 at 17:45

Please do not hesitate to contact me if you need any further information.
– Alireza Zolanvari
Mar 18 at 18:16


Note that there is a large fluctuation in the errors after each run.

I changed

class_weight = class_weight.compute_class_weight('balanced', np.unique(y[:,-1]),y[:,-1])

to

class_weight = np.array([1000, 1])

which resulted in a val_loss of around 0.09 to 0.15,

and to

class_weight = np.array([1, 1000])

which resulted in a val_loss of around 0.06 to 0.1.

So class weighting works and does affect the final result, but the run-to-run fluctuation is high, so it is better to average over multiple runs. The negligible difference in test error simply means that weighting is not that important for this particular task.
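Averaging over runs can be scripted. Below is a minimal sketch, assuming you wrap the model-building, fitting and evaluation code from the question in your own function that takes a seed and returns the best val_loss. `dummy_train_and_evaluate` is a hypothetical placeholder that returns made-up numbers only so the snippet runs; it is not a measurement.

```python
import statistics

def summarize_runs(run_fn, seeds):
    """Call run_fn(seed) once per seed; return the mean and sample standard deviation."""
    results = [run_fn(seed) for seed in seeds]
    return statistics.mean(results), statistics.stdev(results)

def dummy_train_and_evaluate(seed):
    # Hypothetical placeholder: replace with code that seeds NumPy/TensorFlow,
    # trains the model with a given class_weight and returns the best val_loss.
    return 0.10 + 0.01 * (seed % 3)

mean_loss, std_loss = summarize_runs(dummy_train_and_evaluate, seeds=range(6))
print(f"val_loss: {mean_loss:.3f} +/- {std_loss:.3f}")
```

Running this once per weighting setting and comparing the mean against the spread shows whether a difference between settings is larger than the run-to-run noise.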

edited Mar 18 at 16:10

answered Mar 18 at 11:51

Esmailian

1,641114

Please correct me if I missed something in your comment, but the point is precisely to get high accuracy on an imbalanced dataset. There is no problem if the data is balanced.
– Ram
Mar 18 at 12:08

@Ram updated the answer
– Esmailian
Mar 18 at 14:59

Thanks. So, I will run it multiple times and see how results will change.
– Ram
Mar 18 at 17:54

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "557"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f47502%2fhow-correctly-assign-weights-to-minority-class-or-samples-in-ann%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

There are many useful metrics which were introduced for evaluating the performance of classification methods for imbalanced data-sets. Some of them are Kappa, CEN, MCEN, MCC, and DP.

Disclaimer:

If you use python, PyCM module can help you to find out these metrics.

Here is a simple code to get the recommended parameters from this module:

>>> from pycm import *

>>> cm = ConfusionMatrix(matrix="Class1": "Class1": 1, "Class2":2, "Class2": "Class1": 0, "Class2": 5) 

>>> print(cm.recommended_list)
["Kappa", "SOA1(Landis & Koch)", "SOA2(Fleiss)", "SOA3(Altman)", "SOA4(Cicchetti)", "CEN", "MCEN", "MCC", "J", "Overall J", "Overall MCC", "Overall CEN", "Overall MCEN", "AUC", "AUCI", "G", "DP", "DPI", "GI"]

After that, each of these parameters you want to use as the loss function can be used as follows:

>>> y_pred = model.predict #the prediction of the implemented model

>>> y_actu = data.target #data labels

>>> cm = ConfusionMatrix(y_actu, y_pred)

>>> loss = cm.Kappa #or any other parameter (Example: cm.SOA1)

answered Mar 18 at 15:09

Alireza Zolanvari

35716

$begingroup$
Thank for the detailed explanation. I am going to try what you said.
$endgroup$
– Ram
Mar 18 at 17:45

$begingroup$
Please do not hesitate to contact me if you need any further information.
$endgroup$
– Alireza Zolanvari
Mar 18 at 18:16

add a comment |

There are many useful metrics which were introduced for evaluating the performance of classification methods for imbalanced data-sets. Some of them are Kappa, CEN, MCEN, MCC, and DP.

Disclaimer:

If you use python, PyCM module can help you to find out these metrics.

Here is a simple code to get the recommended parameters from this module:

>>> from pycm import *

>>> cm = ConfusionMatrix(matrix="Class1": "Class1": 1, "Class2":2, "Class2": "Class1": 0, "Class2": 5) 

>>> print(cm.recommended_list)
["Kappa", "SOA1(Landis & Koch)", "SOA2(Fleiss)", "SOA3(Altman)", "SOA4(Cicchetti)", "CEN", "MCEN", "MCC", "J", "Overall J", "Overall MCC", "Overall CEN", "Overall MCEN", "AUC", "AUCI", "G", "DP", "DPI", "GI"]

After that, each of these parameters you want to use as the loss function can be used as follows:

>>> y_pred = model.predict #the prediction of the implemented model

>>> y_actu = data.target #data labels

>>> cm = ConfusionMatrix(y_actu, y_pred)

>>> loss = cm.Kappa #or any other parameter (Example: cm.SOA1)

answered Mar 18 at 15:09

Alireza Zolanvari

35716

$begingroup$
Thank for the detailed explanation. I am going to try what you said.
$endgroup$
– Ram
Mar 18 at 17:45

$begingroup$
Please do not hesitate to contact me if you need any further information.
$endgroup$
– Alireza Zolanvari
Mar 18 at 18:16

add a comment |

There are many useful metrics which were introduced for evaluating the performance of classification methods for imbalanced data-sets. Some of them are Kappa, CEN, MCEN, MCC, and DP.

Disclaimer:

If you use python, PyCM module can help you to find out these metrics.

Here is a simple code to get the recommended parameters from this module:

>>> from pycm import *

>>> cm = ConfusionMatrix(matrix="Class1": "Class1": 1, "Class2":2, "Class2": "Class1": 0, "Class2": 5) 

>>> print(cm.recommended_list)
["Kappa", "SOA1(Landis & Koch)", "SOA2(Fleiss)", "SOA3(Altman)", "SOA4(Cicchetti)", "CEN", "MCEN", "MCC", "J", "Overall J", "Overall MCC", "Overall CEN", "Overall MCEN", "AUC", "AUCI", "G", "DP", "DPI", "GI"]

After that, each of these parameters you want to use as the loss function can be used as follows:

>>> y_pred = model.predict #the prediction of the implemented model

>>> y_actu = data.target #data labels

>>> cm = ConfusionMatrix(y_actu, y_pred)

>>> loss = cm.Kappa #or any other parameter (Example: cm.SOA1)

answered Mar 18 at 15:09

Alireza Zolanvari

35716

There are many useful metrics which were introduced for evaluating the performance of classification methods for imbalanced data-sets. Some of them are Kappa, CEN, MCEN, MCC, and DP.

Disclaimer:

If you use python, PyCM module can help you to find out these metrics.

Here is a simple code to get the recommended parameters from this module:

>>> from pycm import *

>>> cm = ConfusionMatrix(matrix="Class1": "Class1": 1, "Class2":2, "Class2": "Class1": 0, "Class2": 5) 

>>> print(cm.recommended_list)
["Kappa", "SOA1(Landis & Koch)", "SOA2(Fleiss)", "SOA3(Altman)", "SOA4(Cicchetti)", "CEN", "MCEN", "MCC", "J", "Overall J", "Overall MCC", "Overall CEN", "Overall MCEN", "AUC", "AUCI", "G", "DP", "DPI", "GI"]

After that, each of these parameters you want to use as the loss function can be used as follows:

>>> y_pred = model.predict #the prediction of the implemented model

>>> y_actu = data.target #data labels

>>> cm = ConfusionMatrix(y_actu, y_pred)

>>> loss = cm.Kappa #or any other parameter (Example: cm.SOA1)

answered Mar 18 at 15:09

Alireza Zolanvari

35716

answered Mar 18 at 15:09

Alireza Zolanvari

35716

answered Mar 18 at 15:09

Alireza Zolanvari

35716

answered Mar 18 at 15:09

Alireza Zolanvari

35716

$begingroup$
Thank for the detailed explanation. I am going to try what you said.
$endgroup$
– Ram
Mar 18 at 17:45

$begingroup$
Please do not hesitate to contact me if you need any further information.
$endgroup$
– Alireza Zolanvari
Mar 18 at 18:16

add a comment |

$begingroup$
Thank for the detailed explanation. I am going to try what you said.
$endgroup$
– Ram
Mar 18 at 17:45

$begingroup$
Please do not hesitate to contact me if you need any further information.
$endgroup$
– Alireza Zolanvari
Mar 18 at 18:16

Thank for the detailed explanation. I am going to try what you said.

– Ram
Mar 18 at 17:45

Please do not hesitate to contact me if you need any further information.

– Alireza Zolanvari
Mar 18 at 18:16

add a comment |

Note that there is a large fluctuation in the errors after each run.

I have changed

class_weight = class_weight.compute_class_weight('balanced', np.unique(y[:,-1]),y[:,-1])

class_weight = np.array([1000, 1])

which resulted in val_loss around (0.09, 0.15)

and to

class_weight = np.array([1, 1000])

which resulted in val_loss around (0.06, 0.1)

edited Mar 18 at 16:10

answered Mar 18 at 11:51

Esmailian

1,641114

Please correct me if I missed something in your comment, but the whole point is to get high accuracy on an imbalanced dataset. There is no problem if the data is balanced.

– Ram
Mar 18 at 12:08

@Ram updated the answer

– Esmailian
Mar 18 at 14:59

Thanks. I will run it multiple times and see how the results change.

– Ram
Mar 18 at 17:54
