Dealing with biased binary classifier


My training data is skewed toward the '1' class, with roughly a 4:6 ratio of '0' to '1'. Training on it gives a classifier with about 82% accuracy that leans toward predicting '1', which makes sense.

Confusion matrix:
[[333 133]
 [ 62 612]]
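
As a quick check of these numbers, here is a small sketch that derives the accuracy and per-class recall from the matrix. It assumes the scikit-learn layout (rows are true classes, columns are predicted classes), which the question does not state explicitly.

```python
import numpy as np

# Confusion matrix from the question.
# Assumption: rows = true class (0, 1), columns = predicted class (0, 1).
cm = np.array([[333, 133],
               [62, 612]])

# Correct predictions sit on the diagonal.
accuracy = np.trace(cm) / cm.sum()

# Recall per class: correct predictions divided by the true count of that class.
recall_per_class = np.diag(cm) / cm.sum(axis=1)

# Share of each class among the true labels and among the predictions.
true_share = cm.sum(axis=1) / cm.sum()
predicted_share = cm.sum(axis=0) / cm.sum()

print(f"accuracy: {accuracy:.3f}")
print(f"recall for class 0 and class 1: {recall_per_class.round(3)}")
print(f"true class shares: {true_share.round(3)}")
print(f"predicted class shares: {predicted_share.round(3)}")
```

Under that assumption, the accuracy comes out just under 0.83, and class '1' is recalled far more often (about 0.91) than class '0' (about 0.71), which matches the described lean toward '1'.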

I also know the class proportions of the data the classifier will be tested on: about 0.3 for '1' and 0.7 for '0', or 1900 0s and 900 1s. My classifier outputs 1400 1s and 1300 0s.
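
To see why the predictions skew toward '1' on the test set, the following rough sketch applies the per-class rates from the confusion matrix to the stated test counts. It assumes the matrix rows are true classes and that the error rates carry over to the test data unchanged. Both are assumptions, not facts from the question.

```python
# Per-class rates implied by the confusion matrix
# (assumption: rows are true classes, columns are predictions).
tpr = 612 / (62 + 612)    # share of true 1s predicted as 1
fpr = 133 / (333 + 133)   # share of true 0s predicted as 1

# Test-set class counts stated in the question.
n_zeros, n_ones = 1900, 900

# Predicted 1s come from correctly labelled 1s plus mislabelled 0s.
expected_pred_ones = n_ones * tpr + n_zeros * fpr
expected_pred_zeros = (n_ones + n_zeros) - expected_pred_ones

print(f"expected predicted 1s: {expected_pred_ones:.0f}")
print(f"expected predicted 0s: {expected_pred_zeros:.0f}")
```

With these rates, roughly 1360 predictions of '1' would be expected even though only 900 test rows are truly '1', because false positives on the much larger '0' class add up. That is in the same range as the 1400 reported.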

My theory is that I need to build a classifier that favours the '0' class. If so, how can I make the classifier biased toward one class over another?

I have tried using class weights; this does increase the number of '0' predictions, but only by a very small percentage.
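
For reference, a minimal sketch of what class weighting can look like, assuming scikit-learn is in use (the question does not say which library or model was used). The synthetic data, the `LogisticRegression` model, and the 3:1 weighting are hypothetical choices for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data with roughly a 4:6 ratio of class 0 to class 1.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.4, 0.6], random_state=0)

# Baseline model: every sample counts equally.
baseline = LogisticRegression(max_iter=1000).fit(X, y)

# Hypothetical weighting: mistakes on class 0 cost three times as much.
weighted = LogisticRegression(max_iter=1000,
                              class_weight={0: 3.0, 1: 1.0}).fit(X, y)

# Count how many samples each model assigns to class 0.
zeros_baseline = int(np.sum(baseline.predict(X) == 0))
zeros_weighted = int(np.sum(weighted.predict(X) == 0))
print(f"predicted 0s, equal weights: {zeros_baseline}")
print(f"predicted 0s, class 0 upweighted: {zeros_weighted}")
```

How far the counts move depends on the weight ratio and on how much the classes overlap, so a small shift with mild weights is not surprising.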

edited Apr 3 at 16:20 by Tasos

asked Apr 3 at 15:47 by Nick


machine-learning classification class-imbalance


1 Answer

What you have in your data is called imbalanced classes.

From DataCamp:

Imbalanced data typically refers to classification tasks where the
classes are not represented equally.

For example, you may have a binary classification problem with 100
instances out of which 80 instances are labeled with Class-1, and the
remaining 20 instances are marked with Class-2.

This article explains in more detail what it is and how you can handle it: https://www.datacamp.com/community/tutorials/diving-deep-imbalanced-data

One solution is over- or under-sampling. For over-sampling, you can use the SMOTE algorithm, which synthesizes new samples of the minority class. Here is an example in Python.

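import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

# X (features) and y (labels) are assumed to be loaded already.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Oversample only the training split so the test split stays untouched.
# fit_resample replaces fit_sample, which was removed from imbalanced-learn.
sm = SMOTE(random_state=2)
X_train_res, y_train_res = sm.fit_resample(X_train, np.ravel(y_train))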
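
For intuition about what resampling does to the class counts, here is a minimal sketch of plain random over-sampling using only NumPy. It is a simpler stand-in, not SMOTE: it duplicates existing minority rows instead of synthesizing new ones. The toy data and the `random_oversample` helper are hypothetical.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Draw the missing rows (with replacement) from this class only.
        extra = rng.choice(idx, size=target - count, replace=True)
        keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]

# Toy data with a 4:6 ratio of class 0 to class 1.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

X_res, y_res = random_oversample(X, y)
print(np.bincount(y_res))  # [6 6]
```

As with SMOTE, this would be applied to the training split only, so that evaluation still reflects the real class proportions.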

answered Apr 3 at 16:22 by Tasos
