Dealing with biased binary classifier
$begingroup$
My training data is weighted toward the '1' class, at roughly a 4:6 ratio of '0' to '1'. The resulting classifier has 82% accuracy and leans toward the '1' class, which makes sense.
Confusion Matrix -
[[333 133]
[ 62 612]]
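As a quick check of the figures above, the accuracy and per-class recall can be recomputed from the matrix. This is only a sketch: it assumes rows are true classes and columns are predicted classes (the scikit-learn layout), which the matrix itself does not state.

```python
# Confusion matrix from the question.
# Assumption: rows = true class, columns = predicted class.
cm = [[333, 133],
      [62, 612]]

total = sum(sum(row) for row in cm)

# Accuracy: correctly classified samples over all samples.
accuracy = (cm[0][0] + cm[1][1]) / total

# Recall per class: correct predictions divided by the true count of that class.
recall_0 = cm[0][0] / sum(cm[0])
recall_1 = cm[1][1] / sum(cm[1])

# Share of samples that truly are '1' versus share predicted as '1'.
share_true_1 = sum(cm[1]) / total
share_pred_1 = (cm[0][1] + cm[1][1]) / total

print(f"accuracy={accuracy:.3f} recall_0={recall_0:.3f} recall_1={recall_1:.3f}")
print(f"true '1' share={share_true_1:.3f} predicted '1' share={share_pred_1:.3f}")
```

Under that assumed layout, recall on '1' comes out noticeably higher than recall on '0', which matches the lean toward '1' described above.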
I also know the class proportions of the data the classifier will be tested on: 0.3 of '1' and 0.7 of '0', or about 1900 0s and 900 1s. My classifier outputs 1400 1s and 1300 0s.
My theory is that I need to build a classifier that favours '0'. If so, how can I make the classifier biased toward one class over another?
I have tried using class weights; this does increase the number of '0' predictions, but only by a very small percentage.
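To make "favouring '0'" concrete, here is a rough sketch of one possible approach: rescaling the predicted probability of '1' from the training prior to the known test prior before thresholding. It assumes the classifier outputs reasonably calibrated probabilities and that only the class proportions differ between training and test data. The function name `adjust_for_new_prior` and the probability values are hypothetical, chosen for illustration.

```python
def adjust_for_new_prior(p_one, train_prior_one, test_prior_one):
    """Rescale P(class 1) from the training prior to the expected test prior."""
    num = p_one * test_prior_one / train_prior_one
    den = num + (1.0 - p_one) * (1.0 - test_prior_one) / (1.0 - train_prior_one)
    return num / den

train_prior_one = 0.6  # roughly 60% of training labels are '1'
test_prior_one = 0.3   # roughly 30% of test labels are expected to be '1'

# Hypothetical predicted probabilities of class '1' (illustrative values only).
proba_of_one = [0.15, 0.40, 0.55, 0.62, 0.71, 0.90]

adjusted = [adjust_for_new_prior(p, train_prior_one, test_prior_one)
            for p in proba_of_one]

# Threshold both versions at 0.5 to compare the resulting labels.
labels_before = [int(p >= 0.5) for p in proba_of_one]
labels_after = [int(p >= 0.5) for p in adjusted]

print(labels_before)
print(labels_after)
```

With these priors the adjustment is equivalent to raising the decision threshold on the original probability from 0.5 to about 0.78, so fewer samples are labelled '1'.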
machine-learning classification class-imbalance
$endgroup$
edited Apr 3 at 16:20 by Tasos
asked Apr 3 at 15:47 by Nick
1 Answer
$begingroup$
What you have in your data is called imbalanced classes.
From DataCamp:
Imbalanced data typically refers to classification tasks where the
classes are not represented equally.
For example, you may have a binary classification problem with 100
instances out of which 80 instances are labeled with Class-1, and the
remaining 20 instances are marked with Class-2.
This article explains in more detail what imbalanced data is and how you can handle it: https://www.datacamp.com/community/tutorials/diving-deep-imbalanced-data
One solution is to over-sample the minority class or under-sample the majority class. For over-sampling, you can use the SMOTE algorithm. Here is an example in Python.
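from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
# X (features) and y (labels) are assumed to be defined already.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Over-sample the minority class in the training split only; the test split stays untouched.
sm = SMOTE(random_state=2)
# fit_resample replaces the older fit_sample, which newer imbalanced-learn releases removed.
X_train_res, y_train_res = sm.fit_resample(X_train, y_train.ravel())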
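The example above covers over-sampling only. For the under-sampling side, a minimal sketch in plain Python could look like the following. The helper `random_undersample` and the toy data are hypothetical and written for illustration; imbalanced-learn also ships a `RandomUnderSampler` class in `imblearn.under_sampling` for the same purpose.

```python
import random

def random_undersample(X, y, seed=0):
    """Drop random rows of the larger classes until all classes match the smallest one."""
    rng = random.Random(seed)

    # Group row indices by class label.
    indices_by_class = {}
    for i, label in enumerate(y):
        indices_by_class.setdefault(label, []).append(i)

    # Keep as many rows per class as the smallest class has.
    n_min = min(len(idx) for idx in indices_by_class.values())
    kept = []
    for idx in indices_by_class.values():
        kept.extend(rng.sample(idx, n_min))
    kept.sort()  # preserve the original row order

    return [X[i] for i in kept], [y[i] for i in kept]

# Toy data: 4 samples of class 0 and 6 of class 1 (a 4:6 ratio).
X = [[i] for i in range(10)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

X_res, y_res = random_undersample(X, y)
print(y_res.count(0), y_res.count(1))
```

As with SMOTE, this would normally be applied to the training split only, so that the test split keeps its original class proportions.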
$endgroup$
answered Apr 3 at 16:22 by Tasos