How to favour a particular class during classification using XGBoost?
I am using a simple XGBoost model for binary classification with two classes (0 and 1). In the original data, 0 is the majority class and 1 the minority class. What is happening is that most 0s are classified correctly (though some land in class 1), while most 1s are misclassified as 0s.
I am fairly new to this, and after looking through various docs and questions on SE I am still confused about how to make my XGBoost model favour class 1. To be precise: if some 0s are misclassified as 1s, that is not a problem, but I want most 1s to be correctly classified as 1s (i.e. to increase the true positives; extra false positives are not much of a problem). The code I currently use to train and test the model is below; afterwards I build a confusion matrix, in which most of the actual 1s end up misclassified as 0s.
from xgboost import XGBClassifier

# fit model on training data
model = XGBClassifier()
model.fit(X_train, labels)  # labels are either 1s or 0s

# make predictions for test data
# note: predict() returns hard class labels, so the probability threshold
# has to be applied to predict_proba() instead
y_prob = model.predict_proba(X_test)[:, 1]  # P(class 1)
y_pred = (y_prob > 0.70).astype(int)  # account for > 0.70 probability
print(y_pred)
I just want to know whether there is a simple parameter I can set in this code so that the true positive rate increases. I can live with a high number of false positives, but I want the 1s to be correctly classified as 1s instead of most of them going into the 0s. Any help in this regard is appreciated.
UPDATE:
I have now tried scale_pos_weight in XGBoost, with its value set to 0.70 (an arbitrary figure), but it still assigns most samples to 0 instead of 1.
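For reference, the confusion matrix mentioned above is produced along these lines (a minimal sketch; it assumes scikit-learn is available and that y_test holds the true test labels, which the original snippet does not show):

from sklearn.metrics import confusion_matrix

# rows are actual classes, columns are predicted classes,
# so cm[1, 0] counts the actual 1s that were misclassified as 0s
cm = confusion_matrix(y_test, y_pred)
print(cm)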
machine-learning python bigdata xgboost
asked 2 days ago by JChat, edited yesterday
1 Answer
XGBoost has the scale_pos_weight parameter to help with this, depending on how you want to evaluate it (see the tuning notes). It should be the ratio of the negative count to the positive count (or the inverse, depending on how you indexed your classes).
An example in Python is here.
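For instance, a minimal sketch of setting the parameter from the training labels (assuming labels is a NumPy array of 0/1 training labels, as in the question):

import numpy as np
from xgboost import XGBClassifier

# ratio of negative (majority) to positive (minority) examples
n_neg = np.sum(labels == 0)
n_pos = np.sum(labels == 1)

# up-weights the positive class during training
model = XGBClassifier(scale_pos_weight=n_neg / n_pos)
model.fit(X_train, labels)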
answered 2 days ago by wwwslinger (New contributor), edited yesterday

– JChat (yesterday): Thanks a lot for your answer. It would be great if you could give a small example of using the ratio of negative count to positive count. Is it a fractional value in that sense? A one-line example of using it within fit() would be helpful.

– JChat (yesterday): Also, I unfortunately couldn't find the use of scale_pos_weight for Python; the documentation only seems to mention it for R. xgboost.readthedocs.io/en/latest/python/… is the Python page, but I am unable to work out how to use it in the current context.

– wwwslinger (yesterday): The docs reference examples in Python, but I added a link to one in my answer.

– JChat (yesterday): Happy to accept your answer. However, I have now tried scale_pos_weight in XGBoost, with its value set to 0.70 (an arbitrary figure), but it still assigns most samples to 0 instead of 1. Any suggestions? 0 is the majority class and 1 the minority one, and I want to maximise the number of 1s predicted correctly, even if it leads to false positives.

– wwwslinger (yesterday): The value should be representative of the class distribution. See the example, try inverting the ratio, and try whole numbers. I think some examples I've seen used 9 when one class was nine times more prevalent.
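To make that last comment concrete, a hypothetical worked example (the 900/100 split below is invented for illustration):

# suppose the training data had 900 negatives (0s) and 100 positives (1s)
n_neg, n_pos = 900, 100

# negatives-to-positives ratio: up-weights the minority class
print(n_neg / n_pos)  # 9.0 -> scale_pos_weight=9

# by contrast, a value below 1 (such as the 0.70 tried above) DOWN-weights
# the positive class, which pushes even more predictions toward 0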