How to favour a particular class during classification using XGBoost?
I am using a simple XGBoost model for binary classification with two classes (0 and 1). In the original data, 0 is the majority class and 1 the minority class. What is happening is that most 0s are classified correctly (though some land in class 1), while most 1s are misclassified as 0s.
I am fairly new to this, and after looking through various docs and questions on SE I am still confused about how to make my XGBoost model favour class 1. To be precise: if some 0s are misclassified as 1s, that is not a problem, but I want most 1s to be correctly classified as 1s (i.e. to increase the true positives; extra false positives are not much of a problem). The code I currently use to train and test the model is below; afterwards I build a confusion matrix, in which most of the actual 1s end up misclassified as 0s.
from xgboost import XGBClassifier

# fit model on training data
model = XGBClassifier()
model.fit(X_train, labels)  # labels are either 1s or 0s

# make predictions for test data
# note: predict() returns hard class labels, so the probability threshold
# has to be applied to predict_proba() instead
y_prob = model.predict_proba(X_test)[:, 1]  # P(class 1)
y_pred = (y_prob > 0.70).astype(int)  # account for > 0.70 probability
print(y_pred)
I just want to know whether there is a simple parameter I can set in this code so that the true positive rate increases. I can live with a high number of false positives, but I want the 1s to be correctly classified as 1s instead of most of them going into the 0s. Any help in this regard is appreciated.
UPDATE:
I have now tried scale_pos_weight in XGBoost, with its value set to 0.70 (an arbitrary figure), but it still assigns most samples to 0 instead of 1.
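For reference, the confusion matrix mentioned above is produced along these lines (a minimal sketch; it assumes scikit-learn is available and that y_test holds the true test labels, which the original snippet does not show):

from sklearn.metrics import confusion_matrix

# rows are actual classes, columns are predicted classes,
# so cm[1, 0] counts the actual 1s that were misclassified as 0s
cm = confusion_matrix(y_test, y_pred)
print(cm)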
machine-learning python bigdata xgboost
asked 2 days ago by JChat, edited yesterday
1 Answer
XGBoost has the scale_pos_weight parameter to help with this, depending on how you want to evaluate it (see the tuning notes). It should be the ratio of the negative count to the positive count (or the inverse, depending on how you indexed your classes).
An example in Python is here.
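For instance, a minimal sketch of setting the parameter from the training labels (assuming labels is a NumPy array of 0/1 training labels, as in the question):

import numpy as np
from xgboost import XGBClassifier

# ratio of negative (majority) to positive (minority) examples
n_neg = np.sum(labels == 0)
n_pos = np.sum(labels == 1)

# up-weights the positive class during training
model = XGBClassifier(scale_pos_weight=n_neg / n_pos)
model.fit(X_train, labels)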
answered 2 days ago by wwwslinger (New contributor), edited yesterday

– JChat (yesterday): Thanks a lot for your answer. It would be great if you could give a small example of using the ratio of negative count to positive count. Is it a fractional value in that sense? A one-line example of using it within fit() would be helpful.

– JChat (yesterday): Also, I unfortunately couldn't find the use of scale_pos_weight for Python; the documentation only seems to mention it for R. xgboost.readthedocs.io/en/latest/python/… is the Python page, but I am unable to work out how to use it in the current context.

– wwwslinger (yesterday): The docs reference examples in Python, but I added a link to one in my answer.

– JChat (yesterday): Happy to accept your answer. However, I have now tried scale_pos_weight in XGBoost, with its value set to 0.70 (an arbitrary figure), but it still assigns most samples to 0 instead of 1. Any suggestions? 0 is the majority class and 1 the minority one, and I want to maximise the number of 1s predicted correctly, even if it leads to false positives.

– wwwslinger (yesterday): The value should be representative of the class distribution. See the example, try inverting the ratio, and try whole numbers. I think some examples I've seen used 9 when one class was nine times more prevalent.
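To make that last comment concrete, a hypothetical worked example (the 900/100 split below is invented for illustration):

# suppose the training data had 900 negatives (0s) and 100 positives (1s)
n_neg, n_pos = 900, 100

# negatives-to-positives ratio: up-weights the minority class
print(n_neg / n_pos)  # 9.0 -> scale_pos_weight=9

# by contrast, a value below 1 (such as the 0.70 tried above) DOWN-weights
# the positive class, which pushes even more predictions toward 0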