How to favour a particular class during classification using XGBoost?


I am using a simple XGBoost model to classify two classes (0 and 1) in a binary setting. In the original data, 0 is the majority class and 1 the minority class. What is happening is that most 0s are classified correctly (with some spilling into 1s), but most 1s are misclassified as 0s.



I am fairly new to this, and after looking at various documentation and questions on SE, I am confused about how to make my XGBoost model favour class 1. To be precise: if some 0s are misclassified as 1s, that is not a problem, but I want most 1s to be correctly classified as 1s, i.e. to increase the true positives; a high number of false positives is acceptable. The code I am presently using to train and test the XGBoost model follows (afterwards I inspect the confusion matrix, in which most of the actual 1s are misclassified as 0s).



from xgboost import XGBClassifier

# fit model on training data
model = XGBClassifier()
model.fit(X_train, labels)  # labels are either 1s or 0s

# make predictions for test data
# predict() returns class labels, so the 0.70 cut-off must be applied to the
# class-1 probabilities from predict_proba() instead; note that a threshold
# above 0.5 makes predictions of 1 rarer, not more common
y_prob = model.predict_proba(X_test)[:, 1]
y_pred = (y_prob > 0.70).astype(int)

print(y_pred)


I just want to know whether there is a simple parameter I can set on the XGBoost model in my code so that the true positive rate increases. I can live with a high number of false positives, but I want most 1s to be correctly classified as 1s instead of ending up as 0s. Any help in this regard is appreciated.



UPDATE:



I have now tried scale_pos_weight in XGBoost, with its value set to 0.70 (a random figure), but the model is still assigning most samples to 0 instead of 1.










machine-learning python bigdata xgboost

asked 2 days ago by JChat (edited yesterday)

          1 Answer



















XGBoost has the scale_pos_weight parameter to help with this, depending on how you want to evaluate it (see the tuning notes). It should be set to the ratio of the negative count to the positive count (or its inverse, depending on how your classes are indexed).



          An example in Python is here.
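
For instance, a minimal sketch (the variable names follow the question's code, and the counts in the comment are made up for illustration):

    from xgboost import XGBClassifier

    # ratio of negative to positive training examples, assuming `labels`
    # is a NumPy array of 0s and 1s; e.g. 900 zeros and 100 ones give 9.0
    ratio = float((labels == 0).sum()) / (labels == 1).sum()

    model = XGBClassifier(scale_pos_weight=ratio)  # up-weights errors on class 1
    model.fit(X_train, labels)

Since 0 is the majority class here, this ratio is greater than 1; a value below 1, such as the 0.70 tried in the question's update, actually down-weights the positive class and pushes predictions further toward 0.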






answered 2 days ago by wwwslinger, a new contributor (edited yesterday)

• Thanks a lot for your answer. It would be great if you could give a small example of using the ratio of negative count to positive count. Is it a fractional value in that sense? A one-line example of using it within fit() would also help. – JChat, yesterday

• Also, unfortunately I couldn't find the use of scale_pos_weight in Python; the documentation only mentions it for R. xgboost.readthedocs.io/en/latest/python/… is the Python page, but I am unable to understand how to use it in the current context. – JChat, yesterday

• The docs reference examples in Python, but I added a link to one in my answer. – wwwslinger, yesterday

• Happy to accept your answer. However, I have now tried scale_pos_weight in XGBoost with its value set to 0.70 (a random figure), but it is still assigning most samples to 0 instead of 1. Any suggestions? 0 is the majority class and 1 the minority one, and I want to maximise the number of 1s predicted as 1s, even if that leads to false positives. – JChat, yesterday

• The value should be representative of the class distribution. See the example, try inverting the ratio, and try whole numbers. I think some examples I've seen used 9 when one class was nine times more prevalent. – wwwslinger, yesterday
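
To make that last figure concrete, a tiny sketch with made-up counts:

    import numpy as np

    # made-up labels: 900 zeros and 100 ones, i.e. a 9:1 imbalance
    labels = np.array([0] * 900 + [1] * 100)

    neg, pos = np.bincount(labels)
    print(neg / pos)  # 9.0 -> scale_pos_weight=9, as in the comment above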









