Doubt to use accuracy or macro f1 measure in an unbalanced classification task


I have a multi-class classification task, and the organizers said that the final results will be evaluated using accuracy.

The provided data is unbalanced, and I have no idea whether the test set is balanced or not, but I think it will be balanced since they use accuracy.

My question: since the training data is unbalanced, is it a good idea to tune my system using macro-F1 rather than accuracy? Or is it better to use accuracy?







classification unbalanced-classes evaluation






edited 2 days ago by Alireza Zolanvari
asked Dec 5 '18 at 14:36 by Ghanem











  • It's better to use F1. Will the organizers evaluate your whole process or only your results? I'd tune my model with F1 and then report its accuracy.
    – ignatius
    Dec 5 '18 at 15:48

  • Also, they might want to evaluate how you approach the problem. The reason for providing an unbalanced dataset with accuracy as the metric might be to check whether you notice the problems with that and how you handle them, for example by balancing your data in some way.
    – ignatius
    Dec 5 '18 at 15:50

  • Only my results. Thanks for the suggestion.
    – Ghanem
    Dec 5 '18 at 16:18

  • Well, then nothing prevents you from tuning the model with a metric of your choice. Good luck!
    – ignatius
    Dec 5 '18 at 16:20
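
One way to read the "balancing your data in some way" suggestion is random oversampling of the minority classes. The following is only a minimal sketch under that assumption: the function name `random_oversample`, the toy samples, and the labels are all made up for illustration, and other balancing strategies (undersampling, class weights) may suit a given task better.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Size of the largest class; every other class is padded up to it.
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy, made-up training set: 8 samples of class "a", 3 of "b", 1 of "c".
samples = list(range(12))
labels = ["a"] * 8 + ["b"] * 3 + ["c"]

balanced_x, balanced_y = random_oversample(samples, labels)
print(Counter(balanced_y))
```

Resampling of this kind is normally applied to the training split only, so that validation and test scores still reflect the original class distribution.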
















2 Answers
Using accuracy on unbalanced data means that correctly classifying members of the most populous class counts for more than correctly classifying the others. If every class is equally important in your problem, accuracy is one of the worst choices.

There are other good choices besides macro-F1 that can be more helpful. Some of these metrics are:

"Kappa", "SOA1(Landis & Koch)", "SOA2(Fleiss)", "SOA3(Altman)", "SOA4(Cicchetti)", "CEN",
"MCEN", "MCC", "J", "Overall J", "Overall MCC", "Overall CEN", "Overall MCEN", "AUC",
"AUCI", "G", "DP", "DPI", "GI"

Disclaimer:

If you use Python, the PyCM module can help you compute these metrics.

Here is a simple example that gets the recommended parameters from this module:



>>> from pycm import ConfusionMatrix
>>> cm = ConfusionMatrix(matrix={"Class1": {"Class1": 1, "Class2": 2}, "Class2": {"Class1": 0, "Class2": 5}})
>>> print(cm.recommended_list)
["Kappa", "SOA1(Landis & Koch)", "SOA2(Fleiss)", "SOA3(Altman)", "SOA4(Cicchetti)", "CEN", "MCEN", "MCC", "J", "Overall J", "Overall MCC", "Overall CEN", "Overall MCEN", "AUC", "AUCI", "G", "DP", "DPI", "GI"]


After that, any of these parameters can be used as the score to optimize, as follows:

>>> y_pred = model.predict(data.data)  # predictions of the fitted model (assuming data.data holds the features)
>>> y_actu = data.target  # true labels
>>> cm = ConfusionMatrix(actual_vector=y_actu, predict_vector=y_pred)
>>> score = cm.Kappa  # or any other parameter (example: cm.SOA1)
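
To make the first metric in that list less of a black box, here is a rough sketch of Cohen's kappa computed by hand from the same two-class confusion matrix used above. It assumes the outer keys are the actual classes and the inner keys the predicted ones; the helper name `cohen_kappa` is made up for this illustration and is not part of PyCM.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a nested dict: matrix[actual][predicted] = count."""
    classes = list(matrix)
    total = sum(matrix[a][p] for a in classes for p in classes)

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[c][c] for c in classes) / total

    # Agreement expected by chance: for each class, multiply its
    # actual count (row sum) by its predicted count (column sum).
    expected = sum(
        sum(matrix[c][p] for p in classes) * sum(matrix[a][c] for a in classes)
        for c in classes
    ) / total ** 2

    return (observed - expected) / (1 - expected)


matrix = {
    "Class1": {"Class1": 1, "Class2": 2},
    "Class2": {"Class1": 0, "Class2": 5},
}
kappa = cohen_kappa(matrix)
print(round(kappa, 4))
```

Here the plain accuracy is 6/8 = 0.75, but kappa is noticeably lower because much of that agreement would be expected by chance when one class dominates the predictions.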






answered Mar 11 at 7:37 by Alireza Zolanvari





















You should definitely use macro-averaged F1, as accuracy can be heavily biased by the majority class. F1 is the harmonic mean of precision and recall, so it gives a trade-off measure that accounts for both what has been predicted correctly and what has not.
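
As a small illustration of that bias, the sketch below compares accuracy and macro-averaged F1 for a degenerate classifier that always predicts the majority class. The labels are made up, and treating precision, recall, or F1 as 0 when the denominator is 0 is a convention assumed here, not something stated in the answer.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)

    scores = []
    for c in classes:
        # Assumed convention: a score with a zero denominator counts as 0.
        precision = true_pos[c] / pred_count[c] if pred_count[c] else 0.0
        recall = true_pos[c] / true_count[c] if true_count[c] else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Made-up unbalanced labels: 90 of class "a", 5 of "b", 5 of "c".
y_true = ["a"] * 90 + ["b"] * 5 + ["c"] * 5
# A degenerate model that always predicts the majority class.
y_pred = ["a"] * 100

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}  macro-F1={mf1:.3f}")
```

With these labels the accuracy is 0.90 while macro-F1 is only about 0.32, because the two minority classes each contribute an F1 of 0 to the average.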







answered Mar 11 at 8:23 by 3nomis


























