
Validation vs. test vs. training accuracy. Which one should I compare for claiming overfit?


I have read in several answers here and elsewhere on the Internet that cross-validation helps to indicate whether a model will generalize well and whether it is overfitting.

But I am confused about which two of the training/validation/test accuracies (or errors) I should compare to tell whether the model is overfitting.

For example:

I split my data into 70% training and 30% test.

When I run 10-fold cross-validation, I get 10 accuracies that I can average. Should I call this mean the validation accuracy?

Afterward, I test the model on the 30% test data and get a test accuracy.

In this case, what is the training accuracy? And which two accuracies should I compare to see whether the model is overfitting?
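For concreteness, here is a minimal sketch of the workflow described above (assuming scikit-learn; the dataset and classifier are just placeholders):

from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
# 70% training / 30% test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = SVC(kernel='linear', C=1)

# 10-fold cross-validation on the 70% portion gives ten held-out-fold accuracies
cv_scores = cross_val_score(clf, X_train, y_train, cv=10)
print('Mean of the 10 CV accuracies:', cv_scores.mean())

# Fit on the full 70% and evaluate once on the 30% test set
clf.fit(X_train, y_train)
print('Test accuracy:', clf.score(X_test, y_test))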










Tags: machine-learning, cross-validation, accuracy, overfitting






asked Mar 13 at 19:14 by A.B · edited Mar 14 at 12:32 by Peter Mortensen
          2 Answers



















Which two accuracies should I compare to see if the model is overfitting or not?




You should compare the training and test accuracies to identify over-fitting. A training accuracy that is (subjectively) far higher than the test accuracy indicates over-fitting.



          Here, "accuracy" is used in a broad sense, it can be replaced with F1, AUC, error (increase becomes decrease, higher becomes lower), etc.



          I suggest "Bias and Variance" and "Learning curves" parts of "Machine Learning Yearning - Andrew Ng". It presents plots and interpretations for all the cases with a clear narration.




When I run 10-fold cross-validation, I get 10 accuracies that I can average. Should I call this mean the validation accuracy?




No. It is an [estimate of the] test accuracy.

The difference between validation and test sets (and their corresponding accuracies) is that the validation set is used to build/select a better model, meaning it affects the final model. However, since 10-fold CV always tests an already-built model on its 10% held-out fold, and it is not used here to select between models, that 10% held-out fold is a test set, not a validation set.




Afterward, I test the model on the 30% test data and get a test accuracy.




If you don't use the K-fold CV to select between multiple models, this part is not needed; run K-fold CV on 100% of the data to get the test accuracy. Otherwise, you should keep this test set, since the result of the K-fold CV would then be a validation accuracy.




In this case, what is the training accuracy?




From each of the 10 folds you get a test accuracy on 10% of the data and a training accuracy on 90% of the data. In Python, the function cross_val_score only calculates the test accuracies. Here is how to calculate both:



from sklearn import model_selection
from sklearn import datasets
from sklearn import svm

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)
# return_train_score=True records, for each fold, the accuracy on the training part as well
scores = model_selection.cross_validate(clf, iris.data, iris.target, cv=5, return_train_score=True)
print('Train scores:')  # one accuracy per fold, computed on the part used for fitting
print(scores['train_score'])
print('Test scores:')   # one accuracy per fold, computed on the held-out part
print(scores['test_score'])


          Set return_estimator = True to get the trained models too.
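As a follow-up usage note (an addition for illustration, not part of the original answer): averaging the per-fold scores from cross_validate gives the two numbers to compare for over-fitting.

import numpy as np
from sklearn import datasets, model_selection, svm

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)
scores = model_selection.cross_validate(clf, iris.data, iris.target,
                                        cv=5, return_train_score=True)

# Compare the averaged fold scores: a mean training accuracy far above the
# mean test accuracy is the over-fitting signal discussed in this answer.
print('Mean train accuracy:', np.mean(scores['train_score']))
print('Mean test accuracy:',  np.mean(scores['test_score']))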



More on the validation set



A validation set shows up in two general cases: (1) building a model, and (2) selecting between multiple models.



1. Two examples of building a model: we (a) stop training a neural network, or (b) stop pruning a decision tree, when the accuracy of the model on the validation set starts to decrease. Then, we test the final model on a held-out set to get the test accuracy.


2. Two examples of selecting between multiple models: we (a) do K-fold CV on an SVM and a decision tree (to get K models for each), or (b) apply two already-built SVM and decision tree models to a validation set; then we select the model with the highest validation accuracy (among the 2K models in (a), or the 2 models in (b)). Finally, we test the selected model on a held-out set to get the test accuracy. A sketch of case (2) follows below.
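To make case (2) concrete, here is a minimal sketch (an illustration added here, not the original answer's code, assuming scikit-learn; the two estimators and their parameters are placeholders) of choosing between two candidate models with K-fold CV on the training portion and then scoring the winner once on a held-out test set:

from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

candidates = {
    'svm': SVC(kernel='linear', C=1),
    'tree': DecisionTreeClassifier(max_depth=3),
}

# The per-fold scores act as validation accuracies here, because they are
# used to choose between the two models.
cv_means = {name: cross_val_score(est, X_train, y_train, cv=10).mean()
            for name, est in candidates.items()}
best_name = max(cv_means, key=cv_means.get)

# Refit the winner on all training data; touch the held-out test set only once.
best_model = candidates[best_name].fit(X_train, y_train)
print(best_name, 'test accuracy:', best_model.score(X_test, y_test))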






answered Mar 13 at 20:14 by Esmailian

Comments:

– Ben Reiniger (Mar 13 at 20:21): I think I disagree with "30% test set not needed." If you are using CV to select a better model, then you are exposing the test folds (which I would call a validation set in this case) and risk overfitting there. The final test set should remain untouched (by both you and your algorithms) until the end, to estimate the final model performance (if that's something you need). But yes, while model-building, the (averaged) training fold score vs. the (averaged) validation fold score is what you're looking at for an overfitting indication.

– Esmailian (Mar 13 at 20:23): @BenReiniger You are right; I should clarify this case.

– A.B (Mar 13 at 22:22): @Esmailian Is train_score also an average of 10 scores? Also, to do a similar kind of thing with GridSearchCV (in case hyperparameter tuning and cross-validation are required in one step), can we use return_train_score=True? Is it the same?

– Esmailian (Mar 13 at 22:27): @A.B It is an array; it needs to be averaged. return_train_score=True or =False only changes the returned report; the underlying result is the same.

– A.B (Mar 13 at 22:32): Okay, thanks. I am accepting the answer, as "which accuracy is to be used" makes sense. But is it possible for you to elaborate more on "the validation set is used to build/select a better model (e.g. avoid over-fitting)" vs., in your case, "10-fold CV tests an already-built model", for me and future readers?


















Cross-validation splits your data into K folds. Each fold gives you a set of training data and a set of test data. You are correct that you get K different error rates that you then take the mean of; these error rates come from the test set of each of your K folds. If you want the training error rate, you would calculate the error rate on the training part of each of these K folds and then take the average, as in the sketch below.
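A minimal sketch of this (added for illustration, assuming scikit-learn; the classifier is a placeholder), computing both the training and test accuracy within each fold and averaging:

import numpy as np
from sklearn import datasets
from sklearn.model_selection import KFold
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
train_scores, test_scores = [], []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    clf = SVC(kernel='linear', C=1).fit(X[train_idx], y[train_idx])
    train_scores.append(clf.score(X[train_idx], y[train_idx]))  # training part of the split
    test_scores.append(clf.score(X[test_idx], y[test_idx]))     # held-out part of the split
print('Mean training accuracy:', np.mean(train_scores))
print('Mean test (CV) accuracy:', np.mean(test_scores))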






answered Mar 13 at 19:32 by astel

Comments:

– A.B (Mar 13 at 22:20): Thank you for the answer.









