Confusion in applying k-fold cross validation to dataset



I have a data set that is already divided into 10 folds, with each fold containing its own training, validation, and test sets. I am not able to work out how to apply 10-fold cross-validation to this data set.



In general, if we want to apply k-fold cross-validation to a data set, the procedure is as follows:



[Figure: diagram of the standard k-fold cross-validation procedure]



In my case, the data set is already divided into 10 folds, and each fold contains validation and test sets in addition to the training set. It would be helpful if someone could explain how to perform 10-fold cross-validation on this kind of data set.










machine-learning






asked Mar 27 at 16:00 by Kalyan Katikapalli
  • Welcome to this site! If you want to do k-fold CV on these k folds, ignore the inner training/validation/test separations, do the CV, then report the test score. Otherwise, ask yourself why you cannot ignore the inner separations and merge them; the answer to that question is key and depends on your specific case.
    – Esmailian, Mar 27 at 16:11
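As a concrete illustration of this suggestion (an editorial sketch, not part of the original post): merge the inner train/validation/test splits of every pre-made fold and then run an ordinary 10-fold cross-validation with scikit-learn. Here, load_fold is a hypothetical helper that stands in for however the folds are actually stored.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def load_fold(i):
        """Hypothetical loader: returns fold i as (X, y) with its inner
        train, validation and test parts already concatenated."""
        raise NotImplementedError

    # Ignore the inner separations: merge the 10 pre-made folds into one dataset.
    X_parts, y_parts = zip(*(load_fold(i) for i in range(10)))
    X, y = np.concatenate(X_parts), np.concatenate(y_parts)

    # Plain 10-fold CV on the merged data; report the mean test score.
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
    print("mean 10-fold test score:", scores.mean())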











1 Answer

In 10-fold cross-validation, you split your dataset into 10 sections; in each iteration, 9 of them are used for training and one is used as the test set (there is no validation set). For example, if your dataset has 100 samples, then inside a loop, in the first fold (first iteration) the model trains on 90 samples and the remaining 10 are used to test it; the loop continues until all of the data has been used for both training and testing.



For more details, see here.



In Python, you can implement 10-fold cross-validation using the scikit-learn library (see here).
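For illustration (an editorial sketch, not from the original answer), here is a minimal scikit-learn example of the procedure just described, on a synthetic 100-sample dataset: each of the 10 iterations trains on 90 samples and tests on the remaining 10.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold

    # Synthetic 100-sample dataset, purely for illustration.
    X, y = make_classification(n_samples=100, n_features=5, random_state=0)

    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in kf.split(X):
        model = LogisticRegression(max_iter=1000)
        model.fit(X[train_idx], y[train_idx])                  # 90 training samples
        scores.append(model.score(X[test_idx], y[test_idx]))   # 10 test samples

    print(sum(scores) / len(scores))  # average accuracy over the 10 folds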



Now, because your dataset is already split into 10 folds, you have two choices:



1. The easiest way is to merge your dataset into one set and then use a library to do the 10-fold cross-validation for you.



2. Write the loop yourself over your 10 pre-made folds: in the first iteration, use the first fold for testing and the remaining 9 for training; in the second iteration, use the second fold for testing and the other 9 for training; and so on, so that after 10 iterations every fold has been used once for testing (a sketch of this loop is given below).
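A minimal sketch of this manual loop (an editorial addition, assuming the 10 folds are available as a list of (X, y) pairs whose inner train/validation/test parts have already been merged) might look like this:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def manual_10_fold_cv(folds):
        """folds: list of ten (X_i, y_i) pairs, inner splits already merged."""
        scores = []
        for i, (X_test, y_test) in enumerate(folds):
            # Training data = every fold except fold i, concatenated.
            X_train = np.concatenate([X for j, (X, _) in enumerate(folds) if j != i])
            y_train = np.concatenate([y for j, (_, y) in enumerate(folds) if j != i])
            model = LogisticRegression(max_iter=1000)
            model.fit(X_train, y_train)
            scores.append(model.score(X_test, y_test))  # accuracy on the held-out fold
        return float(np.mean(scores))

With the data set from the question, folds would have length 10 and the returned value would be the average test accuracy over the 10 rounds.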



This is the idea behind 10-fold cross-validation; if it is not applicable to your dataset, then 10-fold cross-validation is probably not a good choice for your case.






answered Mar 27 at 16:35 by honar.cs (edited Mar 28 at 13:02)












  • The data set is already split into 10 folds, with each fold internally split into train, test, and validation sets. In this case, how do I apply 10-fold cross-validation? @honar.cs
    – Kalyan Katikapalli, Mar 27 at 16:38











  • The answer you gave applies to the case where each fold does not have an internal split into train, test, and validation sets.
    – Kalyan Katikapalli, Mar 27 at 16:40










  • I can ignore the internal split and apply CV. The question here is: is there any other strategy for handling these kinds of datasets?
    – Kalyan Katikapalli, Mar 28 at 2:11










  • The answer has been updated; see if it helps you.
    – honar.cs, Mar 28 at 13:06










