Confusion in applying k-fold cross validation to dataset
I have a data set that is already divided into 10 folds, with each fold having its own training, validation and test sets. I am not able to understand how to apply 10-fold cross validation to this data set.
I am familiar with the general procedure for applying k-fold cross validation to a single, undivided data set.
In my case, however, the data set is already divided into 10 folds, and each fold contains validation and test sets in addition to the training set. It would be helpful if someone could guide me on how to do 10-fold cross validation for this kind of data set.
machine-learning
Welcome to this site! If you want to do K-fold CV on these K folds, ignore the inner training-validation-test separations, do the CV, then report the test score. Otherwise, why are you not allowed to ignore the inner separations and merge them? The answer to this question is key and depends on your specific case.
– Esmailian, Mar 27 at 16:11
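The merging step suggested in the comment above could look roughly like this. It is only a minimal sketch: it assumes the 10 folds are disjoint and that each fold is stored as a dict with `train`, `val` and `test` lists, a hypothetical layout that the question does not specify.

```python
def merge_inner_splits(fold):
    """Concatenate a fold's train/val/test parts into one list of samples."""
    return fold["train"] + fold["val"] + fold["test"]


# Hypothetical toy data: 10 disjoint folds of 10 samples each (6/2/2 inner split).
folds = []
for k in range(10):
    samples = list(range(k * 10, k * 10 + 10))
    folds.append({"train": samples[:6], "val": samples[6:8], "test": samples[8:]})

# One flat list of samples per fold, ready for ordinary 10-fold CV.
merged = [merge_inner_splits(fold) for fold in folds]
```

If the 10 "folds" are instead 10 different train/validation/test partitions of the same samples, merging them this way would duplicate data, so this sketch would not apply.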
asked Mar 27 at 16:00 by Kalyan Katikapalli
1 Answer
In 10-fold cross-validation, you split your dataset into 10 sections; in each iteration, 9 of them are used for training and one is used as the test set (there is no separate validation set). For example, if your dataset has 100 samples, then in the first fold (the first loop iteration) the model trains on 90 samples and is tested on the remaining 10. The loop continues until every section has been used once for testing.
For more, see here.
In Python, you can implement 10-fold cross-validation using the sklearn library; see here.
Now, because your dataset is already split into 10 folds, you have two choices:
1- The easiest way is to combine your dataset into one set and then use a library to do the 10-fold cross-validation for you.
2- Write the code yourself to loop over your 10 folds: in the first iteration, use the first section for testing and the other 9 for training; in the second iteration, use the second section for testing and the first plus the other 8 for training. The loop should run 10 times, until all the data has been used for both training and testing.
This is the idea behind 10-fold cross-validation. If it is not applicable to your dataset, I think 10-fold cross-validation is not a good fit for your case.
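The second choice above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the answerer's code: each fold is assumed to be a list of `(features, label)` pairs with any inner splits already merged, and `fit_majority` and `accuracy` are hypothetical stand-ins for a real model and metric.

```python
from collections import Counter


def fit_majority(train):
    """Toy model (assumption): always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(model, test):
    """Fraction of test samples whose label equals the constant prediction."""
    return sum(label == model for _, label in test) / len(test)


def cross_validate(folds, fit, score):
    """Use each fold once as the test set and train on the remaining folds."""
    scores = []
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        scores.append(score(fit(train), test))
    return scores


# Hypothetical data: 10 folds of 10 samples, 7 of class 0 and 3 of class 1.
folds = [[((k, n), 0 if n < 7 else 1) for n in range(10)] for k in range(10)]
scores = cross_validate(folds, fit_majority, accuracy)
mean_score = sum(scores) / len(scores)
```

Passing `fit` and `score` as functions keeps the loop independent of the model, so the same `cross_validate` could wrap any training routine. The reported result is the mean of the ten per-fold scores.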
The data set is already split into 10 folds, with each fold internally split into train, test and validation sets. In this case, how do I apply 10-fold cross validation? @honar.cs
– Kalyan Katikapalli, Mar 27 at 16:38
The answer you gave is applicable when each fold doesn't have an internal split into train, test and validation sets.
– Kalyan Katikapalli, Mar 27 at 16:40
I can ignore the internal split and apply CV. The question here is, "Is there any other strategy to handle these kinds of datasets?"
– Kalyan Katikapalli, Mar 28 at 2:11
The answer is updated; see if it helps you.
– honar.cs, Mar 28 at 13:06
answered Mar 27 at 16:35 by honar.cs, edited Mar 28 at 13:02