How to plan an analysis to prevent overfitting?
Coming from statistics, I'm new to machine learning. I've read a lot of ML tutorials but have no formal training.
I'm working on a small project where my dataset has 6k rows and around 300 features.
As the tutorials suggest, I split my dataset into a training sample (80%) and a testing sample (20%), and then train my algorithm on the training sample with 5-fold cross-validation.
When I ran my program twice (I've only tested KNN, which I now know is not really appropriate), I got very different results, with different sensitivity, specificity and precision.
I guess that if I re-run the program until the metrics are good, my algorithm will be overfitted, and I also guess that the variation comes from the resampling of the training/testing samples, but please correct me if I'm wrong.
If I'm going to try a lot of algorithms to see what I can get, should I fix my samples somewhere? Is it even OK to do so? (It would not always be in statistics.)
In case it matters, I'm working with Python's scikit-learn module.
PS: my outcome is binary and my features are mostly binary, with a few categorical and a few numeric. I'm thinking about logistic regression, but which algorithm would be the best one?
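For reference, here is a minimal sketch of the setup described above, not my actual code. The data is synthetic (generated with `make_classification` to mimic 6k rows and 300 features), and the `SEED` constant, the `run_once` helper, and `n_neighbors=5` are assumptions made only for illustration. It shows what "fixing the samples" could look like in scikit-learn: passing the same `random_state` to both the train/test split and the fold generator, so that re-running gives the same partition.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

SEED = 0  # hypothetical fixed seed; any integer works


def run_once(seed):
    """Split 80/20, then run 5-fold CV on the training part only."""
    # Synthetic stand-in for the real dataset (assumption: binary outcome).
    X, y = make_classification(
        n_samples=6000, n_features=300, n_informative=20, random_state=0
    )
    # Fixed random_state makes the 80/20 split identical across runs.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Fixed random_state makes the fold assignment identical as well.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    model = KNeighborsClassifier(n_neighbors=5)
    # scoring="recall" is the sensitivity of the positive class.
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="recall")
    return X_test.shape[0], scores


n_test, scores_a = run_once(SEED)
_, scores_b = run_once(SEED)
print("held-out test rows:", n_test)
print("same CV sensitivity on both runs:", np.allclose(scores_a, scores_b))
```

With the seed fixed, the two runs use the same split and folds, so the fold-wise sensitivity is identical. The test sample is left untouched here and would only be scored once at the end.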
machine-learning project-planning
asked 15 hours ago
Dan Chaltiel