How to plan an analysis to prevent overfitting?

Coming from a statistics background, I've recently started learning machine learning. I've read a lot of ML tutorials, but have no formal training.

I'm working on a small project where my dataset has 6k rows and around 300 features.

As I've read in my tutorials, I split my dataset into a training sample (80%) and a testing sample (20%), and then train my algorithm on the training sample with cross-validation (5 folds).
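
For concreteness, the split described above can be sketched on row indices alone. This is a minimal standard-library illustration under assumed names, not the actual project code: `split_indices` and `k_folds` are hypothetical helpers, and in scikit-learn the same roles are played by `train_test_split` and `KFold`.

```python
import random

def split_indices(n_rows, test_fraction=0.2, seed=None):
    """Shuffle row indices and cut them into (train, test) lists."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]

def k_folds(indices, k=5):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        # Every fold except the i-th is used for fitting.
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation

# 6000 rows, as in the question; the seed value is an arbitrary assumption.
train, test = split_indices(6000, test_fraction=0.2, seed=0)
fold_sizes = [(len(fit), len(val)) for fit, val in k_folds(train, k=5)]

print(len(train), len(test))  # 4800 1200
print(fold_sizes[0])          # (3840, 960)
```

With 6k rows, each cross-validation round therefore fits on 3840 rows and validates on 960, while the 1200 test rows stay untouched until the end.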

When I ran my program twice (I've only tested KNN so far, which I now know is not really appropriate), I got very different results, with different sensitivity, specificity and precision.

I guess that if I re-run the program until the metrics look good, my model will be overfitted. I also guess the variation comes from re-drawing the training/testing samples on each run, but please correct me if I'm wrong.

If I'm going to try a lot of algorithms to see what I can get, should I fix my samples at some point? Is it even OK to do so? (It would not always be in statistics.)
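
To make "fixing the samples" concrete: a split is fixed by seeding the random number generator that draws it. The sketch below uses only the standard library and is an illustration under assumed names, not a recommendation from the original post; `draw_test_set` is a hypothetical helper, and in scikit-learn the analogous control is the `random_state` argument of `train_test_split`.

```python
import random

def draw_test_set(n_rows, test_fraction=0.2, seed=None):
    """Return the sorted row indices drawn for the test sample."""
    # With seed=None the generator is seeded from system entropy,
    # so every call draws a different test sample.
    rng = random.Random(seed)
    n_test = int(round(n_rows * test_fraction))
    return sorted(rng.sample(range(n_rows), n_test))

# Same seed: the same rows end up in the test sample on every run.
fixed_a = draw_test_set(6000, seed=42)
fixed_b = draw_test_set(6000, seed=42)
print(fixed_a == fixed_b)  # True

# No seed: the test sample is redrawn, so metrics computed on it
# vary from run to run even if the model itself is unchanged.
free_a = draw_test_set(6000)
free_b = draw_test_set(6000)
print(len(set(free_a) & set(free_b)), "test rows shared between two unseeded draws")
```

Every algorithm compared against the seeded split is evaluated on the same 1200 rows, so differences in their metrics are not caused by the resampling itself.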

In case it matters, I'm working with Python's scikit-learn module.

PS: my outcome is binary and my features are mostly binary, with a few categorical and a few numeric ones. I'm thinking about logistic regression, but which algorithm would be the best one?

asked 15 hours ago

Dan Chaltiel

machine-learning project-planning
