Is it bad to have many different measurements for the same target variable?

I'm working on a dataset that has repeated measurements for the same target variable.

When I use the data as-is and train a model, cross-validation gives a score of 0.99 (it overfits), but on the test set the score is only around 0.39.

When I instead summarize the repeated measurements of each feature with the mean, standard deviation, skewness, and quartiles, so that each sample has a single row, the score is much better.

Can anyone explain why? And when is it appropriate to use the second method?

The original dataset looks like this (all numbers are fake):

id  / measurement1 / measurement2 / ... / target
0-1 / 0.18283      / 0.12855      / ... / 1
0-2 / 0.1141       / 0.38484      / ... / 1
0-3 / 0.4475       / 0.18374      / ... / 1

The transformed dataset looks like this:

id / meas1_avg / meas1_std / meas1_skew / meas2_avg / meas2_std / ... / target
0  / 0.28747   / 0.183848  / 0.198384   / 0.18484   / 0.28474   / ... / 1
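The transformation described above (one row per group, with summary statistics per measurement) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the `id` format `<group>-<repeat>` is inferred from the example rows, the helper names `aggregate` and `skewness` are hypothetical, the skewness is the biased Fisher-Pearson coefficient, and every group needs at least two repeats so that the standard deviation and quartiles are defined.

```python
import statistics
from collections import defaultdict


def skewness(values):
    """Biased Fisher-Pearson skewness g1 = m3 / m2**1.5 (0.0 if no spread)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def aggregate(rows, feature_names):
    """Collapse repeated measurements into one row per group.

    Assumes (hypothetically) that each row is a dict with an 'id' such as
    '0-1' meaning group 0, repeat 1, plus the measurement columns and a
    'target' that is identical for every repeat of the same group.
    """
    groups = defaultdict(list)
    for row in rows:
        group_id = row["id"].split("-")[0]
        groups[group_id].append(row)

    out = []
    for group_id, members in groups.items():
        targets = {m["target"] for m in members}
        if len(targets) != 1:
            raise ValueError(f"group {group_id} has inconsistent targets: {targets}")
        agg = {"id": group_id}
        for name in feature_names:
            vals = [m[name] for m in members]  # needs >= 2 repeats
            q1, q2, q3 = statistics.quantiles(vals, n=4)
            agg[f"{name}_avg"] = statistics.mean(vals)
            agg[f"{name}_std"] = statistics.stdev(vals)
            agg[f"{name}_skew"] = skewness(vals)
            agg[f"{name}_q1"], agg[f"{name}_q2"], agg[f"{name}_q3"] = q1, q2, q3
        agg["target"] = targets.pop()
        out.append(agg)
    return out


# The (fake) example rows from the question.
rows = [
    {"id": "0-1", "measurement1": 0.18283, "measurement2": 0.12855, "target": 1},
    {"id": "0-2", "measurement1": 0.1141, "measurement2": 0.38484, "target": 1},
    {"id": "0-3", "measurement1": 0.4475, "measurement2": 0.18374, "target": 1},
]
result = aggregate(rows, ["measurement1", "measurement2"])
print(result[0]["id"], round(result[0]["measurement1_avg"], 5))
```

Checking that every repeat of a group shares the same target guards against silently mixing labels when rows are collapsed.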

asked Mar 26 at 14:58 by edunlimit

machine-learning feature-engineering data-science-model

1 Answer

Note that you are solving two different problems here.

In the first problem, you want to predict the target variable given one noisy measurement.

In the second problem, you want to predict the target variable given some statistics from a group of noisy measurements.

Your results show that the second problem is easier to solve. This is intuitive: the average of several noisy measurements has less noise (variance) than a single measurement (a result closely related to the Law of Large Numbers), so the model can find the relationship between features and target more easily in the second problem.
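The variance argument can be made concrete with a short worked equation. It assumes (an assumption, not something stated in the question) that each of the $n$ repeated measurements is $x_i = \mu + \varepsilon_i$, where the noise terms $\varepsilon_i$ are independent with a common variance $\sigma^2$:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\operatorname{Var}(\bar{x})
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(x_i)
  = \frac{1}{n^{2}}\cdot n\sigma^{2}
  = \frac{\sigma^{2}}{n}.
```

Under these assumptions, with three repeats per sample as in the example rows, the averaged feature would carry one third of the noise variance of a single measurement. If the measurement errors are positively correlated, the reduction is smaller than $1/n$.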

Therefore, if both problems are equivalent for your purposes, go with the second one, which is easier to solve.
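A small simulation can illustrate why the second problem is easier. The setup is entirely hypothetical: class 0 has true value 0.0, class 1 has true value 1.0, each measurement adds independent Gaussian noise with standard deviation 1.5, there are 5 repeats per sample, and the decision rule is a fixed threshold at 0.5 rather than a trained model. It therefore isolates the effect of noise reduction and says nothing about any particular learner.

```python
import random


def simulate(n_groups=10_000, n_repeats=5, noise_sd=1.5, seed=0):
    """Accuracy of a fixed threshold on one measurement vs. on the group mean.

    Hypothetical setup: true value equals the label (0.0 or 1.0), each
    measurement adds independent Gaussian noise, and the threshold is 0.5.
    """
    rng = random.Random(seed)
    correct_single = correct_mean = 0
    for _ in range(n_groups):
        label = rng.randint(0, 1)
        measurements = [rng.gauss(float(label), noise_sd) for _ in range(n_repeats)]
        # Problem 1: predict from a single noisy measurement.
        correct_single += int((measurements[0] > 0.5) == bool(label))
        # Problem 2: predict from the mean of the group's measurements.
        mean = sum(measurements) / n_repeats
        correct_mean += int((mean > 0.5) == bool(label))
    return correct_single / n_groups, correct_mean / n_groups


acc_single, acc_mean = simulate()
print(f"single measurement: {acc_single:.3f}, mean of 5: {acc_mean:.3f}")
```

Under the Gaussian assumption the expected accuracies are roughly $\Phi(0.5/1.5) \approx 0.63$ for a single measurement and $\Phi(0.5\sqrt{5}/1.5) \approx 0.77$ for the mean, so the simulated values should land near those numbers.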

edited Mar 26 at 18:09

answered Mar 26 at 15:06 by Esmailian
