How to use Machine Learning to discover important biomarkers in an unbalanced small data set


I am just starting out on this project, and I am only just learning machine learning and statistics, so I am somewhat unsure which approaches would be good to start with. I am sorry if this does not belong here.



The data set consists of patients carrying a certain disease; for each patient there are various biomarkers and physical measurements, such as heart rate, taken at different time points up until death, if the patient does die. I was told that the goal is to identify the key features associated with a patient dying.



I only have 33 patients, and only 16 of them have died. However, disregarding which patient each measurement came from, I have roughly 300 time slots. I first tried to approach this as a binary classification problem, classifying the 'death' time point against the other points. The problems were:



  1. the class imbalance, and

  2. how to interpret the models to discover the most important features.

For the imbalance, I tried SMOTE oversampling, which didn't work as well as I had hoped; I then randomly under-sampled the majority class, which gave decent results, but the data set became even smaller, so I wasn't sure it was a good idea.
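For reference, here is a minimal sketch of the two resampling-related options I mean, on synthetic placeholder data (not my real data set), using only NumPy and scikit-learn: class weighting, which discards no data, versus the random undersampling I tried.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))            # ~300 time slots, 5 measurements
y = (rng.random(300) < 0.1).astype(int)  # ~10% positive ('death') points

# Option 1: class weighting -- reweights the loss, discards no data.
clf_weighted = LogisticRegression(class_weight="balanced").fit(X, y)

# Option 2: random undersampling of the majority class to a 1:1 ratio.
pos = np.flatnonzero(y == 1)
neg = rng.choice(np.flatnonzero(y == 0), size=len(pos), replace=False)
idx = np.concatenate([pos, neg])
clf_under = LogisticRegression().fit(X[idx], y[idx])
```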



Simple binary classification models such as Gaussian Naive Bayes and logistic regression did okay even with the imbalanced data, but (at least as far as I know) they don't give a way to discern feature importance.
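One commonly suggested workaround I have seen: after standardizing the features, the magnitudes of the logistic-regression coefficients can serve as a rough importance measure. A minimal sketch on synthetic placeholder data (not my real data set), where feature 2 is constructed to drive the label:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
# Make feature 2 the one that actually drives the label.
y = (X[:, 2] + 0.1 * rng.normal(size=200) > 0).astype(int)

# Standardize so coefficient magnitudes are comparable across features.
X_std = StandardScaler().fit_transform(X)
clf = LogisticRegression().fit(X_std, y)
importance = np.abs(clf.coef_[0])        # |coefficient| per feature
ranking = np.argsort(importance)[::-1]   # most to least influential
```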



So my main questions are:



  1. What is the best way to approach this problem? In general, what kinds of approaches work when you want to identify the most influential features (measurements)?


  2. If I do approach it as a binary classification problem, what can I do to combat the class imbalance?










  • I think you need to be a bit more specific about what your question is, or it will be difficult to provide you with helpful answers. – oW_, Apr 1 at 20:56










  • You are going to have a hard time getting solid results for testing biomarkers with sample sizes that low unless the effect sizes are huge. The number of different measurements doesn't really increase statistical power very much. You might want to look at prior research on "severity of illness" to get an idea of what has already been discovered. Quite a bit has already been done in the intensive-care literature using various techniques. – 42-, Apr 2 at 0:13

















Tags: machine-learning, data-mining






edited Apr 2 at 1:30 by Stephen Rauch










asked Apr 1 at 20:06 by Infinity


















1 Answer
If your goal is to identify important features, I would go for a decision tree, which inherently estimates the importance (separation capability) of each feature when selecting features to split its internal nodes. You can also use an ensemble of decision trees such as a random forest, which reports feature importances based on each feature's average impurity reduction across all of its trees.



This article can help you set up a basic experiment.
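A minimal sketch of the random-forest route on synthetic placeholder data (not the asker's data), where feature 0 is constructed to be the informative one; `feature_importances_` is scikit-learn's mean-impurity-reduction score:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)  # feature 0 is the informative one

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = forest.feature_importances_  # normalized, sums to 1
top = int(np.argmax(importances))          # index of most important feature
```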






answered Apr 1 at 22:25 by Sajid Ahmed


























