How to incorporate an attribute that only exists in some observations?
In a binary classification problem, some of my observations have an event that occurs. I can, obviously, add a 1/0 flag if the event occurs ("event_occurred" in the data below). However, my intuition is that the class is related to the number of days since that event occurred. I'd like to somehow include the number of days since the event occurred in my model ("days_since_event").

Example Python data:

    import pandas as pd

    # Blank strings mark observations where no event occurred.
    df = pd.DataFrame({
        'event_date': pd.Series(['2019-02-25', '', '2019-01-31', '', '2019-03-03']),
        'event_occurred': pd.Series([1, 0, 1, 0, 1]),
        'days_since_event': pd.Series([42, '', 67, '', 36]),
        'class': pd.Series([1, 2, 2, 1, 1]),
    })

       event_date  event_occurred  days_since_event  class
    0  2019-02-25               1                42      1
    1                           0                        2
    2  2019-01-31               1                67      2
    3                           0                        1
    4  2019-03-03               1                36      1

Is this a standard missing data problem or is there a way to better represent this data in a model-friendly format? Is this a situation where I can fill the missing observations with a global value and trust that the model will learn to ignore that value if "event_occurred" is 0?

feature-extraction feature-engineering missing-data
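The "fill with a global value" idea from the question could be sketched roughly as below. This is only an illustration under stated assumptions: the fill value 0 and the column name `days_filled` are hypothetical choices, not something given in the question. For an unregularized linear model such as logistic regression, keeping `event_occurred` next to the filled column means a different constant only shifts the intercept and the flag's coefficient, so the model does not need to "learn to ignore" the fill value.

```python
import pandas as pd

df = pd.DataFrame({
    'event_occurred': [1, 0, 1, 0, 1],
    'days_since_event': [42, '', 67, '', 36],
    'class': [1, 2, 2, 1, 1],
})

# Blank strings cannot be parsed as numbers, so errors='coerce' turns them into NaN.
days = pd.to_numeric(df['days_since_event'], errors='coerce')

# Assumption: fill with 0. Because event_occurred stays in the feature set,
# the filled rows are still distinguishable from a genuine value of 0 days.
df['days_filled'] = days.fillna(0)

# Candidate feature matrix: the indicator plus the filled duration.
X = df[['event_occurred', 'days_filled']]
y = df['class']
print(X)
```

The indicator and the filled column together act like an interaction term: the duration contributes to the prediction only for rows where the event occurred.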
Do you have any particular model in mind? That might help answer your question. Most tree based models, for example, would be able to handle this kind of situation without having to replace the missing values. – oW_♦, Apr 8 at 21:11

I've been using logistic regression but I wanted to try gradient-boosted decision trees via LightGBM too. It looks like LightGBM can handle missing values out of the box like you suggested. I'll give that a try, thanks! – Riebeckite, Apr 9 at 0:30
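For the tree-based route mentioned in the comments, the missing durations can be left as NaN rather than filled. The sketch below is an assumption-laden illustration: it derives "days_since_event" from "event_date" using a reference date of 2019-04-08, which is not stated in the question but happens to reproduce the 42/67/36 values shown there. It stops at preparing the column and does not call any LightGBM API.

```python
import pandas as pd

event_date = pd.Series(['2019-02-25', '', '2019-01-31', '', '2019-03-03'])

# Blank strings are not valid dates, so errors='coerce' turns them into NaT.
parsed = pd.to_datetime(event_date, errors='coerce')

# Assumed reference date (hypothetical; chosen because it matches the question's values).
reference = pd.Timestamp('2019-04-08')

# NaT propagates through the subtraction, so rows without an event end up as NaN.
days_since_event = (reference - parsed).dt.days

print(days_since_event)
```

A column in this form is numeric with explicit NaN entries, which is the shape a model with native missing-value support would expect, whereas the blank strings in the original frame leave the column with object dtype.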
asked Apr 8 at 20:03 by Riebeckite (edited Apr 8 at 20:09)