


How to incorporate an attribute that only exists in some observations?




In a binary classification problem, an event occurs for some of my observations but not for others. I can, of course, add a 1/0 flag indicating whether the event occurred ("event_occurred" in the data below). However, my intuition is that the class is related to the number of days since the event occurred, so I'd like to include that count in my model as well ("days_since_event").



Example Python data:

```python
import pandas as pd

# '' marks observations where the event never occurred.
df = pd.DataFrame({
    'event_date': pd.Series(['2019-02-25', '', '2019-01-31', '', '2019-03-03']),
    'event_occurred': pd.Series([1, 0, 1, 0, 1]),
    'days_since_event': pd.Series([42, '', 67, '', 36]),
    'class': pd.Series([1, 2, 2, 1, 1]),
})
print(df)
```

```
   event_date  event_occurred days_since_event  class
0  2019-02-25               1               42      1
1                           0                       2
2  2019-01-31               1               67      2
3                           0                       1
4  2019-03-03               1               36      1
```
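If `event_date` is the source of truth, `days_since_event` can be derived from it instead of being entered by hand, and rows without an event then get a proper missing value (`NaN`) rather than an empty string. A minimal sketch, assuming the days are counted up to a fixed reference date; `2019-04-08` is an assumption, chosen because it reproduces the 42, 67 and 36 shown above.

```python
import pandas as pd

df = pd.DataFrame({
    "event_date": ["2019-02-25", "", "2019-01-31", "", "2019-03-03"],
    "event_occurred": [1, 0, 1, 0, 1],
    "class": [1, 2, 2, 1, 1],
})

# Assumed snapshot date; the question does not state it explicitly.
REFERENCE_DATE = pd.Timestamp("2019-04-08")

# Empty strings become NaT (missing datetime) instead of raising an error.
event_date = pd.to_datetime(df["event_date"], errors="coerce")

# Whole days between the event and the reference date; NaT becomes NaN,
# so the resulting column is float64 with NaN where no event occurred.
df["days_since_event"] = (REFERENCE_DATE - event_date).dt.days

print(df)
```

Storing the gap as `NaN` keeps "no event" distinct from "event happened 0 days ago", which a fill value of 0 would otherwise blur if the indicator were dropped.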


Is this a standard missing-data problem, or is there a better way to represent this data in a model-friendly format? Is this a situation where I can fill the missing observations with a single global value and trust that the model will learn to ignore that value when "event_occurred" is 0?
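For a linear model such as logistic regression, the fill-plus-indicator idea from the question can be sketched as below. This is a minimal sketch of the feature construction in pandas, assuming 0 as the fill value and using a hypothetical column name, `days_since_event_filled`. With `event_occurred` kept as its own feature, filling with 0 makes the days term vanish for rows without an event, so their logit reduces to the intercept, while the indicator's coefficient shifts the baseline for rows with an event. The filled column is then identical to the interaction `event_occurred * days_since_event`.

```python
import pandas as pd

df = pd.DataFrame({
    "event_occurred": [1, 0, 1, 0, 1],
    # '' from the original example is converted to a real missing value (NaN).
    "days_since_event": pd.to_numeric(pd.Series([42, "", 67, "", 36]), errors="coerce"),
    "class": [1, 2, 2, 1, 1],
})

# Assumed fill value for rows where the event never happened.
FILL_VALUE = 0

# Hypothetical name for the filled feature.
df["days_since_event_filled"] = df["days_since_event"].fillna(FILL_VALUE)

# With a fill value of 0, the filled column equals the interaction
# event_occurred * days (missing days treated as 0).
interaction = df["event_occurred"] * df["days_since_event"].fillna(0)
assert (df["days_since_event_filled"] == interaction).all()

X = df[["event_occurred", "days_since_event_filled"]]
y = df["class"]
print(X)
```

`X` and `y` could then be passed to a classifier such as scikit-learn's `LogisticRegression`. For an unregularized linear model that includes the indicator, any constant fill value represents exactly the same family of models (the intercept and the indicator's coefficient absorb the constant), so the fitted probabilities are the same; with regularization (scikit-learn's default) or feature scaling the constant can change the fit somewhat, which makes 0 the natural choice.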











feature-extraction feature-engineering missing-data






asked Apr 8 at 20:03 by Riebeckite (reputation 63), edited Apr 8 at 20:09







  • Do you have any particular model in mind? That might help answer your question. Most tree-based models, for example, can handle this kind of situation without replacing the missing values. – oW_, Apr 8 at 21:11

  • I've been using logistic regression, but I wanted to try gradient-boosted decision trees via LightGBM too. It looks like LightGBM can handle missing values out of the box, as you suggested. I'll give that a try, thanks! – Riebeckite, Apr 9 at 0:30
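Following the suggestion in the comments, gradient-boosted trees can take the gap as-is. The main practical step is to store the missing days as a numeric `NaN` rather than an empty string, since LightGBM expects numeric (or categorical) columns and an object column containing `''` is neither. A minimal sketch, assuming LightGBM's default missing-value handling (`use_missing=True`, with `NaN` treated as missing); the LightGBM call is left as a comment so the preparation step runs with pandas alone.

```python
import pandas as pd

df = pd.DataFrame({
    "event_occurred": [1, 0, 1, 0, 1],
    "days_since_event": [42, "", 67, "", 36],
    "class": [1, 2, 2, 1, 1],
})

# Turn '' into NaN so the column becomes float64 instead of object.
df["days_since_event"] = pd.to_numeric(df["days_since_event"], errors="coerce")

X = df[["event_occurred", "days_since_event"]]
y = df["class"]

# With LightGBM installed (and a realistically sized dataset), the NaNs can
# be passed through unchanged, e.g.:
#   import lightgbm as lgb
#   model = lgb.LGBMClassifier().fit(X, y)
print(X.dtypes)
```

Because trees split on thresholds, rows with a missing value can be sent to whichever side of a split fits the data best without distorting the ordering of the observed day counts, which is why no imputation is needed in this case.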











