Predict how many days late or early someone will finish their work






I have a set of deadlines and people, along with a database recording when each piece of work was assigned, when each person finished it, and how long after the deadline that was. The work consists of articles, so I also have the word count for each. Based on this historical data, how can I calculate how many days early or late somebody will most probably finish their work?



As a concrete example of the problem I'm trying to solve:



John finished his last 5 projects 5, 4, 3, 6 and 2 days late. What is the most probable number of days early or late that he will finish his next piece of work?



Basically, I'm looking for an appropriate machine learning algorithm to implement in order to calculate this probable end date.







machine-learning time-series predictive-modeling probability markov-process














edited Apr 7 at 12:48 by GenRincewind

asked Apr 7 at 10:10 by GenRincewind











  • Very well written question, props! Roughly how many deadlines do you have per person and in total? Do you have access to other data, like a textual description of a task? – jonnor, Apr 7 at 11:38










1 Answer



If we assume that the task deliveries are independent of each other, and that the process does not change much over time (it is stationary), we can treat this as a standard regression problem.
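
As a minimal sketch of the simplest case under those assumptions, the snippet below treats John's five delays from the question as independent draws from an unchanging process and summarizes them with the Python standard library. The function name `summarize_delays` is purely illustrative.

```python
from statistics import mean, median, stdev


def summarize_delays(days_late):
    """Summarize past delays (positive = late, negative = early)."""
    return {
        "mean": mean(days_late),
        "median": median(days_late),
        "stdev": stdev(days_late),  # sample standard deviation
    }


john = [5, 4, 3, 6, 2]  # John's delays, taken from the question
summary = summarize_delays(john)
print(summary)
```

For these five values the mean and the median are both 4 days late, with a sample standard deviation of about 1.6 days. A feature-based model is mainly useful if it improves on this kind of per-person baseline.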



Since this is about deadlines, we can expect variation over time, such as patterns of delay tied to the time of year or the day of the week. Time-based features might look something like:



|deadline_year|deadline_week_number|deadline_day_of_week|
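
One possible way to derive these three columns from a deadline date is sketched below, using only the standard library. The function name and the example date are assumptions made for illustration; the week number and day of week follow the ISO 8601 convention (weeks start on Monday).

```python
from datetime import date


def deadline_time_features(deadline: date) -> dict:
    """Build the time-based features for one deadline."""
    # isocalendar() returns (ISO year, ISO week number, ISO weekday).
    _iso_year, iso_week, iso_weekday = deadline.isocalendar()
    return {
        "deadline_year": deadline.year,
        "deadline_week_number": iso_week,
        "deadline_day_of_week": iso_weekday,  # 1 = Monday ... 7 = Sunday
    }


features = deadline_time_features(date(2019, 4, 8))  # hypothetical deadline
print(features)
```

Note that near New Year the ISO week can belong to the neighbouring calendar year, so it may be worth checking a few dates around 1 January before relying on the year and week pair.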



We also expect that the size of a delay might depend on the size of the task. So if you have the start date, or an estimate of the number of days required, definitely include that. If people can have multiple tasks at the same time, include that as well.



|workdays_between_start_and_deadline|workdays_estimated|concurrent_tasks|
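
A rough sketch of how two of these columns could be computed is shown below. It assumes each task is a dict with `owner`, `start` and `deadline` keys (a made-up layout), counts Monday to Friday as workdays, and ignores public holidays.

```python
from datetime import date, timedelta


def workdays_between(start: date, deadline: date) -> int:
    """Count Mon-Fri days from start (inclusive) to deadline (exclusive)."""
    days = 0
    current = start
    while current < deadline:
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            days += 1
        current += timedelta(days=1)
    return days


def concurrent_tasks(task, all_tasks) -> int:
    """Count other tasks of the same owner whose date range overlaps this one."""
    return sum(
        1
        for other in all_tasks
        if other is not task
        and other["owner"] == task["owner"]
        and other["start"] <= task["deadline"]
        and task["start"] <= other["deadline"]
    )


# Hypothetical tasks, for illustration only.
tasks = [
    {"owner": "John", "start": date(2019, 4, 1), "deadline": date(2019, 4, 8)},
    {"owner": "John", "start": date(2019, 4, 5), "deadline": date(2019, 4, 12)},
    {"owner": "Mary", "start": date(2019, 4, 1), "deadline": date(2019, 4, 3)},
]
print(workdays_between(tasks[0]["start"], tasks[0]["deadline"]))
print(concurrent_tasks(tasks[0], tasks))
```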



We also expect that delays may depend on the person who performs the task and on the person who created it.



|task_owner|task_creator|
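
Because these two columns are names rather than numbers, most regression models need them encoded first. The snippet below is one simple option (one-hot encoding) written in plain Python; the helper `one_hot` and the example names are assumptions, and a library encoder would do the same job.

```python
def one_hot(values, prefix):
    """Turn a list of category labels into one 0/1 indicator dict per row."""
    categories = sorted(set(values))
    return [
        {f"{prefix}={category}": int(value == category) for category in categories}
        for value in values
    ]


owners = ["John", "Mary", "John"]  # hypothetical task owners
encoded = one_hot(owners, "task_owner")
print(encoded)
```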



Use Exploratory Data Analysis and your knowledge of the processes that created the data to find more of these possible relationships. Use scatterplots of each feature against the target days_delayed (negative = before the deadline, 0 = on time, positive = late).
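
As a numeric companion to the scatterplots, a correlation coefficient gives a quick first impression of how strongly a feature moves with days_delayed. The sketch below computes Pearson's r by hand; the word counts and delays are made-up numbers chosen to be perfectly linear, not real data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


word_count = [500, 1000, 1500, 2000]  # illustrative values only
days_delayed = [-1, 0, 1, 2]          # illustrative values only
r = pearson_r(word_count, days_delayed)
print(r)
```

Pearson's r only captures linear relationships, so a value near zero does not rule out a pattern that a scatterplot would reveal.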



One can start with a strong non-linear model like RandomForest. This gives estimates that can be scored (by mean squared error, for example) and indicates whether your features are predictive or not.
To get probability intervals, you can use a Bayesian model such as Bayesian Ridge Regression. This is a linear model, so you may have to spend more time on feature engineering to make the relationships between the features and the target (roughly) linear.
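
The two modelling steps could look roughly like the sketch below, assuming scikit-learn and NumPy are installed. The data is synthetic and generated on the spot purely so the example runs; the feature choice (word count and number of concurrent tasks) and the approximate 95% interval (mean plus or minus 1.96 standard deviations, which assumes roughly Gaussian predictive noise) are assumptions, not part of the original answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT real): delay grows with word count and workload.
rng = np.random.default_rng(0)
word_count = rng.integers(300, 3000, size=200)
concurrent = rng.integers(0, 4, size=200)
days_delayed = 0.002 * word_count + concurrent - 3 + rng.normal(0, 1, size=200)

X = np.column_stack([word_count, concurrent])
X_train, X_test, y_train, y_test = train_test_split(
    X, days_delayed, test_size=0.25, random_state=0
)

# Step 1: non-linear model, scored by mean squared error on held-out tasks.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
forest_mse = mean_squared_error(y_test, forest.predict(X_test))

# Reference point: always predict the mean delay seen in training.
baseline_mse = mean_squared_error(y_test, np.full_like(y_test, y_train.mean()))

# Step 2: Bayesian linear model that also returns a predictive std deviation.
bayes = BayesianRidge()
bayes.fit(X_train, y_train)
pred_mean, pred_std = bayes.predict(X_test, return_std=True)
lower = pred_mean - 1.96 * pred_std
upper = pred_mean + 1.96 * pred_std

print(f"forest MSE {forest_mse:.2f} vs. baseline MSE {baseline_mse:.2f}")
print(f"first task: {pred_mean[0]:.1f} days ({lower[0]:.1f} to {upper[0]:.1f})")
```

If the forest does not beat the mean-only baseline on held-out tasks, that suggests the features carry little predictive signal.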















edited Apr 7 at 14:48

answered Apr 7 at 12:26 by jonnor











  • I added the additional information I possess. – GenRincewind, Apr 7 at 12:49
















