Predict how many days late or early someone will finish their work
So I have a set of deadlines and people, with a database of when those people finished their previous work, how long after the deadline that was, and when the work was assigned. The work consisted of articles, so I also have the word count for each. Based on this historical data, how do you calculate how many days early or late someone will most probably finish their work?
As a concrete example of the problem I'm trying to solve:
John finished his last 5 projects 5, 4, 3, 6, and 2 days late. What is the most probable number of days early or late he will finish his next one?
Basically, I'm looking for an appropriate machine learning algorithm to calculate this probable end date.
machine-learning time-series predictive-modeling probability markov-process
edited Apr 7 at 12:48 by GenRincewind
asked Apr 7 at 10:10 by GenRincewind
Very well written question, props! Roughly how many deadlines do you have per person and in total? Do you have access to other data, like a textual description of a task?
– jonnor
Apr 7 at 11:38
1 Answer
If we assume that each task delivery is independent of the others, and that the process does not change much over time (stationary), we can treat this as a standard regression problem.
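The question's records (assignment date, deadline, finish date, word count) map onto this regression setup roughly as follows. This is a minimal sketch assuming dates are stored as `datetime.date`; the `Task` class, its field names and the example dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    # Hypothetical record layout; adapt the field names to your database.
    owner: str
    assigned: date
    deadline: date
    finished: date
    word_count: int


def days_delayed(task: Task) -> int:
    """Regression target: negative = early, 0 = on time, positive = late."""
    return (task.finished - task.deadline).days


# Made-up example record, for illustration only.
example = Task("John", date(2019, 3, 1), date(2019, 3, 15), date(2019, 3, 20), 1200)
print(days_delayed(example))  # 5 (days late)
```

Each historical task becomes one row: the features are derived from the fields known at assignment time, and `days_delayed` is the value to predict.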
Since this is about deadlines, we expect that there may be variations over time, or seasonal patterns of delay over the year or within the week. Time-based features might therefore look something like:
| deadline_year | deadline_week_number | deadline_day_of_week |
|---|---|---|
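A small sketch of how these three columns could be computed from a deadline with only the standard library. It assumes ISO 8601 week numbering (Monday = 1, Sunday = 7), which is one reasonable convention rather than something the answer prescribes.

```python
from datetime import date


def deadline_features(deadline: date) -> dict:
    """Calendar features of a deadline, using ISO 8601 weeks (Monday = 1, Sunday = 7)."""
    iso_year, iso_week, iso_weekday = deadline.isocalendar()
    return {
        # ISO year rather than calendar year, so year and week number stay consistent
        # around New Year (e.g. 2018-12-31 belongs to ISO week 1 of 2019).
        "deadline_year": iso_year,
        "deadline_week_number": iso_week,
        "deadline_day_of_week": iso_weekday,
    }


print(deadline_features(date(2019, 4, 7)))
# {'deadline_year': 2019, 'deadline_week_number': 14, 'deadline_day_of_week': 7}
```

Using the ISO year avoids rows where the week number says "week 1" while the year still says the previous year.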
We also expect the size of a delay to depend on the size of the task. So if you have the start date, or an estimate of how many days the task should take, definitely include it. If people can have several tasks at the same time, include that as well.
| workdays_between_start_and_deadline | workdays_estimated | concurrent_tasks |
|---|---|---|
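A sketch of two of these features using only the standard library, assuming plain `datetime.date` values. It counts Monday to Friday as workdays and ignores public holidays; the helper names are hypothetical.

```python
from datetime import date, timedelta


def workdays_between(start: date, end: date) -> int:
    """Count Monday-Friday days in [start, end); public holidays are not handled."""
    days = 0
    current = start
    while current < end:
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days += 1
        current += timedelta(days=1)
    return days


def concurrent_tasks(other_tasks, start: date, deadline: date) -> int:
    """Count the person's other tasks whose (assigned, end) span overlaps [start, deadline)."""
    return sum(1 for assigned, end in other_tasks if assigned < deadline and end > start)


print(workdays_between(date(2019, 4, 1), date(2019, 4, 15)))  # 10
```

For the task being predicted, the window uses the deadline rather than the (still unknown) finish date, so no information from the future leaks into the feature.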
We also expect that delays may depend on the person who performs the task and on who created it.
| task_owner | task_creator |
|---|---|
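The simplest use of task_owner is a per-person baseline that predicts each person's typical past delay. A minimal standard-library sketch using the delays from the question's example (John: 5, 4, 3, 6, 2 days late); the function name is illustrative. A richer model is only worth its complexity if it beats this baseline.

```python
from collections import defaultdict
from statistics import mean, median


def per_owner_baseline(history):
    """Map each owner to (mean, median) of their past delays in days."""
    delays = defaultdict(list)
    for owner, delay in history:
        delays[owner].append(delay)
    return {owner: (mean(d), median(d)) for owner, d in delays.items()}


# Delays from the question: John finished his last 5 projects 5, 4, 3, 6, 2 days late.
history = [("John", d) for d in [5, 4, 3, 6, 2]]
print(per_owner_baseline(history))  # John: mean 4, median 4
```

Inside a model such as a random forest, categorical columns like task_owner and task_creator would typically be one-hot encoded instead.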
Use exploratory data analysis (EDA) and your knowledge of the processes that generated the data to find more of these possible relationships. For example, make a scatterplot of each feature against the target, days_delayed (negative = early, 0 = on time, positive = late).
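Scatterplots are the better tool for spotting non-linear patterns, but a quick numeric companion is to rank features by their Pearson correlation with days_delayed. This standard-library sketch only captures linear relationships, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no measurable linear relationship
    return cov / (sx * sy)


def rank_features(columns, target):
    """Sort (feature name, correlation) pairs by absolute correlation, strongest first."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

A feature with near-zero correlation can still matter non-linearly, which is exactly what the scatterplots are for.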
One can start with a strong non-linear model such as a RandomForest. Its estimates can be scored (for example by mean squared error), and comparing that score against a simple baseline indicates whether your features are predictive or not.
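A sketch of that workflow with scikit-learn, assuming the features have already been turned into a numeric matrix. The data below is random toy data, not the asker's, and 5 folds and 200 trees are arbitrary choices. A mean-predicting DummyRegressor serves as the baseline: if the forest's cross-validated MSE is not clearly lower, the features probably carry little signal.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Toy stand-in data (NOT real): 200 tasks with 3 numeric features, standing in for
# e.g. word count, workdays until deadline and concurrent tasks.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] ** 2 + rng.normal(scale=0.5, size=200)


def cv_mse(model, X, y):
    """Mean squared error averaged over 5 cross-validation folds."""
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    return -scores.mean()  # scikit-learn reports negated MSE so that higher is better


baseline_mse = cv_mse(DummyRegressor(strategy="mean"), X, y)
forest_mse = cv_mse(RandomForestRegressor(n_estimators=200, random_state=0), X, y)
print(f"baseline MSE: {baseline_mse:.2f}, random forest MSE: {forest_mse:.2f}")
```

If the stationarity assumption is doubtful, splitting by time (training on older tasks, testing on newer ones) gives a more honest score than shuffled folds.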
To get probability intervals, you can use a Bayesian model such as Bayesian Ridge Regression. This is a linear model, so you may have to spend more time on feature engineering to make the relationships between features and target (roughly) linear.
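A minimal sketch with scikit-learn's BayesianRidge on synthetic toy data with one hypothetical feature. `predict(..., return_std=True)` returns a mean and a standard deviation per row; turning them into a 95% interval additionally assumes the predictive distribution is roughly Gaussian.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Toy stand-in data (NOT real): a roughly linear relationship between one feature
# (say, word count in thousands) and days_delayed.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 3.0, size=(100, 1))
y = 1.5 * X[:, 0] - 1.0 + rng.normal(scale=1.0, size=100)

model = BayesianRidge().fit(X, y)

X_new = np.array([[2.0]])
mean_pred, std_pred = model.predict(X_new, return_std=True)
# Approximate 95% interval under a Gaussian assumption.
low = mean_pred[0] - 1.96 * std_pred[0]
high = mean_pred[0] + 1.96 * std_pred[0]
print(f"expected delay {mean_pred[0]:.1f} days, 95% interval [{low:.1f}, {high:.1f}]")
```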
edited Apr 7 at 14:48
answered Apr 7 at 12:26 by jonnor
I added the additional information I possess.
– GenRincewind
Apr 7 at 12:49