On the choice of LSTM input/output dimension for a spatio-temporal problem
$begingroup$
I am using LSTM neural networks from (R)Keras for spatio-temporal interpolation. I manage to get the network to output predictions, but the results are not outstanding (very little improvement in validation loss). I am wondering about the shapes of the training data and labels.
Say I have 50 dates of measurements of the variable of interest $y$, accompanied by about 100 covariates $x$ (spatial coordinates, temperatures...). Each date has 24 measurements of $y$, so `nsamples=50*24=1200`. If I set the `timestep` hyperparameter of LSTM to e.g. 3, and use a moving window of step 1, I therefore have an input table $X$ of shape (1200, 3, 100).
On the other hand, should the labels table $Y$ be of dimension (1200, 3) or (1200, 1)? More precisely, which of the following best describes the problem:
$$
(X_{n,t-2} ; X_{n,t-1} ; X_{n,t}) \rightarrow (Y_{n,t-2} ; Y_{n,t-1} ; Y_{n,t})
$$
$$
(X_{n,t-2} ; X_{n,t-1} ; X_{n,t}) \rightarrow (Y_{n,t} ; Y_{n,t} ; Y_{n,t})
$$
$$
(X_{n,t-2} ; X_{n,t-1} ; X_{n,t}) \rightarrow Y_{n,t}
$$
$$
(X_{n,t-2} ; X_{n,t-1} ; X_{n,t}) \rightarrow Y_{n,t+1}
$$
Or are they all plausible ways of addressing slightly different problems? As I said, I'm trying to spatially interpolate $Y$ for the 50 dates of measurements, as well as to predict $Y$ for the year(s) to come. So I expect one to be more relevant than the others, but I have no clue which one.
I hope this is understandable, as I clearly lack some technical vocabulary here.
neural-network keras r lstm
$endgroup$
edited Mar 26 at 12:39 by Esmailian
asked Mar 25 at 17:20 by Yo B.
1 Answer
$begingroup$
On time-series models
All the models you mention are valid and practical, depending on the problem (the index $n$ is not required). The second one, however, produces redundant outputs, which wastes computation. Even
$$
(X_t ; X_{t+1} ; X_{t+2}) \rightarrow (Y_{t-1})
$$
is valid if you are fitting on an archive and want to predict a year given the covariates from the next three years.
But only the last model
$$
(X_{t-2} ; X_{t-1} ; X_t) \rightarrow (Y_{t+1})
$$
is a forecasting model. So in general, if you want to forecast $Y$ for the $k$-th year after the current time $t$, you should use:
$$
(X_{t-2} ; X_{t-1} ; X_t) \rightarrow (Y_{t+k})
$$
or
$$
(X_{t-2} ; X_{t-1} ; X_t) \rightarrow (Y_{t+1},...,Y_{t+k})
$$
An even better model, which takes advantage of the known past values of $Y$, would be:
$$
(X_{t-2}|Y_{t-2} ; X_{t-1}|Y_{t-1} ; X_t|Y_t) \rightarrow (Y_{t+k})
$$
where $|$ denotes vector concatenation, producing a (100 + 1)-dimensional vector for each known year.
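The windowing above can be sketched in plain Python. This is only an illustration under my own assumptions: `make_windows`, `window`, `horizon` and `use_past_y` are hypothetical names, and the series is taken to be one location with 50 dates and 100 covariates. Note that a window of 3 with a horizon of 1 leaves 50 - 3 - 1 + 1 = 47 samples per series, not 50.

```python
def make_windows(x_series, y_series, window=3, horizon=1, use_past_y=False):
    """Build (inputs, targets) pairs for one location's time series.

    x_series: list of covariate vectors, one per date (each a list of floats)
    y_series: list of floats, one per date
    Each input holds `window` consecutive steps ending at date t; the target
    is y at date t + horizon. With use_past_y=True, y_s is appended to x_s
    at every step (the X|Y concatenation), adding one feature dimension.
    """
    if len(x_series) != len(y_series):
        raise ValueError("x_series and y_series must have the same length")
    inputs, targets = [], []
    last_t = len(x_series) - horizon  # exclusive upper bound for t
    for t in range(window - 1, last_t):
        steps = []
        for s in range(t - window + 1, t + 1):
            step = list(x_series[s])
            if use_past_y:
                step.append(y_series[s])
            steps.append(step)
        inputs.append(steps)
        targets.append(y_series[t + horizon])
    return inputs, targets


# Hypothetical data: 50 dates, 100 covariates, for a single location.
n_dates, n_cov = 50, 100
x = [[float(d + c) for c in range(n_cov)] for d in range(n_dates)]
y = [float(d) for d in range(n_dates)]

inputs, targets = make_windows(x, y, window=3, horizon=1, use_past_y=True)
print(len(inputs), len(inputs[0]), len(inputs[0][0]))  # 47 3 101
```

Setting `horizon=0` gives the $(X_{t-2} ; X_{t-1} ; X_t) \rightarrow Y_t$ variant from the question. Stacking the output over all locations would give the (samples, time steps, features) array an LSTM layer expects.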
In my personal opinion, for the time-series prediction task, 24 data points per year is very few compared to the dimension of $X$, which is 100. Using the 1200 samples for an $X \rightarrow Y$ regression (ignoring time) is more practical; if you can select 10 of the 100 covariates, even better.
Because of the small data set, I would suggest:
$(Y_{t-m} ;...; Y_{t-1} ; Y_t) \rightarrow (Y_{t+k})$ for time-series prediction, and
$X \rightarrow Y$ regression for estimating the relation between $X$ and $Y$.
Relation to LSTM and RNN
If we use an LSTM/RNN to model a time series, it is stateful. That is, when the input $X_{t-2}$ is fed to an LSTM, it keeps an internal state (hidden state) to be combined with the next input $X_{t-1}$, and so on. Regarding the input/output dimension, an RNN animation from a post on Medium by Raimi Karim shows an arbitrary step among the 3 steps of feeding $(X_{t-2} ; X_{t-1} ; X_t)$ to the network (animation not reproduced here).
As the animation shows, the dimension and number of inputs are independent of the output. We can feed 5 inputs $X_{t-4}$ to $X_t$, each 100-dimensional (100d), and receive a 1d output by setting the dimension of the hidden state to 1d; or set it to 10d and use an extra dense layer at the end to convert 10d to 1d; or receive a 50d output, or a 150d (three 50d) output, etc.
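To make the shape argument concrete, here is a minimal sketch with a plain tanh RNN written from scratch (not an LSTM and not the Keras implementation). The helper names `rnn_last_state` and `dense` are hypothetical, and the weights are random because only the dimensions matter here.

```python
import math
import random


def rnn_last_state(inputs, hidden_dim, seed=0):
    """Run a minimal tanh RNN over a sequence and return the final hidden state.

    inputs: list of time steps, each a list of floats of equal length.
    """
    rng = random.Random(seed)
    input_dim = len(inputs[0])
    w_in = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
            for _ in range(hidden_dim)]
    w_rec = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
             for _ in range(hidden_dim)]
    h = [0.0] * hidden_dim
    for x in inputs:
        # The new state combines the current input with the previous state.
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], h))
            )
            for j in range(hidden_dim)
        ]
    return h


def dense(vector, out_dim, seed=1):
    """Linear layer with random weights, mapping len(vector) to out_dim."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in vector] for _ in range(out_dim)]
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


# 5 time steps, each 100-dimensional.
sequence = [[0.01 * (t + d) for d in range(100)] for t in range(5)]

print(len(rnn_last_state(sequence, hidden_dim=1)))                     # 1
print(len(dense(rnn_last_state(sequence, hidden_dim=10), out_dim=1)))  # 1
print(len(rnn_last_state(sequence, hidden_dim=50)))                    # 50
```

The output length follows `hidden_dim` (or the dense layer) only; changing the number of time steps or the input dimension leaves it unchanged.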
The word "stateful" in Keras (source)
LSTMs and RNNs are stateful by definition. This [badly named] argument in Keras means:
"If stateful=True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch." (François Chollet)
For example, if each batch has 24 samples indexed from 0 to 23 (each sample could have the form $(X_{t-2}, X_{t-1}, X_t, Y_{t+1})$), then the last hidden state $h$ of the 8th sample will be used as the initial hidden state for the 8th sample in the next batch. Unless there is a temporal order between batches and their samples, which is a special case, this should be set to False.
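The batch-to-batch carry-over can be illustrated with a toy simulation. This is not Keras code: `run_batches` is a hypothetical helper, and the recurrence `h = 0.5 * h + x` merely stands in for an LSTM cell so that the numbers are easy to follow.

```python
def run_batches(batches, stateful):
    """Apply the toy recurrence h <- 0.5 * h + x to every sample of every batch.

    batches: list of batches; each batch is a list of samples; each sample is
    a list of scalar inputs (its time steps).
    Returns the final state of each sample, batch by batch.
    """
    batch_size = len(batches[0])
    states = [0.0] * batch_size
    history = []
    for batch in batches:
        if not stateful:
            states = [0.0] * batch_size  # every sample starts from a fresh state
        new_states = []
        for i, sample in enumerate(batch):
            # With stateful=True this is the final state of sample i
            # in the previous batch.
            h = states[i]
            for x in sample:
                h = 0.5 * h + x  # state is always carried across time steps
            new_states.append(h)
        states = new_states
        history.append(states)
    return history


# 2 batches, 2 samples per batch, 2 time steps per sample.
batches = [[[1.0, 1.0], [2.0, 2.0]], [[1.0, 1.0], [2.0, 2.0]]]

print(run_batches(batches, stateful=False))  # [[1.5, 3.0], [1.5, 3.0]]
print(run_batches(batches, stateful=True))   # [[1.5, 3.0], [1.875, 3.75]]
```

In both modes the state is carried from one time step to the next inside a sample; the flag only decides whether it is also carried from sample i of one batch to sample i of the next.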
$endgroup$
$begingroup$
Thank you, this is clear! So the choice among those problems depends only on how I arrange the inputs and outputs, right? So the network does not need any further argument to be trained for one problem or the other. If you don't mind answering this as well: do any of these problems need a stateful LSTM? Or should they preferably be addressed with a stateful LSTM? I am very confused by this argument, and nothing I have read so far makes it clear whether I should use it or not (and if so, how I should modify the formulas above).
$endgroup$
– Yo B.
Mar 26 at 9:23
$begingroup$
Thanks for the edit and the reference! Just to be clear, currently I am not using `stateful=TRUE`. Does this mean that the network is not building links between $X_{t-1}$ and $X_t$? If so, what is the point of feeding the network with this time-batch structure when `stateful=FALSE`? I get that it's a different question, so you don't have to answer. Thanks again!
$endgroup$
– Yo B.
Mar 26 at 10:53
$begingroup$
Well thanks again for this last edit, you really helped me there! All the best
$endgroup$
– Yo B.
Mar 26 at 11:36
$begingroup$
If you don't mind answering this last question: in the predictive case $(X_{t-2}|Y_{t-2} ; X_{t-1}|Y_{t-1} ; X_t|Y_t) \rightarrow Y_{t+1}$, how should I input $X_{t+1}$? I'm asking because many of my covariates are static and can therefore be useful in predicting the process in the years to come.
$endgroup$
– Yo B.
Mar 26 at 15:45
$begingroup$
@YoB. If $t+1$ denotes "next year", we have no access to $X_{t+1}$ in real-time cases. Btw, since each time step should be 101-dimensional, you can use $(...X_t|Y_t ; X_{t+1}|0) \rightarrow (Y_{t+1})$ with a dummy 0, but I think it may hurt performance; try it. You can also use the previous year's value: $(...X_t|Y_t ; X_{t+1}|Y_t) \rightarrow (Y_{t+1})$.
$endgroup$
– Esmailian
Mar 26 at 15:55
edited Mar 26 at 12:41
answered Mar 25 at 17:52 by Esmailian
Thank you, this is clear! So the choice between those problems depends only on how I arrange the inputs and outputs, right? The network needs no extra argument to be trained for one problem or the other. If you don't mind answering this as well: do any of these problems need a stateful LSTM, or should any of them preferably be addressed with one? I am very confused by this argument, and nothing I have read so far makes it clear whether I should use it or not (and if so, how I should modify the formulas above).
– Yo B.
Mar 26 at 9:23
Thanks for the edit and the reference! Just to be clear, currently I am not using stateful=TRUE. Does this mean that the network is not building links between $X_{t-1}$ and $X_t$? If so, what is the point of feeding the network with this time-batch structure when stateful=FALSE? I get that it's a different question, so you don't have to answer. Thanks again!
– Yo B.
Mar 26 at 10:53
Well, thanks again for this last edit, you really helped me there! All the best
– Yo B.
Mar 26 at 11:36
If you don't mind answering this last question: in the predictive case $(X_{t-2}|Y_{t-2} ; X_{t-1}|Y_{t-1} ; X_t|Y_t) \rightarrow Y_{t+1}$, how should I input $X_{t+1}$? I'm asking because many of my covariates are static and can therefore be useful for predicting the process in the years to come.
– Yo B.
Mar 26 at 15:45
@YoB. If $t+1$ denotes "next year", we have no access to $X_{t+1}$ in real-time cases. By the way, since each timestep should be 101-dimensional, you can use $(\ldots X_t|Y_t ; X_{t+1}|0) \rightarrow Y_{t+1}$ with a dummy 0, but I think it may hurt performance; try it. You can also use the previous year's value: $(\ldots X_t|Y_t ; X_{t+1}|Y_t) \rightarrow Y_{t+1}$.
– Esmailian
Mar 26 at 15:55
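The two padding options from the last comment can be sketched as follows. This is a minimal illustration in plain Python; `build_sequence` and its `fill` argument are hypothetical names, and it assumes 100 covariates plus one target per timestep, as in the comment.

```python
def build_sequence(X, Y, next_X, fill="zero"):
    """Build the timesteps (X_{t-k}|Y_{t-k}; ...; X_t|Y_t; X_{t+1}|placeholder).

    X      : covariate vectors for the observed years
    Y      : observed targets, one per vector in X
    next_X : covariates for year t+1 (only usable when known in advance)
    fill   : "zero" puts a dummy 0 in the unknown Y_{t+1} slot,
             "previous" repeats Y_t instead
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must cover the same years")
    if fill not in ("zero", "previous"):
        raise ValueError("fill must be 'zero' or 'previous'")
    steps = [list(x) + [y] for x, y in zip(X, Y)]
    placeholder = 0.0 if fill == "zero" else Y[-1]
    steps.append(list(next_X) + [placeholder])
    return steps


# Three observed years with 100 covariates each, plus the covariates of year t+1.
X = [[0.1] * 100, [0.2] * 100, [0.3] * 100]
Y = [1.0, 2.0, 3.0]
next_X = [0.4] * 100

with_zero = build_sequence(X, Y, next_X, fill="zero")
with_previous = build_sequence(X, Y, next_X, fill="previous")
```

Both variants give four timesteps of 101 dimensions each. Which placeholder works better is an empirical question, as the comment says.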