Recursively updating the MLE as new observations stream in
General Question
Say we have iid data $x_1$, $x_2$, $\ldots \sim f(x\,|\,\boldsymbol\theta)$ streaming in. We want to recursively compute the maximum likelihood estimate of $\boldsymbol\theta$. That is, having computed
$$\hat{\boldsymbol\theta}_{n-1}=\underset{\boldsymbol\theta\in\mathbb{R}^p}{\operatorname{argmax}}\prod_{i=1}^{n-1}f(x_i\,|\,\boldsymbol\theta),$$
we observe a new $x_n$, and wish to somehow incrementally update our estimate
$$\hat{\boldsymbol\theta}_{n-1},\,x_n \;\to\; \hat{\boldsymbol\theta}_n$$
without having to start from scratch. Are there generic algorithms for this?
Toy Example
If $x_1$, $x_2$, $\ldots \sim N(x\,|\,\mu, 1)$, then
$$\hat\mu_{n-1} = \frac{1}{n-1}\sum\limits_{i=1}^{n-1}x_i\quad\text{and}\quad\hat\mu_n = \frac{1}{n}\sum\limits_{i=1}^{n}x_i,$$
so
$$\hat\mu_n=\frac{1}{n}\left[(n-1)\hat\mu_{n-1} + x_n\right].$$
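A minimal sketch of this running-mean recursion in code (the function name and the example stream are purely illustrative, not part of the question):

```python
def update_mean_mle(mu_prev, n, x_n):
    """O(1) update of the Normal-mean MLE when the n-th observation arrives."""
    return ((n - 1) * mu_prev + x_n) / n

# Streaming usage: only the current estimate and the observation count are kept.
mu_hat, n = 0.0, 0
for x in [1.2, 0.7, 1.9, 1.1]:
    n += 1
    mu_hat = update_mean_mle(mu_hat, n, x)
print(mu_hat)  # equals the ordinary sample mean of the four values, 1.225
```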
Tags: maximum-likelihood, online
asked Mar 18 at 22:07 by bamts (edited Mar 18 at 23:11)

Comments:

– Hong Ooi (Mar 19 at 0:22): Don't forget the inverse of this problem: updating the estimator as old observations are deleted.

– jhin (2 days ago): Recursive least squares (RLS) is a (very famous) solution to one particular instance of this problem, isn't it? Generally, I would believe that the stochastic filtering literature might be useful to look into.
2 Answers

Answer by Glen_b♦ (answered Mar 18 at 22:57, edited Mar 18 at 23:25):
See the concept of sufficiency and, in particular, minimal sufficient statistics. In many cases you need the whole sample to compute the estimate at a given sample size, with no trivial way to update from a sample one size smaller (i.e. there's no convenient general result).
If the distribution is in the exponential family (and in some other cases besides; the uniform is a neat example), there's a nice sufficient statistic that can in many cases be updated in the manner you seek (i.e. with a number of commonly used distributions there would be a fast update).
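To make this concrete with a sketch that is not part of the original answer: for a Normal with unknown mean and variance, the sufficient statistic $(n, \sum_i x_i, \sum_i x_i^2)$ can be carried along and updated in O(1) per observation, and the MLE of both parameters read off from it at any time (the class and method names below are illustrative):

```python
import numpy as np

class StreamingNormalMLE:
    """Running MLE of Normal(mu, sigma^2) via its sufficient statistics.

    Only (n, sum x, sum x^2) are stored, so each update is O(1) and never
    revisits earlier observations. (A Welford-style recursion would be more
    numerically stable; this is kept naive for clarity.)
    """

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def update(self, x):
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def mle(self):
        mu_hat = self.sum_x / self.n
        sigma2_hat = self.sum_x2 / self.n - mu_hat ** 2  # MLE uses 1/n, not 1/(n-1)
        return mu_hat, sigma2_hat

rng = np.random.default_rng(0)
est = StreamingNormalMLE()
for x in rng.normal(loc=2.0, scale=1.5, size=100_000):
    est.update(x)
print(est.mle())  # close to (2.0, 2.25)
```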
One example for which I'm not aware of any direct way to either calculate or update the estimate is the location of the Cauchy distribution (e.g. with unit scale, to make the problem a simple one-parameter problem). There may be a faster update, however, that I simply haven't noticed; I can't say I've really done more than glance at it with the updating case in mind.
On the other hand, with MLEs that are obtained via numerical optimization methods, the previous estimate would in many cases be a great starting point, since typically the previous estimate would be very close to the updated estimate; in that sense at least, rapid updating should often be possible. Even this isn't the general case, though -- with multimodal likelihood functions (again, see the Cauchy for an example), a new observation might lead to the highest mode being some distance from the previous one (even if the locations of each of the biggest few modes didn't shift much, which one is highest could well change).
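Both points can be checked numerically with the unit-scale Cauchy data quoted in the comments below. The following sketch (function names, grid, and search radius are illustrative assumptions, not part of the answer) re-optimizes in a small window around the previous MLE, as a stand-in for any warm-started local method, and compares the result with a global grid search after the next observation arrives:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def cauchy_negloglik(theta, x):
    # Unit-scale Cauchy: f(x | theta) is proportional to 1 / (1 + (x - theta)^2)
    return np.sum(np.log1p((x - theta) ** 2))

def grid_mle(x, grid):
    # Global search over a fixed grid of candidate locations.
    return grid[np.argmin([cauchy_negloglik(t, x) for t in grid])]

def warm_start_mle(x, theta_prev, radius=1.0):
    # Local re-optimization near the previous estimate: fast, but it can miss
    # the global mode if a new observation makes a different peak the highest.
    res = minimize_scalar(cauchy_negloglik, args=(x,), method="bounded",
                          bounds=(theta_prev - radius, theta_prev + radius))
    return res.x

grid = np.linspace(-2.0, 6.0, 8001)
x = np.array([0.1, 0.11, 0.12, 2.91, 2.921, 2.933])
theta_prev = grid_mle(x, grid)              # global mode with 6 observations

x_new = np.append(x, 10.0)                  # the next observation arrives
print("warm start:", warm_start_mle(x_new, theta_prev))
print("global MLE:", grid_mle(x_new, grid)) # per the comments below, a different mode
```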
Comments:

– bamts (Mar 19 at 1:54): Thanks! The point about the MLE possibly switching modes midstream is particularly helpful for understanding why this would be hard in general.

– Glen_b♦ (Mar 19 at 3:31): You can see this for yourself with the above unit-scale Cauchy model and the data (0.1, 0.11, 0.12, 2.91, 2.921, 2.933). The modes of the log-likelihood for the location are near 0.5 and 2.5, and the (slightly) higher peak is the one near 0.5. Now make the next observation 10: the mode of each of the two peaks barely moves, but the second peak is now substantially higher. Gradient descent won't help you when that happens; it's almost like starting again. If your population is a mixture of two similar-size subgroups with different locations, such circumstances could occur ... (ctd)

– Glen_b♦ (Mar 19 at 4:22): (ctd) ... even in a relatively large sample. In the right situation, mode switching may occur fairly often.

– Yves (Mar 19 at 9:45): A condition preventing multi-modality is that the likelihood should be log-concave w.r.t. the parameter vector for all $n$. This implies limitations on the model, however.

– Glen_b♦ (Mar 19 at 9:55): Yes, correct; I debated with myself over whether to discuss that in the answer.
Answer by Cliff AB (answered Mar 19 at 0:10):
In machine learning, this is referred to as online learning.
As @Glen_b pointed out, there are special cases in which the MLE can be updated without needing to access all the previous data. As he also points out, I don't believe there's a generic solution for finding the MLE.
A fairly generic approach for finding the approximate solution is to use something like stochastic gradient descent. In this case, as each observation comes in, we compute the gradient with respect to this individual observation and move the parameter values a very small amount in this direction. Under certain conditions, we can show that this will converge to a neighborhood of the MLE with high probability; the neighborhood is tighter and tighter as we reduce the step size, but more data is required for convergence. However, these stochastic methods in general require much more fiddling to obtain good performance than, say, closed form updates.
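A minimal sketch of that idea for the Normal-mean toy example from the question (the names and the Robbins-Monro step-size schedule are illustrative; for this particular model the $1/n$ step size happens to reproduce the exact running mean, which is not true in general):

```python
import numpy as np

def sgd_mle_normal_mean(stream, lr0=1.0):
    """Streaming approximation to the MLE of mu for N(mu, 1) data.

    Each incoming x triggers one stochastic-gradient step on its own negative
    log-likelihood, with a decreasing (Robbins-Monro) step size lr0 / n.
    """
    mu = 0.0
    for n, x in enumerate(stream, start=1):
        grad = mu - x            # d/dmu of 0.5 * (x - mu)^2
        mu -= (lr0 / n) * grad   # small step against the gradient
    return mu

rng = np.random.default_rng(42)
print(sgd_mle_normal_mean(rng.normal(loc=3.0, size=50_000)))  # approx. 3.0
```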