In Bayesian inference, why are some terms dropped from the posterior predictive?
In Kevin Murphy's "Conjugate Bayesian analysis of the Gaussian distribution", he writes that the posterior predictive distribution is
$$
p(x \mid D) = \int p(x \mid \theta) \, p(\theta \mid D) \, d\theta
$$
where $D$ is the data on which the model is fit and $x$ is unseen data. What I don't understand is why the dependence on $D$ disappears in the first term in the integral. Using basic rules of probability, I would have expected:
$$
\begin{align}
p(a) &= \int p(a \mid c) \, p(c) \, dc \\
p(a \mid b) &= \int p(a \mid c, b) \, p(c \mid b) \, dc \\
&\downarrow \\
p(x \mid D) &= \int \overbrace{p(x \mid \theta, D)}^{\star} \, p(\theta \mid D) \, d\theta
\end{align}
$$
Question: Why does the dependence on $D$ in term $\star$ disappear?
For what it's worth, I've seen this kind of formulation (dropping variables in conditionals) in other places. For example, in Ryan Adams's "Bayesian Online Changepoint Detection", he writes the posterior predictive as
$$
p(x_{t+1} \mid r_t) = \int p(x_{t+1} \mid \theta) \, p(\theta \mid r_t, x_t) \, d\theta
$$
where again, since $D = \{x_t, r_t\}$, I would have expected
$$
p(x_{t+1} \mid x_t, r_t) = \int p(x_{t+1} \mid \theta, x_t, r_t) \, p(\theta \mid r_t, x_t) \, d\theta
$$
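For concreteness, here is a minimal Monte Carlo sketch (my own, not from either paper) of how I understand Murphy's integral being evaluated for a Normal model with known variance, where the conjugate posterior and posterior predictive have closed forms; all numbers and names below are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Known-variance Normal model: x_i | theta ~ N(theta, sigma^2), prior theta ~ N(mu0, tau0^2).
sigma, mu0, tau0 = 1.0, 0.0, 2.0
D = rng.normal(loc=1.5, scale=sigma, size=20)   # "observed" data
n, xbar = len(D), D.mean()

# Conjugate posterior p(theta | D) = N(mu_n, tau_n^2).
tau_n2 = 1.0 / (1.0 / tau0**2 + n / sigma**2)
mu_n = tau_n2 * (mu0 / tau0**2 + n * xbar / sigma**2)

# Closed-form posterior predictive: p(x | D) = N(mu_n, tau_n^2 + sigma^2).
x_star = 2.0
exact = stats.norm.pdf(x_star, loc=mu_n, scale=np.sqrt(tau_n2 + sigma**2))

# Monte Carlo version of the integral: average p(x | theta) over posterior draws of theta.
theta_draws = rng.normal(mu_n, np.sqrt(tau_n2), size=100_000)
mc = stats.norm.pdf(x_star, loc=theta_draws, scale=sigma).mean()

print(exact, mc)   # the two values agree up to Monte Carlo error
```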
bayesian predictive-models inference posterior
asked Apr 2 at 16:04 by gwg
2 Answers
This is based on the assumption that $x$ is conditionally independent of $D$, given $\theta$. This is a reasonable assumption in many cases, because all it says is that the training and testing data ($D$ and $x$, respectively) are independently generated from the same set of unknown parameters $\theta$. Given this independence assumption, $p(x \mid \theta, D) = p(x \mid \theta)$, and so $D$ drops out of the more general form that you expected.

In your second example, a similar independence assumption seems to be applied, but now (explicitly) across time. These assumptions may be stated explicitly elsewhere in the text, or they may be implicitly clear to anyone who is sufficiently familiar with the context of the problem (although that doesn't necessarily mean the authors of your particular examples, which I'm not familiar with, were right to assume this familiarity).
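To spell the step out, the conditional-independence assumption says the joint likelihood factorizes given $\theta$, and the identity then follows from the definition of conditional probability:
$$
p(x \mid \theta, D) = \frac{p(x, D \mid \theta)}{p(D \mid \theta)} = \frac{p(x \mid \theta) \, p(D \mid \theta)}{p(D \mid \theta)} = p(x \mid \theta).
$$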
answered Apr 2 at 16:26 by Ruben van Bergen (edited Apr 2 at 17:27)
It's because $x$ is assumed to be independent of $D$ given $\theta$. In other words, all the data are assumed to be i.i.d. from a normal distribution with parameters $\theta$. Once $\theta$ is taken into account using information from $D$, there is no more information that $D$ gives us about a new data point $x$. Therefore $p(x \mid \theta, D) = p(x \mid \theta)$.
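A quick way to see this empirically (a sketch of my own, with arbitrary values, not taken from the papers in question): simulate the joint model, condition on a narrow bin of $\theta$, and check that a summary of $D$ carries no extra information about a new $x$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Joint simulation: theta ~ N(0, 1); D = (d_1, ..., d_5) and x all i.i.d. N(theta, 1) given theta.
S = 200_000
theta = rng.normal(0.0, 1.0, size=S)
D = rng.normal(theta[:, None], 1.0, size=(S, 5))
x = rng.normal(theta, 1.0)

# Condition on theta lying in a narrow bin, then split those draws by a statistic of D.
in_bin = np.abs(theta - 0.5) < 0.05
dbar = D[in_bin].mean(axis=1)
x_bin = x[in_bin]
low, high = x_bin[dbar < np.median(dbar)], x_bin[dbar >= np.median(dbar)]

# If x is independent of D given theta, both groups should look like N(0.5, 1) up to noise.
print(low.mean(), high.mean())   # both approximately 0.5
print(low.std(), high.std())     # both approximately 1.0
```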
answered Apr 2 at 16:26 by JP Trawinski (edited Apr 2 at 16:55)