In Bayesian inference, why are some terms dropped from the posterior predictive?



In Kevin Murphy's Conjugate Bayesian analysis of the Gaussian distribution, he writes that the posterior predictive distribution is

$$
p(x \mid D) = \int p(x \mid \theta) p(\theta \mid D) \, d\theta
$$

where $D$ is the data on which the model is fit and $x$ is unseen data. What I don't understand is why the dependence on $D$ disappears in the first term of the integral. Using the basic rules of probability, I would have expected:

$$
\begin{align}
p(a) &= \int p(a \mid c) p(c) \, dc
\\
p(a \mid b) &= \int p(a \mid c, b) p(c \mid b) \, dc
\\
&\downarrow
\\
p(x \mid D) &= \int \overbrace{p(x \mid \theta, D)}^{\star} p(\theta \mid D) \, d\theta
\end{align}
$$

Question: Why does the dependence on $D$ in term $\star$ disappear?

For what it's worth, I've seen this kind of formulation (dropping variables in conditionals) in other places. For example, in Ryan Adams's Bayesian Online Changepoint Detection, he writes the posterior predictive as

$$
p(x_{t+1} \mid r_t) = \int p(x_{t+1} \mid \theta) p(\theta \mid r_t, x_t) \, d\theta
$$

where again, since $D = \{x_t, r_t\}$, I would have expected

$$
p(x_{t+1} \mid x_t, r_t) = \int p(x_{t+1} \mid \theta, x_t, r_t) p(\theta \mid r_t, x_t) \, d\theta
$$
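For concreteness, here is how I picture that integral actually being evaluated: a minimal Monte Carlo sketch for a toy conjugate Normal model with known variance (the model, the numbers, and the NumPy/SciPy code are my own illustration, not from Murphy's note). The point is that the likelihood term inside the average only ever sees $\theta$, never $D$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy conjugate model (illustrative values): x_i ~ N(theta, sigma^2) with sigma
# known, and prior theta ~ N(mu0, tau0^2).
sigma, mu0, tau0 = 1.0, 0.0, 2.0
D = rng.normal(1.5, sigma, size=20)      # "observed" data
n = len(D)

# Closed-form posterior p(theta | D) = N(mu_n, tau_n^2)
tau_n2 = 1.0 / (1.0 / tau0**2 + n / sigma**2)
mu_n = tau_n2 * (mu0 / tau0**2 + D.sum() / sigma**2)

# Closed-form posterior predictive p(x | D) = N(mu_n, tau_n^2 + sigma^2)
x_new = 2.0
exact = norm.pdf(x_new, loc=mu_n, scale=np.sqrt(tau_n2 + sigma**2))

# Monte Carlo version of the integral: draw theta_s ~ p(theta | D), then average
# the likelihood p(x_new | theta_s) -- a term that uses theta only, not D.
theta_s = rng.normal(mu_n, np.sqrt(tau_n2), size=200_000)
mc = norm.pdf(x_new, loc=theta_s, scale=sigma).mean()

print(exact, mc)   # the two values agree up to Monte Carlo error
```

So numerically the $p(x \mid \theta)$ form clearly works; my question is about why dropping $D$ there is justified.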







      bayesian predictive-models inference posterior






asked Apr 2 at 16:04 by gwg




















          2 Answers
This is based on the assumption that $x$ is conditionally independent of $D$, given $\theta$. This is a reasonable assumption in many cases, because all it says is that the training and testing data ($D$ and $x$, respectively) are independently generated from the same set of unknown parameters $\theta$. Given this independence assumption, $p(x|\theta,D)=p(x|\theta)$, and so the $D$ drops out of the more general form that you expected.

In your second example, it seems that a similar independence assumption is being applied, but now (explicitly) across time. These assumptions may be explicitly stated elsewhere in the text, or they may be implicitly clear to anyone who is sufficiently familiar with the context of the problem (although that doesn't necessarily mean that in your particular examples - which I'm not familiar with - the authors were right to assume this familiarity).
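To spell out that step, here is a one-line derivation under exactly this conditional-independence assumption:

$$
\begin{align}
p(x, D \mid \theta) &= p(x \mid \theta) \, p(D \mid \theta)
\\
\Rightarrow \quad p(x \mid \theta, D) &= \frac{p(x, D \mid \theta)}{p(D \mid \theta)} = p(x \mid \theta),
\end{align}
$$

which is the $\star$ term in the question reducing to $p(x \mid \theta)$.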







answered Apr 2 at 16:26 (edited Apr 2 at 17:27) by Ruben van Bergen























It's because $x$ is assumed to be independent of $D$ given $\theta$. In other words, all data is assumed to be i.i.d. from a normal distribution with parameters $\theta$. Once $\theta$ is taken into account using information from $D$, there is no more information that $D$ gives us about a new data point $x$. Therefore $p(x|\theta, D) = p(x|\theta)$.
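A tiny enumeration makes this concrete; the discrete grid, prior, and coin-flip model below are my own toy illustration (NumPy assumed), not part of the original answer. Once $\theta$ is fixed, the probability of the next point is the same whatever $D$ was, while the predictive $p(x \mid D)$ does shift with $D$:

```python
import numpy as np

# Toy discrete setup: theta on a small grid with a made-up prior; D = (d1, d2)
# are two Bernoulli(theta) flips and x is one future flip, all i.i.d. given theta.
thetas = np.array([0.2, 0.5, 0.8])
prior = np.array([0.3, 0.4, 0.3])

def bern(v, th):
    """p(v | theta) for a single Bernoulli outcome v in {0, 1}."""
    return th if v == 1 else 1.0 - th

# p(x | theta, D) equals p(x | theta) for every D, because the joint factorizes
# as p(D | theta) * p(x | theta) when the flips are i.i.d. given theta.
for d1 in (0, 1):
    for d2 in (0, 1):
        for x in (0, 1):
            for th in thetas:
                p_xD_given_th = bern(d1, th) * bern(d2, th) * bern(x, th)  # p(x, D | theta)
                p_D_given_th = bern(d1, th) * bern(d2, th)                 # p(D | theta)
                assert np.isclose(p_xD_given_th / p_D_given_th, bern(x, th))

# The posterior predictive p(x = 1 | D), by contrast, depends on D.
def predictive(d1, d2):
    post = prior * np.array([bern(d1, th) * bern(d2, th) for th in thetas])
    post /= post.sum()                      # p(theta | D)
    return float(np.sum(post * thetas))     # sum over theta of p(x=1 | theta) p(theta | D)

print(predictive(0, 0), predictive(1, 1))   # two clearly different values
```

The assertion holds for every combination of $D$ and $x$, while the two printed predictives differ.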







answered Apr 2 at 16:26 (edited Apr 2 at 16:55) by JP Trawinski


























