Variational auto-encoders (VAE): why the random sample?
Why do people train variational auto-encoders (VAEs) to encode means and variances (regularised towards 0 and 1) and then draw a random Gaussian sample, rather than simply encoding latent vectors and regularising them to follow a standard $N(0, I)$, which would seem the more natural choice?







      deep-learning autoencoder






asked 2 days ago by Antoine Savine

1 Answer
To share a common mental image of AE and VAE, please take a look at this answer first.

Let's go through this "why not?" thought process step by step:



1. Why not deterministic? Let's directly encode the latent vector $z$ in a layer of the neural network. Then $z$ is deterministic: a fixed input $x$ always produces the same latent vector $z$, so $z$ would not have a (non-degenerate) distribution $q(z|x)$. This is the ordinary auto-encoder. Sampling $z$ from $q(z|x)$ means that if we feed the same input $x$ twice, we should get two different values for $z$. Deterministic computations (layers) cannot achieve this, so we need to inject a random element into the calculation of $z$. Note that getting a different $z$ for a different training point $x$ only implies the existence of a distribution $p(z)$, which exists for an ordinary auto-encoder too; it does not imply the existence of the conditional distribution $q(z|x)$.

2. Why not only the mean? OK, let's add random elements to a layer to get our beloved probabilistic $z$; call this layer $\mu$. Suppose $\mu$ is computed from the previous layer $y$ as $\mu = \tanh(y)$. We then draw a random element $\epsilon_d \sim N(0,1)$ per dimension of $\mu$ and compute $z = \mu + \epsilon$, which is the same as sampling $z_d \sim N(\mu_d, 1)$. Now $z$ is probabilistic, and we have got rid of the standard deviation $\sigma$ and its regularization altogether; we only need to regularize the mean $\mu$ towards $0$.

3. Why not only random elements? Let's go one step further: throw $\mu$ away too, set $z_d = \epsilon_d \sim N(0,1)$, and get rid of all the regularization. But wait a minute! Now $z$ is completely disconnected from the previous layers, so no information flows from the input $x$ to the latent variable $z$.
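The three variants above, plus the usual VAE sampling step, can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the one-layer encoder, the weight matrices `W_mu` and `W_logvar`, and the log-variance parameterization of $\sigma$ are hypothetical choices, not something specified in this answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-layer encoder weights (4-dim input, 2-dim latent), for illustration only.
W_mu = rng.normal(size=(2, 4))
W_logvar = rng.normal(size=(2, 4))


def encode_deterministic(x):
    """(1) Ordinary auto-encoder: z is a fixed function of x."""
    return np.tanh(W_mu @ x)


def encode_mean_only(x, rng):
    """(2) z = mu + eps, i.e. z_d ~ N(mu_d, 1)."""
    mu = np.tanh(W_mu @ x)
    eps = rng.standard_normal(mu.shape)
    return mu + eps


def encode_noise_only(x, rng):
    """(3) z = eps: the input x is ignored entirely."""
    return rng.standard_normal(W_mu.shape[0])


def encode_vae(x, rng):
    """VAE: z = mu + sigma * eps, i.e. z_d ~ N(mu_d, sigma_d^2)."""
    mu = np.tanh(W_mu @ x)
    sigma = np.exp(0.5 * (W_logvar @ x))  # assumed log-variance parameterization
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps


x = np.array([0.5, -1.0, 2.0, 0.1])
print(encode_deterministic(x), encode_deterministic(x))      # same z twice
print(encode_mean_only(x, rng), encode_mean_only(x, rng))    # two different z
print(encode_vae(x, rng), encode_vae(x, rng))                # two different z
```

Calling `encode_noise_only` with two different inputs but identically seeded generators returns the same $z$, which is the "no information from $x$" problem of variant (3).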


Going from a VAE with two parameters $(\mu, \sigma)$ to (2) with one parameter $\mu$, and then to (3) with no parameters, is like saying: instead of using a parameter $w$ and regularizing it via $\parallel w \parallel^2$, let's set $w=0$ and get rid of the regularization. We want parameters to be close to zero, but we still want them to be non-zero and to carry information. The same holds for $(\mu, \sigma)$ in a VAE: we want them to carry information, even though being close to $(0, 1)$ is favorable. $(\mu, \sigma)$ are two channels through which information is distilled from $x$ to $z$. We could even add a third variable (channel) to the distribution if the complexity-performance trade-off were favorable. It is safe to say that researchers have experimented with only $\mu$ and observed that adding another channel for the variance is beneficial, so here we are with two channels $(\mu, \sigma)$ and a crucial random vector $\epsilon$.
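The analogy with $\parallel w \parallel^2$ can be made concrete, assuming the regularizer is the standard closed-form KL divergence between a diagonal Gaussian $N(\mu, \sigma^2)$ and $N(0, I)$. The answer does not name this term, so treat it as an assumption. The function name `kl_to_standard_normal` is hypothetical. With $\sigma$ fixed to 1, as in (2), the penalty reduces to $\frac{1}{2}\parallel \mu \parallel^2$, which is exactly an L2 penalty on the mean.

```python
import numpy as np


def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return 0.5 * np.sum(mu**2 + sigma**2 - 2.0 * np.log(sigma) - 1.0)


mu = np.array([0.3, -1.2, 0.5])

# The penalty vanishes at (mu, sigma) = (0, 1), the point the regularizer pulls towards.
print(kl_to_standard_normal(np.zeros(3), np.ones(3)))

# With sigma fixed to 1 (variant 2), the penalty equals 0.5 * ||mu||^2.
print(kl_to_standard_normal(mu, np.ones(3)), 0.5 * np.sum(mu**2))
```

Like an L2 penalty on $w$, this term pulls $(\mu, \sigma)$ towards $(0, 1)$ without forcing them there, so both channels can still carry information about $x$.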



Why would a VAE work better than an AE?



The main difference a VAE introduces is that $z$ is now only loosely (probabilistically) related to $x$. This creates an additional regularization effect: because $z$ does not follow the exact network computation, some information about $x$ is thrown away, and this extra regularization can improve performance in practice. Of course, the VAE does not dominate everywhere; in some tasks the extra regularization may work against it.
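One way to see the regularization effect is a toy case with a linear decoder $\hat{x} = Wz$ and squared-error reconstruction. Both are assumptions made here for tractability and are not part of this answer. In that case the expected reconstruction error under $z = \mu + \sigma \epsilon$ equals the deterministic error plus $\sum_d \sigma_d^2 \parallel W_{:,d} \parallel^2$, a penalty on the decoder weights. The sketch below checks this by Monte Carlo; all values are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear decoder x_hat = W @ z and a fixed encoder output (mu, sigma).
W = rng.normal(size=(5, 3))
x = rng.normal(size=5)
mu = rng.normal(size=3)
sigma = np.array([0.5, 1.0, 0.2])

# Deterministic AE: decode z = mu exactly.
ae_error = np.sum((x - W @ mu) ** 2)

# VAE-style: decode z = mu + sigma * eps, averaged over many draws of eps.
eps = rng.standard_normal(size=(200_000, 3))
z = mu + sigma * eps
vae_error = np.mean(np.sum((x - z @ W.T) ** 2, axis=1))

# Closed form: the injected noise adds a penalty on the decoder's column norms.
penalty = np.sum(sigma**2 * np.sum(W**2, axis=0))

print(ae_error, vae_error, ae_error + penalty)
```

The last two printed values agree up to sampling noise. For nonlinear decoders there is no such exact identity, but the intuition is similar: the decoder must reconstruct $x$ from a neighbourhood of $\mu$, not from a single point.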



In my opinion, the "why" questions can only be answered up to this point, since they reach a level similar to "full Bayesian vs. regularized least-squares regression".







answered 2 days ago by Esmailian (edited yesterday)

• Thank you, but I guess my question is really this: why do we need $z$ to be random given $x$? If $z$ is a deterministic function $f$ of $x$, and $f$ is such that $q(z) = q[f(x)]$ is $N(0,1)$, then we have achieved our goal, i.e. we can sample $z$ from $N(0,1)$ and decode it as $x = f^{-1}(z)$. So it would seem natural to build a standard auto-encoder with the additional condition that the empirical distribution of $z$ over the training set is as close as possible to $N(0,1)$. Instead, VAEs make $z$ a stochastic function of $x$, and I guess what I don't understand is why this should work better. – Antoine Savine, yesterday

• Maybe it was just shown to work better in practice, but I don't get the theoretical point behind it, if any? – Antoine Savine, yesterday

• @AntoineSavine I added updates for the "why". – Esmailian, yesterday

• Thank you @esmailian – Antoine Savine, yesterday
          • $begingroup$
            Thank you but I guess my question is really this: why do we need z to be random given x? If z is a deterministic function f of x, and f is such that q(z) = q[f(x)] is a N(0,1), then we have achieved our goal, i.e. we can sample z in N(0,1) and decode it as x=(f-1)(z). So it would seem natural to make a standard auto-encoder with the additional condition that the empirical distribution of z over the training set is as close as possible to a N(0,1). Instead, VAE make z a stochastic function of x and I guess what I don't understand is why this should work better.
            $endgroup$
            – Antoine Savine
            yesterday










          • $begingroup$
            Maybe it was just shown to work better in practice but I don't get the theoretical point behind it, if any?
            $endgroup$
            – Antoine Savine
            yesterday











          • $begingroup$
            @AntoineSavine I added updates for the "why"
            $endgroup$
            – Esmailian
            yesterday










          • $begingroup$
            Thank you @esmailian
            $endgroup$
            – Antoine Savine
            yesterday





















































































































