Variational AutoEncoder giving negative loss



I'm learning about variational autoencoders and I've implemented a simple example in Keras; the model summary is below. I've copied the loss function from one of Francois Chollet's blog posts, but I'm getting hugely negative losses that keep growing each epoch. What am I missing here?



 Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 224)] 0
__________________________________________________________________________________________________
encoding_flatten (Flatten) (None, 224) 0 input_1[0][0]
__________________________________________________________________________________________________
encoding_layer_2 (Dense) (None, 256) 57600 encoding_flatten[0][0]
__________________________________________________________________________________________________
encoding_layer_3 (Dense) (None, 128) 32896 encoding_layer_2[0][0]
__________________________________________________________________________________________________
encoding_layer_4 (Dense) (None, 64) 8256 encoding_layer_3[0][0]
__________________________________________________________________________________________________
encoding_layer_5 (Dense) (None, 32) 2080 encoding_layer_4[0][0]
__________________________________________________________________________________________________
encoding_layer_6 (Dense) (None, 16) 528 encoding_layer_5[0][0]
__________________________________________________________________________________________________
encoder_mean (Dense) (None, 16) 272 encoding_layer_6[0][0]
__________________________________________________________________________________________________
encoder_sigma (Dense) (None, 16) 272 encoding_layer_6[0][0]
__________________________________________________________________________________________________
lambda (Lambda) (None, 16) 0 encoder_mean[0][0]
encoder_sigma[0][0]
__________________________________________________________________________________________________
decoder_layer_1 (Dense) (None, 16) 272 lambda[0][0]
__________________________________________________________________________________________________
decoder_layer_2 (Dense) (None, 32) 544 decoder_layer_1[0][0]
__________________________________________________________________________________________________
decoder_layer_3 (Dense) (None, 64) 2112 decoder_layer_2[0][0]
__________________________________________________________________________________________________
decoder_layer_4 (Dense) (None, 128) 8320 decoder_layer_3[0][0]
__________________________________________________________________________________________________
decoder_layer_5 (Dense) (None, 256) 33024 decoder_layer_4[0][0]
__________________________________________________________________________________________________
decoder_mean (Dense) (None, 224) 57568 decoder_layer_5[0][0]
==================================================================================================
Total params: 203,744
Trainable params: 203,744
Non-trainable params: 0
__________________________________________________________________________________________________
Train on 3974 samples, validate on 994 samples
Epoch 1/10
3974/3974 [==============================] - 3s 677us/sample - loss: -28.1519 - val_loss: -33.5864
Epoch 2/10
3974/3974 [==============================] - 1s 346us/sample - loss: -137258.8175 - val_loss: -3683802.1489
Epoch 3/10
3974/3974 [==============================] - 1s 344us/sample - loss: -14543022903.6056 - val_loss: -107811177469.9396
Epoch 4/10
3974/3974 [==============================] - 1s 363us/sample - loss: -3011718676570.7012 - val_loss: -13131454938476.6816
Epoch 5/10
3974/3974 [==============================] - 1s 350us/sample - loss: -101442605943572.4844 - val_loss: -322685056398605.9375
Epoch 6/10
3974/3974 [==============================] - 1s 344us/sample - loss: -1417424385529640.5000 - val_loss: -3687688508198145.5000
Epoch 7/10
3974/3974 [==============================] - 1s 358us/sample - loss: -11794297368126698.0000 - val_loss: -26632844827070784.0000
Epoch 8/10
3974/3974 [==============================] - 1s 339us/sample - loss: -69508229806130784.0000 - val_loss: -141312065640756336.0000
Epoch 9/10
3974/3974 [==============================] - 1s 345us/sample - loss: -319838384005810432.0000 - val_loss: -599553350073361152.0000
Epoch 10/10
3974/3974 [==============================] - 1s 342us/sample - loss: -1221653451351326464.0000 - val_loss: -2147128507956525312.0000


Latent sampling function:



def sampling(self, args):
    """Reparameterization trick by sampling from an isotropic unit Gaussian.
    # Arguments
        args (tensor): mean and log of variance of Q(z|X)
    # Returns
        z (tensor): sampled latent vector
    """
    z_mean, z_log_var = args
    set = tf.shape(z_mean)[0]    # batch size
    batch = tf.shape(z_mean)[1]  # latent width; only used in the commented-out alternative
    dim = tf.shape(z_mean)[-1]   # latent dimension
    # by default, random_normal has mean=0 and std=1.0
    epsilon = tf.random.normal(shape=(set, dim))  # tfp.distributions.Normal(mean=tf.zeros(shape=(batch, dim)), loc=tf.ones(shape=(batch, dim)))
    return z_mean + (z_log_var * epsilon)
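
For comparison, the standard reparameterization I've seen scales the noise by sigma = exp(0.5 * z_log_var) rather than by the log-variance itself. Below is a minimal, stand-alone sketch of that form (not my class method above; it assumes a TF 2.x setup):

import tensorflow as tf

def reparameterize(z_mean, z_log_var):
    # epsilon ~ N(0, I), same shape as the latent mean
    epsilon = tf.random.normal(shape=tf.shape(z_mean))
    # The standard deviation is exp(0.5 * log-variance), so the noise is
    # scaled by sigma, not by the log-variance directly.
    return z_mean + tf.exp(0.5 * z_log_var) * epsilon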


Loss function:



def vae_loss(self, input, x_decoded_mean):
    xent_loss = tf.reduce_mean(tf.keras.backend.binary_crossentropy(input, x_decoded_mean))
    kl_loss = -0.5 * tf.reduce_sum(tf.square(self.encoded_mean) + tf.square(self.encoded_sigma) - tf.math.log(tf.square(self.encoded_sigma)) - 1, -1)
    return xent_loss + kl_loss
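
For reference, with this sigma parameterization the closed-form KL divergence between N(mu, sigma^2) and N(0, 1) is 0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1), which is non-negative term by term. A minimal sketch of just that term (z_mean / z_sigma are placeholder names, not my model attributes):

import tensorflow as tf

def kl_from_sigma(z_mean, z_sigma):
    # KL(N(mu, sigma^2) || N(0, 1)), summed over the latent dimensions.
    # Note the +0.5: each term mu^2 + sigma^2 - log(sigma^2) - 1 is >= 0.
    return 0.5 * tf.reduce_sum(
        tf.square(z_mean) + tf.square(z_sigma)
        - tf.math.log(tf.square(z_sigma)) - 1.0, axis=-1)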


Another vae_loss implementation:



def vae_loss(self, input, x_decoded_mean):
    gen_loss = tf.reduce_sum(tf.keras.backend.binary_crossentropy(input, x_decoded_mean))
    # gen_loss = tf.losses.mean_squared_error(input, x_decoded_mean)
    kl_loss = -0.5 * tf.reduce_sum(1 + self.encoded_sigma - tf.square(self.encoded_mean) - tf.exp(self.encoded_sigma), -1)
    return tf.reduce_mean(gen_loss + kl_loss)


log_sigma kl_loss:



kl_loss = 0.5 * tf.reduce_sum(tf.square(self.encoded_mean) + tf.square(tf.exp(self.encoded_sigma)) - self.encoded_sigma - 1, axis=-1)
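
If the encoder head is instead interpreted as producing log(sigma) (a common stability trick), then sigma^2 = exp(2 * log_sigma) and log(sigma^2) = 2 * log_sigma, so the same closed-form KL works out to 0.5 * sum(mu^2 + exp(2 * log_sigma) - 2 * log_sigma - 1). A sketch under that assumption (z_log_sigma is a placeholder name):

import tensorflow as tf

def kl_from_log_sigma(z_mean, z_log_sigma):
    # KL(N(mu, sigma^2) || N(0, 1)) when the encoder emits log(sigma):
    # sigma^2 = exp(2 * log_sigma) and log(sigma^2) = 2 * log_sigma.
    return 0.5 * tf.reduce_sum(
        tf.square(z_mean) + tf.exp(2.0 * z_log_sigma)
        - 2.0 * z_log_sigma - 1.0, axis=-1)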









Tags: python keras tensorflow loss-function autoencoder

asked Apr 5 at 14:25 by Jed (63), last edited Apr 5 at 18:42

  • Welcome to StackExchange.DS! KL loss must be minimized; I think it should be +0.5, to push the mean toward 0 and the std toward 1. Let me know if this was the problem. – Esmailian, Apr 5 at 16:43

  • @Esmailian Thanks for the suggestion. Unfortunately, no, that doesn't help. I think there is a problem with my implementation of the KL divergence and/or the sampling. There are a lot of implementations where one or both of the loss components is negative: blog.keras.io/building-autoencoders-in-keras.html, jmetzen.github.io/2015-11-27/vae.html, etc. I'm not really sure where the difference lies, but I expect it comes down to slight variations in the overall implementation. – Jed, Apr 5 at 18:15

  • In the Keras example by Francois Chollet, the terms inside K.mean are the negative of yours; that's why -0.5 works for them. – Esmailian, Apr 5 at 18:18

  • Also, another trick is to let the network produce log(sigma) instead of sigma and then exponentiate it (the same as what Francois Chollet does) for stability. Take a look at the side notes of this answer. – Esmailian, Apr 5 at 18:24

  • Thanks. If you make the network generate log_sigma, does your loss function then work out to the log_sigma kl_loss above? – Jed, Apr 5 at 18:43