Variational AutoEncoder giving negative loss
I'm learning about variational autoencoders and I've implemented a simple example in Keras; the model summary is below. I've copied the loss function from one of Francois Chollet's blog posts, but I'm getting very large negative losses. What am I missing here?
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 224)]        0
__________________________________________________________________________________________________
encoding_flatten (Flatten)      (None, 224)          0           input_1[0][0]
__________________________________________________________________________________________________
encoding_layer_2 (Dense)        (None, 256)          57600       encoding_flatten[0][0]
__________________________________________________________________________________________________
encoding_layer_3 (Dense)        (None, 128)          32896       encoding_layer_2[0][0]
__________________________________________________________________________________________________
encoding_layer_4 (Dense)        (None, 64)           8256        encoding_layer_3[0][0]
__________________________________________________________________________________________________
encoding_layer_5 (Dense)        (None, 32)           2080        encoding_layer_4[0][0]
__________________________________________________________________________________________________
encoding_layer_6 (Dense)        (None, 16)           528         encoding_layer_5[0][0]
__________________________________________________________________________________________________
encoder_mean (Dense)            (None, 16)           272         encoding_layer_6[0][0]
__________________________________________________________________________________________________
encoder_sigma (Dense)           (None, 16)           272         encoding_layer_6[0][0]
__________________________________________________________________________________________________
lambda (Lambda)                 (None, 16)           0           encoder_mean[0][0]
                                                                 encoder_sigma[0][0]
__________________________________________________________________________________________________
decoder_layer_1 (Dense)         (None, 16)           272         lambda[0][0]
__________________________________________________________________________________________________
decoder_layer_2 (Dense)         (None, 32)           544         decoder_layer_1[0][0]
__________________________________________________________________________________________________
decoder_layer_3 (Dense)         (None, 64)           2112        decoder_layer_2[0][0]
__________________________________________________________________________________________________
decoder_layer_4 (Dense)         (None, 128)          8320        decoder_layer_3[0][0]
__________________________________________________________________________________________________
decoder_layer_5 (Dense)         (None, 256)          33024       decoder_layer_4[0][0]
__________________________________________________________________________________________________
decoder_mean (Dense)            (None, 224)          57568       decoder_layer_5[0][0]
==================================================================================================
Total params: 203,744
Trainable params: 203,744
Non-trainable params: 0
__________________________________________________________________________________________________
Train on 3974 samples, validate on 994 samples
Epoch 1/10
3974/3974 [==============================] - 3s 677us/sample - loss: -28.1519 - val_loss: -33.5864
Epoch 2/10
3974/3974 [==============================] - 1s 346us/sample - loss: -137258.8175 - val_loss: -3683802.1489
Epoch 3/10
3974/3974 [==============================] - 1s 344us/sample - loss: -14543022903.6056 - val_loss: -107811177469.9396
Epoch 4/10
3974/3974 [==============================] - 1s 363us/sample - loss: -3011718676570.7012 - val_loss: -13131454938476.6816
Epoch 5/10
3974/3974 [==============================] - 1s 350us/sample - loss: -101442605943572.4844 - val_loss: -322685056398605.9375
Epoch 6/10
3974/3974 [==============================] - 1s 344us/sample - loss: -1417424385529640.5000 - val_loss: -3687688508198145.5000
Epoch 7/10
3974/3974 [==============================] - 1s 358us/sample - loss: -11794297368126698.0000 - val_loss: -26632844827070784.0000
Epoch 8/10
3974/3974 [==============================] - 1s 339us/sample - loss: -69508229806130784.0000 - val_loss: -141312065640756336.0000
Epoch 9/10
3974/3974 [==============================] - 1s 345us/sample - loss: -319838384005810432.0000 - val_loss: -599553350073361152.0000
Epoch 10/10
3974/3974 [==============================] - 1s 342us/sample - loss: -1221653451351326464.0000 - val_loss: -2147128507956525312.0000
Latent sample func:

    def sampling(self, args):
        """Reparameterization trick by sampling from an isotropic unit Gaussian.

        # Arguments
            args (tensor): mean and log of variance of Q(z|X)
        # Returns
            z (tensor): sampled latent vector
        """
        z_mean, z_log_var = args
        set = tf.shape(z_mean)[0]
        batch = tf.shape(z_mean)[1]
        dim = tf.shape(z_mean)[-1]
        # by default, random_normal has mean=0 and std=1.0
        epsilon = tf.random.normal(shape=(set, dim))  # tfp.distributions.Normal(mean=tf.zeros(shape=(batch, dim)), loc=tf.ones(shape=(batch, dim)))
        return z_mean + (z_log_var * epsilon)
Loss func:

    def vae_loss(self, input, x_decoded_mean):
        xent_loss = tf.reduce_mean(tf.keras.backend.binary_crossentropy(input, x_decoded_mean))
        kl_loss = -0.5 * tf.reduce_sum(tf.square(self.encoded_mean) + tf.square(self.encoded_sigma) - tf.math.log(tf.square(self.encoded_sigma)) - 1, -1)
        return xent_loss + kl_loss

Another vae_loss implementation:

    def vae_loss(self, input, x_decoded_mean):
        gen_loss = tf.reduce_sum(tf.keras.backend.binary_crossentropy(input, x_decoded_mean))
        # gen_loss = tf.losses.mean_squared_error(input, x_decoded_mean)
        kl_loss = -0.5 * tf.reduce_sum(1 + self.encoded_sigma - tf.square(self.encoded_mean) - tf.exp(self.encoded_sigma), -1)
        return tf.reduce_mean(gen_loss + kl_loss)

log_sigma kl_loss:

    kl_loss = 0.5 * tf.reduce_sum(tf.square(self.encoded_mean) + tf.square(tf.exp(self.encoded_sigma)) - self.encoded_sigma - 1, axis=-1)
python keras tensorflow loss-function autoencoder
asked Apr 5 at 14:25, edited Apr 5 at 18:42 – Jed (63)
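For reference, the quantity the kl_loss candidates above are meant to implement is the closed-form KL divergence between the approximate posterior $\mathcal{N}(\mu, \sigma^2)$ and the unit Gaussian prior, which is non-negative:

$$
\mathrm{KL}\left(\mathcal{N}(\mu,\sigma^{2})\,\|\,\mathcal{N}(0,I)\right)
= -\frac{1}{2}\sum_{i=1}^{d}\left(1 + \log\sigma_i^{2} - \mu_i^{2} - \sigma_i^{2}\right) \ge 0 .
$$

If the encoder outputs the log-variance (z_log_var = log σ², as the comments below suggest), then σ² = exp(z_log_var) and the term becomes $-\frac{1}{2}\sum_i\left(1 + z_i - \mu_i^{2} - e^{z_i}\right)$, with $z_i$ the log-variance of the $i$-th latent dimension; this is the form used in the Keras VAE example.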
– Esmailian (Apr 5 at 16:43): Welcome to StackExchange.DS! The KL loss must be minimized; I think it should be +0.5, so that the mean and std are pushed toward 0 and 1 respectively. Let me know if this was the problem.

– Jed (Apr 5 at 18:15): @Esmailian Thanks for the suggestion. Unfortunately, no, that doesn't help. I think there is a problem with my implementation of the KL divergence and/or the sampling. There are a lot of implementations where one or both of the loss components is negative: blog.keras.io/building-autoencoders-in-keras.html, jmetzen.github.io/2015-11-27/vae.html, etc. I'm not really sure where the difference lies, but I expect it is due to slight variations in the overall implementation.

– Esmailian (Apr 5 at 18:18): In the Keras example by Francois Chollet, the terms inside K.mean are the negative of yours; that's why -0.5 works for them.

– Esmailian (Apr 5 at 18:24): Also, another trick is to let the network produce log(sigma) instead of sigma and then exponentiate it (the same as what Francois Chollet does) for stability. Take a look at the side notes of this answer.

– Jed (Apr 5 at 18:43): Thanks. If you make the network generate log_sigma, does the loss function then work out to the log_sigma kl_loss above?
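Putting the comments together, here is a minimal sketch of the log-variance formulation they describe, following the standard Keras VAE example rather than the code above; the names (z_mean, z_log_var, original_dim) and the assumption that inputs and decoder outputs lie in [0, 1] are illustrative:

    import tensorflow as tf

    def sampling(args):
        """Reparameterization trick: z = mean + exp(0.5 * log_var) * eps."""
        z_mean, z_log_var = args
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.random.normal(shape=(batch, dim))
        # exponentiate half of the log-variance to recover the standard deviation
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

    def vae_loss(x, x_decoded_mean, z_mean, z_log_var, original_dim=224):
        # reconstruction term, summed over the input features
        # (assumes x and x_decoded_mean are in [0, 1], e.g. a sigmoid decoder output)
        xent_loss = original_dim * tf.reduce_mean(
            tf.keras.backend.binary_crossentropy(x, x_decoded_mean), axis=-1)
        # KL(N(mean, var) || N(0, I)); this term is always >= 0
        kl_loss = -0.5 * tf.reduce_sum(
            1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
        return tf.reduce_mean(xent_loss + kl_loss)

Written this way, both terms are non-negative, so minimizing the total loss pushes the posterior mean toward 0 and the variance toward 1 without the loss running off to large negative values.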
0 Answers