Why is my generator loss function increasing with iterations?



I'm trying to train a DC-GAN on the CIFAR-10 dataset. I'm using binary cross-entropy as the loss function for both the discriminator and the generator (the generator is trained through the concatenated model, with the discriminator set to non-trainable). If I train using the Adam optimizer, the GAN trains fine. But if I replace the optimizer with SGD, training goes haywire. The generator accuracy starts at some higher point and, over iterations, goes to 0 and stays there. The discriminator accuracy starts at some lower point and reaches somewhere around 0.5 (expected, right?). The peculiar thing is that the generator loss keeps increasing with iterations. I thought maybe the step size was too high, so I tried changing it. I also tried using momentum with SGD. In all these cases, the generator loss may or may not decrease in the beginning, but then it increases for sure. So I think there is something inherently wrong in my model. I know training deep models is difficult, and GANs even more so, but there has to be some reason or heuristic as to why this is happening. Any inputs are appreciated. I'm new to neural networks and deep learning, and hence new to GANs as well.
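For reference, a minimal sketch of the optimizer variants I tried (the exact values varied across runs, so the learning rates and momentum below are only illustrative):

from keras.optimizers import SGD, Adam

# Variant 1: plain SGD -- generator loss keeps increasing
optimizer = SGD(lr=0.0001)

# Variant 2: SGD with momentum -- same behaviour
optimizer = SGD(lr=0.0001, momentum=0.9, nesterov=True)

# Variant 3: Adam -- the GAN trains fine with this
optimizer = Adam(lr=0.0001, beta_1=0.5)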



Here is my code:
Cifar10Models.py



from keras import Sequential
from keras.initializers import TruncatedNormal
from keras.layers import Activation, BatchNormalization, Conv2D, Conv2DTranspose, Dense, Flatten, LeakyReLU, Reshape
from keras.optimizers import SGD


class DcGan:
    def __init__(self, print_model_summary: bool = False):
        self.generator_model = None
        self.discriminator_model = None
        self.concatenated_model = None
        self.print_model_summary = print_model_summary

    def build_generator_model(self):
        if self.generator_model:
            return self.generator_model

        self.generator_model = Sequential()
        self.generator_model.add(Dense(4 * 4 * 512, input_dim=100,
                                       kernel_initializer=TruncatedNormal(mean=0.0, stddev=0.02)))
        self.generator_model.add(BatchNormalization(momentum=0.5))
        self.generator_model.add(Activation('relu'))
        self.generator_model.add(Reshape((4, 4, 512)))

        self.generator_model.add(Conv2DTranspose(256, 3, strides=2, padding='same',
                                                 kernel_initializer=TruncatedNormal(mean=0.0, stddev=0.02)))
        self.generator_model.add(BatchNormalization(momentum=0.5))
        self.generator_model.add(Activation('relu'))

        self.generator_model.add(Conv2DTranspose(128, 3, strides=2, padding='same',
                                                 kernel_initializer=TruncatedNormal(mean=0.0, stddev=0.02)))
        self.generator_model.add(BatchNormalization(momentum=0.5))
        self.generator_model.add(Activation('relu'))

        self.generator_model.add(Conv2DTranspose(64, 3, strides=2, padding='same',
                                                 kernel_initializer=TruncatedNormal(mean=0.0, stddev=0.02)))
        self.generator_model.add(BatchNormalization(momentum=0.5))
        self.generator_model.add(Activation('relu'))

        self.generator_model.add(Conv2D(3, 3, padding='same',
                                        kernel_initializer=TruncatedNormal(mean=0.0, stddev=0.02)))
        self.generator_model.add(Activation('tanh'))

        if self.print_model_summary:
            self.generator_model.summary()

        return self.generator_model

    def build_discriminator_model(self):
        if self.discriminator_model:
            return self.discriminator_model

        self.discriminator_model = Sequential()
        self.discriminator_model.add(Conv2D(128, 3, strides=2, input_shape=(32, 32, 3), padding='same',
                                            kernel_initializer=TruncatedNormal(mean=0.0, stddev=0.02)))
        self.discriminator_model.add(LeakyReLU(alpha=0.2))

        self.discriminator_model.add(Conv2D(256, 3, strides=2, padding='same',
                                            kernel_initializer=TruncatedNormal(mean=0.0, stddev=0.02)))
        self.generator_model.add(BatchNormalization(momentum=0.5))
        self.discriminator_model.add(LeakyReLU(alpha=0.2))

        self.discriminator_model.add(Conv2D(512, 3, strides=2, padding='same',
                                            kernel_initializer=TruncatedNormal(mean=0.0, stddev=0.02)))
        self.generator_model.add(BatchNormalization(momentum=0.5))
        self.discriminator_model.add(LeakyReLU(alpha=0.2))

        self.discriminator_model.add(Conv2D(1024, 3, strides=2, padding='same',
                                            kernel_initializer=TruncatedNormal(mean=0.0, stddev=0.02)))
        self.generator_model.add(BatchNormalization(momentum=0.5))
        self.discriminator_model.add(LeakyReLU(alpha=0.2))

        self.discriminator_model.add(Flatten())
        self.discriminator_model.add(Dense(1, kernel_initializer=TruncatedNormal(mean=0.0, stddev=0.02)))
        self.generator_model.add(BatchNormalization(momentum=0.5))
        self.discriminator_model.add(Activation('sigmoid'))

        if self.print_model_summary:
            self.discriminator_model.summary()

        return self.discriminator_model

    def build_concatenated_model(self):
        if self.concatenated_model:
            return self.concatenated_model

        self.concatenated_model = Sequential()
        self.concatenated_model.add(self.generator_model)
        self.concatenated_model.add(self.discriminator_model)

        if self.print_model_summary:
            self.concatenated_model.summary()

        return self.concatenated_model

    def build_dc_gan(self):
        self.build_generator_model()
        self.build_discriminator_model()
        self.build_concatenated_model()

        self.discriminator_model.trainable = True
        optimizer = SGD(lr=0.0002)
        self.discriminator_model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
        self.discriminator_model.trainable = False
        optimizer = SGD(lr=0.0001)
        self.concatenated_model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
        self.discriminator_model.trainable = True


Cifar10Trainer.py:



# Based on https://towardsdatascience.com/gan-by-example-using-keras-on-tensorflow-backend-1a6d515a60d0

import os

import datetime
import numpy
import time
from keras.datasets import cifar10
from keras.utils import np_utils
from matplotlib import pyplot as plt

import Cifar10Models

log_file_name = 'logs.csv'


class Cifar10Trainer:
    def __init__(self):
        self.x_train, self.y_train = self.get_train_and_test_data()
        self.dc_gan = Cifar10Models.DcGan()
        self.dc_gan.build_dc_gan()

    @staticmethod
    def get_train_and_test_data():
        (x_train, y_train), _ = cifar10.load_data()
        x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 3)
        # Generator output has tanh activation whose range is [-1,1]
        x_train = (x_train.astype('float32') * 2 / 255) - 1
        y_train = np_utils.to_categorical(y_train, 10)
        return x_train, y_train

    def train(self, train_steps=10000, batch_size=128, log_interval=10, save_interval=100,
              output_folder_path='./Trained_Models/'):
        self.initialize_log(output_folder_path)
        self.sample_real_images(output_folder_path)
        for i in range(train_steps):
            # Get real (Database) Images
            images_real = self.x_train[numpy.random.randint(0, self.x_train.shape[0], size=batch_size), :, :, :]

            # Generate Fake Images
            noise = numpy.random.uniform(-1.0, 1.0, size=[batch_size, 100])
            images_fake = self.dc_gan.generator_model.predict(noise)

            # Train discriminator on both real and fake images
            x = numpy.concatenate((images_real, images_fake), axis=0)
            y = numpy.ones([2 * batch_size, 1])
            y[batch_size:, :] = 0
            d_loss = self.dc_gan.discriminator_model.train_on_batch(x, y)

            # Train generator i.e. concatenated model
            noise = numpy.random.uniform(-1.0, 1.0, size=[batch_size, 100])
            y = numpy.ones([batch_size, 1])
            g_loss = self.dc_gan.concatenated_model.train_on_batch(noise, y)

            # Print Logs, Save Models, generate sample images
            if (i + 1) % log_interval == 0:
                self.log_progress(output_folder_path, i + 1, g_loss, d_loss)
            if (i + 1) % save_interval == 0:
                self.save_models(output_folder_path, i + 1)
                self.generate_images(output_folder_path, i + 1)

    @staticmethod
    def initialize_log(output_folder_path):
        log_line = 'Iteration No, Generator Loss, Generator Accuracy, Discriminator Loss, Discriminator Accuracy, ' \
                   'Time\n'
        with open(os.path.join(output_folder_path, log_file_name), 'w') as log_file:
            log_file.write(log_line)

    @staticmethod
    def log_progress(output_folder_path, iteration_no, g_loss, d_loss):
        log_line = '{0:05},{1:2.4f},{2:0.4f},{3:2.4f},{4:0.4f},{5}\n' \
            .format(iteration_no, g_loss[0], g_loss[1], d_loss[0], d_loss[1],
                    datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
        with open(os.path.join(output_folder_path, log_file_name), 'a') as log_file:
            log_file.write(log_line)
        print(log_line)

    def save_models(self, output_folder_path, iteration_no):
        self.dc_gan.generator_model.save(
            os.path.join(output_folder_path, 'generator_model_{0}.h5'.format(iteration_no)))
        self.dc_gan.discriminator_model.save(
            os.path.join(output_folder_path, 'discriminator_model_{0}.h5'.format(iteration_no)))
        self.dc_gan.concatenated_model.save(
            os.path.join(output_folder_path, 'concatenated_model_{0}.h5'.format(iteration_no)))

    def sample_real_images(self, output_folder_path):
        filepath = os.path.join(output_folder_path, 'CIFAR10_Sample_Real_Images.png')
        i = numpy.random.randint(0, self.x_train.shape[0], 16)
        images = self.x_train[i, :, :, :]
        plt.figure(figsize=(10, 10))
        for i in range(16):
            plt.subplot(4, 4, i + 1)
            image = images[i, :, :, :]
            image = numpy.reshape(image, [32, 32, 3])
            plt.imshow(image)
            plt.axis('off')
        plt.tight_layout()
        plt.savefig(filepath)
        plt.close('all')

    def generate_images(self, output_folder_path, iteration_no, noise=None):
        filepath = os.path.join(output_folder_path, 'CIFAR10_Gen_Image{0}.png'.format(iteration_no))
        if noise is None:
            noise = numpy.random.uniform(-1, 1, size=[16, 100])
        # Generator output has tanh activation whose range is [-1,1]
        images = (self.dc_gan.generator_model.predict(noise) + 1) / 2
        plt.figure(figsize=(10, 10))
        for i in range(16):
            plt.subplot(4, 4, i + 1)
            image = images[i, :, :, :]
            image = numpy.reshape(image, [32, 32, 3])
            plt.imshow(image)
            plt.axis('off')
        plt.tight_layout()
        plt.savefig(filepath)
        plt.close('all')


def main():
    cifar10_trainer = Cifar10Trainer()
    cifar10_trainer.train(train_steps=10000, log_interval=10, save_interval=100)
    del cifar10_trainer.dc_gan
    return


if __name__ == '__main__':
    start_time = time.time()
    print('Program Started at {0}'.format(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(start_time))))
    main()
    end_time = time.time()
    print('Program Ended at {0}'.format(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(end_time))))
    print('Total Execution Time: {0}s'.format(datetime.timedelta(seconds=end_time - start_time)))


Some of the graphs are as below:



  1. Discriminator Optimizer: Adam(lr=0.0001, beta1=0.5); Generator Optimizer: Adam(lr=0.0001, beta1=0.5)
     [loss/accuracy plot]

  2. Discriminator Optimizer: SGD(lr=0.0001); Generator Optimizer: SGD(lr=0.0001)
     [loss/accuracy plot]

  3. Discriminator Optimizer: SGD(lr=0.0001); Generator Optimizer: SGD(lr=0.001)
     [loss/accuracy plot]

  4. Discriminator Optimizer: SGD(lr=0.0001); Generator Optimizer: SGD(lr=0.0005)
     [loss/accuracy plot]


Note:

This question was originally asked on Stack Overflow and then re-asked here, as per suggestions on SO.



Edit 1:

Adding some generated images for reference.

  1. Images generated by Adam optimizer
     [generated image grid]

  2. Images generated by SGD optimizer
     [generated image grid]










Tags: python, deep-learning, keras, optimization, gan






asked Mar 24 at 5:18 by Nagabhushan S N (edited Mar 24 at 12:01)




















          1 Answer

          I think that there are several issues with your model:



First of all, your generator's loss is not really a generator loss. You have one binary cross-entropy loss function for the discriminator, and another binary cross-entropy loss function for the concatenated model, whose output is again the discriminator's output (this time on generated images).
The "generator loss" you are showing is the discriminator's loss when dealing with generated images. You want this loss to go up: it means that your model successfully generates images that your discriminator fails to catch (as can be seen in the overall discriminator accuracy, which sits at 0.5).



Another issue is that you should add some generator regularization in the form of an actual generator loss ("generator objective function"). You can read about the different options in GAN Objective Functions: GANs and Their Variations.
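For context (these are the standard formulations from the original GAN paper, not something specific to the survey above), the two classic generator objectives are

$$\min_G \;\mathbb{E}_{z}\big[\log\big(1 - D(G(z))\big)\big] \qquad \text{(minimax)}$$

$$\max_G \;\mathbb{E}_{z}\big[\log D(G(z))\big] \qquad \text{(non-saturating)}$$

plus later alternatives such as the Wasserstein and least-squares objectives.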



A final issue that I see is that you are passing the generated images through a final hyperbolic tangent activation function, and I don't really understand why. The generator in your case is supposed to generate a "believable" CIFAR-10 image, which is a 32x32x3 tensor with values in the range [0,255] or [0,1], whereas your generator's output has a potential range of [-1,1] (as you state in your code).
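If you do want the generator to emit values directly in [0, 1], here is a minimal sketch of that alternative (my suggestion, written against the Keras code above; note that the DCGAN paper itself uses tanh with real images rescaled to [-1, 1], which is what your code does):

# In build_generator_model: output in [0, 1] instead of [-1, 1]
self.generator_model.add(Activation('sigmoid'))  # instead of Activation('tanh')

# In get_train_and_test_data: scale real images to the same [0, 1] range
x_train = x_train.astype('float32') / 255  # instead of (x_train * 2 / 255) - 1

# In generate_images: no rescaling is then needed before plotting
images = self.dc_gan.generator_model.predict(noise)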


















• Happy 1K! Comments must be at least 15 characters in length. – Esmailian, Mar 24 at 8:26

• Thank you very much :) – Mark.F, Mar 24 at 10:15

• Thanks. 1. What I've defined as generator_loss is the binary cross-entropy between the discriminator output and the desired output, which is 1 while training the generator. Now, if my generator is able to fool the discriminator, the discriminator output should be close to 1, right? So the BCE value should decrease, right? Also, if you look at the first graph, where I've used Adam instead of SGD, the loss didn't increase, and in that case the generated images are better. When using SGD, the generated images are noise. Since the generator accuracy is 0, the discriminator accuracy of 0.5 doesn't mean much. – Nagabhushan S N, Mar 24 at 11:54

• 2. I'll look into GAN objective functions; I was trying to implement the plain DCGAN paper. 3. I'm using the tanh function because the DC-GAN paper says so. Yes, even though tanh outputs in the range [-1,1], if you see the generate_images function in the Trainer.py file, I'm doing this: images = (self.dc_gan.generator_model.predict(noise) + 1) / 2. So this is a valid image, right? Also, as the first graph showed, even with this setup I get good results using the Adam optimizer; using SGD causes this problem. The only change is that SGD is used instead of Adam, so there must be some problem related to that. – Nagabhushan S N, Mar 24 at 11:58

• I've added some generated images for reference. Please check them as well. Again, thanks a lot for your time and suggestions. – Nagabhushan S N, Mar 24 at 12:02











          Your Answer





          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "557"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f47875%2fwhy-is-my-generator-loss-function-increasing-with-iterations%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1












          $begingroup$

I think there are several issues with your model:

First of all, your generator's loss is not really the generator's loss. You have one binary cross-entropy loss function for the discriminator, and another binary cross-entropy loss function for the stacked (generator + discriminator) model whose output is again the discriminator's output, this time on generated images.
The "generator loss" you are plotting is therefore the discriminator's loss on generated images. You actually want this loss to go up: it means your generator produces images that your discriminator fails to catch (which is also consistent with the overall discriminator accuracy sitting around 0.5).



Another issue is that you should add some generator regularization in the form of an actual generator loss ("generator objective function"). You can read about the different options in "GAN Objective Functions: GANs and Their Variations".



A final issue is that you are passing the generated images through a final hyperbolic tangent activation, and I don't really understand why. The generator in your case is supposed to produce a "believable" CIFAR-10 image, which is a 32x32x3 tensor with values in the range [0,255] or [0,1], while your generator's output has a potential range of [-1,1] (as you state in your code).






answered Mar 24 at 8:15 – Mark.F
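For reference, below is a minimal Keras-style sketch of the kind of setup the answer describes: a discriminator trained with its own binary cross-entropy loss, and a stacked generator-plus-frozen-discriminator model whose binary cross-entropy against a target of 1 is what gets reported as the "generator loss". The layer sizes, names, and hyperparameters here are illustrative assumptions, not the asker's actual code.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    latent_dim = 100  # assumed noise dimension

    # Toy generator: noise -> 32x32x3 image in [-1, 1] because of the final tanh.
    generator = keras.Sequential([
        layers.Dense(8 * 8 * 128, activation="relu", input_dim=latent_dim),
        layers.Reshape((8, 8, 128)),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh"),
    ])

    # Toy discriminator: image -> probability that the image is real.
    discriminator = keras.Sequential([
        layers.Conv2D(64, 4, strides=2, padding="same", activation="relu",
                      input_shape=(32, 32, 3)),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])
    discriminator.compile(optimizer="adam", loss="binary_crossentropy",
                          metrics=["accuracy"])

    # Stacked model: noise -> generator -> discriminator, with the discriminator frozen
    # so that only the generator's weights are updated by this loss.
    discriminator.trainable = False
    gan = keras.Sequential([generator, discriminator])
    gan.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # One "generator" step: the reported loss is the discriminator's binary
    # cross-entropy on generated images against a target label of 1 ("real").
    noise = np.random.normal(size=(64, latent_dim))
    g_loss, g_acc = gan.train_on_batch(noise, np.ones((64, 1)))

As the answer notes, this quantity measures how well the generated images fool the current discriminator; it is not an independent objective on the generator's output itself.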












• Happy 1K! Comments must be at least 15 characters in length.
  – Esmailian, Mar 24 at 8:26

• Thank you very much :)
  – Mark.F, Mar 24 at 10:15

• Thanks. 1. What I've defined as generator_loss is the binary cross-entropy between the discriminator output and the desired output, which is 1 while training the generator. Now, if my generator is able to fool the discriminator, the discriminator output should be close to 1, right? So the BCE value should decrease, right? Also, if you look at the first graph, where I used Adam instead of SGD, the loss didn't increase, and the generated images are better. When using SGD, the generated images are noise. Since the generator accuracy is 0, the discriminator accuracy of 0.5 doesn't mean much.
  – Nagabhushan S N, Mar 24 at 11:54

• 2. I'll look into GAN objective functions; I was trying to implement the plain DCGAN paper. 3. I'm using the tanh function because the DCGAN paper says so. Even though tanh outputs are in the range [-1,1], if you look at the generate_images function in Trainer.py, I'm doing images = (self.dc_gan.generator_model.predict(noise) + 1) / 2, so the result is a valid image, right? (See the sketch after these comments.) Also, as the first graph showed, even with this I get good results using the Adam optimizer. Using SGD causes this problem; the only change is that SGD is used instead of Adam, so there must be some problem with that.
  – Nagabhushan S N, Mar 24 at 11:58

• I've added some generated images for reference. Please check them as well. Again, thanks a lot for your time and suggestions.
  – Nagabhushan S N, Mar 24 at 12:02
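To make the last two comments concrete, here is a minimal sketch of the rescaling step and of the optimizer swap being discussed. The names (generator_model, latent_dim) and the SGD settings are placeholders assumed for illustration, not the contents of the asker's Trainer.py.

    import numpy as np
    from tensorflow import keras

    def generate_images(generator_model, n_images=16, latent_dim=100):
        """Map the generator's tanh output from [-1, 1] to [0, 1] for display."""
        noise = np.random.normal(size=(n_images, latent_dim))
        images = (generator_model.predict(noise) + 1) / 2
        return images

    # The only difference between the two runs being compared is the optimizer
    # passed to compile(): Adam with DCGAN-style settings versus plain SGD.
    adam = keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
    sgd = keras.optimizers.SGD(learning_rate=2e-4, momentum=0.9)  # assumed values
    # e.g. gan.compile(optimizer=sgd, loss="binary_crossentropy")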














