Why does my minimal CNN example show strongly fluctuating validation loss?

I'm fairly new to working with neural networks and think I am making some basic mistake. I am trying to assign simulated images to 5 classes, to test which (if any) networks are helpful for a large-data problem we have in our group. I am training the ResNet50 CNN included in Keras on 40,000 greyscale images (256 by 256 pixels). While the training loss improves quickly within the first epoch, the validation loss fluctuates wildly in a (to me) fairly random manner.

[Plots of training and validation loss omitted.]
I am trying to use as many high-level functions as possible and have ended up with the following code:



from keras.preprocessing.image import ImageDataGenerator
from keras.applications.resnet50 import ResNet50
from keras.models import Model
from keras.layers import GlobalAveragePooling2D, Dense

inputShape = (256, 256, 1)
targetSize = (256, 256)
batchSize = 128

# First, load ResNet50 without the classification head
base_model = ResNet50(include_top=False,
                      weights=None,
                      input_shape=inputShape)
x = base_model.output

# Modified based on https://github.com/priya-dwivedi/
# Deep-Learning/blob/master/resnet_keras/Residual_Networks_yourself.ipynb
x = GlobalAveragePooling2D()(x)
predictions = Dense(5, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# Compile the model; loss and metrics might still need changing
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['acc'])

# Define the generators that read the data.
# The training generator also performs data augmentation.
train_datagen = ImageDataGenerator(
    samplewise_center=True,
    samplewise_std_normalization=True,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(
    samplewise_center=True,
    samplewise_std_normalization=True)

train_generator = train_datagen.flow_from_directory(
    'trainingData/FolderBasedTrain256',
    target_size=targetSize,
    color_mode='grayscale',
    batch_size=batchSize,
    save_to_dir='trainingData/argumentedImages',
    class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
    'trainingData/FolderBasedValidation256',
    target_size=targetSize,
    batch_size=batchSize,
    color_mode='grayscale',
    class_mode='categorical')

# Fit the model, saving the history for later plotting
history = model.fit_generator(
    train_generator,
    epochs=20,
    validation_data=validation_generator,
    #steps_per_epoch = 62,
    steps_per_epoch=31,
    #callbacks=[checkpointer],
    validation_steps=9)


I can always create more images or train longer, but to me this looks as if something fundamental has gone wrong somewhere. I would be very grateful for any and all ideas. Thank you!



EDIT: I was told to stress that the validation and training sets are both created by exactly the same simulation routine, so they should be relatively easy to classify.



EDIT2: Found the error! My batch size did not fit the amount of data and the steps per epoch, so the CNN never saw all of the training data. Now everything converges nicely and I can evaluate the choice of model. Thanks to all contributors!
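For anyone hitting the same problem: with batchSize = 128 and steps_per_epoch = 31, each reported epoch only covers 31 * 128 = 3968 of the 40,000 training images. Below is a minimal sketch of deriving the step counts from the generators instead of hard-coding them; it assumes the generators defined above, and uses .samples, the image count that flow_from_directory iterators report:

import math

numTrain = train_generator.samples        # images found in FolderBasedTrain256
numVal = validation_generator.samples     # images found in FolderBasedValidation256

# Round up so that every image is seen once per epoch.
stepsPerEpoch = math.ceil(numTrain / batchSize)
validationSteps = math.ceil(numVal / batchSize)

history = model.fit_generator(
    train_generator,
    epochs=20,
    steps_per_epoch=stepsPerEpoch,
    validation_data=validation_generator,
    validation_steps=validationSteps)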










asked Mar 21 at 11:27 by Djaik Navidson (edited Mar 22 at 9:17)
  • Are you computing validation metrics on only batchSize examples? Maybe you should use more examples to get a more robust estimate of those metrics?
    – kbrose, Mar 21 at 13:20










  • As far as I understand, validation_steps=9 in model.fit_generator should make it so that the entire validation set is iterated over. But I might be wrong there; I will look into it.
    – Djaik Navidson, Mar 21 at 13:40










  • Could be. I don't use generators for validation data, so I'm not sure about the specifics.
    – kbrose, Mar 21 at 16:19
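For context on that exchange, a rough check of the numbers from the question (assuming batchSize = 128 as in the code above):

batchSize = 128
validationSteps = 9
# At most 9 * 128 = 1152 validation images are evaluated per epoch; whether that
# covers the whole validation set depends on how many images flow_from_directory
# finds in trainingData/FolderBasedValidation256.
imagesPerValidationPass = validationSteps * batchSize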















1 Answer

There is nothing fundamentally wrong with your code, but maybe your model is not right for your current toy problem.

In general, this is typical behavior when training in deep learning. Think about it: the quantity being optimized is the training loss, so it is directly affected by the training process and, as you said, improves quickly. The validation loss is only affected indirectly, so naturally it is more volatile in comparison.

While training, the model attempts to estimate the true distribution of the data, but all it has to rely on is the distribution of the training dataset (which is similar, but not the same).

Suggestions:

I think your model is overkill for a 256x256 grayscale dataset with just 5 classes (ResNet was designed for ImageNet, which contains RGB images from 1000 categories). In its current state, the model finds it very easy to memorize the training set and overfit. You should look at models meant for MNIST or, at most, CIFAR-10.

You can stick with this model and try to increase regularization, but I'm not sure that will be enough to prevent the model from overfitting in this case.
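As a rough illustration of the kind of smaller network meant here, a minimal Keras sketch for 256x256 grayscale inputs and 5 classes (the layer sizes are illustrative, not taken from the answer):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dropout, Dense

# A small CIFAR-10-style CNN: a few conv/pool blocks, global pooling, softmax head.
small_model = Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(256, 256, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    GlobalAveragePooling2D(),
    Dropout(0.5),                        # some regularization, as suggested above
    Dense(5, activation='softmax'),
])
small_model.compile(loss='categorical_crossentropy',
                    optimizer='adam',
                    metrics=['acc'])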






answered Mar 21 at 12:46 by Mark.F








  • Thank you very much! I chose a bigger network because the data is fairly complicated (I'm trying to predict electron distribution types from scattering images that vary wildly with orientation), but I will try a smaller net as per your suggestion. I will report back with the results and mark the question as solved if it works!
    – Djaik Navidson, Mar 21 at 13:36






  • Thank you again! I got it to work by telling it to actually look at ALL my images (my batch size was too low). Now everything converges as I expected. The overfitting issue you brought up might still be relevant, however; I will now evaluate the model on some new data and see how well it holds up compared to something simpler. Thank you again for your time!
    – Djaik Navidson, Mar 22 at 9:15









