Why does my minimal CNN example show strongly fluctuating validation loss?
I'm fairly new to working with neural networks and think I am making some basic mistake. I am trying to assign simulated images to 5 classes to test what (if any) networks are helpful for a large-data problem we have in our group. I am training the ResNet50 CNN included in Keras on 40000 (256 by 256 pixel greyscale) images. While the training loss improves quickly within the first epoch, the validation loss fluctuates wildly in a (to me) fairly random manner.
I am trying to use as many high-level functions as possible and have ended up with the following code:
    from keras.preprocessing.image import ImageDataGenerator
    from keras.applications.resnet50 import ResNet50
    from keras.models import Model
    from keras.layers import GlobalAveragePooling2D, Dense

    inputShape = (256, 256, 1)
    targetSize = (256, 256)
    batchSize = 128

    # First, load ResNet50 without the classification head
    base_model = ResNet50(include_top=False,
                          weights=None,
                          input_shape=inputShape)
    x = base_model.output
    # Modified based on https://github.com/priya-dwivedi/
    # Deep-Learning/blob/master/resnet_keras/Residual_Networks_yourself.ipynb
    x = GlobalAveragePooling2D()(x)
    predictions = Dense(5, activation='softmax')(x)
    model = Model(inputs=base_model.input, outputs=predictions)

    # Compile the model; might want to change loss and metrics
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['acc'])

    # Define the generators that read the data. The training generator performs data augmentation as well.
    train_datagen = ImageDataGenerator(
        samplewise_center=True,
        samplewise_std_normalization=True,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
    test_datagen = ImageDataGenerator(
        samplewise_center=True,
        samplewise_std_normalization=True)

    train_generator = train_datagen.flow_from_directory(
        'trainingData/FolderBasedTrain256',
        target_size=targetSize,
        color_mode='grayscale',
        batch_size=batchSize,
        save_to_dir='trainingData/argumentedImages',
        class_mode='categorical')
    validation_generator = test_datagen.flow_from_directory(
        'trainingData/FolderBasedValidation256',
        target_size=targetSize,
        batch_size=batchSize,
        color_mode='grayscale',
        class_mode='categorical')

    # Fit the model, saving the history for later plotting
    history = model.fit_generator(
        train_generator,
        epochs=20,
        validation_data=validation_generator,
        #steps_per_epoch = 62,
        steps_per_epoch=31,
        #callbacks=[checkpointer],
        validation_steps=9)
I can always create more images or train longer, but to me this looks as if something fundamental went wrong somewhere. I would be very grateful for any and all ideas. Thank you!
EDIT: I was told to stress that the validation and training sets are both created by exactly the same simulation routine, so they should be relatively easy to classify.
EDIT2: Found the error! My batch size did not fit the amount of data and the steps per epoch, so the CNN never saw all of the training data. Now everything converges nicely and I can evaluate the choice of model. Thanks to all contributors.
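For reference, a minimal sketch of how the step counts can be derived from the actual dataset sizes so that every image is seen each epoch. It reuses the generators defined above; the .samples attribute of a Keras DirectoryIterator holds the number of files it found, so no counts need to be hard-coded:

    import math

    # One step consumes one batch, so covering the whole directory takes
    # ceil(number_of_images / batch_size) steps.
    stepsPerEpoch = math.ceil(train_generator.samples / batchSize)
    valSteps = math.ceil(validation_generator.samples / batchSize)

    history = model.fit_generator(
        train_generator,
        epochs=20,
        steps_per_epoch=stepsPerEpoch,
        validation_data=validation_generator,
        validation_steps=valSteps)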
python neural-network keras cnn
asked Mar 21 at 11:27 by Djaik Navidson, edited Mar 22 at 9:17
– kbrose (Mar 21 at 13:20): Are you computing validation metrics on only batchSize examples? Maybe you should use more examples to get a more robust estimate of those metrics?
– Djaik Navidson (Mar 21 at 13:40): As far as I understand, validation_steps=9 in model.fit_generator should make it so that the entire validation set is iterated over. But I might be wrong there; I will look into it.
– kbrose (Mar 21 at 16:19): Could be. I don't use generators for validation data, so I'm not sure of the specifics.
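As a quick sanity check of that exchange (a sketch, assuming the generators and batchSize from the question are in scope): nine validation steps only draw 9 * batchSize images per pass, which can be compared against the number of files the validation generator actually found:

    # 9 steps of 128 images each = 1152 images evaluated per validation pass
    imagesPerPass = 9 * batchSize
    print(validation_generator.samples, imagesPerPass)
    # If samples exceeds imagesPerPass, part of the validation set is never used,
    # and the reported validation loss can depend on which batches happen to be drawn.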
1 Answer
There is nothing fundamentally wrong with your code, but maybe your model is not right for your current toy problem.
In general, this is typical behaviour when training in deep learning. Think about it: your target loss is the training loss, so it is directly affected by the training process and, as you said, is "improving quickly". The validation loss is only affected indirectly, so naturally it will be more volatile in comparison.
When you are training, the model is attempting to estimate the real distribution of the data, but all it has to rely on is the distribution of the training dataset (which is similar but not the same).
Suggestions:
- I think your model is overkill for a 256x256 grayscale dataset with just 5 classes (ResNet was designed for ImageNet, which contains RGB images from 1000 categories). In its current state, the model finds it very easy to memorize the training set and overfit. You should look for models that are meant to be used for MNIST, or at most CIFAR10.
- You can insist on this model and attempt to increase regularization, but I'm not sure that will be enough to prevent the model from overfitting in this case.
answered Mar 21 at 12:46 by Mark.F
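To illustrate the kind of smaller model the answer suggests, here is a minimal sketch of a CIFAR10-scale network adapted to 256x256 grayscale input and 5 classes. The layer sizes are illustrative assumptions, not something taken from the question or the answer:

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense, Dropout

    small_model = Sequential([
        # A few conv/pool stages are usually plenty for a 5-class toy problem
        Conv2D(16, (3, 3), activation='relu', input_shape=(256, 256, 1)),
        MaxPooling2D((2, 2)),
        Conv2D(32, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        GlobalAveragePooling2D(),
        Dropout(0.5),                      # mild regularization against overfitting
        Dense(5, activation='softmax'),
    ])
    small_model.compile(loss='categorical_crossentropy',
                        optimizer='adam',
                        metrics=['acc'])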
– Djaik Navidson (Mar 21 at 13:36): Thank you very much! I chose a bigger network because the data is fairly complicated (I'm trying to predict electron distribution types from scattering images that vary wildly with orientation), but I will try a smaller net as per your suggestion. Thank you again, I will report back with the results and mark the question as solved if it works!
– Djaik Navidson (Mar 22 at 9:15): Thank you again! I got it to work by telling it to actually look at ALL my images (my batch size was too low). Now everything converges as I expected. The overfitting issue you brought up might still be relevant, however; I will now evaluate the model on some new data and see how well it holds up compared to something simpler. Thank you again for your time!