DNN practice: errors and strange behavior
$begingroup$
I've built a neural net for regression, with stochastic updates, for practice (shared below). It's having trouble modeling test data if more than one hidden layer is used.
The test outputs are a sine function and a linear function of the inputs, with no noise.
In short I have two questions:
- With nue = .01 and HiddenLayers = [25] (one hidden layer with 25 nodes), the loss drops very sharply around n = 100,000 after spending a long time going nowhere. I can't think of why it would behave that way rather than trending down more steadily.
- When I add a second layer (HiddenLayers = [5 25], for example), the NN predicts a constant value for all inputs.
I hope the machinery is correct. It does seem to give reasonable results with a single hidden layer, but this could all be the result of a coding error.
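One way to test the coding-error worry is a finite-difference gradient check. The following is a minimal sketch, not the code from this post: it assumes a single training example, one ReLU hidden layer, a linear output, and made-up weights (x, y, W1, W2 and step are all illustrative names and values). It compares backprop gradients against central differences of the same loss.

```matlab
% Finite-difference check of backprop gradients for a single training example.
% All values below are made up for illustration; they are not from the post.
x  = [1 0.2 0.7 0.4];                         % input row, leading 1 is the bias
y  = [0.5 2.0];                               % target row (two outputs)
W1 = reshape(linspace(-0.5, 0.6, 12), 4, 3);  % (3 inputs + bias) x 3 hidden units
W2 = reshape(linspace(-0.3, 0.4, 8), 4, 2);   % (3 hidden + bias) x 2 outputs

% Forward pass and loss: relu hidden layer, linear output,
% loss = sum((yhat - y).^2)/2.
hidden = @(A) [1 max(0, x*A)];
lossfn = @(A, B) sum((hidden(A)*B - y).^2)/2;

% Analytic gradients via backprop.
h      = hidden(W1);
delta2 = h*W2 - y;                               % output-layer delta, 1x2
gW2    = h' * delta2;                            % dLoss/dW2, 4x2
delta1 = (delta2 * W2(2:end,:)') .* (x*W1 > 0);  % hidden delta, bias row skipped, 1x3
gW1    = x' * delta1;                            % dLoss/dW1, 4x3

% Numerical gradients via central differences.
step  = 1e-6;
numW1 = zeros(size(W1));
for i = 1:numel(W1)
    Wp = W1; Wp(i) = Wp(i) + step;
    Wm = W1; Wm(i) = Wm(i) - step;
    numW1(i) = (lossfn(Wp, W2) - lossfn(Wm, W2)) / (2*step);
end
numW2 = zeros(size(W2));
for i = 1:numel(W2)
    Wp = W2; Wp(i) = Wp(i) + step;
    Wm = W2; Wm(i) = Wm(i) - step;
    numW2(i) = (lossfn(W1, Wp) - lossfn(W1, Wm)) / (2*step);
end

maxErrW1 = max(abs(gW1(:) - numW1(:)));
maxErrW2 = max(abs(gW2(:) - numW2(:)));
fprintf('max |analytic - numeric|: W1 %.2e, W2 %.2e\n', maxErrW1, maxErrW2);
```

If the two agree to within rounding on a small case like this, the same comparison could be run against the gradients produced by the full code. A large mismatch in one layer would point at that layer's backward step.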
Notes:
The hidden layers have a ReLU activation function, while the final layer has no activation function (i.e. activationf(x) = x).
Loss is (yModel-y)^2/2, summed over the outputs (see calculateLoss in the code).
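Written out, the setup in these notes corresponds to the equations below. This is a sketch in my own notation, which is an assumption and does not appear in the code: row vectors throughout, $m$ weight layers, $a^{(l)}$ the layer values with a leading bias 1, $z^{(l)}$ the pre-activations, $\tilde{W}^{(l)}$ the weight matrix $W^{(l)}$ without its bias row, and $E$ the per-example loss including the factor $1/2$ used in calculateLoss.

$$
\begin{aligned}
a^{(0)} &= [\,1,\ x\,], \\
z^{(l)} &= a^{(l-1)} W^{(l)}, \qquad a^{(l)} = [\,1,\ \max(0, z^{(l)})\,] \quad (l = 1, \dots, m-1), \\
\hat{y} &= a^{(m-1)} W^{(m)}, \\
E &= \tfrac{1}{2} \sum_k (\hat{y}_k - y_k)^2, \\
\delta^{(m)} &= \hat{y} - y, \\
\delta^{(l)} &= \left( \delta^{(l+1)} \, \tilde{W}^{(l+1)\top} \right) \odot \mathbf{1}\!\left[ z^{(l)} > 0 \right] \quad (l = m-1, \dots, 1), \\
\frac{\partial E}{\partial W^{(l)}} &= a^{(l-1)\top} \, \delta^{(l)}.
\end{aligned}
$$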
The entire MATLAB code is below:
rng('default')
nue = .01;
batchsize = 1;
X = rand(200100,3);
y = [sin(X*[1;0;0]*7) X*[3;2;1]]; %out1 = sin(linear combination of inputs), out2 = linear combination of inputs
numTestDays=100;
%define hidden layer structure
HiddenLayers = [25]; %for example, [5 10 20] would denote 3 hidden layers, with 5, 10 and 20 neurons respectively
%run NN machinery
[modelNNweights losslog] = trainStochasticNN(X(1:end-100,:), y(1:end-100,:), HiddenLayers, nue, batchsize);
% predict y for out of sample data
[netValues yhat] = projectforward(X(end-100:end,:), modelNNweights);
%plot output
figure;
subplot(1,2,1)
scatter(X(end-100:end,:)*[1;0;0]*7, [y(end-100:end,1)])
hold all
scatter(X(end-100:end,:)*[1;0;0]*7,yhat(:,1))
title('y1 and y1 NN model')
subplot(1,2,2)
scatter(X(end-100:end,:)*[3;2;1], [y(end-100:end,2)])
hold all
scatter(X(end-100:end,:)*[3;2;1],yhat(:,2))
title('y2 and y2 NN model')
figure; plot(losslog(5:end))
xlabel('n'); ylabel('loss'); title('loss of training example n')
%************* functions below **********************
function [finalweights losslog]= trainStochasticNN(X,y, HiddenLayers, nue, batchsize)
    numdata = size(X,1); %num data points
    dimIn = size(X,2); %dim of data
    dimOut = size(y,2); %num outputs we are modeling
    numLayers = length(HiddenLayers)+1; %hidden layers + 1 output layer
    layerNumNeurons = [HiddenLayers, dimOut]; %hidden layers + output layer
    %create and initialize weights
    weights = cell(1,numLayers);
    rng('default');
    for ln = 1:numLayers
        if ln == 1
            weights{ln} = rand(dimIn+1, layerNumNeurons(ln)); %+1 for bias term
        else
            weights{ln} = rand(layerNumNeurons(ln-1)+1,layerNumNeurons(ln)); %+1 for bias term
        end
    end
    k=0;losslog=[];
    for n = batchsize:batchsize:numdata
        theseidx = n-batchsize+1:n;
        [netValues yhat] = projectforward(X(theseidx,:), weights);
        [loss ydelta] = calculateLoss(yhat, y(theseidx,:));
        dLdW = calculatePartials(netValues, weights, ydelta);
        weights = updateweights(dLdW, weights, nue);
        k=k+1; losslog(k)=mean(loss);
    end
    finalweights=weights;
end
function [netValues yhat] = projectforward(X, weights)
    netValues = cell(size(X,1), length(weights)+1,1); %+1 since netValues{n,1} contains data inputs
    yhat = nan(size(X,1), size(weights{end},2));
    for n = 1:size(X,1)
        for ln = 1:length(weights)+1 %for layernumber, datainput-layer to output-layer
            if ln ==1
                netValues{n, ln} = [1 X(n,:)]; % add bias to inputs; these are the values in layer 1, normally denoted layer 0
            elseif ln < length(weights)+1
                tempvals = netValues{n, ln-1}*weights{ln-1};
                %netValues{n, ln} = [1 1./(1+exp(-tempvals))]; %activation is logistic
                netValues{n, ln} = [1 max(0, tempvals)]; % activation is relu(x)
            elseif ln == length(weights)+1
                netValues{n, ln} = netValues{n, ln-1}*weights{ln-1}; %last layer activationf(x) = x
            end
        end
        yhat(n,:) = netValues{n,end};
    end
end
function [loss ydelta]= calculateLoss(yhat, y)
    ydelta = yhat-y;
    loss = sum(ydelta.^2, 2)/2;
end
function dLdW = calculatePartials(netValues, weights, ydelta)
    numexamples=size(netValues,1);
    dLdW = cell(numexamples,length(weights)); %dLoss/dWeights
    dLdV = cell(numexamples,length(weights)); %dLoss/dNodeOutput
    dVdU = cell(numexamples,length(weights)); %dNodeOutput/dNodeInput (derivative of activation function)
    dUdW = cell(numexamples,length(weights)); %dNodeInput/dWeights
    delta = cell(numexamples,length(weights));%dVdU .* dLdV
    for n = 1:numexamples
        for ln = length(weights):-1:1
            if ln == length(weights)
                dUdW{n,ln} = netValues{n,ln}';
                dVdU{n,ln} = ones(size(netValues{n,ln+1})); %d/dx f(x), where f(x)=x in the output layer
                dLdV{n,ln} = ydelta(n,:);
                delta{n,ln} = dVdU{n,ln}.*dLdV{n,ln};
                dLdW{n,ln} = dUdW{n,ln}.*delta{n,ln} ; %using L = (yhat-y)^2/2 and linear activation function
                % [ size(dLdV{n,ln}) size(dVdU{n,ln}) size(dUdW{n,ln}) size(delta{n,ln}) size( dLdW{n,ln})]
            else
                %logisticvalue = 1./(1+exp(-netValues{n,ln+1}(2:end))); %logistic activation function
                reluvalue = max(0,netValues{n,ln+1}(2:end)); %start from index 2 because index 1 is the bias one level up, which has no effect downstream
                dUdW{n,ln} = netValues{n,ln}';
                %dVdU{n,ln} = logisticvalue.*(1-logisticvalue); %logistic derivative
                dVdU{n,ln} = sign(reluvalue); %relu derivative
                dLdV{n,ln} = (weights{ln+1}(2:end,:) * delta{n,ln+1}')'; %start from index 2 because index 1 holds the weight for the bias one level up
                delta{n,ln} = dVdU{n,ln}.*dLdV{n,ln};
                dLdW{n,ln} = dUdW{n,ln} .* delta{n,ln} ;
                % [ size(dUdW{n,ln}) size(dVdU{n,ln}) size(dLdV{n,ln}) size(weights{ln+1}(2:end,:)) size(delta{n,ln+1}) size( dLdW{n,ln})]
            end
        end
    end
end
function newweights = updateweights(dWdL, weights, nue)
    newweights = cell(size(weights));
    for ln = 1:length(weights)
        for n = 1:size(dWdL,1)
            if n==1
                meandWdL = dWdL{n,ln}/size(dWdL,1);
            else
                meandWdL = meandWdL + dWdL{n,ln}/size(dWdL,1); %average dWdL over all training examples in this batch
            end
        end
        newweights{ln} = weights{ln} - meandWdL*nue;
    end
end
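A possible way to probe the constant-prediction symptom is to measure, per hidden layer, what share of ReLU units is switched on across a batch of inputs. The sketch below is self-contained and uses hypothetical weights, not trained ones: the second layer is deliberately made all-negative so that every unit in it is off, which forces the output to be the same for every input. The names activeFrac and spread are illustrative.

```matlab
% Share of active relu units per hidden layer, for a [5 25] network on 3 inputs.
% The weights are hypothetical; layer 2 is all-negative on purpose so that
% none of its units can switch on.
rng(0);
weights = {rand(4,5), -rand(6,25), rand(26,2)};   % bias weights in the first row
X = rand(100,3);

A = [ones(size(X,1),1) X];                 % layer values with a leading bias column
activeFrac = zeros(1, numel(weights)-1);
for ln = 1:numel(weights)-1
    Z = A * weights{ln};                   % pre-activations, one row per sample
    activeFrac(ln) = mean(Z(:) > 0);       % share of (sample, unit) pairs with relu on
    A = [ones(size(Z,1),1) max(0, Z)];     % relu, then re-attach the bias column
end
yhat = A * weights{end};                   % linear output layer
spread = max(yhat) - min(yhat);            % range of each output across the samples

disp(activeFrac);
disp(spread);
```

When a hidden layer reports zero active units for all samples, only the bias weights of the next layer reach the output, so the prediction cannot vary with the input. Running the same loop on the weights returned by trainStochasticNN would show whether that is what happens with two hidden layers.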
neural-network matlab
$endgroup$
edited Apr 7 at 14:36 by Tasos
asked Apr 6 at 5:19 by DKreitzman