DNN practice: errors and strange behavior

I've built a neural net for regression, with stochastic updates, for practice (shared below). It's having trouble modeling test data if more than one hidden layer is used.

The test outputs are a sin function and a linear function of the inputs, with no noise.

In short, I have two questions:

1. With nue = .01 and HiddenLayers = [25] (one hidden layer with 25 nodes), the loss drops very sharply around n = 100,000 after spending a long time going nowhere. I can't think of why it would behave that way rather than trend down more consistently.

2. When I add a second hidden layer (for example, HiddenLayers = [5 25]), the NN predicts a constant value for all inputs.

I hope the machinery is correct. It does seem to give reasonable results with a single hidden layer, but this could all be the result of a coding error.

Notes:
The hidden layers have a ReLU activation function, while the final layer has no activation function (i.e. activationf(x) = x).
The loss is the squared error, (yModel-y)^2/2, summed over the outputs (see calculateLoss in the code).
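Written out, the notes above amount to the following forward and backward passes for a single training example. The notation is introduced here for illustration and does not appear in the code: $K$ is the number of weight layers, $a_l$ is the row vector of layer values with the bias entry 1 prepended, $W_l$ is the weight matrix of layer $l$ (first row multiplying the bias), $\tilde{W}_l$ is $W_l$ without that first row, and $\nu$ stands for nue.

```latex
\begin{aligned}
a_0 &= [\,1,\; x\,] \\
u_l &= a_{l-1} W_l, \qquad a_l = [\,1,\; \max(0, u_l)\,] \quad (l = 1,\dots,K-1) \\
\hat{y} &= u_K = a_{K-1} W_K \\
L &= \tfrac{1}{2} \lVert \hat{y} - y \rVert^2 \\
\delta_K &= \hat{y} - y \\
\delta_l &= \mathbb{1}[u_l > 0] \odot \left( \delta_{l+1} \tilde{W}_{l+1}^{\top} \right) \quad (l = K-1,\dots,1) \\
\frac{\partial L}{\partial W_l} &= a_{l-1}^{\top} \delta_l, \qquad W_l \leftarrow W_l - \nu \, \frac{\partial L}{\partial W_l}
\end{aligned}
```

With batchsize = 1 the update uses the gradient of one example at a time, so no averaging over a batch appears in the last line.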

The entire MATLAB code is below:

rng('default')
nue = .01;
batchsize = 1;

X = rand(200100,3);
y = [sin(X*[1;0;0]*7) X*[3;2;1]]; %out1 = sin(linear combination of inputs), out2 = linear combination of inputs
numTestDays=100;

%define hidden layer structure
HiddenLayers = [25]; %for example, [5 10 20] would denote 3 hidden layers, with 5, 10 and 20 neurons respectively

%run NN machinery
[modelNNweights losslog] = trainStochasticNN(X(1:end-100,:), y(1:end-100,:), HiddenLayers, nue, batchsize);

% predict y for out of sample data
[netValues yhat] = projectforward(X(end-100:end,:), modelNNweights);

%plot output
figure; 
subplot(1,2,1)
scatter(X(end-100:end,:)*[1;0;0]*7, [y(end-100:end,1)])
hold all
scatter(X(end-100:end,:)*[1;0;0]*7,yhat(:,1))
title('y1 and y1 NN model')

subplot(1,2,2)
scatter(X(end-100:end,:)*[3;2;1], [y(end-100:end,2)])
hold all
scatter(X(end-100:end,:)*[3;2;1],yhat(:,2))
title('y2 and y2 NN model')

figure; plot(losslog(5:end))
xlabel('n'); ylabel('loss'); title('loss of training example n')

%************* functions below **********************

function [finalweights losslog]= trainStochasticNN(X,y, HiddenLayers, nue, batchsize)
    numdata = size(X,1); %num data points
    dimIn = size(X,2); %dim of data
    dimOut = size(y,2); %num outputs we are modeling
    numLayers = length(HiddenLayers)+1; %hidden layers + 1 output layer
    layerNumNeurons = [HiddenLayers, dimOut]; %hidden layers + output layer

    %create and initialize weights
    weights = cell(1,numLayers);
    rng('default');
    for ln = 1:numLayers
        if ln == 1
            weights{ln} = rand(dimIn+1, layerNumNeurons(ln)); %+1 for bias term
        else
            weights{ln} = rand(layerNumNeurons(ln-1)+1,layerNumNeurons(ln)); %+1 for bias term
        end
    end

    k=0;losslog=[];
    for n = batchsize:batchsize:numdata
        theseidx = n-batchsize+1:n;
        [netValues yhat] = projectforward(X(theseidx,:), weights);
        [loss ydelta] = calculateLoss(yhat, y(theseidx,:));
        dLdW = calculatePartials(netValues, weights, ydelta);
        weights = updateweights(dLdW, weights, nue);
        k=k+1; losslog(k)=mean(loss);
    end
    finalweights=weights;
end

function [netValues yhat] = projectforward(X, weights)
    netValues = cell(size(X,1), length(weights)+1,1); %+1 since netValues{n,1} contains the data inputs
    yhat = nan(size(X,1), size(weights{end},2));
    for n = 1:size(X,1)
        for ln = 1:length(weights)+1 %for layernumber, datainput-layer to output-layer
            if ln ==1
                netValues{n, ln} = [1 X(n,:)]; % prepend bias to inputs; these are the values in layer 1, normally denoted layer 0
            elseif ln < length(weights)+1
                tempvals = netValues{n, ln-1}*weights{ln-1};
                %netValues{n, ln} = [1 1./(1+exp(-tempvals))]; %activation is logistic
                netValues{n, ln} = [1 max(0, tempvals)]; % activation is relu(x)
            elseif ln == length(weights)+1
                netValues{n, ln} = netValues{n, ln-1}*weights{ln-1}; %last layer activationf(x) = x
            end
        end
        yhat(n,:) = netValues{n,end};
    end
end

function [loss ydelta]= calculateLoss(yhat, y)
 ydelta = yhat-y;
 loss = sum(ydelta.^2, 2)/2;
end

function dLdW = calculatePartials(netValues, weights, ydelta)
    numexamples=size(netValues,1);
    dLdW = cell(numexamples,length(weights)); %dLoss/dWeights
    dLdV = cell(numexamples,length(weights)); %dLoss/dNodeOutput
    dVdU = cell(numexamples,length(weights)); %dNodeOutput/dNodeInput (derivative of activation function)
    dUdW = cell(numexamples,length(weights)); %dNodeInput/dWeights
    delta = cell(numexamples,length(weights));%dVdU .* dLdV
    for n = 1:numexamples
        for ln = length(weights):-1:1
            if ln == length(weights)
                dUdW{n,ln} = netValues{n,ln}';
                dVdU{n,ln} = ones(size(netValues{n,ln+1})); %d/dx f(x), where f(x)=x in the output layer
                dLdV{n,ln} = ydelta(n,:);
                delta{n,ln} = dVdU{n,ln}.*dLdV{n,ln};
                dLdW{n,ln} = dUdW{n,ln}.*delta{n,ln} ; %using L = (yhat-y)^2/2 and linear activation function
                % [ size(dLdV{n,ln}) size(dVdU{n,ln}) size(dUdW{n,ln}) size(delta{n,ln}) size( dLdW{n,ln})]

            else
                %logisticvalue = 1./(1+exp(-netValues{n,ln+1}(2:end))); %logistic activation function
                reluvalue = max(0,netValues{n,ln+1}(2:end)); %start from index 2 because index 1 is the bias one level up, which has no effect downstream
                dUdW{n,ln} = netValues{n,ln}';
                %dVdU{n,ln} = logisticvalue.*(1-logisticvalue); %logistic derivative
                dVdU{n,ln} = sign(reluvalue); %relu derivative
                dLdV{n,ln} = (weights{ln+1}(2:end,:) * delta{n,ln+1}')'; %start from index 2 because index 1 holds the weight for the bias one level up
                delta{n,ln} = dVdU{n,ln}.*dLdV{n,ln};
                dLdW{n,ln} = dUdW{n,ln} .* delta{n,ln} ;
                % [ size(dUdW{n,ln}) size(dVdU{n,ln}) size(dLdV{n,ln}) size(weights{ln+1}(2:end,:)) size(delta{n,ln+1}) size( dLdW{n,ln})]
            end
        end
    end
end

function newweights = updateweights(dWdL, weights, nue)
    newweights = cell(size(weights));
    for ln = 1:length(weights)
        for n = 1:size(dWdL,1)
            if n==1
                meandWdL = dWdL{n,ln}/size(dWdL,1);
            else
                meandWdL = meandWdL + dWdL{n,ln}/size(dWdL,1); %average dWdL over all training examples in this batch
            end
        end
        newweights{ln} = weights{ln} - meandWdL*nue;
    end
end
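
Since the question leaves open whether the backpropagation machinery contains a coding error, one common way to check is a finite-difference gradient check. The following is a minimal, self-contained sketch and not part of the code above: the layer sizes, the randn initialization, the step size h and all variable names (W, a, grad, maxErr) are assumptions chosen for illustration. It follows the same conventions as the question: bias prepended as a leading 1, ReLU hidden layers, a linear output layer, and loss sum((yhat-y).^2)/2.

```matlab
% Finite-difference gradient check for a small ReLU network (illustrative sketch).
rng(0);
x = rand(1, 3);                            % one input example (row vector)
y = [sin(7*x(1)), x*[3;2;1]];              % targets built as in the question
sizes = [3 4 5 2];                         % assumed: 3 inputs, hidden layers of 4 and 5, 2 outputs
numLayers = numel(sizes) - 1;
W = cell(1, numLayers);
for l = 1:numLayers
    W{l} = 0.5 * randn(sizes(l) + 1, sizes(l+1));   % +1 row for the bias
end

% Forward pass; a{l} is the input to layer l with the bias prepended.
a = cell(1, numLayers + 1);
a{1} = [1 x];
for l = 1:numLayers
    u = a{l} * W{l};
    if l < numLayers
        a{l+1} = [1 max(0, u)];            % ReLU hidden layer
    else
        a{l+1} = u;                        % linear output layer
    end
end
yhat = a{end};

% Backward pass: analytic gradient of the loss with respect to each W{l}.
grad = cell(1, numLayers);
delta = yhat - y;                          % dLoss/dOutput for the linear output layer
for l = numLayers:-1:1
    grad{l} = a{l}' * delta;               % outer product, same size as W{l}
    if l > 1
        % propagate through the weights (bias row dropped) and the ReLU derivative
        delta = (delta * W{l}(2:end, :)') .* (a{l}(2:end) > 0);
    end
end

% Numerical gradient by central differences, one weight at a time.
h = 1e-6;
steps = [h, -h];
maxErr = 0;
for l = 1:numLayers
    for i = 1:numel(W{l})
        lossPM = zeros(1, 2);
        for s = 1:2
            Wp = W;
            Wp{l}(i) = Wp{l}(i) + steps(s);
            v = [1 x];
            for k = 1:numLayers
                u = v * Wp{k};
                if k < numLayers
                    v = [1 max(0, u)];
                else
                    v = u;
                end
            end
            lossPM(s) = sum((v - y).^2) / 2;
        end
        numGrad = (lossPM(1) - lossPM(2)) / (2*h);
        maxErr = max(maxErr, abs(numGrad - grad{l}(i)));
    end
end
fprintf('largest |analytic - numerical| gradient entry: %g\n', maxErr);
```

If the analytic and numerical gradients agree to within a small tolerance, the backward pass is probably consistent with the forward pass, and the cause of the behavior is more likely to lie elsewhere (for example in the initialization or the learning rate). The same comparison could be run against the output of calculatePartials for a single example.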

edited Apr 7 at 14:36

Tasos

asked Apr 6 at 5:19

DKreitzman

neural-network matlab
