DNN practice: errors and strange behavior
















$begingroup$


I've built a neural net for regression, with stochastic updates, for practice (code shared below). It has trouble modeling the test data when more than one hidden layer is used.

The test outputs are a sine function and a linear function of the inputs, with no noise.

In short, I have two questions:

  1. With nue = .01 and HiddenLayers = [25] (one hidden layer with 25 nodes), the loss drops very sharply around n = 100,000 after spending a lot of time going nowhere. I can't think of why it would behave that way rather than trend down more consistently.

  2. When I add a second hidden layer (HiddenLayers = [5 25], for example), the NN predicts a constant value for all inputs.

I hope the machinery is correct. It does seem to give reasonable results with a single hidden layer, but this could all be the result of a coding error.
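One common way to test whether backpropagation machinery is correct is a finite-difference gradient check. The sketch below is a stand-alone illustration, not the code from this post: the 3-4-2 network size, the names W1, W2, gW1, gW2 and relerr, and the step size h are all assumptions chosen for the example. It uses the same conventions described here (relu hidden layer, linear output, squared-error loss divided by 2, bias as a leading 1).

```matlab
% Gradient check on a tiny 3-4-2 network (hypothetical sizes and names).
rng(0);
x  = rand(1, 3);                 % one input row
y  = rand(1, 2);                 % one target row
W1 = randn(4, 4) * 0.5;          % (3 inputs + bias) x 4 hidden units
W2 = randn(5, 2) * 0.5;          % (4 hidden + bias) x 2 outputs

% Analytic gradients via backpropagation
a0     = [1 x];                  % input with bias
u1     = a0 * W1;                % hidden pre-activations
a1     = [1 max(0, u1)];         % relu activations with bias
yhat   = a1 * W2;                % linear output layer
delta2 = yhat - y;                               % dL/du at the output
delta1 = (u1 > 0) .* (delta2 * W2(2:end, :)');   % dL/du at the hidden layer
gW2    = a1' * delta2;           % dL/dW2
gW1    = a0' * delta1;           % dL/dW1

% Numerical gradients via central differences
lossfn = @(A, B) sum(([1 max(0, [1 x] * A)] * B - y).^2) / 2;
h   = 1e-6;
nW1 = zeros(size(W1));
for i = 1:numel(W1)
    E = zeros(size(W1)); E(i) = h;
    nW1(i) = (lossfn(W1 + E, W2) - lossfn(W1 - E, W2)) / (2*h);
end
nW2 = zeros(size(W2));
for i = 1:numel(W2)
    E = zeros(size(W2)); E(i) = h;
    nW2(i) = (lossfn(W1, W2 + E) - lossfn(W1, W2 - E)) / (2*h);
end

% Largest absolute difference, relative to the largest numerical gradient
relerr  = @(g, n) max(abs(g(:) - n(:))) / max(max(abs(n(:))), eps);
relerr1 = relerr(gW1, nW1);
relerr2 = relerr(gW2, nW2);
fprintf('relative error W1: %g, W2: %g\n', relerr1, relerr2);
```

If both relative errors are tiny, the analytic gradients agree with the numerical ones for this example. A large error would point at the backward pass. The check can be unreliable when a pre-activation sits almost exactly at the relu kink.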



Notes:
The hidden layers have a relu activation function, while the final layer has no activation function (i.e. activationf(x) = x).
Loss is (yModel-y)^2/2, summed over the outputs.
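For reference, the setup in these notes can be written out as equations. This is a sketch under assumed notation that is not in the post: row vectors throughout, K weight layers, a^{(0)} = [1, x] as the input with its bias entry, and \tilde{W}^{(l)} as W^{(l)} with its bias row removed. The factor 1/2 follows calculateLoss in the code.

```latex
\begin{aligned}
u^{(l)} &= a^{(l-1)} W^{(l)}, \qquad
a^{(l)} = \bigl[\,1,\ \max(0, u^{(l)})\,\bigr] \quad (1 \le l < K), \\
\hat{y} &= u^{(K)} = a^{(K-1)} W^{(K)}, \qquad
L = \tfrac{1}{2}\,\lVert \hat{y} - y \rVert^{2}, \\
\delta^{(K)} &= \hat{y} - y, \\
\delta^{(l)} &= \mathbf{1}\!\left[u^{(l)} > 0\right] \odot
  \left(\delta^{(l+1)}\, \tilde{W}^{(l+1)\top}\right) \quad (1 \le l < K), \\
\frac{\partial L}{\partial W^{(l)}} &= a^{(l-1)\top}\, \delta^{(l)} .
\end{aligned}
```

Here \delta^{(l)} corresponds to the variable delta in calculatePartials, and the last line is the outer product of a layer's inputs with that layer's delta.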



The entire MATLAB code is below:



rng('default')
nue = .01; %learning rate
batchsize = 1;

X = rand(200100,3);
y = [sin(X*[1;0;0]*7) X*[3;2;1]]; %out1 = sin(linear combination of inputs), out2 = linear combination of inputs
numTestDays = 100; %number of rows held out at the end of X
trainidx = 1:size(X,1)-numTestDays;
testidx = size(X,1)-numTestDays+1:size(X,1); %last numTestDays rows, no overlap with trainidx

%define hidden layer structure
HiddenLayers = [25]; %for example, [5 10 20] would denote 3 hidden layers, with 5, 10 and 20 neurons respectively

%run NN machinery
[modelNNweights, losslog] = trainStochasticNN(X(trainidx,:), y(trainidx,:), HiddenLayers, nue, batchsize);

% predict y for out of sample data
[netValues, yhat] = projectforward(X(testidx,:), modelNNweights);

%plot output
figure;
subplot(1,2,1)
scatter(X(testidx,:)*[1;0;0]*7, y(testidx,1))
hold all
scatter(X(testidx,:)*[1;0;0]*7, yhat(:,1))
title('y1 and y1 NN model')

subplot(1,2,2)
scatter(X(testidx,:)*[3;2;1], y(testidx,2))
hold all
scatter(X(testidx,:)*[3;2;1], yhat(:,2))
title('y2 and y2 NN model')

figure; plot(losslog(5:end))
xlabel('n'); ylabel('loss'); title('loss of training example n')

%************* functions below **********************

function [finalweights, losslog] = trainStochasticNN(X, y, HiddenLayers, nue, batchsize)
    numdata = size(X,1); %num data points
    dimIn = size(X,2); %dim of data
    dimOut = size(y,2); %num outputs we are modeling
    numLayers = length(HiddenLayers)+1; %hidden layers + 1 output layer
    layerNumNeurons = [HiddenLayers, dimOut]; %hidden layers + output layer

    %create and initialize weights
    weights = cell(1,numLayers);
    rng('default');
    for ln = 1:numLayers
        if ln == 1
            weights{ln} = rand(dimIn+1, layerNumNeurons(ln)); %+1 for bias term
        else
            weights{ln} = rand(layerNumNeurons(ln-1)+1, layerNumNeurons(ln)); %+1 for bias term
        end
    end

    k=0; losslog=[];
    for n = batchsize:batchsize:numdata
        theseidx = n-batchsize+1:n;
        [netValues, yhat] = projectforward(X(theseidx,:), weights);
        [loss, ydelta] = calculateLoss(yhat, y(theseidx,:));
        dLdW = calculatePartials(netValues, weights, ydelta);
        weights = updateweights(dLdW, weights, nue);
        k=k+1; losslog(k)=mean(loss);
    end
    finalweights=weights;
end

function [netValues, yhat] = projectforward(X, weights)
    netValues = cell(size(X,1), length(weights)+1, 1); %+1 since netValues(:,1) contains data inputs
    yhat = nan(size(X,1), size(weights{end},2));
    for n = 1:size(X,1)
        for ln = 1:length(weights)+1 %for layernumber, datainput-layer to output-layer
            if ln == 1
                netValues{n,ln} = [1 X(n,:)]; % add bias to inputs; these are the values in layer 1, normally denoted layer 0
            elseif ln < length(weights)+1
                tempvals = netValues{n,ln-1}*weights{ln-1};
                %netValues{n,ln} = [1 1./(1+exp(-tempvals))]; %activation is logistic
                netValues{n,ln} = [1 max(0, tempvals)]; % activation is relu(x)
            elseif ln == length(weights)+1
                netValues{n,ln} = netValues{n,ln-1}*weights{ln-1}; %last layer activationf(x) = x
            end
        end
        yhat(n,:) = netValues{n,end};
    end
end

function [loss, ydelta] = calculateLoss(yhat, y)
    ydelta = yhat-y;
    loss = sum(ydelta.^2, 2)/2;
end

function dLdW = calculatePartials(netValues, weights, ydelta)
    numexamples = size(netValues,1);
    dLdW = cell(numexamples,length(weights)); %dLoss/dWeights
    dLdV = cell(numexamples,length(weights)); %dLoss/dNodeOutput
    dVdU = cell(numexamples,length(weights)); %dNodeOutput/dNodeInput (derivative of activation function)
    dUdW = cell(numexamples,length(weights)); %dNodeInput/dWeights
    delta = cell(numexamples,length(weights)); %dVdU .* dLdV
    for n = 1:numexamples
        for ln = length(weights):-1:1
            if ln == length(weights)
                dUdW{n,ln} = netValues{n,ln}';
                dVdU{n,ln} = ones(size(netValues{n,ln+1})); %d/dx f(x), where f(x)=x in the output layer
                dLdV{n,ln} = ydelta(n,:);
                delta{n,ln} = dVdU{n,ln}.*dLdV{n,ln};
                dLdW{n,ln} = dUdW{n,ln}.*delta{n,ln}; %using L = (yhat-y)^2/2 and linear activation function
                % [ size(dLdV{n,ln}) size(dVdU{n,ln}) size(dUdW{n,ln}) size(delta{n,ln}) size(dLdW{n,ln})]
            else
                %logisticvalue = 1./(1+exp(-netValues{n,ln+1}(2:end))); %logistic activation function
                reluvalue = max(0,netValues{n,ln+1}(2:end)); %start from index 2 because index 1 is the bias one level up, which has no effect downstream
                dUdW{n,ln} = netValues{n,ln}';
                %dVdU{n,ln} = logisticvalue.*(1-logisticvalue); %logistic derivative
                dVdU{n,ln} = sign(reluvalue); %relu derivative
                dLdV{n,ln} = (weights{ln+1}(2:end,:) * delta{n,ln+1}')'; %start from index 2 because index 1 holds the weight for the bias one level up
                delta{n,ln} = dVdU{n,ln}.*dLdV{n,ln};
                dLdW{n,ln} = dUdW{n,ln}.*delta{n,ln};
                % [ size(dUdW{n,ln}) size(dVdU{n,ln}) size(dLdV{n,ln}) size(weights{ln+1}(2:end,:)) size(delta{n,ln+1}) size(dLdW{n,ln})]
            end
        end
    end
end

function newweights = updateweights(dWdL, weights, nue)
    newweights = cell(size(weights));
    for ln = 1:length(weights)
        for n = 1:size(dWdL,1)
            if n==1
                meandWdL = dWdL{n,ln}/size(dWdL,1);
            else
                meandWdL = meandWdL + dWdL{n,ln}/size(dWdL,1); %average dWdL over all training examples in this batch
            end
        end
        newweights{ln} = weights{ln} - meandWdL*nue;
    end
end
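
One detail of the code above that is easy to miss: the weight-gradient lines in calculatePartials multiply a column vector (dUdW) by a row vector (delta) with the elementwise operator `.*`. This only yields the full gradient matrix because of implicit expansion, which, as far as I know, needs MATLAB R2016b or later. A minimal sketch with made-up values (the names a, d, G1, G2, G3 are hypothetical):

```matlab
a = [1; 0.5; 2];               % column: inputs to a layer, bias entry first
d = [0.1 -0.3];                % row: delta of the layer's two units

G1 = a .* d;                   % implicit expansion: 3x1 .* 1x2 gives 3x2
G2 = a * d;                    % explicit outer product, same 3x2 matrix
G3 = bsxfun(@times, a, d);     % equivalent form for releases before R2016b

disp(size(G1));                % 3 rows (inputs) by 2 columns (units)
disp(max(abs(G1(:) - G2(:)))); % difference between the two forms
```

On an older release the `.*` line raises a dimension-mismatch error instead, so writing the outer product as `a * d` is the more portable choice.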









$endgroup$
















      neural-network matlab






      edited Apr 7 at 14:36 by Tasos
      asked Apr 6 at 5:19 by DKreitzman
