Recently I have been trying to get a better handle on machine learning from an implementation standpoint, rather than a statistical one. I have read several pseudocode explanations of neural network implementations, and this is the result: a toy neural network.

I have used several sources from medium.com and towardsdatascience.com (if it is necessary to list all the sources, I will add them in an edit).

I originally wrote a naive custom Matrix class with O(N^3) matrix multiplication, but scrapped it in favor of ujmp.

I used the Matrix class from ujmp.org for faster matrix multiplication, though given my limited understanding of its acceleration features, I am probably not using it to its full potential.

This is the final code. Please comment and suggest improvements! Thank you. I will include the SGD, backpropagation, feed-forward and mini-batch calculation methods. backPropagate is private because it is called through a wrapper method named train.

The NetworkInput class is a container for the data and label attributes, both DenseMatrix objects.
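For readers, a minimal sketch of what such a container might look like (this is an assumption based on the getters used in the code below, with `double[]` standing in for ujmp's DenseMatrix so the sketch stays self-contained):

```java
// Hypothetical sketch of the NetworkInput container described above.
class NetworkInput {
    private final double[] data;
    private final double[] label;

    NetworkInput(double[] data, double[] label) {
        this.data = data;
        this.label = label;
    }

    double[] getData() { return data; }
    double[] getLabel() { return label; }
}
```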

The activation functions, the evaluation of test data, and the loss/error calculations are all implemented behind interfaces.
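As a hypothetical sketch of one of those interfaces (the method names `applyFunction` and `applyDerivative` are taken from the code below; the exact signatures are an assumption, and `double[]` stands in for ujmp matrices to keep it self-contained):

```java
// Hypothetical shape of the activation-function interface,
// inferred from the method calls in the code below.
interface ActivationFunction {
    double[] applyFunction(double[] z);   // element-wise sigma(z)
    double[] applyDerivative(double[] z); // element-wise sigma'(z)
}

// One plausible implementation: the logistic sigmoid.
class Sigmoid implements ActivationFunction {
    public double[] applyFunction(double[] z) {
        double[] out = new double[z.length];
        for (int i = 0; i < z.length; i++) out[i] = 1.0 / (1.0 + Math.exp(-z[i]));
        return out;
    }
    public double[] applyDerivative(double[] z) {
        double[] s = applyFunction(z);
        double[] out = new double[z.length];
        for (int i = 0; i < z.length; i++) out[i] = s[i] * (1.0 - s[i]);
        return out;
    }
}
```

The EvaluationFunction and ErrorFunction interfaces would follow the same pattern, exposing `evaluatePrediction`, `calculateCostFunction` and `applyErrorFunctionGradient`.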

This is the SGD.

```
/**
 * Provides an implementation of SGD for this neural network.
 *
 * @param training  a List of {@link NetworkInput} objects;
 *                  NetworkInput.getData() is the data, NetworkInput.getLabel() is the label.
 * @param test      a List of {@link NetworkInput} objects;
 *                  NetworkInput.getData() is the data, NetworkInput.getLabel() is the label.
 * @param epochs    how many iterations of SGD to run
 * @param batchSize the mini-batch size, typically 32. See https://stats.stackexchange.com/q/326663
 */
public void stochasticGradientDescent(@NotNull List<NetworkInput> training,
                                      @NotNull List<NetworkInput> test,
                                      int epochs,
                                      int batchSize) {
    int trDataSize = training.size();
    int teDataSize = test.size();
    for (int i = 0; i < epochs; i++) {
        // Randomize the training sample.
        Collections.shuffle(training);
        System.out.println("Calculating epoch: " + (i + 1) + ".");
        // Perform backpropagation on each mini-batch.
        for (int j = 0; j < trDataSize - batchSize; j += batchSize) {
            calculateMiniBatch(training.subList(j, j + batchSize));
        }
        // Feed forward the test data.
        List<NetworkInput> feedForwardData = this.feedForwardData(test);
        // Evaluate the predictions with the EvaluationFunction interface.
        int correct = this.evaluationFunction.evaluatePrediction(feedForwardData).intValue();
        // Calculate the loss with the ErrorFunction interface.
        double loss = errorFunction.calculateCostFunction(feedForwardData);
        // Add the plotting data (x, y_1, y_2) to the global
        // lists xValues, correctValues, lossValues.
        addPlotData(i, correct, loss);
        System.out.println("Loss: " + loss);
        System.out.println("Epoch " + (i + 1) + ": " + correct + "/" + teDataSize);
        // Lower the learning rate each iteration? Might implement; not sure how.
        // Is Adam relevant here, or an entirely different algorithm?
        // TODO: Implement Adam, RMSProp, Momentum?
        // this.learningRate = i % 10 == 0 ? this.learningRate / 4 : this.learningRate;
    }
}
```
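One thing worth noting: the batching loop condition `j < trDataSize - batchSize` drops the final batch whenever the training size is not an exact multiple of the batch size (and even when it is). A self-contained sketch of a partition that covers every element, with the final partial batch included, might look like this (the class and method names here are my own, not from the original code):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchDemo {
    // Split a list into consecutive mini-batches, keeping the final partial batch.
    static <T> List<List<T>> partition(List<T> data, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int j = 0; j < data.size(); j += batchSize) {
            // subList returns a view; Math.min guards the final partial batch.
            batches.add(data.subList(j, Math.min(j + batchSize, data.size())));
        }
        return batches;
    }
}
```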

Here we calculate the mini-batches and update our weights with the averaged gradients.

```
private void calculateMiniBatch(List<NetworkInput> subList) {
    int size = subList.size();
    double scaleFactor = this.learningRate / size;
    DenseMatrix[] dB = new DenseMatrix[this.totalLayers - 1];
    DenseMatrix[] dW = new DenseMatrix[this.totalLayers - 1];
    for (int i = 0; i < this.totalLayers - 1; i++) {
        DenseMatrix bias = getBias(i);
        DenseMatrix weight = getWeight(i);
        dB[i] = (DenseMatrix) Matrix.Factory.zeros(bias.getRowCount(), bias.getColumnCount());
        dW[i] = (DenseMatrix) Matrix.Factory
            .zeros(weight.getRowCount(), weight.getColumnCount());
    }
    for (NetworkInput data : subList) {
        DenseMatrix dataIn = data.getData();
        DenseMatrix label = data.getLabel();
        List<DenseMatrix[]> deltas = backPropagate(dataIn, label);
        DenseMatrix[] deltaB = deltas.get(0);
        DenseMatrix[] deltaW = deltas.get(1);
        // Accumulate the per-example gradients.
        for (int j = 0; j < this.totalLayers - 1; j++) {
            dB[j] = (DenseMatrix) dB[j].plus(deltaB[j]);
            dW[j] = (DenseMatrix) dW[j].plus(deltaW[j]);
        }
    }
    // Apply the averaged, learning-rate-scaled update to each layer.
    for (int i = 0; i < this.totalLayers - 1; i++) {
        DenseMatrix cW = getWeight(i);
        DenseMatrix cB = getBias(i);
        DenseMatrix scaledDeltaB = (DenseMatrix) dB[i].times(scaleFactor);
        DenseMatrix scaledDeltaW = (DenseMatrix) dW[i].times(scaleFactor);
        DenseMatrix nW = (DenseMatrix) cW.minus(scaledDeltaW);
        DenseMatrix nB = (DenseMatrix) cB.minus(scaledDeltaB);
        setWeight(i, nW);
        setLayerBias(i, nB);
    }
}
```
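In standard notation (with learning rate eta and batch size m; this is my reading of the final loop, written out in LaTeX):

```
W^{(i)} \leftarrow W^{(i)} - \frac{\eta}{m} \sum_{x \in \text{batch}} \frac{\partial C_x}{\partial W^{(i)}},
\qquad
b^{(i)} \leftarrow b^{(i)} - \frac{\eta}{m} \sum_{x \in \text{batch}} \frac{\partial C_x}{\partial b^{(i)}}
```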

This is the backward propagation algorithm.

```
private List<DenseMatrix[]> backPropagate(DenseMatrix toPredict, DenseMatrix correct) {
    List<DenseMatrix[]> totalDeltas = new ArrayList<>();
    DenseMatrix[] weights = getWeights();
    DenseMatrix[] biases = getBiasesAsMatrices();
    DenseMatrix[] deltaBiases = this.initializeDeltas(biases);
    DenseMatrix[] deltaWeights = this.initializeDeltas(weights);
    // Perform the feed forward here...
    List<DenseMatrix> activations = new ArrayList<>();
    List<DenseMatrix> xVector = new ArrayList<>();
    // Alters all arrays and lists.
    this.backPropFeedForward(toPredict, activations, xVector, weights, biases);
    // End feedforward.
    // Calculate the error signal for the last layer by applying
    // the error function gradient to the final activations.
    DenseMatrix deltaError = errorFunction
        .applyErrorFunctionGradient(activations.get(activations.size() - 1), correct);
    // Set the deltas to the error signals of bias and weight.
    deltaBiases[deltaBiases.length - 1] = deltaError;
    deltaWeights[deltaWeights.length - 1] = (DenseMatrix) deltaError
        .mtimes(activations.get(activations.size() - 2).transpose());
    // Now iteratively apply the chain rule backwards through the layers.
    for (int k = deltaBiases.length - 2; k >= 0; k--) {
        DenseMatrix z = xVector.get(k);
        DenseMatrix differentiate = functions[k + 1].applyDerivative(z);
        deltaError = (DenseMatrix) weights[k + 1].transpose().mtimes(deltaError)
            .times(differentiate);
        deltaBiases[k] = deltaError;
        deltaWeights[k] = (DenseMatrix) deltaError.mtimes(activations.get(k).transpose());
    }
    totalDeltas.add(deltaBiases);
    totalDeltas.add(deltaWeights);
    return totalDeltas;
}
```
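The loop above implements the standard backpropagation recurrences. In the usual notation (delta the error signal, z the pre-activations, a the activations; note the output-layer delta here folds any sigma' term into the error gradient, as the code appears to do):

```
\delta^{(L)} = \nabla_a C,
\qquad
\delta^{(k)} = \left( W^{(k+1)} \right)^{\top} \delta^{(k+1)} \odot \sigma'\!\left( z^{(k)} \right),
\qquad
\frac{\partial C}{\partial b^{(k)}} = \delta^{(k)},
\qquad
\frac{\partial C}{\partial W^{(k)}} = \delta^{(k)} \left( a^{(k)} \right)^{\top}
```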

**EDIT**

I forgot to include the feed-forward algorithm.

```
private void backPropFeedForward(DenseMatrix starter, List<DenseMatrix> actives,
                                 List<DenseMatrix> vectors,
                                 DenseMatrix[] weights, DenseMatrix[] biases) {
    DenseMatrix toPredict = starter;
    // actives.add(toPredict);
    actives.add((DenseMatrix) Matrix.Factory.zeros(starter.getRowCount(), starter.getColumnCount()));
    for (int i = 0; i < getTotalLayers() - 1; i++) {
        // z = W * a + b
        DenseMatrix x = (DenseMatrix) weights[i].mtimes(toPredict).plus(biases[i]);
        vectors.add(x);
        // a = sigma(z)
        toPredict = this.functions[i + 1].applyFunction(x);
        actives.add(toPredict);
    }
}
```
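Each iteration of that loop computes, for layer i (written in LaTeX, my reading of the two statements in the loop body):

```
z^{(i)} = W^{(i)} a^{(i)} + b^{(i)},
\qquad
a^{(i+1)} = \sigma_{i+1}\!\left( z^{(i)} \right)
```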