Week 5 content of scribble AI
This week's content is probably the most important of all the weeks, because we will finally talk about "back-propagation": the process by which the neural network actually learns from a given dataset and then becomes able to predict what an unseen input might be.
For this week only, we will be giving away the code for the back-propagation portion of the neural network. It will be up to you to make sure all of the neural network content up until this week is complete and working; only then will you be able to integrate the code given here into your existing project.
If you all remember, weights and biases are the defining features of a neural network. Recall the fundamental formula for how one neuron passes data to the next: the dot product:
np.dot(self.weights, inputs) + self.bias
The inputs we cannot do anything about; however, if we can tweak the weights and biases in such a way that the neural network is able to predict what a given image is, won't that be awesome?
That is exactly what back-propagation is all about.
- Let's say we are trying to train our NN on the MNIST dataset. This is a dataset of handwritten digits (0 to 9), stored in numeric form, with thousands of images of each digit.
- We get the dataset and "train" our NN. This is how backpropagation works:
- X is our MNIST dataset input data and Y is the labels (what digit the X corresponds to)
- We do one "forward pass" with the data as inputs. This is basically just passing the MNIST images as inputs and calculating the dot products once through the network.
- While training, along with passing the MNIST images we also know the Y values, so we pass the Y values to the model as well.
- Now, after Pass 1 generates its outputs, the model predicts what digit the input might be. Since this is Pass 1 and the weights started out random, the predicted output is essentially random.
- Then, the model compares: it compares the digit it predicted with the digit it should have predicted, and from that comparison it calculates a numeric loss.
- Now, the model goes inside each neuron and tries to update its weights and biases to minimize the loss. You see where this is going? By trying to minimize the loss, the model updates the weights and biases in such a way that it becomes more accurate when predicting a digit.
- How do we update the weights and biases? We walk the layers in reverse: starting at the output layer, we go backwards all the way to the first layer (the input layer), updating the weights and biases along the way (see the small sketch after this list).
- For Pass 2, the output will be slightly closer to the real output. As we keep doing this, you can imagine that the model gets better over time. Each full pass over the dataset is called an "epoch".
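To make "minimize the loss" concrete, here is a minimal sketch of a single gradient-descent step on one weight. The numbers are made up purely for illustration; in the real code later in this article, the gradient comes from back-propagation.

# Made-up values for illustration only.
weight = 0.8          # one weight inside some neuron
gradient = 0.25       # dLoss/dWeight, which back-propagation computes for us
learning_rate = 0.1   # how big a step we take

# Gradient descent: nudge the weight in the direction that lowers the loss.
weight = weight - learning_rate * gradient
print(weight)  # 0.775 -- a slightly "better" weight than before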
Watch the following video:
https://www.youtube.com/watch?v=w8yWXqWQYmU
Do you remember how we applied the activation functions during the forward pass? The ReLU and Softmax activation functions. When doing back-propagation we need to account for those functions on the way back. How? Calculus: we take the derivative of those functions and apply it (this is the chain rule). Here is the code for the Neuron-level class. You can copy it directly.
class Neuron:
    def __init__(self, num_inputs, activation_function):
        self.num_inputs = num_inputs
        self.activation_function = activation_function
        self.weights = np.random.randn(num_inputs) * np.sqrt(2. / num_inputs)  # He initialization
        self.bias = np.random.randn()
        self.output = 0
        self.inputs = None
        self.d_weights = None
        self.d_bias = None
        self.delta = 0

    # Have your code from previous weeks

    def backward(self, delta, learning_rate):
        # Chain rule: fold the ReLU derivative into the incoming delta.
        # (For softmax, the derivative is already baked into the loss delta.)
        if self.activation_function == "relu":
            delta *= relu_derivative(self.output)
        # Gradients of the loss with respect to this neuron's weights and bias.
        self.d_weights = delta * self.inputs
        self.d_bias = delta
        self.delta = delta
        # This neuron's contribution to the previous layer's delta.
        return np.dot(delta, self.weights)

    def update(self, learning_rate):
        # Step each parameter a little bit against its gradient.
        self.weights -= learning_rate * self.d_weights
        self.bias -= learning_rate * self.d_bias
Analyzing:
We set self.d_weights and self.d_bias, which are the derivatives of the loss with respect to the weights and the bias. The self.delta variable stores the error term flowing through the neuron: the incoming delta, scaled by the derivative of the ReLU function when ReLU is used. For self.weights, we use He initialization to randomly initialize the weights; make note of this change in your version of the code.
In the backward function, we compute the derivatives with respect to the weights and the bias by multiplying the inputs by this delta. Why exactly? The chain rule: the gradient of the loss with respect to a weight is the incoming delta times the input that weight multiplies. The video I linked covers the math behind it in more detail.
Lastly, the update function actually updates self.weights and self.bias by subtracting the derivatives scaled by a learning rate. The learning rate essentially controls how fast or slow the NN learns.
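To see these numbers in action, here is a tiny sketch that mirrors what backward and update do for a single neuron with three inputs. All the values are made up, and I'm assuming the ReLU derivative has already been folded into delta.

import numpy as np

inputs = np.array([0.5, -1.0, 2.0])    # what the neuron saw during the forward pass
weights = np.array([0.2, 0.4, -0.3])
bias = 0.1
delta = 0.6                            # error term arriving from the layer after this one
learning_rate = 0.01

# Same math as Neuron.backward: the chain rule gives the gradients.
d_weights = delta * inputs             # [0.3, -0.6, 1.2]
d_bias = delta

# Same math as Neuron.update: step each parameter against its gradient.
weights -= learning_rate * d_weights   # [0.197, 0.406, -0.312]
bias -= learning_rate * d_bias         # 0.094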
Just like how we cascade the "forward" function, we do the same for backprop: we update the Layer class with backward and update methods.
class Layer:
    def __init__(self, num_inputs, num_neurons, activation_function):
        self.num_inputs = num_inputs
        self.num_neurons = num_neurons
        self.activation_function = activation_function
        self.neurons = [Neuron(num_inputs, activation_function) for _ in range(num_neurons)]

    # Your forward function goes here before backward

    def backward(self, delta, learning_rate):
        # Collect each neuron's contribution to the previous layer's delta.
        next_delta = np.zeros(self.num_inputs)
        for i, neuron in enumerate(self.neurons):
            next_delta += neuron.backward(delta[i], learning_rate)
        return next_delta

    def update(self, learning_rate):
        for neuron in self.neurons:
            neuron.update(learning_rate)
The backward function here is much simpler.
Based on the num_inputs for the layer, we create a vector of deltas (one entry per input) and accumulate into it by calling the backward function of every neuron in the layer; this becomes the delta for the previous layer. In the update method, we call the update method of each neuron in that layer.
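If the shapes are confusing, here is a small sketch (plain NumPy, not the Layer class itself) of how a layer with 3 neurons and 4 inputs turns a delta of length 3 into a delta of length 4 for the previous layer. The weight and delta values are arbitrary.

import numpy as np

num_inputs, num_neurons = 4, 3
# One weight vector of length 4 per neuron.
neuron_weights = [np.random.randn(num_inputs) for _ in range(num_neurons)]
# One delta per neuron in this layer -- this is what Layer.backward receives.
delta = np.array([0.2, -0.1, 0.4])

# Same accumulation as Layer.backward: each neuron contributes
# delta[i] * its weights to the delta for the previous layer.
next_delta = np.zeros(num_inputs)
for i, weights in enumerate(neuron_weights):
    next_delta += delta[i] * weights

print(next_delta.shape)  # (4,) -- one delta value per input of this layer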
Now, time to update the NN class
class NeuralNetwork:
    # No changes to your __init__ or forward functions

    def calc_loss_delta(self, predicted_outputs, actual_outputs):
        # Gradient of categorical cross-entropy combined with softmax.
        return predicted_outputs - actual_outputs

    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            total_loss = 0
            for i in range(X.shape[0]):
                inputs = X[i]
                expected_output = y[i]
                # Forward pass
                predicted_output = self.forward(inputs)
                # Calculate loss
                loss = categorical_cross_entropy(predicted_output, expected_output)
                total_loss += loss
                # Calculate loss gradient (delta for the output layer)
                loss_delta = self.calc_loss_delta(predicted_output, expected_output)
                # Backward pass
                delta = loss_delta
                for layer in reversed(self.layers):
                    delta = layer.backward(delta, learning_rate)
                    layer.update(learning_rate)
            average_loss = total_loss / X.shape[0]
            print(f'Epoch {epoch+1}/{epochs}, Loss: {average_loss:.4f}')
The calc_loss_delta function calculates the gradient of the loss between the NN's predicted outputs and the actual outputs. In the train method, we run the training for the given number of epochs, or passes. From there, the comments in the code are self-explanatory: we predict the outputs, calculate the loss, calculate the loss delta, and backpropagate through the layers in reverse. At the end, we print the average loss for each epoch.
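Here is a small worked example of calc_loss_delta for one MNIST-style sample. The predicted vector is made up, and the label 3 is one-hot encoded (more on one-hot encoding next week).

import numpy as np

# Hypothetical softmax output for one image (10 classes, digits 0-9).
predicted = np.array([0.05, 0.05, 0.10, 0.60, 0.02,
                      0.03, 0.05, 0.04, 0.03, 0.03])
# One-hot label for the digit 3.
actual = np.zeros(10)
actual[3] = 1.0

# Delta for the output layer: predicted - actual.
loss_delta = predicted - actual
print(loss_delta[3])   # 0.60 - 1.00 = -0.40: the '3' output needs to go up
print(loss_delta[0])   # 0.05 - 0.00 =  0.05: the '0' output needs to go down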
Now, what is the categorical_cross_entropy function? It is the function that calculates the loss.
You can learn about categorical cross entropy here. There are other loss functions too, but we'll use this one:
https://www.youtube.com/watch?v=dEXPMQXoiLc
At the top of your file, you can define this function as well as the ReLU derivative:
def relu_derivative(x):
    return np.where(x > 0, 1, 0)

def categorical_cross_entropy(predicted, actual):
    epsilon = 1e-12  # To avoid log(0)
    predicted = np.clip(predicted, epsilon, 1. - epsilon)
    return -np.sum(actual * np.log(predicted))
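A quick sanity check of both helpers (the numbers are arbitrary, and this assumes the two functions above are already defined in your file): a confident correct prediction gives a small loss, while a confident wrong one gives a large loss.

import numpy as np

print(relu_derivative(np.array([-2.0, 0.0, 3.0])))  # [0 0 1]

actual = np.array([0.0, 1.0, 0.0])                  # one-hot: the correct class is index 1
good = np.array([0.05, 0.90, 0.05])                 # confident and correct
bad = np.array([0.90, 0.05, 0.05])                  # confident and wrong

print(categorical_cross_entropy(good, actual))      # ~0.105
print(categorical_cross_entropy(bad, actual))       # ~3.0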
I am attaching a file week5_code.py that contains the code for the content covered above.
Make sure not to overwrite this file. Instead, copy the necessary new functions and code from it over to the existing project you have been working on for the past month.
While the content for this week might seem very intimidating, I promise if you read the entire article with a calm mind and watch the two videos linked, backpropagation is very understandable. Moreover, I have provided every piece of code needed for back-propagation to work.
Next week, we will cover the MNIST dataset in detail, actually load the dataset with Python, and train our network on it. At the end, we will test our NN on a random digit from the dataset and you'll see how your NN is able to detect what that digit is. Before that, we will also talk briefly about:
- One-hot encoding and why we need it
- Normalizing the data before training
Once we have our model trained and running, we will have accomplished the most important part of this project. I hope to see you all there!