# Backpropagation from Scratch
import numpy as np
import os
from collections import OrderedDict
from copy import deepcopy
Most beginners try to understand backpropagation by rote-learning formulae. This assignment is intended to convince you that such an approach is not required, as backpropagation is quite intuitive if taught correctly. In this assignment I aim to teach you how to think about losses and gradients, and how to derive backpropagation yourself without having to rely on deep learning frameworks.
Consider the simple network below. There is a single neuron with two inputs $X_{1}$ and $X_{2}$ connected by two weights ($W_{1}$ and $W_{2}$).
As shown below, let $W_{1} = 0.3$ and $W_{2}=0.7$.
So the network below can be summarized as $\hat{Y} = W_{2}*X_{2} + W_{1}*X_{1}$
Let the ground truth be $Y_{g}$, which along with $\hat{Y}$ is fed to the squared error loss function.
So the $Loss = (1/2)*(Y_{g} - \hat{Y})^{2}$
%%html
<img src="Images/Simple_Network.jpg" width=300 height=300>
To motivate this concept, let $X_{1} = -100$ and $X_{2}=1000$. Hence $\hat{Y} = 670$. Now let us say that $Y_{g} = 1000$.
It is apparent that in order to reduce the loss, we need to increase $\hat{Y}$. Let us study how the loss changes as we vary the weights of the network for this input.
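As a concrete starting point, here is a minimal sweep over $W_{1}$ with everything else held fixed (a sketch; the values $X_{1}=-100$, $X_{2}=1000$, $W_{2}=0.7$, $Y_{g}=1000$ are taken from the example above):

```python
import numpy as np

# Fixed example values from the text above
X1, X2, W2, Y_g = -100.0, 1000.0, 0.7, 1000.0

def loss_for_W1(W1):
    """Squared-error loss as a function of W1, with W2 and the inputs fixed."""
    y_hat = W2 * X2 + W1 * X1
    return 0.5 * (Y_g - y_hat) ** 2

for W1 in np.linspace(0.1, 0.5, 5):
    print("W1 = {:.1f} -> Loss = {:.1f}".format(W1, loss_for_W1(W1)))
```

Since $X_{1}$ is negative here, decreasing $W_{1}$ increases $\hat{Y}$ and brings it closer to $Y_{g}$, so the loss drops; an analogous sweep over $W_{2}$ shows the loss dropping as $W_{2}$ grows towards 1.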
Observations about Varying $W_{1}$
Observations about Varying $W_{2}$
What do we want?
Hence, if we change any parameter $W_{i}$ in the direction opposite to $\frac{\partial Loss}{\partial W_{i}}$, i.e. $W_{i} = W_{i} - \alpha*\frac{\partial Loss}{\partial W_{i}}$ for some small learning rate $\alpha > 0$, we will end up reducing the loss. This is called moving along the anti-gradient.
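This claim can be checked numerically without any calculus: estimate the gradient with central differences and take a tiny step against it. A sketch using the example values above (the learning rate $10^{-9}$ is deliberately tiny because the inputs, and hence the gradients, are large):

```python
import numpy as np

def loss(w):
    # Squared-error loss for the example: X1 = -100, X2 = 1000, Y_g = 1000
    y_hat = w[0] * (-100.0) + w[1] * 1000.0
    return 0.5 * (1000.0 - y_hat) ** 2

w = np.array([0.3, 0.7])
eps = 1e-6
grad = np.zeros_like(w)
for i in range(len(w)):
    wp, wm = w.copy(), w.copy()
    wp[i] += eps
    wm[i] -= eps
    grad[i] = (loss(wp) - loss(wm)) / (2 * eps)  # central-difference estimate

w_new = w - 1e-9 * grad  # step along the anti-gradient
print(loss(w), "->", loss(w_new))  # the loss goes down
```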
def apply_gradient_change(weight:np.ndarray,
gradient:np.ndarray,
alpha:float) -> np.ndarray:
""" Takes the weight and applies the gradient change as mentioned above. It then
returns the changed weight.
Max Marks : 1
"""
# YOUR CODE HERE
raise NotImplementedError()
# Please run these hidden tests !!
assert np.array_equal(apply_gradient_change(np.array([1.,2.,3.,4.]),np.array([1.,2.,3.,4.]),0.001),np.array([0.999,1.998,2.997,3.996]))
Let us add a few more neurons to the simple network in order to introduce another concept. Consider the network below. We will refer to this network as Complex Network in the following discussions.
$ \hat{Y} = W_{1}*Z_{1} + W_{2}*Z_{2}$ : Here Pred is the same as $\hat{Y}$
$Z_{1} = W_{3}*Z_{3} + W_{4}*Z_{4}$
$Z_{2} = W_{5}*Z_{4}$
$Z_{4} = W_{7}*X_{2}$
$Z_{3} = W_{6}*X_{1}$
$Loss = (1/2)*(Y_{g} - \hat{Y})^{2}$
Note that when we feed many points in a mini-batch at the same time, the mini-batch loss is the average of the loss calculated for each point.
$Loss = (1/2m)*\sum_{j=1}^{m}(Y_{g}^{(j)} - \hat{Y}^{(j)})^{2}$ where $m$ is the number of points in the mini-batch and $(j)$ indexes the points.
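For example, with hypothetical arrays for a mini-batch of $m = 3$ points:

```python
import numpy as np

y_hat = np.array([1.0, 2.0, 3.0])   # predictions (hypothetical)
y_g   = np.array([1.5, 2.0, 2.0])   # ground truths (hypothetical)
m = len(y_g)

# Mini-batch squared-error loss: (1/2m) * sum of per-point squared errors
loss = np.sum((y_g - y_hat) ** 2) / (2 * m)
print(loss)  # (0.25 + 0.0 + 1.0) / 6
```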
%%html
<img src="Images/Complex_Network.jpg" width=500 height=500>
We observe a pattern when calculating $\frac{\partial Loss}{\partial Z_{i}}$ for each $Z_{i}$, as explained below. If you are not interested in the math, then at least take a look at the Simplified Meaning statements. We assume that $m$ data points are sent in a mini-batch.
1) $Error_{Pred} = \frac{\partial Loss}{\partial \hat{Y}} = (1/m)*(\hat{Y}-Y_{g})$
2) $Error_{Z_{1}} = \frac{\partial Loss}{\partial Z_{1}} = \frac{\partial Loss}{\partial \hat{Y}}*\frac{\partial \hat{Y}}{\partial Z_{1}} = \frac{\partial Loss}{\partial \hat{Y}}*W_{1}$
----> Simplified Meaning : $Error_{Z_{1}} = Error_{Pred} * W_{1}$
3) $Error_{Z_{2}} = \frac{\partial Loss}{\partial Z_{2}} = \frac{\partial Loss}{\partial \hat{Y}}*\frac{\partial \hat{Y}}{\partial Z_{2}} = \frac{\partial Loss}{\partial \hat{Y}}*W_{2}$
----> Simplified Meaning : $Error_{Z_{2}} = Error_{Pred} * W_{2}$
4) $Error_{Z_{3}} = \frac{\partial Loss}{\partial Z_{3}} = \frac{\partial Loss}{\partial \hat{Y}}*\frac{\partial \hat{Y}}{\partial Z_{3}} = \frac{\partial Loss}{\partial \hat{Y}}*( W_{1}*\frac{\partial Z_{1}}{\partial Z_{3}} + W_{2}*\frac{\partial Z_{2}}{\partial Z_{3}} ) = (\frac{\partial Loss}{\partial \hat{Y}}*W_{1})*\frac{\partial Z_{1}}{\partial Z_{3}} = (\frac{\partial Loss}{\partial Z_{1}})*W_{3}$ (the $Z_{2}$ term vanishes since $Z_{2}$ does not depend on $Z_{3}$, and $\frac{\partial Z_{1}}{\partial Z_{3}} = W_{3}$)
----> Simplified Meaning : $Error_{Z_{3}} = Error_{Z_{1}} * W_{3}$
5) $Error_{Z_{4}} = \frac{\partial Loss}{\partial Z_{4}} = \frac{\partial Loss}{\partial \hat{Y}}*\frac{\partial \hat{Y}}{\partial Z_{4}} = \frac{\partial Loss}{\partial \hat{Y}}*( W_{1}*\frac{\partial Z_{1}}{\partial Z_{4}} + W_{2}*\frac{\partial Z_{2}}{\partial Z_{4}} ) = (\frac{\partial Loss}{\partial \hat{Y}}*W_{1})*\frac{\partial Z_{1}}{\partial Z_{4}} + (\frac{\partial Loss}{\partial \hat{Y}}*W_{2})*\frac{\partial Z_{2}}{\partial Z_{4}} = (\frac{\partial Loss}{\partial Z_{1}})*W_{4} + (\frac{\partial Loss}{\partial Z_{2}})*W_{5}$
----> Simplified Meaning : $Error_{Z_{4}} = Error_{Z_{1}} * W_{4} + Error_{Z_{2}} * W_{5}$
A pictorial representation of these errors is shown below. This graph representation of errors is known as an Error Graph. If you observe it carefully, it is the same network but with the edges reversed. Hence it is said that the flow of errors is exactly opposite to the flow of computation.
%%html
<img src="Images/Error_Graph.jpg" width=500 height=500>
If you have understood the error graph, it will be easy for you to grasp concept 2. Concept 2 states that in order to find the error of a neuron, we only need to know the errors of the neurons it is connected to. This can be verified from all the error equations we calculated above.
Connected neurons are like family.
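The simplified rules can also be cross-checked against a numerical derivative. The sketch below uses hypothetical scalar weights and inputs with a single data point ($m = 1$), computes $Error_{Z_{4}}$ via the rules above, and compares it with a central-difference estimate of $\frac{\partial Loss}{\partial Z_{4}}$:

```python
import numpy as np

# Hypothetical scalar weights/inputs, one data point (m = 1)
W1, W2, W3, W4, W5, W6, W7 = 0.1, 0.9, 0.3, 0.4, 0.9, 0.1, 0.3
X1, X2, Y_g = 2.0, -1.0, 1.0
Z3, Z4 = W6 * X1, W7 * X2

def loss_from_Z4(z4):
    """Recompute everything downstream of Z4 and return the loss."""
    Z1 = W3 * Z3 + W4 * z4
    Z2 = W5 * z4
    y_hat = W1 * Z1 + W2 * Z2
    return 0.5 * (Y_g - y_hat) ** 2

# Errors via the simplified rules derived above
Z1, Z2 = W3 * Z3 + W4 * Z4, W5 * Z4
y_hat = W1 * Z1 + W2 * Z2
Error_Pred = y_hat - Y_g
Error_Z1, Error_Z2 = Error_Pred * W1, Error_Pred * W2
Error_Z4 = Error_Z1 * W4 + Error_Z2 * W5

eps = 1e-6
numeric = (loss_from_Z4(Z4 + eps) - loss_from_Z4(Z4 - eps)) / (2 * eps)
print(Error_Z4, numeric)  # the two should match
```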
def get_Errors_of_Complex_Network(y_hat:np.ndarray,
y_ground:np.ndarray,
All_weights:OrderedDict,
All_Errors:OrderedDict) -> None:
""" Calculates the errors for the Complex Network and stores these errors in "All_Errors".
Max Marks : 2
Arguments:
y_hat : Prediction Array with size = [batch_size]
y_ground : Ground truth Array with size = [batch_size]
All_weights : Dictionary with all stored weights. The keys and values are listed below.
All_weights["W1"] contains weight W1
All_weights["W2"] contains weight W2
All_weights["W3"] contains weight W3
All_weights["W4"] contains weight W4
All_weights["W5"] contains weight W5
All_weights["W6"] contains weight W6
All_weights["W7"] contains weight W7
All_Errors : Dictionary with all stored errors. The keys and values are listed below.
All_Errors["Pred"] should contain the Error of Pred
All_Errors["Z1"] should contain the Error of Z1
All_Errors["Z2"] should contain the Error of Z2
All_Errors["Z3"] should contain the Error of Z3
All_Errors["Z4"] should contain the Error of Z4
"""
batch_size = len(y_ground)
Error_Pred = (y_hat - y_ground)/batch_size
# Please calculate Error_Z1,Error_Z2,Error_Z3,Error_Z4
# YOUR CODE HERE
raise NotImplementedError()
try:
assert(Error_Pred.shape == (batch_size,))
assert(Error_Z1.shape == (batch_size,))
assert(Error_Z2.shape == (batch_size,))
assert(Error_Z3.shape == (batch_size,))
assert(Error_Z4.shape == (batch_size,))
except :
print("Error Pred : {}".format(Error_Pred.shape))
print("Error Z1 : {}".format(Error_Z1.shape))
print("Error Z2 : {}".format(Error_Z2.shape))
print("Error Z3 : {}".format(Error_Z3.shape))
print("Error Z4 : {}".format(Error_Z4.shape))
raise ValueError("Something's wrong!")
All_Errors["Pred"] = Error_Pred
All_Errors["Z1"] = Error_Z1
All_Errors["Z2"] = Error_Z2
All_Errors["Z3"] = Error_Z3
All_Errors["Z4"] = Error_Z4
# Please run these hidden tests !!
When we do not add an activation function, we can perform a procedure known as Network Collapse. Consider the Complex Network shown in the previous section. If we observe carefully, $\hat{Y}$ can be written directly as a function of $X_{1}$ and $X_{2}$.
For example, $\hat{Y} = {W_{1}}_{New}*X_{1} + {W_{2}}_{New}*X_{2}$, where ${W_{1}}_{New} = W_{1}*W_{3}*W_{6}$. Can you find out ${W_{2}}_{New}$?
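Before deriving ${W_{2}}_{New}$ yourself, you can at least confirm numerically that the whole network is a single linear map of its inputs. A sketch with random weights (`forward` re-implements the computation equations listed above):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2, W3, W4, W5, W6, W7 = rng.random(7)

def forward(x1, x2):
    # Computation equations of the Complex Network
    Z3, Z4 = W6 * x1, W7 * x2
    Z1, Z2 = W3 * Z3 + W4 * Z4, W5 * Z4
    return W1 * Z1 + W2 * Z2

x1, x2 = 1.7, -2.3
# A linear map separates over its inputs...
assert np.isclose(forward(x1, x2), forward(x1, 0.0) + forward(0.0, x2))
# ...and W1_New is exactly the coefficient multiplying X1
assert np.isclose(forward(1.0, 0.0), W1 * W3 * W6)
```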
def calculate_W1_New(All_weights:OrderedDict)->float:
""" Calculates W1_New of the collapsed network.
Arguments :
All_weights : Dictionary with all stored weights. The keys and values are listed below
All_weights["W1"] contains weight W1
All_weights["W2"] contains weight W2
All_weights["W3"] contains weight W3
All_weights["W4"] contains weight W4
All_weights["W5"] contains weight W5
All_weights["W6"] contains weight W6
All_weights["W7"] contains weight W7
"""
return All_weights["W1"]*All_weights["W3"]*All_weights["W6"]
def calculate_W2_New(All_weights:OrderedDict)->float:
""" Calculates W2_New of the collapsed network.
Max Marks : 1
Arguments :
All_weights : Dictionary with all stored weights. The keys and values are listed below
All_weights["W1"] contains weight W1
All_weights["W2"] contains weight W2
All_weights["W3"] contains weight W3
All_weights["W4"] contains weight W4
All_weights["W5"] contains weight W5
All_weights["W6"] contains weight W6
All_weights["W7"] contains weight W7
"""
# YOUR CODE HERE
raise NotImplementedError()
# Please run these hidden tests !!
sample_weights = OrderedDict()
sample_weights["W1"] = 0.1
sample_weights["W2"] = 0.9
sample_weights["W3"] = 0.3
sample_weights["W4"] = 0.4
sample_weights["W5"] = 0.9
sample_weights["W6"] = 0.1
sample_weights["W7"] = 0.3
assert( calculate_W2_New(sample_weights) == 0.255)
We realize that without an activation function, complex networks are indeed useless! Hence, let us now add an activation function. Below is a very minimal network that showcases what changes arise in backprop when we incorporate an activation function. In this example, we use the sigmoid function.
%%html
<img src="Images/Activation.jpg" width=500 height=500>
Let us list down all the computation equations: $Z_{2} = W_{1}*Z_{1}$ and $\hat{Y} = W_{2}*\sigma{(Z_{2})}$, where $\sigma$ is the sigmoid function.
Let us list down all the backprop equations.
1) $Error_{Pred} = \frac{\partial Loss}{\partial \hat{Y}} = (1/m)*(\hat{Y} - Y_{g})$
2) $Error_{Z_{2}} = \frac{\partial Loss}{\partial Z_{2}} = \frac{\partial Loss}{\partial \hat{Y}} * \frac{\partial \hat{Y}}{\partial Z_{2}} = Error_{Pred}*W_{2}*\sigma{(Z_{2})}*(1-\sigma{(Z_{2})}) = Error_{Pred}*W_{2}*\sigma'{(Z_{2})}$
3) $Error_{Z_{1}} = \frac{\partial Loss}{\partial Z_{1}} = (\frac{\partial Loss}{\partial \hat{Y}} * \frac{\partial \hat{Y}}{\partial Z_{2}}) * \frac{\partial Z_{2}}{\partial Z_{1}} = (Error_{Z_{2}})*W_{1}$
By observing equations 2 and 3 we understand a very simple thing.
$Error_{Z_{2}}$ would have been $Error_{Pred}*W_{2}$ if there were no activation function. By adding an activation function, $Error_{Pred}*W_{2}$ was simply multiplied by the differentiation of the activation function which is $\sigma{(Z_{2})}*(1-\sigma{(Z_{2})})$.
$Error_{Z_{1}}$ is the same as there is no activation function after $Z_{1}$.
It is now apparent that the only change in backprop when adding an activation function is an extra multiplication by the derivative of the activation function.
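You can sanity-check the identity $\sigma'(x) = \sigma(x)*(1-\sigma(x))$ used above with central differences (a sketch; `sig` is just the formula $1/(1+e^{-x})$):

```python
import numpy as np

def sig(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4.0, 4.0, 9)
eps = 1e-6
numeric  = (sig(x + eps) - sig(x - eps)) / (2 * eps)  # finite differences
analytic = sig(x) * (1.0 - sig(x))                    # the closed-form derivative
print(np.max(np.abs(numeric - analytic)))  # essentially zero
```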
def sigmoid(x:np.ndarray) -> np.ndarray:
""" This function returns the output of sigmoid(x). Where sigmoid(x) = 1/(1+e^{-x}). Make sure that you
first make a copy of "x" using np.copy(). Perform operations on this copy and return it.
Max Marks : 1
Arguments:
x : Input to sigmoid function
"""
# YOUR CODE HERE
raise NotImplementedError()
# Please run these hidden tests !!
assert(sigmoid(np.array([0])) == np.array([0.5]))
def diff_sigmoid(x:np.ndarray) -> np.ndarray:
""" This function returns the differentiation of sigmoid(x). Where sigmoid(x) = 1/(1+e^{-x}). Make sure that you
first make a copy of "x" using np.copy(). Perform operations on this copy and return it.
Max Marks : 1
Arguments:
x : The input to sigmoid function
"""
# YOUR CODE HERE
raise NotImplementedError()
# Please run these hidden tests !!
assert(diff_sigmoid(np.array([0])) == 0.25)
def get_Errors_with_Activation_Fn(y_hat:np.ndarray,
y_ground:np.ndarray,
All_weights:OrderedDict,
All_Zs:OrderedDict,
All_Errors:OrderedDict) -> None:
""" Calculates the errors for the minimal network given above and stores these errors into "All_Errors".
To incorporate the differentiation of the activation function, you can use the "diff_sigmoid" function.
Max Marks : 2
Arguments:
y_hat : Prediction Array with size = [batch_size]
y_ground : Ground truth Array with size = [batch_size]
All_weights : Dictionary with all stored weights. The keys and values are listed below
All_weights["W1"] contains weight W1
All_weights["W2"] contains weight W2
All_Zs : Dictionary with all stored outputs. The keys and values are listed below
All_Zs["Z1"] contains output Z1
All_Zs["Z2"] contains output Z2
All_Errors : Dictionary with all stored errors. The keys and values are listed below.
All_Errors["Pred"] should contain the Error of Pred
All_Errors["Z1"] should contain the Error of Z1
All_Errors["Z2"] should contain the Error of Z2
"""
batch_size = len(y_ground)
Error_Pred = (y_hat - y_ground)/batch_size
# Please calculate Error_Z2 and Error_Z1
# YOUR CODE HERE
raise NotImplementedError()
try:
assert(Error_Pred.shape == (batch_size,))
assert(Error_Z1.shape == (batch_size,))
assert(Error_Z2.shape == (batch_size,))
except :
print("Error Pred : {}".format(Error_Pred.shape))
print("Error Z1 : {}".format(Error_Z1.shape))
print("Error Z2 : {}".format(Error_Z2.shape))
raise ValueError("Something's wrong!")
All_Errors["Pred"] = Error_Pred
All_Errors["Z2"] = Error_Z2
All_Errors["Z1"] = Error_Z1
# Please run these hidden tests !!
Let us now add an activation function to each node in the complex network. From now on, $Z_{i}$ together with its activation $Act_{i}(Z_{i})$ will be fused into a single node called $N_{i}$. This modified network will be referred to as the Modified Network in further discussions.
The list of activation functions ($Act_{i}$) corresponding to each node ($N_{i}$) are mentioned below.
$N_{1}$ => $Act_{1}$ => $ReLU$
$N_{2}$ => $Act_{2}$ => $Sigmoid$
$N_{3}$ => $Act_{3}$ => $ReLU$
$N_{4}$ => $Act_{4}$ => $Sigmoid$
%%html
<img src="Images/Modified_Network.jpg" width=500 height=500>
def ReLU(x:np.ndarray) -> np.ndarray:
""" Returns the output of Rectified Linear Unit Function. ReLU(x) is max(0,x). Make sure that you
first make a copy of "x" using np.copy(). Perform operations on this copy and return it.
Max Marks : 1
Arguments:
x : Input to ReLU function
"""
# YOUR CODE HERE
raise NotImplementedError()
# Please run these hidden tests !!
assert( np.array_equal(ReLU(np.array([-0.2,0.5])),np.array([0,0.5])) )
def diff_ReLU(x:np.ndarray) -> np.ndarray:
""" Returns the differentiation of ReLU(x). Note, make sure that the differentiation of ReLU(x) at x=0 is 0. Make sure that you
first make a copy of "x" using np.copy(). Perform operations on this copy and return it.
Max Marks : 1
Arguments:
x : Input to the ReLU function
"""
# YOUR CODE HERE
raise NotImplementedError()
# Please run these hidden tests !!
assert( np.array_equal(diff_ReLU(np.array([0.9,-0.5,0.])), np.array([1.,0.,0.])) )
def get_Errors_of_Modified_Network(y_ground:np.ndarray,
All_weights:OrderedDict,
All_Zs:OrderedDict,
All_Errors:OrderedDict) -> None:
""" Calculates the errors of the modified network and stores these errors in "All_Errors". To incorporate the
differentiation of sigmoid and ReLU, you can use "diff_sigmoid" and "diff_ReLU".
Max Marks : 4
Arguments :
y_ground : Ground truth Array with size = [batch_size]
All_weights : Dictionary with all stored weights. The keys and values are listed below
All_weights["W1"] contains weight W1
All_weights["W2"] contains weight W2
All_weights["W3"] contains weight W3
All_weights["W4"] contains weight W4
All_weights["W5"] contains weight W5
All_weights["W6"] contains weight W6
All_weights["W7"] contains weight W7
All_Zs : Dictionary with all stored outputs. The keys and values are listed below
All_Zs["Pred"] contains output Pred
All_Zs["Z1"] contains output Z1
All_Zs["Z2"] contains output Z2
All_Zs["Z3"] contains output Z3
All_Zs["Z4"] contains output Z4
All_Errors : Dictionary with all stored errors. The keys and values are listed below.
All_Errors["Pred"] should contain the Error of Pred
All_Errors["Z1"] should contain the Error of Z1
All_Errors["Z2"] should contain the Error of Z2
All_Errors["Z3"] should contain the Error of Z3
All_Errors["Z4"] should contain the Error of Z4
"""
batch_size = len(y_ground)
Error_Pred = (All_Zs["Pred"] - y_ground)/batch_size
# Please calculate Error_Z1, Error_Z2, Error_Z3, Error_Z4
# YOUR CODE HERE
raise NotImplementedError()
try:
assert(Error_Pred.shape == (batch_size,))
assert(Error_Z1.shape == (batch_size,))
assert(Error_Z2.shape == (batch_size,))
assert(Error_Z3.shape == (batch_size,))
assert(Error_Z4.shape == (batch_size,))
except :
print("Error Pred : {}".format(Error_Pred.shape))
print("Error Z1 : {}".format(Error_Z1.shape))
print("Error Z2 : {}".format(Error_Z2.shape))
print("Error Z3 : {}".format(Error_Z3.shape))
print("Error Z4 : {}".format(Error_Z4.shape))
raise ValueError("Something's wrong!")
All_Errors["Pred"] = Error_Pred
All_Errors["Z1"] = Error_Z1
All_Errors["Z2"] = Error_Z2
All_Errors["Z3"] = Error_Z3
All_Errors["Z4"] = Error_Z4
# Please run these hidden tests !!
Whenever we train neural networks, we perform two steps. The first step is to perform backpropagation and get the gradients. The second step is to use these gradients to change the weights. This second step is also known as Optimization.
If you recall, the update equation mentioned at the beginning was $W_{i} = W_{i} - \alpha*\frac{\partial Loss}{\partial W_{i}}$. But up until now, we have been finding out gradients for nodes i.e. $\frac{\partial Loss}{\partial Z_{i}}$ and not for weights i.e. $\frac{\partial Loss}{\partial W_{i}}$.
The procedure to calculate $\frac{\partial Loss}{\partial W_{i}}$ is very straightforward. Suppose the input to the weight $W_{i}$ is $Input_{i}$ and the other end of the weight is connected to $Output_{i}$.
Then $\frac{\partial Loss}{\partial W_{i}} = \frac{\partial Loss}{\partial Output_{i}}*Input_{i}$.
Let us take two examples to better understand this.
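As a first example, take $W_{1}$ of the very first network: its input end is $X_{1}$ and its output end is $\hat{Y}$, so the rule gives $\frac{\partial Loss}{\partial W_{1}} = Error_{Pred}*X_{1}$. A quick numerical check of this, using the example values from the start:

```python
import numpy as np

X1, X2, W1, W2, Y_g = -100.0, 1000.0, 0.3, 0.7, 1000.0

def loss(w1):
    return 0.5 * (Y_g - (w1 * X1 + W2 * X2)) ** 2

y_hat = W1 * X1 + W2 * X2
Error_Pred = y_hat - Y_g          # m = 1
W1_grad = Error_Pred * X1         # error at the output end times the input end
eps = 1e-6
numeric = (loss(W1 + eps) - loss(W1 - eps)) / (2 * eps)
print(W1_grad, numeric)           # the two estimates agree
```

The second example works the same way in the Modified Network: there $W_{1}$ connects $N_{1}$ to $\hat{Y}$, which is exactly why the given code below computes `W1_gradient = All_Errors["Pred"]*All_Nodes["N1"]`.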
def apply_Gradient_Change_to_Modified_Network(alpha:float,
All_weights:OrderedDict,
All_Errors:OrderedDict,
All_Nodes:OrderedDict,
All_Xs:OrderedDict) -> None:
""" Apply Gradient Change to weights in the modified network. And store the new weights back in the
'All_weights' dictionary.
Max Marks : 3
Arguments:
alpha : Learning rate for gradient descent
All_weights : Dictionary with all stored weights. The keys and values are listed below
All_weights["W1"] contains weight W1
All_weights["W2"] contains weight W2
All_weights["W3"] contains weight W3
All_weights["W4"] contains weight W4
All_weights["W5"] contains weight W5
All_weights["W6"] contains weight W6
All_weights["W7"] contains weight W7
All_Errors : Dictionary with all stored errors. The keys and values are listed below.
All_Errors["Pred"] contains the Error of Pred
All_Errors["Z1"] contains the Error of Z1
All_Errors["Z2"] contains the Error of Z2
All_Errors["Z3"] contains the Error of Z3
All_Errors["Z4"] contains the Error of Z4
All_Nodes : Dictionary with all stored outputs. The keys and values are listed below
All_Nodes["Pred"] contains output Pred
All_Nodes["N1"] contains output N1
All_Nodes["N2"] contains output N2
All_Nodes["N3"] contains output N3
All_Nodes["N4"] contains output N4
All_Xs : Dictionary with all stored inputs. The keys and values are listed below
All_Xs["X1"] contains input X1
All_Xs["X2"] contains input X2
"""
batch_size = len(All_Xs["X1"])
# Calculate the gradients
W1_gradient = All_Errors["Pred"]*All_Nodes["N1"]
# Please calculate W2_gradient, W3_gradient, .... ,W7_gradient
# YOUR CODE HERE
raise NotImplementedError()
# Making sure everything is correct
try :
assert(W1_gradient.shape == (batch_size,))
assert(W2_gradient.shape == (batch_size,))
assert(W3_gradient.shape == (batch_size,))
assert(W4_gradient.shape == (batch_size,))
assert(W5_gradient.shape == (batch_size,))
assert(W6_gradient.shape == (batch_size,))
assert(W7_gradient.shape == (batch_size,))
except :
print("W1 Gradient : {}".format(W1_gradient.shape))
print("W2 Gradient : {}".format(W2_gradient.shape))
print("W3 Gradient : {}".format(W3_gradient.shape))
print("W4 Gradient : {}".format(W4_gradient.shape))
print("W5 Gradient : {}".format(W5_gradient.shape))
print("W6 Gradient : {}".format(W6_gradient.shape))
print("W7 Gradient : {}".format(W7_gradient.shape))
raise ValueError("Something's wrong!")
All_weights["W1"] -= alpha*np.sum(W1_gradient)
# Please update all the weights W2,W3 .... W7
# YOUR CODE HERE
raise NotImplementedError()
# Please run these hidden tests !!
We now know how to calculate the backprop equations and how to use these gradients to change the weights. Let us assemble all these components together to complete the backprop and optimization of the modified network.
def init_Weights():
# Initialize the Weights of the network
All_weights = OrderedDict()
All_weights["W1"] = np.abs(np.random.randn(1)).item()
All_weights["W2"] = np.abs(np.random.randn(1)).item()
All_weights["W3"] = np.abs(np.random.randn(1)).item()
All_weights["W4"] = np.abs(np.random.randn(1)).item()
All_weights["W5"] = np.abs(np.random.randn(1)).item()
All_weights["W6"] = np.abs(np.random.randn(1)).item()
All_weights["W7"] = np.abs(np.random.randn(1)).item()
return All_weights
def init_Errors():
# Initialize the Errors of the network
All_Errors = OrderedDict()
All_Errors["Pred"] = None
All_Errors["Z1"] = None
All_Errors["Z2"] = None
All_Errors["Z3"] = None
All_Errors["Z4"] = None
return All_Errors
def init_Nodes():
# Initialize the Nodes of the network
All_Nodes = OrderedDict()
All_Nodes["Pred"] = None
All_Nodes["N1"] = None
All_Nodes["N2"] = None
All_Nodes["N3"] = None
All_Nodes["N4"] = None
return All_Nodes
def init_Zs():
# Initialize the Z's of the network
All_Zs = OrderedDict()
All_Zs["Pred"] = None
All_Zs["Z1"] = None
All_Zs["Z2"] = None
All_Zs["Z3"] = None
All_Zs["Z4"] = None
return All_Zs
def shuffle_data(x1:np.ndarray,
x2:np.ndarray,
y:np.ndarray):
""" Shuffles data after every epoch """
num_points = len(x1)
g = np.arange(num_points)
np.random.shuffle(g)
x1 = x1[g]
x2 = x2[g]
y = y[g]
return x1,x2,y
def forward_prop(x1:np.ndarray,
x2:np.ndarray,
All_weights:OrderedDict,
All_Nodes:OrderedDict,
All_Zs:OrderedDict) -> None :
""" Perform forward prop of modified network and store all the node values in the All_Nodes and the
All_Zs dictionaries. Use the "sigmoid" and "ReLU" functions you implemented earlier.
Max Marks : 2
Arguments :
x1 : The x1 data points of the mini-batch
x2 : The x2 data points of the mini-batch
All_weights : Dictionary with all stored weights. The keys and values are listed below
All_weights["W1"] contains weight W1
All_weights["W2"] contains weight W2
All_weights["W3"] contains weight W3
All_weights["W4"] contains weight W4
All_weights["W5"] contains weight W5
All_weights["W6"] contains weight W6
All_weights["W7"] contains weight W7
All_Nodes : Dictionary with all stored outputs. The keys and values are listed below
All_Nodes["Pred"] contains output Pred
All_Nodes["N1"] contains output N1
All_Nodes["N2"] contains output N2
All_Nodes["N3"] contains output N3
All_Nodes["N4"] contains output N4
All_Zs : Dictionary with all stored outputs. The keys and values are listed below
All_Zs["Pred"] contains output Pred
All_Zs["Z1"] contains output Z1
All_Zs["Z2"] contains output Z2
All_Zs["Z3"] contains output Z3
All_Zs["Z4"] contains output Z4
"""
All_Zs["Z3"] = All_weights["W6"]*x1
All_Nodes["N3"] = ReLU(All_Zs["Z3"])
# Please calculate Z4, N4, Z1, N1, Z2, N2 and Pred (Z3 and N3 are done above)
# YOUR CODE HERE
raise NotImplementedError()
assert(np.array_equal(All_Zs["Pred"],All_Nodes["Pred"]))
# Please run these hidden tests !!
def main_function(epoch_num:int,batch_size:int, data_points:int, alpha:float,
x1_data:np.ndarray, x2_data:np.ndarray, y_data:np.ndarray,
All_Weights:OrderedDict, All_Nodes:OrderedDict, All_Zs:OrderedDict,
All_Errors:OrderedDict) -> None:
""" The main function that will train your neural network for 1 epoch.
Max Marks : 1
Arguments:
epoch_num : Count of which epoch is running
batch_size : The batch size of each mini-batch input
data_points : The total number of training data points
alpha : The learning rate of training
x1_data : All the x1 data points
x2_data : All the x2 data points
y_data : All the groundtruths
All_Weights : A dictionary containing all the weights of the network
All_Nodes : A dictionary containing all the outputs after activations of the network
Ex. (N1, N2, N3 and N4)
All_Zs : A dictionary containing all the outputs before activations of the network
Ex. (Z1, Z2, Z3, Z4)
All_Errors : A dictionary containing all the errors of the network
Ex. Error_Pred, Error_Z1, Error_Z2 and so on...
"""
num_batches = int(np.ceil(data_points/batch_size))
total_loss = 0
# For each mini-batch
for k in range(0,num_batches):
# Input data
x1_input = np.copy(np.squeeze(x1_data[k*batch_size:min((k+1)*batch_size,data_points)]))
x2_input = np.copy(np.squeeze(x2_data[k*batch_size:min((k+1)*batch_size,data_points)]))
y_input = np.copy(np.squeeze(y_data[k*batch_size:min((k+1)*batch_size,data_points)]))
# Perform forward propagation - Call your appropriate function
# YOUR CODE HERE
raise NotImplementedError()
# Perform back propagation to calculate the partial derivatives - Call your appropriate function
# YOUR CODE HERE
raise NotImplementedError()
# Perform optimization and change weights - Call your appropriate function
All_Xs = OrderedDict()
All_Xs["X1"] = x1_input
All_Xs["X2"] = x2_input
# YOUR CODE HERE
raise NotImplementedError()
# Update running loss
total_loss += np.sum((y_input-All_Nodes["Pred"])**2)
print("Student : Epoch {} : Loss : {:.2f}".format(epoch_num,total_loss/data_points))
print("\tW1 : {:.2f}, W2 : {:.2f}, W3 : {:.2f}, W4 : {:.2f}, W5 : {:.2f}, W6 : {:.2f}, W7 : {:.2f}\n".format(All_Weights["W1"],All_Weights["W2"],All_Weights["W3"],All_Weights["W4"],All_Weights["W5"],All_Weights["W6"],All_Weights["W7"]))
def check_more() :
num_epochs = 1000
batch_size = 10
data_points = 10000
alpha = 0.001
x1 = np.random.randn(data_points,1)
x2 = np.random.randn(data_points,1)
y = 3*ReLU(x1) + 4*x2
All_Weights = init_Weights()
All_Nodes = init_Nodes()
All_Zs = init_Zs()
All_Errors = init_Errors()
print("Initial W1 : {:.2f}, W2 : {:.2f}, W3 : {:.2f}, W4 : {:.2f}, W5 : {:.2f}, W6 : {:.2f}, W7 : {:.2f}".format(All_Weights["W1"],All_Weights["W2"],All_Weights["W3"],All_Weights["W4"],All_Weights["W5"],All_Weights["W6"],All_Weights["W7"]))
for k in range(num_epochs) :
x1,x2,y = shuffle_data(x1,x2,y)
main_function(k,batch_size,data_points,alpha,x1,x2,y,All_Weights,All_Nodes,All_Zs,All_Errors)
check_more()
# Please run these hidden tests !!