1. Multi-layer BP neural networks have no proof of converging to an optimal solution. Is this true? If it is, then why do we bother to use them?
2. What is the fundamental equation that guides changes to a weight wij in a BP network? Describe its components.
1). ANSWER:
GIVEN THAT:
Backpropagation, short for “backward propagation of errors”, is a mechanism used to update the weights using gradient descent. It calculates the gradient of the error function with respect to the neural network’s weights. The calculation proceeds backwards through the network.
Gradient descent is an iterative optimization algorithm for finding the minimum of a function; in our case we want to minimize the error function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point.
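In symbols, a single gradient-descent step on a function f, with step size (learning rate) η, can be written as:

x_{t+1} = x_t - \eta \, \nabla f(x_t)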
For example, to update w6, we take the current w6 and subtract the partial derivative of the error function with respect to w6. Optionally, we multiply the derivative of the error function by a selected number to make sure that the new, updated weight keeps decreasing the error function; this number is called the learning rate.
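Written out, with E denoting the error function and η the learning rate, the update for w6 is:

w_6^{\text{new}} = w_6 - \eta \, \frac{\partial E}{\partial w_6}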
The derivative of the error function is evaluated by applying the chain rule, as follows.
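As a sketch of that expansion (writing "prediction" for the network output and assuming w6 is one of the hidden-to-output weights; an extra factor appears if the output unit has a nonlinear activation):

\frac{\partial E}{\partial w_6} = \frac{\partial E}{\partial \text{prediction}} \cdot \frac{\partial \text{prediction}}{\partial w_6}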
So, to update w6 we can apply the following formula. Similarly, we can derive the update formula for w5 and any other weights existing between the output and the hidden layer.
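For instance, if we assume a squared-error function E = ½(prediction − actual)² and a linear output, prediction = h1·w5 + h2·w6 (h1 and h2 being the hidden activations; these are assumptions about the example network, not stated above), the hidden-to-output updates work out to:

w_5^{\text{new}} = w_5 - \eta \,(\text{prediction} - \text{actual})\, h_1
w_6^{\text{new}} = w_6 - \eta \,(\text{prediction} - \text{actual})\, h_2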
However, when moving backward to update w1, w2, w3 and w4, which lie between the input and the hidden layer, the partial derivative of the error function with respect to w1, for example, will be as follows.
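Under the same assumptions, and supposing w1 connects input i1 to hidden unit h1, which in turn feeds the output through w5, the chain rule gives:

\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial \text{prediction}} \cdot \frac{\partial \text{prediction}}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1} = (\text{prediction} - \text{actual}) \cdot w_5 \cdot i_1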
We can find the update formulas for the remaining weights w2, w3 and w4 in the same way.
In summary, the update formulas for all the weights are as follows.
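Schematically, and still under the assumed linear 2-2-1 network with Δ = prediction − actual, every weight follows the same pattern:

hidden-to-output: w^{\text{new}} = w - \eta \, \Delta \, h, where h is the hidden activation feeding that weight
input-to-hidden: w^{\text{new}} = w - \eta \, \Delta \, w_{\text{out}} \, i, where w_out is the output weight of the hidden unit this weight feeds and i is its input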
We can rewrite the update formulas in matrix form as follows.
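One way to group these in matrix form, assuming inputs [i1 i2], hidden activations [h1 h2], a 2×2 matrix W_hidden of input-to-hidden weights, output weights [w5 w6], and Δ = prediction − actual (the exact arrangement of w1–w4 inside W_hidden depends on how the weights were labelled in the original figure):

[w_5 \;\; w_6]^{\text{new}} = [w_5 \;\; w_6] - \eta \, \Delta \, [h_1 \;\; h_2]

W_{\text{hidden}}^{\text{new}} = W_{\text{hidden}} - \eta \, \Delta \begin{bmatrix} i_1 \\ i_2 \end{bmatrix} [w_5 \;\; w_6]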
Backward Pass
Using the derived formulas, we can find the new weights.
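Here is a minimal NumPy sketch of one backward pass (together with the forward pass it needs), under the same assumptions as above: a 2-2-1 network with linear units and squared error. The names forward, backward, W_hidden, w_out and learning_rate are illustrative and not from the original post.

```python
import numpy as np

learning_rate = 0.05  # the manually chosen hyperparameter discussed above


def forward(i, W_hidden, w_out):
    """Forward pass: inputs -> hidden activations -> prediction (all linear)."""
    h = i @ W_hidden           # hidden activations, shape (2,)
    prediction = h @ w_out     # single scalar output
    return h, prediction


def backward(i, h, prediction, actual, W_hidden, w_out):
    """Backward pass: one gradient-descent step on all weights."""
    delta = prediction - actual                    # dE/dprediction for E = 1/2 (prediction - actual)^2
    w_out_new = w_out - learning_rate * delta * h  # hidden-to-output updates
    # Input-to-hidden gradients reuse the (old) output weights via the chain rule:
    # dE/dW_hidden[j, k] = delta * w_out[k] * i[j]
    W_hidden_new = W_hidden - learning_rate * delta * np.outer(i, w_out)
    return W_hidden_new, w_out_new
```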
Learning rate: a hyperparameter, which means we have to set its value manually, usually by trial and error.
Now, using the new weights, we repeat the forward pass.
We can notice that the new prediction, 0.26, is a little closer to the actual output than the previous prediction, 0.191. We can repeat the same process of backward and forward passes until the error is close or equal to zero.
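Sticking with the sketch above, repeating forward and backward passes until the error is close to zero looks like this (the inputs, target and initial weights below are made-up placeholders, not the numbers from the worked example):

```python
i = np.array([1.0, 0.5])                 # placeholder inputs
actual = 1.0                             # placeholder target output
W_hidden = np.array([[0.1, 0.2],
                     [0.3, 0.4]])        # placeholder input-to-hidden weights
w_out = np.array([0.5, 0.6])             # placeholder hidden-to-output weights

for step in range(1000):
    h, prediction = forward(i, W_hidden, w_out)
    error = 0.5 * (prediction - actual) ** 2
    if error < 1e-6:                     # "close or equal to zero"
        break
    W_hidden, w_out = backward(i, h, prediction, actual, W_hidden, w_out)
```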
Backpropagation Visualization
You can see a visualization of the forward pass and backpropagation here.
You can build your own neural network using netflow.js.