Recurrent Nets:
(a) What is the “vanishing or exploding gradient problem” in recurrent nets? (See the worked expression after this list.)
(b) Give a weight initialization method that can mitigate the vanishing or exploding gradient problem. (See the code sketch at the end of this section.)
(c) Recurrent nets are notoriously bad at “remembering” things for more than a few iterations. Give the names and quick descriptions of two methods that augment RNNs with a memory.
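For part (a), a worked expression may help fix intuition. This is a minimal sketch of the standard backpropagation-through-time argument, assuming a vanilla (Elman) RNN with update $h_t = \phi(W h_{t-1} + U x_t + b)$; the symbols $W$, $U$, $b$, $\phi$, and $a_k$ are notation introduced here, not from the original post:

$$\frac{\partial h_T}{\partial h_t} \;=\; \prod_{k=t+1}^{T} \frac{\partial h_k}{\partial h_{k-1}} \;=\; \prod_{k=t+1}^{T} \operatorname{diag}\!\big(\phi'(a_k)\big)\, W, \qquad a_k = W h_{k-1} + U x_k + b.$$

A gradient reaching step $t$ from a loss at step $T$ contains this product of $T - t$ Jacobians. Roughly, when the largest singular value of $W$, scaled by the bound on $\phi'$, sits below 1, the product's norm decays exponentially in $T - t$ (vanishing); when $W$'s spectrum is large enough, it can grow exponentially (exploding).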
Answers are given assuming that the reader is familiar with the concepts of neural networks, weights, biases, activation functions, forward and backward propagation, etc.
Thank you!
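For part (b), one commonly cited remedy is orthogonal initialization of the recurrent weight matrix (Gaussian sample followed by QR, as in Saxe et al. 2014). Below is a minimal NumPy sketch under the assumption of a square recurrent matrix; the function name `orthogonal_init` and the 128-unit size are illustrative choices, not from the original post.

```python
import numpy as np

def orthogonal_init(shape, gain=1.0, rng=None):
    """Random orthogonal matrix via QR of a Gaussian matrix (square case).

    All singular values equal 1, so at initialization the recurrent
    Jacobian neither shrinks nor stretches gradients (up to the
    activation's derivative), mitigating vanishing/exploding gradients.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = rng.standard_normal(shape)
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))  # sign fix: makes the draw uniform over orthogonal matrices
    return gain * q

# Illustrative usage: recurrent weights of a 128-unit vanilla RNN
W_hh = orthogonal_init((128, 128))
assert np.allclose(W_hh.T @ W_hh, np.eye(128))  # columns are orthonormal
```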