2. Choose all the valid statements about gradient descent from the options below:
A. The global minimum can always be reached by using gradient descent.
B. Every gradient descent iteration can always decrease the value of the loss function, even when the gradient of the loss function is zero.
C. When the learning rate is very large, some iterations of gradient descent may not decrease the value of the loss function.
D. With different initial weights, the gradient descent algorithm may lead to different local minima.
E. None of the above is valid.
Answer: Options C and D are valid statements
Option A is not valid
Explanation:
Gradient descent will not always converge to the global minimum. It is guaranteed to reach the global minimum only when the loss function is convex, so that it has a single minimum, which is then also the global one. On a non-convex loss it can get trapped in a local minimum or stop at a saddle point.
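As a minimal sketch (in Python, with a toy convex loss chosen purely for illustration and not taken from the question), gradient descent on a convex function reaches the global minimum from any starting point; the non-convex failure case is illustrated under Option D below:

def gradient_descent(grad, w, lr=0.1, steps=200):
    for _ in range(steps):
        w = w - lr * grad(w)   # standard update: w <- w - lr * grad(w)
    return w

# Toy example: f(w) = (w - 3)**2 is convex; its gradient is 2*(w - 3),
# and its only (hence global) minimum is at w = 3.
for w0 in (-10.0, 0.0, 10.0):
    print(gradient_descent(lambda w: 2 * (w - 3), w0))   # all print ~3.0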
Option B is not valid
Explanation:
Gradient descent moves downhill by updating the weights as w ← w − lr · ∇L(w). If it reaches a point where the gradient of the loss function is zero (a plateau, local minimum, or saddle point), the update step is exactly zero and the weights stop changing; the algorithm is considered converged. Thus a gradient descent iteration cannot decrease the value of the loss function when the gradient is zero, because it stops moving at that point.
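A minimal sketch (Python, with a toy function assumed for illustration): the update is identically zero once the gradient is zero, so the iterate never moves again even though lower loss values exist nearby.

def step(w, grad, lr=0.1):
    return w - lr * grad(w)   # zero gradient => zero step

# Toy example: f(w) = w**3 has gradient 3*w**2, which is zero at the
# saddle point w = 0, yet f keeps decreasing for w < 0.
grad = lambda w: 3 * w**2

w = 0.0                       # start exactly where the gradient vanishes
for _ in range(10):
    w = step(w, grad)
print(w)                      # still 0.0: no iteration decreased the loss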
Option C is valid
Explanation:
If we record the loss at each iteration for a range of learning rates and plot the loss against the (log) learning rate, we see that as the learning rate increases there is a point beyond which the loss stops decreasing and starts to increase: the update steps overshoot the minimum and land on the other side. Thus, when the learning rate is very large, some iterations can increase the value of the loss function instead of decreasing it.
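A minimal sketch (Python, with a toy quadratic assumed for illustration): on f(w) = w**2 the update simplifies to w <- w * (1 - 2*lr), so for lr > 1 the factor exceeds 1 in magnitude and every step overshoots, growing the loss.

def losses(lr, w=1.0, steps=5):
    out = []
    for _ in range(steps):
        w = w - lr * 2 * w     # gradient of w**2 is 2*w
        out.append(w**2)
    return out

print(losses(lr=0.1))   # [0.64, 0.41, ...]  loss shrinks every step
print(losses(lr=1.1))   # [1.44, 2.07, ...]  loss grows every step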
Option D is valid
Explanation:
Neural networks are usually trained with backpropagation and gradient descent, and for most loss functions this is a non-convex optimization problem. Because the loss surface has multiple local minima, the algorithm generally converges to different minima for different initial conditions, so the initialization affects not only the speed of convergence but also the quality of the final solution. The initial parameters of a neural network are therefore as important as the network architecture, and weight initialization has been studied extensively.
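A minimal sketch (Python, with an assumed toy double-well loss standing in for a real non-convex training objective): the same gradient descent procedure, started from two different initial weights, settles in two different local minima.

def gradient_descent(grad, w, lr=0.02, steps=500):
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Toy example: f(w) = (w**2 - 1)**2 + 0.3*w has two minima, near
# w = -1.04 (the global minimum) and w = +0.96 (a local minimum);
# its gradient is 4*w*(w**2 - 1) + 0.3.
grad = lambda w: 4 * w * (w**2 - 1) + 0.3

print(gradient_descent(grad, w=-2.0))   # ~ -1.04: reaches the global minimum
print(gradient_descent(grad, w=+2.0))   # ~ +0.96: stuck in a local minimum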
Option E is incorrect, since Options C and D are valid.