4. Gradient descent. Gradient descent is one of the most popular algorithms in data science and by far the most common way to optimise neural networks. A function is minimised by iteratively moving a small step in the direction of the negative gradient. In the two-dimensional case, one iteration step is given by the formula
$$\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} x_n \\ y_n \end{pmatrix} - \varepsilon\,\nabla f(x_n, y_n).$$
In general, $\varepsilon$ does not have to be a constant, but in this question, for demonstrative purposes, we set $\varepsilon = 0.1$. Let $f(x, y) = 3.5x^2 - 4xy + 6.5y^2$ and let $x_0$ and $y_0$ be any real numbers.

(a) For all $x, y \in \mathbb{R}$ compute $\nabla f(x, y)$ and find a matrix $A$ such that
$$A \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} - \varepsilon\,\nabla f(x, y).$$
Write an expression for $\begin{pmatrix} x_n \\ y_n \end{pmatrix}$ in terms of $\begin{pmatrix} x_0 \\ y_0 \end{pmatrix}$ and powers of $A$. [3]

(b) Find the eigenvalues of $A$. [1]

(c) Find one eigenvector corresponding to each eigenvalue. [2]

(d) Find matrices $P$ and $D$ such that $D$ is diagonal and $A = PDP^{-1}$. [1]

(e) Find the matrices $D^n$, $P^{-1}$ and $A^n$. Find formulas for $x_n$ and $y_n$. [4]

(f) Suppose $x_0 = y_0 = 1$. Find the smallest $N \in \mathbb{N}$ such that $\left\| \begin{pmatrix} x_N \\ y_N \end{pmatrix} \right\| \le 0.05$. [3]

(g) Sketch the region $R$ consisting of those $(x_0, y_0)$ such that $x_N \ge 0$, $y_N \ge 0$ and
$$\left\| \begin{pmatrix} x_N \\ y_N \end{pmatrix} \right\| \le 0.05, \qquad \left\| \begin{pmatrix} x_{N-1} \\ y_{N-1} \end{pmatrix} \right\| > 0.05,$$
where $N$ is the number found in part (f). Write an equation for the boundary of $R$. Which points of the boundary belong to $R$ and which do not? [4]
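As a quick numerical sanity check on hand calculations, one can iterate the scheme directly. The sketch below is not part of the question itself; it assumes $\left\| \cdot \right\|$ in parts (f) and (g) denotes the Euclidean norm of the column vector, and it uses NumPy for the vector arithmetic. The gradient in grad_f is the one obtained by differentiating the given $f$ by hand.

    import numpy as np

    # Gradient of f(x, y) = 3.5 x^2 - 4 x y + 6.5 y^2,
    # computed by hand from the given f.
    def grad_f(v):
        x, y = v
        return np.array([7.0 * x - 4.0 * y, -4.0 * x + 13.0 * y])

    eps = 0.1                      # step size fixed by the question
    v = np.array([1.0, 1.0])       # x_0 = y_0 = 1, as in part (f)

    n = 0
    while np.linalg.norm(v) > 0.05:    # stopping rule of part (f),
                                       # assuming the Euclidean norm
        v = v - eps * grad_f(v)        # one step: (x, y) <- (x, y) - eps grad f(x, y)
        n += 1

    print(n, v)   # smallest N with ||(x_N, y_N)|| <= 0.05, and that iterate

The loop terminates after a handful of steps, giving an independent check on the value of $N$ found in part (f); replacing the starting point lets one probe the region $R$ of part (g) point by point.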