4. Gradient descent. Gradient descent is one of the most popular algorithms in data science and by far the most common way to optimise neural networks. A function is minimised by iteratively moving a small step in the direction of the negative gradient. In the two-dimensional case, one iteration step is given by the formula
$$\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} x_n \\ y_n \end{pmatrix} - \varepsilon\,\nabla f(x_n, y_n).$$
In general, $\varepsilon$ does not have to be a constant, but in this question, for demonstrative purposes, we set $\varepsilon = 0.1$. Let $f(x, y) = 3.5x^2 - 4xy + 6.5y^2$ and let $x_0$ and $y_0$ be any real numbers.

(a) For all $x, y \in \mathbb{R}$ compute $\nabla f(x, y)$ and find a matrix $A$ such that
$$A \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} - \varepsilon\,\nabla f(x, y).$$
Write an expression for $\begin{pmatrix} x_n \\ y_n \end{pmatrix}$ in terms of $\begin{pmatrix} x_0 \\ y_0 \end{pmatrix}$ and powers of $A$. [3]

(b) Find the eigenvalues of $A$. [1]

(c) Find one eigenvector corresponding to each eigenvalue. [2]

(d) Find matrices $P$ and $D$ such that $D$ is diagonal and $A = PDP^{-1}$. [1]

(e) Find the matrices $D^n$, $P^{-1}$ and $A^n$. Find formulas for $x_n$ and $y_n$. [4]

(f) Suppose $x_0 = y_0 = 1$. Find the smallest $N \in \mathbb{N}$ such that $\left\| \begin{pmatrix} x_N \\ y_N \end{pmatrix} \right\| \le 0.05$. [3]

(g) Sketch the region $R$ consisting of those $(x_0, y_0)$ such that $x_N \ge 0$, $y_N \ge 0$ and
$$\left\| \begin{pmatrix} x_N \\ y_N \end{pmatrix} \right\| \le 0.05, \qquad \left\| \begin{pmatrix} x_{N-1} \\ y_{N-1} \end{pmatrix} \right\| > 0.05,$$
where $N$ is the number found in part (f). Write an equation for the boundary of $R$. Which points of the boundary belong to $R$ and which do not? [4]
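As a quick numerical sanity check on hand calculations, one can iterate the scheme directly. The sketch below is not part of the question itself; it assumes $\left\| \cdot \right\|$ in parts (f) and (g) denotes the Euclidean norm of the column vector, and it uses NumPy for the vector arithmetic. The gradient in grad_f is the one obtained by differentiating the given $f$ by hand.

    import numpy as np

    # Gradient of f(x, y) = 3.5 x^2 - 4 x y + 6.5 y^2,
    # computed by hand from the given f.
    def grad_f(v):
        x, y = v
        return np.array([7.0 * x - 4.0 * y, -4.0 * x + 13.0 * y])

    eps = 0.1                      # step size fixed by the question
    v = np.array([1.0, 1.0])       # x_0 = y_0 = 1, as in part (f)

    n = 0
    while np.linalg.norm(v) > 0.05:    # stopping rule of part (f),
                                       # assuming the Euclidean norm
        v = v - eps * grad_f(v)        # one step: (x, y) <- (x, y) - eps grad f(x, y)
        n += 1

    print(n, v)   # smallest N with ||(x_N, y_N)|| <= 0.05, and that iterate

The loop terminates after a handful of steps, giving an independent check on the value of $N$ found in part (f); replacing the starting point lets one probe the region $R$ of part (g) point by point.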