Write the Gradient Descent definition and explain how to apply gradient descent on Linear Regression.
Gradient Descent:- Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function f that minimize a cost function.
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point.
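To make the definition concrete, here is a minimal Python sketch of the generic update rule. The example function f(x) = x^2, its gradient 2*x, the learning rate, and the step count are illustrative assumptions, not part of the question.

# Minimal sketch: generic gradient descent on a single-variable function.
# f(x) = x**2 is chosen purely as an illustration; its gradient is 2*x.

def gradient_descent(gradient, start, learning_rate=0.1, n_steps=100):
    """Repeatedly step in the direction of the negative gradient."""
    x = start
    for _ in range(n_steps):
        x = x - learning_rate * gradient(x)  # step proportional to the negative gradient
    return x

# The minimum of f(x) = x**2 is at x = 0.
minimum = gradient_descent(gradient=lambda x: 2 * x, start=5.0)
print(minimum)  # a value very close to 0

Each iteration moves the current point a small amount against the gradient, so the sequence of points steps downhill toward a minimum.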
Applying gradient descent on Linear Regression:-
Linear Regression is a supervised machine learning algorithm which learns from a given independent variable X and a dependent (target) variable Y, and predicts a new Y for a given new X. It works by finding the best-fit line for the given data.
Consider m to be the line's slope and b to be the line's y-intercept. To find the best line for our data, we need to find the best values of the slope m and the y-intercept b. The way to solve this type of problem is to define an error function (also called a cost function) that measures how well a given line fits the data; gradient descent is then used to minimize this error.
This function will take in an (m, b) pair and return an error value based on how well the line fits our data. To compute this error for a given line, we iterate through each (x, y) point in our data set and sum the squared distances between each point's y value and the candidate line's y value (computed as m*x + b). It is conventional to square this distance to ensure that it is positive and to make our error function differentiable.
Below is the sum of squared error equation (averaged over the N data points):-

E(m, b) = (1/N) * Σ_{i=1..N} ( y_i − (m*x_i + b) )^2
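As a minimal sketch, this error can be computed in Python as follows. The helper name sum_of_squared_error and the small sample data set are illustrative assumptions.

# Average squared distance between each point's y value and the line m*x + b.

def sum_of_squared_error(m, b, x_data, y_data):
    """Return the mean squared error of the line y = m*x + b on the data."""
    n = len(x_data)
    total = 0.0
    for x, y in zip(x_data, y_data):
        total += (y - (m * x + b)) ** 2
    return total / n

# Example: how well does the line y = 1*x + 0 fit three sample points?
x_data = [1.0, 2.0, 3.0]
y_data = [1.1, 1.9, 3.2]
print(sum_of_squared_error(m=1.0, b=0.0, x_data=x_data, y_data=y_data))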
To run gradient descent on this error function, we first need to compute its gradient. To compute it, we will need to differentiate our error function. Since our function is defined by two parameters (m and b), we will need to compute a partial derivative for each.
Partial derivatives of the sum of squared error equation:-

∂E/∂m = (2/N) * Σ_{i=1..N} −x_i * ( y_i − (m*x_i + b) )
∂E/∂b = (2/N) * Σ_{i=1..N} −( y_i − (m*x_i + b) )
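A minimal sketch of a single gradient step built from these partial derivatives is given below. The function name step_gradient and the sample values are illustrative assumptions.

# One gradient descent step for the linear regression cost function.

def step_gradient(m, b, x_data, y_data, learning_rate):
    """Compute dE/dm and dE/db, then move m and b against the gradient."""
    n = len(x_data)
    m_grad = 0.0
    b_grad = 0.0
    for x, y in zip(x_data, y_data):
        m_grad += -(2.0 / n) * x * (y - (m * x + b))
        b_grad += -(2.0 / n) * (y - (m * x + b))
    new_m = m - learning_rate * m_grad
    new_b = b - learning_rate * b_grad
    return new_m, new_b

# Example: one step starting from m = 0, b = 0 on a tiny data set.
new_m, new_b = step_gradient(m=0.0, b=0.0,
                             x_data=[1.0, 2.0, 3.0],
                             y_data=[3.0, 5.0, 7.0],
                             learning_rate=0.01)
print(new_m, new_b)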
In this way we use gradient descent to iteratively estimate m and b, by taking the partial derivatives of the cost function as shown above. Gradient descent steps down the cost function in the direction of the steepest descent, and the size of each step is determined by a parameter known as the learning rate.
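Tying the pieces together, here is a self-contained end-to-end sketch of gradient descent for linear regression. The learning rate, iteration count, and sample data are illustrative assumptions chosen so the run converges.

# Fit y = m*x + b by repeatedly stepping down the mean squared error.

def fit_line(x_data, y_data, learning_rate=0.01, n_iterations=1000):
    """Start from m = 0, b = 0 and repeatedly take a gradient step."""
    m, b = 0.0, 0.0
    n = len(x_data)
    for _ in range(n_iterations):
        m_grad = sum(-(2.0 / n) * x * (y - (m * x + b)) for x, y in zip(x_data, y_data))
        b_grad = sum(-(2.0 / n) * (y - (m * x + b)) for x, y in zip(x_data, y_data))
        m -= learning_rate * m_grad  # step in the direction of steepest descent
        b -= learning_rate * b_grad
    return m, b

x_data = [1.0, 2.0, 3.0, 4.0]
y_data = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
m, b = fit_line(x_data, y_data)
print(m, b)  # approximately 2 and 1

Each iteration recomputes both partial derivatives on the whole data set before updating m and b, which is the batch form of gradient descent described in this answer.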