In Ridge Regression, the OLS loss function is augmented in such a way that we not only minimize the sum of squared residuals but also penalize the size of the parameter estimates, in order to shrink them towards zero:

$$L_{ridge}(\beta) = (Y - X\beta)'(Y - X\beta) + \lambda\,\beta'\beta$$
Solving this for $\beta$ gives the ridge regression estimates $\hat{\beta}_{ridge} = (X'X + \lambda I)^{-1}(X'Y)$, where $I$ denotes the identity matrix.
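To make the formula concrete, here is a minimal NumPy sketch of the closed-form estimator; the `ridge_estimates` helper and the toy data are made up purely for illustration:

```python
import numpy as np

def ridge_estimates(X, y, lam):
    """Closed-form ridge estimates: (X'X + lambda * I)^(-1) X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Hypothetical toy data: 100 observations, 3 predictors
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

print(ridge_estimates(X, y, lam=10.0))
```

Using `np.linalg.solve` instead of explicitly inverting $X'X + \lambda I$ is numerically more stable and is how the formula is usually implemented in practice.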
The λ parameter is the regularization penalty. We will talk about how to choose it in the next sections of this tutorial, but for now notice that as $\lambda \to 0$ the ridge estimates approach the OLS estimates, while as $\lambda \to \infty$ they shrink towards zero. So, setting λ to 0 is the same as running OLS, and the larger its value, the more strongly the size of the coefficients is penalized.
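If you want to see this behaviour in code, scikit-learn's Ridge exposes the penalty as the `alpha` argument; the sketch below (with made-up data) compares OLS coefficients to ridge coefficients for increasingly large penalties:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical data: 100 observations, 3 predictors with known coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=100)

# OLS coefficients for reference
print("OLS:        ", LinearRegression(fit_intercept=False).fit(X, y).coef_)

# Ridge coefficients shrink towards zero as the penalty grows;
# alpha = 0 reproduces OLS (up to numerical details)
for alpha in (0.0, 1.0, 10.0, 100.0, 1000.0):
    coefs = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_
    print(f"alpha = {alpha:>6}:", coefs)
```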
Bias-Variance Trade-Off in Ridge Regression
Incorporating the regularization coefficient in the formulas for bias and variance gives us

$$Bias(\hat{\beta}_{ridge}) = -\lambda\,(X'X + \lambda I)^{-1}\beta$$

$$Var(\hat{\beta}_{ridge}) = \sigma^2\,(X'X + \lambda I)^{-1}X'X\,(X'X + \lambda I)^{-1}$$
From there you can see that as λ becomes larger, the variance decreases while the bias increases. You can use this trade-off when choosing λ: accepting some bias in exchange for a larger drop in variance can reduce the overall prediction error of the model.
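One way to see the trade-off concretely is a small simulation: fix a true coefficient vector, re-draw the noise many times, re-fit ridge for a few values of λ, and measure the bias and variance of the resulting estimates. The sketch below does exactly that; the data-generating process and all numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 5
beta_true = np.array([3.0, -2.0, 1.0, 0.0, 0.5])
X = rng.normal(size=(n, p))   # fixed design matrix
sigma = 2.0                   # noise standard deviation

def ridge(X, y, lam):
    """Closed-form ridge estimates: (X'X + lambda * I)^(-1) X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in (0.0, 1.0, 10.0, 100.0):
    # Re-draw the noise many times and re-fit to estimate bias and variance
    estimates = np.array([
        ridge(X, X @ beta_true + rng.normal(scale=sigma, size=n), lam)
        for _ in range(2000)
    ])
    bias = estimates.mean(axis=0) - beta_true
    variance = estimates.var(axis=0)
    print(f"lambda = {lam:>5}: squared bias = {np.sum(bias**2):.3f}, "
          f"total variance = {variance.sum():.3f}")
```

With this setup you should see the total variance fall and the squared bias grow as λ increases, which is exactly the trade-off described above.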