1. The coefficients of the least squares regression line, Y = M*X + B, are determined by minimizing the sum of the squares of the
a) x‐coordinates.
b) y‐coordinates.
c) residuals.
-----------------------
2. Which of the following statements are true about overfitting and underfitting?
a) Models that do not do well on training or test data are said to underfit the data.
b) They lack enough independent variables to predict the response variable.
c) A model’s generalization ability refers to its ability to give accurate predictions for new, previously unseen, training data.
d) Models that are too simplistic for the amount of test data are said to overfit and are not likely to generalize to new observations.
-----------------------------
3. A linear regression was run where there were four features used to predict the response variable. The predictor variables were standardized before the run. The regression output of intercept followed by the coefficients for these predictors are given in the list below.
[3.33, 1.09, 1.33, -0.15, -3.14]
* Write the linear equation for this model, using Y for the response and Xi for each of the i = 1 to 4 predictors.
* Which of the predictors is the least important in this model?
* What is the meaning of the coefficient of the most important predictor?
Answer to question# 1)
The aim is to minimize the differences between the actual and the predicted values of Y, which are called residuals.
So the line of best fit is obtained by minimizing the sum of the squares of the residuals.
Hence the correct answer choice is (c) residuals.
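As a rough illustration of this answer, the sketch below fits a line to a few made-up points (the data are invented for the example, not taken from the question) and checks that moving M or B away from the least squares values only increases the sum of squared residuals. It assumes NumPy is available.

```python
import numpy as np

# Made-up data for illustration only (not from the question).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sum_squared_residuals(m, b):
    """Sum of squared residuals for the line Y = m*X + b."""
    residuals = y - (m * x + b)  # actual minus predicted
    return float(np.sum(residuals ** 2))

# np.polyfit with deg=1 returns the least squares slope and intercept.
m_hat, b_hat = np.polyfit(x, y, deg=1)
best = sum_squared_residuals(m_hat, b_hat)

# Nudging the slope or the intercept makes the sum of squares larger.
for dm, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert sum_squared_residuals(m_hat + dm, b_hat + db) > best

print(f"M = {m_hat:.3f}, B = {b_hat:.3f}, sum of squared residuals = {best:.4f}")
```

The residuals are measured vertically (in Y), which is why neither the x-coordinates nor the y-coordinates themselves are the quantity being squared.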
.
Answer to question# 2)
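A model underfits when it is too simple to capture the relationship, typically because it does not have the independent variables needed to explain the response; it then does poorly on both training and test data. A model overfits in the opposite case: it is too complex for the amount of training data, so it fits the training data well but does poorly on new observations.
Hence the correct answer choices are (a) and (b): models that underfit do poorly on both training and test data because they lack enough independent variables to predict the response variable.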
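A small experiment can make the underfitting case concrete. The sketch below is only a toy setup, not part of the question: the quadratic "true" relationship, the noise level, and the polynomial degrees are all assumptions chosen for illustration, with polynomial terms standing in for independent variables. The degree 1 model lacks the X^2 term it needs, while the degree 9 model has far more terms than 12 training points can support.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def true_f(x):
    return x ** 2  # assumed "true" relationship for this toy example

# Small training set and a larger test set from the same process.
x_train = np.linspace(-1, 1, 12)
y_train = true_f(x_train) + rng.normal(0, 0.1, x_train.size)
x_test = np.linspace(-1, 1, 200)
y_test = true_f(x_test) + rng.normal(0, 0.1, x_test.size)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((y - np.polyval(coeffs, x)) ** 2))

train_mse, test_mse = {}, {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse[degree] = mse(coeffs, x_train, y_train)
    test_mse[degree] = mse(coeffs, x_test, y_test)
    print(f"degree {degree}: train MSE = {train_mse[degree]:.4f}, "
          f"test MSE = {test_mse[degree]:.4f}")
```

The degree 1 fit should show a high error on both sets, which is the signature of underfitting. The degree 9 fit always has the lowest training error of the three; whether its test error is clearly worse depends on the noise draw, so compare the printed numbers rather than assuming a particular value.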
.
Answer to question# 3)
The equation will be:
Y = 3.33 + 1.09*X1 + 1.33*X2 - 0.15*X3 - 3.14*X4
.
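Because the predictors were standardized, the magnitudes of their coefficients can be compared directly. The smallest coefficient in terms of magnitude is -0.15, hence predictor X3 has the least influence on the value of Y and is the least important.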
.
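The largest coefficient in terms of magnitude is -3.14, so X4 is the most important predictor.
This implies that if X4 is increased by one unit (one standard deviation, since the predictors were standardized) while the other predictors are held fixed, the predicted value of Y will decrease by 3.14 units.
X4 and Y share an inverse relation because the value of the coefficient is negative.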
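The three parts of this answer can be checked with a short script. This is a minimal sketch that uses only the regression output given in the question; the helper name `predict` and the choice of all-zero predictor values as a baseline are assumptions made for the example.

```python
# Regression output from question 3: intercept first, then the
# coefficients of the four standardized predictors X1..X4.
output = [3.33, 1.09, 1.33, -0.15, -3.14]
intercept, coefs = output[0], output[1:]

def predict(xs):
    """Y for standardized predictor values xs = [X1, X2, X3, X4]."""
    return intercept + sum(c * x for c, x in zip(coefs, xs))

# Rank predictor indices by the absolute value of their coefficient.
ranked = sorted(range(len(coefs)), key=lambda i: abs(coefs[i]))
least, most = ranked[0], ranked[-1]
print(f"least important: X{least + 1} (coefficient {coefs[least]})")
print(f"most important:  X{most + 1} (coefficient {coefs[most]})")

# Raise X4 by one unit (one standard deviation), others held at 0.
base = predict([0, 0, 0, 0])
bumped = predict([0, 0, 0, 1])
print(f"change in Y: {bumped - base:.2f}")
```

Comparing coefficients by magnitude is only meaningful here because the predictors were standardized; with predictors on different scales the same ranking would not indicate importance.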