1) A regression line generally attempts to ____: (choose the best answer)
a) Pass through as few data points as possible
b) Pass through as many data points as possible
c) Minimize the squared errors between the line and the data points
d) Minimize the adjusted R-squared
e) Maximize the number of data points it passes through
--------------------------------------------------------------------------------
2) If we use more than one predictor, which of the following is true?
a) This is indicative of non-parametric regression.
b) There is likely to be multicollinearity.
c) One may gain a more nuanced view of the relationship between the response and predictors.
d) There is likely to be unequal variance.
e) The model is probably too complex and should be simplified.
--------------------------------------------------------------------------------
3) A business partner has created a regression model with a high R-squared of 0.88. He says that because there is such a high R-squared, x must cause y. Is he correct?
Yes or No
--------------------------------------------------------------------------------
4) Why is multicollinearity bad?
a) It reduces the R-squared drastically.
b) It will increase the heteroscedasticity of the model.
c) It forces you to add more variables to the model, making it more complex.
d) It makes it more difficult to determine the exact effects/impact of each explanatory variable.
e) It imposes a condition of causation on the relationship, while regression should only display correlation.
Answer 1: c) Minimize the squared errors between the line and the data points
Explanation
The regression line minimizes the sum of squared differences between the observed values (the y values) and the predicted values (the ŷ values computed from the regression equation). The regression line also passes through the mean of the X values (x̄) and the mean of the Y values (ȳ), that is, through the point (x̄, ȳ).
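As a minimal sketch of this property, the following Python snippet fits a line with the closed-form least-squares formulas and checks that nudging the coefficients only increases the sum of squared errors. The data values and the names `sse`, `perturbed`, and `pred_at_mean` are made up for illustration.

```python
import numpy as np

# Hypothetical data, invented for this illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares slope and intercept.
x_bar, y_bar = x.mean(), y.mean()
slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
intercept = y_bar - slope * x_bar

def sse(b0, b1):
    """Sum of squared errors for the line y = b0 + b1 * x."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

best_sse = sse(intercept, slope)

# SSE for every small nudge of the coefficients away from the fitted values.
perturbed = [
    sse(intercept + d0, slope + d1)
    for d0 in (-0.1, 0.0, 0.1)
    for d1 in (-0.1, 0.0, 0.1)
    if (d0, d1) != (0.0, 0.0)
]

# Prediction at the mean of x; it should equal the mean of y.
pred_at_mean = intercept + slope * x_bar

print("fitted line SSE:", best_sse)
print("smallest perturbed SSE:", min(perturbed))
print("prediction at x_bar:", pred_at_mean, "y_bar:", y_bar)
```

Every perturbed line has a larger SSE than the fitted one, and the prediction at x̄ equals ȳ up to floating-point rounding.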
Answer 2: b) There is likely to be multicollinearity.
Explanation
Multicollinearity (also called collinearity) is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. It can arise only when a model has more than one predictor, so using more than one predictor introduces the possibility of multicollinearity.
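One common way to check how well a predictor can be linearly predicted from the others is the variance inflation factor (VIF), computed as 1 / (1 - R²) from a regression of that predictor on the rest. The sketch below uses simulated data; the variables `x1`, `x2`, `x3` and the helper `vif` are hypothetical and not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated predictors (assumed for illustration only).
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)  # nearly a linear function of x1
x3 = rng.normal(size=n)                   # unrelated to x1 and x2

def vif(target, others):
    """Variance inflation factor of `target` given the other predictors."""
    # Regress the target predictor on the others, with an intercept column.
    A = np.column_stack([np.ones(len(target))] + list(others))
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    r2 = 1.0 - resid.var() / target.var()
    return 1.0 / (1.0 - r2)

vif_x1 = vif(x1, [x2, x3])
vif_x3 = vif(x3, [x1, x2])

print("VIF of x1:", vif_x1)  # large, because x2 almost determines x1
print("VIF of x3:", vif_x3)  # close to 1, because x3 is independent
```

A VIF near 1 indicates little collinearity; a frequently quoted rule of thumb treats values above roughly 5 to 10 as a warning sign.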
Answer 3: No
Explanation
An R-squared value indicates how well the fitted model accounts for the observed data: it is the proportion of the variance in y that the regression explains. This value tells you the strength of the relationship, but nothing in it identifies the cause behind the relationship, so a high R-squared alone cannot show that x causes y.
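A small simulation can make this concrete. In the sketch below, a hidden variable `z` (a hypothetical confounder, invented for illustration) drives both `x` and `y`, and `y` is generated without ever using `x`. The R-squared between `x` and `y` is still high, and it collapses once `x` is set independently of `z`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hidden common cause (assumed for this illustration).
z = rng.normal(size=n)
x = z + 0.2 * rng.normal(size=n)
y = z + 0.2 * rng.normal(size=n)  # y is built from z only, never from x

# For simple linear regression, R-squared is the squared correlation.
r2 = np.corrcoef(x, y)[0, 1] ** 2

# "Intervene" on x: assign it values unrelated to z, leaving y untouched.
x_intervened = rng.normal(size=n)
r2_after = np.corrcoef(x_intervened, y)[0, 1] ** 2

print("R-squared with confounded x:", r2)
print("R-squared after setting x independently:", r2_after)
```

The first R-squared is high even though changing `x` has no effect on `y`, which is exactly the situation the business partner's argument overlooks.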
Answer 4: d) It makes it more difficult to determine the exact effects/impact of each explanatory variable
Explanation
The problem with multicollinearity is that, as the Xs become more highly correlated, it becomes more and more difficult to determine which X is actually producing the effect on Y.
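This difficulty shows up as unstable coefficient estimates. The sketch below repeatedly simulates data in which Y depends equally on two predictors, then compares how much the estimated coefficient of the first predictor varies when the predictors are nearly identical versus only loosely related. The function `coef_spread` and its parameter `noise_scale` are hypothetical names chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(2)

def coef_spread(noise_scale, n=100, trials=300):
    """Std. dev. of the estimated x1 coefficient over repeated samples.

    A small noise_scale makes x2 almost a copy of x1 (strong collinearity).
    """
    estimates = []
    for _ in range(trials):
        x1 = rng.normal(size=n)
        x2 = x1 + noise_scale * rng.normal(size=n)
        y = x1 + x2 + rng.normal(size=n)  # true coefficients are both 1
        A = np.column_stack([np.ones(n), x1, x2])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        estimates.append(coef[1])
    return np.std(estimates)

spread_collinear = coef_spread(0.05)  # x1 and x2 nearly identical
spread_weak = coef_spread(2.0)        # x1 and x2 only loosely related

print("spread with strong collinearity:", spread_collinear)
print("spread with weak collinearity:", spread_weak)
```

With strongly correlated predictors the estimate for x1 swings widely from sample to sample, so the data cannot say how much of the effect belongs to x1 and how much to x2, even though the true coefficient is the same in both settings.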