In: Statistics and Probability
When fitting a multiple regression model, you should check for independence of observations and the absence of multicollinearity. Discuss how you would check appropriate statistics and/or plots.
(a) When we run a multiple regression, we hope to be able to generalize the sample model to the entire population. For this to be valid, several assumptions must be met, including: no multicollinearity, homoscedasticity, independent errors, and normally distributed errors.
Explain what is meant by each of these assumptions and describe the checks that you would undertake to validate that there has been no violation.
1. Normally distributed errors: Multiple linear regression requires that the errors between observed and predicted values (i.e., the residuals of the regression) be normally distributed. This assumption may be checked by inspecting a histogram or a Q-Q plot of the residuals. Normality can also be checked with a goodness-of-fit test (e.g., the Kolmogorov-Smirnov test), though the test must be conducted on the residuals themselves, not on the raw variables.
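The normality check above can be sketched as follows. This is a minimal illustration using synthetic data and SciPy; the variable names and the simulated regression are assumptions for the example, not part of the question.

```python
# Sketch: checking residual normality after a multiple regression fit.
# The data here are synthetic; in practice you would use your own X and y.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

# Fit by ordinary least squares and compute the residuals
X_design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
residuals = y - X_design @ beta

# Kolmogorov-Smirnov test against a normal distribution with the
# residuals' own mean and standard deviation (note: estimating the
# parameters from the data makes the plain KS test conservative)
ks_stat, p_value = stats.kstest(
    residuals, "norm", args=(residuals.mean(), residuals.std(ddof=1))
)
print(f"KS statistic = {ks_stat:.3f}, p-value = {p_value:.3f}")

# scipy.stats.probplot returns Q-Q plot coordinates plus the
# correlation of the Q-Q line; values near 1 are consistent with
# normality (plot osm vs. osr with matplotlib for the visual check)
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
print(f"Q-Q correlation = {r:.4f}")
```

A high p-value and a Q-Q correlation close to 1 are consistent with normally distributed residuals; a visibly curved Q-Q plot or a small p-value would flag a violation.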
2. No multicollinearity: Multiple linear regression assumes that there is no multicollinearity in the data. Multicollinearity occurs when the independent variables are too highly correlated with each other.
Multicollinearity may be checked multiple ways:
1) Correlation matrix – When computing a matrix of Pearson's bivariate correlations among all independent variables, the magnitude of the correlation coefficients should be less than 0.80.
2) Variance Inflation Factor (VIF) – The VIFs of the linear regression indicate the degree to which the variances of the regression estimates are inflated by multicollinearity. VIF values higher than 10 indicate that multicollinearity is a problem.
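Both checks can be sketched in a few lines of NumPy. The data below are synthetic and deliberately collinear; the 0.80 correlation cutoff and the VIF > 10 rule follow the text above. The `vif` helper is an illustrative implementation of the standard formula VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors.

```python
# Sketch: correlation-matrix and VIF checks for multicollinearity
# on synthetic predictors (x2 is constructed to be collinear with x1).
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# 1) Pearson correlation matrix: flag any pair with |r| >= 0.80
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))

# 2) VIF for predictor j: regress it on the other predictors and
#    convert the resulting R^2 into 1 / (1 - R^2)
def vif(X, j):
    others = np.delete(X, j, axis=1)
    design = np.column_stack([np.ones(len(X)), others])
    beta, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
    ss_res = np.sum((X[:, j] - design @ beta) ** 2)
    ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 / (1.0 - r2)

for j in range(X.shape[1]):
    print(f"VIF(x{j + 1}) = {vif(X, j):.1f}")   # > 10 flags a problem
```

Here x1 and x2 would show both a pairwise correlation well above 0.80 and VIFs far above 10, while x3's VIF stays near 1.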
3. Homoscedasticity: A scatterplot of residuals versus predicted values is a good way to check for homoscedasticity; the spread of the residuals should be roughly constant across the fitted values. If the data are heteroscedastic, a non-linear transformation of the data or the addition of a quadratic term might fix the problem.
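As a sketch of this check, the example below simulates data whose error variance grows with the predictor, then looks at the residual-versus-fitted relationship. The plot itself would be drawn with matplotlib; as an assumed numeric proxy for eyeballing the funnel shape, it also correlates the absolute residuals with the fitted values.

```python
# Sketch: residual-versus-fitted check for homoscedasticity on
# synthetic data that is heteroscedastic by construction.
import numpy as np

rng = np.random.default_rng(2)
n = 400
x = rng.uniform(1, 10, size=n)
# Error standard deviation grows with x => non-constant variance
y = 3.0 + 2.0 * x + rng.normal(scale=0.5 * x, size=n)

design = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ beta
residuals = y - fitted

# On a scatterplot of fitted vs. residuals this shows up as a funnel
# shape. Numerically, |residuals| drifting upward with the fitted
# values points the same way:
spread_corr = np.corrcoef(fitted, np.abs(residuals))[0, 1]
print(f"corr(fitted, |residuals|) = {spread_corr:.2f}")
```

For homoscedastic data this correlation would sit near zero; the clearly positive value here reflects the built-in heteroscedasticity.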
4. Independent errors: The residuals should be uncorrelated with one another. This can be checked using the Durbin-Watson test, whose statistic lies between 0 and 4; a value near 2 indicates no first-order autocorrelation, while values near 0 or 4 indicate positive or negative autocorrelation, respectively.
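The Durbin-Watson statistic itself is simple to compute: the sum of squared successive differences of the residuals divided by their sum of squares. The sketch below (synthetic residual series, assumed for illustration) contrasts independent errors with a strongly autocorrelated AR(1) series.

```python
# Sketch: Durbin-Watson statistic on two synthetic residual series.
# Values near 2 suggest uncorrelated errors; values near 0 or 4
# indicate positive or negative autocorrelation, respectively.
import numpy as np

def durbin_watson(residuals):
    diffs = np.diff(residuals)
    return np.sum(diffs ** 2) / np.sum(residuals ** 2)

rng = np.random.default_rng(3)
n = 300

independent = rng.normal(size=n)     # uncorrelated errors
autocorrelated = np.zeros(n)         # AR(1) with strong persistence
for t in range(1, n):
    autocorrelated[t] = 0.9 * autocorrelated[t - 1] + rng.normal()

print(f"DW (independent)    = {durbin_watson(independent):.2f}")
print(f"DW (autocorrelated) = {durbin_watson(autocorrelated):.2f}")
```

The first statistic lands near 2, the second well below it, matching the rough relationship DW ≈ 2(1 − ρ) for first-order autocorrelation ρ. (In practice, `statsmodels.stats.stattools.durbin_watson` computes the same quantity.)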