Please use R to solve the questions.
(e) Is multicollinearity a potential problem in this model?
(f) Construct a normal probability plot of residuals. Does there seem to be any problem with the normality assumption?
(g) Construct and interpret a plot of the residuals versus predicted response.
(h) Based on the above analysis, what is your recommended model?
[Hint: Use the lm command in R to fit a regression equation.]
Table B.4
y | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 |
29.5 | 5.0208 | 1 | 3.531 | 1.5 | 2 | 7 | 4 | 62 | 0 |
27.9 | 4.5429 | 1 | 2.275 | 1.175 | 1 | 6 | 3 | 40 | 0 |
25.9 | 4.5573 | 1 | 4.05 | 1.232 | 1 | 6 | 3 | 54 | 0 |
29.9 | 5.0597 | 1 | 4.455 | 1.121 | 1 | 6 | 3 | 42 | 0 |
29.9 | 3.891 | 1 | 4.455 | 0.988 | 1 | 6 | 3 | 56 | 0 |
30.9 | 5.898 | 1 | 5.85 | 1.24 | 1 | 7 | 3 | 51 | 1 |
28.9 | 5.6039 | 1 | 9.52 | 1.501 | 0 | 6 | 3 | 32 | 0 |
35.9 | 5.8282 | 1 | 6.435 | 1.225 | 2 | 6 | 3 | 32 | 0 |
31.5 | 5.3003 | 1 | 4.9883 | 1.552 | 1 | 6 | 3 | 30 | 0 |
31 | 6.2712 | 1 | 5.52 | 0.975 | 1 | 5 | 2 | 30 | 0 |
30.9 | 5.9592 | 1 | 6.666 | 1.121 | 2 | 6 | 3 | 32 | 0 |
30 | 5.05 | 1 | 5 | 1.02 | 0 | 5 | 2 | 46 | 1 |
36.9 | 8.2464 | 1.5 | 5.15 | 1.664 | 2 | 8 | 4 | 50 | 0 |
41.9 | 6.6969 | 1.5 | 6.902 | 1.488 | 1.5 | 7 | 3 | 22 | 1 |
40.5 | 7.7841 | 1.5 | 7.102 | 1.376 | 1 | 6 | 3 | 17 | 0 |
43.9 | 9.0384 | 1 | 7.8 | 1.5 | 1.5 | 7 | 3 | 23 | 0 |
37.5 | 5.9894 | 1 | 5.52 | 1.256 | 2 | 6 | 3 | 40 | 1 |
37.9 | 7.5422 | 1.5 | 5 | 1.69 | 1 | 6 | 3 | 22 | 0 |
44.5 | 8.7951 | 1.5 | 9.89 | 1.82 | 2 | 8 | 4 | 50 | 1 |
37.9 | 6.0831 | 1.5 | 6.7265 | 1.652 | 1 | 6 | 3 | 44 | 0 |
38.9 | 8.3607 | 1.5 | 9.15 | 1.777 | 2 | 8 | 4 | 48 | 1 |
36.9 | 8.14 | 1 | 8 | 1.504 | 2 | 7 | 3 | 3 | 0 |
45.8 | 9.1416 | 1.5 | 7.3262 | 1.831 | 1.5 | 8 | 4 | 31 | 0 |
25.9 | 4.9176 | 1 | 3.472 | 0.998 | 1 | 7 | 4 | 42 | 0 |
Solution
data_9Nov = read.csv(file.choose(), header = TRUE)  # pick the CSV file holding Table B.4
head(data_9Nov)
     y     x1 x2    x3    x4 x5 x6 x7 x8 x9
1 29.5 5.0208  1 3.531 1.500  2  7  4 62  0
2 27.9 4.5429  1 2.275 1.175  1  6  3 40  0
3 25.9 4.5573  1 4.050 1.232  1  6  3 54  0
4 29.9 5.0597  1 4.455 1.121  1  6  3 42  0
5 29.9 3.8910  1 4.455 0.988  1  6  3 56  0
6 30.9 5.8980  1 5.850 1.240  1  7  3 51  1
e)
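# model from the earlier parts of the problem (not shown); assumed here to be y regressed on all nine regressors
model = lm(y ~ ., data = data_9Nov)
round(cor(data_9Nov), 2)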
From the correlation matrix, we can see that x4 is correlated with x1, x2, and x6. Hence, yes, multicollinearity is a potential problem if these variables are used together in the model.
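The correlation matrix only shows pairwise relationships. A more direct check is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on all the other regressors; these values equal the diagonal of the inverse of the regressors' correlation matrix. The helper name vif_from_cor is hypothetical, and the sketch assumes every column other than y is a numeric regressor.

```r
# Variance inflation factors from the regressor correlation matrix.
# VIF_j = 1 / (1 - R_j^2) is the j-th diagonal element of solve(cor(X)).
vif_from_cor <- function(data, response = "y") {
  X <- data[, setdiff(names(data), response), drop = FALSE]  # regressors only
  setNames(diag(solve(cor(X))), names(X))                    # one VIF per regressor
}
```

For example, round(vif_from_cor(data_9Nov), 2). A common rule of thumb treats VIFs above 5 or 10 as a sign of problematic multicollinearity. The vif() function in the car package reports the same quantities for a fitted model with numeric regressors.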
f)
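# Normal probability plot of the residuals, with a reference line through the quartiles
qqnorm(residuals(model))
qqline(residuals(model))
# A histogram gives a rough second view of the shape
hist(residuals(model))
If the residuals are normally distributed, the points in the normal probability plot fall close to the straight reference line. Here the residuals do not follow a bell-shaped pattern, which suggests that the normality assumption may be violated. With only 24 observations, however, the histogram is a coarse check, so the normal probability plot should carry more weight: systematic curvature or points bending away from the line at the ends indicate skewness or heavy tails.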
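qqnorm() plots the sorted residuals against standard normal quantiles taken at the plotting positions ppoints(n). The hypothetical helper normal_prob_check below reproduces those coordinates, summarizes how straight the plot is with a correlation, and adds a Shapiro-Wilk test (shapiro.test() from R's stats package) as a formal complement to the picture. It assumes a fitted lm object.

```r
# Coordinates of a normal probability plot, plus numeric summaries of normality.
normal_prob_check <- function(fit) {
  r <- sort(residuals(fit))                # ordered residuals (vertical axis)
  q <- qnorm(ppoints(length(r)))           # theoretical normal quantiles (horizontal axis)
  list(
    quantiles     = q,
    residuals     = r,
    prob_plot_cor = cor(q, r),             # close to 1 when the points lie on a line
    shapiro       = shapiro.test(residuals(fit))  # small p-value suggests non-normality
  )
}
```

With the model from part (e): chk <- normal_prob_check(model); plot(chk$quantiles, chk$residuals); chk$shapiro. With only 24 residuals the test has limited power, so a non-significant result is weak evidence of normality rather than proof.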
g)
plot(model, 1)  # first diagnostic plot of an lm fit: residuals versus fitted values
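Ideally, the plot of residuals versus fitted values shows no systematic pattern: the points scatter randomly in a horizontal band around zero, and the red smoothed line stays approximately flat at zero. A curved trend suggests a nonlinear relationship that the model has missed, and a funnel shape (spread growing or shrinking with the fitted values) suggests non-constant variance.
In our example, there is no clear pattern in the residual plot. This suggests that a linear relationship between the predictors and the response variable is a reasonable assumption.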
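The same plot can be drawn by hand, which makes clear what plot(model, 1) displays. This sketch assumes a fitted lm object with an intercept; resid_vs_fitted is a hypothetical helper name, and lowess() is used as a stand-in smoother similar in spirit to the red line in the built-in plot.

```r
# Residuals versus fitted values, with a zero reference line and a lowess smooth.
resid_vs_fitted <- function(fit) {
  f <- fitted(fit)
  e <- residuals(fit)
  plot(f, e, xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)                 # residuals should scatter around this line
  lines(lowess(f, e), col = "red")       # smooth trend; should stay near zero
  # With an intercept, least squares residuals sum to zero and are uncorrelated
  # with the fitted values, so any pattern left in the plot must be curvature or
  # changing spread, which is exactly what the plot is meant to reveal.
  invisible(c(sum_resid = sum(e), cor_fitted = cor(f, e)))
}
```

Usage: resid_vs_fitted(model).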
h)
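My recommended model keeps only x1 and x2. Leaving out x4 and x6 avoids the multicollinearity identified in part (e):
model2 = lm(y ~ x1 + x2, data = data_9Nov)
summary(model2)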
Call:
lm(formula = y ~ x1 + x2, data = data_9Nov)
Residuals:
Min 1Q Median 3Q Max
-4.7639 -1.9454 -0.1822 1.8068 5.0423
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.0418 2.9585 3.394 0.00273 **
x1 2.7134 0.4849 5.595 1.49e-05 ***
x2 6.1643 3.1864 1.935 0.06663 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.792 on 21 degrees of freedom
Multiple R-squared: 0.8025, Adjusted R-squared: 0.7837
F-statistic: 42.67 on 2 and 21 DF, p-value: 4.007e-08
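In this model, x1 is highly significant (p = 1.49e-05), x2 is significant only at the 10% level (p = 0.067), and together they explain about 80% of the variation in y (R-squared = 0.8025, adjusted R-squared = 0.7837). The residuals of the reduced model can be checked in the same way as in part (f):
qqnorm(residuals(model2))
qqline(residuals(model2))
hist(residuals(model2))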
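One way to back up the choice of model2 is a partial F test of the reduced model against the full model, since the two are nested. The sketch assumes model is the fit of y on all nine regressors; compare_nested is a hypothetical helper that wraps anova() on two nested lm fits and also reports AIC and adjusted R-squared for both.

```r
# Compare a reduced model with a full model that contains it.
compare_nested <- function(reduced, full) {
  list(
    partial_F = anova(reduced, full),    # F test for the extra regressors in `full`
    AIC       = c(reduced = AIC(reduced), full = AIC(full)),  # lower is better
    adj_r2    = c(reduced = summary(reduced)$adj.r.squared,
                  full    = summary(full)$adj.r.squared)
  )
}
```

Call compare_nested(model2, model). A large p-value in the F test means the seven dropped regressors do not significantly improve the fit, which supports the smaller model.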