data:
"STATE" "MALE" "BIRTH" "DIVO" "BEDS" "EDUC" "INCO" "LIFE"
AK 119.1 24.8 5.6 603.3 14.1 4638 69.31
AL 93.3 19.4 4.4 840.9 7.8 2892 69.05
AR 94.1 18.5 4.8 569.6 6.7 2791 70.66
AZ 96.8 21.2 7.2 536.0 12.6 3614 70.55
CA 96.8 18.2 5.7 649.5 13.4 4423 71.71
CO 97.5 18.8 4.7 717.7 14.9 3838 72.06
CT 94.2 16.7 1.9 791.6 13.7 4871 72.48
DC 86.8 20.1 3.0 1859.4 17.8 4644 65.71
DE 95.2 19.2 3.2 926.8 13.1 4468 70.06
FL 93.2 16.9 5.5 668.2 10.3 3698 70.66
GA 94.6 21.1 4.1 705.4 9.2 3300 68.54
HW 108.1 21.3 3.4 794.3 14.0 4599 73.60
IA 94.6 17.1 2.5 773.9 9.1 3643 72.56
ID 99.7 20.3 5.1 541.5 10.0 3243 71.87
IL 94.2 18.5 3.3 871.0 10.3 4446 70.14
IN 95.1 19.1 2.9 736.1 8.3 3709 70.88
KS 96.2 17.0 3.9 854.6 11.4 3725 72.58
KY 96.3 18.7 3.3 661.9 7.2 3076 70.10
LA 94.7 20.4 1.4 724.0 9.0 3023 68.76
MA 91.6 16.6 1.9 1103.8 12.6 4276 71.83
MD 95.5 17.5 2.4 841.3 13.9 4267 70.22
ME 94.8 17.9 3.9 919.5 8.4 3250 70.93
MI 96.1 19.4 3.4 754.7 9.4 4041 70.63
MN 96.0 18.0 2.2 905.4 11.1 3819 72.96
MO 93.2 17.3 3.8 801.6 9.0 3654 70.69
MS 94.0 22.1 3.7 763.1 8.1 2547 68.09
MT 99.9 18.2 4.4 668.7 11.0 3395 70.56
NC 95.9 19.3 2.7 658.8 8.5 3200 69.21
ND 101.8 17.6 1.6 959.9 8.4 3077 72.79
NE 95.4 17.3 2.5 866.1 9.6 3657 72.60
NH 95.7 17.9 3.3 878.2 10.9 3720 71.23
NJ 93.7 16.8 1.5 713.1 11.8 4684 70.93
NM 97.2 21.7 4.3 560.9 12.7 3045 70.32
NV 102.8 19.6 18.7 560.7 10.8 4583 69.03
NY 91.5 17.4 1.4 1056.2 11.9 4605 70.55
OH 94.1 18.7 3.7 751.0 9.3 3949 70.82
OK 94.9 17.5 6.6 664.6 10.0 3341 71.42
OR 95.9 16.8 4.6 607.1 11.8 3677 72.13
PA 92.4 16.3 1.9 948.9 8.7 3879 70.43
RI 96.2 16.5 1.8 960.5 9.4 3878 71.90
SC 96.5 20.1 2.2 739.9 9.0 2951 67.96
SD 98.4 17.6 2.0 984.7 8.6 3108 72.08
TN 93.7 18.4 4.2 831.6 7.9 3079 70.11
TX 95.9 20.6 4.6 674.0 10.9 3507 70.90
UT 97.6 25.5 3.7 470.5 14.0 3169 72.90
VA 97.7 18.6 2.6 835.8 12.3 3677 70.08
VT 95.6 18.8 2.3 1026.1 11.5 3447 71.64
WA 98.7 17.8 5.2 556.4 12.7 3997 71.72
WI 96.3 17.6 2.0 814.7 9.8 3712 72.48
WV 93.9 17.8 3.2 950.4 6.8 3038 69.48
WY 100.7 19.6 5.4 925.9 11.8 3672 70.29
We consider the multiple linear regression with LIFE (y) as the
response variable and MALE, BIRTH, DIVO, BEDS, EDUC, and INCO as
predictors.
(a) Plot the standardized residuals against the fitted values. Are there any notable points? In particular, look for points with large residuals or points that may be influential.
(b) Compute and plot the leverage of each point. Identify any points that have a leverage larger than 0.5.
(c) Compute the Cook’s distance for each point. Identify any points that have a Cook’s distance larger than 1. Are these the same observations as those seen in part (b)?
(d) Plot the standardized residuals against the variable BEDS. Specifically mark the point corresponding to Washington, D.C. What can you say about this observation?
(e) Remove the observation corresponding to Washington, D.C. and refit the model. Are there any notable differences with the model fit in part (a)?
(f) Plot the standardized residuals against each of the 6 explanatory variables. Specifically mark the observation corresponding to UT. What is notable about this state?
(g) Remove the observation corresponding to UT and refit the model. Are there any notable differences with the model fit in part (a)? In particular, how does UT’s exclusion impact the R^2 value?
Answer:
Using the given data, the R code is:
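library(MASS)  # provides stepAIC()

# Read the whitespace-delimited table; the first row holds the column names.
data <- read.table("C:/Users/Computer/Desktop/data.txt", header = TRUE, sep = "")
data

# Starting model for the stepwise search (BEDS and EDUC are not included here).
model <- lm(LIFE ~ MALE + BIRTH + DIVO + INCO, data = data)
summary(model)

# Stepwise selection by AIC in each direction.
st1 <- stepAIC(model, direction = "both")
st2 <- stepAIC(model, direction = "forward")  # no scope given: the initial model is the upper bound, so nothing is added
st3 <- stepAIC(model, direction = "backward")
summary(st1)
summary(st2)
summary(st3)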
A)
AIC (full model) = 33.41
AIC (reduced model) = 31.44
Since the reduced model has the lower AIC, LIFE ~ MALE + BIRTH + DIVO is the preferred model.
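For reference, and assuming the two AIC values above are the ones printed by stepAIC (the answer does not say where they come from), the quantity it reports for a linear model with n observations, residual sum of squares RSS, and p estimated coefficients is:

```latex
\mathrm{AIC} = n \log\!\left(\frac{\mathrm{RSS}}{n}\right) + 2p
```

This differs from the value returned by R's AIC() function only by an additive constant that is the same for every model fitted to the same data, so the ranking of models is unaffected. Lower values are preferred.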
B)
The adjusted R^2 values also favor the reduced model.
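The diagnostics asked for in parts (a) to (c), and the refits in parts (e) and (g), could be approached along the following lines. This is only a sketch under stated assumptions: it uses the six-predictor model from the question rather than the four-predictor model fitted above, it reads an inline copy of the data instead of the file path, and the object names (life, full, no_dc, no_ut) are chosen here for illustration.

```r
# Inline copy of the data listed in the question (51 rows: 50 states plus DC).
raw <- "STATE MALE BIRTH DIVO BEDS EDUC INCO LIFE
AK 119.1 24.8 5.6 603.3 14.1 4638 69.31
AL 93.3 19.4 4.4 840.9 7.8 2892 69.05
AR 94.1 18.5 4.8 569.6 6.7 2791 70.66
AZ 96.8 21.2 7.2 536.0 12.6 3614 70.55
CA 96.8 18.2 5.7 649.5 13.4 4423 71.71
CO 97.5 18.8 4.7 717.7 14.9 3838 72.06
CT 94.2 16.7 1.9 791.6 13.7 4871 72.48
DC 86.8 20.1 3.0 1859.4 17.8 4644 65.71
DE 95.2 19.2 3.2 926.8 13.1 4468 70.06
FL 93.2 16.9 5.5 668.2 10.3 3698 70.66
GA 94.6 21.1 4.1 705.4 9.2 3300 68.54
HW 108.1 21.3 3.4 794.3 14.0 4599 73.60
IA 94.6 17.1 2.5 773.9 9.1 3643 72.56
ID 99.7 20.3 5.1 541.5 10.0 3243 71.87
IL 94.2 18.5 3.3 871.0 10.3 4446 70.14
IN 95.1 19.1 2.9 736.1 8.3 3709 70.88
KS 96.2 17.0 3.9 854.6 11.4 3725 72.58
KY 96.3 18.7 3.3 661.9 7.2 3076 70.10
LA 94.7 20.4 1.4 724.0 9.0 3023 68.76
MA 91.6 16.6 1.9 1103.8 12.6 4276 71.83
MD 95.5 17.5 2.4 841.3 13.9 4267 70.22
ME 94.8 17.9 3.9 919.5 8.4 3250 70.93
MI 96.1 19.4 3.4 754.7 9.4 4041 70.63
MN 96.0 18.0 2.2 905.4 11.1 3819 72.96
MO 93.2 17.3 3.8 801.6 9.0 3654 70.69
MS 94.0 22.1 3.7 763.1 8.1 2547 68.09
MT 99.9 18.2 4.4 668.7 11.0 3395 70.56
NC 95.9 19.3 2.7 658.8 8.5 3200 69.21
ND 101.8 17.6 1.6 959.9 8.4 3077 72.79
NE 95.4 17.3 2.5 866.1 9.6 3657 72.60
NH 95.7 17.9 3.3 878.2 10.9 3720 71.23
NJ 93.7 16.8 1.5 713.1 11.8 4684 70.93
NM 97.2 21.7 4.3 560.9 12.7 3045 70.32
NV 102.8 19.6 18.7 560.7 10.8 4583 69.03
NY 91.5 17.4 1.4 1056.2 11.9 4605 70.55
OH 94.1 18.7 3.7 751.0 9.3 3949 70.82
OK 94.9 17.5 6.6 664.6 10.0 3341 71.42
OR 95.9 16.8 4.6 607.1 11.8 3677 72.13
PA 92.4 16.3 1.9 948.9 8.7 3879 70.43
RI 96.2 16.5 1.8 960.5 9.4 3878 71.90
SC 96.5 20.1 2.2 739.9 9.0 2951 67.96
SD 98.4 17.6 2.0 984.7 8.6 3108 72.08
TN 93.7 18.4 4.2 831.6 7.9 3079 70.11
TX 95.9 20.6 4.6 674.0 10.9 3507 70.90
UT 97.6 25.5 3.7 470.5 14.0 3169 72.90
VA 97.7 18.6 2.6 835.8 12.3 3677 70.08
VT 95.6 18.8 2.3 1026.1 11.5 3447 71.64
WA 98.7 17.8 5.2 556.4 12.7 3997 71.72
WI 96.3 17.6 2.0 814.7 9.8 3712 72.48
WV 93.9 17.8 3.2 950.4 6.8 3038 69.48
WY 100.7 19.6 5.4 925.9 11.8 3672 70.29"
life <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)

# Six-predictor model, as stated in the question (an assumption of this sketch).
full <- lm(LIFE ~ MALE + BIRTH + DIVO + BEDS + EDUC + INCO, data = life)

# (a) Standardized residuals against fitted values, labelled by state.
std_res <- rstandard(full)
plot(fitted(full), std_res, xlab = "Fitted LIFE", ylab = "Standardized residual")
abline(h = c(-2, 0, 2), lty = 2)
text(fitted(full), std_res, labels = life$STATE, pos = 3, cex = 0.6)

# (b) Leverage (hat values); flag states above the 0.5 cutoff from the question.
lev <- hatvalues(full)
plot(lev, type = "h", xlab = "Observation", ylab = "Leverage")
abline(h = 0.5, lty = 2)
high_lev <- life$STATE[lev > 0.5]

# (c) Cook's distance; flag states above the cutoff of 1 from the question.
cook <- cooks.distance(full)
high_cook <- life$STATE[cook > 1]
print(high_lev)
print(high_cook)

# (e), (g) Refit without DC and without UT, then compare R^2 across the fits.
no_dc <- lm(formula(full), data = subset(life, STATE != "DC"))
no_ut <- lm(formula(full), data = subset(life, STATE != "UT"))
r2 <- sapply(list(all = full, no_dc = no_dc, no_ut = no_ut),
             function(m) summary(m)$r.squared)
print(round(r2, 4))
```

Comparing high_lev with high_cook answers the last question in part (c), and the r2 vector gives the comparison asked for in part (g). The residual plots against individual predictors in parts (d) and (f) follow the same pattern as the plot in part (a), with the predictor column on the horizontal axis.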