In: Statistics and Probability
The data set mantel in the alr4 package has a response Y and three predictors X1, X2, and X3.
(a) Apply the forward selection and backward elimination algorithms, using AIC as a criterion function. Report your findings.
(b) Use regsubsets() function in R to determine the best model. Which appear to be the important predictors? What’s the final model? Explain your reasoning.
R code:
install.packages("alr4")
library(alr4)
names(mantel)                      # Y, X1, X2, X3
fit <- lm(Y ~ ., data = mantel)    # full model with all three predictors
# Backward elimination based on AIC
library(MASS)
step <- stepAIC(fit, direction = "backward")
summary(step)
Output:
Call:
lm(formula = Y ~ X1 + X2, data = mantel)
Residuals:
1 2 3 4 5
1.212e-14 -1.824e-14 2.743e-15 5.123e-15 -1.747e-15
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.000e+03 4.294e-12 -2.329e+14 <2e-16 ***
X1 1.000e+00 4.250e-15 2.353e+14 <2e-16 ***
X2 1.000e+00 4.266e-15 2.344e+14 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.607e-14 on 2 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 4.415e+28 on 2 and 2 DF, p-value: < 2.2e-16
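The backward step above can be reproduced by hand: stepAIC() compares models using extractAIC(), which returns the equivalent degrees of freedom and the AIC value the search uses. A sketch of the comparisons behind the single elimination step:

```r
# AIC for the full model and for each one-term deletion:
extractAIC(lm(Y ~ X1 + X2 + X3, data = mantel))  # full model
extractAIC(lm(Y ~ X1 + X2, data = mantel))       # drop X3
extractAIC(lm(Y ~ X1 + X3, data = mantel))       # drop X2
extractAIC(lm(Y ~ X2 + X3, data = mantel))       # drop X1
# Dropping X3 yields the smallest AIC, so backward elimination removes X3
# and stops at Y ~ X1 + X2, matching the summary above.
```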
R code:
# Forward selection based on AIC
# Note: this call starts from the full model, so a forward search has
# nothing to add and simply returns the full-model fit (output below).
step1 <- stepAIC(fit, direction = "forward")
summary(step1)
Output:
Call:
lm(formula = Y ~ X1 + X2 + X3, data = mantel)
Residuals:
1 2 3 4 5
1.372e-14 -1.609e-14 -2.054e-15 2.142e-15 2.280e-15
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.000e+03 1.501e-11 -6.660e+13 9.56e-15 ***
X1 1.000e+00 1.501e-14 6.661e+13 9.56e-15 ***
X2 1.000e+00 1.501e-14 6.664e+13 9.55e-15 ***
X3 4.108e-15 1.186e-14 3.460e-01 0.788
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.147e-14 on 1 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.648e+28 on 3 and 1 DF, p-value: 5.726e-15
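Because the call above starts from the full model, the "forward" run cannot add any terms, which is why its output is identical to the full-model fit. A proper forward search begins at the intercept-only model and is given a scope; a sketch using the same MASS function:

```r
# Forward selection must start from the null (intercept-only) model:
null <- lm(Y ~ 1, data = mantel)
fwd  <- stepAIC(null,
                scope = list(lower = ~ 1, upper = ~ X1 + X2 + X3),
                direction = "forward")
summary(fwd)
```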
R code:
install.packages("olsrr")
library(olsrr)
ols_step_best_subset(fit)  # named ols_best_subset() in older olsrr releases
Note that X1 and X2 appear to be the active (important) regressors: this pair is selected by backward elimination and is also the best subset in the all-subsets search below (lowest AIC and SBC, with Cp close to the number of parameters). The final model is therefore Y ~ X1 + X2.
Best Subsets Regression
-----------------------
Model Index Predictors
-----------------------
1 X3
2 X1 X2
3 X1 X2 X3
-----------------------
Subsets Regression Summary
----------------------------------------------------------------------------------------------------------------------
                 Adj.      Pred
Model  R-Square  R-Square  R-Square  C(p)         AIC        SBIC       SBC        MSEP    FPE     HSP     APC
----------------------------------------------------------------------------------------------------------------------
1      0.9074    0.8765    0.7461    4.58082e+27  15.8806    -4.3087    14.7090    1.2673  0.9857  0.3520  0.2162
2      1.0000    1.0000    1         2.1199       -300.0114  -317.4200  -301.5737  0.0000  0.0000  0.0000  0.0000
3      1.0000    1.0000    1         4.0000       -298.5777  -312.7671  -300.5306  Inf     0.0000  Inf     0.0000
----------------------------------------------------------------------------------------------------------------------
AIC: Akaike Information Criteria
SBIC: Sawa's Bayesian Information Criteria
SBC: Schwarz Bayesian Criteria
MSEP: Estimated error of prediction, assuming multivariate normality
FPE: Final Prediction Error
HSP: Hocking's Sp
APC: Amemiya Prediction Criteria
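Part (b) explicitly asks for regsubsets(), which lives in the leaps package. A minimal equivalent of the olsrr search above (a sketch, assuming leaps is installed):

```r
# All-subsets search with leaps::regsubsets(), as requested in part (b)
library(leaps)
best <- regsubsets(Y ~ X1 + X2 + X3, data = mantel, nbest = 1)
s <- summary(best)
s$which   # which predictors enter the best model of each size
s$adjr2   # adjusted R-squared by model size
s$cp      # Mallows' Cp by model size
s$bic     # BIC by model size
```

The two-predictor model {X1, X2} should come out best on Cp and BIC, matching the olsrr table above and confirming Y ~ X1 + X2 as the final model.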