Question

The data set mantel in the alr4 package has a response Y and three predictors x1, x2 and x3.

(a) Apply the forward selection and backward elimination algorithms, using AIC as a criterion function. Report your findings.

(b) Use regsubsets() function in R to determine the best model. Which appear to be the important predictors? What’s the final model? Explain your reasoning.

Solutions

Expert Solution

R-code is -

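# install.packages("alr4")   # run once if the package is not yet installed
library(alr4)                # provides the mantel data set
names(mantel)                # list the variables (Y, X1, X2, X3)
fit <- lm(Y ~ ., data = mantel)   # full model: Y on X1, X2 and X3

# Backward elimination based on AIC
library(MASS)                # provides stepAIC()
step <- stepAIC(fit, direction = "backward")
summary(step)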

Output-

Call:
lm(formula = Y ~ X1 + X2, data = mantel)

Residuals:
         1          2          3          4          5
 1.212e-14 -1.824e-14  2.743e-15  5.123e-15 -1.747e-15

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)
(Intercept) -1.000e+03  4.294e-12 -2.329e+14   <2e-16 ***
X1           1.000e+00  4.250e-15  2.353e+14   <2e-16 ***
X2           1.000e+00  4.266e-15  2.344e+14   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.607e-14 on 2 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 4.415e+28 on 2 and 2 DF, p-value: < 2.2e-16

R-code-

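# Forward selection based on AIC
# Note: fit is already the full model and no scope is given, so this call
# has no terms left to add and returns fit unchanged.
step1 <- stepAIC(fit, direction = "forward")
summary(step1)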

Output-

Call:
lm(formula = Y ~ X1 + X2 + X3, data = mantel)

Residuals:
         1          2          3          4          5
 1.372e-14 -1.609e-14 -2.054e-15  2.142e-15  2.280e-15

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)
(Intercept) -1.000e+03  1.501e-11 -6.660e+13 9.56e-15 ***
X1           1.000e+00  1.501e-14  6.661e+13 9.56e-15 ***
X2           1.000e+00  1.501e-14  6.664e+13 9.55e-15 ***
X3           4.108e-15  1.186e-14  3.460e-01    0.788
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.147e-14 on 1 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.648e+28 on 3 and 1 DF, p-value: 5.726e-15
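
The stepAIC(fit, direction = "forward") call above starts from the full model, so it cannot add anything. Forward selection is conventionally started from the intercept-only model with an explicit scope. The following is a minimal sketch of that variant, assuming alr4 and MASS are installed; the names null_fit and fwd are illustrative and not part of the original solution, and the model it stops at may differ from the output shown above.

```r
library(alr4)   # provides the mantel data set
library(MASS)   # provides stepAIC()

# Start from the intercept-only model (null_fit is an illustrative name).
null_fit <- lm(Y ~ 1, data = mantel)

# Let stepAIC() consider adding X1, X2 and X3, one term per step.
fwd <- stepAIC(null_fit,
               scope = list(lower = ~ 1, upper = ~ X1 + X2 + X3),
               direction = "forward")

formula(fwd)   # the model at which forward selection stops
fwd$anova      # the sequence of steps with their AIC values
```

Comparing formula(fwd) with the backward result shows whether the two search directions agree on this data set.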

R-code-

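# install.packages("olsrr")   # run once if the package is not yet installed
library(olsrr)
ols_best_subset(fit)          # all-subsets search (olsrr alternative to leaps::regsubsets())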

Output-

Best Subsets Regression
-----------------------
Model Index    Predictors
-----------------------
     1         X3
     2         X1 X2
     3         X1 X2 X3
-----------------------

Subsets Regression Summary
-----------------------------------------------------------------------------------------------------------------
                     Adj.      Pred
Model  R-Square  R-Square  R-Square         C(p)        AIC       SBIC        SBC    MSEP     FPE     HSP     APC
-----------------------------------------------------------------------------------------------------------------
    1    0.9074    0.8765    0.7461  4.58082e+27    15.8806    -4.3087    14.7090  1.2673  0.9857  0.3520  0.2162
    2    1.0000    1.0000         1       2.1199  -300.0114  -317.4200  -301.5737  0.0000  0.0000  0.0000  0.0000
    3    1.0000    1.0000         1       4.0000  -298.5777  -312.7671  -300.5306     Inf  0.0000     Inf  0.0000
-----------------------------------------------------------------------------------------------------------------
AIC: Akaike Information Criteria
SBIC: Sawa's Bayesian Information Criteria
SBC: Schwarz Bayesian Criteria
MSEP: Estimated error of prediction, assuming multivariate normality
FPE: Final Prediction Error
HSP: Hocking's Sp
APC: Amemiya Prediction Criteria

Note that X1 and X2 appear to be the active regressors. Backward elimination retains only X1 and X2. In the full model returned by the forward call, X1 and X2 are highly significant while X3 is not (p = 0.788). In the all-subsets summary, the two-predictor model X1 + X2 has the lowest AIC (-300.0114, against -298.5777 for the full model and 15.8806 for X3 alone). The final model is therefore Y = -1000 + X1 + X2, which fits the data exactly (R-squared = 1).
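
Part (b) asks specifically for regsubsets(), which lives in the leaps package rather than olsrr. The following is a minimal sketch of the same all-subsets search with that function, assuming leaps and alr4 are installed; the names subsets and sub_sum are illustrative. Because the X1 + X2 model fits these data essentially perfectly, some criteria (Cp in particular) may come out as extreme or non-finite values.

```r
library(alr4)    # provides the mantel data set
library(leaps)   # provides regsubsets()

# Exhaustive search: best model of each size among X1, X2, X3.
subsets <- regsubsets(Y ~ X1 + X2 + X3, data = mantel, nvmax = 3)
sub_sum <- summary(subsets)

# Logical matrix: which predictors enter the best model of each size.
sub_sum$which

# Selection criteria for each model size, side by side.
data.frame(size  = seq_along(sub_sum$bic),
           adjr2 = sub_sum$adjr2,
           cp    = sub_sum$cp,
           bic   = sub_sum$bic)
```

The rows of sub_sum$which can be read against the "Best Subsets Regression" listing from olsrr to check that both packages pick the same predictors at each model size.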

