Question

The data set mantel in the alr4 package has a response Y and three predictors x1, x2 and x3.

(a) Apply the forward selection and backward elimination algorithms, using AIC as a criterion function. Report your findings.

(b) Use regsubsets() function in R to determine the best model. Which appear to be the important predictors? What’s the final model? Explain your reasoning.

Solutions

Expert Solution

R-code is -

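# install.packages("alr4")   # run once if the package is not installed
library(alr4)                 # provides the mantel data set
names(mantel)                 # list the variable names
# Full model with all three predictors; data = mantel makes attach() unnecessary
fit <- lm(Y ~ ., data = mantel)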

# Backward elimination based on AIC
library(MASS)   # provides stepAIC()
step <- stepAIC(fit, direction = "backward")   # start from the full model and drop terms
summary(step)

Output-

Call:
lm(formula = Y ~ X1 + X2, data = mantel)

Residuals:
         1          2          3          4          5
 1.212e-14 -1.824e-14  2.743e-15  5.123e-15 -1.747e-15

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)
(Intercept) -1.000e+03  4.294e-12 -2.329e+14   <2e-16 ***
X1           1.000e+00  4.250e-15  2.353e+14   <2e-16 ***
X2           1.000e+00  4.266e-15  2.344e+14   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.607e-14 on 2 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 4.415e+28 on 2 and 2 DF, p-value: < 2.2e-16
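
The residuals of order 1e-14 and the R-squared of 1 suggest that Y is an exact linear function of X1 and X2. A quick check, sketched here under the assumption that the alr4 package is installed, rebuilds Y from the coefficients reported above (intercept -1000, both slopes 1) and compares it with the observed response. The names y_hat, max_abs_diff and exact_fit are illustrative only.

```r
library(alr4)   # provides the mantel data set

# Reconstruct Y from the fitted equation reported above: Y = -1000 + X1 + X2
y_hat <- with(mantel, X1 + X2 - 1000)

# Largest absolute discrepancy between the observed and reconstructed response
max_abs_diff <- max(abs(mantel$Y - y_hat))
print(max_abs_diff)

# all.equal() compares the two vectors up to a small numerical tolerance
exact_fit <- isTRUE(all.equal(as.numeric(mantel$Y), as.numeric(y_hat)))
print(exact_fit)
```

If the fit is exact, the residual sum of squares of any model containing both X1 and X2 is pure rounding error, so AIC differences among such models should be read with caution.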

R-code-

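# Forward selection based on AIC
# Note: fit is already the full model and no scope is supplied, so there is
# nothing left to add and stepAIC() returns the full model unchanged.
step1 <- stepAIC(fit, direction = "forward")
summary(step1)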

Output-

Call:
lm(formula = Y ~ X1 + X2 + X3, data = mantel)

Residuals:
         1          2          3          4          5
 1.372e-14 -1.609e-14 -2.054e-15  2.142e-15  2.280e-15

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)
(Intercept) -1.000e+03  1.501e-11 -6.660e+13 9.56e-15 ***
X1           1.000e+00  1.501e-14  6.661e+13 9.56e-15 ***
X2           1.000e+00  1.501e-14  6.664e+13 9.55e-15 ***
X3           4.108e-15  1.186e-14  3.460e-01    0.788
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.147e-14 on 1 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.648e+28 on 3 and 1 DF, p-value: 5.726e-15
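
The forward output above is simply the full model, because the search started from a model that already contains every predictor. Forward selection is normally started from the intercept-only model, with the candidate terms supplied through the scope argument. A minimal sketch of that variant follows; the names null_fit and forward_fit are illustrative and not part of the original solution, and no output is shown because it was not run as part of this answer. Since the data allow an exact fit, R may print warnings along the way.

```r
library(alr4)   # provides the mantel data set
library(MASS)   # provides stepAIC()

# Start from the intercept-only model
null_fit <- lm(Y ~ 1, data = mantel)

# Let stepAIC() add terms one at a time, up to the full model X1 + X2 + X3
forward_fit <- stepAIC(
  null_fit,
  scope = list(lower = ~ 1, upper = ~ X1 + X2 + X3),
  direction = "forward"
)

# Formula of the model at which the forward search stopped
print(formula(forward_fit))
```

Comparing formula(forward_fit) with the backward result shows whether the two searches agree on this data set.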

R-code-

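# install.packages("olsrr")   # run once if the package is not installed
library(olsrr)
# All-subsets comparison of the candidate models
# (recent olsrr releases name this function ols_step_best_subset())
ols_best_subset(fit)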

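Note that X1 and X2 appear to be the active regressors: backward elimination retains exactly these two predictors, and in the best-subsets results the two-predictor model (X1, X2) reaches R-Square = 1 with the lowest AIC (-300.0114). In the full model X3 is not significant (p = 0.788), so the final model is Y ~ X1 + X2, with fitted equation Y = -1000 + X1 + X2.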

Output-

Best Subsets Regression
-----------------------
Model Index    Predictors
-----------------------
     1         X3
     2         X1 X2
     3         X1 X2 X3
-----------------------

Subsets Regression Summary
-----------------------------------------------------------------------------------------------------------------
                     Adj.      Pred
Model  R-Square  R-Square  R-Square         C(p)        AIC       SBIC        SBC    MSEP     FPE     HSP     APC
-----------------------------------------------------------------------------------------------------------------
    1    0.9074    0.8765    0.7461  4.58082e+27    15.8806    -4.3087    14.7090  1.2673  0.9857  0.3520  0.2162
    2    1.0000    1.0000         1       2.1199  -300.0114  -317.4200  -301.5737  0.0000  0.0000  0.0000  0.0000
    3    1.0000    1.0000         1       4.0000  -298.5777  -312.7671  -300.5306     Inf  0.0000     Inf  0.0000
-----------------------------------------------------------------------------------------------------------------
AIC: Akaike Information Criteria
SBIC: Sawa's Bayesian Information Criteria
SBC: Schwarz Bayesian Criteria
MSEP: Estimated error of prediction, assuming multivariate normality
FPE: Final Prediction Error
HSP: Hocking's Sp
APC: Amemiya Prediction Criteria
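
Part (b) asks specifically for regsubsets(), which lives in the leaps package rather than in olsrr. A minimal sketch of that call is given below, assuming leaps is installed; the object names subsets and subsets_summary are illustrative. No output is reproduced here because it was not part of the original solution, and because the exact fit of the X1 + X2 model may make some criteria numerically unstable.

```r
library(alr4)    # provides the mantel data set
library(leaps)   # provides regsubsets()

# Exhaustive search over all subsets of the three predictors
subsets <- regsubsets(Y ~ X1 + X2 + X3, data = mantel, nvmax = 3)
subsets_summary <- summary(subsets)

# Logical matrix: which predictors enter the best model of each size
print(subsets_summary$which)

# Criteria for the best model of each size
print(subsets_summary$adjr2)   # adjusted R-squared
print(subsets_summary$cp)      # Mallows' Cp
print(subsets_summary$bic)     # BIC
```

The row of subsets_summary$which for size 2 can be compared directly with Model 2 in the olsrr table above.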

