In: Statistics and Probability
Use α = 0.05 unless told otherwise.
--Everything should be in R code.
--The data set is built into R; just type in LifeCycleSavings.
--No data are missing.
We want to build a multiple regression model to predict sr (the “Savings Ratio”) in the LifeCycleSavings dataset (see ?LifeCycleSavings for more background info).
a. Build a model that uses all the other variables in the data frame as predictors, including all their two-way interactions (I’ll call this the full model). Using an F-test, does it appear that at least one of the predictors is significant/useful?
b. Regardless of your answer to part a, look at the individual t-tests for the predictors. Which of the predictors look significant/useful?
c. Use the backwards stepwise procedure to select a reduced (smaller) model from the full model. Which predictors are included in this reduced model?
d. Compare and contrast both the R²s and the adjusted R²s for the full model and the reduced model. What do you observe?
e. Conduct a partial F-test to compare the reduced model to the full model. Does it appear that we lost anything of value by removing those predictors?
f. In the reduced model, there should be one predictor that looks very insignificant based on its t-test. Which one is it? And why do you think the stepwise procedure decided to keep it in the model? [Take your best guess on the second part there; we can discuss this idea more.]
***Everything should be in R code. A specific explanation with code is needed. Thanks in advance.
As per our guidelines, we are only allowed to answer 4 subparts; parts (a)-(d) are answered below.
The R code is as follows:
# Ans (a)
LifeCycleSavings
head(LifeCycleSavings)
fm1 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
summary(fm1)
#fm1 is the full model and fm2 is the reduced model.
# F test
# From the summary of fm1 (the fitted full model), we read off the F statistic
# and its p-value: 5.756 and 0.0007904, respectively. The level of significance
# is 0.05.
# Since the p-value < 0.05, we reject the null hypothesis that none of the
# predictors contributes to the response.
# Therefore it appears that at least one of the predictors is significant.
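# As an optional check (not part of the original question), the same F statistic
# and p-value can be pulled out of the fitted model programmatically instead of
# being read off the printed summary:
fstat <- summary(fm1)$fstatistic                       # named vector: value, numdf, dendf
fstat["value"]                                         # F statistic, about 5.756
pf(fstat["value"], fstat["numdf"], fstat["dendf"],
   lower.tail = FALSE)                                 # p-value, about 0.00079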
# Ans (b)
# The coefficient table in summary(fm1) shows an individual t-test p-value for
# each predictor. Predictors with p-values < 0.05 are the significant/useful
# ones; R flags them with a star mark.
# Here pop15 (p = 0.0026) and ddpi (p = 0.0425) are significant, while
# pop75 (p = 0.126) and dpi (p = 0.719) are not.
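# Optional: the coefficient table can also be handled as a matrix, which makes
# it easy to flag the predictors whose p-values fall below alpha = 0.05.
coefs <- summary(fm1)$coefficients
coefs[, "Pr(>|t|)"]            # individual t-test p-values
coefs[, "Pr(>|t|)"] < 0.05     # TRUE where p < 0.05 (first entry is the intercept)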
# Ans (c)
step(fm1, direction = "backward")
# or
null <- lm(sr ~ 1, data = LifeCycleSavings)
full <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
step(full, scope = list(lower = null, upper = full), direction = "backward")
# In backward stepwise selection we start from the full model and remove
# predictors one at a time until there is no justifiable reason to remove more.
# Instead of p-values, step() uses AIC (Akaike Information Criterion);
# the smaller the AIC, the better the model.
# The procedure starts with an AIC of 138.3 and reports that removing dpi
# lowers the AIC to 136.45 < 138.3, so dpi is dropped.
# In the second step it starts from AIC = 136.45 and suggests that removing
# nothing more is best, so the procedure stops.
# Therefore the reduced model is sr ~ pop15 + pop75 + ddpi.
# Fitting that model now, we get:
fm2 <- lm(sr ~ pop15 + pop75 + ddpi, data = LifeCycleSavings)
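# A small optional sanity check: extractAIC() reports AIC on the same scale
# that step() prints (for lm fits, n*log(RSS/n) + 2*(number of coefficients)),
# so we can confirm the drop from the full model to the reduced model.
extractAIC(fm1)   # about 138.30 for the full model
extractAIC(fm2)   # about 136.45 for the reduced model
# Note that step() also returns the selected model, so the reduced model could
# have been captured directly with fm2 <- step(fm1, direction = "backward", trace = 0).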
# Ans (d)
summary(fm1)
summary(fm2)
# The R-squared of the full model is 0.3385 and that of the reduced model is
# 0.3365, i.e. R-squared (full model) > R-squared (reduced model). This is
# expected, since R-squared can never increase when predictors are dropped.
# The adjusted R-squared of the full model is 0.2797, whereas that of the
# reduced model is 0.2933.
# Because adjusted R-squared penalizes extra predictors, the larger adjusted
# R-squared of the reduced model signifies that it is actually the better model.
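# Optional: the four values quoted above can be extracted directly instead of
# being read off the two summaries.
c(full = summary(fm1)$r.squared,     reduced = summary(fm2)$r.squared)
c(full = summary(fm1)$adj.r.squared, reduced = summary(fm2)$adj.r.squared)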