Question

In: Statistics and Probability

We want to build a multiple regression model to predict sr (the “Savings Ratio”) in the LifeCycleSavings dataset.

Use α = 0.05 unless told otherwise.

--Everything should be in R code.

--The dataset is built into R. Just type LifeCycleSavings.

--DATA IS NOT MISSING

We want to build a multiple regression model to predict sr (the “Savings Ratio”) in the LifeCycleSavings dataset (see ?LifeCycleSavings for more background info).

a. Build a model that uses all the other variables in the data frame as predictors, including all their two-way interactions (I’ll call this the full model). Using an F-test, does it appear that at least one of the predictors is significant/useful?

b. Regardless of your answer to part a, look at the individual t-tests for the predictors. Which of the predictors look significant/useful?

c. Use the backwards stepwise procedure to select a reduced (smaller) model from the full model. Which predictors are included in this reduced model?

d. Compare and contrast both the R²s and the adjusted R²s for the full model to the reduced model. What do you observe?

e. Conduct a partial F-test to compare the reduced model to the full model. Does it appear that we lost anything of value by removing those predictors?

f. In the reduced model, there should be one predictor that looks very insignificant based on its t-test. Which one is it? And why do you think the stepwise procedure decided to keep it in the model? [Take your best guess on the second part there; we can discuss this idea more.]

***Everything should be in R code. A specific explanation with code is needed. Thanks in advance.

Solutions

Expert Solution

As per our guidelines, we are only allowed to answer 4 subparts.

The R code is as follows:

# ans (a)


data(LifeCycleSavings)   # built-in dataset; see ?LifeCycleSavings
head(LifeCycleSavings)   # preview the first few rows
fm1 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
summary(fm1)


#fm1 is the full model; fm2 (fit in part c below) is the reduced model.


# F-test


#From the summary of fm1 (fitted model 1), the F statistic is 5.756 with a p-value of 0.0007904. The level of significance (α) is 0.05.


#Since the p-value < α, we reject the null hypothesis that none of the predictors contribute to the response.


#Therefore it appears that at least one of the predictors is significant/useful.
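Note that part (a) also asks for all the two-way interactions. A hedged sketch of that fuller model, using R's `^2` formula shorthand (the name `fm_int` is my own); its overall F statistic and p-value will differ from the main-effects values quoted above:

```r
# "^2" expands to all main effects plus every pairwise interaction
fm_int <- lm(sr ~ (pop15 + pop75 + dpi + ddpi)^2, data = LifeCycleSavings)
summary(fm_int)  # the overall F-test is in the last line of the summary
```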


# ans (b)


#The coefficient table in summary(fm1) shows a p-value for each predictor.


#A predictor with p-value < α is significant; one with p-value ≥ α is not.


#Here pop15 (p ≈ 0.0003) and ddpi (p ≈ 0.0125) are significant, highlighted by star marks, while pop75 (p ≈ 0.13) and dpi (p ≈ 0.72) are not.
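One way to pull those p-values out programmatically, a small sketch using the coefficient matrix that `summary.lm` returns (`pvals` is my own name):

```r
# Extract the p-value column from fm1's coefficient table
pvals <- summary(fm1)$coefficients[, "Pr(>|t|)"]
round(pvals, 4)
# Flag which predictors are significant at alpha = 0.05 (dropping the intercept)
pvals[-1] < 0.05
```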


# ans (c)


step(fm1, direction = "backward")
# or equivalently:
null <- lm(sr ~ 1, data = LifeCycleSavings)
full <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
step(full, scope = list(lower = null, upper = full), direction = "backward")


#In backward stepwise selection, we start from the full model and remove predictors one at a time until no removal improves the model.


#Instead of p-values, step() uses AIC (Akaike Information Criterion) values.


#The smaller the AIC, the better the model.


#Step 1 starts at AIC = 138.3 and shows that removing dpi lowers the AIC to 136.45. Since we want smaller AIC values, dpi is dropped.


#Step 2 starts at AIC = 136.45; no further removal lowers the AIC, so the procedure stops.


#Therefore the reduced model is sr ~ pop15 + pop75 + ddpi.


#Fitting that reduced model now:

fm2 <- lm(sr ~ pop15 + pop75 + ddpi, data = LifeCycleSavings)
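As a hedged check of the numbers in the step() trace: extractAIC() reports AIC on the same scale step() uses (it differs from AIC() by an additive constant but ranks models identically):

```r
# Each call returns the equivalent degrees of freedom and the AIC value
extractAIC(fm1)  # full main-effects model (AIC reported by step: ~138.3)
extractAIC(fm2)  # reduced model (AIC reported by step: ~136.45)
```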


# ans (d)


summary(fm1)
summary(fm2)


#From the two summaries, the R-squared is 0.3385 for the full model and 0.3365 for the reduced model.


#So R-squared (full model) > R-squared (reduced model); R-squared never decreases when predictors are added, even useless ones.


#The adjusted R-squared is 0.2797 for the full model and 0.2933 for the reduced model.


#Because adjusted R-squared penalizes extra predictors, its larger value for the reduced model signifies that the reduced model is actually the better model.
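Those four numbers can be pulled directly from the summary objects rather than read off the printed output, a small sketch (the vector names are my own):

```r
# R-squared and adjusted R-squared for the full and reduced models
c(r2_full    = summary(fm1)$r.squared,
  r2_reduced = summary(fm2)$r.squared)
c(adj_full    = summary(fm1)$adj.r.squared,
  adj_reduced = summary(fm2)$adj.r.squared)
```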

