Suppose an environmental agency would like to investigate the relationship between the engine size of sedans, x, and the miles per gallon (MPG), y, they get. The accompanying table shows the engine size in cubic liters and rated miles per gallon for a selection of sedans. The regression line for the data is Y hat=35.9500−3.8750x.
Use this information to complete the parts below.
Engine Size MPG
2.4 25
2.2 31
2.2 24
3.4 21
3.6 24
2.1 29
2.5 25
2.1 29
3.9 21
a) Calculate the coefficient of determination. R2=? (Round to three decimal places as needed.)
b) Using α=0.05, test the significance of the population coefficient of determination.
Determine the null and alternative hypotheses.
c) The F-test statistic is? (Round to two decimal places as needed.)
d) The p-value is? (Round to three decimal places as needed.)
e) Construct a 95% confidence interval for the average MPG of a 2.5-cubic liter engine.
UCL= ? (Round to two decimal places as needed.)
LCL= ? (Round to two decimal places as needed.)
f) Construct a 95% prediction interval for the MPG of a 2.5-cubic liter engine.
UPL= ? (Round to two decimal places as needed.)
LPL= ? (Round to two decimal places as needed.)
Solution:
In RStudio, we:
create a data frame (table) named df1 from the data,
use the lm function to fit a linear regression of MPG on EngineSize,
use the anova function to get the sums of squares, the F statistic, and its p-value,
use the predict function to get the confidence interval and the prediction interval.
R code:
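# Read the engine size / MPG table into a data frame
df1 <- read.table(header = TRUE, text = "
EngineSize MPG
2.4 25
2.2 31
2.2 24
3.4 21
3.6 24
2.1 29
2.5 25
2.1 29
3.9 21
")
df1
# Fit the simple linear regression of MPG on engine size
linreg <- lm(MPG ~ EngineSize, data = df1)
# ANOVA table: sums of squares, F statistic, p-value
anova(linreg)
attach(df1)
# 95% confidence and prediction intervals at EngineSize = 2.5
newdata <- data.frame(EngineSize = 2.5)
predict(linreg, newdata, interval = "confidence", level = 0.95)
predict(linreg, newdata, interval = "prediction", level = 0.95)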
Output:
> anova(linreg)
Analysis of Variance Table
Response: MPG
           Df Sum Sq Mean Sq F value  Pr(>F)
EngineSize  1 61.397  61.397   11.07 0.01264 *
Residuals   7 38.825   5.546
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> attach(df1)
The following objects are masked from df1 (pos = 3):
EngineSize, MPG
> newdata=data.frame(EngineSize=2.5)
> predict(linreg,newdata,interval="confidence",level=0.95)
fit lwr upr
1 26.2625 24.31728 28.20772
> predict(linreg,newdata,interval="predict",level=0.95)
fit lwr upr
1 26.2625 20.36365 32.16135
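a) Calculate the coefficient of determination.
From the ANOVA table for MPG on EngineSize, SSR = 61.397 and SSE = 38.825, so SST = 100.222.
R2 = SSR/SST = 61.397/100.222 = 0.613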
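As a check on part a), the coefficient of determination can be recomputed directly from the data. The following is a minimal sketch in R; the vector names x and y and the object name fit are illustrative choices, not names used in the solution above.

```r
# Data from the table in the problem statement
x <- c(2.4, 2.2, 2.2, 3.4, 3.6, 2.1, 2.5, 2.1, 3.9)  # engine size
y <- c(25, 31, 24, 21, 24, 29, 25, 29, 21)            # MPG

fit <- lm(y ~ x)                 # simple linear regression of MPG on engine size
sst <- sum((y - mean(y))^2)      # total sum of squares
sse <- sum(residuals(fit)^2)     # residual (error) sum of squares
r2  <- 1 - sse / sst             # coefficient of determination
round(r2, 3)

# Cross-check against the value stored in the model summary
all.equal(r2, summary(fit)$r.squared)
```

The value agrees with SSR/SST taken from the ANOVA table, since SSR = SST - SSE.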
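b) Using α=0.05, test the significance of the population coefficient of determination.
Determine the null and alternative hypotheses.
Ho: ρ² = 0 (engine size explains none of the variation in MPG)
Ha: ρ² > 0 (engine size explains some of the variation in MPG)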
c) The F-test statistic is? (Round to two decimal places as needed.)
F=11.07
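d) The p-value is? (Round to three decimal places as needed.)
p=0.013
Since p = 0.013 is less than α = 0.05, we reject Ho and conclude that the population coefficient of determination is greater than zero.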
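The F statistic and p-value in parts c) and d) can be reproduced from the sums of squares in the ANOVA table. This is a small sketch, assuming the rounded values SSR = 61.397 and SSE = 38.825 and n = 9 observations; the variable names are illustrative.

```r
ssr <- 61.397   # regression sum of squares (1 degree of freedom)
sse <- 38.825   # residual sum of squares (n - 2 degrees of freedom)
n   <- 9        # number of sedans in the table

# F = MSR / MSE
f_stat <- (ssr / 1) / (sse / (n - 2))

# Upper-tail probability of the F(1, n - 2) distribution
p_val <- pf(f_stat, df1 = 1, df2 = n - 2, lower.tail = FALSE)

round(f_stat, 2)
round(p_val, 3)
```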
e) Construct a 95% confidence interval for the average MPG of a 2.5-cubic liter engine.
UCL= 28.21
LCL= 24.32
f) Construct a 95% prediction interval for the MPG of a 2.5-cubic liter engine.
UPL= 32.16
LPL= 20.36
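The intervals in parts e) and f) can also be computed by hand from the usual formulas: the estimate at x0 plus or minus t(0.025, n-2) * s * sqrt(h) for the mean response, and the same with sqrt(1 + h) for a single new observation, where h = 1/n + (x0 - xbar)^2 / Sxx. The sketch below is one way to do this in R; all variable names are illustrative and not part of the solution above.

```r
# Data from the table in the problem statement
x <- c(2.4, 2.2, 2.2, 3.4, 3.6, 2.1, 2.5, 2.1, 3.9)  # engine size
y <- c(25, 31, 24, 21, 24, 29, 25, 29, 21)            # MPG
n  <- length(x)
x0 <- 2.5                                             # engine size of interest

fit   <- lm(y ~ x)
s     <- sqrt(sum(residuals(fit)^2) / (n - 2))        # residual standard error
y0    <- sum(coef(fit) * c(1, x0))                    # fitted MPG at x0
tcrit <- qt(0.975, df = n - 2)                        # two-sided 95% critical value
h     <- 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2)

conf_int <- y0 + c(-1, 1) * tcrit * s * sqrt(h)       # interval for the average MPG
pred_int <- y0 + c(-1, 1) * tcrit * s * sqrt(1 + h)   # interval for one sedan's MPG

round(conf_int, 2)
round(pred_int, 2)
```

The prediction interval is wider because it adds the variability of an individual sedan around the regression line to the uncertainty in the line itself.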