Homeowners across the nation are concerned with how much their electric bill is each month. The American Housing Survey collects that information along with a whole host of other variables including the number of bedrooms in a house. Can the number of bedrooms in a house help us predict the electric bill? Twenty randomly selected houses produced the key numeric measures below where x = number of bedrooms and y = amount of electric bill in dollars.
n=20
Σx = 58 (sum x)
Σy = 2690 (sum y)
Σx² = 190 (sum x^2)
Σy² = 489700 (sum y^2)
Σxy = 8670 (sum xy)
SSxx = 21.8
SSyy = 127895
SSxy = 869
1. Find the sample slope coefficient for the estimated simple linear regression equation between number of bedrooms and the electric bill.
2. Find the y-intercept for the estimated simple linear regression equation between number of bedrooms and electric bill.
3. Write the estimated simple linear regression equation between number of bedrooms and electric bill.
4. Predict the amount of the electric bill in a house with 4 bedrooms.
5. Find the Sum of Squares due to Regression for the estimated simple linear regression equation between number of bedrooms and electric bill.
6. Find the Sum of Squares due to Error for the estimated simple linear regression equation between number of bedrooms and electric bill.
7. Find the coefficient of determination for the estimated simple linear regression equation between number of bedrooms and electric bill.
8. Interpret the coefficient of determination for the estimated simple linear regression equation between number of bedrooms and electric bill.
9. Find the correlation coefficient for the estimated simple linear regression equation between number of bedrooms and electric bill.
10. Find the Mean Square Error for the estimated simple linear regression equation between number of bedrooms and electric bill.
11. Find the standard error of the estimate for the estimated simple linear regression equation between number of bedrooms and electric bill.
12. Find the standard error of the sample slope coefficient for the estimated simple linear regression equation between number of bedrooms and electric bill.
13. Find the standard error of ŷ* when given x* = 4
(More exact wording: Find the standard error of the mean electric bill for houses with 4 bedrooms)
14. Find the standard error of y* when given x*=4
(More exact wording: Find the standard error of the electric bill for a single house with 4 bedrooms)
15. Use a hypothesis test to try to prove that there is a useful linear relationship between the number of bedrooms and the electric bill using a significance level of 0.05.
16. Construct a 95% confidence interval for the population slope coefficient of the simple linear regression equation between number of bedrooms and electric bill. (you do not need to interpret)
17. Construct a 95% confidence interval for the mean electric bill for all houses with 4 bedrooms. (you do not need to interpret)
18. Construct a 95% prediction interval for the electric bill for a single house with 4 bedrooms. (you do not need to interpret)
Ʃx = | 58 |
Ʃy = | 2690 |
Ʃxy = | 8670 |
Ʃx² = | 190 |
Ʃy² = | 489700 |
Sample size, n = | 20 |
x̅ = Ʃx/n = 58/20 = | 2.9 |
y̅ = Ʃy/n = 2690/20 = | 134.5 |
SSxx = Ʃx² - (Ʃx)²/n = 190 - (58)²/20 = | 21.8 |
SSyy = Ʃy² - (Ʃy)²/n = 489700 - (2690)²/20 = | 127895 |
SSxy = Ʃxy - (Ʃx)(Ʃy)/n = 8670 - (58)(2690)/20 = | 869 |
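The sums of squares above follow directly from the five raw totals. A minimal Python sketch of the same arithmetic (the variable names are illustrative, not part of the original problem):

```python
# Raw totals given in the problem statement
n = 20
sum_x, sum_y = 58, 2690
sum_x2, sum_y2, sum_xy = 190, 489700, 8670

# Sample means
x_bar = sum_x / n
y_bar = sum_y / n

# Corrected sums of squares and cross-products
ss_xx = sum_x2 - sum_x**2 / n
ss_yy = sum_y2 - sum_y**2 / n
ss_xy = sum_xy - sum_x * sum_y / n

print(x_bar, y_bar, round(ss_xx, 4), ss_yy, ss_xy)
```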
1.
Slope, b1 = SSxy/SSxx = 869/21.8 = 39.86238532
2.
y-intercept, b0 = y̅ - b1*x̅ = 134.5 - (39.86239)*2.9 = 18.89908257
3.
Regression equation :
ŷ = 18.8991 + (39.8624) x
4.
Predicted value of y at x = 4
ŷ = 18.8991 + (39.8624) * 4 = 178.3486
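Steps 1 to 4 can be checked with a few lines of Python. This is only a sketch; the names `b0`, `b1`, and `predict` are chosen here for illustration:

```python
# Quantities from the summary table
ss_xx, ss_xy = 21.8, 869.0
x_bar, y_bar = 2.9, 134.5

# Least squares slope and intercept
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

def predict(x):
    """Fitted electric bill (dollars) for a house with x bedrooms."""
    return b0 + b1 * x

print(round(b1, 4), round(b0, 4), round(predict(4), 4))
```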
5.
SSR = SSxy²/SSxx = (869)²/21.8 = 34640.4128
6.
SSE = SSyy -SSxy²/SSxx = 127895 - (869)²/21.8 = 93254.5872
7.
Coefficient of determination, r² = (SSxy)²/(SSxx*SSyy) = (869)²/(21.8*127895) = 0.2709
8.
About 27.09% of the variation in the electric bill (y) is explained by the least squares regression on the number of bedrooms (x).
9.
Correlation coefficient, r = SSxy/√(SSxx*SSyy) = 869/√(21.8*127895) = 0.5204
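Steps 5 to 9 all come from the same three sums of squares, which the following sketch makes explicit (variable names are illustrative). The sign of r is taken from SSxy, since r² alone does not carry the direction of the relationship:

```python
import math

ss_xx, ss_yy, ss_xy = 21.8, 127895.0, 869.0

ssr = ss_xy**2 / ss_xx          # sum of squares due to regression
sse = ss_yy - ssr               # sum of squares due to error
r2 = ssr / ss_yy                # coefficient of determination
r = math.copysign(math.sqrt(r2), ss_xy)  # correlation, signed like the slope

print(round(ssr, 4), round(sse, 4), round(r2, 4), round(r, 4))
```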
10.
Estimate of variance, MSE = SSE/(n-2) = 93254.58716/(20-2) = 5180.8104
11.
Standard error, se = √(SSE/(n-2)) = √(93254.58716/(20-2)) = 71.97785
12.
Standard error for slope, se(b1) = se/√SSxx = 71.97785/√21.8 = 15.41596
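A short sketch of steps 10 to 12, again with illustrative variable names. It recomputes SSE from the sums of squares so that no rounded intermediate value is carried forward:

```python
import math

n = 20
ss_xx, ss_yy, ss_xy = 21.8, 127895.0, 869.0

sse = ss_yy - ss_xy**2 / ss_xx
mse = sse / (n - 2)             # estimate of the error variance
se = math.sqrt(mse)             # standard error of the estimate
se_b1 = se / math.sqrt(ss_xx)   # standard error of the slope

print(round(mse, 4), round(se, 5), round(se_b1, 5))
```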
13.
Standard error of ŷ* (the mean electric bill) when given x* = 4
Sŷ* = se*√((1/n) + ((x* - x̅)²/SSxx)) = 71.9778*√((1/20) + ((4 - 2.9)²/21.8)) = 23.3795
14.
Standard error of y* (the electric bill of a single house) when given x* = 4
Sy* = se*√(1 + (1/n) + ((x* - x̅)²/SSxx)) = 71.9778*√(1 + (1/20) + ((4 - 2.9)²/21.8)) = 75.67965
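The two standard errors in steps 13 and 14 differ only by the extra "1 +" under the square root, which accounts for the variability of a single house around the mean. A sketch under that reading (the names `leverage`, `se_mean`, and `se_single` are introduced here for illustration):

```python
import math

n, x_bar = 20, 2.9
ss_xx, ss_yy, ss_xy = 21.8, 127895.0, 869.0
x_star = 4

se = math.sqrt((ss_yy - ss_xy**2 / ss_xx) / (n - 2))

# Shared term: 1/n + (x* - x_bar)^2 / SSxx
leverage = 1 / n + (x_star - x_bar) ** 2 / ss_xx

se_mean = se * math.sqrt(leverage)        # mean bill at x* bedrooms
se_single = se * math.sqrt(1 + leverage)  # one house with x* bedrooms

print(round(se_mean, 4), round(se_single, 4))
```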
15.
Null and alternative hypothesis:
Ho: β₁ = 0
Ha: β₁ ≠ 0
n = 20
α = 0.05
Test statistic:
t = b1/se(b1) = 39.8624/15.4160 = 2.5858
df = n-2 = 18
p-value = T.DIST.2T(ABS(2.5858), 18) = 0.0186
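Conclusion:
p-value = 0.0186 < α = 0.05, so reject the null hypothesis.
There is sufficient evidence of a useful linear relationship between the number of bedrooms and the electric bill.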
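The test statistic and p-value can be reproduced without a spreadsheet. The sketch below is not the Excel T.DIST.2T routine used above; as a stand-in it integrates the Student t density numerically with Simpson's rule, using only Python's standard library. The function names are illustrative:

```python
import math

n = 20
ss_xx, ss_yy, ss_xy = 21.8, 127895.0, 869.0
alpha = 0.05

b1 = ss_xy / ss_xx
se = math.sqrt((ss_yy - ss_xy**2 / ss_xx) / (n - 2))
t_stat = b1 / (se / math.sqrt(ss_xx))
df = n - 2

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """P(|T| > |t|), via Simpson's rule on [0, |t|]; steps must be even."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # P(-|t| < T < |t|) by symmetry
    return 1 - central_area

p_value = two_tailed_p(t_stat, df)
reject_h0 = p_value < alpha
print(round(t_stat, 4), round(p_value, 4), reject_h0)
```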
16.
Critical value, tc = T.INV.2T(0.05, 18) = 2.1009
95% Confidence interval for slope:
Lower limit = b1 - tc*se/√SSxx = 39.8624 - 2.1009*71.9778/√21.8 = 7.4747
Upper limit = b1 + tc*se/√SSxx = 39.8624 + 2.1009*71.9778/√21.8 = 72.2501
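A sketch of the slope interval, taking the critical value 2.1009 from the solution as given rather than computing it. Because that value is rounded to four decimals, the limits agree with the ones above only to about three decimal places:

```python
import math

n = 20
ss_xx, ss_yy, ss_xy = 21.8, 127895.0, 869.0
t_c = 2.1009  # rounded critical value quoted in the solution (df = 18)

b1 = ss_xy / ss_xx
se = math.sqrt((ss_yy - ss_xy**2 / ss_xx) / (n - 2))
se_b1 = se / math.sqrt(ss_xx)

margin = t_c * se_b1
slope_ci = (b1 - margin, b1 + margin)
print(tuple(round(v, 3) for v in slope_ci))
```

The interval excludes zero, which is consistent with rejecting Ho: β₁ = 0 at the 0.05 level.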
17.
95% Confidence interval for the mean electric bill at x* = 4:
Lower limit = ŷ - tc*se*√((1/n) + ((x* - x̅)²/SSxx)) = 178.3486 - 2.1009*71.9778*√((1/20) + ((4 - 2.9)²/21.8)) = 129.2302
Upper limit = ŷ + tc*se*√((1/n) + ((x* - x̅)²/SSxx)) = 178.3486 + 2.1009*71.9778*√((1/20) + ((4 - 2.9)²/21.8)) = 227.4671
18.
95% Prediction interval for the electric bill of a single house at x* = 4:
Lower limit = ŷ - tc*se*√(1 + (1/n) + ((x* - x̅)²/SSxx)) = 178.3486 - 2.1009*71.9778*√(1 + (1/20) + ((4 - 2.9)²/21.8)) = 19.3516
Upper limit = ŷ + tc*se*√(1 + (1/n) + ((x* - x̅)²/SSxx)) = 178.3486 + 2.1009*71.9778*√(1 + (1/20) + ((4 - 2.9)²/21.8)) = 337.3457
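Both intervals at x* = 4 share the same center and critical value and differ only in the standard error. A sketch that makes this explicit with a small helper (the `interval` function and variable names are illustrative, and the rounded critical value 2.1009 is taken from the solution, so the limits may differ from those above in the third decimal place):

```python
import math

n, x_bar, y_bar = 20, 2.9, 134.5
ss_xx, ss_yy, ss_xy = 21.8, 127895.0, 869.0
t_c = 2.1009  # rounded critical value quoted in the solution (df = 18)
x_star = 4

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
y_hat = b0 + b1 * x_star
se = math.sqrt((ss_yy - ss_xy**2 / ss_xx) / (n - 2))
leverage = 1 / n + (x_star - x_bar) ** 2 / ss_xx

def interval(center, std_err, t):
    """Symmetric interval: center plus or minus t times std_err."""
    margin = t * std_err
    return center - margin, center + margin

ci_mean = interval(y_hat, se * math.sqrt(leverage), t_c)        # mean bill
pi_single = interval(y_hat, se * math.sqrt(1 + leverage), t_c)  # single house

print(tuple(round(v, 2) for v in ci_mean))
print(tuple(round(v, 2) for v in pi_single))
```

The prediction interval is much wider than the confidence interval because it also has to cover the scatter of individual houses around the regression line.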