Part 1. Consider the dataset below. You will perform a series of regressions and data transformations; be sure to keep a record of all your computer results. First, perform a simple linear regression and predict Y if X = 40. To avoid rounding errors, perform ALL calculations in your spreadsheet, referencing the values from your regression output.
| X | Y |
|---|---|
| 54 | 6 |
| 42 | 16 |
| 28 | 33 |
| 38 | 18 |
| 25 | 41 |
| 70 | 3 |
| 48 | 10 |
| 41 | 14 |
| 20 | 45 |
| 52 | 9 |
| 65 | 5 |
A) 14.0363
B) 14.1891
C) 17.2164
D) 21.5627
E) None of the above
Part 2.
Please perform a polynomial regression of Y against X and X-squared. What is the coefficient of the curvature term?
A) 92.8725
B) -2.7222
C) 0.0208
D) 0.9851
E) None of the above
Part 3.
With other OLS regression conditions satisfied, can we utilize the estimated equation of this model to predict Y?
A) Yes. The regression is statistically significant and the coefficient of determination is reasonably high.
B) No. Although the regression is significant, there is a high likelihood that multicollinearity is a problem due to the inclusion of the X-squared term.
C) No. The regression model is nonlinear and cannot therefore be utilized to make a forecast.
D) No. The standard error of the regression is too high, indicating that the unexplained variation (error term) exhibits heteroscedasticity.
E) None of the above
Part 4.
Based on the parameter estimates of the quadratic model, predict Y if X = 40.
A) 14.0363
B) 14.1891
C) 17.2164
D) 21.5627
E) 19.1523
Part 5.
Perform a logarithmic transformation of only Y on the original dataset. That is, ln(Y) = B0 + B1(X) + e. Then predict Y if X = 40.
A) 21.5627
B) 17.2164
C) 2.7885
part 1)

Simple linear regression output for Y on X (spreadsheet SUMMARY OUTPUT):

| Regression Statistics | |
|---|---|
| Multiple R | 0.932145801 |
| R Square | 0.868895795 |
| Adjusted R Square | 0.854328661 |
| Standard Error | 5.64255772 |
| Observations | 11 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 1899.090245 | 1899.090245 | 59.64768355 | 2.92936E-05 |
| Residual | 9 | 286.5461186 | 31.83845762 | | |
| Total | 10 | 2185.636364 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
|---|---|---|---|---|---|
| Intercept | 56.15733314 | 5.203079582 | 10.79309518 | 1.88916E-06 | 44.3871494 |
| X | -0.8648668 | 0.111983087 | -7.72319128 | 2.92936E-05 | -1.118190142 |

ŷ = 56.1573 - 0.8649X
At X = 40: ŷ = 56.1573 - 0.8649(40) = 21.5627
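As a cross-check on the spreadsheet output, here is a minimal Python sketch that refits the same line and reproduces the Part 1 prediction. The assignment itself uses a spreadsheet; numpy, polyfit, and the variable names below are my own choices for illustration.

```python
import numpy as np

# Original dataset from Part 1
X = np.array([54, 42, 28, 38, 25, 70, 48, 41, 20, 52, 65], dtype=float)
Y = np.array([6, 16, 33, 18, 41, 3, 10, 14, 45, 9, 5], dtype=float)

# Least-squares fit of Y = b0 + b1*X (polyfit returns highest degree first)
b1, b0 = np.polyfit(X, Y, 1)
print(b0, b1)        # approx. 56.1573 and -0.8649, matching the output above
print(b0 + b1 * 40)  # approx. 21.5627
```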
part 2)
| Y | X | X^2 |
|---|---|---|
| 6 | 54 | 2916 |
| 16 | 42 | 1764 |
| 33 | 28 | 784 |
| 18 | 38 | 1444 |
| 41 | 25 | 625 |
| 3 | 70 | 4900 |
| 10 | 48 | 2304 |
| 14 | 41 | 1681 |
| 45 | 20 | 400 |
| 9 | 52 | 2704 |
| 5 | 65 | 4225 |
Quadratic regression output for Y on X and X^2 (spreadsheet SUMMARY OUTPUT):

| Regression Statistics | |
|---|---|
| Multiple R | 0.994003876 |
| R Square | 0.988043706 |
| Adjusted R Square | 0.985054633 |
| Standard Error | 1.807349945 |
| Observations | 11 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 2159.504253 | 1079.752 | 330.5518 | 2.04E-08 |
| Residual | 8 | 26.1321106 | 3.266514 | | |
| Total | 10 | 2185.636364 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
|---|---|---|---|---|---|
| Intercept | 92.87251293 | 4.436918394 | 20.93176 | 2.85E-08 | 82.64096 |
| X | -2.722212931 | 0.211088773 | -12.8961 | 1.24E-06 | -3.20898 |
| X^2 | 0.020770253 | 0.002326226 | 8.928735 | 1.96E-05 | 0.015406 |
Coefficient of the curvature term (X^2) = 0.0208.
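The same quadratic fit can be cross-checked in Python by building a design matrix with intercept, X, and X^2 columns and solving the least-squares problem directly. This is a sketch under the same assumptions as the previous snippet (numpy, illustrative names), not the spreadsheet procedure itself.

```python
import numpy as np

X = np.array([54, 42, 28, 38, 25, 70, 48, 41, 20, 52, 65], dtype=float)
Y = np.array([6, 16, 33, 18, 41, 3, 10, 14, 45, 9, 5], dtype=float)

# Design matrix: intercept column, X, and the curvature term X^2
A = np.column_stack([np.ones_like(X), X, X**2])
b0, b1, b2 = np.linalg.lstsq(A, Y, rcond=None)[0]
print(b0, b1, b2)  # approx. 92.8725, -2.7222, 0.0208 (curvature term)
```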
part 3)
A) Yes. The regression is statistically significant and the coefficient of determination is reasonably high.
part 4)
ŷ = 92.87251293 - 2.722212931(40) + 0.020770253(40^2)
  = 92.8725 - 108.8885 + 33.2324
  = 17.2164 (17.21640086 without rounding)
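A quick sketch of the same substitution in plain Python, using the coefficients from the quadratic regression output above (the variable names are illustrative):

```python
# Coefficients taken from the Part 2 regression output
b0, b1, b2 = 92.87251293, -2.722212931, 0.020770253

x_new = 40.0
y_hat = b0 + b1 * x_new + b2 * x_new**2
print(y_hat)  # approx. 17.2164
```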
part 5)
| ln(Y) | Y | X |
|---|---|---|
| 1.791759 | 6 | 54 |
| 2.772589 | 16 | 42 |
| 3.496508 | 33 | 28 |
| 2.890372 | 18 | 38 |
| 3.713572 | 41 | 25 |
| 1.098612 | 3 | 70 |
| 2.302585 | 10 | 48 |
| 2.639057 | 14 | 41 |
| 3.806662 | 45 | 20 |
| 2.197225 | 9 | 52 |
| 1.609438 | 5 | 65 |
Log-linear regression output for ln(Y) on X (spreadsheet SUMMARY OUTPUT):

| Regression Statistics | |
|---|---|
| Multiple R | 0.991254363 |
| R Square | 0.982585213 |
| Adjusted R Square | 0.980650236 |
| Standard Error | 0.122438958 |
| Observations | 11 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 7.612614045 | 7.612614 | 507.8022 | 3.16E-09 |
| Residual | 9 | 0.134921686 | 0.014991 | | |
| Total | 10 | 7.747535731 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 4.978748604 | 0.112902636 | 44.09772 | 7.92E-12 | 4.723345 | 5.234152 | 4.723345 | 5.234152 |
| X | -0.054757465 | 0.002429943 | -22.5345 | 3.16E-09 | -0.06025 | -0.04926 | -0.06025 | -0.04926 |
ln(ŷ) = 4.9787 - 0.0548X
At X = 40: ln(ŷ) = 4.978749 - 0.054757(40) = 2.78845
ŷ = e^2.78845 = 16.2558 (16.25580413 from the spreadsheet)

Note that 2.7885 is the prediction on the ln(Y) scale; the question asks for Y itself, so the fitted value must be exponentiated, giving 16.2558.
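The Part 5 model and back-transformation can be cross-checked with the same approach (a sketch, again assuming numpy): regress ln(Y) on X, then exponentiate the fitted value at X = 40.

```python
import numpy as np

X = np.array([54, 42, 28, 38, 25, 70, 48, 41, 20, 52, 65], dtype=float)
Y = np.array([6, 16, 33, 18, 41, 3, 10, 14, 45, 9, 5], dtype=float)

# Fit ln(Y) = b0 + b1*X
b1, b0 = np.polyfit(X, np.log(Y), 1)
print(b0, b1)            # approx. 4.9787 and -0.0548

ln_y_hat = b0 + b1 * 40  # approx. 2.7885 on the log scale
print(np.exp(ln_y_hat))  # approx. 16.2558 back on the Y scale
```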