1. A researcher would like to predict the dependent variable Y
from the two independent variables X1 and X2 for a sample of
N = 15 subjects. Use multiple linear regression to calculate the
coefficient of multiple determination and test the significance of
the overall regression model. Use a significance level of
α = 0.05.
| X1 | X2 | Y |
|---|---|---|
| 66.4 | 76.4 | 58 |
| 34.6 | 39 | 65.5 |
| 32.7 | 23.1 | 65.8 |
| 44.4 | 71.2 | 73.3 |
| 57.3 | 50.8 | 57.9 |
| 32.7 | 48 | 74.6 |
| 53.3 | 51.4 | 64.4 |
| 48.3 | 51.1 | 59.2 |
| 66.9 | 81.4 | 59.4 |
| 31.2 | 40.1 | 59.1 |
| 44.6 | 33.1 | 54.8 |
| 45.5 | 49.7 | 65.7 |
| 30.5 | 9.9 | 44.4 |
| 62.8 | 75.3 | 66.7 |
| 56.1 | 61 | 51.2 |
SSreg =
SSres =
R² =
F =
P-value =
2. Complete the missing information for this regression model. Note: N = 23.
Ŷ = 29.692 + 2.311 X1 − 1.374 X2 + ____ X3

| Row | Values given (remaining entries are blank) |
|---|---|
| Standard Errors | 2.195, 1.005, 1.016, 1.415 |
| t-ratios | 13.527, 2.3 |
| P-values | 0.059 |
(Except for P-values, report all values accurate to 3 decimal places. For P-values, report accurate to 4 decimal places.)
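The blanks in question 2 are tied together by two standard relations: each t-ratio is the estimate divided by its standard error, and each P-value comes from a t distribution with N − k − 1 residual degrees of freedom. The R sketch below applies them to the three terms whose estimate and standard error are both printed. It assumes two-sided P-values and k = 3 predictors; the names `est`, `se`, `df_res`, `t_from_p` and `estimate_from_t` are illustrative and not part of the question.

```r
# Estimates and standard errors as printed in the question.
# The X3 coefficient is blank there, so X3 is left out.
est <- c(Intercept = 29.692, X1 = 2.311, X2 = -1.374)
se  <- c(Intercept = 2.195,  X1 = 1.005, X2 = 1.016)

# Residual degrees of freedom: N - k - 1 (assumption: k = 3 predictors)
N <- 23
k <- 3
df_res <- N - k - 1

# t-ratio = estimate / standard error
t_ratio <- est / se

# Two-sided P-value from the t distribution (assumed convention)
p_value <- 2 * pt(abs(t_ratio), df = df_res, lower.tail = FALSE)

print(round(t_ratio, 3))
print(round(p_value, 4))

# Hypothetical helpers for working backwards when only a P-value or a
# t-ratio is given for a term:
# |t| from a two-sided P-value
t_from_p <- function(p, df) qt(p / 2, df = df, lower.tail = FALSE)
# estimate from a t-ratio and its standard error
estimate_from_t <- function(t, se) t * se
```

The helpers only recover the magnitude of t from a P-value, so the sign of an estimate recovered this way has to come from the printed equation.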
To solve question 1, I used R.
R code and output:
> d=read.table('data.csv',header=T,sep=',')
> head(d)
    X1   X2    Y
1 66.4 76.4 58.0
2 34.6 39.0 65.5
3 32.7 23.1 65.8
4 44.4 71.2 73.3
5 57.3 50.8 57.9
6 32.7 48.0 74.6
> attach(d)
The following objects are masked from d (pos = 3):
X1, X2, Y
> fit=lm(Y~X1+X2)
> summary(fit)
Call:
lm(formula = Y ~ X1 + X2)
Residuals:
    Min      1Q  Median      3Q     Max
-8.9976 -3.0325  0.4088  3.9166  8.1187
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  68.8802     6.0662  11.355 8.93e-08 ***
X1           -0.6906     0.2119  -3.259  0.00684 **
X2            0.4928     0.1358   3.630  0.00345 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.89 on 12 degrees of freedom
Multiple R-squared: 0.528, Adjusted R-squared: 0.4493
F-statistic: 6.711 on 2 and 12 DF, p-value: 0.01106
> anova(fit)
Analysis of Variance Table
Response: Y
          Df Sum Sq Mean Sq F value   Pr(>F)
X1         1   8.57    8.57  0.2469 0.628242
X2         1 457.15  457.15 13.1757 0.003452 **
Residuals 12 416.36   34.70
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
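The same quantities can also be computed from their definitions instead of being read off the printed tables. The sketch below is one way to do that. It types the 15 observations in directly so it does not depend on `data.csv`, and it passes `data = d` to `lm()` rather than using `attach()`. The variable names (`ss_tot`, `ss_res`, `ss_reg`, `f_stat`, `p_val`) are illustrative.

```r
# The 15 observations from the question, entered inline
d <- data.frame(
  X1 = c(66.4, 34.6, 32.7, 44.4, 57.3, 32.7, 53.3, 48.3,
         66.9, 31.2, 44.6, 45.5, 30.5, 62.8, 56.1),
  X2 = c(76.4, 39, 23.1, 71.2, 50.8, 48, 51.4, 51.1,
         81.4, 40.1, 33.1, 49.7, 9.9, 75.3, 61),
  Y  = c(58, 65.5, 65.8, 73.3, 57.9, 74.6, 64.4, 59.2,
         59.4, 59.1, 54.8, 65.7, 44.4, 66.7, 51.2)
)

# Same model as in the transcript, without attach()
fit <- lm(Y ~ X1 + X2, data = d)

n <- nrow(d)   # sample size
k <- 2         # number of predictors

ss_tot <- sum((d$Y - mean(d$Y))^2)   # total sum of squares
ss_res <- sum(residuals(fit)^2)      # residual sum of squares
ss_reg <- ss_tot - ss_res            # regression sum of squares

r2     <- ss_reg / ss_tot                         # coefficient of multiple determination
f_stat <- (ss_reg / k) / (ss_res / (n - k - 1))   # overall F statistic
p_val  <- pf(f_stat, k, n - k - 1, lower.tail = FALSE)

print(round(c(SSreg = ss_reg, SSres = ss_res, R2 = r2, F = f_stat, p = p_val), 4))
```

The printed values should agree with the `summary(fit)` and `anova(fit)` output above up to rounding.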
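SSreg = SS(X1) + SS(X2) = 8.57 + 457.15 = 465.72 (the sequential sums of squares in the anova(fit) table)
SSres = 416.36 (the Residuals row of the anova(fit) table)
R² = 0.528
F = 6.711 on 2 and 12 degrees of freedom (from the last line of the summary(fit) output)
P-value = 0.01106
Since the P-value 0.01106 is less than α = 0.05, the overall regression model is statistically significant.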
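As a check, R² and F can be recovered by hand from the two sums of squares. The worked arithmetic below uses the rounded values from the output, with N = 15 subjects. The symbols k (number of predictors, here 2) and SS_tot (total sum of squares) are notation introduced for this check.

```latex
\begin{align*}
SS_{\text{tot}} &= SS_{\text{reg}} + SS_{\text{res}} = 465.72 + 416.36 = 882.08 \\[4pt]
R^2 &= \frac{SS_{\text{reg}}}{SS_{\text{tot}}} = \frac{465.72}{882.08} \approx 0.528 \\[4pt]
F &= \frac{SS_{\text{reg}}/k}{SS_{\text{res}}/(N-k-1)}
   = \frac{465.72/2}{416.36/12}
   \approx \frac{232.86}{34.70} \approx 6.711
\end{align*}
```

Both results match the Multiple R-squared and F-statistic lines reported by R.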