Data from n = 113 hospitals in the United States are used to assess factors related to the likelihood that a hospital patient acquires an infection while hospitalized. The variables are y = infection risk, x1 = average length of patient stay, x2 = average patient age, and x3 = a measure of how many x-rays are given in the hospital. The Minitab output is as follows:
Regression Analysis: InfctRsk versus Stay, Age, Xray
Analysis of Variance
Source | DF | Adj SS | Adj MS | F-Value | P-Value
Regression | 3 | 73.099 | 24.366 | 20.70 | 0.000
Stay | 1 | 31.684 | 31.684 | 26.92 | 0.000
Age | 1 | 1.126 | 1.126 | 0.96 | 0.330
Xray | 1 | 13.719 | 13.719 | 11.66 | 0.001
Error | 109 | 128.281 | 1.177 | |
Total | 112 | 201.380 | | |
Model Summary
S | R-sq | R-sq(adj) | R-sq(pred)
1.08484 | 36.30% | 34.55% | 30.64%
Coefficients
Term | Coef | SE Coef | T-Value | P-Value | VIF
Constant | 1.00 | 1.31 | 0.76 | 0.448 |
Stay | 0.3082 | 0.0594 | 5.19 | 0.000 | 1.23
Age | -0.0230 | 0.0235 | -0.98 | 0.330 | 1.05
Xray | 0.01966 | 0.00576 | 3.41 | 0.001 | 1.18
Regression Equation
InfctRsk = 1.00 + 0.3082 Stay - 0.0230 Age + 0.01966 Xray
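The fitted equation can be evaluated directly to obtain a predicted infection risk. The short Python sketch below does this; the function name and the input values (a stay of 10 days, an average age of 50, an x-ray measure of 80) are hypothetical and chosen only for illustration, they are not taken from the data set.

```python
# Coefficients copied from the Minitab regression equation.
COEF = {"Constant": 1.00, "Stay": 0.3082, "Age": -0.0230, "Xray": 0.01966}


def predict_infection_risk(stay, age, xray):
    """Evaluate the fitted equation for one hospital (hypothetical helper)."""
    return (
        COEF["Constant"]
        + COEF["Stay"] * stay
        + COEF["Age"] * age
        + COEF["Xray"] * xray
    )


# Hypothetical hospital: these inputs are assumptions, not observed data.
example = predict_infection_risk(stay=10.0, age=50.0, xray=80.0)
print(f"Predicted infection risk: {example:.4f}")
```

With the other predictors held fixed, each additional day of average stay raises the predicted infection risk by the Stay coefficient, 0.3082.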
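The regression model being estimated is
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$
where $\beta_0$ is the intercept, $\beta_1, \beta_2, \beta_3$ are the slope coefficients for Stay, Age, and Xray, and $\varepsilon$ is a random error.
The hypotheses are
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$ versus $H_a:$ at least one $\beta_j \neq 0$, $j = 1, 2, 3$.
The test statistic for this test follows an F distribution under the null hypothesis and is obtained from the ANOVA table: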
Source | DF | Adj SS | Adj MS | F-Value | P-Value
Regression | 3 | 73.099 | 24.366 | 20.70 | 0.000
Stay | 1 | 31.684 | 31.684 | 26.92 | 0.000
Age | 1 | 1.126 | 1.126 | 0.96 | 0.330
Xray | 1 | 13.719 | 13.719 | 11.66 | 0.001
Error | 109 | 128.281 | 1.177 | |
Total | 112 | 201.380 | | |
The test statistic is F = 20.70, with 3 and 109 degrees of freedom.
The p-value is reported as 0.000, that is, less than 0.001.
We reject the null hypothesis if the p-value is less than the significance level.
Here the p-value (0.000) is less than the significance level 0.05, so we reject the null hypothesis.
We conclude, at the 5% level of significance, that at least one of the predictors Stay, Age, and Xray is linearly related to infection risk.
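As a check on the overall F test, the F-value can be recomputed from the sums of squares and degrees of freedom in the ANOVA table. The sketch below uses only the rounded figures printed by Minitab, so small rounding differences are expected; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression, df_regression = 73.099, 3
ss_error, df_error = 128.281, 109
ss_total = 201.380

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# Overall F statistic: ratio of regression mean square to error mean square.
f_value = ms_regression / ms_error

print(f"MS Regression = {ms_regression:.3f}")
print(f"MS Error      = {ms_error:.3f}")
print(f"F             = {f_value:.2f}")
```

The regression and error sums of squares also add up to the total sum of squares, which is a quick consistency check on the table.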
Next, for each slope we test the hypotheses $H_0: \beta_j = 0$ versus $H_a: \beta_j \neq 0$, $j = 1, 2, 3$.
Test whether the variable Stay is significant in the relationship: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$.
The test statistic for this two-tailed test is in the coefficient table.
Coefficients
Term | Coef | SE Coef | T-Value | P-Value | VIF
Constant | 1.00 | 1.31 | 0.76 | 0.448 |
Stay | 0.3082 | 0.0594 | 5.19 | 0.000 | 1.23
Age | -0.0230 | 0.0235 | -0.98 | 0.330 | 1.05
Xray | 0.01966 | 0.00576 | 3.41 | 0.001 | 1.18
The test statistic is t = 5.19 and the p-value is 0.000.
We reject the null hypothesis if the p-value is less than the significance level.
Here the p-value (0.000) is less than the significance level 0.05, so we reject the null hypothesis.
We conclude, at the 5% level of significance, that infection risk is related to the predictor Stay, after accounting for Age and Xray.
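Each T-Value in the coefficient table is the coefficient divided by its standard error, and for a single-degree-of-freedom term its square matches the F-Value for that term in the ANOVA table, up to rounding. The sketch below recomputes both from the printed values; the dictionary layout is just an illustrative choice.

```python
# (coefficient, standard error) pairs copied from the coefficient table.
coefficients = {
    "Stay": (0.3082, 0.0594),
    "Age": (-0.0230, 0.0235),
    "Xray": (0.01966, 0.00576),
}

# t statistic for H0: beta_j = 0 is the estimate divided by its standard error.
t_values = {term: coef / se for term, (coef, se) in coefficients.items()}

# For a one-degree-of-freedom term, t squared equals the partial F statistic.
f_from_t = {term: t ** 2 for term, t in t_values.items()}

for term in coefficients:
    print(f"{term}: t = {t_values[term]:.2f}, t^2 = {f_from_t[term]:.2f}")
```

The squared values land close to the ANOVA F-Values 26.92, 0.96, and 11.66; the small differences come from working with rounded table entries.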
Test whether the variable Age is significant in the relationship: $H_0: \beta_2 = 0$ versus $H_a: \beta_2 \neq 0$.
The test statistic for this two-tailed test is in the coefficient table.
The test statistic is t = -0.98 and the p-value is 0.330.
We reject the null hypothesis if the p-value is less than the significance level.
Here the p-value (0.330) is not less than the significance level 0.05, so we do not reject the null hypothesis.
We conclude, at the 5% level of significance, that there is not sufficient evidence of a relationship between infection risk and Age once Stay and Xray are in the model.
Test whether the variable Xray is significant in the relationship: $H_0: \beta_3 = 0$ versus $H_a: \beta_3 \neq 0$.
The test statistic for this two-tailed test is in the coefficient table.
The test statistic is t = 3.41 and the p-value is 0.001.
We reject the null hypothesis if the p-value is less than the significance level.
Here the p-value (0.001) is less than the significance level 0.05, so we reject the null hypothesis.
We conclude, at the 5% level of significance, that infection risk is related to the predictor Xray, after accounting for Stay and Age.
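The same decision rule was applied to all three slopes, so it can be written once and run over the reported p-values. This is a minimal sketch: the p-values are the rounded ones printed by Minitab, and the names `ALPHA`, `decisions`, and `retained` are illustrative.

```python
ALPHA = 0.05  # significance level used throughout the solution

# Two-tailed p-values copied from the coefficient table (rounded by Minitab).
p_values = {"Stay": 0.000, "Age": 0.330, "Xray": 0.001}

# Reject H0: beta_j = 0 when the p-value is below the significance level.
decisions = {
    term: "reject H0" if p < ALPHA else "do not reject H0"
    for term, p in p_values.items()
}

# Predictors whose slopes are significant at the chosen level.
retained = [term for term, p in p_values.items() if p < ALPHA]

for term, decision in decisions.items():
    print(f"{term}: p = {p_values[term]:.3f} -> {decision}")
print("Retained predictors:", retained)
```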
Based on these tests, we drop Age from the model and retain Stay and Xray in the final relationship.
The new model to estimate would be
$y = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \varepsilon$
Give the value of the coefficient of determination and tell what it means
The value of the coefficient of determination is
Model Summary
S | R-sq | R-sq(adj) | R-sq(pred)
1.08484 | 36.30% | 34.55% | 30.64%
Ans: The value of the coefficient of determination is R-sq = 0.3630.
This indicates that 36.30% of the variation in infection risk is explained by the model, that is, by the predictor variables Stay, Age, and Xray together.
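The Model Summary values can be reproduced from the ANOVA table, which shows where the coefficient of determination comes from. The sketch below uses the standard formulas R-sq = SS Regression / SS Total, R-sq(adj) = 1 - MS Error / MS Total, and S = sqrt(MS Error); the inputs are the rounded table entries, so the last digit may differ slightly from Minitab.

```python
import math

# Values copied from the ANOVA table.
ss_regression = 73.099
ss_error = 128.281
ss_total = 201.380
n = 113  # number of hospitals
k = 3    # number of predictors (Stay, Age, Xray)

# Coefficient of determination: share of total variation explained.
r_sq = ss_regression / ss_total

# Adjusted R-sq penalizes for the number of predictors.
r_sq_adj = 1 - (ss_error / (n - k - 1)) / (ss_total / (n - 1))

# S is the square root of the error mean square.
s = math.sqrt(ss_error / (n - k - 1))

print(f"R-sq      = {100 * r_sq:.2f}%")
print(f"R-sq(adj) = {100 * r_sq_adj:.2f}%")
print(f"S         = {s:.4f}")
```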