Construct a regression model for predicting total charges from length of stay for DRG 105.
a. State the null and alternative hypotheses and alpha level.
b. Prepare a scatter diagram with the regression line for the two variables.
c. What are the r and r2? What is the importance of the r and r2 results?
d. What is the regression equation?
e. What are your conclusions?
Patient Gender Age LOS Charges Payor
1 Female 47 20 $91,683 Medicaid
2 Female 75 43 $93,708 Medicare
3 Female 84 7 $21,446 Medicare
4 Female 50 13 $37,797 Medicare
5 Male 77 14 $54,364 Medicare
6 Male 57 4 $17,626 Medicare
7 Male 73 4 $12,832 Medicare
8 Female 56 1 $36,153 Medicaid
9 Male 69 1 $14,907 Medicaid
10 Female 81 23 $104,148 Medicare
11 Male 21 5 $21,423 Medicaid
12 Female 37 5 $24,971 Medicaid
13 Female 69 4 $17,022 Medicare
14 Female 89 17 $50,652 Medicare
15 Male 28 35 $186,496 Medicaid
16 Male 47 6 $24,441 Medicaid
17 Male 87 11 $35,349 Medicare
18 Female 85 5 $22,155 Medicare
19 Male 56 5 $24,455 Managed Care
20 Male 45 11 $36,401 Medicaid
21 Male 82 6 $25,783 Medicare
22 Female 65 10 $37,055 Managed Care
23 Male 67 4 $19,236 Medicare
24 Male 59 23 $60,132 Other
25 Female 67 7 $35,777 Medicare
26 Male 53 4 $19,972 Managed Care
27 Male 71 7 $25,409 Medicare
28 Female 79 6 $281,140 Medicare
29 Male 63 1 $41,283 Medicaid
30 Male 53 19 $71,439 Medicaid
31 Female 75 9 $33,735 Medicare
32 Female 68 9 $37,830 Gov Mngd Care
33 Male 37 4 $22,311 Medicaid
Total N 33 33 33 33 33
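Instead of reading data.csv, you can type the two columns the model needs straight into R as plain numbers. This drops the "$" signs and thousands separators that the Charges column carries in the table. A minimal sketch; the names los, charges and drg105 are my own and do not come from the original file:

```r
# LOS and Charges for the 33 DRG 105 patients, copied from the table
# above with "$" and "," removed so R stores them as numbers
los <- c(20, 43, 7, 13, 14, 4, 4, 1, 1, 23, 5, 5, 4, 17, 35, 6, 11,
         5, 5, 11, 6, 10, 4, 23, 7, 4, 7, 6, 1, 19, 9, 9, 4)
charges <- c(91683, 93708, 21446, 37797, 54364, 17626, 12832, 36153,
             14907, 104148, 21423, 24971, 17022, 50652, 186496, 24441,
             35349, 22155, 24455, 36401, 25783, 37055, 19236, 60132,
             35777, 19972, 25409, 281140, 41283, 71439, 33735, 37830,
             22311)

# One row per patient (hypothetical data frame name)
drg105 <- data.frame(los = los, charges = charges)
str(drg105)      # both columns should be numeric
summary(drg105)
```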
I used R to solve this question.
R code and output:
> d=read.table('data.csv',header=T,sep=',')
> head(d)
LOS Charges
1 20 $91,683
2 43 $93,708
3 7 $21,446
4 13 $37,797
5 14 $54,364
6 4 $17,626
> attach(d)
The following objects are masked from d (pos = 3):
Charges, LOS
> Charge=as.numeric(Charges)
> fit=lm(Charge~LOS)
> summary(fit)
Call:
lm(formula = Charge ~ LOS)
Residuals:
Min 1Q Median 3Q Max
-20.850 -5.925 1.690 7.152 13.614
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.0014 2.3517 5.529 4.72e-06 ***
LOS 0.3847 0.1675 2.297 0.0285 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 9.082 on 31 degrees of freedom
Multiple R-squared: 0.1454, Adjusted R-squared: 0.1179
F-statistic: 5.275 on 1 and 31 DF, p-value: 0.02855
> plot(LOS, Charge)
> abline(fit)
> cor(Charge,LOS)
[1] 0.3813442
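head(d) shows that Charges is stored as text such as $91,683. Before R 4.0, read.table() turns such a column into a factor by default. as.numeric() applied to a factor returns the internal level codes, not the values written in the labels. That explains why the summary above has an intercept of about 13 and residuals within about ±21: the fit is on the scale of codes 1 to 33, not dollars. A minimal sketch of the difference on three values from the table (the variable names are mine):

```r
# Three Charges values as they appear in the CSV file, stored as a factor
raw <- factor(c("$91,683", "$93,708", "$21,446"))

# as.numeric() on a factor returns level codes; the levels sort as text,
# so "$21,446" is level 1, "$91,683" level 2 and "$93,708" level 3
codes <- as.numeric(raw)
print(codes)    # 2 3 1

# Strip "$" and "," from the labels first, then convert to numbers
dollars <- as.numeric(gsub("[$,]", "", as.character(raw)))
print(dollars)  # 91683 93708 21446
```

To apply the same fix in the transcript, replace Charge=as.numeric(Charges) with Charge=as.numeric(gsub("[$,]", "", Charges)) before calling lm(). gsub() coerces a factor to its labels, so this works whether Charges is a factor or a character column.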
Que.a
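Hypotheses (α = 0.05):
H0: β1 = 0. The slope of LOS is zero, i.e., the fitted regression model is not statistically significant.
H1: β1 ≠ 0. The slope of LOS is nonzero, i.e., the fitted regression model is statistically significant.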
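For this data, the hypotheses concern the slope of a simple linear model, and the test statistic is a t ratio with n − 2 degrees of freedom. In simple linear regression the overall F test is equivalent to this t test (F = t²). So "the model is significant" and "the slope is nonzero" are the same claim. These are the standard textbook formulas:

```latex
\begin{aligned}
&\text{Charges}_i = \beta_0 + \beta_1\,\text{LOS}_i + \varepsilon_i, \qquad i = 1,\dots,n,\quad n = 33\\
&H_0:\ \beta_1 = 0 \qquad H_1:\ \beta_1 \neq 0 \qquad \alpha = 0.05\\
&t = \frac{\hat\beta_1}{\operatorname{SE}(\hat\beta_1)},\qquad
\operatorname{SE}(\hat\beta_1) = \frac{s}{\sqrt{\sum_i \bigl(\text{LOS}_i - \overline{\text{LOS}}\bigr)^2}},\qquad
s^2 = \frac{1}{n-2}\sum_i \bigl(\text{Charges}_i - \widehat{\text{Charges}}_i\bigr)^2\\
&\text{Reject } H_0 \text{ if } p = 2\,P\bigl(T_{n-2} \ge |t|\bigr) < \alpha,\qquad F = t^2
\end{aligned}
```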
Que.b
Scatter plot of total charges against LOS, with the fitted regression line added by abline(fit):
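A minimal sketch of the scatter diagram drawn from the dollar amounts rather than from converted factor codes. The data are typed from the table. The axis labels, title and line colour are my own choices, and "days" as the LOS unit is an assumption:

```r
los <- c(20, 43, 7, 13, 14, 4, 4, 1, 1, 23, 5, 5, 4, 17, 35, 6, 11,
         5, 5, 11, 6, 10, 4, 23, 7, 4, 7, 6, 1, 19, 9, 9, 4)
charges <- c(91683, 93708, 21446, 37797, 54364, 17626, 12832, 36153,
             14907, 104148, 21423, 24971, 17022, 50652, 186496, 24441,
             35349, 22155, 24455, 36401, 25783, 37055, 19236, 60132,
             35777, 19972, 25409, 281140, 41283, 71439, 33735, 37830,
             22311)

# Least-squares fit of total charges on length of stay
fit <- lm(charges ~ los)

# Scatter diagram with the fitted regression line overlaid
plot(los, charges, pch = 19,
     xlab = "Length of stay (days)", ylab = "Total charges ($)",
     main = "DRG 105: total charges vs. length of stay")
abline(fit, col = "red", lwd = 2)
```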
Que.c
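Correlation coefficient = r ≈ 0.480
Coefficient of determination = r2 ≈ 0.230
These values are computed from the dollar amounts. The transcript's r = 0.3813 and r2 = 0.1454 were computed on factor level codes. Because Charges is stored as text such as $91,683, read.table() reads it as a factor (the default before R 4.0), and as.numeric(Charges) then returns the codes 1 to 33 instead of the charges. Converting with as.numeric(gsub("[$,]", "", Charges)) avoids this.
The correlation coefficient (r) measures the strength and direction of the linear relationship between two variables. Here r ≈ 0.48 indicates a moderate positive relationship between LOS and total charges. The coefficient of determination (r2) is the proportion of the variation in one variable that is explained by the other. LOS explains about 23% of the variation in total charges, and about 77% remains unexplained.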
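For simple linear regression, r2 is the square of Pearson's r. It is also one minus the ratio of residual to total sum of squares. A sketch that checks both identities on the dollar amounts (the variable names are mine):

```r
los <- c(20, 43, 7, 13, 14, 4, 4, 1, 1, 23, 5, 5, 4, 17, 35, 6, 11,
         5, 5, 11, 6, 10, 4, 23, 7, 4, 7, 6, 1, 19, 9, 9, 4)
charges <- c(91683, 93708, 21446, 37797, 54364, 17626, 12832, 36153,
             14907, 104148, 21423, 24971, 17022, 50652, 186496, 24441,
             35349, 22155, 24455, 36401, 25783, 37055, 19236, 60132,
             35777, 19972, 25409, 281140, 41283, 71439, 33735, 37830,
             22311)

fit <- lm(charges ~ los)
r  <- cor(los, charges)          # Pearson correlation coefficient
r2 <- summary(fit)$r.squared     # coefficient of determination

# Share of the total variation in charges explained by the fitted line
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((charges - mean(charges))^2)
r2_manual <- 1 - ss_res / ss_tot

round(c(r = r, r_squared = r2, r_squared_manual = r2_manual), 4)
```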
Que.d
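Regression equation (fitted to the dollar amounts):
Total charges = 21504.69 + 2709.84 LOS
On average, each additional day of stay is associated with about $2,710 more in total charges.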
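You can check the least-squares coefficients by hand from the textbook formulas b1 = Sxy / Sxx and b0 = mean(y) − b1 · mean(x). A sketch, with my own variable names, that compares them with lm():

```r
los <- c(20, 43, 7, 13, 14, 4, 4, 1, 1, 23, 5, 5, 4, 17, 35, 6, 11,
         5, 5, 11, 6, 10, 4, 23, 7, 4, 7, 6, 1, 19, 9, 9, 4)
charges <- c(91683, 93708, 21446, 37797, 54364, 17626, 12832, 36153,
             14907, 104148, 21423, 24971, 17022, 50652, 186496, 24441,
             35349, 22155, 24455, 36401, 25783, 37055, 19236, 60132,
             35777, 19972, 25409, 281140, 41283, 71439, 33735, 37830,
             22311)

# Sums of cross-products and squares about the means
sxy <- sum((los - mean(los)) * (charges - mean(charges)))
sxx <- sum((los - mean(los))^2)

b1 <- sxy / sxx                          # slope
b0 <- mean(charges) - b1 * mean(los)     # intercept

round(c(intercept = b0, slope = b1), 2)
coef(lm(charges ~ los))                  # should agree with b0, b1
```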
Que.e
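The t test for the slope, computed on the dollar amounts, gives t ≈ 3.05 on 31 degrees of freedom, with p ≈ 0.005. Since this p-value is less than α = 0.05, we reject the null hypothesis. We conclude that the fitted regression model is statistically significant: length of stay is a significant linear predictor of total charges for DRG 105. With r2 ≈ 0.23, however, LOS alone explains only a modest share of the variation in charges.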
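The decision can be reproduced from the slope's estimate and standard error. A sketch, assuming a two-sided test at α = 0.05 with n − 2 = 31 degrees of freedom (the variable names are mine):

```r
los <- c(20, 43, 7, 13, 14, 4, 4, 1, 1, 23, 5, 5, 4, 17, 35, 6, 11,
         5, 5, 11, 6, 10, 4, 23, 7, 4, 7, 6, 1, 19, 9, 9, 4)
charges <- c(91683, 93708, 21446, 37797, 54364, 17626, 12832, 36153,
             14907, 104148, 21423, 24971, 17022, 50652, 186496, 24441,
             35349, 22155, 24455, 36401, 25783, 37055, 19236, 60132,
             35777, 19972, 25409, 281140, 41283, 71439, 33735, 37830,
             22311)

fit <- lm(charges ~ los)

# Row of the coefficient table for the slope:
# Estimate, Std. Error, t value, Pr(>|t|)
est <- coef(summary(fit))["los", ]

# t ratio and two-sided p-value on the residual degrees of freedom
t_stat <- est[["Estimate"]] / est[["Std. Error"]]
p_val  <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

alpha <- 0.05
decision <- if (p_val < alpha) "Reject H0" else "Fail to reject H0"
cat(sprintf("t = %.3f, df = %d, p = %.4f: %s\n",
            t_stat, df.residual(fit), p_val, decision))
```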