The DHFS is interested in predictive techniques that provide
reliable utilization forecasts to update the Medicaid funding rate
schedule for nursing facilities (Frees 2010).
Note that, in order to measure the utilization of nursing homes, a
quantity called patient days is defined: the number of days each
patient was in the facility, summed over all patients,
i.e. $PD = d_1 + d_2 + \cdots + d_n$, where $d_i$ is the number of days
the $i$th patient spends in the nursing home and PD is an abbreviation
for patient days. The variables for this exercise are defined as follows.
1. Total patient years; TPY
2. The number of beds; NUMBED
3. Square footage of the nursing home; SQRFOOT
Descriptive Statistics (use R code)
Compute the following for TPY, NUMBED, and SQRFOOT:
i mean (3 points)
ii standard deviation
iii median
Construct the histogram for TPY, NUMBED, SQRFOOT. Comment on the shape of the distributions for all three variables. (6 points)
Construct a qq plot for each variable. Do the variables appear to be normally distributed? (6 points)
Here, it is given that
$PD = d_1 + d_2 + \cdots + d_n$,
where $d_i$ is the number of days the $i$th patient spends in the nursing home and
PD is an abbreviation for patient days.
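As a small worked illustration of this definition, patient days can be computed in R by summing the individual lengths of stay. The stay lengths below are made-up values, not data from the study, and the conversion to patient years assumes that patient years equal patient days divided by 365, which the exercise does not state.

```r
# Hypothetical lengths of stay (in days) for five patients; illustrative only.
days <- c(120, 365, 30, 200, 15)

# Patient days: PD = d_1 + d_2 + ... + d_n
PD <- sum(days)
print(PD)             # 730

# Assumption (not stated in the exercise): patient years = patient days / 365
patient_years <- PD / 365
print(patient_years)  # 2
```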
The variables for this exercise are defined as follows.
1. Total patient years; TPY
2. The number of beds; NUMBED
3. Square footage of the nursing home; SQRFOOT
Now, we have to compute the following.

i. Descriptive statistics using R code. Each variable is stored as a numeric vector; the empty c() calls are placeholders to be filled with the observed values:

    TPY <- c()
    NUMBED <- c()
    SQRFOOT <- c()

The combine function c() is used because each variable takes several values. The number of beds can change if beds are purchased; if it never changed, NUMBED would be a constant. The floor area of a facility is fixed, but the space available per patient changes as the number of patients varies.

To compute the summary statistics, keep the three variables as separate columns of a data frame. Pooling them with c(TPY, NUMBED, SQRFOOT) would mix three different quantities into a single vector and summarise them as if they were one variable:

    nursing <- data.frame(TPY, NUMBED, SQRFOOT)
    summary(nursing)

ii. To compute the mean, standard deviation, and median of each variable, we can use:

    mean(TPY); sd(TPY); median(TPY)
    mean(NUMBED); sd(NUMBED); median(NUMBED)
    mean(SQRFOOT); sd(SQRFOOT); median(SQRFOOT)

iii. Construct the histogram for TPY, NUMBED, SQRFOOT:

    par(mfrow = c(3, 1))
    hist(TPY)
    hist(NUMBED)
    hist(SQRFOOT)

iv. The shape of the distributions for all three variables. The shapes have to be read from the histograms once the data are entered; the points to check in each one are symmetry versus skewness, the number of modes, and any outliers. TPY fluctuates and may be non-normal. NUMBED is fixed in the sense that the beds are already available; in relation to a growing number of patients, utilization can vary only within a limit, namely up to the maximum number of beds available. SQRFOOT is likewise fixed for a given facility; in relation to a growing number of patients, the space per patient decreases as the number of patients increases.

v. Construct a QQ plot for each variable. Do the variables appear to be normally distributed?

    qqnorm(TPY); qqline(TPY)
    qqnorm(NUMBED); qqline(NUMBED)
    qqnorm(SQRFOOT); qqline(SQRFOOT)

If the points lie close to the reference line, the variable is approximately normal. To confirm this, we can apply the Shapiro-Wilk test:

    shapiro.test(TPY)
    shapiro.test(NUMBED)
    shapiro.test(SQRFOOT)

The null hypothesis of the test is that the data are normally distributed. If the p-value is below 0.05, we reject normality at the 5% level; otherwise, the data are consistent with a normal distribution.
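Because the vectors in the answer are left empty, the commands cannot be run as written. The sketch below shows one way the whole workflow might look end to end on simulated data. The simulated values, the seed, the sample size of 50, and the helper name `describe` are assumptions made purely for illustration; they do not reflect the actual nursing home data, so the printed numbers say nothing about the real distributions.

```r
# Simulated stand-in data (assumption: NOT the real nursing home data).
set.seed(1)
n <- 50
NUMBED  <- round(rlnorm(n, meanlog = 4.5, sdlog = 0.4))  # beds per facility
TPY     <- NUMBED * runif(n, min = 0.7, max = 1.0)        # patient years, capped by beds
SQRFOOT <- NUMBED * rnorm(n, mean = 0.5, sd = 0.1)        # hypothetical size measure

# One column per variable, so each is summarised separately.
nursing <- data.frame(TPY, NUMBED, SQRFOOT)

# Mean, standard deviation and median for every column.
describe <- function(x) c(mean = mean(x), sd = sd(x), median = median(x))
stats <- sapply(nursing, describe)
print(round(stats, 2))

# Top row: histograms. Bottom row: normal QQ plots. One column per variable.
op <- par(mfrow = c(2, 3))
for (v in names(nursing)) {
  hist(nursing[[v]], main = v, xlab = v)
}
for (v in names(nursing)) {
  qqnorm(nursing[[v]], main = paste("Normal QQ plot:", v))
  qqline(nursing[[v]])
}
par(op)

# Shapiro-Wilk test: a p-value below 0.05 is evidence against normality.
p_values <- sapply(nursing, function(x) shapiro.test(x)$p.value)
print(round(p_values, 4))
```

With the real data, only the three simulated vectors would need to be replaced, for example by columns read from a file with read.csv(); the rest of the sketch would stay the same.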