Studies have shown that people who suffer sudden cardiac arrest (SCA) have a better chance of survival if a defibrillator is administered very soon after cardiac arrest. How is survival rate related to the time between when cardiac arrest occurs and when the defibrillator shock is delivered? This question is addressed in the paper “Improving Survival from Sudden Cardiac Arrest: The Role of Home Defibrillators” (by J.K. Stross, University of Michigan, February 2002). The accompanying data give y = survival rate (percent) and x = mean call-to-shock time (minutes) for a cardiac rehabilitative center (where cardiac arrests occurred while victims were hospitalized and so the call-to-shock time tended to be short) and for four communities of different sizes
Mean call-to-shock time, x (minutes) | 2 | 6 | 7 | 9 | 12 |
Survival rate, y (percent) | 92 | 44 | 32 | 6 | 4 |
Do the following by hand and on Minitab.
1) Construct a scatter plot.
2) Calculate the Pearson correlation coefficient.
3) Determine the equation of the least squares line that can be used for predicting a value of y based on a value of x.
4) Compute SSE = Ʃ(y - ŷ)² for the least squares line.
5) Why do we call the least squares line the “best fitting line”?
6) Calculate r² using the following formula: . Interpret the r² value.
7) Using your equation in part 3, draw the least squares line on the scatter plot you constructed in part 1.
8) Use your prediction equation to predict SCA survival rate for a community with a mean call-to-shock time of 8 min. (Round your answer to five decimal places.)
X | Y | XY | X² | Y² |
2 | 92 | 184 | 4 | 8464 |
6 | 44 | 264 | 36 | 1936 |
7 | 32 | 224 | 49 | 1024 |
9 | 6 | 54 | 81 | 36 |
12 | 4 | 48 | 144 | 16 |
Ʃx = | Ʃy = | Ʃxy = | Ʃx² = | Ʃy² = |
36 | 178 | 774 | 314 | 11476 |
Sample size, n = | 5 |
x̅ = Ʃx/n = 36/5 = | 7.2 |
y̅ = Ʃy/n = 178/5 = | 35.6 |
SSxx = Ʃx² - (Ʃx)²/n = 314 - (36)²/5 = | 54.8 |
SSyy = Ʃy² - (Ʃy)²/n = 11476 - (178)²/5 = | 5139.2 |
SSxy = Ʃxy - (Ʃx)(Ʃy)/n = 774 - (36)(178)/5 = | -507.6 |
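The column totals and the three sums of squares above can be checked with a short script. This is a minimal sketch in plain Python, offered as a cross-check rather than as the Minitab procedure the question asks for; the variable names are illustrative and not part of the original solution.

```python
# Data from the problem statement
x = [2, 6, 7, 9, 12]      # mean call-to-shock time (minutes)
y = [92, 44, 32, 6, 4]    # survival rate (percent)
n = len(x)

# Column totals of the working table
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# Computational (shortcut) formulas used in the solution
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(ss_xx, 4), round(ss_yy, 4), round(ss_xy, 4))
```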
1) Scatter plot:
2)
Correlation coefficient, r = SSxy/√(SSxx*SSyy) = -507.6/√(54.8*5139.2) = -0.9565
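As a check on r, the same coefficient can be computed from deviations about the means instead of the shortcut sums. The sketch below assumes only the five data pairs given in the problem; the names are illustrative.

```python
import math

x = [2, 6, 7, 9, 12]      # mean call-to-shock time (minutes)
y = [92, 44, 32, 6, 4]    # survival rate (percent)
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Definitional forms: sums of products and squares of deviations from the means
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(round(r, 4))
```

A value close to -1 indicates a strong negative linear relationship: survival rate falls as call-to-shock time increases.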
3)
Slope, b = SSxy/SSxx = -507.6/54.8 = -9.262774
y-intercept, a = y̅ -b* x̅ = 35.6 - (-9.26277)*7.2 = 102.29197
Regression equation :
ŷ = 102.292 + (-9.2628) x
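The slope and intercept can be reproduced as follows. This is a sketch under the assumption that a small helper is acceptable; `least_squares_line` is a hypothetical name introduced here for illustration, not a library function.

```python
def least_squares_line(x, y):
    """Return (intercept a, slope b) of the least squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = ss_xy / ss_xx          # slope
    a = y_bar - b * x_bar      # intercept
    return a, b

x = [2, 6, 7, 9, 12]      # mean call-to-shock time (minutes)
y = [92, 44, 32, 6, 4]    # survival rate (percent)

a, b = least_squares_line(x, y)
print(round(a, 5), round(b, 6))
```

The slope says that each additional minute of call-to-shock time is associated with a drop of about 9.26 percentage points in predicted survival rate.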
4)
Sum of squared errors, SSE = SSyy - SSxy²/SSxx = 5139.2 - (-507.6)²/54.8 = 437.41606
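The shortcut formula should agree with the definition of SSE as the sum of squared residuals. A minimal sketch of that comparison, assuming the data from the problem and illustrative variable names:

```python
x = [2, 6, 7, 9, 12]      # mean call-to-shock time (minutes)
y = [92, 44, 32, 6, 4]    # survival rate (percent)
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = ss_xy / ss_xx
a = y_bar - b * x_bar

# Residual = observed y minus predicted y, one per data point
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
for xi, yi, ei in zip(x, y, residuals):
    print(xi, yi, round(ei, 5))

sse_from_residuals = sum(e ** 2 for e in residuals)
sse_shortcut = ss_yy - ss_xy ** 2 / ss_xx
print(round(sse_from_residuals, 5), round(sse_shortcut, 5))
```

The residuals also sum to zero (up to rounding), which is a property of any least squares line fitted with an intercept.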
5)
The least squares line is called the "best fitting line" because, among all possible straight lines, it has the smallest sum of squared vertical deviations between the observed y values and the values predicted by the line. In other words, it is the line that minimizes SSE = Ʃ(y - ŷ)².
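One way to see the "best fit" property numerically is to nudge the intercept or slope and watch SSE rise. The sketch below is only an illustration: the particular shifts are arbitrary choices, and checking a handful of alternative lines does not prove minimality, it just demonstrates it.

```python
x = [2, 6, 7, 9, 12]      # mean call-to-shock time (minutes)
y = [92, 44, 32, 6, 4]    # survival rate (percent)
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

def sse(intercept, slope):
    """Sum of squared vertical deviations of the data from a candidate line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

best = sse(a, b)

# Arbitrary perturbations (change in intercept, change in slope)
shifts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5), (0.0, -0.5), (2.0, -0.3)]
others = [sse(a + da, b + db) for da, db in shifts]

print(round(best, 5))
print([round(s, 5) for s in others])
print(all(s > best for s in others))
```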
6)
Coefficient of determination, r² = (SSxy)²/(SSxx*SSyy) = (-507.6)²/(54.8*5139.2) = 0.9149
About 91.49% of the variation in survival rate (y) is explained by its linear relationship with mean call-to-shock time (x).
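An equivalent way to reach the same value is r² = 1 - SSE/SSyy, the share of total variation that is not left in the residuals. This is offered as a cross-check under that standard identity; it is not necessarily the formula the question had in mind.

```python
x = [2, 6, 7, 9, 12]      # mean call-to-shock time (minutes)
y = [92, 44, 32, 6, 4]    # survival rate (percent)
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)   # total variation in y
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = ss_xy / ss_xx
a = y_bar - b * x_bar

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
r_squared = 1 - sse / ss_yy
r_squared_direct = ss_xy ** 2 / (ss_xx * ss_yy)

print(round(r_squared, 4), round(r_squared_direct, 4))
```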
7)
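To draw the line by hand, it is enough to plot two points on it and join them. The sketch below evaluates the fitted equation at the smallest and largest x in the data and confirms that the line passes through the point of means (7.2, 35.6); the choice of those x values is an assumption made for convenience.

```python
# Values taken from the worked solution
ss_xx = 54.8
ss_xy = -507.6
x_bar = 7.2
y_bar = 35.6

b = ss_xy / ss_xx          # slope
a = y_bar - b * x_bar      # intercept

def predict(x_value):
    """Predicted survival rate (percent) at a given call-to-shock time (minutes)."""
    return a + b * x_value

y_at_2 = predict(2)        # left end of the observed x range
y_at_12 = predict(12)      # right end of the observed x range
y_at_mean = predict(x_bar) # should equal y_bar

print(round(y_at_2, 5), round(y_at_12, 5), round(y_at_mean, 5))
```

The predicted value at x = 12 comes out negative, which is not a possible survival rate. That suggests a straight line describes these data less well near the upper end of the observed range.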
8)
Predicted value of y at x = 8
ŷ = 102.29197 + (-9.262774) * 8 = 28.18978
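Because the answer is requested to five decimal places, it matters whether the coefficients are rounded before substituting. A short sketch comparing the two, using the sums of squares from the solution (variable names are illustrative):

```python
# Values taken from the worked solution
ss_xx = 54.8
ss_xy = -507.6
x_bar = 7.2
y_bar = 35.6

b = ss_xy / ss_xx          # unrounded slope
a = y_bar - b * x_bar      # unrounded intercept

x_new = 8                  # mean call-to-shock time of interest (minutes)

y_hat = a + b * x_new                         # full-precision coefficients
y_hat_rounded = 102.292 + (-9.2628) * x_new   # coefficients rounded as displayed

print(round(y_hat, 5))          # 28.18978
print(round(y_hat_rounded, 5))  # 28.1896
```

The value 28.18978 comes from the unrounded slope and intercept; substituting the rounded coefficients 102.292 and -9.2628 gives 28.1896 instead, so intermediate values should be carried at full precision.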