The following bivariate data set contains an outlier.
| x | y |
|---|---|
| 62.8 | 256.5 |
| 61.9 | -39.3 |
| 47.2 | 765.4 |
| 54.3 | 1350.5 |
| 72.5 | 479.2 |
| 84.7 | 2508.9 |
| 43.9 | -2120.9 |
| 54 | -1687.9 |
| 45.6 | 1021.6 |
| 68.5 | 786.7 |
| 34.7 | -4360.1 |
| 51.3 | -1825.9 |
| 50.7 | -1365.3 |
| 77 | 488.1 |
| 234.7 | -236 |
What is the correlation coefficient with the outlier?
rw =
What is the correlation coefficient without the outlier?
rwo =
Would inclusion of the outlier change the evidence for or against a significant linear correlation?
Solution:
| S.No | x | y | (x - x̄)² | (y - ȳ)² | (x - x̄)(y - ȳ) |
|---|---|---|---|---|---|
| 1 | 62.8 | 256.5 | 46.0588 | 272205.67 | -3540.8302 |
| 2 | 61.9 | -39.3 | 59.0848 | 51045.87 | -1736.6742 |
| 3 | 47.2 | 765.4 | 501.1628 | 1062205.07 | -23072.4449 |
| 4 | 54.3 | 1350.5 | 233.6822 | 2610594.20 | -24699.1769 |
| 5 | 72.5 | 479.2 | 8.4875 | 554180.99 | 2168.7824 |
| 6 | 84.7 | 2508.9 | 228.4128 | 7695815.75 | 41926.4018 |
| 7 | 43.9 | -2120.9 | 659.8048 | 3443498.78 | 47665.8911 |
| 8 | 54 | -1687.9 | 242.9442 | 2023980.44 | 22174.6311 |
| 9 | 45.6 | 1021.6 | 575.3602 | 1655940.03 | -30866.8422 |
| 10 | 68.5 | 786.7 | 1.1808 | 1106563.74 | -1143.1009 |
| 11 | 34.7 | -4360.1 | 1217.0795 | 16767933.02 | 142856.2484 |
| 12 | 51.3 | -1825.9 | 334.4022 | 2435680.44 | 28539.3911 |
| 13 | 50.7 | -1365.3 | 356.7062 | 1210146.67 | 20776.5924 |
| 14 | 77 | 488.1 | 54.9575 | 567511.11 | 5584.7111 |
| 15 | 234.7 | -236 | 27262.4128 | 854.59 | 4826.8131 |
| Total | 1043.8 | -3978.5 | Sxx = 31781.7373 | Syy = 41458156.37 | Sxy = 231460.3933 |
| Mean | x̄ = 69.587 | ȳ = -265.23 | | | |
Correlation coefficient with the outlier:

$$r_w = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{231460.3933}{\sqrt{31781.7373 \times 41458156.37}} \approx 0.2016$$
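The totals in the table and r_w can be cross-checked with a short script. This is a minimal sketch in plain Python 3 (no external libraries); the helper name `deviation_sums` is hypothetical and not part of the original solution.

```python
import math

# The 15 (x, y) pairs from the problem, including the outlier (234.7, -236)
points = [
    (62.8, 256.5), (61.9, -39.3), (47.2, 765.4), (54.3, 1350.5),
    (72.5, 479.2), (84.7, 2508.9), (43.9, -2120.9), (54, -1687.9),
    (45.6, 1021.6), (68.5, 786.7), (34.7, -4360.1), (51.3, -1825.9),
    (50.7, -1365.3), (77, 488.1), (234.7, -236),
]

def deviation_sums(pts):
    """Return (Sxx, Syy, Sxy): sums of squared deviations and cross-products."""
    n = len(pts)
    x_bar = sum(x for x, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pts)
    syy = sum((y - y_bar) ** 2 for _, y in pts)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pts)
    return sxx, syy, sxy

sxx, syy, sxy = deviation_sums(points)
r_w = sxy / math.sqrt(sxx * syy)  # Pearson r for all 15 points
print(f"Sxx = {sxx:.4f}, Syy = {syy:.2f}, Sxy = {sxy:.4f}")
print(f"r_w = {r_w:.4f}")
```

The per-row table entries are rounded, so the printed totals may differ from the table in the last decimal place.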
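The outlier is the point (234.7, -236): its x-value lies far outside the range of the other 14 x-values (34.7 to 84.7). Removing it leaves n = 14 points with x̄ = 57.793 and ȳ = -267.321, and recomputing the sums gives Sxx = 2572.0093, Syy = 41457240.74 and Sxy = 226288.81.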
Correlation coefficient without the outlier:

$$r_{wo} = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{226288.81}{\sqrt{2572.0093 \times 41457240.74}} \approx 0.6930$$
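The removal step can also be sketched in code. The snippet below flags the outlier with a z-score screen on x and then recomputes r on the remaining points. The z-score rule and its threshold of 3 are assumptions (the problem does not say how the outlier was identified), and `pearson_r` is a hypothetical helper.

```python
import math

# The 15 (x, y) pairs from the problem
points = [
    (62.8, 256.5), (61.9, -39.3), (47.2, 765.4), (54.3, 1350.5),
    (72.5, 479.2), (84.7, 2508.9), (43.9, -2120.9), (54, -1687.9),
    (45.6, 1021.6), (68.5, 786.7), (34.7, -4360.1), (51.3, -1825.9),
    (50.7, -1365.3), (77, 488.1), (234.7, -236),
]

def pearson_r(pts):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(pts)
    x_bar = sum(x for x, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pts)
    syy = sum((y - y_bar) ** 2 for _, y in pts)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pts)
    return sxy / math.sqrt(sxx * syy)

# Screen x-values with a z-score: flag points more than 3 sample standard
# deviations from the mean (the threshold 3 is an assumed rule of thumb)
n = len(points)
x_bar = sum(x for x, _ in points) / n
s_x = math.sqrt(sum((x - x_bar) ** 2 for x, _ in points) / (n - 1))
outliers = [p for p in points if abs(p[0] - x_bar) / s_x > 3]
remaining = [p for p in points if p not in outliers]

r_w = pearson_r(points)      # all 15 points
r_wo = pearson_r(remaining)  # outlier removed
print(f"flagged: {outliers}")
print(f"r_w = {r_w:.4f}, r_wo = {r_wo:.4f}")
```

With only 15 points, a single extreme x-value inflates s_x itself, so this screen is only a rough check. Here the flagged point has |z| ≈ 3.47, while every other point has |z| < 1.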
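Yes. Assuming a two-tailed test at α = 0.05, the critical value of r is about 0.514 for n = 15 (df = 13) and about 0.532 for n = 14 (df = 12). With the outlier, r_w = 0.2016 < 0.514, so the data show no significant linear correlation. Without it, r_wo = 0.6930 > 0.532, so the linear correlation is significant. Including the outlier therefore changes the conclusion: it removes the evidence for a significant linear correlation.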
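The same decision can be reached with the t-test for a correlation coefficient, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. This is a sketch under stated assumptions. The critical t values in `T_CRIT` are copied from a standard t table for a two-tailed test at α = 0.05; that significance level is an assumption, since the problem does not state one. The helper names are hypothetical.

```python
import math

# Two-tailed critical t values at alpha = 0.05 (assumed level), keyed by df
T_CRIT = {12: 2.179, 13: 2.160}

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def critical_r(df):
    """Convert a critical t value into the equivalent critical r: t / sqrt(t^2 + df)."""
    t = T_CRIT[df]
    return t / math.sqrt(t ** 2 + df)

for label, r, n in [("with outlier", 0.2016, 15), ("without outlier", 0.6930, 14)]:
    df = n - 2
    t = t_statistic(r, n)
    significant = abs(t) > T_CRIT[df]
    print(f"{label}: r = {r}, t = {t:.3f}, df = {df}, "
          f"critical r = {critical_r(df):.3f}, significant = {significant}")
```

Comparing |t| with the critical t is equivalent to comparing r with the critical r, which is why the critical r values of about 0.514 and 0.532 fall out of the same table.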