The following bivariate data set contains an outlier.
x | y |
---|---|
28.5 | 59.3 |
-6.4 | -356.8 |
29.7 | 81.6 |
23.9 | 1298 |
23.4 | 1724.5 |
-11.9 | 303 |
12.3 | -810.6 |
46.3 | -628.5 |
24.7 | 1557.2 |
14.4 | 394.8 |
7.8 | -1789.2 |
6 | -1949.6 |
30.8 | 2369.6 |
28.5 | 1974.8 |
220.9 | 37.7 |
What is the correlation coefficient with the outlier?
rw =
What is the correlation coefficient without the outlier?
rwo =
Would inclusion of the outlier change the evidence for or against a significant linear correlation?
Question for thought: Would you always draw the same conclusion with the addition of an outlier?
Correlation coefficient with the outlier rw = 0.0553
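To identify the outlier, make a scatter plot of the data.
The scatter plot shows that the point (x, y) = (220.9, 37.7) is an outlier: its x-value lies far to the right of all the other points, whose x-values range only from -11.9 to 46.3.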
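The scatter plot is a visual check. As a rough numerical counterpart (a substitute for the plot, not the method used above), the sketch below computes a z-score for each x-value and flags any value more than 3 sample standard deviations from the mean. The threshold of 3 and the helper name `flag_outliers` are assumptions for illustration; only x is checked, since that is where this point stands apart.

```python
from statistics import mean, stdev

# x-values from the table, in order; the last one is the suspected outlier.
xs = [28.5, -6.4, 29.7, 23.9, 23.4, -11.9, 12.3, 46.3,
      24.7, 14.4, 7.8, 6, 30.8, 28.5, 220.9]


def flag_outliers(values, threshold=3.0):
    """Return the indices whose |z-score| exceeds `threshold`.

    The z-score uses the sample mean and sample standard deviation;
    the default threshold of 3 is an assumed rule of thumb.
    """
    m, s = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) / s > threshold]


flagged = flag_outliers(xs)
print(flagged, [xs[i] for i in flagged])  # prints: [14] [220.9]
```

With only 15 points, a single extreme value inflates the standard deviation it is measured against, so a z-score rule can miss outliers in small samples; the scatter plot remains the more reliable check.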
Correlation coefficient without the outlier rwo = 0.3799
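Both coefficients can be reproduced with the usual Pearson formula, r = Sxy / sqrt(Sxx * Syy), where Sxx and Syy are the sums of squared deviations from the means and Sxy is the sum of cross-products of deviations. A minimal sketch in plain Python (the helper name `pearson_r` is hypothetical):

```python
from math import sqrt

# Data from the table; the last pair (220.9, 37.7) is the outlier.
xs = [28.5, -6.4, 29.7, 23.9, 23.4, -11.9, 12.3, 46.3,
      24.7, 14.4, 7.8, 6, 30.8, 28.5, 220.9]
ys = [59.3, -356.8, 81.6, 1298, 1724.5, 303, -810.6, -628.5,
      1557.2, 394.8, -1789.2, -1949.6, 2369.6, 1974.8, 37.7]


def pearson_r(x, y):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_w = pearson_r(xs, ys)              # all 15 points
r_wo = pearson_r(xs[:-1], ys[:-1])   # outlier (last pair) removed
print(f"rw = {r_w:.4f}, rwo = {r_wo:.4f}")  # prints: rw = 0.0553, rwo = 0.3799
```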
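Removing the outlier raises r from rw = 0.0553 to rwo = 0.3799, so the outlier clearly affects the strength of the linear relationship. Whether it changes the evidence for a significant linear correlation, however, depends on comparing each r with the critical value for its sample size. At the usual 0.05 significance level the critical values are about 0.514 for n = 15 (with the outlier) and 0.532 for n = 14 (without it). Neither 0.0553 nor 0.3799 exceeds its critical value, so in both cases there is no significant linear correlation.
No. Including the outlier does not change the evidence regarding a significant linear correlation.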
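The significance check can be sketched as follows. It converts a two-tailed critical t-value with n - 2 degrees of freedom into a critical r using r_crit = t / sqrt(t^2 + n - 2). The t-values 2.179 (df = 12) and 2.160 (df = 13) are standard t-table values at alpha = 0.05, hardcoded here so the sketch needs no statistics library; the helper name `critical_r` is hypothetical.

```python
from math import sqrt

# Two-tailed critical t-values at alpha = 0.05 from a standard t table,
# keyed by degrees of freedom (hardcoded for this sketch).
T_CRIT = {12: 2.179, 13: 2.160}


def critical_r(n, t_crit):
    """Convert a critical t-value with n - 2 degrees of freedom into a critical r."""
    df = n - 2
    return t_crit / sqrt(t_crit ** 2 + df)


cases = [("with outlier", 0.0553, 15), ("without outlier", 0.3799, 14)]
results = {}
for label, r, n in cases:
    rc = critical_r(n, T_CRIT[n - 2])
    results[label] = abs(r) > rc  # True means "significant"
    verdict = "significant" if results[label] else "not significant"
    print(f"{label}: r = {r:.4f}, critical r = {rc:.3f} -> {verdict}")
```

Both cases print "not significant", which is why the conclusion about a significant linear correlation is the same with or without the outlier.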
We cannot always draw the same conclusion when an outlier is added. One reason is sample size: the smaller the sample, the greater the effect a single outlier can have on r. When the sample is fairly large, one outlier has little effect on the size of the correlation coefficient. The outlier's position also matters: a point far from the trend of the other data, like the one here, weakens r, whereas a point far out along the trend can make r stronger.
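To illustrate the sample-size effect, the sketch below uses a hypothetical construction that is not part of the original problem: it repeats the 14 ordinary points k times (which leaves their own r unchanged at 0.3799) and then adds the single outlier once. As k grows, r with the outlier moves from 0.0553 toward 0.3799. The names `r_with_replication` and `OUTLIER` are illustrative.

```python
from math import sqrt

# The 14 ordinary points from the table (outlier excluded).
base_x = [28.5, -6.4, 29.7, 23.9, 23.4, -11.9, 12.3, 46.3,
          24.7, 14.4, 7.8, 6, 30.8, 28.5]
base_y = [59.3, -356.8, 81.6, 1298, 1724.5, 303, -810.6, -628.5,
          1557.2, 394.8, -1789.2, -1949.6, 2369.6, 1974.8]
OUTLIER = (220.9, 37.7)


def pearson_r(x, y):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_with_replication(k):
    """Hypothetical: repeat the ordinary points k times, then add the outlier once."""
    xs = base_x * k + [OUTLIER[0]]
    ys = base_y * k + [OUTLIER[1]]
    return pearson_r(xs, ys)


rs = {k: r_with_replication(k) for k in (1, 10, 100, 1000)}
for k, r in rs.items():
    print(f"n = {14 * k + 1:>5}: r = {r:.4f}")
print(f"without outlier: r = {pearson_r(base_x, base_y):.4f}")
```

Because this outlier's x-value is so extreme, r approaches 0.3799 only slowly, but its influence does shrink steadily as the sample grows.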