The following bivariate data set contains an outlier.
x | y |
---|---|
27.7 | 12.9 |
29 | 13.9 |
33.1 | 7.6 |
31.7 | 48.2 |
32.3 | 12.4 |
31.1 | 32.9 |
29 | 22.4 |
35.2 | 29.5 |
47.2 | 62.3 |
31.7 | 21.9 |
27.6 | 20.4 |
29.2 | 16.9 |
37.6 | 54.2 |
34.1 | 27.7 |
106 | -156.5 |
What is the correlation coefficient with the outlier?
rw =
What is the correlation coefficient without the outlier?
rwo =
With the outlier included, the correlation coefficient is calculated as follows:
X Values
∑ = 562.5
Mean = 37.5
∑(X - Mx)² = SSx = 5365.28
Y Values
∑ = 226.7
Mean = 15.113
∑(Y - My)² = SSy = 35182.457
X and Y Combined
N = 15
∑(X - Mx)(Y - My) = -11764.81
R Calculation
r = ∑((X - Mx)(Y - My)) / √((SSx)(SSy))
r = -11764.81 / √((5365.28)(35182.457)) = -0.8563
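As a rough check on the figures above, here is a minimal Python sketch that recomputes SSx, SSy, the sum of cross-products, and r from the table. The names `pearson_parts`, `xs`, and `ys` are my own and not part of the original answer.

```python
from math import sqrt

# Data copied from the table; the last pair (106, -156.5) is the outlier.
xs = [27.7, 29, 33.1, 31.7, 32.3, 31.1, 29, 35.2, 47.2, 31.7,
      27.6, 29.2, 37.6, 34.1, 106]
ys = [12.9, 13.9, 7.6, 48.2, 12.4, 32.9, 22.4, 29.5, 62.3, 21.9,
      20.4, 16.9, 54.2, 27.7, -156.5]

def pearson_parts(xs, ys):
    """Return (SSx, SSy, SP, r) using deviations from the means."""
    n = len(xs)
    mx = sum(xs) / n                      # Mx
    my = sum(ys) / n                      # My
    ss_x = sum((x - mx) ** 2 for x in xs)                     # sum of (X - Mx)^2
    ss_y = sum((y - my) ** 2 for y in ys)                     # sum of (Y - My)^2
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))     # sum of (X - Mx)(Y - My)
    return ss_x, ss_y, sp, sp / sqrt(ss_x * ss_y)

ss_x, ss_y, sp, r_w = pearson_parts(xs, ys)
print(f"SSx={ss_x:.2f}  SSy={ss_y:.2f}  SP={sp:.2f}  r={r_w:.3f}")
```

The printed values should agree with the hand calculation up to rounding.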
With the outlier (106, -156.5) removed, the correlation coefficient is calculated as follows:
X Values
∑ = 456.5
Mean = 32.607
∑(X - Mx)² = SSx = 337.869
Y Values
∑ = 383.2
Mean = 27.371
∑(Y - My)² = SSy = 3627.669
X and Y Combined
N = 14
∑(X - Mx)(Y - My) = 830.383
R Calculation
r = ∑((X - Mx)(Y - My)) / √((SSx)(SSy))
r = 830.383 / √((337.869)(3627.669)) = 0.7501
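To see the effect of the single outlier directly, the following sketch computes r for the full data set and again with the point (106, -156.5) dropped. The helper `pearson_r` and the variable names are illustrative assumptions, not something from the original answer.

```python
from math import sqrt

# (x, y) pairs copied from the table; the last pair is the outlier.
points = [(27.7, 12.9), (29, 13.9), (33.1, 7.6), (31.7, 48.2), (32.3, 12.4),
          (31.1, 32.9), (29, 22.4), (35.2, 29.5), (47.2, 62.3), (31.7, 21.9),
          (27.6, 20.4), (29.2, 16.9), (37.6, 54.2), (34.1, 27.7), (106, -156.5)]
outlier = (106, -156.5)

def pearson_r(pairs):
    """Pearson correlation: r = SP / sqrt(SSx * SSy)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sp = sum((x - mx) * (y - my) for x, y in pairs)
    ss_x = sum((x - mx) ** 2 for x, _ in pairs)
    ss_y = sum((y - my) ** 2 for _, y in pairs)
    return sp / sqrt(ss_x * ss_y)

r_w = pearson_r(points)                                   # all 15 points
r_wo = pearson_r([p for p in points if p != outlier])     # 14 points
print(f"with outlier: {r_w:.3f}   without outlier: {r_wo:.3f}")
```

Removing one point flips the sign of r, from a strong negative correlation to a fairly strong positive one, which is why the question asks for both values.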