The following bivariate data set contains an outlier.
| x | y | 
|---|---|
| 47.6 | 53.7 | 
| 40.6 | 112.6 | 
| 28.9 | 72.2 | 
| 36.1 | 101.1 | 
| 32.4 | 112.4 | 
| 26.8 | 67.1 | 
| 36.8 | 38.9 | 
| 33.1 | 70.1 | 
| 67.5 | 8.6 | 
| 43.3 | -6.4 | 
| 50.1 | 41.3 | 
| 30.1 | 3.2 | 
| 29.8 | 41.7 | 
| 55 | -32.8 | 
| 189.6 | 548.8 | 
What is the correlation coefficient with the outlier?
rw =
What is the correlation coefficient without the outlier?
rwo =
Would inclusion of the outlier change the evidence for or against a
significant linear correlation at 5% significance?
A) No. Including the outlier does not change the evidence regarding a linear correlation.
B) Yes. Including the outlier changes the evidence regarding a linear correlation.
Would you always draw the same conclusion with the addition of an
outlier?
A) Yes, any outlier would result in the same conclusion.
B) No, a different outlier in a different problem could lead to a different conclusion.
Explain your answer.
a)
Running Excel's Data Analysis regression tool on all 15 points (outlier included) gives the following output:
| Regression Statistics | |
|---|---|
| Multiple R | 0.8680 |
| R Square | 0.7534 |
| Adjusted R Square | 0.7344 |
| Standard Error | 70.1143 |
| Observations | 15 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 195208.35 | 195208.35 | 39.7087 | 0.0000 |
| Residual | 13 | 63908.14 | 4916.01 | | |
| Total | 14 | 259116.49 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | -63.9811 | 29.4216 | -2.1746 | 0.0487 | -127.543 | -0.420 |
| X | 2.9319 | 0.4653 | 6.3015 | 0.0000 | 1.927 | 3.937 |
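So the correlation coefficient with the outlier is rw = 0.8680 (the Multiple R value; it is positive because the slope of X is positive).
------------------------------------------------------
Repeating the regression on the 14 points that remain after removing (189.6, 548.8), the correlation coefficient without the outlier is rwo = -0.4853.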
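Both coefficients can be checked without Excel. The following is a minimal sketch in plain Python, assuming the data from the table above and treating the last pair (189.6, 548.8) as the outlier; the helper name `pearson_r` is introduced here only for illustration.

```python
import math

# Data from the table above; the last pair (189.6, 548.8) is the outlier.
x = [47.6, 40.6, 28.9, 36.1, 32.4, 26.8, 36.8, 33.1,
     67.5, 43.3, 50.1, 30.1, 29.8, 55.0, 189.6]
y = [53.7, 112.6, 72.2, 101.1, 112.4, 67.1, 38.9, 70.1,
     8.6, -6.4, 41.3, 3.2, 41.7, -32.8, 548.8]

def pearson_r(xs, ys):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

r_with = pearson_r(x, y)                 # all 15 points
r_without = pearson_r(x[:-1], y[:-1])    # outlier dropped

print(f"r with outlier:    {r_with:.4f}")
print(f"r without outlier: {r_without:.4f}")
```

The printed values should agree with the 0.8680 and -0.4853 reported above, up to rounding.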
----------------------------------
With the outlier:
correlation hypothesis test
Ho:   ρ = 0
Ha:   ρ ≠ 0
n =   15
alpha, α =    0.05
correlation, r =   0.8680
t-test statistic =    t = r*√(n-2)/√(1-r²) =   6.3015
degrees of freedom =    n - 2 = 13
critical t-value (two-tailed) =    ±2.1604
p-value =    0.0000 (to four decimals) < α = 0.05, so reject Ho: there is
evidence of a linear correlation at α = 0.05
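The test statistic for the case with the outlier can be reproduced as a short sketch. It assumes a two-tailed test at α = 0.05; the critical value 2.1604 for 13 degrees of freedom is taken from a standard t table rather than computed, and the function name `t_statistic` is illustrative only.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r_with, n_with = 0.8680, 15   # values reported above (r is rounded)
t_crit_df13 = 2.1604          # assumption: two-tailed, alpha = 0.05, df = 13 (t table)

t_with = t_statistic(r_with, n_with)
reject_h0 = abs(t_with) > t_crit_df13

print(f"t = {t_with:.4f}, reject Ho: {reject_h0}")
```

Because the rounded r is used here, the statistic can differ from Excel's 6.3015 in the third decimal place; the decision to reject Ho is unaffected.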
-----------
Now without the outlier:
correlation hypothesis test
Ho:   ρ = 0
Ha:   ρ ≠ 0
n =   14
alpha, α =    0.05
correlation, r =   -0.4853
t-test statistic =    t = r*√(n-2)/√(1-r²) =   -1.9227
degrees of freedom =    n - 2 = 12
critical t-value (two-tailed) =    ±2.1788
p-value =    0.0786 > α = 0.05, so fail to reject Ho: there is not
sufficient evidence of a linear correlation at α = 0.05
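The two-sided p-values can also be approximated without a statistics package. The sketch below integrates the Student t density numerically with Simpson's rule; this is an illustrative stand-in, not how Excel computes the value, and the names `t_pdf` and `two_sided_p` are introduced here as assumptions.

```python
import math

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value: 1 - P(-|t| < T < |t|), via Simpson's rule on [0, |t|]."""
    b = abs(t_stat)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    half_area = total * h / 3          # P(0 < T < |t|)
    return 1.0 - 2.0 * half_area

p_with = two_sided_p(6.3015, 13)       # with the outlier, df = 15 - 2
p_without = two_sided_p(-1.9227, 12)   # without the outlier, df = 14 - 2

print(f"p-value with outlier:    {p_with:.6f}")
print(f"p-value without outlier: {p_without:.4f}")
```

The first p-value is far below 0.05 and the second is above it, which matches the two decisions reached above.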
Hence the answer is B) Yes. Including the outlier changes the evidence regarding a linear correlation: with it the correlation is significant at the 5% level, without it the correlation is not.
-------------------------------------
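B) No, a different outlier in a different problem could lead to a different conclusion.
Explanation: here the outlier is extreme in both x and y, so it dominates the fit and turns a weak, non-significant negative correlation into a strong positive one. The effect of an outlier depends on where it falls relative to the other points; an outlier that is extreme only in y and sits near the mean of x, for example, may leave the conclusion unchanged.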
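As a rough illustration of why the conclusion depends on the particular outlier, the sketch below compares the actual outlier with a hypothetical one. The point (39.9, 300.0) is invented for this example and is not part of the problem; it is extreme in y but sits near the mean of x. The critical values are taken from a standard t table for a two-tailed test at α = 0.05.

```python
import math

# The 14 points that remain after removing the outlier.
x = [47.6, 40.6, 28.9, 36.1, 32.4, 26.8, 36.8, 33.1,
     67.5, 43.3, 50.1, 30.1, 29.8, 55.0]
y = [53.7, 112.6, 72.2, 101.1, 112.4, 67.1, 38.9, 70.1,
     8.6, -6.4, 41.3, 3.2, 41.7, -32.8]

# Assumption: two-tailed critical t-values at alpha = 0.05, keyed by df (t table).
T_CRIT = {12: 2.1788, 13: 2.1604}

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def is_significant(xs, ys):
    """True if |t| exceeds the two-tailed critical value for df = n - 2."""
    n = len(xs)
    r = pearson_r(xs, ys)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return abs(t) > T_CRIT[n - 2]

no_outlier = is_significant(x, y)
actual_outlier = is_significant(x + [189.6], y + [548.8])
# Hypothetical outlier, NOT from the problem: extreme in y, near the mean of x.
hypothetical_outlier = is_significant(x + [39.9], y + [300.0])

print("significant without outlier:     ", no_outlier)
print("significant with actual outlier: ", actual_outlier)
print("significant with hypothetical one:", hypothetical_outlier)
```

Under these assumptions the actual outlier changes the conclusion, while the hypothetical one leaves it as it was without any outlier.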