Using RStudio:
shapiro.test(c(rnorm(100), 6))
shapiro.test(c(rnorm(1000), 6))
shapiro.test(c(rnorm(4000), 6))
> shapiro.test(c(rnorm(100), 6))
Shapiro-Wilk normality test
data: c(rnorm(100), 6)
W = 0.90901, p-value = 3.489e-06
> shapiro.test(c(rnorm(1000), 6))
Shapiro-Wilk normality test
data: c(rnorm(1000), 6)
W = 0.9936, p-value = 0.0002736
> shapiro.test(c(rnorm(4000), 6))
Shapiro-Wilk normality test
data: c(rnorm(4000), 6)
W = 0.99848, p-value = 0.0007784
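If the outlier lies far from the rest of the data, a single outlier can change the result of the Shapiro-Wilk test: with the value 6 appended, the p-value is below 0.05 at every sample size above, so normality is rejected even though all the other observations are drawn from a normal distribution.
If the outlier is not far from the rest of the data, however, the Shapiro-Wilk test still gives the correct result (it does not reject normality) even though the outlier is present.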
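One way to quantify "far away" is to ask how likely it is that the largest of n standard normal draws would exceed the appended value purely by chance. The sketch below assumes independent N(0, 1) draws, as rnorm() produces by default; the helper name prob_max_exceeds is hypothetical and not part of base R.

```r
# Probability that the largest of n independent N(0, 1) draws exceeds x.
# Assumption: the data are i.i.d. standard normal, as generated by rnorm(n).
prob_max_exceeds <- function(n, x) {
  # Equals 1 - pnorm(x)^n, computed on the log scale so that very small
  # probabilities are not rounded to zero.
  -expm1(n * pnorm(x, log.p = TRUE))
}

n <- c(100, 1000, 4000)
print(data.frame(
  n = n,
  exceeds_4 = prob_max_exceeds(n, 4),
  exceeds_6 = prob_max_exceeds(n, 6)
))
```

For n = 4000 the chance of seeing a value above 4 is roughly 0.12, so 4 is not very unusual in a sample that large, whereas a value above 6 is vanishingly unlikely at any of these sample sizes.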
Consider the following results:
> shapiro.test(c(rnorm(100), 4))
Shapiro-Wilk normality test
data: c(rnorm(100), 4)
W = 0.9749, p-value = 0.05094
> shapiro.test(c(rnorm(1000), 4))
Shapiro-Wilk normality test
data: c(rnorm(1000), 4)
W = 0.99789, p-value = 0.2399
> shapiro.test(c(rnorm(4000), 4))
Shapiro-Wilk normality test
data: c(rnorm(4000), 4)
W = 0.99965, p-value = 0.7486
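Here 4 is the outlier, and the Shapiro-Wilk test still gives the correct result: every p-value is above 0.05 (although only just, at 0.05094, for n = 100), so normality is not rejected.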
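Because each call to rnorm() above draws a fresh sample, the p-values change from run to run. A minimal sketch of a more controlled comparison fixes one sample and varies only the appended value. The seed, the candidate values 3 to 6, and the helper name test_with_extra are arbitrary choices for illustration.

```r
set.seed(1)        # arbitrary seed, used only so the run is reproducible
x <- rnorm(100)    # one fixed base sample

# Run shapiro.test() on the base sample plus a single appended value
# and return the value, the W statistic and the p-value.
test_with_extra <- function(x, extra) {
  res <- shapiro.test(c(x, extra))
  c(extra = extra, W = unname(res$statistic), p_value = res$p.value)
}

extras <- c(3, 4, 5, 6)
results <- t(sapply(extras, function(v) test_with_extra(x, v)))
print(results)
```

Each row of results corresponds to one appended value, so the effect of moving the outlier further from the data can be read off without the noise of resampling.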