What is the difference between a Univariate outlier screen and a Multivariate outlier screen?
Univariate outlier screen - In this method, we look for data points that have an extreme value in a single variable, compared with the other values of that variable. For example, in a height variable, an observation of 12 feet when all other observations are at most 6.4 feet is clearly an outlier.
A univariate outlier can be detected with a simple box plot, where
a point lying more than 1.5 times the interquartile range (IQR)
below the first quartile (Q1) or above the third quartile (Q3) is
treated as an outlier.
This box-plot rule is known as Tukey's method (Tukey's fences): a
value is an outlier if it falls far from the middle of the data
(around the median, the central point), specifically below
Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
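The 1.5 × IQR box-plot rule can be sketched in Python using only the standard library. This is a minimal illustration, not a definitive implementation: the function names tukey_fences and univariate_outliers are hypothetical, the heights are made-up values echoing the example above, and it assumes the "inclusive" quartile convention of statistics.quantiles (other quartile conventions can shift the fences slightly).

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: "inclusive" interpolation between data points; other
    # quartile definitions give slightly different Q1/Q3 on small samples.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def univariate_outliers(values, k=1.5):
    """Return the values lying outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical heights in feet: all at most 6.4 ft except one 12 ft entry.
heights = [5.1, 5.4, 5.6, 5.8, 5.9, 6.0, 6.1, 6.4, 12.0]
print(univariate_outliers(heights))  # [12.0]
```

Here Q1 = 5.6 and Q3 = 6.1, so the upper fence is about 6.85 feet and only the 12-foot observation is flagged.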
Multivariate outlier screen - In this method, we look for data
points that are extreme in two or more variables considered
together. In simple words, the point has an unusual combination of
values across variables. For example, in a dataset with the two
variables age and income, an observation with an age of 10 years
and an income of 1 million looks suspicious and needs further
investigation.
Multivariate outliers can be identified with a bivariate plot (for
example, a scatter plot of the two variables), which shows clearly
whether an observation is an outlier with respect to both variables
jointly.
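A more advanced method is the Mahalanobis distance. It is the
distance of an observation from the centroid of the other
observations, where the centroid is the point whose coordinates are
the means of the variables being assessed. Unlike ordinary Euclidean
distance, it uses the covariance matrix of the variables, so it
accounts for their different variances and for the correlations
between them.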