An outlier is a data value in a data set that lies at an abnormal distance from the rest of the data values.
In other words, outliers are values located far away from most of the other data.
For example, suppose we have the data values 60, 70, 80, 90, 100, 190.
In this example, most of the data values are clustered closely around 80, but the value 190 lies far away from all the others.
Data values of this kind are called outliers. An outlier can change the mean (average) value by a large amount.
We know that average = (sum of all data values)/(total number of data values).
Suppose our data set is only 60, 70, 80, 90, 100.
Then mean = (60+70+80+90+100)/5 = 400/5 = 80.
If we add the outlier 190 to the data set, the mean becomes
mean = (60+70+80+90+100+190)/6 = 590/6 = 98.33 (approximately).
So a single outlier raises the mean from 80 to about 98.33.
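The same comparison can be checked with a few lines of Python. This is only a small sketch using the standard library; the variable names are chosen here for illustration.

```python
from statistics import mean

# Data set from the example, first without and then with the outlier 190.
values = [60, 70, 80, 90, 100]
values_with_outlier = values + [190]

mean_without = mean(values)               # 400 / 5
mean_with = mean(values_with_outlier)     # 590 / 6

print("mean without outlier:", mean_without)
print("mean with outlier:", round(mean_with, 2))
```

Running it should show the mean moving from 80 to roughly 98.33 once the value 190 is included.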
Generally, any value that lies more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile is considered an outlier.
IQR is the interquartile range, the difference between the third quartile and the first quartile.
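As a rough illustration, the 1.5 × IQR rule can be applied to the example data in Python. The function name iqr_outliers is made up for this sketch, and it assumes one particular quartile convention: the first and third quartiles are taken as the medians of the lower and upper halves of the sorted data. Other conventions exist and can give slightly different quartiles and cut-off values.

```python
from statistics import median

def iqr_outliers(values):
    """Return the values flagged by the 1.5 * IQR rule (hypothetical helper).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data; for an odd count the middle value is left out.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data values")
    half = len(data) // 2
    q1 = median(data[:half])               # first quartile
    q3 = median(data[len(data) - half:])   # third quartile
    iqr = q3 - q1                          # interquartile range
    low_limit = q1 - 1.5 * iqr
    high_limit = q3 + 1.5 * iqr
    return [x for x in data if x < low_limit or x > high_limit]

print(iqr_outliers([60, 70, 80, 90, 100, 190]))
```

With this convention the example data has first quartile 70 and third quartile 100, so IQR = 30 and the limits are 70 - 45 = 25 and 100 + 45 = 145. The value 190 is above 145, so it is flagged as an outlier.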