Question

What is an outlier? How would you scan for outliers in your dataset? What would you do with data points that are considered outliers? [6 Points]

Solutions

Expert Solution

An outlier is a data value that differs significantly from the other data points in the set. It may arise from variability in the measurement, or it may indicate an experimental error.

For example, in a data set of 6 points, say 2, 5, 8, 3, 0, 378, one can easily say (without calculating) that 378 is an outlier, as it differs significantly from the other 5 points.

To scan for outliers in the data set, one must find the fences (the range of acceptable values) for the data set. To do that:

First, sort the data in ascending order and find the median of the complete data set, as it helps to divide the data into two equal halves. Then find the Lower / First Quartile, which is the median of the lower half of the data set, and the Upper / Third Quartile, which is the median of the upper half.

Then, find the Interquartile Range (IQR) of the data set by subtracting the Lower Quartile from the Upper Quartile, that is, IQR = Upper Quartile – Lower Quartile

Then, to find the fences, multiply the IQR by 1.5. Let p = 1.5 × IQR. Adding p to the Upper Quartile gives the upper fence, and subtracting p from the Lower Quartile gives the lower fence

That is,

Upper Fence = Upper Quartile + p

Lower Fence = Lower Quartile – p

If all the values of the data set lie inside [Lower Fence, Upper Fence], then there are no outliers in the data set. Any data point that lies outside this interval is an outlier.
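The steps above can be sketched in Python using the example data set from earlier. This is a minimal sketch, not a definitive implementation: the function names (`quartiles`, `iqr_fences`, `find_outliers`) are made up for illustration, and it assumes the "median of each half" convention in which the middle value is left out of both halves when the number of points is odd. Other quartile conventions exist and can give slightly different fences.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd number of points the middle value is excluded
    from both halves. Needs at least 2 values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    upper_half = data[half + (n % 2):]  # skip the middle value when n is odd
    return median(lower_half), median(upper_half)

def iqr_fences(values, k=1.5):
    """Return (lower fence, upper fence) using p = k * IQR."""
    q1, q3 = quartiles(values)
    p = k * (q3 - q1)
    return q1 - p, q3 + p

def find_outliers(values, k=1.5):
    """Return the values that lie outside [lower fence, upper fence]."""
    lower_fence, upper_fence = iqr_fences(values, k)
    return [v for v in values if v < lower_fence or v > upper_fence]

data = [2, 5, 8, 3, 0, 378]
print(quartiles(data))      # (2, 8)
print(iqr_fences(data))     # (-7.0, 17.0)
print(find_outliers(data))  # [378]
```

For this data the sorted values are 0, 2, 3, 5, 8, 378, so the Lower Quartile is 2, the Upper Quartile is 8, IQR = 6 and p = 9. The fences are -7 and 17, and 378 is the only value outside them.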

If there are outliers, one can:

· Replace the outliers with the nearest non-outlying values in the data set (often called winsorizing)

· Replace the outliers with the mean / median of the data set, whichever better suits the data (the median is less affected by the outliers themselves)

· Remove the outliers completely and work with the remaining values, which is appropriate when the outliers are errors rather than genuine observations
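The three options can be compared side by side in a short Python sketch. The function `handle_outliers` and its `strategy` names are hypothetical, chosen only for this illustration. The fences of -7 and 17 are what the 1.5 × IQR rule gives for the example data 2, 5, 8, 3, 0, 378 (Lower Quartile 2, Upper Quartile 8). The "median" strategy here uses the median of the full data set, which is one possible reading of the second option.

```python
from statistics import median

def handle_outliers(values, lower_fence, upper_fence, strategy="remove"):
    """Treat values outside [lower_fence, upper_fence] using one strategy.

    strategy (hypothetical names):
      "remove"  - drop the outliers
      "median"  - replace each outlier with the median of the full data set
      "nearest" - replace each outlier with the nearest non-outlying value
    """
    def is_inlier(v):
        return lower_fence <= v <= upper_fence

    inliers = [v for v in values if is_inlier(v)]
    if not inliers:
        raise ValueError("every value lies outside the fences")

    if strategy == "remove":
        return inliers
    if strategy == "median":
        fill = median(values)
        return [v if is_inlier(v) else fill for v in values]
    if strategy == "nearest":
        smallest, largest = min(inliers), max(inliers)
        # Clamp each value to the range spanned by the non-outliers.
        return [min(max(v, smallest), largest) for v in values]
    raise ValueError(f"unknown strategy: {strategy}")

data = [2, 5, 8, 3, 0, 378]
print(handle_outliers(data, -7, 17, "remove"))   # [2, 5, 8, 3, 0]
print(handle_outliers(data, -7, 17, "median"))   # [2, 5, 8, 3, 0, 4.0]
print(handle_outliers(data, -7, 17, "nearest"))  # [2, 5, 8, 3, 0, 8]
```

Which option is suitable depends on why the outlier occurred: removing or replacing a value that is a genuine observation changes what the data set describes.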

  

