Concept-Discussion 1 - Explain why the standard deviation would likely not be a reliable measure of variability for a distribution of data that includes at least one extreme outlier.
Concept-Discussion 2 - Suppose that you collect a random sample of 250 salaries for the salespersons employed by a large PC manufacturer. Furthermore, assume that you find that two of these salaries are considerably higher than the others in the sample. Before analyzing this data set, should you delete the unusual observations? Explain why or why not.
Concept-Discussion 3 - A researcher is interested in determining whether there is a relationship between the number of room air-conditioning units sold each week and the time of year. What type of descriptive chart would be most useful in performing this analysis? Explain your choice.
Concept-Discussion 4 - Suppose that the histogram of a given income distribution is positively skewed. What does this fact imply about the relationship between the mean and median of this distribution?
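1) The standard deviation of a sample $x_1, \dots, x_n$ with mean $\bar{x}$ is given by

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$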
Thus, the standard deviation measures how far each observation deviates from the mean. Because it takes every observation into account and squares each deviation, it is strongly affected by outliers. Even if all the other observations are tightly packed, a single extreme outlier inflates the standard deviation. It therefore gives a distorted picture of the spread of the other observations.
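To make this concrete, here is a minimal sketch using made-up numbers (the data values and the helper name `iqr` are assumptions for illustration only). It compares the sample standard deviation with the interquartile range, a spread measure that does not use the extreme values, before and after one outlier is added.

```python
import statistics

# Hypothetical data: nine tightly packed values, then the same values plus one outlier.
packed = [48, 49, 50, 50, 51, 51, 52, 52, 53]
with_outlier = packed + [150]

def iqr(values):
    """Interquartile range: distance between the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Sample standard deviation (n - 1 in the denominator).
sd_packed = statistics.stdev(packed)
sd_outlier = statistics.stdev(with_outlier)

print(f"standard deviation without outlier: {sd_packed:.2f}")
print(f"standard deviation with outlier:    {sd_outlier:.2f}")
print(f"IQR without outlier: {iqr(packed):.2f}")
print(f"IQR with outlier:    {iqr(with_outlier):.2f}")
```

In this sketch the single extreme value multiplies the standard deviation many times over, while the interquartile range barely moves.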
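2) No, we should not delete the two outliers automatically. Out of 250 observations only 2 are unusual, so they may be due to an error in measurement or recording, but they may equally be genuine salaries, for example those of top performers or senior salespersons. We should investigate them first and delete them only if they turn out to be errors. Because including these outliers will affect the measures of central tendency, variability, etc., it is sensible to run the analysis both with and without them, or to rely on measures such as the median that are less sensitive to extreme values.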
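One way to judge how much the two unusual salaries matter is to flag them and compare the summaries with and without them. The sketch below assumes a small made-up sample (not the 250 salaries from the question), a hypothetical helper `flag_outliers`, and the common 1.5 × IQR rule of thumb as the flagging criterion.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical salaries in thousands of dollars; the last two are unusually high.
salaries = [42, 45, 47, 48, 50, 51, 53, 55, 58, 60, 140, 165]

flagged = flag_outliers(salaries)
kept = [v for v in salaries if v not in flagged]

# Compare the summaries with and without the flagged values.
for label, data in (("all observations", salaries), ("flagged values set aside", kept)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean={mean:.1f}, median={median:.1f}")

print("flagged for investigation:", flagged)
```

Flagging only marks observations for a closer look; whether they are errors still has to be checked against the source of the data.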
3) The independent variable here is the time of year, which can be treated as an ordered categorical variable. Thus it can be shown at discrete points on the x axis.
The dependent variable here is sales, which is a quantitative variable. Thus it can be shown on the y axis against its corresponding x value.
Thus a bar chart or a line chart (time series plot) will be best for describing the data; the line chart makes the seasonal rise and fall easiest to follow. An example is shown below.
Month | Sales |
January | 2 |
February | 2 |
March | 4 |
April | 5 |
May | 8 |
June | 8 |
July | 6 |
August | 5 |
September | 4 |
October | 1 |
November | 1 |
December | 1 |
The bar chart is given by
[Figure: bar chart of Sales by Month]
The line chart is given by
[Figure: line chart of Sales by Month]
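As a rough stand-in for the charts, the following sketch draws a text bar chart from the example table using only the Python standard library. The function name `text_bar_chart` is an assumption for illustration; in practice a plotting library would normally be used to draw a proper bar or line chart.

```python
# Monthly sales copied from the example table.
sales = {
    "January": 2, "February": 2, "March": 4, "April": 5,
    "May": 8, "June": 8, "July": 6, "August": 5,
    "September": 4, "October": 1, "November": 1, "December": 1,
}

def text_bar_chart(data, symbol="#"):
    """Return one line per category: padded label, a bar of symbols, the value."""
    width = max(len(label) for label in data)
    return [
        f"{label:<{width}} | {symbol * value} {value}"
        for label, value in data.items()
    ]

for line in text_bar_chart(sales):
    print(line)

# The months with the highest sales show where the seasonal peak falls.
peak = max(sales.values())
peak_months = [month for month, value in sales.items() if value == peak]
print("Peak months:", peak_months)
```

Keeping the months in calendar order on one axis is what lets the seasonal pattern show up, whichever chart type is used.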
4) For symmetrical, single-peaked data the mean, median and mode coincide. For skewed data the mean and median move apart from each other. The median divides the data into two equal parts and is not affected by how extreme the values in the tail are, so it splits the area of the histogram into two equal halves. As the mean is affected by extreme observations, it moves towards the longer tail. For positively skewed data the right tail is longer, so the mean moves towards the right tail. Therefore the mean is greater than the median for positively skewed data.
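A quick numerical check of this relationship, using made-up income figures (both data sets below are assumptions for illustration):

```python
import statistics

# Hypothetical incomes in thousands of dollars: most are modest, a few are very high,
# which gives the distribution a long right tail (positive skew).
incomes = [22, 25, 27, 30, 31, 33, 35, 38, 40, 45, 60, 95, 180]

# A symmetric data set for comparison.
symmetric = [20, 30, 40, 50, 60]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"skewed:    mean={mean_income:.1f}, median={median_income:.1f}")
print(f"symmetric: mean={statistics.mean(symmetric):.1f}, "
      f"median={statistics.median(symmetric):.1f}")
```

In the skewed sample the few large incomes pull the mean above the median, while in the symmetric sample the two coincide.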