Conceptual Questions:
1. If the bars of a histogram are all the same height, what is one observation you could make about the distribution of the sample presented in the histogram?
2. A statistician computes a 95% confidence interval for the number of prior arrests of those convicted of violent crimes. The interval ranged from 1.6 to 3.6 prior arrests. Given these data, what is the probability that the population mean is greater than 3.6 prior arrests? Why?
3. Why is the range considered a misleading measure of variability for a skewed frequency distribution?
4. A researcher collects a sample of 30 individuals who have a mean age of 34.6, a median age of 41.5, and a modal age of 44. Make one observation regarding the nature of the distribution of data that the researcher collected. What would be the best measure of central tendency to use in describing a distribution of this form? Why did you choose the measure of central tendency that you did?
5. Why is it necessary to assume that the data are distributed normally when calculating a z score?
Answers:
1. A histogram groups the data into bins, and the height of each bar gives the frequency (count) of values that fall in that bin. If all the bars are the same height, every bin contains about the same number of observations, so a randomly drawn value is equally likely to fall in any bin (assuming the bins have equal width). The sample therefore appears to follow a uniform (rectangular) distribution, with no single mode.
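2. The 95% confidence interval for the mean number of prior arrests runs from 1.6 to 3.6. Read in the usual way, we are 95% confident that the population mean lies between 1.6 and 3.6, which leaves 100 - 95 = 5% for the possibility that it lies outside the interval, either below 1.6 or above 3.6.
Because the interval is symmetric about the sample mean, that 5% is split equally between the two tails, so the probability that the population mean is greater than 3.6 is 5/2 = 2.5%. Strictly speaking, the population mean is a fixed value, so the 95% describes how often intervals built this way capture it over repeated samples rather than a probability attached to this one interval.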
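A small simulation can illustrate what the 95% figure describes. The sketch below is only an illustration under stated assumptions: the population values (`true_mean`, `sigma`), the sample size, and the use of a known-sigma z interval are hypothetical choices, not taken from the question. It draws many samples, builds a 95% interval from each, and counts how often the interval captures the true mean and how often it misses on each side.

```python
import random
import statistics

def ci_coverage(true_mean=2.6, sigma=2.0, n=30, trials=2000, seed=0):
    """Estimate how often a 95% z interval captures a known population mean.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    half_width = 1.96 * sigma / n ** 0.5  # known-sigma 95% margin of error
    covered = interval_too_high = interval_too_low = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        lower, upper = m - half_width, m + half_width
        if true_mean < lower:
            interval_too_high += 1   # true mean falls below the interval
        elif true_mean > upper:
            interval_too_low += 1    # true mean falls above the interval
        else:
            covered += 1
    return covered / trials, interval_too_high / trials, interval_too_low / trials

coverage, mean_below, mean_above = ci_coverage()
print(f"captured: {coverage:.3f}, mean below interval: {mean_below:.3f}, "
      f"mean above interval: {mean_above:.3f}")
```

With enough trials the captured share should settle near 0.95, with roughly 0.025 of the misses on each side, which is where the 2.5% figure comes from.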
3. The range is the difference between the largest and smallest values in the data set. Because it depends only on those two extreme scores, it says nothing about how the rest of the scores are spread. In a skewed distribution most scores are clustered on one side and a few extreme scores form a long tail; those few scores set the range, so it overstates the variability of the typical scores. A measure such as the interquartile range is less affected by the tail.
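The effect of a single extreme score on the range can be seen with a small made-up data set. In the sketch below, the list `scores` and the helper names `value_range` and `interquartile_range` are hypothetical, introduced only for this illustration.

```python
import statistics

def value_range(data):
    """Range = largest value minus smallest value."""
    return max(data) - min(data)

def interquartile_range(data):
    """Spread of the middle half of the data (Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

# Made-up, skewed sample: most scores are small, one score is extreme.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 50]

print("range with extreme score:   ", value_range(scores))
print("range without extreme score:", value_range(scores[:-1]))
print("interquartile range:        ", interquartile_range(scores))
```

Dropping the single extreme score changes the range from 49 to 4, while the interquartile range stays small either way because it ignores the tails.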
4. The sample size is 30, and the mean, median, and modal ages are 34.6, 41.5, and 44, respectively. The mean is the average of all the values, the median is the middle value, and the mode is the value that occurs most often. Here the mean is less than the median, which is less than the mode. This means that most data points are clustered at the higher ages, while a few low ages form a tail that pulls the mean down. The distribution is therefore skewed, and specifically it is negatively (left) skewed. The median is the best measure of central tendency for a distribution of this form because, unlike the mean, it is not pulled toward the extreme scores in the tail.
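How a few extreme scores move the mean while leaving the median and mode almost untouched can be checked on a small made-up sample. The list `ages` below is hypothetical and is not the researcher's data; it simply has a long lower tail.

```python
import statistics

# Made-up ages: most values sit in the low 40s, two low values form a tail.
ages = [12, 15, 38, 40, 41, 42, 43, 44, 44, 45]

mean_age = statistics.fmean(ages)
median_age = statistics.median(ages)
mode_age = statistics.mode(ages)

print(f"mean={mean_age:.1f}, median={median_age}, mode={mode_age}")

# The two low ages drag the mean below the median; the median stays with the bulk.
if mean_age < median_age:
    print("mean < median: the tail is on the low side")
elif mean_age > median_age:
    print("mean > median: the tail is on the high side")
else:
    print("mean == median: no skew indicated by this comparison")
```

Comparing the mean with the median is a rule of thumb rather than a formal test, but it is the comparison the question relies on.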
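5. A z score expresses how many standard deviations a value lies from the mean: z = (x - mean) / standard deviation. Converting a set of scores to z scores gives a distribution with a mean of 0 and a standard deviation of 1. It is not necessary to assume that the distribution is normal in order to calculate a z score; it can be computed for any distribution with a finite mean and a nonzero standard deviation.
However, the normality assumption is needed to interpret a z score as a probability using the standard normal (unit normal) table. Standardizing changes only the location and scale of a distribution, not its shape, so z scores from a normal distribution follow the standard normal distribution, whereas z scores from a skewed distribution remain skewed and the table proportions do not apply.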
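As a minimal sketch of the calculation, the code below standardizes a small made-up, clearly non-normal sample. The list `skewed` and the helper `z_scores` are hypothetical names used only here, and the population standard deviation (`pstdev`) is an assumption about which standard deviation is intended.

```python
import statistics

def z_scores(data):
    """Return (x - mean) / standard deviation for every x in data."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # population SD; an assumption of this sketch
    if sigma == 0:
        raise ValueError("z scores are undefined when all values are equal")
    return [(x - mu) / sigma for x in data]

# Made-up, strongly skewed sample: the calculation works without normality.
skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 50]
z = z_scores(skewed)

print("mean of z scores:  ", round(statistics.fmean(z), 6))
print("SD of z scores:    ", round(statistics.pstdev(z), 6))
print("median of z scores:", round(statistics.median(z), 3))
print("largest z score:   ", round(max(z), 3))
```

The standardized scores have a mean of 0 and a standard deviation of 1, yet the median sits below 0 and one score lies far out in the upper tail, so the shape is still skewed and standard normal table proportions would not describe it.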