An administrator wanted to study the utilization of long-distance telephone service by a department. One variable of interest (let's call it X) is the length, in minutes, of long-distance calls made during one month. There were 38 calls that resulted in a connection. The lengths of the calls, already ordered from smallest to largest, are presented in the following table.
1.6 1.7 1.8 1.8 1.9 2.1 2.5 3.0 3.0 4.4
4.5 4.5 5.9 7.1 7.4 7.5 7.7 8.6 9.3 9.5
12.7 15.3 15.5 15.9 15.9 16.1 16.5 17.3 17.5 19.0
19.4 22.5 23.5 24.0 31.7 32.8 43.5 53.3
Which one of the following statements is not true?
The 75th percentile (Q3) is 17.5 minutes.
The 50th percentile (Q2) is 9.4 minutes.
The 25th percentile (Q1) is 4.4 minutes.
Q3 - Q2 > Q2 - Q1.
Average X > Median X.
X distribution is positively skewed.
The percentile rank of 5.9 minutes is 13.
Range of X is 51.7 minutes.
IQR (Inter-Quartile Range) is 13.1 minutes.
There are 2 outliers in X distribution.
Q4: (This continues Q3: 2 marks) Which one of the following cannot be used to describe the distribution of X?
A Histogram.
A Stemplot.
Skewness and Kurtosis.
Mean and SD (Standard Deviation).
The 5-number Summary.
The coefficient of determination.
The coefficient of relative variation (CRV).
The 1.5 IQR Rule.
The Deciles.
A Boxplot.
Which one of the following statements is not true?
We now compute each of the required quantities to determine which statement is not true.
i)
The 75th percentile (Q3) is 17.5 minutes.
The sample size is n = 38 (even).
To find the 75th percentile, we first find the median of the data.
Median = ( (n/2)th observation + (n/2 + 1)th observation ) / 2
= ( (38/2)th observation + (38/2 + 1)th observation ) / 2
= ( 19th observation + 20th observation ) / 2
From the given data, the 19th observation is 9.3 and the 20th observation is 9.5.
Thus Median = ( 9.3 + 9.5 ) / 2 = 18.8 / 2 = 9.4
The 75th percentile is the median of the observations that are greater than the overall median (9.4), namely
9.5 12.7 15.3 15.5 15.9 15.9 16.1 16.5 17.3 17.5 19.0 19.4 22.5 23.5 24.0 31.7 32.8 43.5 53.3
The number of observations greater than 9.4 is n3 = 19 (odd).
For an odd number of observations, the median is the ( (n3 + 1)/2 )th observation = ( (19 + 1)/2 )th observation
= 10th observation
The 10th observation in this upper half is 17.5.
Thus the 75th percentile is 17.5.
Hence
The 75th percentile (Q3) is 17.5 minutes. - TRUE
ii)
The 50th percentile (Q2) is 9.4 minutes.
The 50th percentile is the median, which we have already obtained above:
Median = ( 19th observation + 20th observation ) / 2 = ( 9.3 + 9.5 ) / 2 = 18.8 / 2 = 9.4
Thus The 50th percentile (Q2) is 9.4 minutes. - TRUE
iii)
The 25th percentile (Q1) is 4.4 minutes.
The 25th percentile is the median of the observations that are less than the overall median (9.4), namely
1.6 1.7 1.8 1.8 1.9 2.1 2.5 3.0 3.0 4.4 4.5 4.5 5.9 7.1 7.4 7.5 7.7 8.6 9.3
The number of observations less than 9.4 is n1 = 19 (odd).
For an odd number of observations, the median is the ( (n1 + 1)/2 )th observation = ( (19 + 1)/2 )th observation
= 10th observation
The 10th observation in this lower half is 4.4.
Thus the 25th percentile is 4.4.
Thus The 25th percentile (Q1) is 4.4 minutes. - TRUE
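The median-of-halves calculation used in parts i) to iii) can be sketched in Python as below. This is only an illustration: the function name quartiles_by_halves is made up for this example, and it splits the ordered data by position, which gives the same halves as "less than 9.4" and "greater than 9.4" here. Other quartile conventions (for example the interpolation rules used by some software) can give slightly different values.

```python
from statistics import median

# Call lengths in minutes, copied from the table in the question (n = 38).
calls = [
    1.6, 1.7, 1.8, 1.8, 1.9, 2.1, 2.5, 3.0, 3.0, 4.4,
    4.5, 4.5, 5.9, 7.1, 7.4, 7.5, 7.7, 8.6, 9.3, 9.5,
    12.7, 15.3, 15.5, 15.9, 15.9, 16.1, 16.5, 17.3, 17.5, 19.0,
    19.4, 22.5, 23.5, 24.0, 31.7, 32.8, 43.5, 53.3,
]


def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # lower half of the ordered data
    upper = ordered[half + n % 2:]   # upper half (the middle value is skipped if n is odd)
    return median(lower), median(ordered), median(upper)


q1, q2, q3 = quartiles_by_halves(calls)
print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3)
```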
iv)
Q3 - Q2 > Q2 - Q1
Now
Q3 - Q2 = 17.5 - 9.4 = 8.1
Q2 - Q1 = 9.4 - 4.4 = 5.0
Hence Q3 - Q2 > Q2 - Q1 - TRUE
v)
Average X > Median X.
Now we will calculate the mean of X.
Mean = ( sum of all observations ) / n
Mean = [ 1.6 + 1.7 + 1.8 + 1.8 + 1.9 + 2.1 + ... + 22.5 + 23.5 + 24.0 + 31.7 + 32.8 + 43.5 + 53.3 ] / 38
= 508.2 / 38 = 13.37368
Thus Mean = 13.37368
And Median = 9.4
Hence Average X > Median X. - TRUE
vi)
X distribution is positively skewed.
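As a rule of thumb, if the mean is greater than the median, the distribution is positively skewed (it has a long right tail).
Here Mean = 13.37368 and Median = 9.4
Mean > Median, and the largest values (43.5 and 53.3) lie far to the right of the bulk of the data.
Hence X distribution is positively skewed. - TRUE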
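As a rough check of parts v) and vi), the mean and median can be compared directly, together with a moment-based skewness coefficient. The skewness formula below (third central moment divided by the 1.5 power of the second central moment) is one common definition and is an addition for illustration, not part of the original working. A positive value points to a right-skewed distribution.

```python
from statistics import fmean, median

# Call lengths in minutes, copied from the table in the question (n = 38).
calls = [
    1.6, 1.7, 1.8, 1.8, 1.9, 2.1, 2.5, 3.0, 3.0, 4.4,
    4.5, 4.5, 5.9, 7.1, 7.4, 7.5, 7.7, 8.6, 9.3, 9.5,
    12.7, 15.3, 15.5, 15.9, 15.9, 16.1, 16.5, 17.3, 17.5, 19.0,
    19.4, 22.5, 23.5, 24.0, 31.7, 32.8, 43.5, 53.3,
]

n = len(calls)
mean_x = fmean(calls)
median_x = median(calls)

# Central moments (population form, dividing by n); an assumed convention.
m2 = sum((x - mean_x) ** 2 for x in calls) / n
m3 = sum((x - mean_x) ** 3 for x in calls) / n
skewness = m3 / m2 ** 1.5

print(f"mean = {mean_x:.5f}, median = {median_x:.1f}")
print("mean > median:", mean_x > median_x)
print("skewness is positive:", skewness > 0)
```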
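vii)
The percentile rank of 5.9 minutes is 13.
The first 13 ordered observations are
X 1.6 1.7 1.8 1.8 1.9 2.1 2.5 3.0 3.0 4.4 4.5 4.5 5.9
rank 1 2 3 4 5 6 7 8 9 10 11 12 13
So 5.9 is the 13th ordered observation, but 13 is its rank (position), not its percentile rank.
The percentile rank is the percentage of observations that fall below the value. Here 12 of the 38 calls are shorter than 5.9 minutes.
Percentile rank = ( 12 + 0.5 ) / 38 * 100 = 32.9 (between about 31.6 and 34.2, depending on whether the value itself is counted)
Hence The percentile rank of 5.9 minutes is 13. - NOT TRUE (this is the statement that is not true)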
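Ordinal rank and percentile rank are different quantities, and textbooks differ on the exact percentile-rank formula. The sketch below computes the rank of 5.9 together with three common percentile-rank conventions. The function name percentile_rank and the convention labels are made up for this example.

```python
# Call lengths in minutes, copied from the table in the question (n = 38).
calls = [
    1.6, 1.7, 1.8, 1.8, 1.9, 2.1, 2.5, 3.0, 3.0, 4.4,
    4.5, 4.5, 5.9, 7.1, 7.4, 7.5, 7.7, 8.6, 9.3, 9.5,
    12.7, 15.3, 15.5, 15.9, 15.9, 16.1, 16.5, 17.3, 17.5, 19.0,
    19.4, 22.5, 23.5, 24.0, 31.7, 32.8, 43.5, 53.3,
]


def percentile_rank(values, x, convention="midpoint"):
    """Percentile rank of x under a named convention (hypothetical helper)."""
    n = len(values)
    below = sum(v < x for v in values)   # observations strictly below x
    equal = sum(v == x for v in values)  # observations tied with x
    if convention == "below":
        return 100 * below / n
    if convention == "at_or_below":
        return 100 * (below + equal) / n
    if convention == "midpoint":
        return 100 * (below + 0.5 * equal) / n
    raise ValueError(f"unknown convention: {convention}")


ordinal_rank = sum(v < 5.9 for v in calls) + 1  # position in the ordered list
print("ordinal rank of 5.9:", ordinal_rank)
for name in ("below", "midpoint", "at_or_below"):
    print(f"percentile rank ({name}): {percentile_rank(calls, 5.9, name):.1f}")
```

Under all three conventions the percentile rank of 5.9 lies between 31 and 35, while its ordinal rank is 13.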
viii)
Range of X is 51.7 minutes.
Range = max - min = 53.3 - 1.6 = 51.7
Hence Range of X is 51.7 minutes. - TRUE
ix)
IQR (Inter-Quartile Range) is 13.1 minutes.
IQR = Q3 - Q1 = 17.5 - 4.4 = 13.1
Hence IQR (Inter-Quartile Range) is 13.1 minutes. - TRUE
x)
There are 2 outliers in X distribution.
By the 1.5 IQR rule, an outlier is any data point more than 1.5 interquartile ranges (IQRs) below the first quartile or above the third quartile.
Lower fence = Q1 - 1.5 * IQR = 4.4 - 1.5 * 13.1 = -15.25
Upper fence = Q3 + 1.5 * IQR = 17.5 + 1.5 * 13.1 = 37.15
Hence observations inside the interval ( -15.25 , 37.15 ) are not outliers.
The 37th and 38th observations, 43.5 and 53.3, lie outside this interval.
Hence 43.5 and 53.3 are outliers.
So we have 2 outlying observations.
There are 2 outliers in X distribution. - TRUE
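The fence calculation can be checked with a few lines of Python. The quartile values are taken from the working above (Q1 = 4.4, Q3 = 17.5), and the variable names are chosen for this example only.

```python
# Call lengths in minutes, copied from the table in the question (n = 38).
calls = [
    1.6, 1.7, 1.8, 1.8, 1.9, 2.1, 2.5, 3.0, 3.0, 4.4,
    4.5, 4.5, 5.9, 7.1, 7.4, 7.5, 7.7, 8.6, 9.3, 9.5,
    12.7, 15.3, 15.5, 15.9, 15.9, 16.1, 16.5, 17.3, 17.5, 19.0,
    19.4, 22.5, 23.5, 24.0, 31.7, 32.8, 43.5, 53.3,
]

q1, q3 = 4.4, 17.5                 # quartiles from the median-of-halves working
iqr = q3 - q1                      # interquartile range
lower_fence = q1 - 1.5 * iqr       # anything below this is flagged
upper_fence = q3 + 1.5 * iqr       # anything above this is flagged

outliers = [x for x in calls if x < lower_fence or x > upper_fence]
print(f"IQR = {iqr:.1f}, fences = ({lower_fence:.2f}, {upper_fence:.2f})")
print("outliers:", outliers)
```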
Q4: (This continues Q3: 2 marks) Which one of the following cannot be used to describe the distribution of X?
i)
A Histogram. -
A histogram displays the shape and spread of continuous sample data.
Hence a Histogram can be used to describe the distribution of X
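As an illustration, a simple text histogram of X can be produced by counting the calls in equal-width bins. The bin width of 10 minutes is an assumption made for this sketch. Any reasonable width would show the same right-skewed shape.

```python
from collections import Counter

# Call lengths in minutes, copied from the table in the question (n = 38).
calls = [
    1.6, 1.7, 1.8, 1.8, 1.9, 2.1, 2.5, 3.0, 3.0, 4.4,
    4.5, 4.5, 5.9, 7.1, 7.4, 7.5, 7.7, 8.6, 9.3, 9.5,
    12.7, 15.3, 15.5, 15.9, 15.9, 16.1, 16.5, 17.3, 17.5, 19.0,
    19.4, 22.5, 23.5, 24.0, 31.7, 32.8, 43.5, 53.3,
]

BIN_WIDTH = 10  # assumed bin width in minutes

# Bin index b covers the interval [b * BIN_WIDTH, (b + 1) * BIN_WIDTH).
counts = Counter(int(x // BIN_WIDTH) for x in calls)

for b in range(max(counts) + 1):
    low = b * BIN_WIDTH
    print(f"[{low:>2}, {low + BIN_WIDTH:>2}) {'*' * counts[b]} ({counts[b]})")
```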
ii)
A Stemplot. -
You could make a frequency distribution table or a histogram for the values, or you can use a
stem-and-leaf plot and let the numbers themselves show much the same information.
Hence a Stemplot can be used to describe the distribution of X
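A stemplot of X can be sketched as follows. The choice of stems (tens of minutes) and leaves (whole minutes, with the decimal part truncated) is an assumption for this example. Other splits are equally valid.

```python
from collections import defaultdict

# Call lengths in minutes, copied from the table in the question (n = 38).
calls = [
    1.6, 1.7, 1.8, 1.8, 1.9, 2.1, 2.5, 3.0, 3.0, 4.4,
    4.5, 4.5, 5.9, 7.1, 7.4, 7.5, 7.7, 8.6, 9.3, 9.5,
    12.7, 15.3, 15.5, 15.9, 15.9, 16.1, 16.5, 17.3, 17.5, 19.0,
    19.4, 22.5, 23.5, 24.0, 31.7, 32.8, 43.5, 53.3,
]

stems = defaultdict(list)
for x in sorted(calls):
    whole = int(x)                            # truncate the decimal part (assumed choice)
    stems[whole // 10].append(whole % 10)     # stem = tens digit, leaf = units digit

for stem in range(max(stems) + 1):
    leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem} | {leaves}")
```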
iii)
Skewness and Kurtosis. -
Skewness is a measure of symmetry, or more precisely, the lack of symmetry.
Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.
Hence Skewness and Kurtosis can be used to describe the distribution of X
iv)
Mean and SD (Standard Deviation). -
The mean measures the centre of the data set.
The standard deviation measures the spread of the data around the mean.
Hence Mean and SD (Standard Deviation) can be used to describe the distribution of X
v)
The 5-number Summary. -
The five-number summary consists of five values: the minimum, the lower quartile (Q1), the median, the upper quartile (Q3), and the maximum. It summarises both the centre and the spread of the data.
Hence the 5-number Summary can be used to describe the distribution of X
vi)
The coefficient of determination. -
The coefficient of determination measures how much of the variability in one variable is explained by its relationship with another variable.
It is sometimes referred to as a measure of "goodness of fit".
It is used in statistical analysis to assess how well a model explains the observed outcomes.
So the coefficient of determination applies to a regression model, which needs at least one other variable besides X. Here we have only the single variable X, hence it cannot be used to describe the distribution of X. This is the answer to Q4.
vii)
The coefficient of relative variation (CRV). -
The coefficient of relative variation (relative standard deviation) is the standard deviation expressed as a percentage of the mean. It measures the dispersion of the data points around the mean relative to the size of the mean.
Hence the coefficient of relative variation (CRV) can be used to describe the distribution of X
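A minimal sketch of the CRV for X is given below, assuming the usual definition CRV = 100 * SD / mean and the sample standard deviation (divisor n - 1). A course that uses the population standard deviation would get a slightly smaller value.

```python
from statistics import fmean, stdev

# Call lengths in minutes, copied from the table in the question (n = 38).
calls = [
    1.6, 1.7, 1.8, 1.8, 1.9, 2.1, 2.5, 3.0, 3.0, 4.4,
    4.5, 4.5, 5.9, 7.1, 7.4, 7.5, 7.7, 8.6, 9.3, 9.5,
    12.7, 15.3, 15.5, 15.9, 15.9, 16.1, 16.5, 17.3, 17.5, 19.0,
    19.4, 22.5, 23.5, 24.0, 31.7, 32.8, 43.5, 53.3,
]

mean_x = fmean(calls)
sd_x = stdev(calls)            # sample standard deviation (divisor n - 1)
crv = 100 * sd_x / mean_x      # standard deviation as a percentage of the mean

print(f"mean = {mean_x:.2f}, SD = {sd_x:.2f}, CRV = {crv:.1f}%")
```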
viii)
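The 1.5 IQR Rule. -
The 1.5 IQR rule flags as outliers any observations more than 1.5 IQRs below Q1 or above Q3.
It is built on the IQR, which is often seen as a better measure of spread than the range because it is not affected by outliers.
Hence the 1.5 IQR Rule can be used to describe the distribution of X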
ix)
The Deciles. -
Deciles split a set of ranked data into 10 equally large subsections.
Deciles are similar to quartiles, but while quartiles divide the data into four quarters, deciles divide the data into ten equal parts.
So Deciles can be used to describe the distribution of X
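Deciles can be obtained with the standard library function statistics.quantiles (Python 3.8 or later), which returns the nine cut points when n=10. Its default interpolation method is one of several conventions, so hand calculations based on a different rule may differ slightly.

```python
from statistics import quantiles

# Call lengths in minutes, copied from the table in the question (n = 38).
calls = [
    1.6, 1.7, 1.8, 1.8, 1.9, 2.1, 2.5, 3.0, 3.0, 4.4,
    4.5, 4.5, 5.9, 7.1, 7.4, 7.5, 7.7, 8.6, 9.3, 9.5,
    12.7, 15.3, 15.5, 15.9, 15.9, 16.1, 16.5, 17.3, 17.5, 19.0,
    19.4, 22.5, 23.5, 24.0, 31.7, 32.8, 43.5, 53.3,
]

# Nine cut points D1..D9 that split the data into ten equal parts.
deciles = quantiles(calls, n=10)

for k, d in enumerate(deciles, start=1):
    print(f"D{k} = {d:.2f}")
```

The fifth decile coincides with the median of X.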
x)
A Boxplot. -
A boxplot is a standardized way of displaying the distribution of data based on the five-number summary.
From a boxplot we can see outliers and whether the data are symmetric or skewed.
Hence a Boxplot can be used to describe the distribution of X