In preparation for an upcoming sporting event, athletes from two different countries recorded the following running times in minutes: country A [24, 16, 25, 18, 13, 26, 22, 28, 19, 29], country B [15, 17, 9, 21, 20, 10, 13, 18, 22, 11, 12]. The researchers do NOT believe that the underlying distribution is normal, although they assume it is the same in both countries. Apply a suitable statistical test to see whether there is a difference in running times. What are the correct p-value and decision at a confidence level of 95%?
Problem statement:- Given observations from two groups of data that do not follow a normal distribution, we have to infer whether there is a statistical difference between the two groups.
Given:- We are given running times (in minutes) recorded by athletes from country A and country B ahead of the upcoming sporting event. Using a statistical test, we have to conclude whether there is a statistically significant difference in running times between the two countries, given that the data do not follow a normal distribution. We have 10 observations for country A and 11 observations for country B.
Solution:-
Since the data do not come from a normal distribution, we study the difference in running times between the two countries with a non-parametric method, the Mann-Whitney U test (also called the Wilcoxon rank sum test).
To conclude whether there is a difference in running times, we calculate the U statistic from the Mann-Whitney U test, compare it with the critical value (or compute the corresponding p-value), and draw an inference about the difference in running times.
Mann-Whitney U test steps:
Step (1):- Formulation of hypothesis
First we have to formulate the hypotheses to proceed with the statistical test.
Null hypothesis (H0) : There is no difference in the running times of athletes between the two countries.
Alternate hypothesis (Ha) : There is a difference in the running times of athletes between the two countries.
Step (2):- Calculation of the statistic
Observations are as follows:
Running time (mins) for country A | Running time (mins) for country B |
24 | 15 |
16 | 17 |
25 | 9 |
18 | 21 |
13 | 20 |
26 | 10 |
22 | 13 |
28 | 18 |
19 | 22 |
29 | 11 |
 | 12 |
The U statistic is the smaller of U1 and U2, where
U1 = (n1*n2) + (n1*(n1+1)/2) - R1
U2 = (n1*n2) + (n2*(n2+1)/2) - R2
and
n1 = number of observations for country A
n2 = number of observations for country B
R1 = sum of the ranks for country A
R2 = sum of the ranks for country B
Sub-step (2): Computation of R1 and R2
To compute R1 and R2, we pool the data points from country A and country B, arrange them from smallest to largest, and rank them from 1 to 21 (the total number of observations from both countries). Tied values each receive the average of the ranks they occupy; for example, the two observations of 13 occupy ranks 5 and 6, so each is assigned 5.5. This gives the following table.
Running time (mins) for country A | Running time (mins) for country B | Country A observations, smallest to largest | Country B observations, smallest to largest | Ranks for country A | Ranks for country B |
24 | 15 | | 9 | | 1 |
16 | 17 | | 10 | | 2 |
25 | 9 | | 11 | | 3 |
18 | 21 | | 12 | | 4 |
13 | 20 | 13 | 13 | 5.5 | 5.5 |
26 | 10 | | 15 | | 7 |
22 | 13 | 16 | | 8 | |
28 | 18 | | 17 | | 9 |
19 | 22 | 18 | 18 | 10.5 | 10.5 |
29 | 11 | 19 | | 12 | |
 | 12 | | 20 | | 13 |
 | | | 21 | | 14 |
 | | 22 | 22 | 15.5 | 15.5 |
 | | 24 | | 17 | |
 | | 25 | | 18 | |
 | | 26 | | 19 | |
 | | 28 | | 20 | |
 | | 29 | | 21 | |
Using the above table to compute R1 and R2 we get,
R1 = sum of ranks for country A = (5.5+8+10.5+12+15.5+17+18+19+20+21) = 146.5
R2 = sum of ranks for country B = (1+2+3+4+5.5+7+9+10.5+13+14+15.5) = 84.5
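The ranking step can be checked with a short script. The following is a minimal Python sketch, not part of the original solution; the helper name average_ranks is hypothetical, and it assumes tied values share the average of the ranks they occupy, as in the table above.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of the ranks they span."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


country_a = [24, 16, 25, 18, 13, 26, 22, 28, 19, 29]
country_b = [15, 17, 9, 21, 20, 10, 13, 18, 22, 11, 12]

# Rank the pooled sample, then split the ranks back by country.
ranks = average_ranks(country_a + country_b)
r1 = sum(ranks[:len(country_a)])   # rank sum for country A
r2 = sum(ranks[len(country_a):])   # rank sum for country B
print(r1, r2)  # 146.5 84.5
```

A quick sanity check is that R1 + R2 equals 21*22/2 = 231, the sum of the ranks 1 to 21.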
Using the values of R1 and R2 to compute U1 and U2 we get,
U1 = (10*11) + (10*(10+1)/2) - 146.5 = 18.5
U2 = (10*11) + (11*(11+1)/2) - 84.5 = 91.5
The U statistic is the smaller of U1 and U2, so,
U statistic = 18.5
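The arithmetic for U1 and U2 can be reproduced as follows. This is a small illustrative sketch in Python; the function name mann_whitney_u is hypothetical, and the rank sums are taken from the computation above.

```python
def mann_whitney_u(n1, n2, r1, r2):
    """Return (U1, U2, U) from the sample sizes and rank sums."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2, min(u1, u2)


# n1 = 10 (country A), n2 = 11 (country B), R1 = 146.5, R2 = 84.5
u1, u2, u = mann_whitney_u(10, 11, 146.5, 84.5)
print(u1, u2, u)  # 18.5 91.5 18.5
```

As a check, U1 + U2 should always equal n1*n2, here 110.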
Step (3):- Computation of P-value and conclusion
For small samples, the decision can be made by comparing the computed U statistic with the critical value from a Mann-Whitney U reference table.
For n1 = 10 and n2 = 11, at a confidence level of 95% (two-tailed significance level of 0.05), the critical U value is 26.
We reject the null hypothesis if the computed U statistic is less than or equal to the critical value. In our case the computed U statistic is 18.5 and the critical value is 26, so the computed U statistic is less than the critical value and the two-tailed p-value is below 0.05. We therefore reject the null hypothesis and conclude that there is a statistically significant difference in the running times of athletes between the two countries.
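The reference table gives only a critical value, not a p-value. One common way to obtain an approximate two-sided p-value is the normal approximation to U with a tie correction and a continuity correction. The sketch below assumes that approach, which is not the method used in the solution above; the function name normal_approx_p_value is hypothetical, and for samples this small the result is only an approximation.

```python
import math
from collections import Counter


def normal_approx_p_value(u, sample_a, sample_b):
    """Approximate two-sided p-value for U via the normal approximation."""
    n1, n2 = len(sample_a), len(sample_b)
    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    tie_term = sum(t**3 - t for t in Counter(sample_a + sample_b).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    # Continuity correction of 0.5, floored at zero.
    z = max(0.0, abs(u - mean_u) - 0.5) / math.sqrt(var_u)
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for z >= 0.
    return math.erfc(z / math.sqrt(2))


country_a = [24, 16, 25, 18, 13, 26, 22, 28, 19, 29]
country_b = [15, 17, 9, 21, 20, 10, 13, 18, 22, 11, 12]

alpha = 0.05
p_value = normal_approx_p_value(18.5, country_a, country_b)
reject_h0 = p_value < alpha
print(p_value, reject_h0)
```

Under these assumptions the approximate p-value falls below 0.05, which agrees with the decision reached from the critical value table.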