In: Statistics and Probability
2. Take data sets A and B and delete duplicated values such that each value is unique even when pooling the two data sets. Just like with the previous problem, treat data sets A and B as hypothetical data on the weights of children whose parents smoke cigarettes, and those whose parents do not respectively.
a) Calculate the expected value of the wilcoxon Rank-Sum test statistic E(Wx) assuming the null hypothesis of equal medians being true.
b) Conduct a Wilcoxon-Rank-Sum test on the data
Note: Data set A - 12.36, 12.39, 12.44, 12.50, 12.61, 12.80,12.82, 12.87, 12.89, 12.95, 13.25
Data set B - 12.41, 12.56, 12.61, 12.64, 12.70, 12.85, 13.05, 13.08
b)
First, we put both samples together and organize it in ascending order, which is shown in the table below:
Sample | Value |
1 | 12.36 |
1 | 12.39 |
2 | 12.41 |
1 | 12.44 |
1 | 12.50 |
2 | 12.56 |
1 | 12.61 |
2 | 12.61 |
2 | 12.64 |
2 | 12.70 |
1 | 12.80 |
1 | 12.82 |
2 | 12.85 |
1 | 12.87 |
1 | 12.89 |
1 | 12.95 |
2 | 13.05 |
2 | 13.08 |
1 | 13.25 |
Now, that the values that are in ascending order are assigned ranks to them, taking care of assigning the average rank to values with rank ties
Sample | Value | Rank | Rank (Adjusted for ties) |
1 | 12.36 | 1 | 1 |
1 | 12.39 | 2 | 2 |
2 | 12.41 | 3 | 3 |
1 | 12.44 | 4 | 4 |
1 | 12.50 | 5 | 5 |
2 | 12.56 | 6 | 6 |
1 | 12.61 | 7 | 7.5 |
2 | 12.61 | 8 | 7.5 |
2 | 12.64 | 9 | 9 |
2 | 12.70 | 10 | 10 |
1 | 12.80 | 11 | 11 |
1 | 12.82 | 12 | 12 |
2 | 12.85 | 13 | 13 |
1 | 12.87 | 14 | 14 |
1 | 12.89 | 15 | 15 |
1 | 12.95 | 16 | 16 |
2 | 13.05 | 17 | 17 |
2 | 13.08 | 18 | 18 |
1 | 13.25 | 19 | 19 |
The sum of ranks for sample 1 is:
R1 = 1+2+4+5+7.5+11+12+14+15+16+19 = 106.5
and the sum of ranks of sample 2 is:
R2 = 3+6+7.5+9+10+13+17+18 = 83.5
Hence, the test statistic is R = R1=106.5.
(1) Null and Alternative Hypotheses
The following null and alternative hypotheses need to be tested:
H0: Median (Difference) = 0
H1: Median (Difference) ≠ 0
(2) Rejection Region
The critical value for the signficance level provided and the type of tail is Rc = 46, and the null hypothesis is rejected if R ≤ 46.
(3) Decision about the null hypothesis
Since in this case R = 106.5 > 46, there is not enough evidence to claim that the population median of differences is different than 0, at the 0.05 significance level.