In the course of the thesis work, a student develops a new approach for the solution of a problem (here referred to as method B). The current state-of-the-art approach, method A, is well published in the literature and has been applied to a large standard problem set where its average performance was discovered to be (and published in the main paper by the developers as) 7 with a standard deviation of 3 across the different problems in the problem set. In addition to the publication, the developers of method A also provide their code for anyone to be able to experiment with and the student decides to pick a random set of 15 problems from the standard problem set and apply both methods to these problems, resulting in the following performance numbers for method A: {8, 3, 10, 8, 11, 4, 6, 4, 12, 4, 5, 10, 6, 2, 10}, and the following performance numbers for the student’s method B: {9, 5, 9, 10, 15, 4, 7, 4, 12, 7, 8, 10, 6, 4, 12}. Looking at this data, the student discovers that it seems that method B outperforms method A and sets out to prove this using significance testing with a two-tailed 5% significance threshold. Given that both published performance results as well as the student’s experimental results are available, a number of tests can be performed.
Evaluate the results in terms of the hypothesis that method B has a higher performance than method A. List all the steps (and formulas) involved in the test and what the result implies for the significance of the hypothesis.
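Here both methods are applied to the same 15 problems, so the samples are paired and a paired t-test on the differences d = A - B is appropriate.
H0: mu_d = 0 (no difference in mean performance)
Ha: mu_d ≠ 0 (two-tailed, as the question specifies); a negative mean difference favors method B
d = {-1, -2, 1, -2, -4, 0, -1, 0, 0, -3, -3, 0, 0, -2, -2}
d̄ = ((-1) + (-2) + 1 + (-2) + (-4) + 0 + (-1) + 0 + 0 + (-3) + (-3) + 0 + 0 + (-2) + (-2))/15 = -19/15 = -1.267
sd = sqrt(((-1 + 1.267)^2 + (-2 + 1.267)^2 + (1 + 1.267)^2 + (-2 + 1.267)^2 + (-4 + 1.267)^2 + (0 + 1.267)^2 + (-1 + 1.267)^2 + (0 + 1.267)^2 + (0 + 1.267)^2 + (-3 + 1.267)^2 + (-3 + 1.267)^2 + (0 + 1.267)^2 + (0 + 1.267)^2 + (-2 + 1.267)^2 + (-2 + 1.267)^2)/14) = 1.438
The test statistic is
t = d̄/(sd/sqrt(n)) = -1.267/(1.438/sqrt(15)) = -3.41
DF = 15 - 1 = 14
At the two-tailed 0.05 significance level, the critical values are ±t0.025,14 = ±2.145 (the one-tailed value would be t0.05,14 = -1.761).
Since the absolute value of the test statistic exceeds the critical value (|-3.41| = 3.41 > 2.145), we reject the null hypothesis. The one-tailed test leads to the same decision (-3.41 < -1.761).
At the 0.05 significance level, there is sufficient evidence that the mean performances differ, and because the mean difference A - B is negative, the data support the conclusion that method B has a higher performance than method A.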
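The paired test can also be recomputed directly from the two performance lists given in the question. The following is a minimal Python sketch using only the standard library. The function name `paired_t` and the variable names are illustrative choices, not part of the original answer, and the critical value 2.145 is the tabulated two-tailed value for 14 degrees of freedom, hard-coded here as an assumption rather than computed.

```python
import math
import statistics

# Performance numbers from the question, in the same problem order.
method_a = [8, 3, 10, 8, 11, 4, 6, 4, 12, 4, 5, 10, 6, 2, 10]
method_b = [9, 5, 9, 10, 15, 4, 7, 4, 12, 7, 8, 10, 6, 4, 12]


def paired_t(a, b):
    """Return (mean difference, sample sd, t statistic, df) for d = a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    t_stat = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t_stat, n - 1


# Assumed table value: two-tailed 5% critical value t_{0.025, 14}.
T_CRIT_TWO_TAILED = 2.145

d_bar, s_d, t_stat, df = paired_t(method_a, method_b)
reject = abs(t_stat) > T_CRIT_TWO_TAILED

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t_stat:.2f}, df = {df}")
print("reject H0" if reject else "fail to reject H0")
```

Computing the differences from the raw lists, instead of typing them in by hand, guards against sign slips in individual differences, which directly change the mean, the standard deviation, and the test statistic.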