In addition to the metrics discussed in class, we can also use a t-test to evaluate the difference between two models, i.e., to determine whether two sets of performance results are significantly different from each other. In general, the t-test is a statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. Now suppose that we would like to select between two prediction models, M1 and M2. We have performed 10 rounds of 10-fold cross-validation on each model, where the same data partitioning in round i is used for both M1 and M2. The error rates obtained for M1 are 30.5, 32.2, 20.7, 20.6, 31.0, 41.0, 27.7, 26.0, 21.5, 26.0. The error rates for M2 are 22.4, 14.5, 22.4, 19.6, 20.7, 20.4, 22.1, 19.4, 16.2, 35.0. Comment on whether one model is significantly better than the other at a significance level of 1%.
Solution:
The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:
Null hypothesis: μ1 - μ2 = 0
Alternative hypothesis: μ1 - μ2 ≠ 0
Note that these hypotheses constitute a two-tailed test. The null hypothesis will be rejected if the difference between sample means is too big or if it is too small.
$SE = \sqrt{s_1^2/n_1 + s_2^2/n_2}$
$SE = \sqrt{6.32013^2/10 + 5.483926^2/10} = 2.64608$
$DF = (s_1^2/n_1 + s_2^2/n_2)^2 \,/\, \{ [(s_1^2/n_1)^2/(n_1 - 1)] + [(s_2^2/n_2)^2/(n_2 - 1)] \}$
$DF = (6.32013^2/10 + 5.483926^2/10)^2 \,/\, \{ [(6.32013^2/10)^2/9] + [(5.483926^2/10)^2/9] \}$
$DF = 49.0244856847 / (1.77280732057 + 1.00490228498) = 17.649 \approx 18$
$t = [(\bar{x}_1 - \bar{x}_2) - d] / SE = [(27.72 - 21.27) - 0] / 2.64608 = 2.437568$
where $s_1$ is the standard deviation of sample 1, $s_2$ is the standard deviation of sample 2, $n_1$ is the size of sample 1, $n_2$ is the size of sample 2, $\bar{x}_1$ is the mean of sample 1, $\bar{x}_2$ is the mean of sample 2, $d$ is the hypothesized difference between the population means, and SE is the standard error.
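The arithmetic above can be reproduced from the raw error rates. The following is a minimal sketch using only the Python standard library; the function name `welch_statistics` and the variable names are illustrative choices, not part of the original solution.

```python
import math
import statistics

# Error rates from the problem statement.
m1 = [30.5, 32.2, 20.7, 20.6, 31.0, 41.0, 27.7, 26.0, 21.5, 26.0]
m2 = [22.4, 14.5, 22.4, 19.6, 20.7, 20.4, 22.1, 19.4, 16.2, 35.0]

def welch_statistics(a, b, d=0.0):
    """Return (SE, DF, t) for a two-sample t-test with unequal variances."""
    n1, n2 = len(a), len(b)
    # statistics.variance uses the n - 1 denominator (sample variance).
    v1 = statistics.variance(a) / n1
    v2 = statistics.variance(b) / n2
    se = math.sqrt(v1 + v2)
    # Degrees of freedom, same formula as in the solution text.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    t = ((statistics.mean(a) - statistics.mean(b)) - d) / se
    return se, df, t

se, df, t = welch_statistics(m1, m2)
print(f"SE = {se:.5f}, DF = {df:.3f}, t = {t:.6f}")
```

The printed values should agree with the hand calculation up to rounding in the intermediate steps.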
Since we have a two-tailed test, the P-value is the probability that a t statistic having 18 degrees of freedom is more extreme than 2.437568; that is, less than -2.437568 or greater than 2.437568. Using a t-distribution calculator, we find that the P-value is 0.025385.
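If no calculator is at hand, the P-value can be approximated by integrating the t-distribution density numerically. The sketch below is one way to do this with the standard library alone; `t_pdf`, `two_tailed_p`, and the step count are illustrative assumptions, and a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed P-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        # Simpson weights: 4 for odd interior points, 2 for even ones.
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3      # P(0 < T < |t|)
    return 2 * (0.5 - central)   # probability in both tails

alpha = 0.01
p_value = two_tailed_p(2.437568, 18)
print(f"P-value = {p_value:.6f}, reject H0 at 1%: {p_value < alpha}")
```

The result should land close to the calculator value quoted above, well above the 0.01 threshold.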
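Because the P-value (0.025385) is greater than the significance level (0.01), we cannot reject the null hypothesis. The result is not significant at p < 0.01, so the data do not show that one model is significantly better than the other.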
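Because the problem states that round i uses the same data partitioning for both models, a paired t-test on the per-round differences is a natural cross-check. The sketch below is not part of the original solution: it assumes the error rates are listed in round order, and it hard-codes the two-tailed critical value for 9 degrees of freedom at the 1% level (3.250, from a standard t table) rather than computing it.

```python
import math
import statistics

# Error rates from the problem statement, assumed to be in round order.
m1 = [30.5, 32.2, 20.7, 20.6, 31.0, 41.0, 27.7, 26.0, 21.5, 26.0]
m2 = [22.4, 14.5, 22.4, 19.6, 20.7, 20.4, 22.1, 19.4, 16.2, 35.0]

# Per-round differences; pairing is justified only by the shared partitions.
diffs = [a - b for a, b in zip(m1, m2)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
se_diff = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
t_paired = mean_diff / se_diff

# Two-tailed critical value for alpha = 0.01 with n - 1 = 9 degrees of freedom,
# taken from a standard t table (hard-coded, not computed here).
T_CRIT_9_DF = 3.250
significant = abs(t_paired) > T_CRIT_9_DF

print(f"t = {t_paired:.4f}, df = {n - 1}, significant at 1%: {significant}")
```

Under these assumptions the paired statistic also falls short of the critical value, so the conclusion is unchanged.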