Question

In: Statistics and Probability

In addition to the metrics discussed in class, we can also use a t-test to evaluate the difference between two models, i.e., to determine whether two sets of performance results are significantly different from each other. In general, the t-test is a statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. Now suppose that we would like to select between two prediction models, M1 and M2. We have performed 10 rounds of 10-fold cross-validation on each model, where the same data partitioning in round i is used for both M1 and M2. The error rates obtained for M1 are 30.5, 32.2, 20.7, 20.6, 31.0, 41.0, 27.7, 26.0, 21.5, 26.0. The error rates for M2 are 22.4, 14.5, 22.4, 19.6, 20.7, 20.4, 22.1, 19.4, 16.2, 35.0. Comment on whether one model is significantly better than the other at a significance level of 1%.

Solutions

Expert Solution

Solution:-

The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

  • State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

    Null hypothesis: μ1 - μ2 = 0

    Alternative hypothesis: μ1 - μ2 ≠ 0

    Note that these hypotheses constitute a two-tailed test. The null hypothesis will be rejected if the difference between the sample means is too far from zero in either direction, positive or negative.
  • Formulate an analysis plan. For this analysis, the significance level is 0.01. Using sample data, we will conduct a two-sample t-test of the null hypothesis that does not assume equal variances (Welch's t-test).
  • Analyze sample data. Using sample data, we compute the standard error (SE), the degrees of freedom (DF), and the t statistic (t).

    $SE = \sqrt{s_1^2/n_1 + s_2^2/n_2}$

    $SE = \sqrt{6.32013^2/10 + 5.483926^2/10} = 2.64608$

    $DF = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$

    $DF = \dfrac{(6.32013^2/10 + 5.483926^2/10)^2}{(6.32013^2/10)^2/9 + (5.483926^2/10)^2/9}$

    $DF = \dfrac{49.0244856847}{1.77280732057 + 1.00490228498} = 17.649$, rounded to 18

    $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d}{SE} = \dfrac{(27.72 - 21.27) - 0}{2.64608} = 2.437568$

    where $s_1$ is the standard deviation of sample 1, $s_2$ is the standard deviation of sample 2, $n_1$ is the size of sample 1, $n_2$ is the size of sample 2, $\bar{x}_1$ is the mean of sample 1, $\bar{x}_2$ is the mean of sample 2, $d$ is the hypothesized difference between the population means, and $SE$ is the standard error.

    Since we have a two-tailed test, the P-value is the probability that a t statistic having 18 degrees of freedom is more extreme than 2.437568; that is, less than -2.437568 or greater than 2.437568.

    Using a t-distribution calculator with 18 degrees of freedom, the two-tailed P-value is 0.025385, so the result is not significant at the 0.01 level.

  • Interpret results. Since the P-value (0.025385) is greater than the significance level (0.01), we cannot reject the null hypothesis.
  • Conclusion. Fail to reject the null hypothesis, thus we have insufficient evidence to claim that one model is significantly better than the other considering a significance level of 1%.
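The calculation above can be reproduced with a short script. The following is a minimal sketch using only the Python standard library; the function names (`t_pdf`, `two_tailed_p`, `welch_t_test`, `paired_t_test`) are illustrative and not part of the original solution. In place of a t-distribution calculator, it integrates the Student's t density numerically with Simpson's rule, and it keeps the unrounded degrees of freedom, so the P-value may differ slightly from the 0.025385 obtained with DF = 18. Because the question states that both models share the same partitioning in each round, the sketch also includes a paired t-test on the per-round differences as an alternative analysis; this is an assumption about what may be appropriate, not the method used in the solution above.

```python
import math
from statistics import mean, variance

# Error rates from the question (one value per cross-validation round).
m1 = [30.5, 32.2, 20.7, 20.6, 31.0, 41.0, 27.7, 26.0, 21.5, 26.0]
m2 = [22.4, 14.5, 22.4, 19.6, 20.7, 20.4, 22.1, 19.4, 16.2, 35.0]


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """P(|T| > |t|), integrating the density over [0, |t|] with Simpson's rule.

    steps must be even for Simpson's rule.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3  # P(0 <= T <= |t|)
    return 1.0 - 2.0 * central_mass


def welch_t_test(a, b):
    """Two-sample t-test without the equal-variance assumption."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # s^2 / n per sample
    se = math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    t = (mean(a) - mean(b)) / se
    return t, df, two_tailed_p(t, df)


def paired_t_test(a, b):
    """Paired t-test on per-round differences (alternative analysis)."""
    diffs = [x - y for x, y in zip(a, b)]
    se = math.sqrt(variance(diffs) / len(diffs))
    t = mean(diffs) / se
    df = len(diffs) - 1
    return t, df, two_tailed_p(t, df)


if __name__ == "__main__":
    alpha = 0.01
    for name, test in (("Welch", welch_t_test), ("paired", paired_t_test)):
        t, df, p = test(m1, m2)
        verdict = "reject H0" if p < alpha else "fail to reject H0"
        print(f"{name}: t = {t:.4f}, df = {df:.3f}, p = {p:.4f} -> {verdict}")
```

With these data, both tests fail to reject the null hypothesis at the 1% level, so the conclusion does not depend on which of the two is used.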
