6. Describe at least two potential problems with NHST.
11. Why do reported or “nominal” p values often seriously underestimate the true risk of a Type I error?
14. What conclusions can be drawn from a study with a null result?
15. What conclusions can be drawn from a study with a “statistically significant” result?
16. Briefly discuss: What information do you look at to evaluate whether an effect obtained in an experiment is large enough to have “practical” or “clinical” significance?
19. A p value can be interpreted as a (conditional) risk that a decision to reject H0 is a Type I error, but the p values reported in research papers are valid indications of the true risk of Type I error only if the data meet the assumptions for the test and the researcher has followed the rules that govern the use of significance tests. Identify one of the most common researcher behaviors that make the actual risk of Type I error much higher than the “nominal” risk of Type I error that is set by choosing an alpha level.
Solving the first four questions listed (6, 11, 14, and 15):
6)
NHST starts by assuming that a null hypothesis, H0, is true, where H0 is typically a statement of zero effect, zero difference, or zero correlation in the population of interest. A p value is then calculated, where p is the probability, if H0 is true, of obtaining the observed result or one more extreme. A low p value, typically p < .05, throws doubt on H0 and leads to the rejection of H0 and the conclusion that the effect in question is statistically significant.
The first limitation is that NHST forces a dichotomous decision, significant or not significant, and invites misinterpretation: a p value is often wrongly read as the probability that H0 is true, and a non-significant result is often wrongly read as evidence that there is no effect.
The second limitation is that the p value is very likely to be quite different if an experiment is repeated. For example, if a two-tailed result gives p = .05, there is an 80% chance that the one-tailed p value from an exact replication will fall in the interval (.00008, .44), a 10% chance that p < .00008, and fully a 10% chance that p > .44. In other words, a p value provides only extremely vague information about a result's repeatability. Most researchers do not appreciate this weakness of p.
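A quick simulation makes this concrete. The following is a minimal sketch (the sample size, effect size, and use of a two-sample t test are assumptions chosen for illustration, not part of the answer above) showing how widely p values scatter across exact replications of the same experiment:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n = 32          # per-group sample size (assumption; gives roughly 50% power)
effect = 0.5    # assumed true standardized mean difference (Cohen's d)
reps = 10_000   # number of simulated exact replications

p_values = np.empty(reps)
for i in range(reps):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(effect, 1.0, n)
    # Two-sided two-sample t test on each replication of the same true effect
    p_values[i] = stats.ttest_ind(treatment, control).pvalue

# The 10th-90th percentile range shows how vague p is as a guide to replication
lo, hi = np.percentile(p_values, [10, 90])
print(f"80% of replication p values fell between {lo:.5f} and {hi:.3f}")
```

Under these assumed conditions the middle 80% of replication p values spans several orders of magnitude, echoing the wide interval quoted above.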
11)
Statistical tests are designed for use with hypotheses that have been formulated before the data are inspected. Performing a statistical test on a characteristic because something in the data happened to attract the investigator's interest does not yield a valid p value; the reported p will look smaller than it should. To understand why, consider the difference between betting on a horse before and after the race. Multiple testing of hypotheses is a related problem: since each individual test carries a Type I error rate of 5%, running many tests inflates the probability of obtaining at least one Type I error. With m independent tests at alpha = .05, that probability is 1 − (1 − .05)^m, which already exceeds .40 for m = 10.
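As a rough illustration, the sketch below (the per-study count of m = 10 tests and the one-sample t test setup are assumptions, not part of the answer above) simulates studies in which every hypothesis is truly null and counts how often at least one test comes out "significant":

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

alpha = 0.05
m = 10            # tests per study (assumption for illustration)
studies = 5_000   # simulated studies, each testing m truly null hypotheses

false_alarm_studies = 0
for _ in range(studies):
    # Every variable is pure noise, so any rejection is a Type I error
    data = rng.normal(0.0, 1.0, size=(m, 30))
    pvals = np.array([stats.ttest_1samp(row, 0.0).pvalue for row in data])
    if (pvals < alpha).any():
        false_alarm_studies += 1

print(f"Simulated P(at least one Type I error): {false_alarm_studies / studies:.3f}")
print(f"Theoretical 1 - (1 - alpha)**m:         {1 - (1 - alpha) ** m:.3f}")
```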
14)
A null result occurs when we fail to reject H0; this is commonly reported as a non-significant result, or ns. There are two possible realities when the null is not rejected. First is the case where H0 is true: with alpha = .05, the probability of correctly failing to reject a true H0 is .95, so when the null is true a Type I error is unlikely. Second is the situation where H0 is false but we fail to reject it. This is a Type II error, with probability beta = 1 − power; under the conventional target of 80% power, beta is about .20, meaning that even in this ideal situation a false H0 goes unrejected about 1 time in 5. A null result therefore conveys very little: both realities remain plausible, and neither is ruled unlikely. This differs sharply from the situation in which we reject H0, where there is a comparatively small (conditional) probability that we have made an error (i.e., the Type I error rate of 5%).
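The two conditional probabilities above can be checked by simulation. This is a minimal sketch under assumed conditions (a two-sample t test with n = 64 per group, chosen so that power is roughly 80% for a standardized effect of d = 0.5; none of these specifics come from the answer above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

alpha = 0.05
n = 64       # per-group n (assumption; ~80% power for d = 0.5 at alpha = .05)
reps = 5_000

def nonrejection_rate(effect):
    """Fraction of two-sample t tests that fail to reject H0."""
    fails = 0
    for _ in range(reps):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(effect, 1.0, n)
        if stats.ttest_ind(b, a).pvalue >= alpha:
            fails += 1
    return fails / reps

# When H0 is true, a null result should occur ~95% of the time (1 - alpha);
# when H0 is false with 80% power, it should still occur ~20% of the time (beta).
print(f"P(null result | H0 true)  ~ {nonrejection_rate(0.0):.3f}")
print(f"P(null result | H0 false) ~ {nonrejection_rate(0.5):.3f}")
```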
15)
In principle, a statistically significant result (usually a difference) is a result that is unlikely to be attributable to chance alone.
More technically, it means that if the null hypothesis is true (i.e., there really is no difference), there is only a low probability of obtaining a result that large or larger.
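To make the definition concrete, here is a minimal sketch with made-up data (the group means, sample sizes, and choice of t test are illustrative assumptions): p is computed under the assumption that H0 is true and measures how surprising a result at least this large would be.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Made-up samples: the treatment group's true mean is shifted by 0.6 SD
control = rng.normal(0.0, 1.0, 40)
treatment = rng.normal(0.6, 1.0, 40)

# p is the probability, assuming H0 (no difference) is true, of observing a
# t statistic at least as extreme as the one obtained here
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Statistically significant at alpha = .05: reject H0.")
else:
    print("Not statistically significant: fail to reject H0.")
```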