There are many different kinds of hypothesis tests, including one- and two-sample t-tests, tests for association, tests for normality, and many more.
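To make these names concrete, here is a minimal Python sketch (my own illustration, with made-up data, not taken from any particular study) of what a few of these tests look like using SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(loc=5.0, scale=1.0, size=30)   # sample from group A
b = rng.normal(loc=5.5, scale=1.0, size=30)   # sample from group B

# One-sample t-test: is the mean of `a` equal to 5?
print(stats.ttest_1samp(a, popmean=5.0))

# Two-sample t-test: do `a` and `b` share the same mean?
print(stats.ttest_ind(a, b))

# Test for association between two categorical variables (chi-squared)
table = np.array([[20, 15], [10, 25]])        # hypothetical contingency table
print(stats.chi2_contingency(table))

# Test for normality (Shapiro-Wilk)
print(stats.shapiro(a))
```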
In a hypothesis test, you're going to look at two propositions: the null hypothesis (or H0 for short), and the alternative (H1). The alternative hypothesis is what we hope to support. The null hypothesis, in contrast, is presumed to be true, until the data provide sufficient evidence that it is not.
A similar idea underlies the U.S. criminal justice system: you've heard the phrase "Innocent until proven guilty"? In the statistical world, the null hypothesis is taken for granted until the alternative is proven true. The null hypothesis is never proven true; you simply fail to reject it.
When Do We Accept or Reject the Null Hypothesis?
The degree of statistical evidence we need in order to “prove” the alternative hypothesis is the confidence level. The confidence level is simply 1 minus the Type I error rate (alpha, also referred to as the significance level), which occurs when you incorrectly reject the null hypothesis. The typical alpha value of 0.05 corresponds to a 95% confidence level: we're accepting a 5% chance of rejecting the null even if it is true. (When hypothesis-testing life-or-death matters, we can lower the risk of a Type I error to 1% or less.)
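A quick simulation makes the meaning of alpha concrete (this is just an illustrative sketch with simulated data): if the null hypothesis is true and we test at alpha = 0.05, about 5% of experiments will still reject it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_experiments = 10_000

false_rejections = 0
for _ in range(n_experiments):
    # Both samples come from the SAME distribution, so the null is true.
    x = rng.normal(0, 1, size=30)
    y = rng.normal(0, 1, size=30)
    _, p = stats.ttest_ind(x, y)
    if p < alpha:
        false_rejections += 1

# Prints a proportion close to alpha (about 0.05): the Type I error rate.
print(false_rejections / n_experiments)
```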
Regardless of the alpha level we choose, any hypothesis test has only two possible outcomes: either we reject the null hypothesis, or we fail to reject it.
In the results of a hypothesis test, we typically use the p-value to decide whether or not to reject the null hypothesis. If the p-value is very low (typically below 0.05), statisticians say "the null must go."
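In code, the decision rule looks something like the following sketch (made-up data, assuming SciPy is available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05

# Hypothetical measurements; H0: the population mean is 100.
sample = rng.normal(loc=103, scale=10, size=40)

t_stat, p_value = stats.ttest_1samp(sample, popmean=100)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null hypothesis")
```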
Upon realizing that statistical tests are usually misinterpreted, one may wonder what if anything these tests do for science. They were originally intended to account for random variability as a source of error, thereby sounding a note of caution against overinterpretation of observed associations as true effects or as stronger evidence against null hypotheses than was warranted. But before long that use was turned on its head to provide fallacious support for null hypotheses in the form of “failure to achieve” or “failure to attain” statistical significance.
Here's the bottom line: even if we fail to reject the null hypothesis, it does not mean the null hypothesis is true. That's because a hypothesis test does not determine which hypothesis is true, or even which is most likely: it only assesses whether available evidence exists to reject the null hypothesis.
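One way to see this is with an underpowered experiment: in the sketch below (simulated data of my own), the effect is real by construction, yet most runs fail to reject the null.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# By construction the null is FALSE: the true means differ by 0.3.
rejections = 0
n_experiments = 5_000
for _ in range(n_experiments):
    x = rng.normal(0.0, 1.0, size=15)   # small samples -> low power
    y = rng.normal(0.3, 1.0, size=15)
    _, p = stats.ttest_ind(x, y)
    if p < 0.05:
        rejections += 1

# Prints the power of the test (well below 1): most experiments fail to
# reject a null hypothesis that we know is false.
print(rejections / n_experiments)
```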
Considering that quantitative reports will always have more information content than binary (significant or not) reports, we can always argue that raw and/or normalized effect sizes, confidence intervals, or Bayes factors must be reported. Reporting everything can, however, hinder the communication of the main result(s), and we should aim to give only the information needed, at least in the core of a manuscript. Here I propose to adopt optimal reporting in the results section to keep the message clear, but to provide detailed supplementary material. When the hypothesis is about the presence/absence or order of an effect, and provided that the study has sufficient power, NHST is appropriate, and it is sufficient to report the actual p-value in the text, since it conveys the information needed to rule out equivalence. When the hypothesis and/or the discussion involve some quantitative value, and because p-values do not inform on the size of the effect, it is essential to report effect sizes, preferably accompanied by confidence or credible intervals. The reasoning is simply that one cannot predict and/or discuss quantities without accounting for variability.
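As a rough illustration of such quantitative reporting (my own sketch, not taken from the cited work), one can report the mean difference with a confidence interval alongside a standardized effect size such as Cohen's d:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.normal(5.0, 1.0, size=40)
b = rng.normal(5.6, 1.0, size=40)

diff = a.mean() - b.mean()

# Pooled standard deviation and Cohen's d (standardized effect size).
pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                    / (len(a) + len(b) - 2))
cohens_d = diff / pooled_sd

# 95% confidence interval for the mean difference (pooled-variance t interval).
se = pooled_sd * np.sqrt(1 / len(a) + 1 / len(b))
dof = len(a) + len(b) - 2
t_crit = stats.t.ppf(0.975, dof)
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"mean difference = {diff:.2f}, "
      f"95% CI = [{ci[0]:.2f}, {ci[1]:.2f}], d = {cohens_d:.2f}")
```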
Because scientific progress is obtained by accumulating evidence, scientists should also anticipate the secondary use of the data. With today's electronic articles, there is no reason not to include all derived data (means, standard deviations, effect sizes, CIs, Bayes factors) as supplementary tables (or, even better, to also share the raw data). It is also essential to report the context in which tests were performed, that is, to report all tests performed (all t, F, and p values), because of the increased Type I error rate due to selective reporting (the multiple-comparisons and p-hacking problems). Providing all of this information allows (i) other researchers to directly and effectively compare their results in quantitative terms, (ii) power to be computed for future studies, and (iii) results to be aggregated for meta-analyses while minimizing publication bias.
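The inflation of the Type I error rate with multiple tests is easy to see numerically. With k independent tests at alpha = 0.05, the probability of at least one false rejection is 1 - (1 - alpha)^k, and a correction such as Bonferroni (used here purely as an illustration) brings it back down:

```python
# Family-wise error rate for k independent tests at alpha = 0.05,
# with and without a Bonferroni correction.
alpha = 0.05
for k in (1, 5, 10, 20):
    fwer_uncorrected = 1 - (1 - alpha) ** k
    fwer_bonferroni = 1 - (1 - alpha / k) ** k
    print(f"k={k:2d}  uncorrected FWER={fwer_uncorrected:.3f}  "
          f"Bonferroni FWER={fwer_bonferroni:.3f}")
```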
THE CORRECT USE OF NULL HYPOTHESIS SIGNIFICANCE TESTING (NHST)
NHST has always been criticized, and yet it is still used every day in scientific reports. One question to ask is: what is the goal of the scientific experiment at hand? If the goal is to establish a discrepancy with the null hypothesis and/or to establish a pattern of order (i.e. establish that A > B), then NHST is a good tool, because both require ruling out equivalence (i.e. ruling out A = B). If the goal is to test the presence of an effect (i.e. compute its probability) and/or to establish some quantitative value related to an effect, then NHST is not the method of choice, since testing can only reject the null hypothesis.
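For the pattern-of-order case (establishing that A > B), a one-sided test is the natural form. The sketch below uses made-up data and assumes a reasonably recent SciPy (1.6 or later) that accepts the `alternative` argument:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
a = rng.normal(5.5, 1.0, size=50)   # hypothetical group A
b = rng.normal(5.0, 1.0, size=50)   # hypothetical group B

# H0: mean(A) <= mean(B); H1: mean(A) > mean(B).
# `alternative="greater"` requires SciPy >= 1.6.
t_stat, p_value = stats.ttest_ind(a, b, alternative="greater")
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.4f}")
```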
While a Bayesian analysis is suited to estimating the probability that a hypothesis is correct, like NHST it does not prove a theory by itself; it only adds to its plausibility. It has, however, another advantage: it allows one to choose between competing hypotheses, whereas NHST cannot prove any specific hypothesis. No matter what testing procedure is used and how strong the results are, Fisher reminds us that "no isolated experiment, however significant in itself, can suffice for the experimental demonstration of any natural phenomenon".
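To make the contrast concrete, here is a deliberately simple, self-contained Bayes factor sketch of my own, a coin-flip example rather than a t-test, comparing two explicit hypotheses in a way NHST cannot:

```python
from math import comb
import numpy as np
from scipy.special import betaln

# Observed data: 62 heads out of 100 flips (made-up numbers).
n, k = 100, 62

# H0: the coin is fair (p = 0.5). Marginal likelihood is the binomial pmf.
log_m0 = np.log(comb(n, k)) + n * np.log(0.5)

# H1: p is unknown, with a uniform Beta(1, 1) prior.
# Marginal likelihood: C(n, k) * B(k + 1, n - k + 1) / B(1, 1).
log_m1 = np.log(comb(n, k)) + betaln(k + 1, n - k + 1) - betaln(1, 1)

bayes_factor_10 = np.exp(log_m1 - log_m0)
print(f"BF10 = {bayes_factor_10:.2f}")   # > 1 favours H1 over H0
```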
Similarly, the recent statement of the American Statistical Association makes it clear that conclusions should be based on the researcher's understanding of the problem in context, along with all summary data and tests, and that no single value (whether a p-value, a Bayes factor, or something else) can be used to support or invalidate a theory.