In: Statistics and Probability
What change do Wasserstein, Schirm, and Lazar recommend, regarding the use of the term “statistically significant”?
Writing in The American Statistician, Wasserstein, Schirm, and Lazar recommend the following regarding “statistically significant”:
Don’t base your conclusions solely on whether an association or effect was found to be “statistically significant” (i.e., the p-value passed some arbitrary threshold such as p < 0.05).
Don’t believe that an association or effect exists just because it was statistically significant.
Don’t believe that an association or effect is absent just because it was not statistically significant.
Don’t believe that your p-value gives the probability that chance alone produced the observed association or effect or the probability that your test hypothesis is true.
Don’t conclude anything about scientific or practical importance based on statistical significance.
The ASA Statement on P-Values and Statistical Significance stopped just short of recommending that declarations of “statistical significance” be abandoned. Wasserstein, Schirm, and Lazar conclude, based on their review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely. Nor should variants such as “significantly different,” “p < 0.05,” and “nonsignificant” survive, whether expressed in words, by asterisks in a table, or in some other way. No p-value can reveal the plausibility, presence, truth, or importance of an association or effect; therefore, a label of statistical significance does not mean or imply that an association or effect is highly probable, real, true, or important.
Make acceptance of uncertainty more natural to our thinking by accompanying every point estimate in our research with a measure of its uncertainty such as a standard error or interval estimate. Reporting and interpreting point and interval estimates should be routine.
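As a minimal sketch of this practice, a point estimate can be reported together with its standard error and an interval estimate. The data below are made up for illustration, and the interval uses a normal approximation (z = 1.96); for small samples a t-based critical value would be more appropriate.

```python
import math
import statistics

def mean_with_ci(sample, z=1.96):
    """Return the sample mean, its standard error, and an approximate
    95% confidence interval (normal approximation; illustrative only)."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
    return mean, se, (mean - z * se, mean + z * se)

# Illustrative data, invented for this sketch
data = [4.1, 3.8, 5.0, 4.6, 4.4, 3.9, 4.7, 4.2]
mean, se, (lo, hi) = mean_with_ci(data)
print(f"estimate = {mean:.3f}, SE = {se:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Reporting the interval alongside the estimate, rather than a bare p-value, is exactly the routine the authors advocate: the width of the interval makes the uncertainty visible.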
Thoughtful research includes careful consideration of the definition of a meaningful effect size. As a researcher, you should communicate this up front, before data are collected and analyzed. Afterwards is too late: it is dangerously easy to justify observed results after the fact and to overinterpret trivial effect sizes as meaningful. Many authors in this special issue argue that consideration of the effect size and its “scientific meaningfulness” is essential for reliable inference.
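One common standardized effect-size measure is Cohen’s d. The sketch below (made-up data, and a hypothetical pre-registered threshold of |d| ≥ 0.5) shows how a criterion for a “meaningful” effect, chosen before seeing the data, could be checked afterwards:

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference (Cohen's d) using a pooled SD."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

MEANINGFUL_D = 0.5  # hypothetical threshold, fixed before data collection

# Illustrative data, invented for this sketch
d = cohens_d([5, 6, 7, 8], [4, 5, 6, 7])
print(f"d = {d:.3f}, meaningful: {abs(d) >= MEANINGFUL_D}")
```

The point is the order of operations: the threshold is declared first, and the observed d is then judged against it, rather than rationalized after the results are in.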
Be thoughtful and clear about the level of confidence or credibility that is present in statistical results.
Openness includes understanding and accepting the role of expert judgment, which enters the practice of statistical inference and decision-making in numerous ways. It also means providing sufficient information so that other researchers can execute meaningful alternative analyses.
Be modest in recognizing there is not a “true statistical model” underlying every problem, which is why it is wise to thoughtfully consider many possible models. The editorial calls on researchers to “recognize that behind every choice of null distribution and test statistic, there lurks a plausible family of alternative hypotheses, which can provide more insight into the null distribution.” P-values, confidence intervals, and other statistical measures are all uncertain. Treating them otherwise is immodest overconfidence.