In polynomial regression, what assumptions underlie the (strict) validity of the various p-values and confidence intervals?
The full set of assumptions is embodied in a statistical model that underpins the method. This model is a mathematical representation of data variability, and thus ideally would accurately capture all sources of such variability. Many problems arise, however, because this statistical model often incorporates unrealistic or at best unjustified assumptions. This is true even for so-called “non-parametric” methods, which (like other methods) depend on assumptions of random sampling or randomization. These assumptions are often deceptively simple to write down mathematically, yet in practice are difficult to satisfy and verify, as they may depend on successful completion of a long sequence of actions (such as identifying, contacting, obtaining consent from, obtaining cooperation of, and following up subjects, as well as adherence to study protocols for treatment allocation, masking, and data analysis). For polynomial regression in particular, strict validity of the usual p-values and confidence intervals requires that the polynomial degree is correctly specified, that the observations are independent, that the error variance is constant across the predictor range, and that the errors are normally distributed (the last being needed for exact small-sample inference).
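To make this concrete, here is a minimal sketch in Python using numpy and statsmodels. The simulated quadratic model and all variable names are illustrative assumptions, not part of the answer above; the point is that the assumptions hold here by construction, so the reported p-values and confidence intervals carry their strict, nominal interpretation:

```python
# A quadratic regression in which the classical assumptions hold by construction.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2, 2, size=n)

# Assumptions built into the simulation:
#   1. The mean of y is exactly quadratic in x (correct degree specification).
#   2. Observations are independent.
#   3. Errors are normal with constant variance (homoscedastic).
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(scale=1.0, size=n)

# Design matrix: intercept, linear term, quadratic term.
X = sm.add_constant(np.column_stack([x, x**2]))
results = sm.OLS(y, X).fit()

# Under assumptions 1-3, the p-values and 95% confidence intervals below are
# strictly valid; if any assumption fails, they are at best approximations.
print(results.pvalues)     # t-test p-value for each coefficient
print(results.conf_int())  # 95% confidence interval for each coefficient
```

If, say, the true mean had a cubic component omitted from X, or the error variance grew with x, these same lines would still print numbers, but those numbers would no longer carry their advertised guarantees; that gap between printable output and satisfied assumptions is exactly what the paragraph above warns about.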
In most applications of statistical testing, one assumption in the model is a hypothesis that a particular effect has a specific size; it is this hypothesis that is targeted for statistical analysis. (For simplicity, we use the word “effect” where “association or effect” would arguably be better, to allow for noncausal studies such as most surveys.) This targeted assumption is called the study hypothesis or test hypothesis, and the statistical methods used to evaluate it are called statistical hypothesis tests. Most often, the targeted effect size is a “null” value representing zero effect (e.g., that the study treatment makes no difference in average outcome), in which case the test hypothesis is called the null hypothesis.
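Continuing the same hypothetical setup as before, the sketch below shows what a test hypothesis looks like in practice: the targeted assumption is a specific value for the quadratic coefficient. Testing against zero gives the usual null hypothesis of “no curvature,” but the same machinery accepts any targeted effect size. (Labels such as x2 are statsmodels’ default names for numpy design columns, an assumption of this sketch rather than anything in the answer above.)

```python
# The "test hypothesis" as a targeted coefficient value in quadratic regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=200)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(size=200)

# Default statsmodels names for these columns: const, x1 (linear), x2 (quadratic).
X = sm.add_constant(np.column_stack([x, x**2]))
results = sm.OLS(y, X).fit()

# Null hypothesis: zero effect, i.e. the quadratic coefficient equals 0.
print(results.t_test("x2 = 0"))

# The test hypothesis need not be a null value; any specific size can be targeted.
print(results.t_test("x2 = -0.8"))
```

The second test illustrates the paragraph’s point that “null value” is just the most common choice of targeted effect size: the p-value machinery itself is indifferent to whether the hypothesized size is zero.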