If you want an overall type I error rate of 0.05 for your study, why is testing at the 0.05 level not appropriate for interim analysis of efficacy?
What can you do to conduct appropriate interim analysis of efficacy?
Every time we look at the accumulating data and consider stopping, we take on another chance of falsely rejecting the null hypothesis, that is, another chance of a type I error. If we look at the data multiple times and use alpha = 0.05 as the criterion for significance at every look, then under a true null hypothesis each look carries a 5% chance of stopping for a spurious effect. With just 2 looks, and treating the looks as independent (an approximation, because the second look reuses the data from the first), the error rates are: probability of stopping at the first look, 0.05; probability of stopping at the second look, 0.95 × 0.05 = 0.0475; total probability of stopping, 0.05 + 0.0475 = 0.0975. The overall type I error rate is therefore well above the intended 0.05.
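The two-look arithmetic above can be checked with a short Python sketch. It is only an illustration: the function names are made up for this example, the simulation assumes equally spaced looks at a normally distributed test statistic with a two-sided test, and the default trial count and seed are arbitrary.

```python
import random
from statistics import NormalDist


def independent_looks_alpha(alpha, looks):
    """Overall type I error if every look were an independent test at level alpha."""
    return 1.0 - (1.0 - alpha) ** looks


def simulated_alpha(alpha, looks, n_trials=20000, seed=0):
    """Monte Carlo estimate of the overall type I error under a true null.

    Assumption for this sketch: each look adds one equal-sized block of data,
    and the z statistic at look k is the sum of k standard normal block
    statistics divided by sqrt(k), so successive looks are correlated.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    false_stops = 0
    for _ in range(n_trials):
        running_sum = 0.0
        for k in range(1, looks + 1):
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / k ** 0.5
            if abs(z) > z_crit:  # would stop and (falsely) declare efficacy
                false_stops += 1
                break
    return false_stops / n_trials


print(round(independent_looks_alpha(0.05, 2), 4))  # 0.0975
print(simulated_alpha(0.05, 2))
```

Because the second look reuses the first-look data, the simulated rate for 2 looks should come out somewhat below the 0.0975 approximation (around 0.08) but still clearly above 0.05, and it keeps growing as more looks are added.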
One remedy is a group sequential design with adjusted stopping boundaries. Pocock bounds apply the same, stricter nominal significance level at every look, so we can obtain P < 0.05 at the final look and still not be able to declare statistical significance. O'Brien-Fleming bounds instead use more conservative stopping boundaries at the early stages. They spend little alpha at the interim looks and lead to boundary values at the final stage that are close to those of the fixed-sample design, which avoids the problem with the Pocock bounds. The classical Pocock and O'Brien-Fleming boundaries require a prespecified number of equally spaced looks, whereas a Data Safety Monitoring Board (DSMB) may require more flexibility. In that case, one can instead specify an alpha-spending function that determines the rate at which the overall type I error is spent during the trial. At each interim look, the type I error is partitioned according to this alpha-spending function to derive the corresponding boundary values. Because the looks need be neither prespecified in number nor equally spaced, an O'Brien-Fleming-type alpha-spending function has become the most common approach to monitoring efficacy in clinical trials.
Some investigators have argued that using a P value threshold ("statistical significance") as a proxy for having detected an "effect" is inappropriate, and recommend alternatives such as reporting effect size estimates together with confidence intervals that convey their precision.[10,11]
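To make the idea of "spending" alpha concrete, here is a minimal Python sketch of two spending functions. The answer above does not give formulas, so the forms used here are an assumption: they are the Lan-DeMets versions commonly cited for O'Brien-Fleming-type and Pocock-type spending, with t the fraction of total information observed so far. The function names and the example look times are hypothetical.

```python
from math import e, log
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obf_spending(t, alpha=0.05):
    """Cumulative alpha spent by information fraction t (0 < t <= 1).

    Assumed form (O'Brien-Fleming type): 2 - 2 * Phi(z_{alpha/2} / sqrt(t)).
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must be in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(z / t ** 0.5))


def pocock_spending(t, alpha=0.05):
    """Cumulative alpha spent by t. Assumed form: alpha * ln(1 + (e - 1) * t)."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must be in (0, 1]")
    return alpha * log(1.0 + (e - 1.0) * t)


def incremental_alpha(fractions, spending, alpha=0.05):
    """Alpha newly spent at each look; fractions must be increasing and end at 1."""
    spent_so_far = 0.0
    increments = []
    for t in fractions:
        cumulative = spending(t, alpha)
        increments.append(cumulative - spent_so_far)
        spent_so_far = cumulative
    return increments


# Unequally spaced looks, chosen only for illustration.
looks = [0.3, 0.55, 1.0]
print([round(a, 5) for a in incremental_alpha(looks, obf_spending)])
print([round(a, 5) for a in incremental_alpha(looks, pocock_spending)])
```

Both functions spend exactly alpha by t = 1, but the O'Brien-Fleming-type function spends very little early on and saves most of it for the final analysis. Turning the spent alpha into actual boundary values additionally requires numerical integration over the joint distribution of the test statistics, which this sketch does not attempt; dedicated group sequential software is normally used for that step.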
Because there are no standard statistical methods for retrospectively adjusting P values after unplanned interim analyses, such analyses should be avoided: they can invalidate the results of an otherwise well-planned clinical trial. Conducting a clinical trial is justified only if the clinical investigators have considered the ethical aspects in advance and an external ethics committee has approved the conduct of the study according to a defined protocol. A great deal of recent discussion in the clinical trials literature has focused on response-adaptive randomization in two-arm trials; however, this is a fairly specific and relatively infrequently used type of adaptive clinical trial.