Explain the factors that need to be considered when calculating the sample size for a cluster randomised controlled trial, and how these factors affect the sample size. (max. 200 words)
RCT: sample size formulae under individual randomisation
For a trial using individual randomisation, with fixed power ($1 - \beta$) and fixed sample size ($n_I$ per arm, as the variance expression implies), the detectable difference, $d_I$, with variance $\mathrm{var}(d_I) = 2\sigma^2/n_I$, is:
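$$d_I = (z_{\alpha/2} + z_\beta)\sqrt{\mathrm{var}(d_I)} = (z_{\alpha/2} + z_\beta)\sqrt{\frac{2\sigma^2}{n_I}} \tag{1}$$
where $z_{\alpha/2}$ denotes the upper $100\alpha/2$ centile of the standard Normal distribution, and $z_\beta$ likewise the upper $100\beta$ centile.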
For a trial with sample size $n$, the power to detect a pre-specified difference $d$ is $1 - \beta_I$, such that:
$$1 - \beta_I = \Phi\left(d\sqrt{\frac{n}{2\sigma^2}} - z_{\alpha/2}\right) \tag{2}$$
where $\Phi$ is the cumulative distribution function of the standard Normal distribution.
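As a rough illustration of equations 1 and 2, the sketch below assumes they take the usual Normal-approximation forms $d_I = (z_{\alpha/2} + z_\beta)\sqrt{2\sigma^2/n}$ and $1 - \beta_I = \Phi(d\sqrt{n/(2\sigma^2)} - z_{\alpha/2})$, with $n$ read as the number per arm. The function names and default values (`alpha=0.05`, `power=0.8`) are illustrative choices, not part of the original text.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal: mean 0, standard deviation 1


def detectable_difference(n_per_arm, sigma, alpha=0.05, power=0.8):
    """Equation 1 (assumed form): difference detectable with n_per_arm per arm."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # upper 100*alpha/2 centile
    z_beta = STD_NORMAL.inv_cdf(power)           # upper 100*beta centile
    return (z_alpha + z_beta) * sqrt(2 * sigma**2 / n_per_arm)


def power_individual(n_per_arm, d, sigma, alpha=0.05):
    """Equation 2 (assumed form): power to detect d with n_per_arm per arm."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(d * sqrt(n_per_arm / (2 * sigma**2)) - z_alpha)


# Round trip: the difference detectable at 80% power should give back 80% power.
d = detectable_difference(64, sigma=1.0)
print(round(d, 3), round(power_individual(64, d, sigma=1.0), 3))
```

The round trip is a quick consistency check: feeding the detectable difference back into the power function should return the power it was computed for.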
Finally, the required sample size for a trial with pre-specified power $1 - \beta$ to detect a pre-specified difference $d$ is $n_I$, where:
$$n_I = \frac{2\sigma^2 (z_{\alpha/2} + z_\beta)^2}{d^2} \tag{3}$$
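A minimal sketch of equation 3, assuming it has the standard form $n_I = 2\sigma^2(z_{\alpha/2} + z_\beta)^2/d^2$ with $n_I$ counted per arm. The function name and the example inputs (standardised difference of 0.5, 5% two-sided significance, 80% power) are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_individual(d, sigma, alpha=0.05, power=0.8):
    """Equation 3 (assumed form): per-arm sample size, returned unrounded."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # upper 100*alpha/2 centile
    z_beta = std_normal.inv_cdf(power)           # upper 100*beta centile
    return 2 * sigma**2 * (z_alpha + z_beta) ** 2 / d**2


n_i = sample_size_individual(d=0.5, sigma=1.0)
print(n_i, ceil(n_i))  # unrounded value, then rounded up to whole participants
```

Halving the difference to be detected quadruples $n_I$, because $d$ enters the formula squared.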
Using Normal approximations, the above formulae can be used for binary outcomes by approximating the variance ($\sigma^2$) of the proportions $\pi_1$ and $\pi_2$ by:
$$\sigma^2 = \frac{\pi_1(1 - \pi_1) + \pi_2(1 - \pi_2)}{2} \tag{4}$$
for testing the two-sided hypothesis $H_0: \pi_1 = \pi_2$.
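For binary outcomes, the following sketch assumes equation 4 averages the two binomial variances, $\sigma^2 = [\pi_1(1 - \pi_1) + \pi_2(1 - \pi_2)]/2$, and plugs the result into the assumed form of equation 3 with $d = \pi_1 - \pi_2$. The function names and the example proportions (0.3 versus 0.2) are made up for illustration.

```python
from statistics import NormalDist


def binary_variance(p1, p2):
    """Equation 4 (assumed form): average of the two binomial variances."""
    return (p1 * (1 - p1) + p2 * (1 - p2)) / 2


def sample_size_binary(p1, p2, alpha=0.05, power=0.8):
    """Per-arm sample size for comparing two proportions (Normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    sigma2 = binary_variance(p1, p2)
    return 2 * sigma2 * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2


print(binary_variance(0.3, 0.2), sample_size_binary(0.3, 0.2))
```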
CRCTs: standard sample size formulae under cluster randomisation
Suppose that, instead of randomising individuals, the trial randomises $k$ clusters per arm, each of size $m$, to provide $n_C = mk$ individuals per arm. Then, by standard results [1], the variance of the difference to be detected, $d_C$, is inflated by the Variance Inflation Factor (VIF):
$$\mathrm{VIF} = 1 + (m - 1)\rho \tag{5}$$
where $\rho$ is the Intra-Cluster Correlation (ICC) coefficient, which represents how strongly individuals within the same cluster are related to each other. Where the cluster sizes are unequal, this variance inflation factor can be approximated by:
$$\mathrm{VIF} = 1 + \left((cv^2 + 1)\bar{m} - 1\right)\rho \tag{6}$$
where $cv$ represents the coefficient of variation of the cluster sizes and $\bar{m}$ is the average cluster size. Thus, the variance of $d_C$ (for fixed cluster sizes) becomes:
$$\mathrm{var}(d_C) = \frac{2\sigma^2\left[1 + (m - 1)\rho\right]}{km} \tag{7}$$
and this extends directly to varying cluster sizes using equation 6. For a CRCT with pre-specified power $1 - \beta$ to detect the pre-specified difference $d$, with $m$ individuals in each cluster, the required sample size $n_C = km$ follows directly from equations 3 and 5 and is:
$$n_C = n_I\left[1 + (m - 1)\rho\right] = n_I \times \mathrm{VIF} \tag{8}$$
where $n_I$ is the sample size required by a trial with individual randomisation to detect a difference $d$, and the VIF can be modified to allow for variation in cluster sizes (equation 6). This is the standard result: the sample size required for a CRCT is that required under individual randomisation, inflated by the variance inflation factor [1]. The number of clusters required is then:
$$k = \left\lceil \frac{n_I\left[1 + (m - 1)\rho\right]}{m} \right\rceil \tag{9}$$
assuming equal cluster sizes. This is a slight modification of the common formula for the number of clusters required (for example, as presented in [2]): the total sample size is rounded up to a multiple of the cluster size using the ceiling function. For unequal cluster sizes (using the VIF in equation 6) this becomes:
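$$k = \left\lceil \frac{n_I\left[1 + \left((cv^2 + 1)\bar{m} - 1\right)\rho\right]}{\bar{m}} \right\rceil \tag{10}$$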
again rounding the total sample size up to a multiple of the average cluster size.
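The cluster-randomised calculations can be sketched as follows, assuming the VIF is $1 + (m - 1)\rho$ for equal cluster sizes and $1 + ((cv^2 + 1)\bar{m} - 1)\rho$ for unequal ones, and that the number of clusters is the inflated sample size divided by the (average) cluster size, rounded up. The function names are hypothetical, and the inputs (63 individuals per arm, clusters of 20, ICC of 0.05) are made-up illustrative values.

```python
from math import ceil


def vif(mean_cluster_size, icc, cv=0.0):
    """Variance inflation factor (assumed forms of equations 5 and 6).

    With cv = 0 (equal cluster sizes) this reduces to 1 + (m - 1) * icc.
    """
    return 1 + ((cv**2 + 1) * mean_cluster_size - 1) * icc


def sample_size_cluster(n_individual, mean_cluster_size, icc, cv=0.0):
    """Equation 8 (assumed form): individually randomised size times the VIF."""
    return n_individual * vif(mean_cluster_size, icc, cv)


def clusters_required(n_individual, mean_cluster_size, icc, cv=0.0):
    """Equations 9 and 10 (assumed forms): clusters per arm, rounded up."""
    n_c = sample_size_cluster(n_individual, mean_cluster_size, icc, cv)
    return ceil(n_c / mean_cluster_size)


# Equal cluster sizes, then unequal sizes with coefficient of variation 0.8.
print(clusters_required(63, 20, 0.05))
print(clusters_required(63, 20, 0.05, cv=0.8))
```

In this sketch, a larger ICC, larger clusters, or more variable cluster sizes all increase the VIF, and so increase both the total sample size and the number of clusters needed.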