Why are we justified in pooling the population proportion estimates and the standard error of the differences between these estimates when we conduct significance tests about the difference between population proportions?
Answer:
"Pooling" is the name given to a technique used to obtain a more
precise estimate of the standard deviation of a sample statistic by
combining the estimates given by two (or more) independent samples.
When performing tests (or calculating confidence intervals) for a
difference of two means, we do not pool. In other statistical
situations we may or may not pool, depending on the situation and
the populations being compared. For example, the theory behind
analysis of variance and the inferences for simple regression are
based on pooled estimates of variance. The rules for inference
about two proportions firmly go both(!) ways. We always use a
pooled estimate of the standard deviation (based on a pooled
estimate of the proportion) when carrying out a hypothesis test
whose null hypothesis is p1 =
p2 -- but not when constructing a confidence
interval for the difference in proportions. Why?
In any hypothesis test, we are calculating conditional
probabilities based on the assumption that the null hypothesis is
true. For example, in calculating the z statistic for
proportions or the t statistic for means, we use the values derived
from the null hypothesis as the mean of our sampling distribution;
if the null hypothesis determines a value for the standard
deviation of the sample statistic, we use that value in our
calculations. If the null hypothesis fails to give us a value for
the standard deviation of our statistic, as is the case with means,
we estimate the standard deviation of the statistic using sample
data.
The special feature of proportions important for this discussion is
that the value of p determines the value of σ(p̂), the standard
deviation of the sample proportion p̂: σ(p̂) = sqrt(p(1 - p)/n).
This is very different from the situation for means, where two
populations can have identical means but wildly different standard
deviations -- and thus different standard deviations of the sample
means. We can't estimate σ from a value of μ; we need to go back to
the data and look at deviations. In the one-population case, this
special feature means that our test statistic
z = (p̂ - p0)/sqrt(p0(1 - p0)/n) follows a z, rather than t,
distribution when we work with one proportion. In this case, we
actually do know the variance based on the null hypothesis.
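As a quick illustration of that point, here is a minimal Python
sketch of the one-proportion z statistic (the counts 58 and 100 and
the null value .5 are made-up numbers); the standard deviation in
the denominator comes entirely from the null value p0:

    from math import sqrt

    def one_prop_z(successes, n, p0):
        """z statistic for H0: p = p0; the SD uses p0, not the sample proportion."""
        p_hat = successes / n
        sd_null = sqrt(p0 * (1 - p0) / n)   # determined by the null hypothesis
        return (p_hat - p0) / sd_null

    # Hypothetical example: 58 successes in 100 trials, testing H0: p = 0.5
    print(one_prop_z(58, 100, 0.5))         # 1.6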
When we move to considering two populations and the difference
between proportions of "successes," our null hypothesis for a test
is generally p1 = p2 (or
equivalently, p1 - p2 = 0
). This null hypothesis implies that the estimates of
p1 and p2 -- that is, p̂1 and p̂2 --
are both estimates for the assumed common proportion of
"successes" in the population (that is, the proportion p). If the
null hypothesis is true -- and all our calculations are based on
this assumed truth -- we are looking at two independent samples
from populations with the same proportion of successes. So with
independent random samples, the variance of the
difference in sample proportions (p̂1 - p̂2)
is given by the sum of the variances, according to the familiar
rules of random variables:
Var(p̂1 - p̂2) = p(1 - p)/n1 + p(1 - p)/n2.
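To see this sum-of-variances step numerically, here is a small
Python simulation sketch (the common proportion p = .4 and the
sample sizes 50 and 80 are arbitrary illustrative choices); the
simulated variance of p̂1 - p̂2 should come out close to
p(1 - p)/n1 + p(1 - p)/n2:

    import random
    from statistics import variance

    p, n1, n2 = 0.4, 50, 80    # hypothetical common proportion and sample sizes
    random.seed(1)
    diffs = []
    for _ in range(20000):
        p1_hat = sum(random.random() < p for _ in range(n1)) / n1
        p2_hat = sum(random.random() < p for _ in range(n2)) / n2
        diffs.append(p1_hat - p2_hat)

    print(variance(diffs))                  # simulated Var(p1_hat - p2_hat)
    print(p*(1 - p)/n1 + p*(1 - p)/n2)      # theoretical value: 0.0078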
When we are carrying out a test, we don't know the value of
p -- in fact, we are asking if there is any such single
value -- so we don't claim to know the value of
Var(p̂1 - p̂2). We calculate our best estimate of this variance
from our best estimate of p, which is "total number of
successes/total number of trials" (in our usual notation,
p̂ = (x1 + x2)/(n1 + n2), where x1 and x2 are the observed numbers
of successes). Substituting this value of p̂ for both p1 and
p2 gives our estimate of the standard deviation of the difference,
sqrt(p̂(1 - p̂)(1/n1 + 1/n2)); we have merged the data from the two
samples to obtain what is called the "pooled" estimate of the
standard deviation. We have
done this not because it is more convenient (it isn't -- there's
more calculation involved) nor because it reduces the measurement
of variability (it doesn't always -- often the pooled estimate is
larger*) but because it gives us the best estimate
of the variability of the difference under our null hypothesis that
the two sample proportions came from populations with the same
proportion. Using the inappropriate formula will either increase
the β-risk beyond what is claimed or increase the α-risk beyond
what is intended; neither is considered a good result.
Thus for a hypothesis test with null hypothesis
p1 = p2, our test statistic
(used to find the p-value or to compare to the critical
value in a table) is
z = (p̂1 - p̂2) / sqrt(p̂(1 - p̂)(1/n1 + 1/n2)),
with p̂ = (x1 + x2)/(n1 + n2).
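Putting those pieces together, here is a minimal Python sketch of
the pooled two-proportion test statistic (the success counts 45/100
and 30/100 are hypothetical data, not from the discussion above):

    from math import sqrt

    def pooled_two_prop_z(x1, n1, x2, n2):
        """z statistic for H0: p1 = p2, using the pooled estimate of p."""
        p1_hat, p2_hat = x1 / n1, x2 / n2
        p_pool = (x1 + x2) / (n1 + n2)      # total successes / total trials
        se_pool = sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
        return (p1_hat - p2_hat) / se_pool

    # Hypothetical data: 45/100 successes vs. 30/100 successes
    print(pooled_two_prop_z(45, 100, 30, 100))   # about 2.19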
Of course, the above discussion applies only to hypothesis tests in
which the null hypothesis is p1 = p2.
For estimating the difference
p1 - p2, we are not
working under the assumption of equal proportions; there would be
nothing to estimate if we believed the proportions were equal. So our
estimate of p1 - p2 is p̂1 - p̂2, with (unpooled) standard error
sqrt(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2).
Likewise, if we have a null hypothesis of the form
p1 = p2 + k, our
assumption is that the proportions are different, so there is no
single p to estimate by pooling, and our test statistic is
z = (p̂1 - p̂2 - k) / sqrt(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2).
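A companion Python sketch for the unpooled standard error -- the
one used for the confidence interval and for a null of the form
p1 = p2 + k -- with the same hypothetical counts as above:

    from math import sqrt

    def unpooled_se(x1, n1, x2, n2):
        """Unpooled standard error of p1_hat - p2_hat."""
        p1_hat, p2_hat = x1 / n1, x2 / n2
        return sqrt(p1_hat*(1 - p1_hat)/n1 + p2_hat*(1 - p2_hat)/n2)

    x1, n1, x2, n2 = 45, 100, 30, 100            # hypothetical data
    p1_hat, p2_hat = x1/n1, x2/n2
    se = unpooled_se(x1, n1, x2, n2)

    # 95% confidence interval for p1 - p2 (z* = 1.96)
    print((p1_hat - p2_hat) - 1.96*se, (p1_hat - p2_hat) + 1.96*se)

    # Test statistic for a hypothetical null H0: p1 = p2 + 0.05 (k = 0.05)
    print((p1_hat - p2_hat - 0.05) / se)         # about 1.48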
So we have the answer to the original question. When we carry out a
test with null hypothesis p1 =
p2, all our calculations are based on the
assumption that this null is true -- so our best estimate for the
variance (and thus the standard deviation) of the difference
between sample proportions (p̂1 - p̂2) is given by the "pooled"
formula. In all other inferences on two
proportions (estimation of a difference, a test with null
p1 = p2 + k), we
do not have any such assumption -- so our best estimate for the
variance of the difference between sample proportions is given by
the "unpooled" formula. We pool for the one case, and do not pool
for the others, because in the one case we must treat the two
sample proportions as estimates of the same value and in the other
cases we have no justification for doing so.
*A technical footnote: Here are some cases in which we can readily
compare the relative sizes of pooled and unpooled estimates.
1. If p̂1 = p̂2, the two (pooled and unpooled) estimates of the
variance of p̂1 - p̂2 will be exactly the same, since we obtain
p̂ = p̂1 = p̂2.
2. If the sample sizes are equal
(n1 = n2 = n), then p̂ = (p̂1 + p̂2)/2.
In this case, the unpooled estimate of the variance of the
difference is (p̂1(1 - p̂1) + p̂2(1 - p̂2))/n, and the pooled
estimate of the variance of the difference is 2p̂(1 - p̂)/n,
which can (with heroic algebra!) be rewritten as
(p̂1(1 - p̂1) + p̂2(1 - p̂2))/n + (p̂1 - p̂2)^2/(2n),
so the pooled estimate is actually larger unless the sample
proportions are equal (a numerical check of this identity appears
after the tables below).
3. If the sample proportions are unequal but
equally extreme (equally far from .5), then we have
p̂1 = .5 + e and p̂2 = .5 - e with 0 < e < .5. In this case,
p̂ = .5 + e(n1 - n2)/(n1 + n2), the pooled estimate of variance
can be written
(.25 - e^2((n1 - n2)/(n1 + n2))^2)(1/n1 + 1/n2),
the unpooled estimate can be written
(.25 - e^2)(1/n1 + 1/n2),
and the difference is 4e^2/(n1 + n2), so the pooled estimate is
always larger than the unpooled estimate.
For example, with p̂1 = .8 and p̂2 = .2 (so that e = .3), with n1
= 10 and n2 = 15, the unpooled estimate of
variance is .02667 and the pooled estimate is .04107,
and .04107 - .02667 = .0144 = 4(.3)^2/25.
4. If the sample sizes are different enough
(precise cutoffs are difficult to state), and the more
extreme (further from .5) sample proportion comes from the
larger sample, the pooled estimate of the variance
will be smaller than the unpooled estimate, but if the more extreme
proportion is from the smaller sample, the pooled estimate of
variance will be larger than the unpooled estimate. For example,
consider the following table showing the effects of sample size
when p̂1 = .6 and p̂2 = .9:

n1 | n2 | Pooled Estimate | Unpooled Estimate | Comparison
15 | 10 | .0336           | .025              | Pooled is larger
10 | 15 | .0286           | .03               | Pooled is smaller
For p̂1 = .4 and p̂2 = .9 (the same degree of "extremeness" as in
the table above, but on opposite sides of .5), a greater difference
in sample sizes is required to show the same effect -- but sample
sizes of 15 and 35 suffice, as shown here:

n1 | n2 | Pooled Estimate | Unpooled Estimate | Comparison
35 | 15 | .0236           | .0129             | Pooled is larger
15 | 35 | .0179           | .0186             | Pooled is smaller
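All of the comparisons in this footnote can be reproduced with a
few lines of Python; this sketch checks the case-2 identity, the
case-3 example, and both of the tables above (using the sample
proportions assumed there):

    from math import isclose

    def pooled_unpooled(p1_hat, n1, p2_hat, n2):
        # Pooled and unpooled estimates of the variance of p1_hat - p2_hat
        p_pool = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
        pooled = p_pool * (1 - p_pool) * (1/n1 + 1/n2)
        unpooled = p1_hat * (1 - p1_hat)/n1 + p2_hat * (1 - p2_hat)/n2
        return pooled, unpooled

    # Case 2: equal sample sizes -- pooled = unpooled + (p1_hat - p2_hat)^2 / (2n)
    n = 40
    for p1_hat, p2_hat in [(0.3, 0.3), (0.3, 0.7), (0.1, 0.9)]:
        pooled, unpooled = pooled_unpooled(p1_hat, n, p2_hat, n)
        print(isclose(pooled, unpooled + (p1_hat - p2_hat)**2 / (2*n)))  # True

    # Case 3: equally extreme proportions -- difference is 4e^2/(n1 + n2)
    pooled, unpooled = pooled_unpooled(0.8, 10, 0.2, 15)                 # e = .3
    print(round(unpooled, 5), round(pooled, 5))          # 0.02667 0.04107
    print(isclose(pooled - unpooled, 4 * 0.3**2 / 25))   # True

    # Case 4: rows of the two tables
    for p1_hat, p2_hat, n1, n2 in [(0.6, 0.9, 15, 10), (0.6, 0.9, 10, 15),
                                   (0.4, 0.9, 35, 15), (0.4, 0.9, 15, 35)]:
        pooled, unpooled = pooled_unpooled(p1_hat, n1, p2_hat, n2)
        print(n1, n2, round(pooled, 4), round(unpooled, 4))
    # 15 10 0.0336 0.025
    # 10 15 0.0286 0.03
    # 35 15 0.0236 0.0129
    # 15 35 0.0179 0.0186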