Polling is ubiquitous. These polls often present a margin of error along with the percentage distribution of the polling numbers. Select a poll. Determine what statistical value is being used to express the margin of error (variance, standard error, etc.), determine whether or not the value is being used correctly, and critique the poll's data analysis and presentation.
Start with what the margin of error means in a poll with two choices. For example, if a poll estimates that two candidates have 45% and 55% of the vote respectively and the margin of error is 4%, then we can say with 95% confidence that the first candidate will receive between 41% and 49% of the vote. This means that the second candidate will have between 51% and 59% of the vote. We can say this because there are only two choices and the total must add up to 100%.
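The arithmetic in this example can be sketched in a few lines of Python. The function name `two_choice_intervals` is hypothetical and used only for illustration; it assumes shares and margin are given in percentage points.

```python
def two_choice_intervals(share_a, margin):
    """Return the (low, high) ranges for both candidates in a two-choice poll.

    share_a and margin are in percentage points; the second candidate's
    share is whatever remains out of 100.
    """
    share_b = 100 - share_a
    return (share_a - margin, share_a + margin), (share_b - margin, share_b + margin)


first, second = two_choice_intervals(45, 4)
print(first, second)  # (41, 49) (51, 59)
```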
A model of poll results
The response for a particular candidate can be modeled as a binary random variable $X$. The poll itself is a sample of $X$, often enumerated $X_1, X_2, \ldots, X_n$. This means that each of the $X_i$ is a random variable like $X$, but all the $X_i$ are independent.
If we code a positive response as 1 and a negative response as 0, then the poll total $S = X_1 + X_2 + \cdots + X_n$ counts the people who favor this candidate. (Even though the final conclusions will be the same regardless of how the results are coded, this nice relationship is precisely why people use such 0-1 coding.) The percentages named in the question are realizations of the ratio $S/n$, which is the mean of the variables,
$$\frac{S}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}.$$
Suppose the expectation of $X$ is $p$, the true proportion of people in the population favoring the candidate. Although we never know $p$, we can still reason about it mathematically. In particular, $p$ determines the distribution of $X$: $X$ takes on the value 1 with probability $p$ and otherwise has the value 0 with probability $1-p$. $X$ is said to have a Bernoulli distribution with parameter $p$.
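To make the model concrete, here is a minimal simulation sketch using only Python's standard library. The function name `simulate_poll`, the seed, and the values p = 0.55 and n = 1000 are assumptions chosen for illustration, not figures from any real poll.

```python
import random


def simulate_poll(p, n, seed=0):
    """Draw n independent 0-1 responses, each equal to 1 with probability p.

    Returns the poll total S and the poll mean S / n.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    responses = [1 if rng.random() < p else 0 for _ in range(n)]
    total = sum(responses)  # S counts the people favoring the candidate
    return total, total / n


S, p_hat = simulate_poll(p=0.55, n=1000)
print(S, p_hat)
```

Rerunning with a different seed gives a different realization of S/n, which is exactly the sampling variability that the standard error describes.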
Standard errors
The variance of $X$ will determine the standard error of the poll. The next step is to compute this variance. The computation starts with the expectation. The expectation of $X$ is, by definition, its probability-weighted value
$$E[X] = p(1) + (1-p)(0) = p.$$
Its variance, again by definition, is the expectation of $(X - E[X])^2$, computed as
$$\operatorname{Var}(X) = p(1-p)^2 + (1-p)(0-p)^2 = p(1-p)\big[(1-p) + p\big] = p(1-p).$$
It is useful to know that this quantity varies between 0 and 1/4, attaining its maximum at $p = 1/2$ and staying close to this maximum for $p \approx 1/2$. For instance, if $p = 1/4$ or $p = 3/4$, the variance of $1/4 \times 3/4 = 3/16$ is still not much smaller than 1/4.
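A short sketch shows how flat $p(1-p)$ is near its maximum. The helper name `bernoulli_variance` and the grid of values are illustrative choices.

```python
def bernoulli_variance(p):
    """Variance p(1 - p) of a 0-1 variable that equals 1 with probability p."""
    return p * (1 - p)


# Print the variance across a grid of proportions.
for p in (0.1, 0.25, 0.4, 0.5, 0.6, 0.75, 0.9):
    print(f"p = {p:<4}  variance = {bernoulli_variance(p):.4f}")
```

On this grid the largest value occurs at p = 0.5, and the values at 0.4 and 0.6 differ from it only slightly.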
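The variance of $\bar{X}$ therefore equals
$$\operatorname{Var}(\bar{X}) = \operatorname{Var}\!\left(\frac{1}{n}S\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{1}{n^2}\big(n\,p(1-p)\big) = \frac{p(1-p)}{n}.$$
The variance of the sum is the sum of the variances because the $X_i$ are independent.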
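The result $p(1-p)/n$ can be checked numerically without simulation. The sketch below relies on the standard fact that a sum of n independent 0-1 variables with success probability p has a binomial distribution, and computes the variance of S/n directly from those probabilities. The function name and the test values p = 0.3, n = 50 are illustrative assumptions.

```python
from math import comb


def exact_variance_of_mean(p, n):
    """Variance of S / n, computed from the binomial probabilities of S."""
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(pk * k / n for k, pk in enumerate(probs))
    return sum(pk * (k / n - mean) ** 2 for k, pk in enumerate(probs))


p, n = 0.3, 50
print(exact_variance_of_mean(p, n))  # direct computation
print(p * (1 - p) / n)               # closed-form p(1 - p) / n
```

The two printed values should agree up to floating-point rounding.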
The standard deviation of the poll mean is the square root of its variance. Since we don't know $p$, we estimate $p$ from the poll. This estimate $\hat{p}$ usually is taken to be the fraction of people favoring the candidate in the poll. Plugging this estimate into the variance formula and taking the square root gives the estimated standard deviation of the poll estimate, also known as its standard error:
$$\operatorname{SE}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
Because $\hat{p}(1-\hat{p})$ will be close to (but slightly less than) 1/4 when $\hat{p}$ is anywhere near 50%, we may conservatively overestimate the standard error even before taking the poll by using 1/4 in the calculation. That is, the largest possible standard error of any response in a poll of $n$ observations is $\sqrt{1/(4n)} = 1/(2\sqrt{n})$.
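As a rough sketch, the standard error and its conservative bound can be computed as follows. The multiplier z = 1.96 is an assumption: it is the usual normal-approximation factor for a 95% level and is not stated in the text above. The sample size n = 600 and the function names are likewise illustrative.

```python
from math import sqrt


def standard_error(p_hat, n):
    """Estimated standard error sqrt(p_hat * (1 - p_hat) / n) of a poll proportion."""
    return sqrt(p_hat * (1 - p_hat) / n)


def max_standard_error(n):
    """Conservative bound sqrt(1 / (4n)), obtained by replacing p_hat(1 - p_hat) with 1/4."""
    return sqrt(1 / (4 * n))


def margin_of_error(se, z=1.96):
    """Margin of error as z standard errors (z = 1.96 is an assumed 95% multiplier)."""
    return z * se


n = 600
print(standard_error(0.45, n))                  # slightly below the bound
print(max_standard_error(n))                    # about 0.0204
print(margin_of_error(max_standard_error(n)))   # about 0.04
```

Under these assumptions, a poll of about 600 respondents gives a conservative margin of error close to 4 percentage points, the figure used in the two-candidate example.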