1. What is the purpose of using a confidence interval? In other words, what does a confidence interval estimate?
2. Why is there a level of confidence associated with a confidence interval? In other words, why isn't a confidence interval 100% accurate?
Part 1
Imagine that you want to know a characteristic of a population,
for example, the proportion of people in the country who are in
favor of a candidate C. We will call this characteristic a
"parameter" of the population. For obvious reasons, it is very
difficult and expensive to find out the opinion of every person in
the population.
A possible solution is to take a sample of the population that we
consider "representative" and calculate the proportion of people in
favor of the candidate in that sample. We will call this value a
"point estimate" of the parameter.
If the sample is a good representation of the population, then the
estimate will be very close to the parameter, but we cannot know
exactly how good the estimate is because we do not know how good
the sample is. To address this, instead of giving a single point
estimate, we calculate an interval estimate, which lets us attach a
measure of reliability to the claim that we are "close" to the
population parameter.
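As a rough illustration of point versus interval estimation, here is a minimal sketch assuming a hypothetical poll in which 540 of 1000 sampled people favor candidate C. It uses the common normal-approximation (Wald) interval for a proportion; that is one textbook choice among several, not the only way to build such an interval, and the function name is made up for this example.

```python
import math
from statistics import NormalDist


def proportion_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float, float]:
    """Return (point_estimate, lower, upper) using the Wald (normal-approximation) interval.

    Hypothetical helper for illustration only.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    p_hat = successes / n                              # point estimate of the proportion
    z = NormalDist().inv_cdf(0.5 + confidence / 2)     # two-sided critical value (about 1.96 for 95%)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)    # margin of error
    # Clip to [0, 1] because a proportion cannot fall outside that range.
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Hypothetical data: 540 of 1000 respondents favor candidate C.
p_hat, lo, hi = proportion_interval(540, 1000, 0.95)
print(f"point estimate = {p_hat:.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")
```

The single number `p_hat` is the point estimate; the pair `(lo, hi)` is the interval estimate, whose width reflects how much the estimate could plausibly vary from sample to sample.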
In summary, an X% confidence interval is produced by a method that
captures the true parameter X% of the time. This means that if 100
researchers each take a sample from the population and calculate an
X% confidence interval using the same method, then about X of those
intervals are expected to contain the population parameter.
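The "100 researchers" idea can be checked with a small simulation. This is a sketch under stated assumptions: the true proportion (0.54), the sample size (400) and the random seed are all hypothetical, and each researcher uses the same Wald interval for a proportion.

```python
import math
import random
from statistics import NormalDist


def wald_interval(successes: int, n: int, confidence: float) -> tuple[float, float]:
    """Normal-approximation interval for a proportion (one common textbook choice)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def count_covering(true_p: float = 0.54, n: int = 400, researchers: int = 100,
                   confidence: float = 0.95, seed: int = 1) -> int:
    """Simulate independent researchers and count how many intervals contain true_p.

    true_p is a hypothetical population value, known here only because we simulate it.
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(researchers):
        # Each researcher draws n people; each one favors C with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        lo, hi = wald_interval(successes, n, confidence)
        if lo <= true_p <= hi:
            covered += 1
    return covered


print(count_covering())                          # out of 100; usually near 95, varies with the seed
print(count_covering(researchers=5_000) / 5_000)  # long-run fraction, close to 0.95
```

No single interval "has" a 95% chance of containing the parameter once it is computed; it is the fraction of intervals over many repetitions that settles near 95%.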
NOTE: it is not correct to say that the parameter falls within a
particular computed interval with probability X%, because the
parameter is a fixed value; it is the limits of the interval that
vary, depending on the sample and the confidence level.
Part 2
As mentioned before, the confidence intervals depend on how
representative the sample taken from the population is.
If the sample were 100% representative, we would obtain 100%
confidence intervals regardless of their width. In fact, it would
not be necessary to calculate confidence intervals at all: the
point estimate alone would give us the true population parameter.
The problem is that it is very difficult, or impossible, to know
whether a sample is an exact representation of the population. For
this reason, an interval estimate is always associated with a
confidence level.
This tells us that to obtain 100% confidence intervals we have
two options:
1) Study the entire population, which is not always possible, and
if it were, it would not make sense to calculate a confidence
interval.
2) Give an interval large enough to include all possible
theoretical values that the parameter can take, but this would be
useless. It is like saying that we are 100% sure that when rolling
a die the top face shows a number between 1 and 6, which is
obvious.
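Option 2 can be seen numerically. The sketch below reuses the hypothetical poll (540 of 1000 in favor) and the Wald interval for a proportion, and prints how the interval widens as the confidence level rises. The helper name is invented for this example.

```python
import math
from statistics import NormalDist


def margin_of_error(p_hat: float, n: int, confidence: float) -> float:
    """Half-width of a Wald interval for a proportion (hypothetical helper)."""
    # inv_cdf only accepts probabilities strictly between 0 and 1,
    # so confidence = 1.0 (a "100%" interval) raises StatisticsError.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


p_hat, n = 0.54, 1000  # hypothetical sample result
for conf in (0.80, 0.90, 0.95, 0.99, 0.999, 0.999999):
    m = margin_of_error(p_hat, n, conf)
    print(f"{conf:.6f}: ({p_hat - m:.3f}, {p_hat + m:.3f})  width = {2 * m:.3f}")
```

As the confidence level approaches 100%, the critical value `z` grows without bound, so the only honest "100% interval" for a proportion is the whole range from 0 to 1, which, like the die example, is certain but tells us nothing.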