Define the Central Limit Theorem. How are confidence intervals related to public opinion polling?
The central limit theorem (CLT) states that the distribution of sample means approaches a normal distribution (also known as a “bell curve”) as the sample size becomes larger, assuming that all samples are identical in size and regardless of the shape of the population distribution.
Said another way, the CLT is a statistical theorem stating that, given a sufficiently large sample size from a population with finite variance, the mean of the sample means will be approximately equal to the mean of the population. Furthermore, the sample means will follow an approximately normal distribution, with a variance approximately equal to the variance of the population divided by the sample size.
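Written compactly, the statement above can be sketched as follows. The symbols are assumptions introduced here for illustration and do not appear in the original answer: \mu and \sigma^2 are the population mean and variance, n is the sample size, and \bar{X}_n is the mean of one sample of n independent draws.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\mathbb{E}\left[\bar{X}_n\right] = \mu,
\qquad
\operatorname{Var}\left(\bar{X}_n\right) = \frac{\sigma^2}{n},
\qquad
\bar{X}_n \;\overset{\text{approx.}}{\sim}\; \mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{n}\right)
\quad \text{for large } n .
```

The first two equalities hold for any n when the draws are independent with finite variance; only the normal shape is the large-sample approximation.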
Although this concept was first developed by Abraham de Moivre in 1733, it wasn’t formally named until 1920, when the noted Hungarian mathematician George Pólya dubbed it the Central Limit Theorem.
According to the central limit theorem, the mean of a sample of data will tend to be closer to the mean of the overall population in question as the sample size increases, regardless of the actual distribution of the data. In other words, the result holds whether the population distribution is normal or not.
As a general rule, sample sizes of 30 or more are deemed sufficient for the CLT to hold, meaning that the sample means are approximately normally distributed. In addition, the more samples one takes, the more clearly the graphed sample means take the shape of a normal distribution.
The Central Limit Theorem describes a phenomenon in which the average of the sample means equals the population mean, and the standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size. This is extremely useful for accurately estimating the characteristics of populations.
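A small simulation can make this concrete. The sketch below is only an illustration under stated assumptions: the population is taken to be an exponential distribution with mean 1 and standard deviation 1 (chosen because it is strongly skewed), and the function name `simulate_sample_means`, the sample size of 30, and the 5,000 repetitions are all hypothetical choices, not part of the original answer.

```python
import random
import statistics


def simulate_sample_means(sample_size, num_samples, seed=0):
    """Draw num_samples samples of the given size and return their means.

    Assumption for illustration: the population is exponential with
    rate 1.0, so its mean is 1.0 and its standard deviation is 1.0.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


sample_size = 30
means = simulate_sample_means(sample_size=sample_size, num_samples=5000)

mean_of_means = statistics.fmean(means)   # should be close to the population mean, 1.0
sd_of_means = statistics.stdev(means)     # should be close to 1.0 / sqrt(30)
expected_sd = 1.0 / sample_size ** 0.5

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {expected_sd:.3f})")
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the underlying population is skewed.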
How confident can the surveyor be that the true value lies within the estimated margin of error? To measure this, surveys also have what is called a "confidence level", which is the likelihood that the estimated range (between seventy and eighty percent in this case) really contains the true value for the population.
A confidence level is an expression of how confident a researcher can be in the data obtained from a sample. Confidence levels are expressed as a percentage and indicate how often, if the survey were repeated many times, the resulting confidence interval would contain the true value for the target population. The most commonly used confidence level is 95%. A related concept is called statistical significance.
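As a rough sketch of how a poll's confidence interval might be computed, the example below uses the normal approximation that the CLT justifies for a sample proportion. The figures are hypothetical: a poll of 288 respondents in which 75% give a particular answer, chosen so that the interval comes out near seventy to eighty percent. The function name `poll_confidence_interval` is likewise an assumption for illustration.

```python
import math
from statistics import NormalDist


def poll_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a poll proportion.

    p_hat: share of respondents giving the answer (between 0 and 1)
    n: number of respondents (assumed to be a simple random sample)
    confidence: confidence level, e.g. 0.95 for 95%
    """
    # z-score that leaves (1 - confidence) / 2 in each tail (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin, margin


# Hypothetical poll: 75% of 288 respondents
low, high, margin = poll_confidence_interval(0.75, 288)
print(f"margin of error: +/- {margin:.1%}")
print(f"95% confidence interval: {low:.1%} to {high:.1%}")
```

Raising the confidence level (say, to 99%) widens the interval for the same data, which is the trade-off pollsters accept when they settle on 95%.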
A researcher's confidence that their sample is truly representative of the target population, along with their awareness of the limitations of the study design and its implementation, is largely based on three important variables: sample size, frequency of response, and population size. Researchers have long agreed that these variables must be carefully considered during the research planning phase.
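To see how these three variables might interact, here is a hedged sketch of a margin-of-error calculation. It assumes a simple random sample, reads "frequency of response" as the observed proportion `p` giving a particular answer, and applies the standard finite population correction when a population size is supplied. The function name `margin_of_error` and all the numbers are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(n, p=0.5, population=None, confidence=0.95):
    """Approximate margin of error for a proportion from a simple random sample.

    n: sample size
    p: observed proportion (0.5 is the most conservative choice)
    population: total population size, or None to treat it as very large
    confidence: confidence level, e.g. 0.95
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: shrinks the margin when the
        # sample is a sizable fraction of the whole population.
        margin *= math.sqrt((population - n) / (population - 1))
    return margin


print(f"n=250,  p=0.5:           {margin_of_error(250):.3f}")
print(f"n=1000, p=0.5:           {margin_of_error(1000):.3f}")
print(f"n=1000, p=0.9:           {margin_of_error(1000, p=0.9):.3f}")
print(f"n=1000, p=0.5, N=2000:   {margin_of_error(1000, population=2000):.3f}")
```

Under these assumptions the margin shrinks as the sample grows, is widest when responses are split evenly, and is affected by population size only when the sample covers a large share of that population.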