Confidence intervals indicate that we move from a point estimate to an interval. The conceptual issue is that in any specific sampling the true population mean may or may not be found! How do you explain this?
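The confidence level describes the interval-building procedure rather than any single interval: it is the proportion of intervals, across many repeated samples, that would capture the population parameter being estimated. Any one interval either contains the true mean or it does not, which is why a specific sample can miss it.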
There are two types of estimates: point estimates and interval estimates.
A point estimate uses a single value, often a sample statistic, to infer the value of a population parameter.
An interval estimate is a type of estimation that uses a range (or interval) of values, based on sampling information, to “capture” or “cover” the true population parameter being inferred.
The confidence level states how often, over repeated sampling, interval estimates built this way contain the true population parameter.
The image below shows the population distribution with the true population mean marked. This is the population parameter that we’re attempting to estimate with both a point estimate and an interval estimate.
Point Estimate for the true Population Mean
So let’s say we’ve recently purchased 5,000 widgets to be consumed in our next manufacturing order, and we require that the average length of the 5,000 widgets be 2 inches.
Instead of measuring all 5,000 units, which would be extremely time-consuming and costly, and in other cases possibly destructive, we can take a sample from that population and measure the average length of the sample.
As you know, the sample mean is calculated by summing the individual values and dividing by the number of measurements.
Example of Sample Mean Calculation
Calculate the sample mean value of the following 5 length measurements for our lot of widgets: 16.5, 17.2, 14.5, 15.3, 16.1
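The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the variable names are ours, and the data are the five measurements listed in the example.

```python
# The five widget length measurements from the example above.
lengths = [16.5, 17.2, 14.5, 15.3, 16.1]

# Sample mean = sum of the values / number of values.
total = sum(lengths)
n = len(lengths)
sample_mean = total / n

print(f"n = {n}, sum = {total:.1f}, mean = {sample_mean:.2f}")
# prints: n = 5, sum = 79.6, mean = 15.92
```

The sample mean of 15.92 is the point estimate of the population mean for this lot.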
The Interval Estimate for the true population mean
Interval estimates are created using a confidence level: the long-run proportion of intervals, each built the same way from a fresh sample, that capture the population parameter being estimated.
Because we use a confidence level, we often call these interval estimates confidence intervals.
You can see an example of the confidence interval below.
The image starts with the population distribution in orange, and this distribution has an unknown population mean, which we’re attempting to estimate.
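To make the idea concrete, here is a minimal sketch of one common way to build such an interval for the five widget measurements used earlier. It assumes a 95% confidence level (the source does not name one), a random sample from a roughly normal population, and an unknown population standard deviation, so it uses a Student's t interval. The critical value 2.776 is the standard table value for 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

lengths = [16.5, 17.2, 14.5, 15.3, 16.1]
n = len(lengths)
x_bar = mean(lengths)   # point estimate of the population mean
s = stdev(lengths)      # sample standard deviation (divides by n - 1)

# Assumption: 95% two-sided confidence level. The critical value of
# Student's t with n - 1 = 4 degrees of freedom comes from a t-table.
T_CRIT_95_DF4 = 2.776

margin = T_CRIT_95_DF4 * s / sqrt(n)   # half-width of the interval
lower, upper = x_bar - margin, x_bar + margin

print(f"point estimate = {x_bar:.2f}")
print(f"95% confidence interval = ({lower:.2f}, {upper:.2f})")
```

With only five measurements the interval is wide, roughly 14.6 to 17.2. A larger sample would shrink the margin because it divides by the square root of n.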
The conceptual issue is that, for any specific sample, the resulting interval may or may not contain the true population mean!
Anytime we use an estimator to infer a population parameter, we naturally incur some risk (or likelihood) of inferring incorrectly.
The entire field of inferential statistics, by nature, involves a certain element of risk.
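A small simulation shows what this risk looks like in practice. The sketch below is an illustration, not a prescribed method. It assumes a normal population with a true mean of 2.0 inches (the target from the widget example) and a hypothetical, known standard deviation of 0.1, so it can use the simpler z-interval. The function name and all parameter values are our own choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(true_mean=2.0, sigma=0.1, n=30, trials=2000, level=0.95, seed=1):
    """Return the fraction of simulated intervals that contain true_mean."""
    rng = random.Random(seed)
    # z critical value for the chosen level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Simplifying assumption: sigma is known, so the margin is fixed.
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits / trials

print(f"fraction of intervals containing the true mean: {coverage():.3f}")
```

The fraction should land close to 0.95. Most intervals capture the true mean and a few do not, and in a real study you cannot tell which kind you have. That is the sense in which a 95% confidence level carries a 5% risk.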
So to minimize the risk associated with estimators, we desire two characteristics in a high-quality estimator: that it be unbiased and efficient.
Unbiased
An unbiased estimator is one whose expected value is equal to the population parameter being estimated.
Consider the situation where we are repeatedly sampling (sample size = n) from a population distribution.
Each sample would have its own distribution of values, which are all shown under the main population distribution.
Let’s say you sampled 100 units from a population of 1,000 and you calculated the sample mean.
Based on the random nature of sampling, you’d expect each sample taken to likely have a different sample mean.
Now let’s say you repeated this sampling 30 times and plotted the distribution of sample means.
This new distribution of sample means has its own variance & expected value (mean value).
A point estimate (the sample mean, in this example) is considered unbiased if its expected value is equal to the parameter that it is estimating.
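The repeated-sampling experiment described above can be tried out in code. This sketch uses the numbers from the text (a population of 1,000, samples of 100, repeated 30 times). The population itself is hypothetical: normally distributed lengths centred on 2.0 inches with a standard deviation of 0.1. The function name is ours.

```python
import random
from statistics import mean

def mean_of_sample_means(pop_size=1000, n=100, repeats=30, seed=7):
    """Return (population mean, average of the repeated sample means)."""
    rng = random.Random(seed)
    # Hypothetical population of widget lengths, in inches.
    population = [rng.gauss(2.0, 0.1) for _ in range(pop_size)]
    # Draw `repeats` samples of size n without replacement and
    # record the mean of each one.
    sample_means = [mean(rng.sample(population, n)) for _ in range(repeats)]
    return mean(population), mean(sample_means)

pop_mean, avg_of_means = mean_of_sample_means()
print(f"population mean      = {pop_mean:.4f}")
print(f"mean of sample means = {avg_of_means:.4f}")
```

Individual sample means scatter around the population mean, but their average sits very close to it, and the gap narrows as `repeats` grows. That is what unbiasedness looks like in a simulation.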