In: Statistics and Probability
In chapter 5 we learned how to estimate the population mean by repeatedly sampling the population. Describe how this works.
Often the population is so large that we treat it as infinite, which simplifies some of the mathematics. Because the population is large, we usually cannot hope to calculate the parameter of interest (e.g. the population mean) exactly, since doing so would require obtaining data from every member of the population.

It is not enough to have sample statistics (such as the sample mean) that merely average out, over a large number of hypothetical samples, to the correct target (i.e., the population mean). We would also like sample statistics with low variability from one hypothetical sample to another. At the very least, we need to be able to quantify this variability, known as sampling uncertainty. One way to do this is to consider the sampling distribution of a statistic: the distribution of values the statistic takes under repeated (hypothetical) samples. Again, we can use results from probability theory to tell us what these sampling distributions are. So all we need to do is take a single random sample and calculate a statistic, and we will know the theoretical sampling distribution of that statistic (i.e., what the statistic should average out to over repeated samples, and how much it should vary from one repeated sample to another).
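As a minimal sketch of this idea (the population, its mean of 50, its standard deviation of 10, and the sample size are illustrative assumptions, not values from the text), a single random sample already lets us estimate how much the sample mean would vary across hypothetical samples, via the estimated standard error s/√n:

```python
import math
import random

random.seed(0)

# One single random sample of size n from a hypothetically infinite population.
# Illustrative assumption: the population is Normal(mean=50, sd=10).
n = 25
sample = [random.gauss(50, 10) for _ in range(n)]

# Sample mean: over repeated samples, this averages out to the population mean.
xbar = sum(sample) / n

# Sample standard deviation (using the n - 1 divisor).
s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))

# Estimated standard error: how much the sample mean should vary
# from one hypothetical sample of size n to another.
se = s / math.sqrt(n)

print(f"sample mean = {xbar:.2f}, estimated standard error = {se:.2f}")
```

The point is that the standard error is computed from one sample alone; probability theory tells us it approximates the spread of the sampling distribution of the mean.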
We have a population of x values whose histogram is the probability distribution of x. Select a sample of size n from this population and calculate a sample statistic, e.g. the sample mean x̄. This procedure can be repeated indefinitely, generating a population of values for the sample statistic.

For example, suppose a sample of size n = 16 is taken from some population and the mean of the sixteen numbers is computed. Next, a new sample of sixteen is taken, and its mean is again computed. If this process were repeated a large number of times, the resulting collection of sample means would trace out the sampling distribution of the mean, and averaging them would give a good estimate of the population mean.
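The repeated-sampling procedure above can be simulated directly. In this sketch the population is an assumed Normal distribution with mean 50 and standard deviation 10 (illustrative values, not from the text); the sample size n = 16 matches the example:

```python
import random
import statistics

random.seed(0)

# Illustrative population: Normal(mean=50, sd=10) -- assumed values.
POP_MEAN, POP_SD = 50.0, 10.0
n = 16                 # sample size, as in the text
num_samples = 10_000   # number of repeated (hypothetical) samples

# Repeatedly draw a sample of size n and record its mean.
sample_means = []
for _ in range(num_samples):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# The sample means average out to the population mean (close to 50) ...
print(f"mean of sample means: {statistics.mean(sample_means):.2f}")
# ... and their spread matches the theoretical standard error
# sigma / sqrt(n) = 10 / 4 = 2.5.
print(f"sd of sample means:   {statistics.stdev(sample_means):.2f}")
```

Running this shows both facts claimed in the text at once: the average of the sample means sits near the population mean, and their standard deviation matches the theoretical value σ/√n.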