1)
Definition: Random sampling is a sampling technique in which each
sample has an equal probability of being chosen. A sample chosen
randomly is meant to be an unbiased representation of the total
population. If, for some reason, the sample does not represent the
population, the variation is called a sampling error.
Description:
- Random sampling is one of the simplest ways of collecting data
from the total population. Under random sampling, each member of
the population has an equal opportunity of being chosen as part
of the sample.
- For example, suppose the total workforce of an organisation is 300
and, to conduct a survey, a group of 30 employees is selected. In
this case, the population is the 300 employees in the company and
the group of 30 selected employees is the sample.
- Each member of the workforce has an equal opportunity of being
chosen because the employees who took part in the survey were
selected randomly. Even so, there is always a possibility that the
sample does not represent the population as a whole; any such
random variation is termed a sampling error.
- An unbiased random sample is important for drawing conclusions.
For example, when we draw the sample of 30 employees from the
population of 300, there is always a possibility that the
researcher ends up picking more than 25 men, even though the
population consists of 200 men and 100 women.
- Such variation in the results is known as a sampling error. One
of the disadvantages of random sampling is that it requires a
complete list of the population.
- For example, if a company wants to carry out a survey using
random sampling, it needs a complete list of all its employees,
and those employees may be spread across different regions, which
makes the survey more difficult to carry out.
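The 300-employee example above can be sketched in Python. This is only a minimal illustration: the list `population`, the helper names `draw_sample` and `proportion_men`, and the seed are assumptions introduced here; only the 200 men / 100 women split and the sample size of 30 come from the text.

```python
import random

# Hypothetical population from the example: 200 men ("M") and 100 women ("W").
population = ["M"] * 200 + ["W"] * 100

def draw_sample(pop, n, seed=None):
    """Simple random sample without replacement: every member is equally likely."""
    rng = random.Random(seed)
    return rng.sample(pop, n)

def proportion_men(group):
    """Share of men in a list of "M"/"W" labels."""
    return group.count("M") / len(group)

sample = draw_sample(population, 30, seed=1)

true_share = proportion_men(population)      # 200 / 300
sample_share = proportion_men(sample)        # varies from sample to sample
sampling_error = sample_share - true_share   # gap between sample and population

print(f"population share of men: {true_share:.3f}")
print(f"sample share of men:     {sample_share:.3f}")
print(f"sampling error:          {sampling_error:+.3f}")
```

Running the sketch with different seeds shows the sample share moving around the population share of 2/3, which is the sampling error the bullets describe.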
2)
Sampling Distributions:
- Suppose that we draw all possible samples of size n
from a given population. Suppose further that we compute a
statistic (e.g., a mean, proportion, standard deviation) for each
sample. The probability distribution of this statistic is called a
sampling distribution. And the standard deviation
of this statistic is called the standard
error.
Variability of a Sampling Distribution
The variability of a sampling distribution is measured by its
variance or its standard deviation. The variability of a sampling
distribution depends on three factors:
- N: The number of observations
in the population.
- n: The number of observations
in the sample.
- The way that the random sample
is chosen.
If the population size is much larger than the sample size, then
the sampling distribution has roughly the same standard error
whether we sample with or without replacement. On the other hand,
if the sample represents a significant fraction (say, more than
1/20) of the population size, the standard error will be
meaningfully smaller when we sample without replacement.
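A small simulation can make the with/without replacement difference concrete. The sketch below is an illustration under stated assumptions: the population (the integers 1 to 100), the sample size of 20, the number of simulated draws, and the seed are all hypothetical choices, picked so that the sample is a large fraction (1/5) of the population.

```python
import random
import statistics

population = list(range(1, 101))   # hypothetical population, N = 100
n = 20                             # sample is 1/5 of the population
draws = 5000                       # number of simulated samples per scheme
rng = random.Random(0)

# With replacement: the same member can be drawn more than once.
means_with = [
    statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)
]

# Without replacement: each member appears at most once per sample.
means_without = [
    statistics.fmean(rng.sample(population, n)) for _ in range(draws)
]

# The spread of the simulated sample means estimates the standard error.
se_with = statistics.pstdev(means_with)
se_without = statistics.pstdev(means_without)

print(f"estimated standard error, with replacement:    {se_with:.3f}")
print(f"estimated standard error, without replacement: {se_without:.3f}")
```

Because the sample here is a large share of the population, the estimate without replacement should come out noticeably smaller; shrinking `n` relative to the population size should bring the two estimates close together.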
Sampling Distribution of the Mean
- Suppose we draw all possible samples of size n from a
population of size N. Suppose further that we compute a
mean score for each sample. In this way, we create a sampling
distribution of the mean.
- We know the following about the sampling distribution of the
mean. The mean of the sampling distribution (μx) is
equal to the mean of the population (μ). And the standard error of
the sampling distribution (σx) is determined by the
standard deviation of the population (σ), the population size (N),
and the sample size (n). These relationships are shown in the
equations below:
μx = μ
σx = [ σ / sqrt(n) ] * sqrt[ (N - n) / (N - 1) ]
- In the standard error formula, the factor sqrt[ (N - n ) / (N -
1) ] is called the finite population correction or fpc. When the
population size is very large relative to the sample size, the fpc
is approximately equal to one; and the standard error formula can
be approximated by:
σx = σ / sqrt(n).
- You often see this "approximate" formula in introductory
statistics texts. As a general rule, it is safe to use the
approximate formula when the sample size is no bigger than 1/20 of
the population size.
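The two standard error formulas for the mean can be compared with a short Python sketch. The function names `finite_population_correction` and `standard_error_mean`, and the values σ = 15 and n = 30, are assumptions chosen for illustration; the formulas themselves are the ones given above.

```python
import math

def finite_population_correction(N, n):
    """fpc = sqrt[(N - n) / (N - 1)] for population size N and sample size n."""
    return math.sqrt((N - n) / (N - 1))

def standard_error_mean(sigma, n, N=None):
    """Standard error of the mean.

    With N given, applies the finite population correction;
    with N omitted, returns the approximate form sigma / sqrt(n).
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= finite_population_correction(N, n)
    return se

sigma = 15.0   # hypothetical population standard deviation
n = 30         # hypothetical sample size

approx = standard_error_mean(sigma, n)
for N in (300, 600, 1_000_000):
    exact = standard_error_mean(sigma, n, N)
    fpc = finite_population_correction(N, n)
    print(f"N={N:>9}: fpc={fpc:.4f}  exact={exact:.4f}  approx={approx:.4f}")
```

With N = 600 the sample is exactly 1/20 of the population and the fpc is about 0.975, so at that boundary the approximate formula overstates the standard error by roughly 2.5 percent.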
Sampling Distribution of the Proportion
- In a population of size N, suppose that the
probability of the occurrence of an event (dubbed a "success") is
P; and the probability of the event's non-occurrence (dubbed a
"failure") is Q. From this population, suppose that we draw all
possible samples of size n. And finally, within each
sample, suppose that we determine the proportion of successes
p and failures q. In this way, we create a
sampling distribution of the proportion.
- We find that the mean of the sampling distribution of the
proportion (μp) is equal to the probability of success
in the population (P). And the standard error of the sampling
distribution (σp) is determined by the standard
deviation of the population (σ), the population size, and the
sample size. These relationships are shown in the equations
below:
μp = P
σp = [ σ / sqrt(n) ] * sqrt[ (N - n) / (N - 1) ]
σp = sqrt[ PQ/n ] * sqrt[ (N - n) / (N - 1) ]
where σ = sqrt[ PQ ].
Like the formula for the standard error of the mean, the formula
for the standard error of the proportion uses the finite population
correction, sqrt[ (N - n ) / (N - 1) ]. When the population size is
very large relative to the sample size, the fpc is approximately
equal to one; and the standard error formula can be approximated
by:
σp = sqrt[ PQ/n ]
- You often see this "approximate" formula in introductory
statistics texts. As a general rule, it is safe to use the
approximate formula when the sample size is no bigger than 1/20 of
the population size.
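The same comparison can be sketched for the proportion. The function name `standard_error_proportion` and the values P = 0.5 and n = 100 are assumptions for illustration, and Q is taken to be 1 - P.

```python
import math

def standard_error_proportion(P, n, N=None):
    """Standard error of a sample proportion.

    P is the population probability of success and Q = 1 - P.
    With N given, the finite population correction is applied;
    with N omitted, returns the approximate form sqrt(PQ / n).
    """
    Q = 1.0 - P
    se = math.sqrt(P * Q / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

P = 0.5    # hypothetical probability of success
n = 100    # hypothetical sample size

approx_se = standard_error_proportion(P, n)
for N in (500, 2000, 1_000_000):
    exact_se = standard_error_proportion(P, n, N)
    print(f"N={N:>9}: exact={exact_se:.5f}  approx={approx_se:.5f}")
```

As with the mean, the exact and approximate values should converge as N grows relative to n.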
Central Limit Theorem
- The central limit theorem states that the
sampling distribution of the mean of any independent, random
variable will be normal or nearly normal, if the sample size is
large enough.
How large is "large enough"? The answer depends on two
factors.
- Requirements for accuracy. The
more closely the sampling distribution needs to resemble a normal
distribution, the more sample points will be required.
- The shape of the underlying
population. The more closely the original population resembles a
normal distribution, the fewer sample points will be required.
In practice, some statisticians say that a sample size of 30 is
large enough when the population distribution is roughly
bell-shaped. Others recommend a sample size of at least 40. But if
the original population is distinctly not normal (e.g., is badly
skewed, has multiple peaks, and/or has outliers), researchers like
the sample size to be even larger.
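The effect of sample size on the shape of the sampling distribution can be illustrated with a simulation. This is a rough sketch under stated assumptions: the skewed population is taken to be exponential with mean 1, the sample sizes 2 and 40, the number of draws, and the seed are arbitrary choices, and skewness is used here as one simple indicator of departure from a normal shape (a normal distribution has skewness 0).

```python
import random
import statistics

def skewness(values):
    """Population skewness: mean of cubed standardized deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def simulate_sample_means(n, draws, rng):
    """Means of `draws` samples of size n from an exponential population (mean 1)."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(draws)
    ]

rng = random.Random(42)
draws = 5000

means_small = simulate_sample_means(2, draws, rng)    # small samples
means_large = simulate_sample_means(40, draws, rng)   # larger samples

skew_small = skewness(means_small)
skew_large = skewness(means_large)

print(f"n=2:  mean of sample means={statistics.fmean(means_small):.3f}  skewness={skew_small:.3f}")
print(f"n=40: mean of sample means={statistics.fmean(means_large):.3f}  skewness={skew_large:.3f}")
```

The sample means for n = 40 should show much less skew than those for n = 2 while both stay centered near the population mean of 1, which is consistent with the theorem as stated above.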
How to Choose Between T-Distribution and Normal
Distribution
The t distribution and the normal
distribution can both be used with statistics that have a
bell-shaped distribution. This suggests that we might use either
the t-distribution or the normal distribution to analyze sampling
distributions. Which should we choose?
Guidelines exist to help you make that
choice. Some focus on the population standard deviation.
- If the population standard
deviation is known, use the normal distribution
- If the population standard
deviation is unknown, use the t-distribution.
Other guidelines focus on sample
size.
- If the sample size is large,
use the normal distribution. (See the discussion above in the
section on the Central Limit Theorem to understand what is meant by
a "large" sample.)
- If the sample size is small,
use the t-distribution.
In practice, researchers employ a mix
of the above guidelines. On this site, we use the normal
distribution when the population standard deviation is known and
the sample size is large. We might use either distribution when
standard deviation is unknown and the sample size is very large. We
use the t-distribution when the sample size is small, unless the
underlying distribution is not normal. The t distribution should
not be used with small samples from populations that are not
approximately normal.
Test Your Understanding
In this section, we offer two examples
that illustrate how sampling distributions are used to solve common
statistical problems. In each of these problems, the population
standard deviation is known; and the sample size is large. So you
can use the Normal Distribution Calculator, rather than the
t-Distribution Calculator, to compute probabilities for these
problems.