1. Explain what a continuous probability distribution is and how it is used.
2. Is the height of a probability curve over a "given point" a probability? Explain.
3. List five important properties of the normal probability curve.
4. Explain why sampling with replacement is preferred over sampling without replacement.
5. What does the Central Limit Theorem tell us about the sampling distribution of the sample mean?
1.
What is a continuous distribution?
A continuous distribution describes the probabilities of the possible values of a continuous random variable. A continuous random variable is a random variable with a set of possible values (known as the range) that is infinite and uncountable.
The probability that a continuous random variable X falls within a range of values is defined as the area under the curve of its probability density function (PDF) over that range. Thus, only ranges of values can have a nonzero probability. The probability that a continuous random variable equals any single value is always zero.
Example of the distribution of weights
The continuous normal distribution can describe the distribution of weight of adult males. For example, you can calculate the probability that a man weighs between 160 and 170 pounds.
Distribution plot of the weight of adult males
The shaded region under the curve in this example represents the range from 160 to 170 pounds. The area of this region is 0.136; therefore, the probability that a randomly selected man weighs between 160 and 170 pounds is 13.6%. The entire area under the curve equals 1.0.
However, the probability that X is exactly equal to some value is always zero because the area under the curve at a single point, which has no width, is zero. For example, the probability that a man weighs exactly 190 pounds to infinite precision is zero. You could calculate a nonzero probability that a man weighs more than 190 pounds, or less than 190 pounds, or between 189.9 and 190.1 pounds, but the probability that he weighs exactly 190 pounds is zero.
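A minimal sketch of this calculation, using Python's standard-library statistics.NormalDist. The mean (180 pounds) and standard deviation (30 pounds) are assumed values for illustration only; the parameters behind the plot are not stated here, so the printed numbers are not expected to match the 0.136 quoted above.

```python
from statistics import NormalDist

# Hypothetical parameters: the mean and SD behind the original plot are not given.
weights = NormalDist(mu=180, sigma=30)

def prob_between(dist, lo, hi):
    """P(lo < X < hi): area under the PDF from lo to hi, i.e. CDF(hi) - CDF(lo)."""
    return dist.cdf(hi) - dist.cdf(lo)

p_160_170 = prob_between(weights, 160, 170)       # a range: nonzero area
p_near_190 = prob_between(weights, 189.9, 190.1)  # a narrow range: small but nonzero
p_exactly_190 = prob_between(weights, 190, 190)   # a single point: zero width, zero area

print(f"P(160 < X < 170)     = {p_160_170:.4f}")
print(f"P(189.9 < X < 190.1) = {p_near_190:.4f}")
print(f"P(X = 190)           = {p_exactly_190}")
```

The narrower the interval, the smaller the area, which is why a single point ends up with probability zero.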
*********
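2. Not for a continuous r.v. For a discrete r.v., the height at a point is the probability of that outcome. For a continuous r.v., the height is a probability density: the rate at which probability accumulates per unit of x at that point. Probability is the area under the curve over an interval, so the probability of a small interval around the point is approximately the height times the width of the interval, and the probability of the single point itself is zero. A density can even be greater than 1, which a probability cannot.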
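A quick numeric check that the height of a continuous curve is a density rather than a probability. This is only a sketch: the narrow normal distribution below (mean 0, standard deviation 0.1) and the interval width are assumed values, chosen so that the peak of the curve rises above 1.

```python
from statistics import NormalDist

# Hypothetical narrow distribution, chosen so the PDF's peak exceeds 1.
narrow = NormalDist(mu=0.0, sigma=0.1)

# Height of the curve at the mean: a density (probability per unit of x).
height_at_mean = narrow.pdf(0.0)

# Probability of a small interval centered on the mean (exact area from the CDF).
width = 0.01
area = narrow.cdf(width / 2) - narrow.cdf(-width / 2)

# For a small interval, area is approximately height * width.
approx_area = height_at_mean * width

print(f"height at mean       = {height_at_mean:.3f}")
print(f"area of interval     = {area:.6f}")
print(f"height * width       = {approx_area:.6f}")
```

A height above 1 cannot be a probability, but the area it produces over a small interval is still a valid probability between 0 and 1.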
**************
3. Properties of a Normal Distribution
A normal distribution is a continuous probability distribution for
a random variable x. The graph of a normal
distribution is called the normal curve, which has all of the
following properties:
1. The mean, median, and mode are equal.
2. The normal curve is bell-shaped and is symmetric about the
mean.
3. The total area under the curve is equal to one.
4. The normal curve approaches, but never touches, the
x-axis.
5. Between µ − σ and µ + σ the graph is concave down and elsewhere
the graph is concave up. The points at
which the graph changes concavity are called inflection points.
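The five properties can be spot-checked numerically. The sketch below uses statistics.NormalDist with assumed parameters (µ = 10, σ = 2); the helper second_difference is a hypothetical name introduced here, and it estimates concavity with a finite difference rather than an exact derivative.

```python
from statistics import NormalDist

# Assumed parameters for illustration; any mu and any sigma > 0 would do.
mu, sigma = 10.0, 2.0
dist = NormalDist(mu, sigma)

def second_difference(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x); its sign gives the concavity."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# Property 1: mean, median, and mode are equal.
same_center = dist.mean == dist.median == dist.mode

# Property 2: the curve is symmetric about the mean.
symmetric = abs(dist.pdf(mu - 1.5) - dist.pdf(mu + 1.5)) < 1e-12

# Property 3: total area is 1 (midpoint Riemann sum over mu +/- 10 sigma).
step = 0.01
n_steps = int(20 * sigma / step)
area = sum(dist.pdf(mu - 10 * sigma + (i + 0.5) * step) for i in range(n_steps)) * step

# Property 4: the curve stays above the x-axis even far from the mean.
far_height = dist.pdf(mu + 10 * sigma)

# Property 5: concave down inside (mu - sigma, mu + sigma), concave up outside,
# with inflection points at mu - sigma and mu + sigma.
curv_inside = second_difference(dist.pdf, mu)
curv_outside = second_difference(dist.pdf, mu + 2 * sigma)
curv_inflection = second_difference(dist.pdf, mu + sigma)

print(same_center, symmetric, round(area, 6), far_height > 0)
print(curv_inside < 0, curv_outside > 0, abs(curv_inflection) < 1e-6)
```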
************************************
4.
When we sample with replacement, the two sample values are independent. Practically, this means that what we get on the first one doesn't affect what we get on the second. Mathematically, this means that the covariance between the two is zero.
In sampling without replacement, the two sample values aren't independent. Practically, this means that what we got for the first one affects what we can get for the second one. Mathematically, this means that the covariance between the two isn't zero. That complicates the computations. In particular, if we have a SRS (simple random sample) without replacement from a population with variance σ², then the covariance of two different sample values is -σ²/(N - 1), where N is the population size.
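The difference in covariance can be verified exactly on a tiny population by listing every equally likely pair of draws. This is a sketch under stated assumptions: the population [1, 2, 3, 4, 5] is made up, σ² is the population variance (divisor N), and the comparison value is the standard result -σ²/(N - 1) for a simple random sample without replacement.

```python
from itertools import permutations, product
from statistics import fmean, pvariance

population = [1, 2, 3, 4, 5]    # hypothetical small population
N = len(population)             # population size
sigma2 = pvariance(population)  # population variance (divisor N)

def covariance_of_pairs(pairs):
    """Exact covariance of (first draw, second draw) over equally likely pairs."""
    pairs = list(pairs)
    mean_x = fmean(x for x, _ in pairs)
    mean_y = fmean(y for _, y in pairs)
    return fmean((x - mean_x) * (y - mean_y) for x, y in pairs)

# With replacement: every ordered pair is possible, including repeats.
cov_with = covariance_of_pairs(product(population, repeat=2))

# Without replacement: ordered pairs of two different elements only.
cov_without = covariance_of_pairs(permutations(population, 2))

print(f"covariance with replacement    = {cov_with}")
print(f"covariance without replacement = {cov_without}")
print(f"-sigma^2 / (N - 1)             = {-sigma2 / (N - 1)}")
```

The negative covariance reflects the fact that drawing a large value first leaves slightly smaller values, on average, for the second draw.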
Sampling with replacement has two advantages over sampling without replacement as I see it:
1) You don't need to worry about the finite population correction.
2) There is a chance that elements from the population are drawn multiple times - then you can recycle the measurements and save time.
*********************
5.
The sampling distribution of the mean was defined in the section introducing sampling distributions. This section reviews some important properties of the sampling distribution of the mean introduced in the demonstrations in this chapter.
Mean
The mean of the sampling distribution of the mean is the mean of the population from which the scores were sampled. Therefore, if a population has a mean μ, then the mean of the sampling distribution of the mean is also μ. The symbol μM is used to refer to the mean of the sampling distribution of the mean. Therefore, the formula for the mean of the sampling distribution of the mean can be written as:
μM = μ
Variance
The variance of the sampling distribution of the mean is computed as follows:
σM² = σ²/N
That is, the variance of the sampling distribution of the mean is the population variance divided by N, the sample size (the number of scores used to compute a mean). Thus, the larger the sample size, the smaller the variance of the sampling distribution of the mean.
(optional) This expression can be derived very easily from the variance sum law. Let's begin by computing the variance of the sampling distribution of the sum of three numbers sampled from a population with variance σ². The variance of the sum would be σ² + σ² + σ². For N numbers, the variance would be Nσ². Since the mean is 1/N times the sum, the variance of the sampling distribution of the mean would be 1/N² times the variance of the sum, which equals σ²/N.
The standard error of the mean is the standard deviation of the sampling distribution of the mean. It is therefore the square root of the variance of the sampling distribution of the mean and can be written as:
σM = σ/√N
The standard error is represented by a σ because it is a standard deviation. The subscript (M) indicates that the standard error in question is the standard error of the mean.
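These relationships can be confirmed exactly by enumerating every possible sample from a small population. The sketch below assumes a made-up population (the six faces of a fair die) sampled with replacement, with a sample size of 3; the variable n plays the role of N in the text.

```python
from itertools import product
from statistics import fmean, pvariance

population = [1, 2, 3, 4, 5, 6]  # hypothetical population: faces of a fair die
n = 3                            # sample size (written N in the text)

mu = fmean(population)
sigma2 = pvariance(population)   # population variance

# Every equally likely sample of size n drawn with replacement (6**3 = 216 samples).
sample_means = [fmean(sample) for sample in product(population, repeat=n)]

mean_of_means = fmean(sample_means)      # should equal mu
var_of_means = pvariance(sample_means)   # should equal sigma2 / n
standard_error = var_of_means ** 0.5     # should equal sigma / sqrt(n)

print(f"mean of sample means     = {mean_of_means:.4f}  (mu = {mu:.4f})")
print(f"variance of sample means = {var_of_means:.4f}  (sigma^2/n = {sigma2 / n:.4f})")
print(f"standard error           = {standard_error:.4f}")
```

Because all 216 samples are listed, the agreement is exact up to floating-point rounding rather than an approximation from simulation.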
Central Limit Theorem
The central limit theorem states that:
Given a population with a finite mean μ and a finite non-zero variance σ², the sampling distribution of the mean approaches a normal distribution with a mean of μ and a variance of σ²/N as N, the sample size, increases.
The expressions for the mean and variance of the sampling distribution of the mean are not new or remarkable. What is remarkable is that regardless of the shape of the parent population, the sampling distribution of the mean approaches a normal distribution as N increases. If you have used the "Central Limit Theorem Demo," you have already seen this for yourself. As a reminder, Figure 1 shows the results of the simulation for N = 2 and N = 10. The parent population was a uniform distribution. You can see that the distribution for N = 2 is far from a normal distribution. Nonetheless, it does show that the scores are denser in the middle than in the tails. For N = 10 the distribution is quite close to a normal distribution. Notice that the means of the two distributions are the same, but that the spread of the distribution for N = 10 is smaller.
Figure 1. A simulation of a sampling distribution. The parent population is uniform. The blue line under "16" indicates that 16 is the mean. The red line extends one standard deviation on either side of the mean.
Figure 2 shows how closely the sampling distribution of the mean approximates a normal distribution even when the parent population is very non-normal. If you look closely you can see that the sampling distributions do have a slight positive skew. The larger the sample size, the closer the sampling distribution of the mean would be to a normal distribution.
Figure 2. A simulation of a sampling distribution. The parent population is very non-normal.
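A simulation along the lines of Figure 1 can be sketched in a few lines of Python. This is not the demo referred to above: the parent population here is assumed to be Uniform(0, 1), the seed and number of trials are arbitrary, and closeness to a normal distribution is measured by excess kurtosis (0 for a normal distribution), a choice made for this sketch.

```python
import random
from statistics import fmean, pstdev

def simulate_sample_means(n, trials, rng):
    """Draw `trials` samples of size n from Uniform(0, 1) and return their means."""
    return [fmean(rng.random() for _ in range(n)) for _ in range(trials)]

def excess_kurtosis(values):
    """Fourth standardized moment minus 3; equals 0 for a normal distribution."""
    m = fmean(values)
    s = pstdev(values, mu=m)
    return fmean(((v - m) / s) ** 4 for v in values) - 3.0

rng = random.Random(0)  # fixed, arbitrary seed so reruns give the same numbers
results = {}
for n in (2, 10):
    means = simulate_sample_means(n, trials=20_000, rng=rng)
    results[n] = (fmean(means), pstdev(means), excess_kurtosis(means))
    mean, sd, kurt = results[n]
    print(f"N = {n:2d}: mean = {mean:.3f}, SD = {sd:.3f}, excess kurtosis = {kurt:+.3f}")
```

Both sample sizes should give a mean near 0.5, while N = 10 should show a smaller spread and an excess kurtosis closer to zero, matching the pattern described for Figure 1.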