a) Create four multiple choice exam questions (and their solutions) that test knowledge on calculating and interpreting confidence intervals. One for each of the following conditions.
1) For the population mean with known sigma
2) For the population mean and unknown sigma
3) For the population proportion
4) Sample size determination
b) Solve the questions created above. The Solutions should include the formula in general, numbers plugged in, final answer and interpretation in terms of the business application.
(a). 1. Confidence Interval for the Population Mean with Known Standard Deviation
For a population with unknown mean $\mu$ and known standard deviation $\sigma$, a confidence interval for the population mean, based on a simple random sample (SRS) of size n, is $\bar{x} \pm z^* \sigma/\sqrt{n}$, where z* is the upper (1-C)/2 critical value for the standard normal distribution.
Note: This interval is only exact when the population distribution is normal. For large samples from other population distributions, the interval is approximately correct by the Central Limit Theorem.
(b). 1. Example
The student calculated the sample mean of the boiling temperatures to be 101.82, with the standard deviation of the sample mean, $\sigma/\sqrt{n}$, equal to 0.49. The critical value for a 95% confidence interval is 1.96, since (1-0.95)/2 = 0.025. A 95% confidence interval for the unknown mean is (101.82 - 1.96*0.49, 101.82 + 1.96*0.49) = (101.82 - 0.96, 101.82 + 0.96) = (100.86, 102.78).
As the level of confidence decreases, the width of the corresponding interval decreases. Suppose the student was interested in a 90% confidence interval for the boiling temperature. In this case, C = 0.90, and (1-C)/2 = 0.05. The critical value z* for this level is equal to 1.645, so the 90% confidence interval is (101.82 - 1.645*0.49, 101.82 + 1.645*0.49) = (101.82 - 0.81, 101.82 + 0.81) = (101.01, 102.63).
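As a quick numerical check of the two boiling-temperature intervals, here is a minimal Python sketch. The function name z_interval is made up for illustration, and it assumes that 0.49 is the standard error of the mean ($\sigma/\sqrt{n}$), which is how the example uses it.

```python
from statistics import NormalDist


def z_interval(mean, std_error, confidence):
    """Return (lower, upper) for mean ± z* × std_error (hypothetical helper)."""
    # z* is the upper (1 - C)/2 critical value of the standard normal distribution
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * std_error
    return mean - margin, mean + margin


# Assumption: 0.49 is the standard error sigma / sqrt(n), as used in the example
for level in (0.95, 0.90):
    low, high = z_interval(101.82, 0.49, level)
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f})")
```

The lower confidence level gives the narrower interval, which matches the point made in the example.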
(a). 2. Confidence Interval for Mean With an Unknown Sigma
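We will work through a list of steps required to find our desired confidence interval. Although all of the steps are important, the first one is particularly so:
1) Check the conditions: the data come from a simple random sample, and either the population is approximately normal or the sample is large.
2) Calculate the sample mean $\bar{x}$ and the sample standard deviation s.
3) Find the critical value t* from the t distribution with n - 1 degrees of freedom.
4) Compute the margin of error, $t^* s/\sqrt{n}$.
5) State the interval, $\bar{x} \pm t^* s/\sqrt{n}$.
(b). 2. Example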
To see how we can construct a confidence interval, we will work through an example. Suppose we know that the heights of a specific species of pea plants are normally distributed. A simple random sample of 30 pea plants has a mean height of 12 inches with a sample standard deviation of 2 inches. What is a 90% confidence interval for the mean height for the entire population of pea plants?
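Working through the steps: the conditions are met because we have a simple random sample from a normally distributed population. The point estimate is the sample mean, 12 inches, and the sample standard deviation is s = 2 inches. With n - 1 = 29 degrees of freedom, the critical value for 90% confidence is t* = 1.699. The margin of error is $t^* s/\sqrt{n} = 1.699 \times 2/\sqrt{30} \approx 0.62$, so the 90% confidence interval is 12 ± 0.62, or (11.38, 12.62) inches. We are 90% confident that the mean height of the entire population of pea plants is between 11.38 and 12.62 inches.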
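The pea plant interval can be checked with a short Python sketch. The helper name t_interval is hypothetical, and the critical value t* = 1.699 (90% confidence, 29 degrees of freedom) is an assumption read from a standard t table rather than computed, because the Python standard library has no t distribution.

```python
from math import sqrt


def t_interval(mean, sample_sd, n, t_star):
    """Return (lower, upper) for mean ± t* × s / sqrt(n) (hypothetical helper)."""
    margin = t_star * sample_sd / sqrt(n)
    return mean - margin, mean + margin


# Assumption: t* for 90% confidence with n - 1 = 29 degrees of freedom, from a t table
T_STAR_90_DF29 = 1.699

low, high = t_interval(mean=12, sample_sd=2, n=30, t_star=T_STAR_90_DF29)
print(f"90% CI for mean height: ({low:.2f}, {high:.2f}) inches")
```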
(a). 3. Confidence Interval for the Population Proportion
If there are at least 5 successes and at least 5 failures, then the confidence interval can be computed with this formula:
$\hat{p} \pm Z\sqrt{\hat{p}(1-\hat{p})/n}$
The point estimate for the population proportion is the sample proportion, and the margin of error is the product of the Z value for the desired confidence level (e.g., Z=1.96 for 95% confidence) and the standard error of the point estimate. In other words, the standard error of the point estimate is:
$SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$
This formula is appropriate for samples with at least 5 successes and at least 5 failures in the sample. This was a condition for the Central Limit Theorem for binomial outcomes. If there are fewer than 5 successes or failures then alternative procedures, called exact methods, must be used to estimate the population proportion.
(b). 3. Example: During the 7th examination of the Offspring cohort in the Framingham Heart Study there were 1,219 participants being treated for hypertension and 2,313 who were not on treatment. If we call treatment a "success", then x = 1,219 and n = 3,532. The sample proportion is:
$\hat{p} = x/n = 1219/3532 = 0.345$
This is the point estimate, i.e., our best estimate of the proportion of the population on treatment for hypertension is 34.5%. The sample is large, so the confidence interval can be computed using the formula:
$\hat{p} \pm Z\sqrt{\hat{p}(1-\hat{p})/n}$
Substituting our values we get
$0.345 \pm 1.96\sqrt{0.345(1-0.345)/3532}$
which is
$0.345 \pm 0.016$
So, the 95% confidence interval is (0.329, 0.361).
Thus we are 95% confident that the true proportion of persons on antihypertensive medication is between 32.9% and 36.1%.
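The hypertension example can be reproduced with the following Python sketch. The function name proportion_interval is invented for illustration. It uses the large-sample formula and refuses to run when there are fewer than 5 successes or failures, since the text says exact methods are needed in that case.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Large-sample CI for a proportion: p_hat ± Z × sqrt(p_hat(1 - p_hat)/n)."""
    failures = n - successes
    if successes < 5 or failures < 5:
        # The normal approximation is not appropriate here; exact methods are needed
        raise ValueError("need at least 5 successes and at least 5 failures")
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Framingham example from the text: 1,219 treated out of 3,532 participants
low, high = proportion_interval(1219, 3532)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```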
(a). 4. Sample Size Determination for a Confidence Interval
The module on confidence intervals provided methods for estimating confidence intervals for various parameters (e.g., μ , p, ( μ1 - μ2 ), μd , (p1-p2)). Confidence intervals for every parameter take the following general form:
Point Estimate ± Margin of Error
In the module on confidence intervals we derived the formula for the confidence interval for μ as
$\bar{x} \pm Z\sigma/\sqrt{n}$
In practice we use the sample standard deviation to estimate the population standard deviation. Note that there is an alternative formula for estimating the mean of a continuous outcome in a single population, and it is used when the sample size is small (n<30). It involves a value from the t distribution, as opposed to one from the standard normal distribution, to reflect the desired level of confidence. When performing sample size computations, we use the large sample formula shown here. [Note: The resultant sample size might be small, and in the analysis stage, the appropriate confidence interval formula must be used.]
The point estimate for the population mean is the sample mean and the margin of error is $Z\sigma/\sqrt{n}$.
In planning studies, we want to determine the sample size needed to ensure that the margin of error is sufficiently small to be informative. For example, suppose we want to estimate the mean weight of female college students. We conduct a study and generate a 95% confidence interval as follows: 125 ± 40 pounds, or 85 to 165 pounds. The margin of error is so wide that the confidence interval is uninformative. To be informative, an investigator might want the margin of error to be no more than 5 or 10 pounds (meaning that the 95% confidence interval would have a width (lower limit to upper limit) of 10 or 20 pounds). In order to determine the sample size needed, the investigator must specify the desired margin of error. It is important to note that this is not a statistical issue, but a clinical or a practical one. For example, suppose we want to estimate the mean birth weight of infants born to mothers who smoke cigarettes during pregnancy. Birth weights in infants clearly have a much more restricted range than weights of female college students. Therefore, we would probably want to generate a confidence interval for the mean birth weight that has a margin of error not exceeding 1 or 2 pounds.
The margin of error in the one sample confidence interval for μ can be written as follows:
$E = Z\sigma/\sqrt{n}$
Our goal is to determine the sample size, n, that ensures that the margin of error, "E," does not exceed a specified value. We can take the formula above and, with some algebra, solve for n:
First, multiply both sides of the equation by the square root of n. Then cancel out the square root of n from the numerator and denominator on the right side of the equation (since any number divided by itself is equal to 1). This leaves:
$E\sqrt{n} = Z\sigma$
Now divide both sides by "E" and cancel out "E" from the numerator and denominator on the left side. This leaves:
$\sqrt{n} = Z\sigma/E$
Finally, square both sides of the equation to get:
$n = (Z\sigma/E)^2$
This formula generates the sample size, n, required to ensure that the margin of error, E, does not exceed a specified value. To solve for n, we must input "Z," "σ," and "E."
Sometimes it is difficult to estimate σ. When we use the sample size formula above (or one of the other formulas that we will present in the sections that follow), we are planning a study to estimate the unknown mean of a particular outcome variable in a population. It is unlikely that we would know the standard deviation of that variable. In sample size computations, investigators often use a value for the standard deviation from a previous study or a study done in a different, but comparable, population. The sample size computation is not an application of statistical inference and therefore it is reasonable to use an appropriate estimate for the standard deviation. The estimate can be derived from a different study that was reported in the literature; some investigators perform a small pilot study to estimate the standard deviation. A pilot study usually involves a small number of participants (e.g., n=10) who are selected by convenience, as opposed to by random sampling. Data from the participants in the pilot study can be used to compute a sample standard deviation, which serves as a good estimate for σ in the sample size formula. Regardless of how the estimate of the variability of the outcome is derived, it should always be conservative (i.e., as large as is reasonable), so that the resultant sample size is not too small.
The formula produces the minimum sample size to ensure that the margin of error in a confidence interval will not exceed E. In planning studies, investigators should also consider attrition or loss to follow-up. The formula above gives the number of participants needed with complete data to ensure that the margin of error in the confidence interval does not exceed E.
(b). 4. Example
An investigator wants to estimate the mean systolic blood pressure in children with congenital heart disease who are between the ages of 3 and 5. How many children should be enrolled in the study? The investigator plans on using a 95% confidence interval (so Z=1.96) and wants a margin of error of 5 units. The standard deviation of systolic blood pressure is unknown, but the investigators conduct a literature search and find that the standard deviation of systolic blood pressures in children with other cardiac defects is between 15 and 20. To estimate the sample size, we consider the larger standard deviation in order to obtain the most conservative (largest) sample size.
$n = (Z\sigma/E)^2 = (1.96 \times 20/5)^2 \approx 61.5$
In order to ensure that the 95% confidence interval estimate of the mean systolic blood pressure in children between the ages of 3 and 5 with congenital heart disease is within 5 units of the true mean, a sample of size 62 is needed. [Note: We always round up; the sample size formulas always generate the minimum number of subjects needed to ensure the specified precision.] Had we assumed a standard deviation of 15, the sample size would have been n=35. Because the estimates of the standard deviation were derived from studies of children with other cardiac defects, it would be advisable to use the larger standard deviation and plan for a study with 62 children. Selecting the smaller sample size could potentially produce a confidence interval estimate with a larger margin of error.
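The two sample sizes quoted above (62 and 35) can be checked with a small Python sketch. The function name sample_size_for_mean is hypothetical. It applies n = (Zσ/E)² with Z = 1.96 as given in the example, and always rounds up.

```python
from math import ceil


def sample_size_for_mean(z, sigma, margin_of_error):
    """Minimum n so that the margin of error Z*sigma/sqrt(n) does not exceed E."""
    # Round up: the formula gives the minimum number of subjects with complete data
    return ceil((z * sigma / margin_of_error) ** 2)


# Values from the example: Z = 1.96, E = 5, and sigma between 15 and 20
print(sample_size_for_mean(1.96, 20, 5))  # conservative choice, sigma = 20
print(sample_size_for_mean(1.96, 15, 5))  # smaller choice, sigma = 15
```

Because n grows with the square of σ, moving from σ = 15 to σ = 20 nearly doubles the required sample, which is why the conservative estimate matters at the planning stage.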
For better understanding, I have placed each part (b) solution directly after the corresponding part (a) item.