The mean preparation fee H&R Block charged retail customers last year was $183. Use this price as the population mean and assume the population standard deviation of preparation fees is $50.
Let X be the population random variable. What does X stand for and why is X a random variable?
What probability distribution does the random variable X follow? Suppose μ and σ are the mean and the standard deviation of the probability distribution of X. What are the values of μ and σ?
Now we randomly select 30 H&R Block retail customers. This process is called random sampling and 30 is the sample size, conventionally denoted by n. Let x̄, known as the sample mean, be the average price these 30 retail customers pay. Please explain why x̄ must also be a random variable.
As a random variable, x̄ must follow some probability distribution. What can we say about this probability distribution? This probability distribution is called the sampling distribution of the sample mean x̄. Let μ_x̄ be the mean of this sampling distribution and σ_x̄ be the standard deviation of the sampling distribution. σ_x̄, the standard deviation of the sampling distribution of the sample mean, is usually referred to as the standard error for convenience. What can we say about μ_x̄ and σ_x̄? These results are typically referred to as the Central Limit Theorem.
In this example, the sample size n is 30. If we increase n, what effect does it have on the sampling distribution?
In this example, what are the values of the mean and the standard deviation of the sampling distribution of the sample mean?
What is the probability that the mean price for a sample of 30 H&R Block retail customers is within $8 (this value is generally called the margin of error) of the population mean? What is the probability that the mean price for a sample of 50 H&R Block retail customers is within $8 of the population mean? What is the probability that the mean price for a sample of 100 H&R Block retail customers is within $8 of the population mean?
Please copy your R code and the result and paste them here.
What conclusions can we draw from g)?
What sample size would you recommend to have at least a 0.95 probability that the sample mean is within $8 of the population mean?
Please copy your R code and the result and paste them here.
In reality, we rarely know μ and σ. Therefore, we usually apply the process described above in reverse. For instance, we can use the sample mean to infer the population mean. This process is called statistical inference. Now, let’s assume we don’t know the population mean, but we still know the population standard deviation to be $σ (we will handle the situation where σ is unknown later). We randomly sampled n H&R Block retail customers and the mean price is x̄. What is the probability that the population mean is within $m (recall that this is the margin of error) of the sample mean? The range within m of the sample mean is called a confidence interval and the resulting probability is called the confidence level. To answer the question above, first understand that the probability we seek is P(x̄ - m ≤ μ ≤ x̄ + m). Please prove the following and explain why we want this result: P(x̄ - m ≤ μ ≤ x̄ + m) = P(μ - m ≤ x̄ ≤ μ + m).
Furthermore, use pnorm(.) from R as the CDF of the standard normal distribution. Prove that
P(x̄ - m ≤ μ ≤ x̄ + m) = 2 * pnorm(m / (σ/√n)) - 1.
Let’s apply the above result. Given that the population standard deviation is $50, we randomly sampled 40 H&R Block retail customers and the mean price is $183. What is the probability that the population mean is within $5 of the sample mean?
Please copy your R code and the result and paste them here.
In practice, we typically have desirable confidence levels, with 90%, 95%, and 99% being the most commonly used ones. We would then like to find the corresponding margin of error and the resulting confidence interval. Once again, let $σ be the population standard deviation, which is known. We randomly sampled n H&R Block retail customers and the mean price is x̄. Suppose the confidence level we want is 1-α. (α is called the significance level, which we will use extensively later on. If the confidence level is, say, 90%, then the significance level is 10%, and vice versa.) What would be the margin of error that provides the confidence level of 1-α? And what would be the confidence interval that provides the confidence level of 1-α? To answer these questions, we are essentially looking for the value of m (margin of error) such that
P(x̄ - m ≤ μ ≤ x̄ + m) = 1 - α.
Recall that
P(x̄ - m ≤ μ ≤ x̄ + m) = 2 * pnorm(m / (σ/√n)) - 1. Let qnorm(.) from R be the inverse normal distribution function. Show that:
m = qnorm(1 - α/2) * σ/√n. (In many books, z_{α/2}, or some similar notation, usually called the critical z score, is used to denote the z score that corresponds to the confidence level of 1-α; i.e., z_{α/2} = qnorm(1 - α/2).)
Let’s apply the results above. We randomly sampled 100 H&R Block retail customers and the mean price is $183, assuming the population standard deviation is still $50. Construct a 90%, 95%, and 99% confidence interval of the population mean, respectively.
Please copy your R code and the result and paste them here.
Provide a practical interpretation of the above 90% confidence interval. What conclusions can you draw based on the 90%, 95%, and 99% confidence intervals you constructed above?
Now, assume the sample mean is still $183 and the population standard deviation is $50. Construct a 92% confidence interval for the sample sizes 36, 64, and 100, respectively. What conclusions can you draw based on the 92% confidence intervals you constructed for different sample sizes 36, 64, and 100?
Please copy your R code and the result and paste them here.
So far, we have assumed that the population standard deviation σ is known. In practice, this is usually not the case. When the population standard deviation is unknown, the best we can do is to replace it with the sample standard deviation, s. Just like the sample mean x̄ is a random variable, so is the sample standard deviation s. The replacement of σ with s adds more variability. Some adjustment to the Central Limit Theorem is thus necessary. It turns out that when the population standard deviation is unknown and the sample size n is sufficiently large, (x̄ - μ)/(s/√n) approximately follows a t distribution with n-1 degrees of freedom. As a result, when we look for a confidence interval with σ unknown, we will replace the normal distribution with the t distribution. Let t.inv be the inverse t distribution function. More specifically, the margin of error m can be given as m = t.inv(1 - α/2, n-1) * s/√n. Use this result to find a 92% confidence interval for the population mean price that the retail customers pay, given the sample mean is $183, the sample standard deviation is $50, and the sample size is 36. Comparing this result to that in question (o), you should notice that the margin of error is slightly larger when the population standard deviation is unknown.
Please copy your R code and the result and paste them here.
The population mean of the price is $183, and the population standard deviation of preparation fees is assumed to be $50. The mean x̄ of n observations is a random variable with mean μ = 183 and sd σ/√n = 50/√n.
According to the CLT, the distribution of x̄ tends to be normal with mean μ and sd σ/√n as n becomes very large.
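The claims about the mean and standard deviation of the sample mean can be checked by simulation. The following is only a rough sketch for n = 30: it assumes the fees are normally distributed (the problem does not state the shape of the population), and the seed and the number of replications are arbitrary choices.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
mu <- 183; sigma <- 50; n <- 30
reps <- 10000                    # number of simulated samples (arbitrary)

# Draw `reps` samples of size n and record each sample mean.
# Assumption: normal population; the problem does not specify this.
xbars <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

sim_mean <- mean(xbars)          # should be close to mu
sim_se   <- sd(xbars)            # should be close to sigma/sqrt(n)
theo_se  <- sigma / sqrt(n)      # theoretical standard error

print(c(sim_mean = sim_mean, sim_se = sim_se, theo_se = theo_se))
```

The simulated mean and standard deviation of the sample means should land near 183 and 50/sqrt(30), respectively.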
The probability P(|x̄ - μ| ≤ 8) is found using R as given below for n = 50.
n <- 50
smu <- 183                # mean of the sampling distribution of the sample mean
ssigma <- 50/sqrt(n)      # standard error
pnorm(smu + 8, mean = smu, sd = ssigma) - pnorm(smu - 8, mean = smu, sd = ssigma)
The probability P(|x̄ - μ| ≤ 8) is found using R as given below for n = 100.
n <- 100
smu <- 183                # mean of the sampling distribution of the sample mean
ssigma <- 50/sqrt(n)      # standard error
pnorm(smu + 8, mean = smu, sd = ssigma) - pnorm(smu - 8, mean = smu, sd = ssigma)
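The same calculation can be wrapped in a small function so that n = 30, 50, and 100 are handled together, and the sample-size question can be answered by solving m/(σ/√n) ≥ qnorm(0.975) for n. This is a sketch; the helper name prob_within is made up for illustration.

```r
# Probability that the sample mean is within m of the population mean,
# using the standardized form 2 * pnorm(m / (sigma/sqrt(n))) - 1.
# prob_within is a hypothetical helper name, not part of the assignment.
prob_within <- function(n, m = 8, sigma = 50) {
  2 * pnorm(m / (sigma / sqrt(n))) - 1
}

sizes <- c(30, 50, 100)
probs <- prob_within(sizes)
print(data.frame(n = sizes, probability = round(probs, 4)))

# Smallest n giving probability >= 0.95:
# m / (sigma/sqrt(n)) >= qnorm(0.975)  <=>  n >= (qnorm(0.975) * sigma / m)^2
n_min <- ceiling((qnorm(0.975) * 50 / 8)^2)
print(n_min)
print(prob_within(c(n_min - 1, n_min)))   # just below and just above 0.95
```

The probability rises with n because the standard error 50/√n shrinks, so the same $8 margin covers more of the sampling distribution.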
The probability P(x̄ - 5 ≤ μ ≤ x̄ + 5) for n = 40 is found using R.
n <- 40
smu <- 183                # observed sample mean
ssigma <- 50/sqrt(n)      # standard error
pnorm(smu + 5, mean = smu, sd = ssigma) - pnorm(smu - 5, mean = smu, sd = ssigma)
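As a check, the difference of two pnorm values used here should agree with the standardized form 2 * pnorm(m/(σ/√n)) - 1, which follows from the symmetry of the normal distribution. A minimal sketch comparing the two forms (the variable names are illustrative):

```r
sigma <- 50; n <- 40; m <- 5; xbar <- 183
se <- sigma / sqrt(n)                      # standard error

# Direct form: difference of two normal CDF values centred at xbar
p_direct <- pnorm(xbar + m, mean = xbar, sd = se) -
            pnorm(xbar - m, mean = xbar, sd = se)

# Standardized form: z = m / se, then 2 * pnorm(z) - 1 by symmetry
p_std <- 2 * pnorm(m / se) - 1

print(c(p_direct = p_direct, p_std = p_std))
```

Both forms depend only on m, σ, and n, not on the value of the sample mean itself.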
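The CI for the population mean is x̄ ± qnorm(1 - α/2) * σ/√n.
The R code below computes the 90% CI; setting alpha to 0.05 and 0.01 gives the 95% and 99% CIs.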
alpha <- 0.1
n <- 100
smu <- 183
ssigma <- 50/sqrt(n)
CI <- smu + c(1,-1)*qnorm(alpha/2)*ssigma
CI
The 90%, 95%, and 99% CIs are
90%: [1] 174.7757 191.2243
95%: [1] 173.2002 192.7998
99%: [1] 170.1209 195.8791
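The three intervals, and the 92% intervals for n = 36, 64, and 100 asked for in the problem, can be produced in one pass. The sketch below assumes a known σ of 50; the function name z_ci is made up for illustration.

```r
# z-based confidence interval for the population mean (sigma known).
# z_ci is a hypothetical helper name, not part of the assignment.
z_ci <- function(xbar, sigma, n, level) {
  alpha <- 1 - level
  m <- qnorm(1 - alpha / 2) * sigma / sqrt(n)   # margin of error
  c(lower = xbar - m, upper = xbar + m, margin = m)
}

# 90%, 95%, 99% intervals for n = 100
by_level <- t(sapply(c(0.90, 0.95, 0.99), function(lv) z_ci(183, 50, 100, lv)))
rownames(by_level) <- c("90%", "95%", "99%")
print(round(by_level, 4))

# 92% intervals for n = 36, 64, 100
by_n <- t(sapply(c(36, 64, 100), function(n) z_ci(183, 50, n, 0.92)))
rownames(by_n) <- paste0("n=", c(36, 64, 100))
print(round(by_n, 4))
```

A higher confidence level widens the interval, while a larger sample narrows it, since the margin of error is proportional to 1/√n.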
The other questions can be done easily.
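For instance, the last part (σ unknown, s = 50, n = 36, 92% confidence) could be approached as sketched below. R's qt() is used here in the role of the inverse t function that the problem calls t.inv; the z-based margin is computed only for comparison.

```r
xbar <- 183; s <- 50; n <- 36; level <- 0.92
alpha <- 1 - level
se <- s / sqrt(n)                              # estimated standard error

# qt() stands in for the inverse t function (t.inv) of the problem statement
m_t <- qt(1 - alpha / 2, df = n - 1) * se      # t-based margin of error
m_z <- qnorm(1 - alpha / 2) * se               # z-based margin, for comparison

ci_t <- c(lower = xbar - m_t, upper = xbar + m_t)
print(round(ci_t, 4))
print(c(margin_t = m_t, margin_z = m_z))
```

The t-based margin comes out larger than the z-based one because the t distribution has heavier tails than the normal distribution.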
We are required to do only 4 parts. I have done more than that.