The first problem of this assignment considers a situation where the random variable in question is a sample mean. This exercise addresses the situation where the random variable in question is a proportion.

Suppose you have been hired by the Better Business Bureau (BBB) to investigate the settlement ratio of the complaints they have received. You plan to select a sample of n complaints to estimate the proportion of complaints the BBB is able to settle. We use p to denote the proportion of complaints settled among all the complaints that the BBB has received. Let Y be the random variable that indicates whether a complaint is settled. Without loss of generality, let Y be 1 if a complaint is settled, which happens with probability p, and 0 if it is not settled.

Unlike in problem 1, where we don’t know the probability distribution of the population random variable X (the amount a retail customer pays for H&R Block’s service), here we do know the probability distribution of Y. What probability distribution does Y follow? Compute its mean and standard deviation.

Now suppose you select a random sample of n complaints and find that a proportion p̄ of them have been settled (not surprisingly, p̄ is called the sample proportion). Assume the sample size n is sufficiently large. What do we know about the probability distribution of p̄ (the sampling distribution of the sample proportion)?

Let’s apply the results above and derive some confidence intervals. Note that the population proportion p is unknown. In order to compute the standard error σ_p̄, we substitute p̄ for p. As long as the sample size n is sufficiently large, a normal distribution approximates the sampling distribution of the sample proportion p̄ well enough. Suppose the sample proportion you’ve found is 0.6. Find a 95% confidence interval for the population proportion if the sample size is 36, 100, and 400, respectively. What effect does the sample size n have on the resulting confidence interval?
Please copy your R code and the result and paste them here.

It is often the case that we have a target for the margin of error in mind and we want to know the sample size needed to guarantee such a margin of error when the confidence level is given. Suppose that the margin of error is m and the confidence level is 1-α, and norm.s.inv is the inverse standard normal distribution function. Derive a formula for computing the sample size needed. You can use the R function qnorm.

In the above formula, you will probably need the population proportion p. As we know, p is unknown. You may consider using the sample proportion p̄ instead. But typically when we are deciding the sample size, we haven’t started the sampling process, and thus the sample proportion p̄ is also unavailable. Thus most people use p=0.5 instead. Provide the revised formula for the sample size needed given p=0.5 and explain why it is reasonable to do so.

Use the above formula to compute the sample sizes needed when the respective value of m is 1%, 3%, and 5% and the respective confidence level is 90%, 95%, and 99%. You may fill out the table below and round your answers up to an integer.

        m = 1%    m = 3%    m = 5%
90%
95%
99%

Historically, the Better Business Bureau settled 75% of complaints they received. Suppose you have been hired by the Better Business Bureau to investigate the complaints they received involving new car dealers because the bureau thinks that the settlement ratio of complaints involving new car dealers is significantly different from 75%. You plan to conduct a hypothesis test. You select a sample of 450 new car dealer complaints and find that 70% of them have been settled. What would be the null and the alternative hypotheses you test? Compute the test statistic for your test above.

Suppose the significance level α is 5%. Compute both critical values of the z test statistic, and explain how we can use these two critical values to draw a conclusion to the hypothesis test.
(This is called the critical value approach for hypothesis testing.) Please copy your R code and the result and paste them here. What conclusion should you draw for your test? Provide a practical interpretation of this conclusion.

Suppose the significance level α is 5%. Compute the p value for your test, and explain how we can use the p value to draw a conclusion to the hypothesis test. What conclusion should you draw for your test? (This is called the p-value approach for hypothesis testing.) Please copy your R code and the result and paste them here.

The hypothesis test above is a two-tailed test. Now, let’s consider a one-tailed test. There are two types of one-tailed test: the upper-tail test and the lower-tail test. To determine whether a test is upper- or lower-tail, simply look at the sign of the alternative hypothesis. If it is of the “less than” type, then this is a lower-tail test; if it is of the “greater than” type, then this is an upper-tail test.

Let’s reuse our BBB example. Historically, the Better Business Bureau settled 75% of complaints they received. Suppose you have been hired by the Better Business Bureau to investigate the complaints they received involving new car dealers because the bureau thinks that the settlement ratio of complaints involving new car dealers is significantly lower than 75%. You plan to conduct a hypothesis test. You select a sample of 450 new car dealer complaints and find that 70% of them have been settled. What would be the null and the alternative hypotheses you test? Your test statistic remains the same. Suppose the significance level α is 1%. Compute the critical value of the z test statistic. Compute the p value for your test. What conclusion should you draw for your test? Please copy your R code and the result and paste them here.
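Here Y is a Bernoulli random variable with PMF $P(Y = y) = p^{y}(1-p)^{1-y}$ for $y \in \{0, 1\}$.
The mean of a Bernoulli random variable is $E[Y] = p$, and its standard deviation is $\sqrt{p(1-p)}$.
The mean and sd of the sample proportion are $E[\bar{p}] = p$ and $\sigma_{\bar{p}} = \sqrt{p(1-p)/n}$; for sufficiently large n, $\bar{p}$ is approximately normally distributed.
The CI for the population proportion is $\bar{p} \pm z_{\alpha/2}\sqrt{\bar{p}(1-\bar{p})/n}$.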
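The normal approximation behind this interval rests on the sample proportion having mean p and standard deviation sqrt(p(1-p)/n). A minimal simulation sketch can make this concrete. The values of `p_true`, `n`, and `reps` below are illustrative assumptions, not part of the assignment.

```r
# Simulate the sampling distribution of the sample proportion.
# p_true, n, and reps are illustrative values chosen for this sketch.
set.seed(1)
p_true <- 0.6
n <- 100
reps <- 10000

# Each replicate: number of settled complaints out of n, divided by n
p_bar <- rbinom(reps, size = n, prob = p_true) / n

sim_mean <- mean(p_bar)                        # should be close to p_true
sim_sd <- sd(p_bar)                            # should be close to theory_sd
theory_sd <- sqrt(p_true * (1 - p_true) / n)   # sqrt(p(1-p)/n)

c(sim_mean = sim_mean, sim_sd = sim_sd, theory_sd = theory_sd)
```

Calling `hist(p_bar)` on the simulated values should show a roughly bell-shaped histogram centered near `p_true`.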
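When $\bar{p} = 0.6$ and the confidence level is 95%, the R code for $n = 36$ is given below (change n to 100 and 400 for the other two cases).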
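n <- 36        # sample size
p <- 0.6       # sample proportion
alpha <- 0.05  # 1 - confidence level
# qnorm(alpha/2) is negative, so c(1,-1) returns (lower, upper)
p + qnorm(alpha/2)*c(1,-1)*sqrt(p*(1-p)/n)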
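The outputs (CIs) for n = 36, 100, and 400 are, respectively,
0.4399696, 0.7600304
0.5039818, 0.6960182
0.5519909, 0.6480091
The interval narrows as n grows: its width is proportional to $1/\sqrt{n}$, so quadrupling the sample size from 100 to 400 halves the width.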
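The three intervals can also be computed in one pass, which makes the effect of n easy to see. This is a sketch under the same normal approximation. `wald_ci` is a hypothetical helper name introduced only for this illustration.

```r
# Normal-approximation (Wald) confidence interval for a proportion.
# wald_ci is a hypothetical helper name, not a built-in R function.
wald_ci <- function(p_bar, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)        # upper critical value, e.g. 1.96
  se <- sqrt(p_bar * (1 - p_bar) / n)   # estimated standard error
  c(lower = p_bar - z * se, upper = p_bar + z * se)
}

sizes <- c(36, 100, 400)
ci <- t(sapply(sizes, function(n) wald_ci(0.6, n)))  # one row per sample size
rownames(ci) <- paste0("n=", sizes)
ci_width <- ci[, "upper"] - ci[, "lower"]
cbind(ci, width = ci_width)
```

The `width` column shrinks as n increases, in line with the 1/sqrt(n) factor in the standard error.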
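The margin of error is $m = z_{\alpha/2}\sqrt{p(1-p)/n}$. Solving for n, the sample size is $n = z_{\alpha/2}^{2}\,p(1-p)/m^{2}$, where $z_{\alpha/2}$ can be computed in R as qnorm(1 - alpha/2). With $p = 0.5$ this becomes $n = z_{\alpha/2}^{2}/(4m^{2})$. Using 0.5 is reasonable because $p(1-p)$ is largest at $p = 0.5$, so the resulting n is large enough whatever the true p is.
The R code for finding the sample size for various values of m and α is given below.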
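p <- 0.5       # conservative choice: maximizes p*(1-p)
alpha <- 0.05  # 95% confidence level
m <- 0.01      # target margin of error
n <- (qnorm(alpha/2))^2*p*(1-p)/m^2
n
9603.647
Rounding up, the sample size needed for m = 1% at the 95% confidence level is 9604.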
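Rather than rerunning the snippet nine times, the whole table can be filled in at once. The sketch below assumes the conservative formula n = z^2 / (4 m^2), that is p = 0.5, and rounds every entry up. The variable names are illustrative.

```r
# Sample sizes for each confidence level (rows) and margin of error (columns),
# using the conservative choice p = 0.5, i.e. n = z^2 / (4 * m^2).
conf_levels <- c(0.90, 0.95, 0.99)
margins <- c(0.01, 0.03, 0.05)

z <- qnorm(1 - (1 - conf_levels) / 2)   # critical value per confidence level

# outer() forms every (confidence level, margin) combination; ceiling rounds up
n_table <- ceiling(outer(z^2 / 4, 1 / margins^2))
dimnames(n_table) <- list(c("90%", "95%", "99%"),
                          c("m = 1%", "m = 3%", "m = 5%"))
n_table
```

The entry for 95% and m = 1% agrees with the value 9603.647 computed above once it is rounded up.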
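The test hypotheses are $H_0: p = 0.75$ versus $H_a: p \neq 0.75$.
The CI is $\bar{p} \pm z_{\alpha/2}\sqrt{\bar{p}(1-\bar{p})/n}$.
The 95% CI is computed with the R code below.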
n <- 450
p <- 0.7
alpha <- 0.05
p + qnorm(alpha/2)*c(1,-1)*sqrt(p*(1-p)/n)
The resulting CI is approximately (0.658, 0.742). Since the entire CI lies below 0.75, we reject the null hypothesis at the 5% significance level.
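The test statistic is $z = (\bar{p} - p_0)/\sqrt{p_0(1-p_0)/n}$, where $p_0 = 0.75$ is the proportion under the null hypothesis.
R code for computing the test statistic is below.
n <- 450
p <- 0.7       # sample proportion
p0 <- 0.75     # proportion under the null hypothesis
alpha <- 0.05
z <- (p - p0)/sqrt(p0*(1-p0)/n)
z
The output is
-2.44949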
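The question also asks for the critical-value and p-value approaches, for both the two-tailed test at α = 5% and the lower-tail test at α = 1%. The sketch below assumes the usual z statistic for a proportion, with the null value p0 = 0.75 in the standard error. `prop_z_test` is a hypothetical helper name introduced only for this illustration.

```r
# One-sample z test for a proportion (normal approximation).
# prop_z_test is a hypothetical helper name, not a built-in R function.
prop_z_test <- function(p_bar, p0, n, alpha,
                        alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z <- (p_bar - p0) / sqrt(p0 * (1 - p0) / n)   # SE uses the null value p0
  if (alternative == "two.sided") {
    crit <- qnorm(c(alpha / 2, 1 - alpha / 2))  # lower and upper critical values
    p_value <- 2 * pnorm(-abs(z))
    reject <- z < crit[1] || z > crit[2]
  } else if (alternative == "less") {
    crit <- qnorm(alpha)                        # single lower critical value
    p_value <- pnorm(z)
    reject <- z < crit
  } else {
    crit <- qnorm(1 - alpha)                    # single upper critical value
    p_value <- pnorm(z, lower.tail = FALSE)
    reject <- z > crit
  }
  list(z = z, critical = crit, p_value = p_value, reject = reject)
}

# Two-tailed test at alpha = 5%, then lower-tail test at alpha = 1%
two_tailed <- prop_z_test(0.70, 0.75, 450, alpha = 0.05)
lower_tail <- prop_z_test(0.70, 0.75, 450, alpha = 0.01, alternative = "less")
str(two_tailed)
str(lower_tail)
```

Under the critical-value approach we reject when z falls outside the critical values; under the p-value approach we reject when the p value is below α. With these data both tests reject the null hypothesis (the p values are about 0.014 and 0.007), so the sample supports the claim that the settlement ratio for new car dealer complaints differs from, and is lower than, 75%.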