We are interested in estimating the proportion of graduates from Lancaster University who found a job within one year of completing their undergraduate degree. Suppose we conduct a survey and find out that 354 of the 400 randomly sampled graduates found jobs. The number of students graduating that year was over 4000.
(a) State the central limit theorem.
(b) Why is the central limit theorem useful?
(c) What is the population parameter of interest? What is the point estimate of this parameter?
(d) What are the assumptions for constructing a confidence interval based on these data? Are they met?
(e) Calculate a 95% confidence interval for the proportion of graduates who found a job within one year of completing their undergraduate degree. Interpret this within the context of the data.
(f) Without doing any calculations, describe what would happen to the confidence interval if we decided to use a higher confidence level, e.g., 99%.
(g) Without doing any calculations, describe what would happen to the confidence interval if we used a larger sample.
Solution
Part (a)
Central Limit Theorem
Let X1, X2, …, Xn be n independent and identically distributed (i.i.d.) random variables, i.e., a random sample of size n, drawn from a distribution with mean µ and finite variance σ². Then, as n gets larger, the distribution of Z = √n(Xbar − µ)/σ approaches the normal distribution with mean 0 and variance 1 (the standard normal distribution).
Symbolically, for large n: Z = √n(Xbar − µ)/σ ~ N(0, 1), approximately. (1a)
Equivalently, the sample average from any distribution with mean µ and finite variance σ² is approximately normally distributed with mean µ and variance σ²/n when the sample size n is large enough. A common rule of thumb is n of 30 or more, which works best when the population distribution is fairly symmetric; strongly skewed distributions need larger samples. (1b)
In the current scenario, the sample proportion phat is the average of n 0/1 indicators, so its distribution can be approximated by a normal distribution with mean E(phat) = p and standard deviation SE(phat) = √{p(1 − p)/n}. (1c) Answer 1
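The approximation in (1c) can be checked with a small simulation. The sketch below is illustrative only: it assumes, purely for the sake of the example, that the true proportion equals 0.885, and the function name, seed, and number of replications are arbitrary choices that do not come from the problem.

```python
import random
import statistics
from math import sqrt


def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` samples of size n from a 0/1 population with success rate p
    and return the sample proportion of each sample."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


p_true = 0.885  # assumed true proportion (illustration only)
n = 400         # sample size from the problem

props = simulate_sample_proportions(p_true, n, reps=2000)
sim_mean = statistics.fmean(props)        # should be close to p
sim_sd = statistics.stdev(props)          # should be close to SE(phat)
theory_se = sqrt(p_true * (1 - p_true) / n)

print(f"simulated mean of phat: {sim_mean:.4f} (theory: {p_true})")
print(f"simulated sd of phat:   {sim_sd:.4f} (theory: {theory_se:.4f})")
```

If the CLT approximation is reasonable, the simulated mean and standard deviation should sit close to p and √{p(1 − p)/n}, and a histogram of `props` should look roughly bell-shaped.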
Part (b)
The Central Limit Theorem is useful because in many practical situations the population distribution is unknown, or is too complicated to handle analytically. In such situations the CLT lets us base inference about a mean or a proportion on the normal distribution, whose probabilities are thoroughly tabulated and easy to compute, regardless of the shape of the population distribution. Answer 2
Part (c)
1. Population parameter of interest is the population proportion, i.e., contextually, the true proportion of graduates who found a job within one year of completing their undergraduate degree. Answer 3
2. Point estimate of this parameter is the sample proportion, phat = 354/400 = 0.885. Answer 4
Part (d)
Assumptions for constructing a confidence interval based on these data
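The CI in this case is based on the normal approximation, which requires (1) independent observations, i.e., a random sample that, being drawn without replacement, makes up no more than 10% of the population, and (2) a sample large enough for both n·phat and n(1 − phat) to be 10 or more. Answer 5
In the given situation, the 400 graduates were randomly sampled and make up less than 10% of the more than 4000 graduates. With n = 400 and phat = 354/400 = 0.885, n·phat = 354 and n(1 − phat) = 46, both of which are at least 10. So, the conditions are met. Answer 6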
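The checks can be written out as a short sketch. The function name and the returned keys are hypothetical. The threshold of 10 follows the solution above. The 10% rule is the usual textbook check for independence when sampling without replacement, and the population size is set to 4000 as a lower bound, since the problem only says "over 4000".

```python
def check_ci_conditions(successes, n, population_size, threshold=10):
    """Return the success-failure and 10% checks for a proportion CI."""
    failures = n - successes
    return {
        "successes_ok": successes >= threshold,         # n * phat >= 10
        "failures_ok": failures >= threshold,           # n * (1 - phat) >= 10
        "ten_percent_ok": 10 * n <= population_size,    # n is at most 10% of N
    }


# 354 of 400 sampled graduates found jobs; population assumed to be at least 4000
conditions = check_ci_conditions(successes=354, n=400, population_size=4000)
all_met = all(conditions.values())
print(conditions, all_met)
```

Because the actual population is larger than 4000, the 10% check holds with room to spare.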
Part (e)
The 100(1 − α)% confidence interval for the population proportion p is: phat ± MoE, (2)
where
MoE = Zα/2 × √{phat(1 − phat)/n} (2a)
with
Zα/2 = the upper 100(α/2)% point of N(0, 1),
phat = sample proportion, and
n = sample size.
So, 95% confidence interval for the proportion of graduates who found a job within one year of completing their undergraduate degree is: [0.85, 0.92] Answer 7
Details of calculations
Quantity | Value
n | 400
X | 354
phat = X/n | 0.885
F = phat(1 − phat)/n | 0.000254
√F = SE(phat) | 0.015951
α | 0.05
1 − (α/2) | 0.975
Zα/2 | 1.959964
MoE = Zα/2 × √F | 0.031264
LB = phat − MoE | 0.853736
UB = phat + MoE | 0.916264
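The calculation can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `proportion_ci` is a hypothetical helper, not part of the original solution.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion: phat ± MoE."""
    p_hat = successes / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Zα/2, about 1.96 for 95%
    se = sqrt(p_hat * (1 - p_hat) / n)        # SE(phat)
    moe = z * se                              # margin of error, as in (2a)
    return p_hat - moe, p_hat + moe


lower, upper = proportion_ci(successes=354, n=400)
print(f"95% CI: [{lower:.4f}, {upper:.4f}]")  # prints 95% CI: [0.8537, 0.9163]
```

Passing a different `confidence` value, e.g. 0.99, changes only the critical value Zα/2.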
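Contextual Interpretation: We are 95% confident that the true proportion of graduates who found a job within one year of completing their undergraduate degree lies between 85.4% and 91.6%. In other words, if the survey were repeated many times, about 95% of the intervals constructed this way would contain the true proportion. Answer 8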
Part (f)
When the confidence level increases, only the MoE in (2) changes, and by (2a) that change comes only through the percentage point Zα/2, which increases as the confidence level increases.
Thus, the width of the CI will increase. Answer 9
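Although the question asks for no calculations, the claim about Zα/2 is easy to verify numerically. The sketch below uses a hypothetical helper, `critical_value`, to compare the percentage points at the two confidence levels.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Upper 100(α/2)% point of N(0, 1) for a two-sided interval."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


z95 = critical_value(0.95)
z99 = critical_value(0.99)
print(f"Z for 95%: {z95:.4f}, Z for 99%: {z99:.4f}")
```

Since phat and n are unchanged, the interval widens in direct proportion to the ratio of the two critical values.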
Part (g)
As the sample size increases, the SE decreases, since n is in the denominator of (2a), and the MoE shrinks with it.
Thus, the CI becomes narrower. Answer 10
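A quick numerical check of this effect, as a sketch only: it assumes the sample proportion stays at 0.885 for the larger sample, which is an illustrative assumption, and the sample size of 1600 is an arbitrary choice.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence=0.95):
    """MoE = Zα/2 * sqrt(phat * (1 - phat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


moe_400 = margin_of_error(0.885, 400)     # original sample size
moe_1600 = margin_of_error(0.885, 1600)   # hypothetical sample four times larger
print(f"MoE at n=400: {moe_400:.4f}, MoE at n=1600: {moe_1600:.4f}")
```

Because the MoE is proportional to 1/√n, quadrupling the sample size halves the margin of error.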