Question: Do these data represent compelling evidence that the proportion of unemployed 18-29 year-olds that have diabetes is different from the proportion of employed 18-29 year-olds that have diabetes?
[Counts] | Unemployed | Employed | Total
Diabetes | 146 | 717 | 863
No Diabetes | 5709 | 47057 | 57766
Total | 5855 | 47774 | 58629
(a) In the notation we are using in this course, what is the value of n2?
(b) In the notation we are using in this course, what is the value of p̂1?
(c) In the notation we are using in this course, what is the value of p̂2?
(d) Verify that the conditions are met for you to test H0 using a confidence interval approach.
(e) Test H0 at the 0.001 significance level, using a confidence interval approach. Conclude with a clear verdict as to whether you reject the null.
(f) Making reference to your work so far in this problem, comment on the distinction between a difference being “significant” and a difference being “large.”
a) n2 = 47774
b)
n1 = 5855
x1 = 146
p̂1 = x1/n1 = 0.0249
c)
n2 = 47774
x2 = 717
p̂2 = x2/n2 = 0.0150
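The two sample proportions above can be reproduced directly from the table counts. The following is a minimal sketch; the variable names (`p_hat1`, `p_hat2`) are illustrative choices, not notation from the course.

```python
# Counts taken from the table in the question.
x1, n1 = 146, 5855     # unemployed: number with diabetes, group total
x2, n2 = 717, 47774    # employed: number with diabetes, group total

# Sample proportion of each group that has diabetes.
p_hat1 = x1 / n1
p_hat2 = x2 / n2

print(f"p_hat1 = {p_hat1:.4f}")  # p_hat1 = 0.0249
print(f"p_hat2 = {p_hat2:.4f}")  # p_hat2 = 0.0150
```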
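d) Conditions:
The data in each group come from a random sample, and the two groups are independent of each other.
Each sample is sufficiently large: the counts of successes and failures in both groups (146, 5709, 717, and 47057) are all well above the usual minimum of 10.
Each sample size, n, should be no more than 10% of its population.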
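The sample-size condition can be checked mechanically from the four cell counts in the table. This sketch assumes the common rule of thumb that every success and failure count should be at least 10; the threshold used in a particular course may differ.

```python
# Cell counts from the table in the question.
counts = {
    "unemployed, diabetes": 146,
    "unemployed, no diabetes": 5709,
    "employed, diabetes": 717,
    "employed, no diabetes": 47057,
}

# Assumed rule of thumb: each count should be at least 10.
threshold = 10

for label, count in counts.items():
    print(f"{label}: {count} (ok: {count >= threshold})")

conditions_met = all(count >= threshold for count in counts.values())
print("Sample-size condition met:", conditions_met)
```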
e)
99.9% Confidence interval for the difference:
At α = 0.001, two-tailed critical value, z_c = NORM.S.INV(1 - 0.001/2) = 3.291
Lower Bound = (p̂1 - p̂2) - z_c*√ [(p̂1*(1-p̂1)/n1)+(p̂2*(1-p̂2)/n2) ] = (0.0249 - 0.015) - 3.291*√[(0.0249*0.9751/5855) + (0.015*0.985/47774)] = 0.0030
Upper Bound = (p̂1 - p̂2) + z_c*√ [(p̂1*(1-p̂1)/n1)+(p̂2*(1-p̂2)/n2) ] = (0.0249 - 0.015) + 3.291*√[(0.0249*0.9751/5855) + (0.015*0.985/47774)] = 0.0169
0.0030 < p1 - p2 < 0.0169
Because the confidence interval does not contain 0, we reject the null hypothesis H0: p1 = p2 at the 0.001 significance level.
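As a cross-check on the arithmetic in part (e), the interval can be recomputed with the Python standard library. This is a sketch under the same unpooled standard-error formula used above; the function name `two_proportion_ci` is a hypothetical helper, and `statistics.NormalDist().inv_cdf` plays the role of NORM.S.INV.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, alpha):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Two-tailed critical value: the upper alpha/2 point of the standard normal.
    z_c = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_c * se, diff + z_c * se


lower, upper = two_proportion_ci(146, 5855, 717, 47774, alpha=0.001)
print(f"{lower:.4f} < p1 - p2 < {upper:.4f}")

# H0: p1 = p2 is rejected when 0 lies outside the interval.
reject_null = not (lower <= 0 <= upper)
print("Reject H0:", reject_null)
```

Both bounds are positive, so the check agrees with the verdict above. The interval also shows that the difference is on the order of one percentage point, which is the quantity part (f) asks you to weigh against the word "large."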