In: Statistics and Probability
Problem: Proportion of "Cured" Cancer Patients: How Does Canada Compare with Europe?
Lung cancer remains the leading cause of cancer death for both Canadian men and women, responsible for the most potential years of life lost to cancer. Lung cancer alone accounts for 28% of all cancer deaths in Canada (32% in Quebec). Most forms of lung cancer start insidiously and produce no apparent symptoms until they are too far advanced. Consequently, the chances of being cured of lung cancer are not very promising, with the five-year survival rate being less than 15%. The overall data for Europe show that the number of patients who are considered "cured" is rising steadily. For lung cancer, this proportion rose from 6% to 8%. However, there was a wide variation in the proportion of patients cured in individual European countries. For instance, the study shows that for lung cancer, less than 5% of patients were cured in Denmark, the Czech Republic, and Poland, whereas more than 10% of patients were cured in Spain.
2. Suppose two independent samples were taken. The following data were recorded:
Quebec: n1 = 150, number of deaths due to cancer = x1 = 47
Rest of Canada: n2 = 1000, number of deaths due to cancer = x2 = 291
(a) Suppose the scientists have no preconceived theory concerning which proportion parameter is the larger and they wish to detect only a difference between the two parameters, if it exists. What should they choose as the null and alternative hypotheses for a statistical test?
(b) What type of error could occur in testing the null hypothesis in (a), if Ho is false?
(c) Calculate the standard error of the difference between the two sample proportions. Make sure to use the pooled estimate for the common value of the true proportion.
(d) Calculate the test statistic that you would use for the test in (a). Based on your knowledge of the standard normal distribution, is this a likely or unlikely observation, assuming that Ho is true and the two population proportions are the same?
(e) Find the p-value for the test. Test for a significant difference between the population proportions at the 1% significance level.
(f) Find the rejection region when α = 0.01. Using the critical value approach, determine whether the data provide sufficient evidence to indicate a difference between the population proportions.
(g) Use a 95% confidence interval to estimate the actual difference between the cancer death proportions for the people in Quebec versus the rest of Canada. Summarize your findings.
2.
Given that,
sample one, x1 =47, n1 =150, p1= x1/n1=0.313
sample two, x2 =291, n2 =1000, p2= x2/n2=0.291
null, Ho: p1 = p2
alternate, H1: p1 != p2
level of significance, α = 0.01
from standard normal table, two tailed z α/2 =2.576
since our test is two-tailed
reject Ho, if zo < -2.576 OR if zo > 2.576
we use the test statistic z = (p1-p2)/sqrt( p*q*(1/n1+1/n2) )
where p = (x1+x2)/(n1+n2) = 0.294 is the pooled proportion and q = 1-p = 0.706
zo = (0.313-0.291)/sqrt( 0.294*0.706*(1/150+1/1000) )
zo =0.56
| zo | =0.56
critical value
the value of |z α/2| at the 0.01 level of significance is 2.576
we got |zo| = 0.56 and |z α/2| = 2.576
make decision
since |zo| < |z α/2|, we do not reject Ho
p-value: two-tailed (double the one-tail area) = 2 * P( Z > 0.5599 )
= 0.5755
since α = 0.01 < p-value = 0.5755, we do not reject Ho
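The calculation above can be checked numerically. The following is a minimal sketch in Python using only the standard library; the function name two_proportion_z_test and its return layout are illustrative assumptions, not part of the original solution.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alpha=0.01):
    """Two-tailed z test for Ho: p1 = p2 using the pooled proportion.

    Hypothetical helper written for this example only.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)            # common proportion under Ho
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    std_normal = NormalDist()                  # mean 0, standard deviation 1
    p_value = 2 * (1 - std_normal.cdf(abs(z)))  # double the one-tail area
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 2.576 for alpha = 0.01
    reject = abs(z) > z_crit
    return z, p_value, z_crit, reject


z, p_value, z_crit, reject = two_proportion_z_test(47, 150, 291, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, critical value = {z_crit:.3f}")
print("reject Ho" if reject else "do not reject Ho")
```

With the data from the problem, the sketch should reproduce the hand calculation up to rounding: a z value near 0.56, a p-value near 0.5755, and no rejection of Ho at the 1% level.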
ANSWERS
---------------
a.
wish to detect only a difference between the two parameters,
null, Ho: p1 = p2
alternate, H1: p1 != p2
b.
A Type II error is possible: if Ho is false, the only mistake the test
can make is failing to reject it.
c.
pooled estimate for the common value of the true proportion
p = (n1*p1 + n2*p2)/(n1+n2) = (47+291)/(1150) = 0.294
standard error = sqrt( p * (1-p) * (1/n1 + 1/n2) )
where
p = pooled proportion
n1, n2 = sample sizes
standard error = sqrt( 0.294 * 0.706 * (1/150 + 1/1000) )
=0.040
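Two different standard errors appear in this solution: the pooled one, which assumes Ho: p1 = p2 and is used for the test, and the unpooled one, which is used for the confidence interval in part g. The sketch below computes both so they can be compared; the function names pooled_se and unpooled_se are illustrative assumptions.

```python
from math import sqrt


def pooled_se(x1, n1, x2, n2):
    """Standard error of p1 - p2 assuming a common proportion (used under Ho)."""
    p = (x1 + x2) / (n1 + n2)
    return sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


def unpooled_se(x1, n1, x2, n2):
    """Standard error of p1 - p2 with separate proportions (used for the CI)."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


se_pooled = pooled_se(47, 150, 291, 1000)
se_unpooled = unpooled_se(47, 150, 291, 1000)
print(f"pooled SE = {se_pooled:.4f}, unpooled SE = {se_unpooled:.4f}")
```

For these data the two values are close (roughly 0.040 and 0.041), because the two sample proportions are themselves close.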
d.
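test statistic: 0.56
a z value this close to 0 is a likely observation when Ho is true
and the two population proportions are the same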
critical value: -2.576 , 2.576
decision: do not reject Ho
e.
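p-value: 0.5755
since 0.5755 > 0.01, we do not reject Ho; the difference between the
population proportions is not significant at the 1% level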
f.
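rejection region: z < -2.576 or z > 2.576
since z = 0.56 does not fall in the rejection region, we do not reject Ho
we do not have enough evidence to support the claim that there is a
difference between the cancer death proportions for the people in
Quebec versus the rest of Canada.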
g.
TRADITIONAL METHOD
given that,
sample one, x1 =47, n1 =150, p1= x1/n1=0.313
sample two, x2 =291, n2 =1000, p2= x2/n2=0.291
I.
standard error = sqrt( p1 * (1-p1)/n1 + p2 * (1-p2)/n2 )
where
p1, p2 = proportion of both sample observation
n1, n2 = sample size
standard error = sqrt( (0.313*0.687/150) +(0.291 *
0.709/1000))
=0.041
II.
margin of error = Z a/2 * (standard error)
where,
Za/2 = Z-table value
level of significance, α = 0.05
from standard normal table, two tailed z α/2 =1.96
margin of error = 1.96 * 0.041
=0.079
III.
CI = (p1-p2) ± margin of error
confidence interval = [ (0.313-0.291) ±0.079]
= [ -0.057 , 0.102]
-----------------------------------------------------------------------------------------------
DIRECT METHOD
given that,
sample one, x1 =47, n1 =150, p1= x1/n1=0.313
sample two, x2 =291, n2 =1000, p2= x2/n2=0.291
CI = (p1-p2) ± Z a/2 * sqrt( p1 * (1-p1)/n1 + p2 * (1-p2)/n2 )
where,
p1, p2 = proportion of both sample observation
n1,n2 = size of both group
a = 1 - (confidence Level/100)
Za/2 = Z-table value
CI = confidence interval
CI = [ (0.313-0.291) ± 1.96 * 0.041]
= [ -0.057 , 0.102 ]
-----------------------------------------------------------------------------------------------
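The interval can be reproduced with a short Python sketch that uses the unpooled standard error, since the confidence interval does not assume p1 = p2. The function name diff_proportion_ci is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 based on the normal approximation.

    Hypothetical helper written for this example only.
    """
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled SE
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z * se
    diff = p1 - p2
    return diff - margin, diff + margin


low, high = diff_proportion_ci(47, 150, 291, 1000)
print(f"95% CI for p1 - p2: [{low:.3f}, {high:.3f}]")
```

Up to rounding, this should match the interval [ -0.057 , 0.102 ] obtained by hand. The interval contains 0.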
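interpretations:
1) we are 95% confident that the interval [ -0.057 , 0.102 ] contains
the difference between the true population proportions P1-P2
2) if a large number of samples are collected, and a confidence
interval is created for each sample, about 95% of these intervals
will contain the difference between the true population
proportions P1-P2
3) because the interval contains 0, the data do not indicate a
difference between the cancer death proportions in Quebec and the
rest of Canada, which is consistent with the test not rejecting Ho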