Researchers studying environmental exposures and their potential associations with lung cancer have conducted a study specifically investigating the relationship between exposure to asbestos and lung cancer. They retrospectively collected the following data.
Exposure | Lung Cancer | No Lung Cancer | Total |
Asbestos Exposure | 153 | 201 | 354 |
No Asbestos Exposure | 23 | 664 | 687 |
Total | 176 | 865 | 1041 |
a) Perform a hypothesis test to determine if there is a relationship between asbestos exposure and lung cancer. Please use the p-value approach and use α=0.05. (8 pts)
b) Ignoring whether or not these people had exposure to asbestos, construct and interpret the 95% confidence interval for the proportion of people with lung cancer. (4 pts)
a) Null hypothesis: There is no association between asbestos exposure and lung cancer.
Alternative hypothesis: There is an association between asbestos exposure and lung cancer.
Observed frequency table :
Exposure | Lung Cancer | No Lung Cancer | Total |
Asbestos exposure | 153 (a) | 201 (b) | 354 |
No Asbestos exposure | 23 (c) | 664 (d) | 687 |
Total | 176 | 865 | N = 1041 |
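Expected frequency table :
[Ei = (row total * column total) / overall total]
Exposure | Lung Cancer | No Lung Cancer | Total |
Asbestos exposure | (354 * 176)/1041 = 59.85 ≈ 60 | (354 * 865)/1041 = 294.15 ≈ 294 | 354 |
No Asbestos exposure | (687 * 176)/1041 = 116.15 ≈ 116 | (687 * 865)/1041 = 570.85 ≈ 571 | 687 |
Total | 176 | 865 | 1041 |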
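The expected frequencies can also be checked with a few lines of code. The following is a minimal Python sketch, assuming the observed counts from the table above; the names `observed` and `expected_counts` are illustrative and not part of the original solution.

```python
# Observed 2x2 counts from the study.
# Rows: asbestos exposure, no asbestos exposure.
# Columns: lung cancer, no lung cancer.
observed = [[153, 201],
            [23, 664]]


def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    # Print each row of expected counts rounded to two decimals.
    print([round(e, 2) for e in row])
```

The unrounded values (about 59.85, 294.15, 116.15 and 570.85) are the ones to carry into the test statistic; the whole-number versions are only for display.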
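Test statistic is given by -
χ² = Σ (Oi - Ei)² / Ei
(each contribution below is computed from the unrounded expected frequency)
Oi | Ei | (Oi - Ei)²/Ei |
153 | 59.85 | 145.0 |
201 | 294.15 | 29.5 |
23 | 116.15 | 74.7 |
664 | 570.85 | 15.2 |
So, the value of the test statistic will be -
χ² = 145.0 + 29.5 + 74.7 + 15.2 = 264.4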
Degrees of freedom = (no. of rows - 1) * (no. of columns - 1) = 1 * 1 = 1
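As a cross-check on the arithmetic, here is a short Python sketch that computes the Pearson chi-square statistic (without continuity correction) in two ways: cell by cell, and with the standard 2x2 shortcut N(ad - bc)² / (row and column totals multiplied together). The function names are hypothetical; a, b, c, d follow the cell labels in the observed frequency table.

```python
def chi_square_cellwise(a, b, c, d):
    """Pearson chi-square for a 2x2 table: sum of (O - E)^2 / E over the four cells."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    stat = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n  # expected count for cell (i, j)
            stat += (observed[i][j] - e) ** 2 / e
    return stat


def chi_square_shortcut(a, b, c, d):
    """Equivalent 2x2 shortcut: N * (ad - bc)^2 / (product of the four margins)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


stat = chi_square_cellwise(153, 201, 23, 664)
shortcut = chi_square_shortcut(153, 201, 23, 664)
print(round(stat, 2), round(shortcut, 2))
```

Both routes should agree to within floating-point error and land on roughly 264.4, matching the hand calculation.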
P-value = P(χ² > 264.4) with 1 degree of freedom. This is the area under the chi-square curve with one degree of freedom to the right of χ² = 264.4. It can be obtained in Excel using the formula =CHIDIST(264.4,1), which returns a value close to zero, so the p-value is < 0.00001.
Since the p-value < 0.05, we reject the null hypothesis at the 0.05 level of significance.
So, we conclude that there is a statistically significant association between asbestos exposure and lung cancer.
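For readers without Excel, the p-value and the decision can be sketched in Python using only the standard library. This relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so its upper-tail probability equals erfc(√(x/2)); the helper name `chi2_sf_df1` is an assumption made for this illustration and only applies to 1 degree of freedom.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df.

    Valid only for 1 degree of freedom, where X = Z^2 with Z standard normal,
    so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


chi2_stat = 264.4   # test statistic from the hand calculation
alpha = 0.05        # significance level stated in the question

p_value = chi2_sf_df1(chi2_stat)
reject_null = p_value < alpha
print(p_value, reject_null)
```

If SciPy is available, `scipy.stats.chi2_contingency` with `correction=False` should give the same uncorrected statistic and p-value directly from the observed table.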
b) Sample proportion of people with lung cancer p = 176/1041 = 0.169 ≈ 0.17
Sample size = N = 1041
Level of significance = 0.05
Critical value of z = 1.96
So, the 95% confidence interval for the population proportion of people with lung cancer (P) is given by:
p ± z * √(p(1 - p)/N) = 0.17 ± 1.96 * √(0.17 * 0.83/1041) = 0.17 ± 0.023
= [0.147, 0.193]
Hence, we are 95% confident that the population proportion of people with lung cancer lies between 0.147 and 0.193.
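The interval in part b) is the usual normal-approximation (Wald) interval, p ± z * √(p(1 - p)/N). A minimal Python sketch follows, assuming z = 1.96 and the counts 176 out of 1041; the function name `wald_ci` is illustrative only.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n                       # sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error
    margin = z * se                             # margin of error
    return p_hat - margin, p_hat + margin


lower, upper = wald_ci(176, 1041)
print(round(lower, 3), round(upper, 3))
```

Using the unrounded proportion 176/1041, the bounds come out near 0.146 and 0.192; rounding p to 0.17 before computing shifts each bound by about 0.001.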