Imagine that we conducted a survey and asked individuals about their education and their smoking habits. The results are below. Are these two variables correlated? If so, what are two potential confounding variables that would explain a statistically significant correlation between education and smoking habits? Explain.
| LESS EDUCATED | MORE EDUCATED |
HEAVY SMOKER | 15 | 3 |
LIGHT SMOKER | 40 | 17 |
NON SMOKER | 48 | 74 |
Claim: There is a correlation between education and smoking habits.
The null and alternative hypotheses are
H0: There is no correlation between education and smoking habits.
H1: There is a correlation between education and smoking habits.
Level of significance = 0.05
The test statistic is
χ² = Σ (O - E)^2 / E
where
O: Observed frequency
E: Expected frequency
E = (Row total * Column total) / Grand total
| LESS EDUCATED | MORE EDUCATED | Total |
HEAVY SMOKER | 15 | 3 | 18 |
LIGHT SMOKER | 40 | 17 | 57 |
NON SMOKER | 48 | 74 | 122 |
Total | 103 | 94 | 197 |
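The expected frequencies can be reproduced from the row, column, and grand totals with a short script. This is a minimal sketch in plain Python; the variable names (`observed`, `row_totals`, `col_totals`, `expected`) are illustrative choices, not part of the original solution.

```python
# Observed counts from the survey; each tuple is (less educated, more educated).
observed = {
    "HEAVY SMOKER": (15, 3),
    "LIGHT SMOKER": (40, 17),
    "NON SMOKER": (48, 74),
}

# Marginal totals of the contingency table.
row_totals = {habit: sum(counts) for habit, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

# E = (row total * column total) / grand total, for every cell.
expected = {
    habit: tuple(row_totals[habit] * c / grand_total for c in col_totals)
    for habit in observed
}

for habit, values in expected.items():
    print(habit, [round(v, 6) for v in values])
```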
O | E | (O-E) | (O-E)^2 | (O-E)^2/E |
15 | 9.411168 | 5.588832 | 31.23505 | 3.318935 |
3 | 8.588832 | -5.58883 | 31.23505 | 3.636705 |
40 | 29.80203 | 10.19797 | 103.9986 | 3.489648 |
17 | 27.19797 | -10.198 | 103.9986 | 3.823763 |
48 | 63.7868 | -15.7868 | 249.2231 | 3.907127 |
74 | 58.2132 | 15.7868 | 249.2231 | 4.281213 |
Total | | | | 22.457 |
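As a check on the table, the chi-square statistic can be summed cell by cell. The sketch below uses only plain Python and hypothetical variable names; the result should agree with the total of the last column.

```python
# Observed counts: rows are heavy, light, non smoker;
# columns are less educated, more educated.
observed = [[15, 3], [40, 17], [48, 74]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Accumulate (O - E)^2 / E over all six cells.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

print(round(chi_square, 3))
```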
Degrees of freedom = (Number of rows - 1) * (Number of columns - 1) = (3 - 1) * (2 - 1) = 2 * 1 = 2
Critical value = 5.991 (from the chi-square table, with level of significance 0.05 and 2 degrees of freedom)
Since the test statistic (22.457) is greater than the critical value (5.991), we reject the null hypothesis.
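The critical value and the decision can also be checked without a table. The sketch below relies on a property that holds only for 2 degrees of freedom, where the chi-square upper-tail probability is P(X > x) = exp(-x / 2); for other degrees of freedom a table or a statistics library would be needed. The variable names are illustrative.

```python
import math

alpha = 0.05          # level of significance
chi_square = 22.457   # test statistic computed from the table

# Valid for 2 degrees of freedom only: P(X > x) = exp(-x / 2),
# so the critical value solves exp(-x / 2) = alpha.
critical_value = -2 * math.log(alpha)
p_value = math.exp(-chi_square / 2)

# Reject H0 when the statistic exceeds the critical value.
reject_null = chi_square > critical_value
print(round(critical_value, 3), reject_null)
```

The p-value gives the same decision from another angle: it is far below 0.05, so the null hypothesis is rejected.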
Conclusion:
At the 0.05 level of significance, there is sufficient evidence to conclude that there is a correlation between education and smoking habits.