R.A. Fisher advanced the "constitutional hypothesis": there is a genetic factor that predisposes you both to smoke and to die. To refute Fisher's idea, epidemiologists used twin studies. They identified sets of smoking-discordant monozygotic twin pairs. Now there is a race. Which twin dies first, the smoker or the non-smoker? Data from the Finnish twin study are shown below.
Data from Finnish Twin Study
Cause of death | Smokers | Non-smokers |
---|---|---|
All Causes | 17 | 5 |
Coronary Heart Diseases | 9 | 0 |
Lung Cancer | 2 | 0 |
According to the first line of the table, there were 22 smoking-discordant monozygotic twin pairs where at least one twin of the pair died. In 17 cases the smoker died first; in 5 cases the non-smoker died first. According to the second line, there were 9 pairs where at least one twin died of coronary heart disease; in all 9 cases, the smoker died first. According to the last line, there were 2 pairs where at least one twin died from lung cancer, and in both pairs, the smoker won the race to death. (Lung cancer is a rare disease, even among smokers.)
For parts (a-c), suppose that each twin in the pair is equally likely to die first, so the number of pairs in which the smoker dies first is like the number of heads in coin-tossing.
(a) On this basis, what is the chance of having 17 or more pairs out of 22 where the smoker dies first?
(b) Repeat the test in part (a), for the 9 deaths from coronary heart disease.
(c) Repeat the test in part (a), for the 2 deaths from lung cancer.
(d) Can the difference between the death rates for smoking and non-smoking twins be explained by ...
(i) chance?
(ii) genetics?
(iii) health effects of smoking?
a)
Under the coin-tossing assumption, the number of pairs X in which the smoker dies first has a binomial distribution with parameters n=22 and p=0.5. The probability is obtained in Excel using the function =BINOM.DIST(number_s, trials, probability_s, cumulative), here as =1-BINOM.DIST(16, 22, 0.5, TRUE).
P(X>=17) = 1 - P(X<=16)
P(X>=17) = 0.00845
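As a cross-check on the spreadsheet result, the same upper-tail probability can be computed directly from the binomial formula. This is a minimal sketch that assumes the coin-tossing model from the question (p = 0.5); the function name `binom_upper_tail` is a hypothetical helper and is not part of the original solution.

```python
from math import comb


def binom_upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p) by summing the exact pmf."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Part (a): 22 pairs, smoker died first in 17 or more of them.
p_all_causes = binom_upper_tail(22, 17)
print(f"P(X >= 17) = {p_all_causes:.5f}")
```

Summing the probability mass function from k up to n avoids the "1 minus cumulative" step, which makes it harder to be off by one in the cutoff.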
b)
The number of pairs X in which the smoker dies first has a binomial distribution with parameters n=9 and p=0.5. Having 9 or more out of 9 means all 9, so the probability is 0.5^9, which can be obtained in Excel as =BINOM.DIST(9, 9, 0.5, FALSE).
P(X=9) = 0.0020
c)
The number of pairs X in which the smoker dies first has a binomial distribution with parameters n=2 and p=0.5. The probability that the smoker dies first in both pairs is 0.5^2.
P(X=2) = 0.25
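To compare the three rows of the table side by side, the tail probabilities can be computed exactly as fractions. This is a sketch under the same fair-coin assumption used in parts (a) to (c); the helper `fair_coin_upper_tail` and the dictionary names are illustrative and do not come from the original solution.

```python
from fractions import Fraction
from math import comb


def fair_coin_upper_tail(n: int, k: int) -> Fraction:
    """Exact P(X >= k) for X ~ Binomial(n, 1/2), as a reduced fraction."""
    return Fraction(sum(comb(n, j) for j in range(k, n + 1)), 2**n)


# (total discordant pairs with a death, pairs where the smoker died first)
rows = {
    "All causes": (22, 17),
    "Coronary heart disease": (9, 9),
    "Lung cancer": (2, 2),
}

p_values = {cause: fair_coin_upper_tail(n, k) for cause, (n, k) in rows.items()}
for cause, p in p_values.items():
    print(f"{cause}: {p} = {float(p):.5f}")
```

With only 2 pairs, the lung cancer row cannot give a probability below 1/4 even in the most extreme outcome, so that row alone says little about chance.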
d)
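(i) Chance is an unlikely explanation. If each twin were equally likely to die first, the probabilities of results as extreme as those observed for all causes and for coronary heart disease would be very small (under 1% in both cases). The lung cancer row has only 2 pairs, so on its own it is inconclusive.
(ii) Genetics cannot explain the difference. Monozygotic twins share the same genes, so a genetic factor that predisposes a person both to smoke and to die would affect both twins in a pair equally.
(iii) With chance and genetics set aside, the health effects of smoking are the most plausible explanation for the difference between the death rates of the smoking and non-smoking twins.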