In a study of Vietnam veterans who were exposed to Agent Orange, a herbicide defoliant used during the Vietnam War, 20 veterans were randomly chosen, and blood and fat tissue samples were taken from each of them. The TCDD (a dioxin) levels, measured in ppm, were recorded for the blood plasma and the fat tissue samples. The data are in the file TCDDFat.
TCDDFat:
Veteran | TCDD levels in Blood Plasma | TCDD level in Fat Tissue
1 | 2.5 | 4.9
2 | 3.1 | 5.9
3 | 2.1 | 4.4
4 | 3.5 | 6.9
5 | 3.1 | 7.0
6 | 1.8 | 4.2
7 | 6.0 | 10.0
8 | 3.0 | 5.5
9 | 36.0 | 41.0
10 | 4.7 | 4.4
11 | 6.9 | 7.0
12 | 3.3 | 2.9
13 | 4.6 | 4.6
14 | 1.6 | 1.4
15 | 7.2 | 7.7
16 | 1.8 | 1.1
17 | 20.0 | 11.0
18 | 2.0 | 2.5
19 | 2.5 | 2.3
20 | 4.1 | 2.5
(Note for the non-chemists... maybe all of you! The main ingredients of Agent Orange are an equal mixture of two phenoxy herbicides, 2,4-dichlorophenoxyacetic acid (2,4-D) and 2,4,5-trichlorophenoxyacetic acid (2,4,5-T). During manufacturing, these chemicals may become contaminated with 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD)... and that is my chemistry lesson for today!)
a) Is the sampling method dependent or independent?
b) State the null and alternative hypotheses we would use to test whether there is any difference between the TCDD levels in the blood plasma and in the fat tissue.
c) Calculate the test statistic. Show calculation.
d) Run the appropriate test in Minitab and show output. What is the p-value for the test?
e) Assuming α = 0.05, what are your conclusions for these data?
f) What assumptions are required for the test results to be valid? Have we met those assumptions? Prove it with appropriate plots.
g) Is there a correlation between TCDD levels in blood plasma and fat tissues? Prove with an appropriate plot and a correlation coefficient.
a) The individuals are selected randomly, but two measurements (blood plasma and fat tissue) are taken from each individual. The two samples are therefore dependent (paired).
b) Null hypothesis: the population mean TCDD levels in the blood plasma and in the fat tissue are the same (H0: μd = 0, where μd is the population mean of the differences Blood - Fat tissue).
Alternative hypothesis: the population mean TCDD levels in the blood plasma and in the fat tissue are different (Ha: μd ≠ 0).
c) Under the assumption of normality, the appropriate test is the paired-sample t test. The test statistic is
T = sqrt(n)(md - 0)/sd, where n = sample size = 20,
md = mean of the differences (Blood - Fat tissue), and sd = sample standard deviation of the differences.
We calculate md = -0.870 and sd = 2.977, so the observed T = sqrt(20)(-0.870)/2.977 = -1.3069, with n - 1 = 19 degrees of freedom.
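The hand calculation in (c) can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `paired_t_statistic` is ours, and the data are typed in from the TCDDFat table. Because it works from the unrounded standard deviation, the last digits of T can differ slightly from the value obtained with the rounded md and sd.

```python
import math
import statistics

# TCDD levels (ppm) for the 20 veterans, copied from the TCDDFat table.
blood = [2.5, 3.1, 2.1, 3.5, 3.1, 1.8, 6.0, 3.0, 36.0, 4.7,
         6.9, 3.3, 4.6, 1.6, 7.2, 1.8, 20.0, 2.0, 2.5, 4.1]
fat = [4.9, 5.9, 4.4, 6.9, 7.0, 4.2, 10.0, 5.5, 41.0, 4.4,
       7.0, 2.9, 4.6, 1.4, 7.7, 1.1, 11.0, 2.5, 2.3, 2.5]


def paired_t_statistic(x, y):
    """Return (n, mean difference, sd of differences, T) for paired samples."""
    diffs = [a - b for a, b in zip(x, y)]   # Blood - Fat tissue, per veteran
    n = len(diffs)
    md = statistics.mean(diffs)             # mean of the differences
    sd = statistics.stdev(diffs)            # sample sd (divisor n - 1)
    t = math.sqrt(n) * (md - 0) / sd        # test statistic under H0: mu_d = 0
    return n, md, sd, t


n, md, sd, t = paired_t_statistic(blood, fat)
print(f"n = {n}, md = {md:.3f}, sd = {sd:.3f}, T = {t:.4f}")
```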
d) MINITAB output
Paired T-Test and CI: TCDD levels in Blood Plasma, TCDD level in Fat Tissue
Paired T for TCDD levels in Blood Plasma - TCDD level in Fat Tissue
                           N    Mean  StDev  SE Mean
TCDD levels in Blood Pla  20    5.99   8.13     1.82
TCDD level in Fat Tissue  20    6.86   8.47     1.89
Difference                20  -0.870  2.977    0.666
95% CI for mean difference: (-2.263, 0.523)
T-Test of mean difference = 0 (vs ≠ 0): T-Value = -1.31  P-Value = 0.207
The p-value is 0.207.
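For readers without Minitab, the p-value and confidence interval can be approximately reproduced from the summary values in the output. The sketch below is not Minitab's algorithm: it integrates the t density numerically with Simpson's rule, and the helper names `t_pdf` and `two_sided_p_value` are ours. The critical value 2.093 is the tabulated 97.5th percentile of the t distribution with 19 degrees of freedom.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area under the t density on [-|t|, |t|].

    The area on [0, |t|] is approximated with composite Simpson's rule
    (steps must be even) and doubled by symmetry.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    half_area = total * h / 3
    return 1 - 2 * half_area


# Summary values taken from the Minitab output (Difference row).
n, md, sd = 20, -0.870, 2.977
se = sd / math.sqrt(n)                 # standard error of the mean difference
t_value = md / se
p_value = two_sided_p_value(t_value, df=n - 1)

t_crit = 2.093                         # tabulated t(0.975, 19); an input, not computed here
ci = (md - t_crit * se, md + t_crit * se)

print(f"SE = {se:.3f}, T = {t_value:.2f}, p = {p_value:.3f}")
print(f"95% CI for mean difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```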
e) Since the p-value (0.207) is greater than α = 0.05, there is insufficient evidence to reject the null hypothesis. The mean TCDD levels in the blood plasma and in the fat tissue are not significantly different at the 5% level.
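f) The paired t test requires that the differences be approximately normally distributed (and that the veterans be a random sample, which the study design states). We provide the probability plot of the differences below:
The plot shows that most of the points do not lie on the central line, so normality is not supported. We also provide a boxplot, which shows a negatively skewed distribution of the differences with one outlier. Thus the normality assumption does not appear to be valid, and the t-test results should be interpreted with caution.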
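The boxplot reading can be backed up numerically. The sketch below computes the quartiles of the differences and flags outliers with the usual 1.5 × IQR boxplot rule. It assumes the (n + 1)p quartile positions, which Python's `statistics.quantiles` uses with `method="exclusive"`; a particular package's boxplot may use a slightly different quartile convention. It complements the plots rather than replacing them.

```python
import statistics

# TCDD levels (ppm) for the 20 veterans, copied from the TCDDFat table.
blood = [2.5, 3.1, 2.1, 3.5, 3.1, 1.8, 6.0, 3.0, 36.0, 4.7,
         6.9, 3.3, 4.6, 1.6, 7.2, 1.8, 20.0, 2.0, 2.5, 4.1]
fat = [4.9, 5.9, 4.4, 6.9, 7.0, 4.2, 10.0, 5.5, 41.0, 4.4,
       7.0, 2.9, 4.6, 1.4, 7.7, 1.1, 11.0, 2.5, 2.3, 2.5]

# Differences (Blood - Fat tissue); rounding to one decimal removes
# floating-point noise such as -2.4000000000000004.
diffs = [round(b - f, 1) for b, f in zip(blood, fat)]

# Quartiles using the (n + 1)p positions.
q1, median, q3 = statistics.quantiles(diffs, n=4, method="exclusive")
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Veterans (numbered from 1) whose difference falls outside the fences.
outlier_veterans = [i + 1 for i, d in enumerate(diffs)
                    if d < low_fence or d > high_fence]
outliers = [diffs[i - 1] for i in outlier_veterans]

print(f"Q1 = {q1:.3f}, median = {median:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.3f}")
print(f"fences: ({low_fence:.3f}, {high_fence:.3f})")
print(f"outliers: {outliers} (veterans {outlier_veterans})")
```

Under these conventions the median sits closer to Q3 than to Q1, so the lower part of the box is the longer one, and the single flagged point lies above the upper fence.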
g) We now provide a scatterplot. The Pearson correlation between TCDD levels in Blood Plasma and TCDD level in Fat Tissue is 0.936, indicating a strong positive linear association. The plot also suggests this, although some unusual observations are visible.
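The correlation coefficient can be verified directly from its definition. This is a minimal sketch with a hand-written `pearson_r` helper (our name). The second calculation, which drops veteran 9 (the largest pair, 36.0 and 41.0), is our own sensitivity check; treating that point as one of the "unusual observations" is an assumption, not something stated in the Minitab output.

```python
import math

# TCDD levels (ppm) for the 20 veterans, copied from the TCDDFat table.
blood = [2.5, 3.1, 2.1, 3.5, 3.1, 1.8, 6.0, 3.0, 36.0, 4.7,
         6.9, 3.3, 4.6, 1.6, 7.2, 1.8, 20.0, 2.0, 2.5, 4.1]
fat = [4.9, 5.9, 4.4, 6.9, 7.0, 4.2, 10.0, 5.5, 41.0, 4.4,
       7.0, 2.9, 4.6, 1.4, 7.7, 1.1, 11.0, 2.5, 2.3, 2.5]


def pearson_r(x, y):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(blood, fat)
print(f"r (all 20 veterans) = {r:.3f}")

# Sensitivity check (our assumption): drop veteran 9, index 8 in the lists.
blood_wo9 = blood[:8] + blood[9:]
fat_wo9 = fat[:8] + fat[9:]
r_wo9 = pearson_r(blood_wo9, fat_wo9)
print(f"r (without veteran 9) = {r_wo9:.3f}")
```

Comparing the two values shows how much of the reported association rests on that single extreme pair.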