Unusually high concentrations of metals in drinking water can pose a health hazard. Twenty pairs of measurements were taken at different locations in X Lake, recording the zinc concentration in bottom water and in surface water.
Table 1. Zinc Concentration (mg/l) in bottom and surface water in different regions and areas of X Lake
Location | Region | Area | Zinc_Bottom | Zinc_Surface | Zinc_Mean | TZ3_Mean
Loc 1 | North | Area A | 0.51 | 0.49 | 0.50 | 2.67
Loc 2 | North | Area A | 0.46 | 0.45 | 0.46 | 2.45
Loc 3 | North | Area A | 0.52 | 0.52 | 0.52 | 2.66
Loc 4 | North | Area A | 0.51 | 0.50 | 0.50 | 2.56
Loc 5 | North | Area A | 0.58 | 0.57 | 0.58 | 3.03
Loc 6 | North | Area A | 0.57 | 0.55 | 0.56 | 3.21
Loc 7 | North | Area A | 0.59 | 0.58 | 0.59 | 3.50
Loc 8 | North | Area B | 0.46 | 0.44 | 0.45 | 2.40
Loc 9 | North | Area B | 0.46 | 0.44 | 0.45 | 2.35
Loc 10 | North | Area B | 0.45 | 0.43 | 0.44 | 2.56
Loc 11 | East | Area B | 0.41 | 0.40 | 0.41 | 1.33
Loc 12 | East | Area B | 0.41 | 0.39 | 0.40 | 1.56
Loc 13 | East | Area B | 0.41 | 0.39 | 0.40 | 1.23
Loc 14 | East | Area B | 0.47 | 0.45 | 0.46 | 1.12
Loc 15 | East | Area C | 0.41 | 0.39 | 0.40 | 1.23
Loc 16 | East | Area C | 0.37 | 0.36 | 0.37 | 1.10
Loc 17 | East | Area C | 0.41 | 0.39 | 0.40 | 1.20
Loc 18 | East | Area C | 0.40 | 0.39 | 0.39 | 1.10
Loc 19 | East | Area C | 0.39 | 0.37 | 0.38 | 1.12
Loc 20 | East | Area C | 0.46 | 0.44 | 0.45 | 1.30
QUESTION
Including the “Region” factor as a dummy variable in your previous model, fit a multiple regression model to explain zinc concentration (Zinc_Mean). Obtain the ANOVA table of the regression. Comment on the overall significance of the model and on the coefficient of each variable. Do you think adding “Region” alongside “TZ3_Mean” improved the model? Explain.
We are told that unusually high concentrations of metals in drinking water can be a health hazard. Using the 20 samples given, we have to fit a multiple regression model that explains Zinc_Mean, treating the Region factor as a dummy variable.
We code Region as 1 for North and 0 for East, and then perform the following steps in MINITAB:
1. Enter the given values in different columns.
2. Go to “Stat” then “Regression” then “Regression” then “Fit regression model”.
3. Enter Zinc_Mean in “response” and Zinc_Bottom, Zinc_Surface, TZ3_Mean, Region in “continuous predictor”.
4. Click OK.
Hence we get the following output:
Analysis of Variance
Source           DF    Adj SS    Adj MS   F-Value   P-Value
Regression        4  0.084499  0.021125   1613.75     0.000
  Zinc_Bottom     1  0.000014  0.000014      1.08     0.316
  Zinc_Surface    1  0.001185  0.001185     90.51     0.000
  TZ3_Mean        1  0.000031  0.000031      2.38     0.144
  Region          1  0.000022  0.000022      1.72     0.210
Error            15  0.000196  0.000013
  Lack-of-Fit    14  0.000196  0.000014         *         *
  Pure Error      1  0.000000  0.000000
Total            19  0.084695

Coefficients
Term               Coef   SE Coef   T-Value   P-Value     VIF
Constant        0.01524   0.00863      1.77     0.098
Zinc_Bottom      0.0922    0.0888      1.04     0.316   49.49
Zinc_Surface     0.8657    0.0910      9.51     0.000   54.28
TZ3_Mean        0.00727   0.00472      1.54     0.144   21.90
Region         -0.00720   0.00550     -1.31     0.210   11.54

Regression Equation
Zinc_Mean = 0.01524 + 0.0922 Zinc_Bottom + 0.8657 Zinc_Surface + 0.00727 TZ3_Mean - 0.00720 Region
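Interpretation of coefficients (each holding the other predictors fixed):
1) A 1-unit increase in Zinc_Bottom is associated with a 0.0922-unit increase in Zinc_Mean.
2) A 1-unit increase in Zinc_Surface is associated with a 0.8657-unit increase in Zinc_Mean.
3) A 1-unit increase in TZ3_Mean is associated with a 0.00727-unit increase in Zinc_Mean.
4) Region is a dummy variable, so its coefficient compares the two groups: a North location (Region = 1) has a predicted Zinc_Mean 0.00720 units lower than an otherwise identical East location (Region = 0).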
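As a worked illustration of how the coefficients combine, the sketch below plugs the Loc 1 values from Table 1 into the regression equation reported above. The function name `predict_zinc_mean` is hypothetical, and the coefficients are the rounded values from the MINITAB output, so the result is approximate.

```python
def predict_zinc_mean(zinc_bottom, zinc_surface, tz3_mean, region):
    """Evaluate the fitted equation; region is 1 for North and 0 for East."""
    return (0.01524
            + 0.0922 * zinc_bottom
            + 0.8657 * zinc_surface
            + 0.00727 * tz3_mean
            - 0.00720 * region)


# Loc 1 from Table 1: Zinc_Bottom = 0.51, Zinc_Surface = 0.49, TZ3_Mean = 2.67, North.
pred_north = predict_zinc_mean(0.51, 0.49, 2.67, 1)
# Same measurements, but hypothetically located in the East region.
pred_east = predict_zinc_mean(0.51, 0.49, 2.67, 0)

print(round(pred_north, 4))              # close to the observed Zinc_Mean of 0.50
print(round(pred_north - pred_east, 5))  # the Region coefficient
```

The difference between the two predictions is exactly the Region coefficient, which is what the dummy variable measures.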
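A coefficient is significant at the 5% level if its p-value is <= 0.05.
Here only Zinc_Surface (p = 0.000) meets that threshold; Zinc_Bottom (p = 0.316), TZ3_Mean (p = 0.144) and Region (p = 0.210) are not individually significant.
Overall significance, however, is judged by the regression F-test rather than by the individual t-tests. With F = 1613.75 and p = 0.000, the model as a whole is highly significant. The large VIF values (11.54 to 54.28) point to strong multicollinearity among the predictors, which inflates the standard errors and helps explain why most individual coefficients are not significant. This is unsurprising because Zinc_Mean appears to be the average of Zinc_Bottom and Zinc_Surface, so those two predictors nearly determine the response.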
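The overall F statistic and the meaning of the VIF column can be checked by hand from the output above. The sketch below recomputes F as the regression mean square divided by the error mean square, and converts each VIF into the R-squared obtained when that predictor is regressed on the others, using VIF = 1 / (1 - R_j^2). All inputs are the rounded figures from the MINITAB output, so the recomputed F differs slightly from the printed 1613.75.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 0.084499, 4
ss_error, df_error = 0.000196, 15

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_overall = ms_regression / ms_error  # overall F-test statistic

# VIF column from the coefficients table.
vif = {
    "Zinc_Bottom": 49.49,
    "Zinc_Surface": 54.28,
    "TZ3_Mean": 21.90,
    "Region": 11.54,
}
# R_j^2: share of each predictor's variance explained by the other predictors.
r2_on_others = {name: 1.0 - 1.0 / value for name, value in vif.items()}

print(f"Overall F = {f_overall:.1f}")
for name, value in r2_on_others.items():
    print(f"{name:13s} R_j^2 = {value:.3f}")
```

Every predictor has more than 90% of its variance explained by the others, which is the multicollinearity that the VIF column flags.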
Next we rerun the regression without the Region factor, using the following steps in MINITAB:
1. Enter the given values in different columns.
2. Go to “Stat” then “Regression” then “Regression” then “Fit regression model”.
3. Enter Zinc_Mean in “response” and Zinc_Bottom, Zinc_Surface, TZ3_Mean in “continuous predictor”.
4. Click OK.
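The R-squared of the model without Region is 0.9974.
The R-squared of the model including Region is 0.9977.
Including Region therefore raises R-squared slightly. However, R-squared can never decrease when a predictor is added, so a higher value on its own does not show a real improvement. The gain is only 0.0003, and the Region coefficient is not significant (p = 0.210), so adding Region improves the fit only marginally and not significantly at the 5% level.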
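Adjusted R-squared penalizes each extra predictor, so it offers a fairer comparison between the two models than raw R-squared. The sketch below is a rough check that assumes n = 20 observations and uses the rounded R-squared values reported above; the function name `adjusted_r2` is hypothetical. It also squares the Region t-value, since for a single added term the partial F statistic equals t squared.

```python
def adjusted_r2(r2, n, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


n = 20
adj_without_region = adjusted_r2(0.9974, n, 3)  # Zinc_Bottom, Zinc_Surface, TZ3_Mean
adj_with_region = adjusted_r2(0.9977, n, 4)     # the same three plus Region

# Partial F for adding Region equals the square of its t-value.
t_region = -1.31
partial_f_region = t_region ** 2

print(f"Adjusted R-squared without Region: {adj_without_region:.4f}")
print(f"Adjusted R-squared with Region:    {adj_with_region:.4f}")
print(f"Partial F for Region:              {partial_f_region:.2f}")
```

The adjusted value rises only in the fourth decimal place, and the partial F agrees with the F-Value of 1.72 (p = 0.210) shown for Region in the ANOVA table, so the evidence for keeping Region is weak.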