Studying dose response is central to determining “safe” and “hazardous” levels and dosages for potential pollutants. These conclusions are often the basis for environmental policy. The U.S. Environmental Protection Agency has developed extensive guidance and reports on dose-response modeling and assessment. In this problem, we study the relationship between the level of microplastics (considered a pollutant) in fresh water and stress response in freshwater mussels (with a higher level of stress indicating shorter survival times).
(a) Go to the course webpage and under Datasets, download the CSV file “exposure.csv” and follow the accompanying Minitab instructions. Copy and paste the Fitted Line Plots and the Residual Plots in a blank document. Print these out and attach them to your homework.
(b) What is the fitted equation before taking observation #21 out? What is the fitted equation after taking observation #21 out?
(c) What is the R² for the data set before observation #21 was taken out? What is the R² after this observation was taken out? Comment briefly on what this means.
(d) Comment on the residual plots and the Fitted Line plot for the regression model with observation #21 included and the residual plots and the Fitted Line plot for the regression model with obs. #21 taken out. What has changed after removing this point?
exposure.csv
MicroPlastic,Stress
0.1,0.07
0.45401,4.1673
1.09765,6.5703
1.27936,13.815
2.20611,11.4501
3.50064,12.9554
4.0403,20.1575
5.23583,17.5633
6.45308,26.0317
7.1699,22.7573
8.28474,26.303
9.59238,30.6885
10.92091,33.9402
11.66066,30.9228
12.79953,34.11
13.97943,44.4536
14.41536,46.5022
15.71607,50.0568
16.70156,46.5475
17.16463,45.7762
18.8234,2.3253
This question can be answered in R instead of Minitab.
With observation #21 included:
# Full data set, including observation #21
MicroPlastic <- c(0.1, 0.45401, 1.09765, 1.27936, 2.20611, 3.50064, 4.0403, 5.23583,
                  6.45308, 7.1699, 8.28474, 9.59238, 10.92091, 11.66066, 12.79953,
                  13.97943, 14.41536, 15.71607, 16.70156, 17.16463, 18.8234)
Stress <- c(0.07, 4.1673, 6.5703, 13.815, 11.4501, 12.9554, 20.1575, 17.5633,
            26.0317, 22.7573, 26.303, 30.6885, 33.9402, 30.9228, 34.11,
            44.4536, 46.5022, 50.0568, 46.5475, 45.7762, 2.3253)
# The vectors are in the workspace, so no data= argument is needed
# (data=faithful referred to an unrelated built-in data set)
regm <- lm(Stress ~ MicroPlastic)
res <- resid(regm)
summary(regm)

Call:
lm(formula = Stress ~ MicroPlastic)

Residuals:
    Min      1Q  Median      3Q     Max 
-42.035  -1.463   1.886   4.555  11.577 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)     8.738      4.305   2.030 0.056617 .  
MicroPlastic    1.893      0.410   4.615 0.000189 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.19 on 19 degrees of freedom
Multiple R-squared:  0.5286,	Adjusted R-squared:  0.5037 
F-statistic:  21.3 on 1 and 19 DF,  p-value: 0.0001888
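# Residual plot on top, fitted line plot below
par(mfrow = c(2, 1))
plot(MicroPlastic, res, ylab = "Residuals", xlab = "MicroPlastic",
     main = "Residual plot")
abline(0, 0)  # horizontal reference line at zero
plot(MicroPlastic, Stress, main = "Fitted line plot")
abline(regm, lwd = 2)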
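Whether observation #21 is really the point driving this fit can be checked numerically before removing it. The sketch below is one way to do that with base R's `rstandard()`, `hatvalues()`, and `cooks.distance()`. The object names `dat`, `fit`, and `infl` are hypothetical, and the cutoffs (absolute standardized residual above 3, Cook's distance above 4/n) are common rules of thumb assumed here for illustration, not part of the assignment.

```r
# Data from exposure.csv, typed in directly so the block is self-contained
dat <- data.frame(
  MicroPlastic = c(0.1, 0.45401, 1.09765, 1.27936, 2.20611, 3.50064, 4.0403,
                   5.23583, 6.45308, 7.1699, 8.28474, 9.59238, 10.92091,
                   11.66066, 12.79953, 13.97943, 14.41536, 15.71607,
                   16.70156, 17.16463, 18.8234),
  Stress = c(0.07, 4.1673, 6.5703, 13.815, 11.4501, 12.9554, 20.1575,
             17.5633, 26.0317, 22.7573, 26.303, 30.6885, 33.9402, 30.9228,
             34.11, 44.4536, 46.5022, 50.0568, 46.5475, 45.7762, 2.3253)
)

fit <- lm(Stress ~ MicroPlastic, data = dat)

# Per-observation influence measures from base R (stats package)
infl <- data.frame(
  obs      = seq_len(nrow(dat)),
  std_res  = rstandard(fit),       # standardized residuals
  leverage = hatvalues(fit),       # diagonal of the hat matrix
  cooks_d  = cooks.distance(fit)   # Cook's distance
)

# Rule-of-thumb cutoffs (assumed for illustration, not from the assignment)
n <- nrow(dat)
flagged <- infl[abs(infl$std_res) > 3 | infl$cooks_d > 4 / n, ]
print(flagged)
```

If the row for observation 21 is the only one printed, that supports treating it as a single influential outlier rather than as a sign that the whole relationship is curved.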
With observation #21 removed:
# Same data with observation #21 dropped
MicroPlastic <- c(0.1, 0.45401, 1.09765, 1.27936, 2.20611, 3.50064, 4.0403, 5.23583,
                  6.45308, 7.1699, 8.28474, 9.59238, 10.92091, 11.66066, 12.79953,
                  13.97943, 14.41536, 15.71607, 16.70156, 17.16463)
Stress <- c(0.07, 4.1673, 6.5703, 13.815, 11.4501, 12.9554, 20.1575, 17.5633,
            26.0317, 22.7573, 26.303, 30.6885, 33.9402, 30.9228, 34.11,
            44.4536, 46.5022, 50.0568, 46.5475, 45.7762)
# The vectors are in the workspace, so no data= argument is needed
# (data=faithful referred to an unrelated built-in data set)
regm <- lm(Stress ~ MicroPlastic)
res <- resid(regm)
summary(regm)

Call:
lm(formula = Stress ~ MicroPlastic)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.2800 -1.9875 -0.6429  3.2605  5.3999 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)     5.090      1.327   3.837  0.00121 ** 
MicroPlastic    2.599      0.134  19.395 1.64e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.379 on 18 degrees of freedom
Multiple R-squared:  0.9543,	Adjusted R-squared:  0.9518 
F-statistic: 376.2 on 1 and 18 DF,  p-value: 1.636e-13
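# Residual plot on top, fitted line plot below
par(mfrow = c(2, 1))
plot(MicroPlastic, res, ylab = "Residuals", xlab = "MicroPlastic",
     main = "Residual plot")
abline(0, 0)  # horizontal reference line at zero
plot(MicroPlastic, Stress, main = "Fitted line plot")
abline(regm, lwd = 2)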
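(b) With observation #21 included, the fitted equation is Stress = 8.738 + 1.893 MicroPlastic. With it removed, the fitted equation is Stress = 5.090 + 2.599 MicroPlastic.
(c) R² is 0.5286 with observation #21 and 0.9543 without it, so the straight line explains about 53% of the variation in Stress with the point included and about 95% once it is removed.
(d) With observation #21 included, the data contain one outlier (MicroPlastic = 18.8234, Stress = 2.3253). Its residual is about -42, and it pulls the fitted line away from the otherwise linear trend, which makes the relationship look weaker and less linear than it is for the other 20 points. After removing it, the residuals all lie between about -5.3 and 5.4 and the fitted line follows the points closely.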
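The two fits can also be compared in a single pass rather than by retyping the vectors. The sketch below is one way to do it, assuming the data are held in a data frame; `dat`, `fit_all`, `fit_drop`, `comparison`, and the helper `r_squared()` are hypothetical names introduced here. The helper computes R² from its definition, 1 - SSE/SST, to make explicit what the statistic measures.

```r
# Data from exposure.csv, typed in directly so the block is self-contained
dat <- data.frame(
  MicroPlastic = c(0.1, 0.45401, 1.09765, 1.27936, 2.20611, 3.50064, 4.0403,
                   5.23583, 6.45308, 7.1699, 8.28474, 9.59238, 10.92091,
                   11.66066, 12.79953, 13.97943, 14.41536, 15.71607,
                   16.70156, 17.16463, 18.8234),
  Stress = c(0.07, 4.1673, 6.5703, 13.815, 11.4501, 12.9554, 20.1575,
             17.5633, 26.0317, 22.7573, 26.303, 30.6885, 33.9402, 30.9228,
             34.11, 44.4536, 46.5022, 50.0568, 46.5475, 45.7762, 2.3253)
)

# Hypothetical helper: R-squared from its definition, 1 - SSE/SST
r_squared <- function(fit) {
  y   <- model.response(model.frame(fit))
  sse <- sum(resid(fit)^2)          # variation left unexplained by the line
  sst <- sum((y - mean(y))^2)       # total variation in the response
  1 - sse / sst
}

fit_all  <- lm(Stress ~ MicroPlastic, data = dat)         # all 21 observations
fit_drop <- lm(Stress ~ MicroPlastic, data = dat[-21, ])  # observation #21 removed

# One row per model: intercept, slope, and R-squared
comparison <- rbind(
  with_obs21    = c(coef(fit_all),  R2 = r_squared(fit_all)),
  without_obs21 = c(coef(fit_drop), R2 = r_squared(fit_drop))
)
print(round(comparison, 4))
```

The printed intercepts, slopes, and R² values should agree with the two `summary(regm)` outputs shown earlier, up to rounding.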