# Question 1
library("olsrr")
##
## Attaching package: 'olsrr'
## The following object is masked from 'package:datasets':
##
##     rivers
bass <- read.csv("bass.csv", header = TRUE)
attach(bass)
bass.lm <- lm(Avg_Mercury ~ Alkalinity + pH + Calcium + Chlorophyll,
              data = bass)
bass.all <- ols_step_all_possible(bass.lm)
bass.all
##    Index N                        Predictors  R-Square Adj. R-Square Mallow's Cp
## 1      1 1                        Alkalinity 0.4254905     0.4142256    3.223111
## 2      2 1                                pH 0.3310853     0.3179693   11.804576
## 3      3 1                           Calcium 0.2386129     0.2236838   20.210347
## 4      4 1                       Chlorophyll 0.2130176     0.1975865   22.536973
## 6      5 2                Alkalinity Calcium 0.4478582     0.4257726    3.189877
## 7      6 2            Alkalinity Chlorophyll 0.4436411     0.4213868    3.573211
## 5      7 2                     Alkalinity pH 0.4292584     0.4064287    4.880607
## 9      8 2                    pH Chlorophyll 0.3444788     0.3182580   12.587099
## 8      9 2                        pH Calcium 0.3348995     0.3082955   13.457860
## 10    10 2               Calcium Chlorophyll 0.3009248     0.2729618   16.546176
## 13    11 3    Alkalinity Calcium Chlorophyll 0.4705171     0.4380997    3.130182
## 11    12 3             Alkalinity pH Calcium 0.4576077     0.4244001    4.303642
## 12    13 3         Alkalinity pH Chlorophyll 0.4436478     0.4095855    5.572602
## 14    14 3            pH Calcium Chlorophyll 0.3484270     0.3085347   14.228213
## 15    15 4 Alkalinity pH Calcium Chlorophyll 0.4719492     0.4279450    5.000000
plot(bass.all)
detach(bass)
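`ols_step_all_possible()` fits one regression for every non-empty subset of the four predictors, which is why the table has 2^4 - 1 = 15 rows. The base-R sketch below is a hypothetical stand-in, not olsrr's own code: `fit_all_subsets()` is a made-up helper that enumerates the subsets with `combn()` and fits each one with `lm()`, returning the same kind of R-Square and Adj. R-Square columns. In this enumeration order the Alkalinity, Calcium, Chlorophyll subset is number 13, which appears to match the left-most row labels in the printed output.

```r
# Hypothetical base-R stand-in for ols_step_all_possible(); not olsrr's code.
preds <- c("Alkalinity", "pH", "Calcium", "Chlorophyll")

# Labels for every non-empty subset, in combn()'s order: all subsets of
# size 1, then size 2, and so on (2^4 - 1 = 15 in total)
subsets <- unlist(lapply(seq_along(preds), function(k) {
  combn(preds, k, FUN = paste, collapse = " ")
}))

# Fit one lm() per subset and collect R-Square and adjusted R-Square
fit_all_subsets <- function(data, response, predictors) {
  rows <- lapply(seq_along(predictors), function(k) {
    combos <- combn(predictors, k, simplify = FALSE)
    do.call(rbind, lapply(combos, function(vars) {
      s <- summary(lm(reformulate(vars, response = response), data = data))
      data.frame(N = k,
                 Predictors = paste(vars, collapse = " "),
                 R.Square = s$r.squared,
                 Adj.R.Square = s$adj.r.squared)
    }))
  })
  do.call(rbind, rows)
}

length(subsets)  # 15
subsets[13]      # "Alkalinity Calcium Chlorophyll"

# With the bass data frame loaded as above (not run here):
# fit_all_subsets(bass, "Avg_Mercury", preds)
```

The call on `bass` is left as a comment because bass.csv is not included here; with the data loaded it should reproduce the R-Square and Adj. R-Square values in the table above, in enumeration rather than sorted order.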
a. Give the R² and adjusted R² for the best models with one, two, three, and four predictors. Comment on these results (include the variables involved).
b. Suppose that you want to predict the average mercury level of fish in a new lake with alkalinity 3.0, calcium 2.5, chlorophyll 2.5, and pH 6.0. The predicted value for the model including all four predictors is .545 (.0164, 1.073) [mean (PI)]. The predicted value for the model including only alkalinity, calcium, and chlorophyll is .532 (.0133, 1.051). Have the predicted values and the prediction intervals changed considerably between the two models? Explain why or why not (based on inspection of these results).
c. Explain how your results in (a) and (b) agree.
a)
The best model of each size (the one with the highest R² among models with that many predictors) is:
One predictor (Alkalinity): R² = 0.4254905, adjusted R² = 0.4142256
Two predictors (Alkalinity, Calcium): R² = 0.4478582, adjusted R² = 0.4257726
Three predictors (Alkalinity, Calcium, Chlorophyll): R² = 0.4705171, adjusted R² = 0.4380997
Four predictors (Alkalinity, pH, Calcium, Chlorophyll): R² = 0.4719492, adjusted R² = 0.4279450
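Alkalinity appears in every one of these models, and on its own it explains about 42.5% of the variation in average mercury level. Adding Calcium and then Chlorophyll raises R² to 0.4479 and 0.4705. Because R² never decreases when a predictor is added, it always favors the largest model; adjusted R² penalizes each extra predictor. Adding pH to the three-predictor model raises R² only from 0.4705 to 0.4719, while adjusted R² falls from 0.4381 to 0.4279. The three-predictor model (Alkalinity, Calcium, Chlorophyll) therefore has the highest adjusted R² (and also the smallest Mallows' Cp in the table, 3.13), so it is the best model for predicting mercury levels; pH adds almost nothing once the other three variables are included.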
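The selection above can be checked from the printed numbers alone. The sketch below copies the R-Square and Adj. R-Square columns into a data frame (the names `tab`, `best`, and `adj_r2` are illustrative), keeps the highest-R² model of each size, and recomputes adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where p is the number of predictors. The number of lakes n is not printed by olsrr; n = 53 is an assumption inferred because it is the value that reproduces the Adj. R-Square column.

```r
# R-Square and Adj. R-Square copied from the ols_step_all_possible() output
tab <- data.frame(
  N = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4),
  Predictors = c("Alkalinity", "pH", "Calcium", "Chlorophyll",
                 "Alkalinity Calcium", "Alkalinity Chlorophyll",
                 "Alkalinity pH", "pH Chlorophyll", "pH Calcium",
                 "Calcium Chlorophyll",
                 "Alkalinity Calcium Chlorophyll", "Alkalinity pH Calcium",
                 "Alkalinity pH Chlorophyll", "pH Calcium Chlorophyll",
                 "Alkalinity pH Calcium Chlorophyll"),
  R2 = c(0.4254905, 0.3310853, 0.2386129, 0.2130176,
         0.4478582, 0.4436411, 0.4292584, 0.3444788, 0.3348995, 0.3009248,
         0.4705171, 0.4576077, 0.4436478, 0.3484270,
         0.4719492),
  AdjR2 = c(0.4142256, 0.3179693, 0.2236838, 0.1975865,
            0.4257726, 0.4213868, 0.4064287, 0.3182580, 0.3082955, 0.2729618,
            0.4380997, 0.4244001, 0.4095855, 0.3085347,
            0.4279450)
)

# Best model of each size: the largest R-Square among models with N predictors
best <- do.call(rbind, lapply(split(tab, tab$N), function(d) d[which.max(d$R2), ]))

# Adjusted R-Square penalizes each of the p predictors;
# n = 53 is an assumption inferred from the table, not printed by olsrr
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)
best$AdjR2_check <- adj_r2(best$R2, n = 53, p = best$N)

print(best)
best$Predictors[which.max(best$AdjR2)]  # "Alkalinity Calcium Chlorophyll"
```

The recomputed column agrees with the printed Adj. R-Square to about seven decimal places, and the penalty term (n - 1)/(n - p - 1) is what lets the three-predictor model overtake the four-predictor one.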
b)
The predicted value for the model including all four predictors is .545 (.0164, 1.073)
The predicted value for the model including only alkalinity, calcium, and chlorophyll is .532 (.0133, 1.051)
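The point predictions differ by only 0.013 (0.545 vs. 0.532), and the two prediction intervals almost coincide: their endpoints differ by at most 0.022, and their widths are 1.057 and 1.038. So neither the predicted value nor the prediction interval changes considerably between the two models. This is because pH contributes very little information once alkalinity, calcium, and chlorophyll are in the model, so dropping it barely moves the prediction; the reduced model's interval is even slightly narrower.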
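The comparison in part (b) is simple arithmetic on the quoted numbers. With the data at hand, intervals like these would normally come from `predict(model, newdata, interval = "prediction")`; the sketch below only reuses the values given in the question, and the names `full`, `reduced`, `width`, and `shift` are illustrative.

```r
# Point predictions and prediction intervals quoted in part (b)
full    <- c(fit = 0.545, lwr = 0.0164, upr = 1.073)  # all four predictors
reduced <- c(fit = 0.532, lwr = 0.0133, upr = 1.051)  # Alkalinity, Calcium, Chlorophyll

# Width of a prediction interval (upper minus lower limit)
width <- function(x) unname(x["upr"] - x["lwr"])

# Change in the point prediction when pH is dropped
shift <- unname(full["fit"] - reduced["fit"])

round(c(shift          = shift,
        width_full     = width(full),
        width_reduced  = width(reduced),
        shift_vs_width = shift / width(full)), 4)
```

The shift in the point prediction is roughly 1% of the interval width, which is why the two models can be treated as giving essentially the same prediction for this lake.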
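c)
In part (a), the three-predictor model (Alkalinity, Calcium, Chlorophyll) and the four-predictor model (Alkalinity, pH, Calcium, Chlorophyll) have almost the same R² (0.4705 vs. 0.4719), and the three-predictor model has the higher adjusted R². Adding pH therefore explains almost no extra variation in mercury levels. This agrees with part (b): removing pH changes the predicted value and the prediction interval for the new lake only slightly, so the simpler three-predictor model predicts essentially as well as the full model.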
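The agreement in part (c) can be made quantitative from the two R² values. As an illustration (a standard nested-model comparison, not part of the original assignment), the sketch below computes the partial F statistic for adding pH and the Mallows' Cp of the three-predictor model. It again assumes n = 53, which is inferred from the table rather than printed by olsrr; the computed Cp should match the 3.130182 shown in the output.

```r
# R-Square of the two nested models, copied from the table in part (a)
r2_reduced <- 0.4705171  # Alkalinity + Calcium + Chlorophyll
r2_full    <- 0.4719492  # Alkalinity + pH + Calcium + Chlorophyll

n         <- 53  # assumption: inferred from the table, not printed by olsrr
k_full    <- 4   # predictors in the full model
q_dropped <- 1   # predictors removed (pH)
df_resid  <- n - k_full - 1

# Partial F statistic for H0: the pH coefficient is zero
f_stat  <- ((r2_full - r2_reduced) / q_dropped) / ((1 - r2_full) / df_resid)
p_value <- pf(f_stat, df1 = q_dropped, df2 = df_resid, lower.tail = FALSE)

# Mallows' Cp of the reduced model: SSE_p / MSE_full - (n - 2 * p'),
# where p' = 4 parameters (three slopes plus the intercept)
p_reduced  <- k_full - q_dropped + 1
cp_reduced <- (1 - r2_reduced) / (1 - r2_full) * df_resid - (n - 2 * p_reduced)

round(c(F = f_stat, p.value = p_value, Cp = cp_reduced), 4)
```

When a single predictor is dropped, Cp = p' + F - 1, here about 4 + 0.13 - 1 = 3.13, so a tiny F for pH and a Cp below p' = 4 are two views of the same fact: pH adds almost nothing to the model.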