Question

  1. Use multiple regression analysis to study the variation in mercury concentration in largemouth bass living in Florida lakes. The data (bass.csv on Canvas) come from a study of 53 lakes in Florida sampled from the summer of 1990 to the spring of 1991 (Lange, Royals, and Connor 1993). During this time, water samples were taken from the lakes and the following factors were measured: pH, alkalinity, amount of chlorophyll from suspended plant matter, and calcium concentration. At the same time, fish were caught and their flesh was tested for mercury. The response variable is the average mercury level in the flesh of bass in each of the 53 lakes (avg_mercury). You will use these data to determine whether the level of mercury in the fish can be predicted from the water chemistry. Import the data and fit the multiple regression model, fitting all possible models.

#question1
library("olsrr")

##
## Attaching package: 'olsrr'

## The following object is masked from 'package:datasets':
##
##     rivers

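# Read the lake data; the first row holds the column names
bass = read.csv("bass.csv", header = TRUE)
attach(bass)
# Full model with all four water-chemistry predictors
bass.lm = lm(Avg_Mercury ~ Alkalinity + pH + Calcium + Chlorophyll, data = bass)
# Fit every subset of the predictors and tabulate the fit statistics
bass.all = ols_step_all_possible(bass.lm)
bass.all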

##    Index N                        Predictors  R-Square Adj. R-Square Mallow's Cp
## 1      1 1                        Alkalinity 0.4254905     0.4142256    3.223111
## 2      2 1                                pH 0.3310853     0.3179693   11.804576
## 3      3 1                           Calcium 0.2386129     0.2236838   20.210347
## 4      4 1                       Chlorophyll 0.2130176     0.1975865   22.536973
## 6      5 2                Alkalinity Calcium 0.4478582     0.4257726    3.189877
## 7      6 2            Alkalinity Chlorophyll 0.4436411     0.4213868    3.573211
## 5      7 2                     Alkalinity pH 0.4292584     0.4064287    4.880607
## 9      8 2                    pH Chlorophyll 0.3444788     0.3182580   12.587099
## 8      9 2                        pH Calcium 0.3348995     0.3082955   13.457860
## 10    10 2               Calcium Chlorophyll 0.3009248     0.2729618   16.546176
## 13    11 3    Alkalinity Calcium Chlorophyll 0.4705171     0.4380997    3.130182
## 11    12 3             Alkalinity pH Calcium 0.4576077     0.4244001    4.303642
## 12    13 3         Alkalinity pH Chlorophyll 0.4436478     0.4095855    5.572602
## 14    14 3            pH Calcium Chlorophyll 0.3484270     0.3085347   14.228213
## 15    15 4 Alkalinity pH Calcium Chlorophyll 0.4719492     0.4279450    5.000000

plot(bass.all)

detach(bass)
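
As a rough cross-check on the table above, Mallow's Cp can be recovered from the R-Square column alone. The sketch below assumes n = 53 lakes and treats the four-predictor model as the full model; the helper name `cp_from_r2` is made up for illustration and is not part of olsrr.

```r
# Sketch: recover Mallow's Cp from R-squared values in the all-subsets table.
# Assumptions: n = 53 lakes; the full model has 4 predictors plus an intercept.
n       <- 53
k_full  <- 4
r2_full <- 0.4719492

# cp_from_r2 is a hypothetical helper, not an olsrr function.
# Cp = SSE_p / MSE_full - n + 2 * (k + 1), rewritten in terms of R-squared.
cp_from_r2 <- function(r2, k) {
  (n - k_full - 1) * (1 - r2) / (1 - r2_full) - n + 2 * (k + 1)
}

# Best model of each size, copied from the table
r2 <- c(0.4254905, 0.4478582, 0.4705171, 0.4719492)
k  <- 1:4
cp <- cp_from_r2(r2, k)
print(round(cp, 3))
```

The results should agree with the Mallow's Cp column to about three decimal places, which suggests the three columns of the table are mutually consistent.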

a. Give the $R^2$ and adjusted $R^2$ for the best models with one, two, three, and four predictors. Comment on these results (include the variables involved).

b. Suppose that you want to predict the average mercury level of fish in a new lake with alkalinity 3.0, calcium 2.5, chlorophyll 2.5, and pH 6.0. The predicted value for the model including all four predictors is .545 with prediction interval (.0164, 1.073). The predicted value for the model including only alkalinity, calcium, and chlorophyll is .532 with prediction interval (.0133, 1.051). Have the predicted values and the prediction intervals changed considerably between the two models? Explain why or why not, based on inspection of these results.

c. Explain how your results of a) and b) agree.

Solutions

Expert Solution

a)

The $R^2$ and adjusted $R^2$ for the best model of each size are:

Predictors                          R-Square   Adj. R-Square
Alkalinity                          0.4254905  0.4142256
Alkalinity Calcium                  0.4478582  0.4257726
Alkalinity Calcium Chlorophyll      0.4705171  0.4380997
Alkalinity pH Calcium Chlorophyll   0.4719492  0.4279450

$R^2$ increases with every added predictor, as it must, but the gain from adding pH as the fourth predictor is negligible (0.4705 to 0.4719). Adjusted $R^2$, which penalizes model size, is highest for the three-predictor model (Alkalinity, Calcium, Chlorophyll) and drops when pH is added. So the best model for predicting mercury levels is the one with predictors Alkalinity, Calcium, and Chlorophyll.
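
The adjusted values can be reproduced from the plain R-squared values with the usual adjustment formula. This is a minimal sketch, assuming n = 53 lakes; `adj_r2` is a hypothetical helper name, and the numbers are copied from the all-subsets output rather than refit.

```r
# Sketch: adjusted R-squared from R-squared, assuming n = 53 lakes.
n <- 53

# adj_r2 is a hypothetical helper; k is the number of predictors.
adj_r2 <- function(r2, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2  <- c(0.4254905, 0.4478582, 0.4705171, 0.4719492)  # best model of each size
adj <- adj_r2(r2, k = 1:4)
print(round(adj, 4))

# Size of the model with the largest adjusted R-squared
best_k <- which.max(adj)
print(best_k)
```

Because the penalty factor (n - 1) / (n - k - 1) grows with k, a tiny gain in R-squared from a fourth predictor is not enough to raise the adjusted value.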

b)

The predicted value for the model including all four predictors is .545 (.0164, 1.073).

The predicted value for the model including only alkalinity, calcium, and chlorophyll is .532 (.0133, 1.051).

The point predictions differ by only 0.013, and the two prediction intervals overlap almost completely, with widths of about 1.057 and 1.038. Thus, neither the predicted values nor the prediction intervals change considerably between the two models.
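
To put numbers on the comparison, the reported predictions and intervals can be compared directly. The sketch below only does arithmetic on the values quoted in the question; it does not refit either model, and the column names are chosen here for illustration.

```r
# Sketch: compare the two reported predictions and their prediction intervals.
# The numbers are copied from the question; nothing is refit here.
pred <- data.frame(
  model = c("four predictors", "three predictors"),
  fit   = c(0.545, 0.532),
  lwr   = c(0.0164, 0.0133),
  upr   = c(1.073, 1.051)
)

pred$width <- pred$upr - pred$lwr          # width of each interval
fit_diff   <- abs(diff(pred$fit))          # gap between point predictions

# Length of the overlap, as a share of the narrower interval
overlap       <- min(pred$upr) - max(pred$lwr)
overlap_share <- overlap / min(pred$width)

print(pred)
print(c(fit_diff = fit_diff, overlap_share = overlap_share))
```

The gap between the point predictions is small relative to either interval width, which is the sense in which the two models give practically the same prediction.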

c)

In part (a), the $R^2$ of the three-predictor model (Alkalinity, Calcium, Chlorophyll) and of the four-predictor model (Alkalinity, pH, Calcium, Chlorophyll) are almost the same (0.4705 vs. 0.4719), so pH adds very little explanatory power once the other three variables are in the model. This agrees with part (b), where the predictions and prediction intervals from the two models are nearly identical.
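
One way to make the link between (a) and (b) more concrete is a partial F-statistic for adding pH to the three-predictor model. The original solution does not carry out this test; the sketch below is an illustration computed from the two R-squared values alone, assuming n = 53 lakes.

```r
# Sketch: partial F-statistic for adding pH to the three-predictor model,
# computed from the R-squared values alone. Assumes n = 53 lakes.
n          <- 53
r2_full    <- 0.4719492   # Alkalinity, pH, Calcium, Chlorophyll
r2_reduced <- 0.4705171   # Alkalinity, Calcium, Chlorophyll
df_num     <- 1           # one extra predictor (pH)
df_den     <- n - 4 - 1   # residual df of the full model

f_stat  <- ((r2_full - r2_reduced) / df_num) / ((1 - r2_full) / df_den)
p_value <- pf(f_stat, df_num, df_den, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```

An F-statistic well below 1 is consistent with pH contributing little beyond the other three predictors, which fits both the nearly unchanged $R^2$ and the nearly unchanged predictions.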

