Please answer this using RStudio.
For the oyster data, calculate regression fits (simple regression) for the 2D and 3D data.
a.1) Give null and alternative hypotheses
a.2) Fit the regression model
a.3) Summarize the fit and evaluation of the regression model (is the linear relationship significant).
a.4) Calculate residuals and make a Q-Q plot. Is the normality assumption reasonable?
Actual 2D 3D
13.04 47.907 5.136699
11.71 41.458 4.795151
17.42 60.891 6.453115
7.23 29.949 2.895239
10.03 41.616 3.672746
15.59 48.070 5.728880
9.94 34.717 3.987582
7.53 27.230 2.678423
12.73 52.712 5.481545
12.66 41.500 5.016762
10.53 31.216 3.942783
10.84 41.852 4.052638
13.12 44.608 5.334558
8.48 35.343 3.527926
14.24 47.481 5.679636
11.11 40.976 4.013992
15.35 65.361 5.565995
15.44 50.910 6.303198
5.67 22.895 1.928109
8.26 34.804 3.450164
10.95 37.156 4.707532
7.97 29.070 3.019077
7.34 24.590 2.768160
13.21 48.082 4.945743
7.83 32.118 3.138463
11.38 45.112 4.410797
11.22 37.020 4.558251
9.25 39.333 3.449867
13.75 51.351 5.609681
14.37 53.281 5.292105
a.1) Give null and alternative hypotheses
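Let the regression model be
$\text{Actual} = \beta_0 + \beta_1 \,\text{X2D} + \beta_2 \,\text{X3D} + \varepsilon$
Null Hypothesis $H_0$: $\beta_1 = \beta_2 = 0$
Alternative Hypothesis $H_1$: $\beta_1 \neq 0$ or $\beta_2 \neq 0$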
a.2) Fit the regression model
Load the data above into a data frame df. Because the column names 2D and 3D begin with a digit, read.table renames them to X2D and X3D.
df = read.table("data.txt", header = TRUE)
Fit the linear regression model with Actual as the dependent variable and all remaining columns (X2D and X3D) as predictors.
model = lm(Actual ~ ., data = df)
The output of the model is,
> model
Call:
lm(formula = Actual ~ ., data = df)
Coefficients:
(Intercept) X2D X3D
-0.04645 0.06815 1.93979
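For a run that does not depend on an external file, the fit above can be reproduced along the lines of the sketch below. It passes the 30 rows from the question to read.table through its text argument; the contents of data.txt are not shown in the answer, so inlining the rows is an assumption about that file. Because the question mentions simple regression, the sketch also fits each predictor on its own. The names oyster_text, fit_2d and fit_3d are illustrative and not part of the original answer.

```r
# Oyster data exactly as listed in the question. With header = TRUE,
# read.table renames the columns 2D and 3D to X2D and X3D.
oyster_text <- "Actual 2D 3D
13.04 47.907 5.136699
11.71 41.458 4.795151
17.42 60.891 6.453115
7.23 29.949 2.895239
10.03 41.616 3.672746
15.59 48.070 5.728880
9.94 34.717 3.987582
7.53 27.230 2.678423
12.73 52.712 5.481545
12.66 41.500 5.016762
10.53 31.216 3.942783
10.84 41.852 4.052638
13.12 44.608 5.334558
8.48 35.343 3.527926
14.24 47.481 5.679636
11.11 40.976 4.013992
15.35 65.361 5.565995
15.44 50.910 6.303198
5.67 22.895 1.928109
8.26 34.804 3.450164
10.95 37.156 4.707532
7.97 29.070 3.019077
7.34 24.590 2.768160
13.21 48.082 4.945743
7.83 32.118 3.138463
11.38 45.112 4.410797
11.22 37.020 4.558251
9.25 39.333 3.449867
13.75 51.351 5.609681
14.37 53.281 5.292105
"
df <- read.table(text = oyster_text, header = TRUE)

# Multiple regression used in the answer: Actual on both predictors.
model <- lm(Actual ~ ., data = df)
print(coef(model))

# Simple regressions, one predictor at a time (illustrative names).
fit_2d <- lm(Actual ~ X2D, data = df)
fit_3d <- lm(Actual ~ X3D, data = df)
print(summary(fit_2d)$coefficients)
print(summary(fit_3d)$coefficients)
```

Each summary(...)$coefficients table holds the estimate, standard error, t value and p-value for the intercept and the single slope, so the significance of each predictor can be read off separately.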
a.3) Summarize the fit and evaluation of the regression model (is the linear relationship significant).
The estimated regression equation is,
Actual = -0.04645 + 0.06815 X2D + 1.93979 X3D
Generate the summary report of the model.
> summary(model)
Call:
lm(formula = Actual ~ ., data = df)
Residuals:
Min 1Q Median 3Q Max
-1.4490 -0.3333 -0.0215 0.3746 1.2475
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.04645 0.44195 -0.105 0.91708
X2D 0.06815 0.02297 2.967 0.00623 **
X3D 1.93979 0.20216 9.595 3.42e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5737 on 27 degrees of freedom
Multiple R-squared: 0.9651, Adjusted R-squared: 0.9625
F-statistic: 373.3 on 2 and 27 DF, p-value: < 2.2e-16
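The p-value of the overall F test is < 2.2e-16, which is far below any conventional significance level (for example 0.05). So we reject H0 and conclude that there is significant evidence that at least one of the slope coefficients (for X2D or X3D) is nonzero, i.e. the linear relationship is significant. The individual t tests agree: both X2D (p = 0.00623) and X3D (p = 3.42e-10) are significant, and the model explains about 96.5% of the variation in Actual (Multiple R-squared = 0.9651).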
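As a quick consistency check, the overall F statistic can be recovered from the reported R-squared through F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with n = 30 observations and k = 2 predictors. The sketch below uses only the rounded values printed in the summary, so it should agree with the reported 373.3 up to rounding. The variable names are illustrative.

```r
# Values taken from the summary output above.
r2 <- 0.9651   # Multiple R-squared (rounded as printed)
n  <- 30       # number of observations
k  <- 2        # number of predictors (X2D and X3D)

# Overall F statistic for H0: all slope coefficients are zero.
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

# Upper-tail probability of the F distribution with k and n - k - 1 df.
p_value <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

print(f_stat)
print(p_value)
```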
a.4) Calculate residuals and make a Q-Q plot. Is the normality assumption reasonable?
The residuals are obtained with model$residuals (equivalently, residuals(model)).
> model$residuals
          1           2           3           4           5           6
-0.14264365 -0.37059721  0.79889624 -0.38080399  0.11586169  1.24754228
          7           8           9          10          11          12
-0.11466858  0.52507979 -1.44904467  0.14666218  0.80083412  0.17286799
         13          14          15          16          17          18
-0.22161268 -0.72569721  0.03320704  0.57753467  0.14507837 -0.21006581
         19          20          21          22          23          24
 0.41597079 -0.75812123 -0.66744260  0.17888274  0.34093189  0.38584368
         25          26          27          28          29          30
-0.40042976 -0.20406170 -0.09860046 -0.07620812 -0.58484561  0.51964982
The normal Q-Q plot of the residuals is the second diagnostic plot produced by plot(model); plot(model, which = 2) draws it on its own.
Since the points lie close to a straight line, the normality assumption seems reasonable.
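The residual check can also be drawn directly with base R's qqnorm and qqline, as in the sketch below. It rebuilds the data frame from the values in the question so that it runs on its own. The Shapiro-Wilk test at the end is an optional numeric supplement that is not part of the original answer, and its result is not reported here.

```r
# Oyster data from the question, entered as vectors.
df <- data.frame(
  Actual = c(13.04, 11.71, 17.42, 7.23, 10.03, 15.59, 9.94, 7.53, 12.73, 12.66,
             10.53, 10.84, 13.12, 8.48, 14.24, 11.11, 15.35, 15.44, 5.67, 8.26,
             10.95, 7.97, 7.34, 13.21, 7.83, 11.38, 11.22, 9.25, 13.75, 14.37),
  X2D = c(47.907, 41.458, 60.891, 29.949, 41.616, 48.070, 34.717, 27.230,
          52.712, 41.500, 31.216, 41.852, 44.608, 35.343, 47.481, 40.976,
          65.361, 50.910, 22.895, 34.804, 37.156, 29.070, 24.590, 48.082,
          32.118, 45.112, 37.020, 39.333, 51.351, 53.281),
  X3D = c(5.136699, 4.795151, 6.453115, 2.895239, 3.672746, 5.728880,
          3.987582, 2.678423, 5.481545, 5.016762, 3.942783, 4.052638,
          5.334558, 3.527926, 5.679636, 4.013992, 5.565995, 6.303198,
          1.928109, 3.450164, 4.707532, 3.019077, 2.768160, 4.945743,
          3.138463, 4.410797, 4.558251, 3.449867, 5.609681, 5.292105)
)

model <- lm(Actual ~ ., data = df)
res <- residuals(model)   # same values as model$residuals

# Normal Q-Q plot of the residuals with a reference line.
qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)

# Optional supplement (an addition, not in the original answer):
# Shapiro-Wilk test of normality. A large p-value would be consistent
# with the visual impression from the Q-Q plot.
print(shapiro.test(res))
```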