"FATALS","CUTTING"
270,15692
183,16198
319,17235
103,18463
149,18959
124,19103
62,19618
298,20436
330,21229
486,18660
302,17551
373,17466
187,17388
347,15261
168,14731
234,14237
68,13216
162,12017
27,11845
40,11905
26,11881
41,11974
116,11892
84,11810
43,12076
292,12342
89,12608
148,13049
166,11656
32,13305
72,13390
27,13625
154,13865
44,14445
3,14424
3,14315
153,13761
11,12471
9,10960
17,9218
2,9054
5,9218
63,8817
41,7744
10,6907
3,6440
26,6021
52,5561
31,5309
3,5320
19,4784
10,4311
12,3663
88,3060
0,2779
41,2623
2,2058
5,1890
2,1535
0,1515
0,1595
23,1803
4,1495
0,1432
The file hmw7 prob3.txt contains data on the following two variables:
FATALS: the annual number of fatalities from gas and dust explosions in coal mines for the years 1915 to 1978.
CUTTING: the number of cutting machines in use.
(a) Fit the regression model using FATALS as the dependent variable and CUTTING as the independent variable.
(b) Using appropriate residual plots and formal tests, investigate the violation of any assumptions. Do any assumptions of the linear regression model appear to be violated? If so, which one (or ones)? (Hint: A plot of residuals versus fitted values can be used to check linearity, zero mean, and constant variance. A normal probability plot of the residuals can be used to check normality. There are also formal tests for the constant variance and normality assumptions that you can do in R.)
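Solution (a):
R code:
# Read the data (columns FATALS and CUTTING)
fatalaties <- read.csv("C:/Users/M1045151/Downloads/fatalaties.txt",
                       comment.char = "#")
head(fatalaties)   # inspect the first rows (View() opens the RStudio data viewer instead)
# Regress FATALS (response) on CUTTING (predictor)
reg <- lm(FATALS ~ CUTTING, data = fatalaties)
coefficients(reg)
summary(reg)
Output:
 (Intercept)      CUTTING
-47.70922903   0.01343187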
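The fitted regression equation is
FATALS = -47.70922903 + 0.01343187 * CUTTING
so each additional 1,000 cutting machines in use is associated with about 13.4 more fatalities per year.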
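As a check on the lm() output, the least-squares estimates can be computed directly from the closed-form formulas b1 = Sxy / Sxx and b0 = mean(y) - b1 * mean(x). This is a minimal sketch that assumes the data are typed in from the listing above as two vectors; the names fatals and cutting are hypothetical and are used instead of reading the file.

```r
# Data typed in from the listing above (1915-1978); vector names are hypothetical
fatals <- c(270, 183, 319, 103, 149, 124, 62, 298, 330, 486,
            302, 373, 187, 347, 168, 234, 68, 162, 27, 40,
            26, 41, 116, 84, 43, 292, 89, 148, 166, 32,
            72, 27, 154, 44, 3, 3, 153, 11, 9, 17,
            2, 5, 63, 41, 10, 3, 26, 52, 31, 3,
            19, 10, 12, 88, 0, 41, 2, 5, 2, 0,
            0, 23, 4, 0)
cutting <- c(15692, 16198, 17235, 18463, 18959, 19103, 19618, 20436, 21229, 18660,
             17551, 17466, 17388, 15261, 14731, 14237, 13216, 12017, 11845, 11905,
             11881, 11974, 11892, 11810, 12076, 12342, 12608, 13049, 11656, 13305,
             13390, 13625, 13865, 14445, 14424, 14315, 13761, 12471, 10960, 9218,
             9054, 9218, 8817, 7744, 6907, 6440, 6021, 5561, 5309, 5320,
             4784, 4311, 3663, 3060, 2779, 2623, 2058, 1890, 1535, 1515,
             1595, 1803, 1495, 1432)

# Closed-form least-squares estimates
sxx <- sum((cutting - mean(cutting))^2)
sxy <- sum((cutting - mean(cutting)) * (fatals - mean(fatals)))
b1 <- sxy / sxx                          # slope
b0 <- mean(fatals) - b1 * mean(cutting)  # intercept

# The hand-computed values should agree with lm()
fit <- lm(fatals ~ cutting)
print(c(intercept = b0, slope = b1))
print(coef(fit))
```

The intercept formula also shows why the fitted line passes through the point (mean CUTTING, mean FATALS).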
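Solution (b):
par(mfrow = c(2, 2))   # 2x2 grid for the four diagnostic plots
plot(reg)              # Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage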
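Residuals vs Fitted: no clear systematic (non-random) trend is observed, so the linearity and zero-mean assumptions appear acceptable. However, the spread of the residuals widens as the fitted values increase, so the constant variance assumption is violated.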
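A rough numeric companion to the Residuals vs Fitted plot: split the observations at the median fitted value and compare the residual standard deviations of the two halves, using var.test() as an informal F test. This split-sample comparison is an illustrative choice (in the spirit of a Goldfeld-Quandt test, but simpler) and is not part of the original solution; the vectors fatals and cutting are hypothetical names for the data typed in from the listing above.

```r
# Data typed in from the listing above; vector names are hypothetical
fatals <- c(270, 183, 319, 103, 149, 124, 62, 298, 330, 486,
            302, 373, 187, 347, 168, 234, 68, 162, 27, 40,
            26, 41, 116, 84, 43, 292, 89, 148, 166, 32,
            72, 27, 154, 44, 3, 3, 153, 11, 9, 17,
            2, 5, 63, 41, 10, 3, 26, 52, 31, 3,
            19, 10, 12, 88, 0, 41, 2, 5, 2, 0,
            0, 23, 4, 0)
cutting <- c(15692, 16198, 17235, 18463, 18959, 19103, 19618, 20436, 21229, 18660,
             17551, 17466, 17388, 15261, 14731, 14237, 13216, 12017, 11845, 11905,
             11881, 11974, 11892, 11810, 12076, 12342, 12608, 13049, 11656, 13305,
             13390, 13625, 13865, 14445, 14424, 14315, 13761, 12471, 10960, 9218,
             9054, 9218, 8817, 7744, 6907, 6440, 6021, 5561, 5309, 5320,
             4784, 4311, 3663, 3060, 2779, 2623, 2058, 1890, 1535, 1515,
             1595, 1803, 1495, 1432)

fit <- lm(fatals ~ cutting)
e <- residuals(fit)
high <- fitted(fit) > median(fitted(fit))  # TRUE for the upper half of fitted values

# Residual spread in each half
sd_low  <- sd(e[!high])
sd_high <- sd(e[high])
print(c(low = sd_low, high = sd_high))

# Informal F test for equal variances (residuals from one fit are not fully
# independent, so treat the p-value as a rough guide only)
vt <- var.test(e[high], e[!high])
print(vt)
```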
Normal Q-Q: the residual points lie close to the straight dashed line, so the normality assumption does not appear to be violated.
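The hint mentions a formal test for normality; one standard choice in base R is the Shapiro-Wilk test, shapiro.test(), applied to the residuals. This sketch assumes the data are typed in from the listing above under the hypothetical names fatals and cutting; the test's result is not reported in the original solution.

```r
# Data typed in from the listing above; vector names are hypothetical
fatals <- c(270, 183, 319, 103, 149, 124, 62, 298, 330, 486,
            302, 373, 187, 347, 168, 234, 68, 162, 27, 40,
            26, 41, 116, 84, 43, 292, 89, 148, 166, 32,
            72, 27, 154, 44, 3, 3, 153, 11, 9, 17,
            2, 5, 63, 41, 10, 3, 26, 52, 31, 3,
            19, 10, 12, 88, 0, 41, 2, 5, 2, 0,
            0, 23, 4, 0)
cutting <- c(15692, 16198, 17235, 18463, 18959, 19103, 19618, 20436, 21229, 18660,
             17551, 17466, 17388, 15261, 14731, 14237, 13216, 12017, 11845, 11905,
             11881, 11974, 11892, 11810, 12076, 12342, 12608, 13049, 11656, 13305,
             13390, 13625, 13865, 14445, 14424, 14315, 13761, 12471, 10960, 9218,
             9054, 9218, 8817, 7744, 6907, 6440, 6021, 5561, 5309, 5320,
             4784, 4311, 3663, 3060, 2779, 2623, 2058, 1890, 1535, 1515,
             1595, 1803, 1495, 1432)

fit <- lm(fatals ~ cutting)

# Shapiro-Wilk test; H0: the residuals come from a normal distribution
sw <- shapiro.test(residuals(fit))
print(sw)
# A p-value below the chosen level (e.g. 0.05) would be evidence against normality
```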
Residuals vs Leverage: a few observations stand out as outliers that have a noticeable influence on the fitted regression line.
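The influential points seen in the Residuals vs Leverage plot can also be listed numerically with cooks.distance() and hatvalues(). The 4/n cutoff for Cook's distance used below is a common rule of thumb, assumed here rather than taken from the source, and fatals and cutting are hypothetical names for the data typed in from the listing above.

```r
# Data typed in from the listing above; vector names are hypothetical
fatals <- c(270, 183, 319, 103, 149, 124, 62, 298, 330, 486,
            302, 373, 187, 347, 168, 234, 68, 162, 27, 40,
            26, 41, 116, 84, 43, 292, 89, 148, 166, 32,
            72, 27, 154, 44, 3, 3, 153, 11, 9, 17,
            2, 5, 63, 41, 10, 3, 26, 52, 31, 3,
            19, 10, 12, 88, 0, 41, 2, 5, 2, 0,
            0, 23, 4, 0)
cutting <- c(15692, 16198, 17235, 18463, 18959, 19103, 19618, 20436, 21229, 18660,
             17551, 17466, 17388, 15261, 14731, 14237, 13216, 12017, 11845, 11905,
             11881, 11974, 11892, 11810, 12076, 12342, 12608, 13049, 11656, 13305,
             13390, 13625, 13865, 14445, 14424, 14315, 13761, 12471, 10960, 9218,
             9054, 9218, 8817, 7744, 6907, 6440, 6021, 5561, 5309, 5320,
             4784, 4311, 3663, 3060, 2779, 2623, 2058, 1890, 1535, 1515,
             1595, 1803, 1495, 1432)

fit <- lm(fatals ~ cutting)
d <- cooks.distance(fit)   # influence of each observation on the fit
h <- hatvalues(fit)        # leverage of each observation

cutoff <- 4 / length(d)    # rule-of-thumb threshold (assumption)
flagged <- which(d > cutoff)
print(data.frame(obs = flagged,
                 fatals = fatals[flagged],
                 cutting = cutting[flagged],
                 cooks_d = round(d[flagged], 3),
                 leverage = round(h[flagged], 3)))
```

Flagged rows are candidates for closer inspection, not automatic deletions.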
Conclusion: the homogeneity of variance (constant variance) assumption is violated.
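For a formal test of constant variance, a minimal sketch of the studentized (Koenker) Breusch-Pagan test can be written in base R: regress the squared residuals on CUTTING and compare n * R^2 with a chi-squared distribution with 1 degree of freedom (one predictor). The vectors fatals and cutting are hypothetical names for the data typed in from the listing above; if the lmtest package is installed, lmtest::bptest(fit) with its default studentize = TRUE should report the same statistic.

```r
# Data typed in from the listing above; vector names are hypothetical
fatals <- c(270, 183, 319, 103, 149, 124, 62, 298, 330, 486,
            302, 373, 187, 347, 168, 234, 68, 162, 27, 40,
            26, 41, 116, 84, 43, 292, 89, 148, 166, 32,
            72, 27, 154, 44, 3, 3, 153, 11, 9, 17,
            2, 5, 63, 41, 10, 3, 26, 52, 31, 3,
            19, 10, 12, 88, 0, 41, 2, 5, 2, 0,
            0, 23, 4, 0)
cutting <- c(15692, 16198, 17235, 18463, 18959, 19103, 19618, 20436, 21229, 18660,
             17551, 17466, 17388, 15261, 14731, 14237, 13216, 12017, 11845, 11905,
             11881, 11974, 11892, 11810, 12076, 12342, 12608, 13049, 11656, 13305,
             13390, 13625, 13865, 14445, 14424, 14315, 13761, 12471, 10960, 9218,
             9054, 9218, 8817, 7744, 6907, 6440, 6021, 5561, 5309, 5320,
             4784, 4311, 3663, 3060, 2779, 2623, 2058, 1890, 1535, 1515,
             1595, 1803, 1495, 1432)

fit <- lm(fatals ~ cutting)
e2 <- residuals(fit)^2

# Auxiliary regression of the squared residuals on the predictor
aux <- lm(e2 ~ cutting)
n <- length(e2)
bp_stat <- n * summary(aux)$r.squared                 # Koenker's LM statistic
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)   # df = number of predictors
print(c(BP = bp_stat, df = 1, p.value = bp_p))
# H0: constant error variance; a small p-value is evidence of heteroscedasticity
```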