Consider the pressure drop data in Table B9.
a) Build a tentative regression model.
b) Perform a Box-Cox analysis.
c) Remove outliers using Cook's distance.
d) Build a new model using the Box-Cox result and the new subset.
e) Check the residuals (3 parts).
Please do this problem in R. You can use the lines below to enter the data in R.
x1<-c(2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,5.6,5.6,5.6,5.6,4.3,4.3,4.3,4.3,4.3,5.6,5.6,5.6,
5.6,5.6,5.6,5.6,5.6,4.3,4.3,4.3,4.3,2.4,2.4,2.4,2.4,5.6,5.6,5.6,5.6,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15)
x2<-c(10,10,10,10,10,10,10,10,10,10,10,10,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,1.25,1.25,1.25,1.25,2.63,2.63,2.63,2.63,2.63,10.1,10.1,10.1,10.1,
10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,112,112,112,112,112,112,112,112,112)
x3<-c(0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,
0.34,0.25,0.25,0.25,0.25,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.55,0.55,0.55,0.55,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34)
x4<-c(1,1,1,0.246,0.379,0.474,0.141,0.234,0.311,0.076,0.132,0.184,0.679,0.804,0.89,0.514,0.672,0.801,0.346,0.506,0.669,1,1,1,0.848,0.737,0.651,0.554,0.748,0.682,0.524,
0.472,0.398,0.789,0.677,0.59,0.523,0.789,0.677,0.59,0.523,0.741,0.617,0.524,0.457,0.615,0.473,0.381,0.32,0.789,0.677,0.59,0.523,0.68,0.803,0.889,0.514,0.672,0.801,0.306,
0.506,0.668)
y<-c(28.9,31,26.4,27.2,26.1,23.2,19.7,22.1,22.8,29.2,23.6,23.6,24.2,22.1,20.9,17.6,15.7,15.8,14,17.1,18.3,33.8,31.7,28.1,18.1,16.5,15.4,15,19.1,16.2,16.3,15.8,15.4,19.2,
8.4,15,12,21.9,21.3,21.6,19.8,21.6,17.3,20,18.6,22.1,14.7,15.8,13.2,30.8,27.5,25.2,22.8,41.7,33.7,29.7,41.8,37.1,40.1,42.7,48.6,42.4)
a)
x1<-c(2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,5.6,5.6,5.6,5.6,4.3,4.3,4.3,4.3,4.3,5.6,5.6,5.6,
5.6,5.6,5.6,5.6,5.6,4.3,4.3,4.3,4.3,2.4,2.4,2.4,2.4,5.6,5.6,5.6,5.6,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15)
x2<-c(10,10,10,10,10,10,10,10,10,10,10,10,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,1.25,1.25,1.25,1.25,2.63,2.63,2.63,2.63,2.63,10.1,10.1,10.1,10.1,
10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,112,112,112,112,112,112,112,112,112)
x3<-c(0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,
0.34,0.25,0.25,0.25,0.25,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.55,0.55,0.55,0.55,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34)
x4<-c(1,1,1,0.246,0.379,0.474,0.141,0.234,0.311,0.076,0.132,0.184,0.679,0.804,0.89,0.514,0.672,0.801,0.346,0.506,0.669,1,1,1,0.848,0.737,0.651,0.554,0.748,0.682,0.524,
0.472,0.398,0.789,0.677,0.59,0.523,0.789,0.677,0.59,0.523,0.741,0.617,0.524,0.457,0.615,0.473,0.381,0.32,0.789,0.677,0.59,0.523,0.68,0.803,0.889,0.514,0.672,0.801,0.306,
0.506,0.668)
y<-c(28.9,31,26.4,27.2,26.1,23.2,19.7,22.1,22.8,29.2,23.6,23.6,24.2,22.1,20.9,17.6,15.7,15.8,14,17.1,18.3,33.8,31.7,28.1,18.1,16.5,15.4,15,19.1,16.2,16.3,15.8,15.4,19.2,
8.4,15,12,21.9,21.3,21.6,19.8,21.6,17.3,20,18.6,22.1,14.7,15.8,13.2,30.8,27.5,25.2,22.8,41.7,33.7,29.7,41.8,37.1,40.1,42.7,48.6,42.4)
lm_model=lm(y~x1+x2+x3+x4) # regression model
lm_model
Call:
lm(formula = y ~ x1 + x2 + x3 + x4)
Coefficients:
(Intercept)           x1           x2           x3           x4
     5.8945      -0.4779       0.1827      35.4028       5.8439
b)
library(MASS)
bc=boxcox(lm_model) # Box cox
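The object returned by boxcox() is a list with the lambda grid in bc$x and the profile log-likelihood in bc$y. As a minimal sketch, the best lambda and the approximate 95% interval marked on the plot can be read off as follows. The helper name boxcox_interval is hypothetical (it is not part of MASS), and the interval uses the usual chi-square cutoff on the log-likelihood.

```r
# Hypothetical helper: summarize a Box-Cox profile.
# bc is assumed to be a list with x = lambda values and y = log-likelihoods,
# which is the shape of the object returned by MASS::boxcox().
boxcox_interval <- function(bc, level = 0.95) {
  best <- which.max(bc$y)                          # index of the maximum
  cutoff <- max(bc$y) - qchisq(level, df = 1) / 2  # likelihood-ratio cutoff
  inside <- bc$x[bc$y > cutoff]                    # lambdas inside the interval
  list(lambda = bc$x[best], lower = min(inside), upper = max(inside))
}
```

Calling boxcox_interval(bc) on the result above returns the estimate and its interval. If the interval contains 1, the data give little reason to transform; if it contains 0, a log transform is a convenient choice.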
c)
cooksd=cooks.distance(lm_model) # Cook's distance
n=length(y) # sample size
plot(cooksd, pch="*", cex=2, main="Influential Obs by Cook's distance") # plot Cook's distance
abline(h = 4/n, col="red") # add cutoff line
text(x=1:length(cooksd)+1, y=cooksd, labels=ifelse(cooksd>4/n, names(cooksd),""), col="red") # label points above the cutoff
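The plot only flags the influential points; part c) also asks for them to be taken out. A minimal sketch of that step, using the same 4/n cutoff as the plot, is given here. The helper name drop_influential is hypothetical, and it assumes the data frame has no missing values, so that the Cook's distances line up with the rows.

```r
# Hypothetical helper: fit a linear model and drop rows whose Cook's distance
# exceeds 4/n. Assumes `data` has no missing values in the model variables.
drop_influential <- function(formula, data) {
  fit <- lm(formula, data = data)
  d <- cooks.distance(fit)
  stopifnot(length(d) == nrow(data))  # guard against rows dropped by na.action
  cutoff <- 4 / length(d)
  keep <- d <= cutoff
  list(kept = data[keep, , drop = FALSE],  # subset without flagged rows
       dropped = which(!keep),             # row positions that were removed
       cutoff = cutoff)
}
```

With the vectors above, this could be called as drop_influential(y ~ x1 + x2 + x3 + x4, data.frame(y, x1, x2, x3, x4)). The kept element is then the new subset for part d).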
d)
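trans=bc$x[which.max(bc$y)] # lambda that maximizes the Box-Cox log-likelihood
mnew=lm(((y^trans-1)/trans)~x1+x2+x3+x4) # refit on the Box-Cox-transformed response
mnew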
Call:
lm(formula = ((y^trans - 1)/trans) ~ x1 + x2 + x3 + x4)
Coefficients:
(Intercept)           x1           x2           x3           x4
    4.36691     -0.17633      0.06841     14.80114      2.33205
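The expression (y^trans-1)/trans divides by zero if the chosen lambda happens to be 0, where the Box-Cox transform is defined as log(y). A small sketch of a transform that covers both cases follows. The name bc_transform and the tolerance eps are assumptions made for illustration.

```r
# Hypothetical helper: Box-Cox transform of a positive response.
# Falls back to log(y) when lambda is (numerically) zero.
bc_transform <- function(y, lambda, eps = 1e-8) {
  stopifnot(all(y > 0))  # Box-Cox requires a strictly positive response
  if (abs(lambda) < eps) log(y) else (y^lambda - 1) / lambda
}
```

To refit on a reduced data set, suppose the rows retained after the Cook's distance step are in a data frame called sub (a hypothetical name) with columns y, x1, x2, x3 and x4. The model could then be fitted as lm(bc_transform(y, trans) ~ x1 + x2 + x3 + x4, data = sub). Reusing the lambda estimated on the full data is a simplification; it could also be re-estimated on the subset.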
e)
qqnorm(lm_model$residuals) # normal Q-Q plot, original model
qqline(lm_model$residuals)
qqnorm(mnew$residuals) # normal Q-Q plot, Box-Cox model
qqline(mnew$residuals)
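The Q-Q plots cover normality only. One common reading of "3 parts" is normality, constant variance and independence; that reading is an assumption, since the question does not spell it out. A minimal sketch that draws one plot for each is shown here. The helper name residual_checks is hypothetical.

```r
# Hypothetical helper: three basic residual diagnostics for an lm fit.
# Returns the standardized residuals and a Shapiro-Wilk p-value;
# draws the three plots when plot = TRUE.
residual_checks <- function(fit, plot = TRUE) {
  r <- rstandard(fit)  # standardized residuals
  f <- fitted(fit)
  if (plot) {
    op <- par(mfrow = c(1, 3))
    on.exit(par(op))  # restore the plotting layout afterwards
    qqnorm(r, main = "Normality"); qqline(r)
    plot(f, r, xlab = "Fitted", ylab = "Std. residual",
         main = "Constant variance"); abline(h = 0, lty = 2)
    plot(seq_along(r), r, type = "b", xlab = "Observation order",
         ylab = "Std. residual", main = "Independence"); abline(h = 0, lty = 2)
  }
  list(std_resid = r, shapiro_p = shapiro.test(r)$p.value)
}
```

Running residual_checks(lm_model) and residual_checks(mnew) side by side shows whether the transformation improved the spread and the tails. The order plot is only meaningful if the rows are in the order the data were collected.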