Question

Consider the pressure drop data in Table B9.

a) Build a tentative regression model.

b) Perform a Box-Cox transformation.

c) Remove outliers using Cook's distance.

d) Build a new model on the Box-Cox result and on the new subset.

e) Check the residuals (3 parts).

Please do this problem in R. You can use the lines below to enter the data in R.

x1<-c(2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,5.6,5.6,5.6,5.6,4.3,4.3,4.3,4.3,4.3,5.6,5.6,5.6,
5.6,5.6,5.6,5.6,5.6,4.3,4.3,4.3,4.3,2.4,2.4,2.4,2.4,5.6,5.6,5.6,5.6,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15)

x2<-c(10,10,10,10,10,10,10,10,10,10,10,10,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,1.25,1.25,1.25,1.25,2.63,2.63,2.63,2.63,2.63,10.1,10.1,10.1,10.1,
10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,112,112,112,112,112,112,112,112,112)

x3 <-c(0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,
0.34,0.25,0.25,0.25,0.25,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.55,0.55,0.55,0.55,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34)


x4<-c(1,1,1,0.246,0.379,0.474,0.141,0.234,0.311,0.076,0.132,0.184,0.679,0.804,0.89,0.514,0.672,0.801,0.346,0.506,0.669,1,1,1,0.848,0.737,0.651,0.554,0.748,0.682,0.524,
0.472,0.398,0.789,0.677,0.59,0.523,0.789,0.677,0.59,0.523,0.741,0.617,0.524,0.457,0.615,0.473,0.381,0.32,0.789,0.677,0.59,0.523,0.68,0.803,0.889,0.514,0.672,0.801,0.306,
0.506,0.668)


y<-c(28.9,31,26.4,27.2,26.1,23.2,19.7,22.1,22.8,29.2,23.6,23.6,24.2,22.1,20.9,17.6,15.7,15.8,14,17.1,18.3,33.8,31.7,28.1,18.1,16.5,15.4,15,19.1,16.2,16.3,15.8,15.4,19.2,
8.4,15,12,21.9,21.3,21.6,19.8,21.6,17.3,20,18.6,22.1,14.7,15.8,13.2,30.8,27.5,25.2,22.8,41.7,33.7,29.7,41.8,37.1,40.1,42.7,48.6,42.4)



Solutions

a)

x1<-c(2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15,5.6,5.6,5.6,5.6,4.3,4.3,4.3,4.3,4.3,5.6,5.6,5.6,
5.6,5.6,5.6,5.6,5.6,4.3,4.3,4.3,4.3,2.4,2.4,2.4,2.4,5.6,5.6,5.6,5.6,2.14,4.14,8.15,2.14,4.14,8.15,2.14,4.14,8.15)
x2<-c(10,10,10,10,10,10,10,10,10,10,10,10,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,2.63,1.25,1.25,1.25,1.25,2.63,2.63,2.63,2.63,2.63,10.1,10.1,10.1,10.1,
10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,112,112,112,112,112,112,112,112,112)
x3 <-c(0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,
0.34,0.25,0.25,0.25,0.25,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.55,0.55,0.55,0.55,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34,0.34)
x4<-c(1,1,1,0.246,0.379,0.474,0.141,0.234,0.311,0.076,0.132,0.184,0.679,0.804,0.89,0.514,0.672,0.801,0.346,0.506,0.669,1,1,1,0.848,0.737,0.651,0.554,0.748,0.682,0.524,
0.472,0.398,0.789,0.677,0.59,0.523,0.789,0.677,0.59,0.523,0.741,0.617,0.524,0.457,0.615,0.473,0.381,0.32,0.789,0.677,0.59,0.523,0.68,0.803,0.889,0.514,0.672,0.801,0.306,
0.506,0.668)
y<-c(28.9,31,26.4,27.2,26.1,23.2,19.7,22.1,22.8,29.2,23.6,23.6,24.2,22.1,20.9,17.6,15.7,15.8,14,17.1,18.3,33.8,31.7,28.1,18.1,16.5,15.4,15,19.1,16.2,16.3,15.8,15.4,19.2,
8.4,15,12,21.9,21.3,21.6,19.8,21.6,17.3,20,18.6,22.1,14.7,15.8,13.2,30.8,27.5,25.2,22.8,41.7,33.7,29.7,41.8,37.1,40.1,42.7,48.6,42.4)
lm_model=lm(y~x1+x2+x3+x4)     # regression model
lm_model

Call:
lm(formula = y ~ x1 + x2 + x3 + x4)

Coefficients:
(Intercept)           x1           x2           x3           x4
     5.8945      -0.4779       0.1827      35.4028       5.8439
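Printing `lm_model` shows only the coefficient estimates. To judge a tentative model you would usually also look at the individual t tests, R² and the overall F test, all of which `summary()` provides. The sketch below collects them in one place; `tentative_fit()` is a hypothetical helper introduced here, and it is demonstrated on R's built-in `mtcars` data only so that the block runs on its own.

```r
# Hypothetical helper: fit a first-order linear model and collect the
# summaries usually reported for a tentative model.
tentative_fit <- function(formula, data) {
  fit <- lm(formula, data = data)
  s <- summary(fit)
  fstat <- s$fstatistic                  # named: value, numdf, dendf
  list(
    fit = fit,
    coefficients = s$coefficients,       # estimate, std. error, t value, p-value
    r_squared = s$r.squared,
    adj_r_squared = s$adj.r.squared,
    # p-value of the overall F test (is at least one regressor useful?)
    f_p_value = unname(pf(fstat["value"], fstat["numdf"], fstat["dendf"],
                          lower.tail = FALSE))
  )
}

# Demonstration on the built-in mtcars data
res <- tentative_fit(mpg ~ wt + hp, data = mtcars)
print(res$coefficients)
```

With the question's vectors, a call such as `tentative_fit(y ~ x1 + x2 + x3 + x4, data.frame(y, x1, x2, x3, x4))` fits the same model as `lm_model`, so it returns the coefficients shown above together with their standard errors and p-values.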

b)

library(MASS)
bc=boxcox(lm_model)     # profile log-likelihood of the Box-Cox lambda (plotted by default)
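`boxcox()` draws the profile log-likelihood but does not print the estimate. A minimal sketch, assuming you want the maximizing lambda and its approximate 95% likelihood interval as numbers, is below. `boxcox_lambda()` is a hypothetical helper, and the finer grid (step 0.01) is an assumption rather than the `boxcox()` default; `mtcars` is used only so the block is self-contained.

```r
library(MASS)  # provides boxcox()

# Hypothetical helper (not part of MASS): returns the lambda that maximizes
# the Box-Cox profile log-likelihood and an approximate 95% interval for it.
boxcox_lambda <- function(model, lambda = seq(-2, 2, by = 0.01)) {
  bc <- boxcox(model, lambda = lambda, plotit = FALSE)
  best <- bc$x[which.max(bc$y)]
  # Likelihood-ratio interval: keep every lambda whose log-likelihood is
  # within qchisq(0.95, 1) / 2 of the maximum. The interval is clipped to
  # the ends of the grid if it extends beyond them.
  cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
  ci <- range(bc$x[bc$y >= cutoff])
  list(lambda = best, ci = ci)
}

# Demonstration on the built-in mtcars data
print(boxcox_lambda(lm(mpg ~ wt + hp, data = mtcars)))
```

For this problem, `boxcox_lambda(lm_model)` would give the estimate and interval. If the interval contains 1, the evidence for transforming is weak; if it contains a convenient value such as 0 (log) or 0.5 (square root), that value is often used in place of the exact maximizer.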

c)
cooksd=cooks.distance(lm_model) # Cook's distance for each observation
n=length(y) # sample size
plot(cooksd, pch="*", cex=2, main="Influential Obs by Cook's distance") # plot Cook's distance
abline(h = 4/n, col="red") # rule-of-thumb cutoff 4/n
text(x=1:length(cooksd)+1, y=cooksd, labels=ifelse(cooksd>4/n, names(cooksd),""), col="red") # label points above the cutoff
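The code above only plots the Cook's distances; it does not actually remove anything. One way to take the flagged points out is sketched below. `influential_obs()` is a hypothetical helper, its default 4/n cutoff mirrors the red line in the plot, and D > 1 is a common, more conservative alternative. The demonstration uses the built-in `mtcars` data.

```r
# Hypothetical helper: indices of observations whose Cook's distance
# exceeds `cutoff` (default: the 4/n rule of thumb).
influential_obs <- function(model, cutoff = 4 / nobs(model)) {
  d <- cooks.distance(model)
  which(d > cutoff)
}

# Demonstration on the built-in mtcars data
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(influential_obs(fit))
```

With the question's data, `keep <- setdiff(seq_along(y), influential_obs(lm_model))` followed by `lm(y ~ x1 + x2 + x3 + x4, subset = keep)` refits on the reduced subset. Using `setdiff()` instead of negative indexing avoids an empty fit when nothing is flagged, because `-integer(0)` selects no rows.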

d)

trans=bc$x[which.max(bc$y)]   # lambda that maximizes the Box-Cox log-likelihood
mnew=lm(((y^trans-1)/trans)~x1+x2+x3+x4)   # refit with the transformed response
mnew

Call:
lm(formula = ((y^trans - 1)/trans) ~ x1 + x2 + x3 + x4)

Coefficients:
(Intercept)           x1           x2           x3           x4
    4.36691     -0.17633      0.06841     14.80114      2.33205
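Part d asks for a model on the Box-Cox scale fitted to the reduced subset, while `mnew` above uses all observations. The sketch below combines the two steps under stated assumptions: outliers are flagged with the 4/n Cook's distance cutoff from the untransformed fit (as in part c, which uses `lm_model`), the response is a bare column name, and the data contain no missing values. `bc_transform()` and `fit_bc_subset()` are hypothetical helpers, and `mtcars` is used only so the block runs on its own.

```r
# Hypothetical helper: Box-Cox transform of a positive response
# (log(y) when lambda is 0, the limiting case of (y^lambda - 1) / lambda).
bc_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Hypothetical helper: drop observations whose Cook's distance in the
# untransformed fit exceeds `cutoff`, then refit with the transformed
# response. Assumes the left-hand side of `formula` is a bare column name
# and that `data` has no missing values (so rows and distances align).
fit_bc_subset <- function(formula, data, lambda, cutoff = 4 / nrow(data)) {
  base_fit <- lm(formula, data = data)
  keep <- cooks.distance(base_fit) <= cutoff
  resp <- all.vars(formula)[1]
  sub <- data[keep, , drop = FALSE]
  sub[[resp]] <- bc_transform(sub[[resp]], lambda)
  lm(formula, data = sub)
}

# Demonstration on the built-in mtcars data (log transform, lambda = 0)
print(fit_bc_subset(mpg ~ wt + hp, mtcars, lambda = 0))
```

For this problem, the corresponding call would be `fit_bc_subset(y ~ x1 + x2 + x3 + x4, data.frame(y, x1, x2, x3, x4), lambda = trans)`. Whether Cook's distance should instead be recomputed on the transformed model is a judgment call.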


e)

qqnorm(lm_model$residuals)   # normal probability plot, original model
qqline(lm_model$residuals)

qqnorm(mnew$residuals)       # normal probability plot, Box-Cox model
qqline(mnew$residuals)
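The question does not spell out the "3 parts"; one common reading is a normal probability plot, residuals versus fitted values, and residuals versus each regressor. The qqnorm() calls above cover only the first. A minimal sketch of all three, using externally studentized residuals, follows; `check_residuals()` is a hypothetical helper.

```r
# Hypothetical helper: three residual checks for an lm fit, using
# externally studentized residuals (rstudent):
#   1. normal probability (Q-Q) plot
#   2. residuals versus fitted values
#   3. residuals versus each regressor
check_residuals <- function(model) {
  r <- rstudent(model)
  regressors <- model.frame(model)[-1]   # drop the response column
  n_panels <- 2 + ncol(regressors)
  op <- par(mfrow = c(ceiling(n_panels / 3), 3))
  on.exit(par(op))                       # restore the previous layout
  qqnorm(r, main = "Normal Q-Q plot")
  qqline(r)
  plot(fitted(model), r, xlab = "Fitted values", ylab = "Studentized residuals")
  abline(h = 0, lty = 2)
  for (v in names(regressors)) {
    plot(regressors[[v]], r, xlab = v, ylab = "Studentized residuals")
    abline(h = 0, lty = 2)
  }
  invisible(r)
}
```

Calling `check_residuals(lm_model)` and then `check_residuals(mnew)` allows a side-by-side comparison of the original and transformed fits; a funnel shape in the residuals-versus-fitted panel is the usual sign that a variance-stabilizing transformation is needed.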

