Question

The data file glakes.txt on the textbook website provides the cargo volume (in tons) and the time (in hours) it takes to load and unload the cargo for each of the 31 ships that docked at a Canadian port on the Great Lakes. Note that the variable measuring the cargo volume is called “Tonnage” in the data file. We are interested in learning how Tonnage affects the Time.

(a). Fit a simple linear regression model (Time ~ Tonnage), that is, $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ with $y$ = Time and $x$ = Tonnage, and provide the summary output of the model.

(b). Draw the four default diagnostic plots for the model in (a). Does this model seem to fit the data well? If not, list any problems you see.

(c). To improve the model, we consider transforming the variables. First, we consider only transforming the response variable by using the R function boxcox(y~x, data = dataName) from the MASS library.

(1) What transformation does this boxcox plot suggest? Is it a transformation of the response variable or predictor variable?

(2) What is the goal of applying the transformation in (1)? Please select one of the following answers.

A: Make the distribution of transformed Time as close to a normal distribution as possible.

B: Make the distribution of transformed Tonnage as close to a normal distribution as possible.

C: Make the conditional distribution of transformed Time given Tonnage as close to a normal distribution as possible.

(3) Fit a model with transformed variable(s) and provide the summary output of the model.

(4) Draw the four default diagnostic plots for the model in (3). Does this model seem to fit the data well? If not, list any problems you see.

(d) Now we consider transforming both the response variable and the predictor variable.

(1) Use R function boxcox(x~1, data = dataName) to identify an appropriate transformation for the predictor variable.

(2) What is the goal of applying the transformation in (1)? Please select one of the following answers.

A: Make the distribution of transformed Time as close to a normal distribution as possible.

B: Make the distribution of transformed Tonnage as close to a normal distribution as possible.

C: Make the conditional distribution of transformed Time given Tonnage as close to a normal distribution as possible.

(3) Use the inverse response plot to identify an appropriate transformation for the response variable. Hint: First fit a new model with the transformed predictor variable. Then use the R function invResPlot(newModelName) or inverseResponsePlot(newModelName) from the car library to draw the inverse response plot. You may also load the alr4 library instead, since loading the alr4 library automatically loads the car library.

(e) Fit a model with transformed predictor and response variable and provide the summary output of the model. Use the power transformations identified in (d).

(f) Draw the four default diagnostic plots for the model in (e). Does this model seem to fit the data well? If not, list any problems you see.

(g) Interpret the slope of the model in (e) in terms of the percentage change of the variables.

dataset:

glakes.txt:

Case    Tonnage Time
1       2213    17
2       3256    30
3       12203   68
4       7021    64
5       529     11
6       3192    55
7       547     20
8       4682    49
9       6112    69
10      5375    68
11      6666    49
12      3930    43
13      4263    31
14      1849    17
15      663     13
16      329     13
17      2790    43
18      353     15
19      2829    30
20      363     20
21      7084    41
22      1328    15
23      294     13
24      268     11
25      1732    24
26      507     11
27      1486    28
28      536     22
29      851     9
30      6760    43
31      15900   131
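
For readers who do not have the file at hand, the listing above can be read straight into R. The following is a minimal sketch: the object name glakes matches the name used in the solution code, but reading from inline text instead of from glakes.txt is an assumption made here so that the example is self-contained.

```r
# Read the glakes listing from inline text (a stand-in for glakes.txt)
glakes <- read.table(header = TRUE, text = "
Case Tonnage Time
1 2213 17
2 3256 30
3 12203 68
4 7021 64
5 529 11
6 3192 55
7 547 20
8 4682 49
9 6112 69
10 5375 68
11 6666 49
12 3930 43
13 4263 31
14 1849 17
15 663 13
16 329 13
17 2790 43
18 353 15
19 2829 30
20 363 20
21 7084 41
22 1328 15
23 294 13
24 268 11
25 1732 24
26 507 11
27 1486 28
28 536 22
29 851 9
30 6760 43
31 15900 131
")

# Quick structural check: 31 rows, columns Case, Tonnage, Time
str(glakes)
```

With the actual file in the working directory, read.table("glakes.txt", header = TRUE) should give the same data frame.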

Parts (a), (b), (c), and (d) are not needed. I am not sure whether this is too much for a single question, so please answer as much as possible.

Solutions

Expert Solution

The following R code produces the output for parts (e), (f), and (g). The comments document each step, including which lambda is used for the power transformation of the predictor and of the response variable, following the hints in (d).

#######################
##   Glakes Data    ###
#######################
library(MASS)
library(car)
# Read the data (assumes glakes.txt is in the working directory)
glakes <- read.table("glakes.txt", header = TRUE)
# Box-Cox profile for the predictor alone (intercept-only model)
bc <- boxcox(Tonnage ~ 1, data = glakes)
(lambda <- bc$x[which.max(bc$y)])
# Lambda comes out to be about 0.06
# Apply this lambda to create the transformed predictor variable
glakes$Tonnage.tran <- glakes$Tonnage^lambda
# Fit lm with the transformed predictor and untransformed response
Model <- lm(Time ~ Tonnage.tran, data = glakes)
# Inverse response plot
inv <- car::invResPlot(Model)
# Lambda = 0.01 gives the smallest RSS,
# so 0.01 is taken as the power for the response
# Apply this lambda to create the transformed response variable
glakes$Time.tran <- glakes$Time^0.01
# Fit the transformed model
tran.model <- lm(Time.tran ~ Tonnage.tran, data = glakes)
tran.model
summary(tran.model) # full summary output requested in (e)
# Diagnostic plots
par(mfrow = c(2, 2)) # Change the panel layout to 2 x 2
plot(tran.model)
par(mfrow = c(1, 1)) # Change back to 1 x 1

e) The fitted model is:

> tran.model

Call:
lm(formula = Time.tran ~ Tonnage.tran, data = glakes)

Coefficients:
 (Intercept)  Tonnage.tran  
     0.94627       0.05523  

f) [Figure: the four default diagnostic plots for tran.model]

All four diagnostic plots look promising. There is no particular pattern in the residuals vs. fitted plot or in the scale-location plot, so the homoscedasticity assumption appears valid here. The Q-Q plot is almost linear, which is a clear improvement over the untransformed linear regression model. Lastly, the residuals vs. leverage plot suggests that there are no influential observations in the data.
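
The visual impressions from the diagnostic plots can be complemented with a few numeric checks. The sketch below is illustrative only and is not part of the original solution: it re-enters the data as vectors, uses the rounded powers 0.06 and 0.01 reported above, and relies on base R functions (shapiro.test, cooks.distance, hatvalues). The cut-offs 4/n and twice the mean leverage are common rules of thumb assumed here, not formal tests.

```r
# Data from the glakes listing, entered as vectors so the example is self-contained
Tonnage <- c(2213, 3256, 12203, 7021, 529, 3192, 547, 4682, 6112, 5375, 6666,
             3930, 4263, 1849, 663, 329, 2790, 353, 2829, 363, 7084, 1328,
             294, 268, 1732, 507, 1486, 536, 851, 6760, 15900)
Time <- c(17, 30, 68, 64, 11, 55, 20, 49, 69, 68, 49, 43, 31, 17, 13, 13, 43,
          15, 30, 20, 41, 15, 13, 11, 24, 11, 28, 22, 9, 43, 131)

# Rounded powers as reported in the solution: 0.06 (predictor), 0.01 (response)
fit <- lm(I(Time^0.01) ~ I(Tonnage^0.06))

# Numeric companion to the Q-Q plot: Shapiro-Wilk test on the residuals
sw <- shapiro.test(residuals(fit))
print(sw)

# Numeric companions to the residuals vs. leverage plot
cd <- cooks.distance(fit)   # influence of each case on the fit
hv <- hatvalues(fit)        # leverage of each case

# Rule-of-thumb flags (assumed thresholds, not formal tests)
print(which(cd > 4 / length(cd)))
print(which(hv > 2 * mean(hv)))
```

Cases flagged by either rule are worth a second look in the plots; a flag alone does not mean an observation should be removed.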

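g) The slope of the model is about 0.055 (0.05523): a one-unit increase in Tonnage^0.06 is associated with an increase of about 0.055 in the fitted Time^0.01. Because both powers are close to 0, both transformations behave much like logarithms, so the slope is best read as relating percentage changes in Tonnage to percentage changes in Time rather than absolute changes.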
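
To turn the slope into a percentage-change statement, one can compute the elasticity implied by the fitted power model. Writing the fit as Time^ly = a + b * Tonnage^lx and differentiating both sides gives (dTime/Time) / (dTonnage/Tonnage) = b * lx * Tonnage^lx / (ly * Time^ly). The sketch below evaluates this expression using the coefficients printed above and the rounded powers 0.06 and 0.01; the function name elasticity and the chosen tonnage values are illustrative assumptions, not part of the original solution.

```r
# Coefficients and (rounded) powers as reported in the solution
a  <- 0.94627   # intercept
b  <- 0.05523   # slope
lx <- 0.06      # power applied to Tonnage
ly <- 0.01      # power applied to Time

# Implied elasticity: approximate % change in Time per 1% change in Tonnage
elasticity <- function(tonnage) {
  fitted_pow <- a + b * tonnage^lx            # fitted value of Time^ly
  (b * lx * tonnage^lx) / (ly * fitted_pow)
}

# Evaluate at a small, a mid-sized, and a large cargo volume
elasticity(c(500, 2213, 7000))
```

Under these reported values the elasticity stays close to 0.5 over this range, so a 1% increase in Tonnage goes with roughly a 0.5% increase in the fitted Time.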

