Question

The data below contains the total costs and depths of 16 offshore oil wells. It is expected that cost is a linear function of the depth.

(a) Write out the equation of the regression line. Interpret the slope and intercept in the context of this problem. Do they make sense?

(b) Test the hypothesis that the slope parameter is zero in four different ways (ANOVA F-test, t-test for β1, t-test for ρ, and a confidence interval for β1).

(c) What is the R² for the SLR you have obtained? What does it mean?

(d) Plot the standardized residuals against the independent variable. What can you say about the regression using this graph? (HINT: Are there outliers? Does it seem reasonable to claim the data has a linear fit?)

Depth (feet) vs Cost ($1000)

Depth (feet) Cost ($1000)
5000 2596.800049
5200 3328
6000 3181.100098
6538 3198.399902
7109 4779.899902
7556 5905.600098
8005 5769.200195
8207 8089.5
8210 4813.100098
8600 5618.700195
9026 7736
9197 6788.299805
9926 7840.799805
10813 8882.5
13800 10489.5
14311 12506.59961

Solutions

Expert Solution

Depth (feet), X Cost ($1000), Y XY X² Y²
5000 2596.800049 12984000.25 25000000 6743370.494
5200 3328 17305600 27040000 11075584
6000 3181.100098 19086600.59 36000000 10119397.83
6538 3198.399902 20911138.56 42745444 10229761.93
7109 4779.899902 33980308.4 50537881 22847443.07
7556 5905.600098 44622714.34 57093136 34876112.52
8005 5769.200195 46182447.56 64080025 33283670.89
8207 8089.5 66390526.5 67354849 65440010.25
8210 4813.100098 39515551.8 67404100 23165932.55
8600 5618.700195 48320821.68 73960000 31569791.88
9026 7736 69825136 81468676 59845696
9197 6788.299805 62431993.31 84584809 46081014.24
9926 7840.799805 77827778.86 98525476 61478141.58
10813 8882.5 96046472.5 116920969 78898806.25
13800 10489.5 144755100 190440000 110029610.3
14311 12506.59961 178981947 204804721 156415033.8
Ʃx = 137498
Ʃy = 101523.9998
Ʃxy = 979168137.4
Ʃx² = 1287960086
Ʃy² = 762099377.6
Sample size, n = 16
x̅ = Ʃx/n = 137498/16 = 8593.625
y̅ = Ʃy/n = 101523.9998/16 = 6345.249985
SSxx = Ʃx² - (Ʃx)²/n = 1287960086 - (137498)²/16 = 106353835.8
SSyy = Ʃy² - (Ʃy)²/n = 762099377.55589 - (101523.99976)²/16 = 117904219.6
SSxy = Ʃxy - (Ʃx)(Ʃy)/n = 979168137.36836 - (137498)(101523.99976)/16 = 106708955
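
The sums and corrected sums of squares above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original solution; the variable names (depth, cost, ss_xx and so on) are chosen here for illustration.

```python
# Data from the question: depth in feet, cost in $1000.
depth = [5000, 5200, 6000, 6538, 7109, 7556, 8005, 8207,
         8210, 8600, 9026, 9197, 9926, 10813, 13800, 14311]
cost = [2596.800049, 3328, 3181.100098, 3198.399902, 4779.899902,
        5905.600098, 5769.200195, 8089.5, 4813.100098, 5618.700195,
        7736, 6788.299805, 7840.799805, 8882.5, 10489.5, 12506.59961]

n = len(depth)
sum_x, sum_y = sum(depth), sum(cost)

# Corrected sums of squares and cross-products (shortcut formulas).
ss_xx = sum(x * x for x in depth) - sum_x ** 2 / n
ss_yy = sum(y * y for y in cost) - sum_y ** 2 / n
ss_xy = sum(x * y for x, y in zip(depth, cost)) - sum_x * sum_y / n

print(f"n = {n}, mean x = {sum_x / n:.3f}, mean y = {sum_y / n:.3f}")
print(f"SSxx = {ss_xx:.2f}, SSyy = {ss_yy:.2f}, SSxy = {ss_xy:.2f}")
```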

a)

Slope, b = SSxy/SSxx = 106708954.95661/106353835.75 = 1.003339035

y-intercept, a = y̅ -b* x̅ = 6345.24998 - (1.00334)*8593.625 = -2277.069432

Regression equation :

ŷ = -2277.0694 + (1.0033) x

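Slope interpretation: each additional foot of depth is associated with an estimated increase of 1.0033 in cost (measured in $1000), that is, about $1,003 per foot. A positive slope makes sense, since deeper wells are expected to cost more.

Y-intercept interpretation: this is the predicted cost at x = 0, a well of zero depth. The value is not meaningful: a cost of -2277.0694 thousand dollars is impossible, and x = 0 lies far outside the observed depths (5000 to 14311 feet), so the line should not be extrapolated that far.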
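
As an illustration of part (a), the slope and intercept follow directly from the summary statistics already computed. The sketch below assumes those values as inputs; the helper name predict_cost and the example depth of 8000 feet are hypothetical and not taken from the original solution.

```python
# Summary statistics taken from the solution above.
x_bar, y_bar = 8593.625, 6345.249985
ss_xx, ss_xy = 106353835.75, 106708954.95661

slope = ss_xy / ss_xx                # b, in $1000 per foot of depth
intercept = y_bar - slope * x_bar    # a, in $1000


def predict_cost(depth_ft):
    """Predicted cost in $1000 for a well of the given depth in feet."""
    return intercept + slope * depth_ft


print(f"y-hat = {intercept:.4f} + {slope:.4f} x")
print(f"extra cost per foot of depth: about ${slope * 1000:,.0f}")
print(f"predicted cost at 8000 ft: {predict_cost(8000):.1f} ($1000)")
```

Predictions are only trustworthy for depths inside the observed range of 5000 to 14311 feet.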

b)

ANOVA test:

Null and alternative hypothesis:

Ho: β₁ = 0

Ha: β₁ ≠ 0

SSE = SSyy - SSxy²/SSxx =10838959.72

SSR = SSxy²/SSxx = 107065259.9188

Test statistic:

F = SSR/(SSE/(n-2)) = 107065259.9188/(10838959.7209/14) = 138.2894

P-value = 0.0000 (to four decimal places)

Conclusion:

p-value < α = 0.05, so reject the null hypothesis. The slope is significantly different from zero.
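
The ANOVA decomposition can be checked numerically. This sketch assumes α = 0.05 (consistent with the 95% interval used later) and compares F with the tabulated upper 5% point of F(1, 14), which is 4.60, instead of computing a p-value.

```python
n = 16
ss_xx, ss_yy, ss_xy = 106353835.75, 117904219.63968, 106708954.95661

ssr = ss_xy ** 2 / ss_xx    # regression sum of squares, 1 df
sse = ss_yy - ssr           # error sum of squares, n - 2 df
mse = sse / (n - 2)         # mean squared error
f_stat = ssr / mse          # MSR / MSE, where MSR = SSR / 1

# Tabulated upper 5% point of F(1, 14); alpha = 0.05 is an assumption.
F_CRIT = 4.60
reject_h0 = f_stat > F_CRIT

print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, F = {f_stat:.4f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```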

Slope Hypothesis test:

Null and alternative hypothesis:

Ho: β₁ = 0

Ha: β₁ ≠ 0

Slope, b = 1.003339035

Sum of Square error, SSE = SSyy -SSxy²/SSxx = 117904219.63968 - (106708954.95661)²/106353835.75 = 10838959.72

Standard error, se = √(SSE/(n-2)) = √(10838959.7209/(16-2)) = 879.89284

Test statistic:

t = b/(se/√SSxx) = 11.7597

df = n-2 = 14

p-value = T.DIST.2T(ABS(11.7597), 14) = 0.0000

Conclusion:

p-value < α = 0.05, so reject the null hypothesis. The slope is significantly different from zero.
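
A short sketch of the slope t-test, using the values from the solution. It assumes α = 0.05 and uses the tabulated two-sided critical value of t with 14 degrees of freedom, 2.1448. It also shows that in simple linear regression the squared t statistic equals the ANOVA F statistic.

```python
import math

n = 16
b = 1.003339035          # estimated slope
sse = 10838959.7209      # error sum of squares
ss_xx = 106353835.75

s = math.sqrt(sse / (n - 2))     # residual standard error
se_b = s / math.sqrt(ss_xx)      # standard error of the slope
t_stat = b / se_b

# Tabulated two-sided 5% critical value of t with 14 df (alpha assumed 0.05).
T_CRIT = 2.1448
reject_h0 = abs(t_stat) > T_CRIT

print(f"s = {s:.5f}, se(b) = {se_b:.6f}, t = {t_stat:.4f}")
print(f"t squared = {t_stat ** 2:.4f} (matches the ANOVA F statistic)")
```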

Correlation Hypothesis test:

Null and alternative hypothesis:

Ho: ρ = 0

Ha: ρ ≠ 0

Correlation coefficient, r = SSxy/√(SSxx*SSyy) = 106708954.95661/√(106353835.75*117904219.63968) = 0.9529

Test statistic:

t = r*√(n-2)/√(1-r²) = 0.9529 *√(16 - 2)/√(1 - 0.9529²) = 11.7597

df = n-2 = 14

p-value = T.DIST.2T(ABS(11.7597), 14) = 0.0000

Conclusion:

p-value < α = 0.05, so reject the null hypothesis. There is a significant linear correlation between x and y.
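
The correlation test gives the same t value as the slope test, which is not a coincidence: the two statistics are algebraically identical in simple linear regression. The sketch below computes both from the summary statistics to show the agreement; the names t_from_r and t_from_slope are illustrative.

```python
import math

n = 16
ss_xx, ss_yy, ss_xy = 106353835.75, 117904219.63968, 106708954.95661

# t statistic based on the sample correlation coefficient.
r = ss_xy / math.sqrt(ss_xx * ss_yy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# t statistic based on the slope, for comparison.
sse = ss_yy - ss_xy ** 2 / ss_xx
se_b = math.sqrt(sse / (n - 2)) / math.sqrt(ss_xx)
t_from_slope = (ss_xy / ss_xx) / se_b

print(f"r = {r:.4f}")
print(f"t from r = {t_from_r:.4f}, t from slope = {t_from_slope:.4f}")
```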

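95% Confidence interval for slope:

Critical value, tc = t(0.025, 14) = 2.1448

Lower limit = b - tc*se/√SSxx = 1.0033 - 2.1448*(879.89284/√106353835.75) = 0.8203

Upper limit = b + tc*se/√SSxx = 1.0033 + 2.1448*(879.89284/√106353835.75) = 1.1863

As the confidence interval does not contain 0, we reject the null hypothesis.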
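
The interval limits can be reproduced as follows. This is a sketch using the solution's values for b, se and SSxx, with the tabulated critical value 2.1448 for a two-sided 95% interval with 14 degrees of freedom.

```python
import math

b = 1.003339035          # estimated slope
s = 879.89284            # residual standard error
ss_xx = 106353835.75

T_CRIT = 2.1448          # tabulated t(0.025, 14)
margin = T_CRIT * s / math.sqrt(ss_xx)
lower, upper = b - margin, b + margin

# H0: slope = 0 is rejected when 0 falls outside the interval.
contains_zero = lower <= 0 <= upper

print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
print("contains 0" if contains_zero else "does not contain 0, reject H0")
```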

c)

Coefficient of determination, r² = (SSxy)²/(SSxx*SSyy)

= (106708954.95661)²/(106353835.75*117904219.63968) = 0.9081

90.81% of the variation in cost (y) is explained by its linear relationship with depth (x).
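
R² can also be read off the ANOVA decomposition as SSR/SSyy, or equivalently 1 - SSE/SSyy. A brief sketch from the summary statistics:

```python
ss_xx, ss_yy, ss_xy = 106353835.75, 117904219.63968, 106708954.95661

ssr = ss_xy ** 2 / ss_xx         # variation explained by the line
sse = ss_yy - ssr                # variation left unexplained
r_squared = ssr / ss_yy
r_squared_alt = 1 - sse / ss_yy  # same quantity, written via SSE

print(f"R^2 = {r_squared:.4f} ({r_squared:.2%} of variation explained)")
```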

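d) Residuals (the predicted values use the unrounded coefficients a = -2277.069432 and b = 1.003339035; the expressions below show them rounded to four decimals):

X Y Predicted value, ŷ Residual, y-ŷ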
5000 2596.8 -2277.0694 + (1.0033) * 5000 = 2739.6257 -142.8257
5200 3328 -2277.0694 + (1.0033) * 5200 = 2940.2936 387.7064
6000 3181.1 -2277.0694 + (1.0033) * 6000 = 3742.9648 -561.8647
6538 3198.4 -2277.0694 + (1.0033) * 6538 = 4282.7612 -1084.3613
7109 4779.9 -2277.0694 + (1.0033) * 7109 = 4855.6678 -75.7679
7556 5905.6 -2277.0694 + (1.0033) * 7556 = 5304.1603 601.4398
8005 5769.2 -2277.0694 + (1.0033) * 8005 = 5754.6595 14.5406
8207 8089.5 -2277.0694 + (1.0033) * 8207 = 5957.334 2132.1660
8210 4813.1 -2277.0694 + (1.0033) * 8210 = 5960.344 -1147.2439
8600 5618.7 -2277.0694 + (1.0033) * 8600 = 6351.6463 -732.9461
9026 7736 -2277.0694 + (1.0033) * 9026 = 6779.0687 956.9313
9197 6788.3 -2277.0694 + (1.0033) * 9197 = 6950.6397 -162.3399
9926 7840.8 -2277.0694 + (1.0033) * 9926 = 7682.0738 158.7260
10813 8882.5 -2277.0694 + (1.0033) * 10813 = 8572.0356 310.4644
13800 10489.5 -2277.0694 + (1.0033) * 13800 = 11569.0093 -1079.5093
14311 12506.6 -2277.0694 + (1.0033) * 14311 = 12081.7155 424.8841
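
The table above lists raw residuals, while part (d) asks for standardized residuals. A minimal sketch follows, assuming the simple definition residual / se, where se is the residual standard error; some textbooks instead divide by se*√(1 - h), with h the leverage of each point. The |z| > 2 cutoff for flagging points is a common rule of thumb, not something stated in the original solution.

```python
import math

depth = [5000, 5200, 6000, 6538, 7109, 7556, 8005, 8207,
         8210, 8600, 9026, 9197, 9926, 10813, 13800, 14311]
cost = [2596.800049, 3328, 3181.100098, 3198.399902, 4779.899902,
        5905.600098, 5769.200195, 8089.5, 4813.100098, 5618.700195,
        7736, 6788.299805, 7840.799805, 8882.5, 10489.5, 12506.59961]

n = len(depth)
x_bar, y_bar = sum(depth) / n, sum(cost) / n

# Least-squares fit of cost on depth.
ss_xx = sum((x - x_bar) ** 2 for x in depth)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(depth, cost))
slope = ss_xy / ss_xx
intercept = y_bar - slope * x_bar

# Raw residuals, residual standard error, and standardized residuals.
residuals = [y - (intercept + slope * x) for x, y in zip(depth, cost)]
s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
std_resid = [e / s for e in residuals]

# Depths whose standardized residual exceeds 2 in absolute value.
flagged = [x for x, z in zip(depth, std_resid) if abs(z) > 2]

for x, z in zip(depth, std_resid):
    note = "  <-- |z| > 2" if abs(z) > 2 else ""
    print(f"{x:>6}  {z:+.2f}{note}")
```

Plotting std_resid against depth, with a horizontal reference line at 0, gives the graph the question asks for. With these data the well at 8207 feet lies about 2.42 standard errors above the fitted line and is the only point outside ±2, which marks it as a possible outlier.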

