A production plant cost-control engineer is responsible for cost reduction. One of the costly items in his plant is the amount of water used by the production facilities each month. He decided to investigate water usage by collecting seventeen observations on his plant's water usage and other variables.
Variable | Description |
Temperature | Average monthly temperature (°F) |
Production | Amount of production (M pounds) |
Days | Number of plant operating days in the month |
Persons | Number of persons on the monthly plant payroll |
Water | Monthly water usage (gallons) |
Temperature | Production | Days | Persons | Water |
58.8 | 7107 | 21 | 129 | 3067 |
65.2 | 6373 | 22 | 141 | 2828 |
70.9 | 6796 | 22 | 153 | 2891 |
77.4 | 9208 | 20 | 166 | 2994 |
79.3 | 14792 | 25 | 193 | 3082 |
81.0 | 14564 | 23 | 189 | 3898 |
71.9 | 11964 | 20 | 175 | 3502 |
63.9 | 13526 | 23 | 186 | 3060 |
54.5 | 12656 | 20 | 190 | 3211 |
39.5 | 14119 | 20 | 187 | 3286 |
44.5 | 16691 | 22 | 195 | 3542 |
43.6 | 14571 | 19 | 206 | 3125 |
56.0 | 13619 | 22 | 198 | 3022 |
64.7 | 14575 | 22 | 192 | 2922 |
73.0 | 14556 | 21 | 191 | 3950 |
78.9 | 18573 | 21 | 200 | 4488 |
79.4 | 15618 | 22 | 200 | 3295 |
Assume that you want to develop a linear model to predict the amount of “water” based on the monthly “production”.
i. Use the least squares method to estimate the regression coefficients b0 and b1 and state the regression equation.
ii. Which coefficients are statistically significant at the 5% level?
iii. Give the interpretation of the regression coefficients b0 and b1. What is the expected amount of “water” if the “production” level is equal to 10000? Produce a confidence interval for your estimate. Describe your findings in context.
iv. Interpret the value of R². Try to improve your model and show an example.
x | y | (x-x̅)² | (y-ȳ)² | (x-x̅)(y-ȳ) |
7107 | 3067 | 33564301.46 | 56029.67 | 1371348.57 |
6373 | 2828 | 42607872.28 | 226296.09 | 3105156.16 |
6796 | 2891 | 37264561.16 | 170326.15 | 2519350.92 |
9208 | 2994 | 13634339.04 | 95917.73 | 1143579.86 |
14792 | 3082 | 3577883.52 | 49153.50 | -419363.20 |
14564 | 3898 | 2767330.10 | 353185.50 | 988625.74 |
11964 | 3502 | 876977.16 | 39320.56 | -185696.61 |
13526 | 3060 | 391287.04 | 59392.56 | -152445.20 |
12656 | 3211 | 59765.87 | 8594.38 | 22663.86 |
14119 | 3286 | 1484813.93 | 313.50 | -21575.14 |
16691 | 3542 | 14368113.22 | 56784.09 | 903260.86 |
14571 | 3125 | 2790668.52 | 31935.79 | -298533.43 |
13619 | 3022 | 516284.52 | 79358.20 | -202413.96 |
14575 | 2922 | 2804048.75 | 145699.38 | -639177.73 |
14556 | 3950 | 2740777.63 | 417696.09 | 1069958.92 |
18573 | 4488 | 32177589.93 | 1402552.56 | 6717943.21 |
15618 | 3295 | 7384966.10 | 75.79 | -23658.49 |
 | Σx | Σy | SSxx = Σ(x-x̅)² | SSyy = Σ(y-ȳ)² | SSxy = Σ(x-x̅)(y-ȳ) |
total sum | 219308 | 56163 | 199011580.2 | 3192631.5 | 15899024.35 |
mean | 12900.47 | 3303.71 | | | |
1)
sample size, n = 17
x̅ = Σx / n = 12900.47
ȳ = Σy / n = 3303.71
SSxx = Σ(x-x̅)² = 199011580.2353
SSxy = Σ(x-x̅)(y-ȳ) = 15899024.4
estimated slope, β1 = SSxy / SSxx = 15899024.4 / 199011580.235 = 0.0799
intercept, β0 = ȳ - β1 * x̅ = 2273.0880
so, the regression line is Ŷ = 2273.0880 + 0.0799 * x
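The arithmetic in part 1 can be checked with a short script. This is a minimal sketch in plain Python, assuming the Production and Water columns are typed in from the data table; the variable names (`x`, `y`, `ss_xx`, `ss_xy`, `b0`, `b1`) are illustrative and not part of the original answer.

```python
# Production (x) and monthly water usage (y), copied from the data table.
x = [7107, 6373, 6796, 9208, 14792, 14564, 11964, 13526, 12656,
     14119, 16691, 14571, 13619, 14575, 14556, 18573, 15618]
y = [3067, 2828, 2891, 2994, 3082, 3898, 3502, 3060, 3211,
     3286, 3542, 3125, 3022, 2922, 3950, 4488, 3295]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares estimates of the slope and the intercept.
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

print(f"n = {n}, x_bar = {x_bar:.2f}, y_bar = {y_bar:.2f}")
print(f"slope b1 = {b1:.4f}, intercept b0 = {b0:.4f}")
```

The intercept is computed from the unrounded slope, which is why it matches 2273.0880 rather than the value obtained by plugging in 0.0799.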
2) slope hypothesis test
two-tailed test
H0: β1 = 0
H1: β1 ≠ 0
n = 17, degrees of freedom = n - 2 = 15
alpha = 0.05
standard error of the estimate, Se = √(SSE/(n-2)) = 358.000
estimated std error of slope = Se(β1) = Se / √SSxx = 358.000 / √199011580.24 = 0.0254
t stat = estimated slope / std error = β1 / Se(β1) = 0.0799 / 0.0254 = 3.1481 (computed from unrounded values)
p-value = 0.0066
decision: p-value < α, reject H0
Conclusion: Reject H0 and conclude that the slope is significantly different from zero at the 5% level.
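The test statistic can be reproduced as follows. This is a sketch under stated assumptions: the critical value 2.1314 (two-tailed, 15 degrees of freedom) is taken from part 3 of the answer rather than computed, and the intercept test is an extension that the original answer does not carry out. It uses the standard formula Se(b0) = Se * √(1/n + x̅²/SSxx).

```python
import math

# Production (x) and monthly water usage (y), copied from the data table.
x = [7107, 6373, 6796, 9208, 14792, 14564, 11964, 13526, 12656,
     14119, 16691, 14571, 13619, 14575, 14556, 18573, 15618]
y = [3067, 2828, 2891, 2994, 3082, 3898, 3502, 3060, 3211,
     3286, 3542, 3125, 3022, 2922, 3950, 4488, 3295]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Residual sum of squares and standard error of the estimate.
sse = ss_yy - ss_xy ** 2 / ss_xx
s_e = math.sqrt(sse / (n - 2))

# Standard errors and t statistics for both coefficients.
se_b1 = s_e / math.sqrt(ss_xx)
se_b0 = s_e * math.sqrt(1 / n + x_bar ** 2 / ss_xx)  # not in the original answer
t_b1 = b1 / se_b1
t_b0 = b0 / se_b0

T_CRIT = 2.1314  # assumed: two-tailed 5% critical value for 15 df, from the answer

for name, t in (("slope", t_b1), ("intercept", t_b0)):
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name}: t = {t:.4f} -> {verdict} at the 5% level")
```

Comparing |t| with the critical value is equivalent to comparing the p-value with α, and it avoids needing a t-distribution routine.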
3)
Ŷ = 2273.0880 + 0.0799 *x
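slope interpretation
for every 1-unit increase in production (M pounds), monthly water usage is expected to increase by 0.0799 gallons on average
intercept interpretation
when the amount of production is 0, the predicted monthly water usage is 2273.09 gallons. A production level of 0 lies outside the observed range (6373 to 18573), so this value is an extrapolation.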
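X value = 10000
Confidence level = 95%
Predicted Y at X = 10000 is
Ŷ = 2273.08799 + 0.079890 * 10000 = 3071.987
standard error, S(ŷ) = Se * √(1/n + (X-x̅)²/SSxx) = 113.828
margin of error, E = t * S(ŷ) = 2.1314 * 113.8283 = 242.6192, where t = 2.1314 is the critical value for 15 degrees of freedom
Confidence lower limit = Ŷ - E = 3071.987 - 242.619 = 2829.368
Confidence upper limit = Ŷ + E = 3071.987 + 242.619 = 3314.607
In context: we are 95% confident that the mean monthly water usage at a production level of 10000 is between about 2829 and 3315 gallons.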
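The interval above can be checked with the sketch below. It assumes the same data as before and takes the critical value 2.1314 from the answer instead of computing it; `x0`, `y_hat`, and the other names are illustrative. Note that this is a confidence interval for the mean response, not a prediction interval for a single month.

```python
import math

# Production (x) and monthly water usage (y), copied from the data table.
x = [7107, 6373, 6796, 9208, 14792, 14564, 11964, 13526, 12656,
     14119, 16691, 14571, 13619, 14575, 14556, 18573, 15618]
y = [3067, 2828, 2891, 2994, 3082, 3898, 3502, 3060, 3211,
     3286, 3542, 3125, 3022, 2922, 3950, 4488, 3295]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
s_e = math.sqrt((ss_yy - ss_xy ** 2 / ss_xx) / (n - 2))

x0 = 10000       # production level of interest
T_CRIT = 2.1314  # assumed: two-tailed 5% critical value for 15 df, from the answer

# Point estimate and standard error of the mean response at x0.
y_hat = b0 + b1 * x0
se_mean = s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)

margin = T_CRIT * se_mean
lower, upper = y_hat - margin, y_hat + margin
print(f"y_hat = {y_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```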
4)
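R² = (SSxy)² / (SSxx * SSyy) = 0.3978
Only 39.78% of the variation in monthly water usage is explained by the regression on production; the remaining 60.22% is left unexplained. One way to try to improve the model is to add the other recorded variables (Temperature, Days, Persons) as predictors in a multiple regression.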
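Part iv also asks for an example of an improved model, which the answer does not provide. One possible approach, sketched here as an assumption and not as the original author's method, is to fit a multiple regression on all four predictors and compare R² and adjusted R² with the simple model. The helpers `center`, `solve`, and `r_squared` are hypothetical names written for this sketch; they solve the centered normal equations in plain Python.

```python
# All five columns, copied from the data table.
temperature = [58.8, 65.2, 70.9, 77.4, 79.3, 81.0, 71.9, 63.9, 54.5,
               39.5, 44.5, 43.6, 56.0, 64.7, 73.0, 78.9, 79.4]
production = [7107, 6373, 6796, 9208, 14792, 14564, 11964, 13526, 12656,
              14119, 16691, 14571, 13619, 14575, 14556, 18573, 15618]
days = [21, 22, 22, 20, 25, 23, 20, 23, 20, 20, 22, 19, 22, 22, 21, 21, 22]
persons = [129, 141, 153, 166, 193, 189, 175, 186, 190,
           187, 195, 206, 198, 192, 191, 200, 200]
water = [3067, 2828, 2891, 2994, 3082, 3898, 3502, 3060, 3211,
         3286, 3542, 3125, 3022, 2922, 3950, 4488, 3295]


def center(values):
    """Subtract the mean so the intercept drops out of the normal equations."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * k
    for r in reversed(range(k)):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, k))
        z[r] = (m[r][k] - tail) / m[r][r]
    return z


def r_squared(predictors, response):
    """Return (R², adjusted R²) for an OLS fit with an intercept."""
    cols = [center(p) for p in predictors]
    yc = center(response)
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, yc)) for ci in cols]
    coef = solve(xtx, xty)
    ss_reg = sum(c * v for c, v in zip(coef, xty))  # explained sum of squares
    ss_yy = sum(v * v for v in yc)                  # total sum of squares
    r2 = ss_reg / ss_yy
    n, k = len(response), len(predictors)
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj


r2_simple, adj_simple = r_squared([production], water)
r2_multi, adj_multi = r_squared([temperature, production, days, persons], water)
print(f"production only: R2 = {r2_simple:.4f}, adjusted R2 = {adj_simple:.4f}")
print(f"all predictors:  R2 = {r2_multi:.4f}, adjusted R2 = {adj_multi:.4f}")
```

R² can never decrease when predictors are added, so a higher R² alone does not show a better model. Adjusted R², together with t-tests on the individual coefficients, gives a fairer comparison.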