The electric power consumed each month by a chemical plant is thought to be related to the average ambient temperature (x1) and the number of days in the month (x2). The past year's historical data are available and are presented in the table below, where y is the electric power consumption, x1 is the temperature, and x2 is the number of days.
i. Find the equation of the least squares regression line ŷ = a + bx1.
ii. Find the equation of the least squares regression line ŷ = a + bx2.
iii. For each regression line, interpret the values of a and b.
iv. What measurement should be calculated to determine which variable (x1 or x2) best describes y? Calculate the measurement and interpret it.
v. Which variable (x1 or x2) best describes y? Why do you choose this variable?
The given data is:
y | x1 | x2 |
240 | 25 | 24 |
236 | 31 | 21 |
270 | 45 | 24 |
274 | 60 | 25 |
301 | 65 | 25 |
316 | 72 | 26 |
300 | 80 | 25 |
296 | 84 | 25 |
267 | 75 | 24 |
276 | 60 | 25 |
288 | 50 | 25 |
261 | 38 | 23 |
i. Find the equation of the least squares regression line ŷ = a + bx1.
First, let us estimate the coefficients a and b.
For this, we need the sums of squares and cross products. Let us create the following table:
Obs. | x1 | y | x1^2 | y^2 | x1*y |
1 | 25 | 240 | 625 | 57600 | 6000 |
2 | 31 | 236 | 961 | 55696 | 7316 |
3 | 45 | 270 | 2025 | 72900 | 12150 |
4 | 60 | 274 | 3600 | 75076 | 16440 |
5 | 65 | 301 | 4225 | 90601 | 19565 |
6 | 72 | 316 | 5184 | 99856 | 22752 |
7 | 80 | 300 | 6400 | 90000 | 24000 |
8 | 84 | 296 | 7056 | 87616 | 24864 |
9 | 75 | 267 | 5625 | 71289 | 20025 |
10 | 60 | 276 | 3600 | 76176 | 16560 |
11 | 50 | 288 | 2500 | 82944 | 14400 |
12 | 38 | 261 | 1444 | 68121 | 9918 |
Total | 685 | 3325 | 43245 | 927875 | 193990 |
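We have n = 12, so the corrected sums of squares and cross products are
Sxx = Σx1^2 - (Σx1)^2/n = 43245 - 685^2/12 = 4142.9167
Syy = Σy^2 - (Σy)^2/n = 927875 - 3325^2/12 = 6572.9167
Sxy = Σx1*y - (Σx1)(Σy)/n = 193990 - (685)(3325)/12 = 4187.9167
The estimate of b is
b = Sxy/Sxx = 4187.9167/4142.9167 = 1.0109
The estimate of a is
a = ȳ - b*x̄1 = 277.0833 - 1.0109(57.0833) = 219.38
The correlation coefficient r between y and x1 is
r = Sxy/sqrt(Sxx*Syy) = 4187.9167/sqrt(4142.9167 * 6572.9167) = 0.8025
The coefficient of determination is
r^2 = 0.6441
The equation of the regression line is
ŷ = 219.38 + 1.0109 x1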
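The arithmetic from the column totals can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the totals are copied from the Total row of the table above.

```python
import math

# Column totals copied from the Total row of the x1 table (n = 12 months).
n = 12
sum_x, sum_y = 685, 3325
sum_xx, sum_yy, sum_xy = 43245, 927875, 193990

# Corrected sums of squares and cross products.
s_xx = sum_xx - sum_x ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

b = s_xy / s_xx                     # slope
a = sum_y / n - b * sum_x / n       # intercept: y-bar minus b times x-bar
r = s_xy / math.sqrt(s_xx * s_yy)   # correlation coefficient

print(f"b = {b:.4f}, a = {a:.2f}, r = {r:.4f}, r^2 = {r * r:.4f}")
```

The printed values should agree, up to rounding, with the slope 1.0109, intercept 219.38 and 64.41% quoted for the temperature model in this answer.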
ii. Find the equation of the least squares regression line ŷ = a + bx2.
Obs. | x2 | y | x2^2 | y^2 | x2*y |
1 | 24 | 240 | 576 | 57600 | 5760 |
2 | 21 | 236 | 441 | 55696 | 4956 |
3 | 24 | 270 | 576 | 72900 | 6480 |
4 | 25 | 274 | 625 | 75076 | 6850 |
5 | 25 | 301 | 625 | 90601 | 7525 |
6 | 26 | 316 | 676 | 99856 | 8216 |
7 | 25 | 300 | 625 | 90000 | 7500 |
8 | 25 | 296 | 625 | 87616 | 7400 |
9 | 24 | 267 | 576 | 71289 | 6408 |
10 | 25 | 276 | 625 | 76176 | 6900 |
11 | 25 | 288 | 625 | 82944 | 7200 |
12 | 23 | 261 | 529 | 68121 | 6003 |
Total | 292 | 3325 | 7124 | 927875 | 81198 |
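With n = 12, the corrected sums of squares and cross products are
Sxx = Σx2^2 - (Σx2)^2/n = 7124 - 292^2/12 = 18.6667
Syy = Σy^2 - (Σy)^2/n = 927875 - 3325^2/12 = 6572.9167
Sxy = Σx2*y - (Σx2)(Σy)/n = 81198 - (292)(3325)/12 = 289.6667
The estimate of b is
b = Sxy/Sxx = 289.6667/18.6667 = 15.5179
The estimate of a, computed with the unrounded b, is
a = ȳ - b*x̄2 = 3325/12 - b(292/12) = -100.5179
The correlation coefficient r between y and x2 is
r = Sxy/sqrt(Sxx*Syy) = 289.6667/sqrt(18.6667 * 6572.9167) = 0.8270
The coefficient of determination is
r^2 = 0.6839
The equation of the least squares regression line is
ŷ = -100.5179 + 15.5179 x2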
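The same kind of fit can also be obtained directly from the raw data, without first building the table of squares and cross products. The sketch below uses deviations from the means; `fit_line` is a hypothetical helper written for this illustration, not a library function.

```python
def fit_line(x, y):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross products and squares of deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Data from the question: power consumption and number of days.
y = [240, 236, 270, 274, 301, 316, 300, 296, 267, 276, 288, 261]
x2 = [24, 21, 24, 25, 25, 26, 25, 25, 24, 25, 25, 23]

a, b = fit_line(x2, y)
print(f"y-hat = {a:.4f} + {b:.4f} * x2")
```

Because the function keeps full floating-point precision, its intercept can differ in the last decimal places from a hand calculation that uses a rounded slope.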
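iii. For each regression line, interpret the values of a and b.
The equation relating power and average ambient temperature is ŷ = 219.38 + 1.0109 x1.
Here, the intercept is a = 219.38. This is the predicted power consumption when the average temperature is zero; in other words, it can be read as the baseline (overhead) consumption. The slope is b = 1.0109, which is the increase in predicted power consumption for each one-unit increase in average temperature.
The equation relating power and the number of days is ŷ = -100.5179 + 15.5179 x2.
Here, theoretically, when the number of days in the month is zero, the predicted power consumption is -100.5179. A negative consumption has no physical meaning, and zero days lies far outside the observed range of 21 to 26 days, so this intercept should not be interpreted literally. For every additional day, the predicted power consumption rises by 15.5179.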
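The meaning of the slope and the caution about the intercept can be illustrated with the temperature model. This is a small sketch using the rounded coefficients 219.38 and 1.0109 quoted in this answer; `predict` is a hypothetical helper, and the temperatures 60 and 61 are arbitrary illustrative inputs.

```python
# Rounded coefficients of the temperature model quoted in the answer.
a, b = 219.38, 1.0109

# Observed temperatures from the question's data table.
x1 = [25, 31, 45, 60, 65, 72, 80, 84, 75, 60, 50, 38]

def predict(temp):
    """Predicted power consumption at a given average temperature."""
    return a + b * temp

# Raising the temperature by one unit changes the prediction by the slope b.
step = predict(61) - predict(60)

# The intercept is the prediction at temp = 0; check whether 0 was observed.
zero_in_range = min(x1) <= 0 <= max(x1)

print(f"change per unit of temperature: {step:.4f}")
print(f"temperature 0 inside observed range: {zero_in_range}")
```

Since a temperature of zero is below the smallest observed value, reading the intercept as a real consumption figure is an extrapolation beyond the data.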
iv. What measurement should be calculated to determine which variable (x1 or x2) best describes y? Calculate the measurement and interpret it.
Here, we calculate the coefficient of determination (r^2), which describes how much of the variation in power consumption is explained by each linear model.
For the model involving x1, r^2 = 0.6441, i.e. 64.41% of the variation in power consumption is explained by the average ambient temperature.
For the model involving x2, r^2 = 0.6839, i.e. 68.39% of the variation in power consumption is explained by the number of days in the month.
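To see what "variation explained" means in practice, the coefficient of determination can be computed as 1 - SSE/SST, where SSE is the sum of squared residuals around the fitted line and SST is the total sum of squares around the mean of y. For a simple linear regression this equals the squared correlation coefficient. The following is a sketch only; `r_squared` is a hypothetical helper, not a library function.

```python
def r_squared(x, y):
    """Return 1 - SSE/SST for the least squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    # Variation left unexplained by the fitted line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation of y around its mean.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Data from the question.
y = [240, 236, 270, 274, 301, 316, 300, 296, 267, 276, 288, 261]
x1 = [25, 31, 45, 60, 65, 72, 80, 84, 75, 60, 50, 38]
x2 = [24, 21, 24, 25, 25, 26, 25, 25, 24, 25, 25, 23]

r2_x1 = r_squared(x1, y)
r2_x2 = r_squared(x2, y)
print(f"r^2 (x1) = {r2_x1:.4f}, r^2 (x2) = {r2_x2:.4f}")
```

The predictor with the larger value leaves less residual variation around its fitted line.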
v. Which variable (x1 or x2) best describes y? Why do you choose this variable?
From the foregoing, r^2 for the model with x2 (0.6839) is larger than r^2 for the model with x1 (0.6441). Hence, I choose x2, the number of days in the month, since it explains more of the variation in y and therefore describes it best.