Suppose you have a table with the following values, which represent the
change of temperature over 10 hours. Based on the data in the table,
answer the following questions.
Hour | Temperature (K) |
9:00 | 287.5 |
10:00 | 289.3 |
11:00 | 290.75 |
12:00 | 292.8 |
13:00 | 294.5 |
14:00 | 296.37 |
15:00 | 298.15 |
16:00 | 300.00 |
17:00 | 301.8 |
18:00 | 303.5 |
Considering that the measurements can only be taken at those
hours and that they may contain small errors, if you were asked
what the temperature was at 14:30, you would need to use linear
regression.
1- Describe as best you can the algorithm of that method
2- Explain and express clearly the outcome of that hour (14:30)
using linear regression
3- With respect to the linear regression, is the resulting temperature
exactly what it was at that hour?
4- How could you know exactly the answer to this question?
The given data is:
Hour (H) | Temperature (K) |
9 | 287.5 |
10 | 289.3 |
11 | 290.75 |
12 | 292.8 |
13 | 294.5 |
14 | 296.37 |
15 | 298.15 |
16 | 300 |
17 | 301.8 |
18 | 303.5 |
We can fit this data using the linear regression algorithm. The fitted line has the form K = mH + c, where m is the slope of the line and c is the y-intercept. To find the values of these parameters, we use the least-squares formulae:
m = (N * ΣHK - ΣH * ΣK) / (N * ΣH^2 - (ΣH)^2)
c = (ΣK - m * ΣH) / N
where N is the number of data points. Hence, N = 10.
First, we shall evaluate the sums that are required in the formulae. Doing so, we get:
Hour (H) | Temperature (K) | HK | H^2 |
9 | 287.5 | 2587.5 | 81 |
10 | 289.3 | 2893 | 100 |
11 | 290.75 | 3198.25 | 121 |
12 | 292.8 | 3513.6 | 144 |
13 | 294.5 | 3828.5 | 169 |
14 | 296.37 | 4149.18 | 196 |
15 | 298.15 | 4472.25 | 225 |
16 | 300 | 4800 | 256 |
17 | 301.8 | 5130.6 | 289 |
18 | 303.5 | 5463 | 324 |
TOTAL: 135 | 2954.67 | 40035.88 | 1905 |
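The column totals above can be reproduced with a short script. This is a minimal sketch in Python; the variable names (`hours`, `temps`, `sum_h`, and so on) are chosen here for illustration and do not come from the original answer.

```python
# Data from the table: hour of day and measured temperature in kelvin.
hours = [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
temps = [287.5, 289.3, 290.75, 292.8, 294.5, 296.37, 298.15, 300.0, 301.8, 303.5]

n = len(hours)                                       # number of data points, N
sum_h = sum(hours)                                   # sum of H
sum_k = sum(temps)                                   # sum of K
sum_hk = sum(h * k for h, k in zip(hours, temps))    # sum of H*K
sum_h2 = sum(h * h for h in hours)                   # sum of H^2

print(n, sum_h, round(sum_k, 2), round(sum_hk, 2), sum_h2)
```

The printed values should agree with the TOTAL row, up to floating-point rounding.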
Substituting the values in the formulae, we get:
m = (10 * 40035.88 - 135 * 2954.67) / (10 * 1905 - 135^2) = 1478.35 / 825 ≈ 1.7919
c = (2954.67 - 1.7919 * 135) / 10 ≈ 271.28
Hence, the equation of the linear regression model becomes:
K = 1.7919 H + 271.28
Using this model, we can calculate the temperature at 14:30 by substituting H = 14.5:
K = 1.7919 * 14.5 + 271.28 ≈ 297.26
Hence, the temperature predicted by the model at 14:30 is 297.26 K.
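The slope, intercept, and prediction can be checked numerically from the totals in the table. The sketch below assumes the standard closed-form least-squares formulas for a straight line; `predict` is a hypothetical helper name introduced only for this example.

```python
# Totals taken from the TOTAL row of the table of sums.
n = 10
sum_h = 135
sum_k = 2954.67
sum_hk = 40035.88
sum_h2 = 1905

# Closed-form least-squares estimates for the line K = slope * H + intercept.
slope = (n * sum_hk - sum_h * sum_k) / (n * sum_h2 - sum_h ** 2)
intercept = (sum_k - slope * sum_h) / n

def predict(hour):
    """Return the temperature in kelvin given by the fitted line at `hour`."""
    return slope * hour + intercept

# 14:30 corresponds to H = 14.5 on the hour axis.
temp_1430 = predict(14.5)
print(round(slope, 4), round(intercept, 2), round(temp_1430, 2))
```

Rounded to two decimals, the prediction matches the 297.26 K stated in the answer.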
To check how well the model fits the data, we can compute the coefficient of determination.
To do so, we first normalize each of the values of H and K. This is done by subtracting the mean of the column from each value and then dividing the result by the column's sample standard deviation. Let Z_H be the normalized H and Z_K be the normalized K. Thus, we get:
Hour (H) | Temperature (K) | Z_H | Z_K |
9 | 287.5 | -1.4863 | -1.4682 |
10 | 289.3 | -1.156 | -1.1365 |
11 | 290.75 | -0.8257 | -0.8693 |
12 | 292.8 | -0.4954 | -0.4915 |
13 | 294.5 | -0.1651 | -0.1782 |
14 | 296.37 | 0.1651 | 0.1664 |
15 | 298.15 | 0.4954 | 0.4944 |
16 | 300 | 0.8257 | 0.8354 |
17 | 301.8 | 1.156 | 1.1671 |
18 | 303.5 | 1.4863 | 1.4804 |
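The normalized values can be recomputed as follows. This sketch assumes the sample standard deviation (dividing by N - 1), which is what reproduces the tabulated values such as -1.4863 for H = 9; `z_scores` is a hypothetical helper name.

```python
from statistics import mean, stdev

hours = [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
temps = [287.5, 289.3, 290.75, 292.8, 294.5, 296.37, 298.15, 300.0, 301.8, 303.5]

def z_scores(values):
    """Subtract the mean from each value and divide by the sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation, divides by N - 1
    return [(v - mu) / sigma for v in values]

z_h = z_scores(hours)
z_k = z_scores(temps)

# Print one row per data point, rounded to four decimals like the table.
for h, k, zh, zk in zip(hours, temps, z_h, z_k):
    print(h, k, round(zh, 4), round(zk, 4))
```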
Next, we compute the product of Z_H and Z_K for each row:
Hour (H) | Temperature (K) | Z_H | Z_K | Z_H * Z_K |
9 | 287.5 | -1.4863 | -1.4682 | 2.1822 |
10 | 289.3 | -1.156 | -1.1365 | 1.3138 |
11 | 290.75 | -0.8257 | -0.8693 | 0.7178 |
12 | 292.8 | -0.4954 | -0.4915 | 0.2435 |
13 | 294.5 | -0.1651 | -0.1782 | 0.0294 |
14 | 296.37 | 0.1651 | 0.1664 | 0.0275 |
15 | 298.15 | 0.4954 | 0.4944 | 0.2449 |
16 | 300 | 0.8257 | 0.8354 | 0.6898 |
17 | 301.8 | 1.156 | 1.1671 | 1.3492 |
18 | 303.5 | 1.4863 | 1.4804 | 2.2003 |
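The coefficient of determination is given by the formula:
R^2 = r^2, where r = Σ(Z_H * Z_K) / (N - 1)
(the divisor is N - 1 because the Z-scores use the sample standard deviation).
Summing the last column, we get:
Σ(Z_H * Z_K) ≈ 8.9984
Substituting this in the above formula, we get:
r ≈ 8.9984 / 9 ≈ 0.9998, so R^2 ≈ 0.9996
Since R^2 is very close to 1 but not equal to it, the line fits the data very well without passing exactly through every point. For example, at 11:00 the fitted line gives about 290.99 K against a measured 290.75 K. Thus, the value obtained from the model at 14:30 is a reliable estimate, not an exact temperature; only a direct measurement at 14:30 would give the actual value.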
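One way to see how closely the line follows the measurements is to compute the residuals and R^2 directly from the raw data. The sketch below uses the equivalent definition R^2 = 1 - SS_res / SS_tot instead of the Z-score route; that is a swapped-in technique for cross-checking, and all variable names are illustrative.

```python
hours = [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
temps = [287.5, 289.3, 290.75, 292.8, 294.5, 296.37, 298.15, 300.0, 301.8, 303.5]

n = len(hours)
mean_h = sum(hours) / n
mean_k = sum(temps) / n

# Least-squares slope and intercept from centered sums.
s_hk = sum((h - mean_h) * (k - mean_k) for h, k in zip(hours, temps))
s_hh = sum((h - mean_h) ** 2 for h in hours)
slope = s_hk / s_hh
intercept = mean_k - slope * mean_h

# Residual = measured temperature minus the value on the fitted line.
residuals = [k - (slope * h + intercept) for h, k in zip(hours, temps)]

ss_res = sum(r * r for r in residuals)            # unexplained variation
ss_tot = sum((k - mean_k) ** 2 for k in temps)    # total variation
r_squared = 1 - ss_res / ss_tot
max_abs_residual = max(abs(r) for r in residuals)

print(round(r_squared, 4), round(max_abs_residual, 3))
```

A nonzero largest residual means the fitted line does not reproduce every measurement exactly, even when R^2 rounds to 1.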