The number of unemployed persons (in 1000s) registered during the corona pandemic in Sweden has been noted as follows:
January: 410, February: 454, March: 389, April: 438, May: 460.
(a) Fit a linear model with the least squares method to this data.
(b) Estimate the number of registered unemployed in June. Describe the meaning of extrapolation in this context.
Part a
Here we need to fit a linear model to the given data using the least squares method.
The data give the month and the number of unemployed persons. To fit a linear model, the months must first be converted to numeric values, so they are coded as:
January | 1 |
February | 2 |
March | 3 |
April | 4 |
May | 5 |
June | 6 |
So the data can be represented as:
X (month) | Y (number of unemployed persons in 1000s) |
1 | 410 |
2 | 454 |
3 | 389 |
4 | 438 |
5 | 460 |
The independent variable is X (month), and the dependent variable is Y (number of unemployed persons in 1000s). The regression coefficients are computed from the column sums tabulated under Calculation below.
Theory
A linear regression model describes the straight line that minimizes the sum of squared errors for a set of pairs (Xi, Yi).
The linear regression equation, also known as the least squares equation, has the form Y = a + bX, where the regression coefficients a (the y-intercept) and b (the slope) are computed as follows:
b = SS_XY / SS_XX
a = mean(Y) - b * mean(X)
where, for n data pairs,
SS_XX = ΣX^2 - (ΣX)^2 / n
SS_XY = ΣXY - (ΣX)(ΣY) / n
Calculation
X | Y | X*Y | X^2 | Y^2 |
1 | 410 | 410 | 1 | 168100 |
2 | 454 | 908 | 4 | 206116 |
3 | 389 | 1167 | 9 | 151321 |
4 | 438 | 1752 | 16 | 191844 |
5 | 460 | 2300 | 25 | 211600 |
Sum: 15 | 2151 | 6537 | 55 | 928981 |
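The table can be checked with a few lines of Python. This is only a minimal sketch: the variable names (x, y, rows, sums) are chosen here for illustration and are not part of the original solution.

```python
# Data from the problem: month index X and unemployed persons Y (in 1000s).
x = [1, 2, 3, 4, 5]
y = [410, 454, 389, 438, 460]

# Build the rows of the working table: X, Y, X*Y, X^2, Y^2.
rows = [(xi, yi, xi * yi, xi ** 2, yi ** 2) for xi, yi in zip(x, y)]

# Column sums, in the same order as the table columns.
sums = tuple(sum(col) for col in zip(*rows))

for row in rows:
    print(" | ".join(str(v) for v in row))
print("Sum:", " | ".join(str(v) for v in sums))
```

The last printed line should match the Sum row of the table.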
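Based on the above table, with n = 5, the following is calculated:
mean(X) = ΣX / n = 15 / 5 = 3
mean(Y) = ΣY / n = 2151 / 5 = 430.2
SS_XX = ΣX^2 - (ΣX)^2 / n = 55 - 15^2 / 5 = 10
SS_YY = ΣY^2 - (ΣY)^2 / n = 928981 - 2151^2 / 5 = 3620.8
SS_XY = ΣXY - (ΣX)(ΣY) / n = 6537 - (15)(2151) / 5 = 84
Therefore, based on the above calculations, the regression coefficients (the slope b and the y-intercept a) are obtained as follows:
b = SS_XY / SS_XX = 84 / 10 = 8.4
a = mean(Y) - b * mean(X) = 430.2 - 8.4 * 3 = 405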
Therefore, we find that the regression equation is:
Y (number of unemployed persons in 1000s) = 405 + 8.4 X (month)
Y = 405 + 8.4X
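As a cross-check of the fitted equation, the least squares coefficients can be recomputed from the raw data. The sketch below assumes the standard sums-of-squares formulas for a simple linear fit; the helper name fit_line is hypothetical and introduced only for this illustration.

```python
def fit_line(x, y):
    """Return (a, b) for the least squares line Y = a + b*X."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)

    # Corrected sums of squares and cross products.
    ss_xx = sum_xx - sum_x ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n

    b = ss_xy / ss_xx                  # slope
    a = sum_y / n - b * (sum_x / n)    # intercept: mean(Y) - b * mean(X)
    return a, b


x = [1, 2, 3, 4, 5]                 # January .. May coded as 1 .. 5
y = [410, 454, 389, 438, 460]       # unemployed persons, in 1000s

a, b = fit_line(x, y)
print(f"Y = {a:.1f} + {b:.1f}X")
```

Up to floating-point rounding, this gives an intercept of 405 and a slope of 8.4, in agreement with the equation above.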
Part b
What is extrapolation?
"Extrapolation" means estimating the value of a dependent variable at a point that lies beyond the scope of the model, that is, outside the range of the data set.
In the case of linear regression, it occurs when one uses the regression equation to estimate the average value of Y (the dependent variable), or to predict a new value of Y, for an X value that lies outside the range of the sample data. In the context of the given data set, we have to estimate the number of registered unemployed in June. June (X = 6) lies outside the observed range of January to May (X = 1 to 5), so the estimate relies on the assumption that the fitted linear trend continues for one more month. It is calculated as below:
X = 6 for the month of June
Y = 405 + 8.4X
Y = 405 + 8.4 * 6
Y = 405 + 50.4
Y = 455.4 (in 1000s), i.e. about 455,400 registered unemployed persons in June.
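The June estimate can be reproduced with a small sketch that also flags whether the requested month lies outside the observed range. The function name predict and its parameters are hypothetical, chosen here for illustration; the coefficients are the ones found in Part a.

```python
def predict(month, a=405.0, b=8.4, observed=(1, 5)):
    """Evaluate Y = a + b*X and flag whether X lies outside the observed range."""
    estimate = a + b * month
    # An X value outside the fitted data range means the estimate is an extrapolation.
    extrapolated = not (observed[0] <= month <= observed[1])
    return estimate, extrapolated


june_estimate, june_is_extrapolated = predict(6)
print(f"June: {june_estimate:.1f} thousand, extrapolation: {june_is_extrapolated}")
```

For June the flag is True, since X = 6 is beyond the last observed month; for a month inside the data, such as March (X = 3), it would be False.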