Problem 3: Linear regression using the famous data set found in Freedman et al. (1991), ‘Statistics’. Table 1 gives the per capita consumption of cigarettes in various countries in 1930 and the death rates (number of deaths per million people) from lung cancer for 1950.
Table 1: Death rate data in Freedman et al. (1991)
Obs | Country | Cigarettes per capita (1930) | Deaths per million (1950)
1 | Australia | 480 | 180
2 | Canada | 500 | 150
3 | Denmark | 380 | 170
4 | Finland | 1100 | 350
5 | Great Britain | 1100 | 460
6 | Iceland | 230 | 60
7 | Netherlands | 490 | 240
8 | Norway | 250 | 90
9 | Sweden | 300 | 110
10 | Switzerland | 510 | 250
11 | USA | 1300 | 200
Here, the number of cigarettes consumed per capita is the independent variable => x
Deaths per million is the dependent variable => y
Thus, we have
Obs | X | Y | X^2 | Y^2 | XY
1 | 480 | 180 | 230400 | 32400 | 86400
2 | 500 | 150 | 250000 | 22500 | 75000
3 | 380 | 170 | 144400 | 28900 | 64600
4 | 1100 | 350 | 1210000 | 122500 | 385000
5 | 1100 | 460 | 1210000 | 211600 | 506000
6 | 230 | 60 | 52900 | 3600 | 13800
7 | 490 | 240 | 240100 | 57600 | 117600
8 | 250 | 90 | 62500 | 8100 | 22500
9 | 300 | 110 | 90000 | 12100 | 33000
10 | 510 | 250 | 260100 | 62500 | 127500
11 | 1300 | 200 | 1690000 | 40000 | 260000
Total | 6640 | 2260 | 5440400 | 601800 | 1691400
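The column totals in the table can be checked with a few lines of Python. This is only a sketch: the helper name `column_totals` and the list `data` are made up for illustration, and the pairs are copied from Table 1.

```python
# (cigarettes per capita in 1930, lung-cancer deaths per million in 1950), from Table 1
data = [
    (480, 180), (500, 150), (380, 170), (1100, 350), (1100, 460), (230, 60),
    (490, 240), (250, 90), (300, 110), (510, 250), (1300, 200),
]

def column_totals(pairs):
    """Return (n, sum x, sum y, sum x^2, sum y^2, sum xy) for a list of (x, y) pairs."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xx = sum(x * x for x, _ in pairs)
    sum_yy = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    return n, sum_x, sum_y, sum_xx, sum_yy, sum_xy

totals = column_totals(data)
print(totals)  # (11, 6640, 2260, 5440400, 601800, 1691400)
```

Passing `data[:-1]` instead drops the last row (USA) and gives the totals for the remaining ten countries.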
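We have y = a + bx, where the least-squares intercept a and slope b are
a = (Σy*Σx^2 - Σx*Σxy) / (n*Σx^2 - (Σx)^2)
b = (n*Σxy - Σx*Σy) / (n*Σx^2 - (Σx)^2)
Thus, with n = 11,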
a = (2260*5440400 - 6640*1691400)/ (11*5440400 - 6640^2) = 67.56
b = (11*1691400 - 6640*2260)/ (11*5440400 - 6640^2) = 0.228
Thus, y = 67.56 + 0.228*x
Note that this fit includes the USA data.
For the regression line to pass through the origin, its intercept would have to be 0. Here the intercept is 67.56, not zero.
Thus, the regression line does not pass through the origin.
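The intercept and slope of the 11-country fit can be reproduced directly from the column totals. The sketch below assumes the usual least-squares formulas in terms of sums; the function name `fit_from_totals` is hypothetical, and the numbers passed in are the totals from the table.

```python
def fit_from_totals(n, sum_x, sum_y, sum_xx, sum_xy):
    """Least-squares intercept a and slope b of y = a + b*x, computed from sums."""
    denom = n * sum_xx - sum_x ** 2          # shared denominator of both formulas
    a = (sum_y * sum_xx - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b

# Totals for all 11 countries (USA included)
a, b = fit_from_totals(11, 6640, 2260, 5440400, 1691400)
print(round(a, 2), round(b, 3))  # 67.56 0.228

# The fitted line passes through the origin only if the intercept is 0
print(round(a, 2) == 0)  # False
```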
Now, if we exclude the USA data, we get
Obs | X | Y | X^2 | Y^2 | XY
1 | 480 | 180 | 230400 | 32400 | 86400
2 | 500 | 150 | 250000 | 22500 | 75000
3 | 380 | 170 | 144400 | 28900 | 64600
4 | 1100 | 350 | 1210000 | 122500 | 385000
5 | 1100 | 460 | 1210000 | 211600 | 506000
6 | 230 | 60 | 52900 | 3600 | 13800
7 | 490 | 240 | 240100 | 57600 | 117600
8 | 250 | 90 | 62500 | 8100 | 22500
9 | 300 | 110 | 90000 | 12100 | 33000
10 | 510 | 250 | 260100 | 62500 | 127500
Total | 5340 | 2060 | 3750400 | 561800 | 1431400
We again have y = a + bx, with a and b computed as before, now with n = 10.
Thus,
a = (2060*3750400 - 5340*1431400)/ (10*3750400 - 5340^2) = 9.14
b = (10*1431400 - 5340*2060)/ (10*3750400 - 5340^2) = 0.369
Thus, y = 9.14 + 0.369*x
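Both fits can be cross-checked from the raw data with an equivalent form of the least-squares formulas, b = Sxy/Sxx and a = mean(y) - b*mean(x), where Sxy and Sxx are sums of deviations from the means. This is a different route from the sums-based formulas used above, shown only as a sketch; the names `fit_line` and `countries` are made up for illustration.

```python
def fit_line(pairs):
    """Fit y = a + b*x by least squares using deviations from the means."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# (cigarettes per capita, deaths per million), from Table 1
countries = {
    "Australia": (480, 180), "Canada": (500, 150), "Denmark": (380, 170),
    "Finland": (1100, 350), "Great Britain": (1100, 460), "Iceland": (230, 60),
    "Netherlands": (490, 240), "Norway": (250, 90), "Sweden": (300, 110),
    "Switzerland": (510, 250), "USA": (1300, 200),
}

a_all, b_all = fit_line(list(countries.values()))
a_no_usa, b_no_usa = fit_line([xy for name, xy in countries.items() if name != "USA"])

print(f"with USA:    y = {a_all:.2f} + {b_all:.3f}*x")        # y = 67.56 + 0.228*x
print(f"without USA: y = {a_no_usa:.2f} + {b_no_usa:.3f}*x")  # y = 9.14 + 0.369*x
```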