Problem 3: A linear regression using a famous data set found in Freedman et al. (1991), 'Statistics'. Table 1 gives the per capita consumption of cigarettes in various countries in 1930 and the death rates (number of deaths per million people) from lung cancer in 1950.
Table 1: Death rate data in Freedman et al. (1991)

| Obs | Country | Cigarettes per capita (1930) | Deaths per million (1950) |
|-----|---------|------------------------------|---------------------------|
| 1 | Australia | 480 | 180 |
| 2 | Canada | 500 | 150 |
| 3 | Denmark | 380 | 170 |
| 4 | Finland | 1100 | 350 |
| 5 | Great Britain | 1100 | 460 |
| 6 | Iceland | 230 | 60 |
| 7 | Netherlands | 490 | 240 |
| 8 | Norway | 250 | 90 |
| 9 | Sweden | 300 | 110 |
| 10 | Switzerland | 510 | 250 |
| 11 | USA | 1300 | 200 |
Using the data in Table 1, we can draw a scatter plot of deaths per million against cigarette consumption.
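A minimal sketch of the simple linear regression that such a scatter plot motivates, using the eleven observations from Table 1. The names `cigarettes`, `deaths` and `fit_line` are illustrative and do not come from the original problem; the fit is the usual ordinary least squares line, which may or may not be the exact model the problem intends.

```python
import numpy as np

# Table 1: cigarettes per capita (1930) and lung cancer deaths per million (1950)
cigarettes = np.array([480, 500, 380, 1100, 1100, 230, 490, 250, 300, 510, 1300], dtype=float)
deaths = np.array([180, 150, 170, 350, 460, 60, 240, 90, 110, 250, 200], dtype=float)


def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line y = b0 + b1 * x."""
    x_bar, y_bar = x.mean(), y.mean()
    slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_line(cigarettes, deaths)

# Residuals and the coefficient of determination R^2
fitted = intercept + slope * cigarettes
residuals = deaths - fitted
r_squared = 1 - np.sum(residuals ** 2) / np.sum((deaths - deaths.mean()) ** 2)

print(f"deaths = {intercept:.2f} + {slope:.4f} * cigarettes, R^2 = {r_squared:.3f}")
```

Plotting `residuals` against `cigarettes` is one way to check whether the spread of the errors grows with cigarette consumption.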


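The scatter plot of cigarette consumption against the standard deviation of the 5 groups in Figure 2 shows an almost perfect linear relationship through the origin, which suggests that the standard deviation is roughly proportional to cigarette consumption.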



Model 5 is used to stabilize the variance.
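Model 5 itself is not reproduced here, so the following is only a sketch of one common way to stabilize the variance when the standard deviation is proportional to the predictor: weighted least squares with weights 1/x^2. This is an assumption about the kind of model meant, not the author's definition. The helper `fit_weighted_line` and the variable names are hypothetical.

```python
import numpy as np

# Table 1: cigarettes per capita (1930) and lung cancer deaths per million (1950)
cigarettes = np.array([480, 500, 380, 1100, 1100, 230, 490, 250, 300, 510, 1300], dtype=float)
deaths = np.array([180, 150, 170, 350, 460, 60, 240, 90, 110, 250, 200], dtype=float)


def fit_weighted_line(x, y, w):
    """Weighted least squares for y = b0 + b1 * x, minimising sum(w * residual^2)."""
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns the weighted problem into an ordinary one
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1]


# Assumption: Var(error) is proportional to x^2, so each weight is 1 / x^2
weights = 1.0 / cigarettes ** 2
b0_w, b1_w = fit_weighted_line(cigarettes, deaths, weights)

print(f"weighted fit: deaths = {b0_w:.2f} + {b1_w:.4f} * cigarettes")
```

Under the same assumption, this fit is equivalent to dividing the model through by x and running ordinary least squares of deaths/x on 1/x, where the transformed errors have constant variance.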