Problem 3: Fit a linear regression to a well-known data set from Freedman et al. (1991), ‘Statistics’, shown in Table 1. The data give the per-capita consumption of cigarettes in various countries in 1930 and the death rates from lung cancer (number of deaths per million people) in 1950.
Table 1: Death rate data from Freedman et al. (1991)

| Obs | Country | Cigarettes per capita (1930) | Lung-cancer deaths per million (1950) |
|---|---|---|---|
| 1 | Australia | 480 | 180 |
| 2 | Canada | 500 | 150 |
| 3 | Denmark | 380 | 170 |
| 4 | Finland | 1100 | 350 |
| 5 | Great Britain | 1100 | 460 |
| 6 | Iceland | 230 | 60 |
| 7 | Netherlands | 490 | 240 |
| 8 | Norway | 250 | 90 |
| 9 | Sweden | 300 | 110 |
| 10 | Switzerland | 510 | 250 |
| 11 | USA | 1300 | 200 |
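As a minimal sketch in plain Python (no external libraries), the simple linear regression of deaths on cigarette consumption can be fitted with the closed-form least-squares estimates b1 = Sxy / Sxx and b0 = ȳ - b1 x̄. The function name `ols_fit` is an illustrative choice, not something from the source.

```python
# Table 1 data (Freedman et al., 1991): per-capita cigarette consumption in 1930
# and lung-cancer deaths per million people in 1950.
countries = ["Australia", "Canada", "Denmark", "Finland", "Great Britain",
             "Iceland", "Netherlands", "Norway", "Sweden", "Switzerland", "USA"]
cigarettes = [480, 500, 380, 1100, 1100, 230, 490, 250, 300, 510, 1300]
deaths = [180, 150, 170, 350, 460, 60, 240, 90, 110, 250, 200]


def ols_fit(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered sums of squares and cross-products
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # intercept: the line passes through (x_bar, y_bar)
    r_squared = sxy ** 2 / (sxx * syy)
    return b0, b1, r_squared


b0, b1, r2 = ols_fit(cigarettes, deaths)
print(f"fitted: deaths = {b0:.2f} + {b1:.4f} * cigarettes  (R^2 = {r2:.3f})")
```

On the Table 1 values this gives roughly deaths = 67.56 + 0.2284 × cigarettes with R² ≈ 0.544.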
We can draw a scatter plot of lung-cancer deaths per million (1950) against per-capita cigarette consumption (1930).
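The linear trend visible in such a scatter plot can be summarized numerically by Pearson's correlation coefficient r. The sketch below computes r from the Table 1 values using only the standard library; the plot itself could be drawn with, for example, matplotlib's `pyplot.scatter`, which is not shown here. The helper name `pearson_r` is illustrative.

```python
import math

# Table 1 values: x = cigarettes per capita (1930), y = deaths per million (1950)
cigarettes = [480, 500, 380, 1100, 1100, 230, 490, 250, 300, 510, 1300]
deaths = [180, 150, 170, 350, 460, 60, 240, 90, 110, 250, 200]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(cigarettes, deaths)
print(f"r = {r:.3f}")
```

For these data r is about 0.74, so a straight line accounts for a little over half of the variation in death rates (r² ≈ 0.54).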
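Since the scatter plot of the standard deviations of the 5 groups against cigarette consumption (Figure 2) shows an almost perfectly linear relationship through the origin, the standard deviation appears to be proportional to cigarette consumption, and hence the variance to its square.
Model 5 is therefore used to stabilize the variance.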
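This excerpt does not spell out Model 5. One common variance-stabilizing choice when the standard deviation is proportional to x is weighted least squares with weights w_i = 1/x_i², which is algebraically the same as an ordinary regression of y/x on 1/x. The sketch below assumes that form (an assumption, not necessarily the author's Model 5) and, because the 5-group data behind Figure 2 are not reproduced here, illustrates it on the Table 1 values. The helper `wls_fit` is hypothetical.

```python
# ASSUMED model (hypothetical stand-in for "Model 5"):
#   y_i = b0 + b1 * x_i + e_i,  with Var(e_i) = sigma^2 * x_i^2
cigarettes = [480, 500, 380, 1100, 1100, 230, 490, 250, 300, 510, 1300]
deaths = [180, 150, 170, 350, 460, 60, 240, 90, 110, 250, 200]


def wls_fit(x, y, w):
    """Weighted least squares for y = b0 + b1 * x with weights w; returns (b0, b1)."""
    sw = sum(w)
    # Weighted means of x and y
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted centered sums of squares and cross-products
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Weights inversely proportional to the assumed variance (proportional to x^2)
weights = [1 / xi ** 2 for xi in cigarettes]
b0_w, b1_w = wls_fit(cigarettes, deaths, weights)

# Same fit via the transformation y/x = b1 + b0 * (1/x), using unit weights (plain OLS).
inv_x = [1 / xi for xi in cigarettes]
ratio = [yi / xi for xi, yi in zip(cigarettes, deaths)]
c0, c1 = wls_fit(inv_x, ratio, [1.0] * len(inv_x))  # intercept c0 estimates b1, slope c1 estimates b0

print(f"WLS:         b0 = {b0_w:.3f}, b1 = {b1_w:.5f}")
print(f"Transformed: b0 = {c1:.3f}, b1 = {c0:.5f}")
```

Both routes minimize the same criterion, sum of (y_i/x_i - b0/x_i - b1)², so the two printed lines should agree up to rounding; the transformed version is convenient when only an unweighted regression routine is available.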