Question

Problem 3: Linear regression using a famous data set from Freedman et al. (1991), ‘Statistics’. Table 1 gives the per capita consumption of cigarettes in various countries in 1930 and the death rates (number of deaths per million people) from lung cancer in 1950.

Table 1: Death rate data in Freedman et al. (1991)

Obs  Country        Cigarettes  Deaths per million
1    Australia      480         180
2    Canada         500         150
3    Denmark        380         170
4    Finland        1100        350
5    Great Britain  1100        460
6    Iceland        230         60
7    Netherlands    490         240
8    Norway         250         90
9    Sweden         300         110
10   Switzerland    510         250
11   USA            1300        200

  1. Perform the simple linear regression with and without the USA, and plot both fitted lines on one overlay graph.
  2. Should we use the USA data? Should the regression line pass through the origin? Give your answer.

Solutions

Expert Solution

Here, the number of cigarettes consumed per capita is the independent variable, x.

Deaths per million is the dependent variable, y.

Thus, we have

X Y X^2 Y^2 XY
480 180 230400 32400 86400
500 150 250000 22500 75000
380 170 144400 28900 64600
1100 350 1210000 122500 385000
1100 460 1210000 211600 506000
230 60 52900 3600 13800
490 240 240100 57600 117600
250 90 62500 8100 22500
300 110 90000 12100 33000
510 250 260100 62500 127500
1300 200 1690000 40000 260000
Total 6640 2260 5440400 601800 1691400

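We fit y = a + bx by least squares, where

a = (ΣY*ΣX^2 - ΣX*ΣXY) / (n*ΣX^2 - (ΣX)^2)

b = (n*ΣXY - ΣX*ΣY) / (n*ΣX^2 - (ΣX)^2)

Thus, with n = 11,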

a = (2260*5440400 - 6640*1691400)/ (11*5440400 - 6640^2) = 67.56

b = (11*1691400 - 6640*2260)/ (11*5440400 - 6640^2) = 0.228

Thus, y = 67.56 + 0.228*x
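
The hand calculation above can be checked with a short script. This is only a sketch: the helper name `fit_line` is made up for illustration, and it applies the same sum-based least-squares formulas to the eleven observations in Table 1.

```python
# Per capita cigarette consumption (1930) and lung-cancer deaths per million (1950),
# copied from Table 1. The USA is the last observation.
x = [480, 500, 380, 1100, 1100, 230, 490, 250, 300, 510, 1300]
y = [180, 150, 170, 350, 460, 60, 240, 90, 110, 250, 200]


def fit_line(xs, ys):
    """Return (a, b) for y = a + b*x using the sum-based least-squares formulas.

    Hypothetical helper, named here only for illustration.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    sxy = sum(u * v for u, v in zip(xs, ys))
    denom = n * sxx - sx ** 2
    a = (sy * sxx - sx * sxy) / denom  # intercept
    b = (n * sxy - sx * sy) / denom    # slope
    return a, b


a, b = fit_line(x, y)
print(f"with USA: y = {a:.2f} + {b:.3f}*x")
```

Rounded to the same precision, the script should reproduce the intercept and slope obtained by hand.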

This fit includes the USA data.

For the regression line to pass through the origin, its intercept would have to be 0. Here the fitted intercept is 67.56, not zero.

Thus, this regression line does not pass through the origin.
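
The solution only checks whether the fitted intercept happens to be zero. A different way to look at the question, not part of the original solution, is to force the line through the origin and fit y = b0*x. For that model the least-squares slope is b0 = ΣXY / ΣX^2. The sketch below computes it with and without the USA; the name `slope_through_origin` is a hypothetical helper.

```python
# Data copied from Table 1. The USA is the last observation.
x = [480, 500, 380, 1100, 1100, 230, 490, 250, 300, 510, 1300]
y = [180, 150, 170, 350, 460, 60, 240, 90, 110, 250, 200]


def slope_through_origin(xs, ys):
    """Least-squares slope for the no-intercept model y = b0*x (illustrative helper)."""
    sxy = sum(u * v for u, v in zip(xs, ys))
    sxx = sum(v * v for v in xs)
    return sxy / sxx


b0_with_usa = slope_through_origin(x, y)
b0_without_usa = slope_through_origin(x[:-1], y[:-1])
print(f"through origin, with USA:    y = {b0_with_usa:.4f}*x")
print(f"through origin, without USA: y = {b0_without_usa:.4f}*x")
```

Whether the no-intercept model is appropriate is a modelling judgment. The script only shows what the forced-origin slopes would be.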

Now, if we exclude the USA data, we get

X Y X^2 Y^2 XY
480 180 230400 32400 86400
500 150 250000 22500 75000
380 170 144400 28900 64600
1100 350 1210000 122500 385000
1100 460 1210000 211600 506000
230 60 52900 3600 13800
490 240 240100 57600 117600
250 90 62500 8100 22500
300 110 90000 12100 33000
510 250 260100 62500 127500
Total 5340 2060 3750400 561800 1431400

We again fit y = a + bx by least squares, now with n = 10 observations:

a = (2060*3750400 - 5340*1431400)/ (10*3750400 - 5340^2) = 9.14

b = (10*1431400 - 5340*2060)/ (10*3750400 - 5340^2) = 0.369

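Thus, y = 9.14 + 0.369*x

Excluding the USA lowers the intercept from 67.56 to 9.14 and raises the slope from 0.228 to 0.369, so the single USA observation (x = 1300, y = 200) has a large influence on the fitted line. The intercept without the USA is much closer to zero, but it is still not exactly zero.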
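
The problem also asks for an overlay graph, which the written solution does not show. The following is a minimal sketch of one way to draw it, assuming the third-party matplotlib package is installed. The names `fit_line` and `plot_overlay` and the output file name `overlay.png` are illustrative choices, not part of the original solution.

```python
# Data copied from Table 1. The USA is the last observation.
x = [480, 500, 380, 1100, 1100, 230, 490, 250, 300, 510, 1300]
y = [180, 150, 170, 350, 460, 60, 240, 90, 110, 250, 200]


def fit_line(xs, ys):
    """Return (a, b) for y = a + b*x using the sum-based least-squares formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    sxy = sum(u * v for u, v in zip(xs, ys))
    denom = n * sxx - sx ** 2
    return (sy * sxx - sx * sxy) / denom, (n * sxy - sx * sy) / denom


a_all, b_all = fit_line(x, y)            # all 11 countries
a_no, b_no = fit_line(x[:-1], y[:-1])    # USA dropped

# How far the USA lies from the line fitted without it.
usa_residual = y[-1] - (a_no + b_no * x[-1])


def plot_overlay(path="overlay.png"):
    """Scatter the data and overlay both fitted lines (requires matplotlib)."""
    import matplotlib.pyplot as plt  # third-party; assumed to be installed

    fig, ax = plt.subplots()
    ax.scatter(x[:-1], y[:-1], label="other countries")
    ax.scatter(x[-1:], y[-1:], marker="x", color="red", label="USA")
    grid = [0, max(x)]
    ax.plot(grid, [a_all + b_all * g for g in grid],
            label=f"with USA: y = {a_all:.2f} + {b_all:.3f}x")
    ax.plot(grid, [a_no + b_no * g for g in grid], linestyle="--",
            label=f"without USA: y = {a_no:.2f} + {b_no:.3f}x")
    ax.set_xlabel("Cigarettes per capita (1930)")
    ax.set_ylabel("Lung cancer deaths per million (1950)")
    ax.legend()
    fig.savefig(path)
```

Calling `plot_overlay()` writes the figure to disk. The value `usa_residual` gives a rough sense of how far the USA falls below the line fitted to the other ten countries.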

