In a study of the role of young drivers in automobile accidents, data on the percentage of licensed drivers under the age of 21 and the number of fatal accidents per 1000 licenses were recorded for 32 cities. The data are stored in Table B. The first column contains the city code, the second column contains the percentage of drivers who are under 21, and the third column contains the number of fatal accidents per 1000 drivers. The primary interest is whether the number of fatal accidents depends on the proportion of licensed drivers who are under 21.
City Number | % of drivers under 21 | # of fatal accidents per 1000 drivers |
1 | 5 | 0 |
2 | 5 | 0 |
3 | 5 | 0 |
4 | 13 | 2.029 |
5 | 17 | 5.12 |
6 | 7 | 0.468 |
7 | 13 | 1.463 |
8 | 14 | 3.412 |
9 | 8 | 2.104 |
10 | 15 | 3.146 |
11 | 11 | 2.081 |
12 | 14 | 3.612 |
13 | 11 | 2.117 |
14 | 12 | 2.758 |
15 | 9 | 1.819 |
16 | 8 | 1.483 |
17 | 14 | 3.211 |
18 | 10 | 1.157 |
19 | 10 | 0.871 |
20 | 9 | 1.34 |
21 | 15 | 2.751 |
22 | 6 | 0 |
23 | 9 | 0.712 |
24 | 12 | 1.93 |
25 | 17 | 3.899 |
26 | 4 | 0 |
27 | 14 | 2.992 |
28 | 9 | 0.577 |
29 | 8 | 1.819 |
30 | 11 | 2.218 |
31 | 9 | 1.075 |
32 | 15 | 2.105 |
Regression analysis, in which one variable depends on another, can be used to predict levels of a dependent variable for specified levels of an independent variable. Use Excel's Regression tool to calculate the intercept and slope of the least-squares line, as well as the analysis of variance associated with that line. Fill in the following table and use the results to answer the next few questions. Choose your independent and dependent variables carefully and enter them correctly in Excel's Regression dialog (Input Y Range = fatals per 1000 licenses, Input X Range = % of drivers under 21). In this example, the percentage of drivers under the age of 21 (x, the independent variable) affects the number of fatals per 1000 licenses (y, the dependent variable).
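As a cross-check on the Excel output, the same least-squares quantities can be computed directly from Table B. The following is a minimal Python sketch, assuming the 32 rows are typed in as two lists; the function name `simple_linear_regression` and its dictionary keys are illustrative only and are not Excel names. It omits the P column; a statistics library such as SciPy (`scipy.stats.f.sf`) could supply that if needed.

```python
import math

# Table B, in city order 1..32 (x = % of drivers under 21, y = fatals per 1000).
PCT_UNDER_21 = [5, 5, 5, 13, 17, 7, 13, 14, 8, 15, 11, 14, 11, 12, 9, 8,
                14, 10, 10, 9, 15, 6, 9, 12, 17, 4, 14, 9, 8, 11, 9, 15]
FATALS_PER_1000 = [0, 0, 0, 2.029, 5.12, 0.468, 1.463, 3.412, 2.104, 3.146,
                   2.081, 3.612, 2.117, 2.758, 1.819, 1.483, 3.211, 1.157,
                   0.871, 1.34, 2.751, 0, 0.712, 1.93, 3.899, 0, 2.992, 0.577,
                   1.819, 2.218, 1.075, 2.105]


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1*x plus the ANOVA quantities
    that a one-predictor regression summary reports (p-value omitted)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar

    ss_reg = ss_xy ** 2 / ss_xx   # explained variation, df = 1
    ss_err = ss_yy - ss_reg       # residual variation, df = n - 2
    ms_reg = ss_reg / 1
    ms_err = ss_err / (n - 2)
    return {
        "intercept": intercept,
        "slope": slope,
        "ss_reg": ss_reg,
        "ss_err": ss_err,
        "ms_err": ms_err,
        "f_stat": ms_reg / ms_err,
        "r_squared": ss_reg / ss_yy,
        "se_slope": math.sqrt(ms_err / ss_xx),
    }


result = simple_linear_regression(PCT_UNDER_21, FATALS_PER_1000)
for key, value in result.items():
    print(f"{key:>10}: {value:.4f}")
```

The deviation-based formulas are used here because they match the hand calculation in the solution; they give the same coefficients Excel reports, up to rounding.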
The regression equation (least‑squares line) is
Fatals/1000 licenses = ________ (intercept) + ________ (slope) × % under 21
Analysis of variance
Source | DF | SS | MS | F | P |
Regression | 1 | ________ | ________ | ________ | ________ |
Residual (Error) | 30 | ________ | ________ | | |
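To check the Excel output by hand, the table entries follow from the standard identities for a least-squares fit with one predictor, written here in the SSxx, SSyy, SSxy notation used later in the solution. With n = 32 these give the 1 and 30 degrees of freedom shown above; P is the upper-tail probability of F under an F distribution with (1, n − 2) degrees of freedom, which Excel labels "Significance F".

```latex
\begin{aligned}
SS_{\text{Total}} &= SS_{yy} = \sum_i (y_i - \bar y)^2, & df &= n - 1\\
SS_{\text{Regression}} &= \frac{SS_{xy}^2}{SS_{xx}}, & df &= 1\\
SS_{\text{Residual}} &= SS_{\text{Total}} - SS_{\text{Regression}}, & df &= n - 2\\
MS &= \frac{SS}{df}, & F &= \frac{MS_{\text{Regression}}}{MS_{\text{Residual}}}
\end{aligned}
```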
10. What is the estimated increase in the number of fatal accidents per 1000 licenses due to a one-percentage-point increase in the percentage of drivers under 21 (i.e., the slope)?
11. What is the standard deviation (standard error) of the estimated slope?
12. What is the estimated number of fatal accidents per 1000 licenses if there were no drivers under the age of 21 (i.e., the y-intercept)?
13. What percentage of the variation in accident fatalities can be explained by the linear relationship with drivers under 21 (i.e., 100 × the unadjusted coefficient of determination)?
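For reference, the quantities asked for in questions 10 to 13 correspond to the following textbook formulas for simple linear regression, where s_e is the residual standard error (the square root of the residual mean square):

```latex
\begin{aligned}
\hat\beta_1 &= \frac{SS_{xy}}{SS_{xx}}, &
\hat\beta_0 &= \bar y - \hat\beta_1 \bar x,\\
s_e &= \sqrt{MS_{\text{Residual}}}, &
SE(\hat\beta_1) &= \frac{s_e}{\sqrt{SS_{xx}}},\\
R^2 &= \frac{SS_{\text{Regression}}}{SS_{\text{Total}}}
     = \frac{SS_{xy}^2}{SS_{xx}\,SS_{yy}}
\end{aligned}
```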
x | y | (x-x̅)² | (y-ȳ)² | (x-x̅)(y-ȳ) |
5 | 0 | 31.290 | 3.316 | 10.186 |
5 | 0 | 31.290 | 3.316 | 10.186 |
5 | 0 | 31.290 | 3.316 | 10.186 |
13 | 2.029 | 5.790 | 0.043 | 0.501 |
17 | 5.12 | 41.040 | 10.884 | 21.135 |
7 | 0.468 | 12.915 | 1.830 | 4.862 |
13 | 1.463 | 5.790 | 0.128 | -0.861 |
14 | 3.412 | 11.603 | 2.532 | 5.420 |
8 | 2.104 | 6.728 | 0.080 | -0.734 |
15 | 3.146 | 19.415 | 1.756 | 5.839 |
11 | 2.081 | 0.165 | 0.068 | 0.106 |
14 | 3.612 | 11.603 | 3.208 | 6.101 |
11 | 2.117 | 0.165 | 0.088 | 0.120 |
12 | 2.758 | 1.978 | 0.878 | 1.318 |
9 | 1.819 | 2.540 | 0.000 | 0.003 |
8 | 1.483 | 6.728 | 0.114 | 0.876 |
14 | 3.211 | 11.603 | 1.932 | 4.735 |
10 | 1.157 | 0.353 | 0.441 | 0.394 |
10 | 0.871 | 0.353 | 0.902 | 0.564 |
9 | 1.34 | 2.540 | 0.231 | 0.766 |
15 | 2.751 | 19.415 | 0.865 | 4.098 |
6 | 0 | 21.103 | 3.316 | 8.365 |
9 | 0.712 | 2.540 | 1.230 | 1.767 |
12 | 1.93 | 1.978 | 0.012 | 0.153 |
17 | 3.899 | 41.040 | 4.318 | 13.313 |
4 | 0 | 43.478 | 3.316 | 12.007 |
14 | 2.992 | 11.603 | 1.371 | 3.989 |
9 | 0.577 | 2.540 | 1.547 | 1.982 |
8 | 1.819 | 6.728 | 0.000 | 0.005 |
11 | 2.218 | 0.165 | 0.158 | 0.161 |
9 | 1.075 | 2.540 | 0.556 | 1.189 |
15 | 2.105 | 19.415 | 0.081 | 1.252 |
ΣX = 339 | ΣY = 58.269 | SSxx = 407.719 | SSyy = 51.833 | SSxy = 129.983 |
x̅ = 10.594 | ȳ = 1.821 | | | |
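The deviation table can also be reproduced programmatically. This is a minimal Python sketch, assuming Table B is entered as the lists `x` and `y`; the variable names are illustrative. It also computes SSxx and SSxy with the computational (shortcut) forms Σx² − (Σx)²/n and Σxy − (Σx)(Σy)/n, which should agree with the column totals.

```python
# Table B data, in city order 1..32.
x = [5, 5, 5, 13, 17, 7, 13, 14, 8, 15, 11, 14, 11, 12, 9, 8,
     14, 10, 10, 9, 15, 6, 9, 12, 17, 4, 14, 9, 8, 11, 9, 15]
y = [0, 0, 0, 2.029, 5.12, 0.468, 1.463, 3.412, 2.104, 3.146,
     2.081, 3.612, 2.117, 2.758, 1.819, 1.483, 3.211, 1.157,
     0.871, 1.34, 2.751, 0, 0.712, 1.93, 3.899, 0, 2.992, 0.577,
     1.819, 2.218, 1.075, 2.105]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# One tuple per city: ((x - x_bar)^2, (y - y_bar)^2, (x - x_bar)(y - y_bar))
rows = [((xi - x_bar) ** 2, (yi - y_bar) ** 2, (xi - x_bar) * (yi - y_bar))
        for xi, yi in zip(x, y)]

# Column totals of the deviation table
ss_xx = sum(r[0] for r in rows)
ss_yy = sum(r[1] for r in rows)
ss_xy = sum(r[2] for r in rows)

# Computational forms that avoid building the deviation columns
ss_xx_short = sum(xi * xi for xi in x) - sum(x) ** 2 / n
ss_xy_short = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

print(f"n = {n}, x_bar = {x_bar:.3f}, y_bar = {y_bar:.3f}")
print(f"SSxx = {ss_xx:.3f}, SSyy = {ss_yy:.3f}, SSxy = {ss_xy:.3f}")
```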
sample size, n = 32
x̅ = Σx/n = 339/32 = 10.594
ȳ = Σy/n = 58.269/32 = 1.821
SSxx = Σ(x-x̅)² = 407.719
SSxy = Σ(x-x̅)(y-ȳ) = 129.983
estimated slope, β1 = SSxy/SSxx = 129.983/407.719 = 0.31881
intercept, β0 = ȳ - β1·x̅ = -1.55643
so the regression line is Ŷ = -1.5564 + 0.3188x
==================
Fatals/1000 licenses = -1.5564 + 0.3188 × (% under 21)
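Since the point of the fitted line is to predict the dependent variable at specified levels of the independent variable, here is a minimal Python sketch of that use. The function name `predicted_fatals_per_1000` is hypothetical, and the coefficients are the rounded values from the line above.

```python
# Coefficients of the fitted least-squares line, rounded to 4 decimals.
INTERCEPT = -1.5564
SLOPE = 0.3188


def predicted_fatals_per_1000(pct_under_21: float) -> float:
    """Point estimate of fatal accidents per 1000 licenses for a city
    in which pct_under_21 percent of licensed drivers are under 21."""
    return INTERCEPT + SLOPE * pct_under_21


for pct in (8, 12, 16):
    print(f"{pct:>2}% under 21 -> {predicted_fatals_per_1000(pct):.4f} fatals per 1000")
```

At 0% the function returns the intercept, -1.5564, which is a negative rate and cannot actually occur; the observed percentages only run from 4 to 17, so predictions far outside that range are extrapolations and should be read with caution.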
ANOVA table
variation | SS | df | MS | F-stat | p-value |
regression | 41.439 | 1 | 41.4392 | 119.6079 | 0.0000 |
error | 10.394 | 30 | 0.3465 | | |
total | 51.833 | 31 | | | |
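The ANOVA entries can be verified from the sums of squares SSxx = 407.719, SSxy = 129.983 and SSyy = 51.833. With inputs rounded to three decimals, the arithmetic is:

```latex
\begin{aligned}
SS_{\text{Regression}} &= \frac{129.983^2}{407.719} = 41.439\\
SS_{\text{Residual}} &= 51.833 - 41.439 = 10.394\\
MS_{\text{Residual}} &= \frac{10.394}{30} = 0.3465\\
F &= \frac{41.439}{0.3465} \approx 119.6
\end{aligned}
```

With (1, 30) degrees of freedom, an F statistic this large corresponds to p < 0.0001, which is why the table shows 0.0000 at four decimal places.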
10) Slope: 0.3188
11) Estimated standard error of the slope: Se(β1) = Se/√SSxx, where Se = √MSE = √0.3465 = 0.589, so Se(β1) = 0.589/√407.72 = 0.0292
12) Intercept: -1.5564
13) R² = (SSxy)²/(SSxx·SSyy) = SSR/SST = 41.439/51.833 = 0.799, or 79.9%