e. Multiple Regression. Identify at least 3 variables for which you could calculate a multiple regression. Describe the variables and their scale of measurement. Which variables would you include as the predictor variables and which as the outcome variable? Why? Which regression method would you use and why? What would R2 and adjusted R2 tell you about the relationship between the variables?
Consider the following data:
Delivery time (min) | Number of cases | Distance (ft) |
16.68 | 7 | 560 |
11.5 | 3 | 220 |
12.03 | 3 | 340 |
14.88 | 4 | 80 |
13.75 | 6 | 150 |
18.11 | 7 | 330 |
8 | 2 | 110 |
17.83 | 7 | 210 |
79.24 | 30 | 1460 |
21.5 | 5 | 605 |
40.33 | 16 | 688 |
21 | 10 | 215 |
13.5 | 4 | 255 |
19.75 | 6 | 462 |
24 | 9 | 448 |
29 | 10 | 776 |
15.35 | 6 | 200 |
19 | 7 | 132 |
9.5 | 3 | 36 |
35.1 | 17 | 770 |
17.9 | 10 | 140 |
52.32 | 26 | 810 |
18.75 | 9 | 450 |
19.83 | 8 | 635 |
10.75 | 4 | 150 |
y = delivery time, measured in minutes (ratio scale).
x1 = number of cases of product stocked (a count, ratio scale).
x2 = distance walked by the driver, in feet, to deliver the product (ratio scale).
We choose delivery time as the outcome variable because it depends on the two predictor variables, number of cases (x1) and distance (x2).
I used the ordinary least squares regression method, with both predictors entered together, to fit the regression model.
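The fitted model can be written out explicitly. The sketch below uses standard textbook notation (the symbols \beta_0, \beta_1, \beta_2 and \varepsilon_i are not in the original answer and are introduced here for illustration); least squares picks the coefficients that minimize the sum of squared differences between observed and fitted delivery times.

```latex
\begin{align*}
y_i &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i,
      \qquad i = 1, \dots, 25 \\
(\hat\beta_0, \hat\beta_1, \hat\beta_2)
    &= \arg\min_{b_0, b_1, b_2}
       \sum_{i=1}^{25} \bigl( y_i - b_0 - b_1 x_{i1} - b_2 x_{i2} \bigr)^2
\end{align*}
```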
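R2 tells us what proportion of the variation in the outcome variable is explained by the predictor variables; adjusted R2 is the same measure corrected for the number of predictors in the model, so it does not rise automatically when a predictor is added.
Here R2 and adjusted R2 tell us how much of the variation in delivery time is explained by number of cases and distance walked by the driver.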
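For reference, the two quantities are usually defined as follows. The notation SS_res (residual sum of squares), SS_tot (total sum of squares), n (number of observations) and p (number of predictors) is assumed here rather than taken from the original; for this data set n = 25 and p = 2.

```latex
\begin{align*}
R^2 &= 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}
     = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2} \\
R^2_{\mathrm{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
     = 1 - \frac{24}{22}\,(1 - R^2)
\end{align*}
```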
R code:
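The session that follows reads each column with scan('clipboard'), which only works if the right values were copied beforehand. A self-contained sketch of the same fit, with the 25 observations typed in directly, might look like the code here. The data frame name `delivery` and the helper variables `r2`, `adj_r2` and `adj_by_hand` are hypothetical names chosen for illustration.

```r
# Same 25 observations as in the table, entered directly (no clipboard needed).
# 'delivery' is a hypothetical name; y, x1, x2 follow the original answer.
delivery <- data.frame(
  y  = c(16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, 79.24,
         21.50, 40.33, 21.00, 13.50, 19.75, 24.00, 29.00, 15.35, 19.00,
         9.50, 35.10, 17.90, 52.32, 18.75, 19.83, 10.75),
  x1 = c(7, 3, 3, 4, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10, 6, 7, 3,
         17, 10, 26, 9, 8, 4),
  x2 = c(560, 220, 340, 80, 150, 330, 110, 210, 1460, 605, 688, 215,
         255, 462, 448, 776, 200, 132, 36, 770, 140, 810, 450, 635, 150)
)

# Ordinary least squares with both predictors entered together
fit <- lm(y ~ x1 + x2, data = delivery)
s <- summary(fit)

# R-squared and adjusted R-squared as reported by summary()
r2 <- s$r.squared
adj_r2 <- s$adj.r.squared

# Recompute adjusted R-squared by hand from R-squared, n and p
n <- nrow(delivery)  # 25 observations
p <- 2               # two predictors
adj_by_hand <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(R2 = r2, adjusted_R2 = adj_r2, by_hand = adj_by_hand))
```

Keeping the three columns in one data frame also lets `lm()` take them through its `data` argument, so the fit does not depend on loose vectors in the workspace.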
> y=scan('clipboard')
Read 25 items
> y
 [1] 16.68 11.50 12.03 14.88 13.75 18.11  8.00 17.83 79.24 21.50 40.33 21.00
[13] 13.50 19.75 24.00 29.00 15.35 19.00  9.50 35.10 17.90 52.32 18.75 19.83
[25] 10.75
> x1=scan('clipboard')
Read 25 items
> x1
[1] 7 3 3 4 6 7 2 7 30 5 16 10 4 6 9 10 6 7 3 17 10 26 9 8 4
> x2=scan('clipboard')
Read 25 items
> x2
 [1]  560  220  340   80  150  330  110  210 1460  605  688  215  255  462  448
[16]  776  200  132   36  770  140  810  450  635  150
> fit=lm(y~x1+x2)
> summary(fit)
Call:
lm(formula = y ~ x1 + x2)
Residuals:
    Min      1Q  Median      3Q     Max
-5.7880 -0.6629  0.4364  1.1566  7.4197