OrderNo. | DeliveryTime | NumberOfPizzas | Distance | Location |
1 | 16.68 | 7 | 5.6 | Downtown |
2 | 11.5 | 3 | 2.2 | Not Downtown |
3 | 12.03 | 3 | 3.4 | Downtown |
4 | 14.88 | 8 | 0.8 | Not Downtown |
5 | 13.75 | 6 | 1.5 | Not Downtown |
6 | 18.11 | 7 | 3.3 | Downtown |
7 | 8 | 2 | 1.1 | Downtown |
8 | 17.83 | 7 | 2.1 | Downtown |
9 | 79.24 | 30 | 14.6 | Not Downtown |
10 | 21.5 | 5 | 6.05 | Not Downtown |
11 | 40.33 | 16 | 6.88 | Downtown |
12 | 21 | 10 | 2.15 | Not Downtown |
13 | 13.5 | 4 | 2.55 | Not Downtown |
14 | 19.75 | 6 | 4.62 | Not Downtown |
15 | 24 | 9 | 4.48 | Not Downtown |
16 | 29 | 10 | 7.76 | Not Downtown |
17 | 15.35 | 6 | 2 | Not Downtown |
18 | 19 | 7 | 1.32 | Downtown |
19 | 9.5 | 3 | 0.36 | Not Downtown |
20 | 35.1 | 17 | 7.7 | Downtown |
21 | 17.9 | 10 | 1.4 | Downtown |
22 | 52.32 | 26 | 8.1 | Not Downtown |
23 | 18.75 | 9 | 4.5 | Downtown |
24 | 19.83 | 8 | 6.35 | Not Downtown |
25 | 10.75 | 4 | 1.5 | Downtown |
26 | 26 | 9 | 7.3 | Not Downtown |
27 | 14.21 | 5 | 2.4 | Not Downtown |
28 | 21 | 8 | 1.4 | Downtown |
29 | 10 | 4 | 0.9 | Not Downtown |
30 | 36 | 18 | 8 | Downtown |
31 | 18.1 | 9 | 1.5 | Downtown |
What is the correlation between the downtown variable (1 = downtown, 0 = not downtown) and each of the other three variables? What does each relationship suggest?
Model A. Run a linear regression that predicts delivery time using number of pizzas, distance, and downtown as independent variables. Summarize and show results in a table. Explain each relationship.
What is R²? What does it mean?
What are the business implications of the regression results?
Model B. Run a linear regression that predicts delivery time using number of pizzas and distance. Summarize and show results in a table. Explain any differences from the previous model.
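> library(readxl)   # provides read_excel()
> Book1 <- read_excel("Book1.xlsx")   # assumes Location is coded 1 = Downtown, 0 = Not Downtown in the file
> d=Book1
> r1=cor(d$Location,d$DeliveryTime);r2=cor(d$NumberOfPizzas,d$Location);r3=cor(d$Distance,d$Location)
> r1;r2;r3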
[1] -0.09029982
[1] -0.006621049
[1] -0.1282853
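> ## The correlations of Location (Downtown = 1, Not Downtown = 0) with the
> ## other three variables are all negative, so downtown orders tend to have
> ## slightly lower values than non-downtown orders:
> ##  - DeliveryTime (r = -0.090): downtown deliveries take slightly less time.
> ##  - NumberOfPizzas (r = -0.007): essentially no relationship; order size is
> ##    about the same in both locations.
> ##  - Distance (r = -0.128): downtown deliveries are slightly shorter; this is
> ##    the strongest of the three, but still weak.
> ## All three are very weak (|r| < 0.13; with n = 31, |r| would need to exceed
> ## about 0.36 to be significant at the 5% level). They describe association
> ## only, not the effect of changing an order's location.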
>
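The transcript reads Location from Book1.xlsx as 1/0, while the table above shows text labels. A minimal sketch, assuming the data are typed in from the table rather than read from the spreadsheet, that recodes the labels and adds `cor.test()` p-values to back up the claim that all three correlations are weak (`others`, `r` and `p` are illustrative names, not from the original):

```r
# Pizza delivery data typed in from the table above (31 orders).
d <- data.frame(
  DeliveryTime = c(16.68, 11.5, 12.03, 14.88, 13.75, 18.11, 8, 17.83, 79.24, 21.5,
                   40.33, 21, 13.5, 19.75, 24, 29, 15.35, 19, 9.5, 35.1,
                   17.9, 52.32, 18.75, 19.83, 10.75, 26, 14.21, 21, 10, 36, 18.1),
  NumberOfPizzas = c(7, 3, 3, 8, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10,
                     6, 7, 3, 17, 10, 26, 9, 8, 4, 9, 5, 8, 4, 18, 9),
  Distance = c(5.6, 2.2, 3.4, 0.8, 1.5, 3.3, 1.1, 2.1, 14.6, 6.05,
               6.88, 2.15, 2.55, 4.62, 4.48, 7.76, 2, 1.32, 0.36, 7.7,
               1.4, 8.1, 4.5, 6.35, 1.5, 7.3, 2.4, 1.4, 0.9, 8, 1.5),
  Location = c("Downtown", "Not Downtown", "Downtown", "Not Downtown", "Not Downtown",
               "Downtown", "Downtown", "Downtown", "Not Downtown", "Not Downtown",
               "Downtown", "Not Downtown", "Not Downtown", "Not Downtown", "Not Downtown",
               "Not Downtown", "Not Downtown", "Downtown", "Not Downtown", "Downtown",
               "Downtown", "Not Downtown", "Downtown", "Not Downtown", "Downtown",
               "Not Downtown", "Not Downtown", "Downtown", "Not Downtown", "Downtown",
               "Downtown")
)

# Recode Location as the question specifies: 1 = Downtown, 0 = Not Downtown.
d$Location <- ifelse(d$Location == "Downtown", 1, 0)

# Correlation of Location with each of the other three variables,
# plus a two-sided test of H0: rho = 0 (n = 31, so 29 degrees of freedom).
others <- c("DeliveryTime", "NumberOfPizzas", "Distance")
r <- sapply(others, function(v) cor(d$Location, d[[v]]))
p <- sapply(others, function(v) cor.test(d$Location, d[[v]])$p.value)
print(round(data.frame(r = r, p.value = p), 4))
```

The `r` column should reproduce r1, r2 and r3 from the transcript, and every p-value is well above 0.05.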
> # Model A: delivery time on number of pizzas, distance, and location
> m=lm(d$DeliveryTime~d$NumberOfPizzas+d$Distance+d$Location)
> m
Call:
lm(formula = d$DeliveryTime ~ d$NumberOfPizzas + d$Distance +
d$Location)
Coefficients:
(Intercept) d$NumberOfPizzas d$Distance d$Location
2.922 1.619 1.344 -1.346
> # Fitted Model A:
> #   DeliveryTime = 2.922 + 1.619*NumberOfPizzas + 1.344*Distance - 1.346*Location
> # Holding the other variables constant:
> #  - each additional pizza adds about 1.619 units of delivery time;
> #  - each additional unit of distance adds about 1.344 units of delivery time;
> #  - a downtown order (Location = 1) takes about 1.346 units less than an
> #    otherwise identical non-downtown order.
> # The intercept (2.922) is only a baseline; no order has zero pizzas and
> # zero distance, so it has little practical meaning on its own.
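"Holding the other variables constant" can be checked numerically. A minimal sketch, assuming the data are typed in from the table with Location already coded 1/0, and using a hypothetical order (10 pizzas, distance 5) that is not in the data: the model is refitted with a `data =` argument so that `predict()` can score new rows, and `new_orders` and `gap` are illustrative names.

```r
# Pizza delivery data from the table above; Location coded 1 = Downtown, 0 = Not Downtown.
d <- data.frame(
  DeliveryTime = c(16.68, 11.5, 12.03, 14.88, 13.75, 18.11, 8, 17.83, 79.24, 21.5,
                   40.33, 21, 13.5, 19.75, 24, 29, 15.35, 19, 9.5, 35.1,
                   17.9, 52.32, 18.75, 19.83, 10.75, 26, 14.21, 21, 10, 36, 18.1),
  NumberOfPizzas = c(7, 3, 3, 8, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10,
                     6, 7, 3, 17, 10, 26, 9, 8, 4, 9, 5, 8, 4, 18, 9),
  Distance = c(5.6, 2.2, 3.4, 0.8, 1.5, 3.3, 1.1, 2.1, 14.6, 6.05,
               6.88, 2.15, 2.55, 4.62, 4.48, 7.76, 2, 1.32, 0.36, 7.7,
               1.4, 8.1, 4.5, 6.35, 1.5, 7.3, 2.4, 1.4, 0.9, 8, 1.5),
  Location = c(1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0,
               0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1)
)

# Model A, fitted with a data argument so predict() can take new rows.
m <- lm(DeliveryTime ~ NumberOfPizzas + Distance + Location, data = d)

# Hypothetical order (not in the data): 10 pizzas at distance 5,
# predicted once as not downtown (0) and once as downtown (1).
new_orders <- data.frame(NumberOfPizzas = c(10, 10), Distance = c(5, 5),
                         Location = c(0, 1))
pred <- predict(m, newdata = new_orders)
print(round(pred, 2))

# With pizzas and distance held fixed, the gap is exactly the Location coefficient.
gap <- unname(pred[2] - pred[1])
print(c(gap = gap, coef = unname(coef(m)["Location"])))
```

From the coefficients in the summary, the two predictions work out to roughly 25.8 and 24.5 units, a difference of about 1.35.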
> summary(m)
Call:
lm(formula = d$DeliveryTime ~ d$NumberOfPizzas + d$Distance +
d$Location)
Residuals:
Min 1Q Median 3Q Max
-5.4641 -1.5229 0.0274 1.1334 8.1359
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.9223 1.1123 2.627 0.014016 *
d$NumberOfPizzas 1.6189 0.1470 11.016 1.71e-11 ***
d$Distance 1.3435 0.2978 4.511 0.000113 ***
d$Location -1.3461 1.1381 -1.183 0.247219
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.088 on 27 degrees of freedom
Multiple R-squared: 0.9586, Adjusted R-squared: 0.954
F-statistic: 208.5 on 3 and 27 DF, p-value: < 2.2e-16
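> # From the summary, NumberOfPizzas (p = 1.71e-11) and Distance (p = 0.000113)
> # are significant predictors of delivery time, while Location (p = 0.247) is
> # not significant at the 5% level. The overall F-test (F = 208.5 on 3 and 27 DF,
> # p < 2.2e-16) shows that the model as a whole is significant.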
> summary(m)$r.squared   # R^2 = 0.9586: the model explains 95.86% of the total variation in delivery time
[1] 0.9586213
>
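R² is the share of the total variation in delivery time that the model accounts for, R² = 1 − SSE/SST. A minimal sketch, assuming the data are typed in from the table with Location coded 1/0, that computes it by hand and compares it with `summary(m)$r.squared` (`sst`, `sse` and `r2` are illustrative names):

```r
# Pizza delivery data from the table above; Location coded 1 = Downtown, 0 = Not Downtown.
d <- data.frame(
  DeliveryTime = c(16.68, 11.5, 12.03, 14.88, 13.75, 18.11, 8, 17.83, 79.24, 21.5,
                   40.33, 21, 13.5, 19.75, 24, 29, 15.35, 19, 9.5, 35.1,
                   17.9, 52.32, 18.75, 19.83, 10.75, 26, 14.21, 21, 10, 36, 18.1),
  NumberOfPizzas = c(7, 3, 3, 8, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10,
                     6, 7, 3, 17, 10, 26, 9, 8, 4, 9, 5, 8, 4, 18, 9),
  Distance = c(5.6, 2.2, 3.4, 0.8, 1.5, 3.3, 1.1, 2.1, 14.6, 6.05,
               6.88, 2.15, 2.55, 4.62, 4.48, 7.76, 2, 1.32, 0.36, 7.7,
               1.4, 8.1, 4.5, 6.35, 1.5, 7.3, 2.4, 1.4, 0.9, 8, 1.5),
  Location = c(1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0,
               0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1)
)

# Model A.
m <- lm(DeliveryTime ~ NumberOfPizzas + Distance + Location, data = d)

y   <- d$DeliveryTime
sst <- sum((y - mean(y))^2)   # total variation in delivery time
sse <- sum(residuals(m)^2)    # variation the model leaves unexplained
r2  <- 1 - sse / sst          # share of total variation explained
print(c(by_hand = r2, from_summary = summary(m)$r.squared))
```

The same formula applied to Model B gives its R² of 0.9565.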
> # Model B: delivery time on number of pizzas and distance only
> m1=lm(d$DeliveryTime~d$NumberOfPizzas+d$Distance)
> m1
Call:
lm(formula = d$DeliveryTime ~ d$NumberOfPizzas + d$Distance)
Coefficients:
(Intercept) d$NumberOfPizzas d$Distance
2.275 1.591 1.415
> summary(m1)
Call:
lm(formula = d$DeliveryTime ~ d$NumberOfPizzas + d$Distance)
Residuals:
Min 1Q Median 3Q Max
-6.2376 -1.0912 0.0869 1.3644 8.5679
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.2746 0.9750 2.333 0.0271 *
d$NumberOfPizzas 1.5912 0.1461 10.890 1.42e-11 ***
d$Distance 1.4151 0.2937 4.818 4.56e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.11 on 28 degrees of freedom
Multiple R-squared: 0.9565, Adjusted R-squared: 0.9534
F-statistic: 307.7 on 2 and 28 DF, p-value: < 2.2e-16
> summary(m1)$r.squared   # R^2 = 0.9565: the model explains 95.65% of the total variation in delivery time
[1] 0.9564774
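> # Differences between Model A (m) and Model B (m1):
> #  - Dropping Location changes the remaining coefficients only slightly:
> #    intercept 2.922 -> 2.275, NumberOfPizzas 1.619 -> 1.591,
> #    Distance 1.344 -> 1.415. Both predictors stay highly significant.
> #  - R^2 falls from 0.9586 to 0.9565. R^2 can never decrease when a predictor
> #    is added, so Model A's slightly higher value reflects its extra variable,
> #    not a meaningfully better fit.
> #  - Adjusted R^2 (0.9540 vs 0.9534) and residual standard error (3.088 vs
> #    3.110) are nearly identical, and Location was not significant in Model A,
> #    so the simpler Model B explains delivery time about as well.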
>
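Because Model B is nested inside Model A, the two can also be compared formally. A minimal sketch, assuming the data are typed in from the table with Location coded 1/0, that runs a partial F-test with `anova()` and puts the fit statistics side by side (`cmp` and `fit` are illustrative names):

```r
# Pizza delivery data from the table above; Location coded 1 = Downtown, 0 = Not Downtown.
d <- data.frame(
  DeliveryTime = c(16.68, 11.5, 12.03, 14.88, 13.75, 18.11, 8, 17.83, 79.24, 21.5,
                   40.33, 21, 13.5, 19.75, 24, 29, 15.35, 19, 9.5, 35.1,
                   17.9, 52.32, 18.75, 19.83, 10.75, 26, 14.21, 21, 10, 36, 18.1),
  NumberOfPizzas = c(7, 3, 3, 8, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10,
                     6, 7, 3, 17, 10, 26, 9, 8, 4, 9, 5, 8, 4, 18, 9),
  Distance = c(5.6, 2.2, 3.4, 0.8, 1.5, 3.3, 1.1, 2.1, 14.6, 6.05,
               6.88, 2.15, 2.55, 4.62, 4.48, 7.76, 2, 1.32, 0.36, 7.7,
               1.4, 8.1, 4.5, 6.35, 1.5, 7.3, 2.4, 1.4, 0.9, 8, 1.5),
  Location = c(1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0,
               0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1)
)

m  <- lm(DeliveryTime ~ NumberOfPizzas + Distance + Location, data = d)  # Model A
m1 <- lm(DeliveryTime ~ NumberOfPizzas + Distance, data = d)             # Model B

# Partial F-test: does adding Location to Model B improve the fit?
cmp <- anova(m1, m)
print(cmp)

# Side-by-side fit statistics for the two models.
fit <- sapply(list(A = m, B = m1), function(x) {
  s <- summary(x)
  c(R2 = s$r.squared, adjR2 = s$adj.r.squared, sigma = s$sigma)
})
print(round(fit, 4))
```

Since Location is the only extra term, this F statistic equals the square of its t value in Model A (about 1.183² ≈ 1.40), and its p-value matches the 0.247 reported there.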