In: Statistics and Probability
Better Blooms is a florist shop which takes orders by internet and phone, prepares the desired floral arrangements, and delivers within a 25-mile radius of the shop. Since the profit margin on the business is small for each arrangement and the competition from other florists in the area is high, the owner is concerned about setting the appropriate costs for each delivery to be profitable while still being competitive. To investigate, the owner has gathered information on a random sample of 50 recent orders and has asked you to analyze the data using statistical methods and to report to her your findings. Factors thought to be related to the cost of delivering a floral arrangement are thought to be: (1) Preparation Time – the time in minutes between when the customer has placed an order and when it is ready for delivery; more expensive arrangements take longer to prepare, (2) Delivery Time – the travel time in minutes from the shop to the recipient of the arrangement, and (3) Mileage – the distance in miles from the shop to the customer (mileage and delivery time are sometimes inversely proportional when the delivery truck can use an interstate instead of traveling through heavy downtown traffic, for instance). Develop a multiple linear regression equation that describes the relationship between the cost of delivery and the other variables. Do these three variables explain a reasonable amount of the variation in the dependent variable? Estimate the delivery cost for an arrangement that takes 20 minutes for preparation, takes 30 minutes to deliver, and must cover a distance of 12 miles. Test to determine that at least one regression coefficient differs from zero. Also test to see whether any of the variables can be dropped, rerun the regression equation until only significant variables are included. Write management report interpreting the final regression equation, show your analysis in charts, tables, graphs so it is easy for the owner to see the analysis and conclusions, and back up your charts with detailed analysis using formulas and explanations of your statistical analysis steps. The owner wants to know your report is credible, but is not a statistician, so make your explanations clear.
Cost Prep Delivery
Distance
22.60 20 51 20
23.37 11 33 12
31.49 16 47 19
19.31 22 18 8
28.35 18 88 17
22.63 19 20 11
22.63 29 39 11
11.53 10 23 10
21.16 13 20 8
11.53 20 32 10
28.17 15 35 16
20.42 18 23 9
21.53 19 21 10
27.55 17 37 16
23.37 19 25 12
17.10 15 15 6
27.06 13 34 15
15.99 18 13 4
17.96 12 12 4
25.22 16 41 14
24.29 13 28 13
22.76 24 26 10
28.17 19 54 16
19.68 17 18 8
25.15 16 50 13
20.36 19 19 7
21.16 13 19 8
15.22 10 45 14
18.76 12 12 5
18.76 18 16 5
24.29 17 35 13
19.56 12 12 6
22.63 18 30 11
21.16 15 13 8
21.16 11 20 8
19.68 15 19 8
18.76 17 14 7
17.96 15 11 4
23.37 10 25 12
25.22 16 32 14
27.06 18 44 16
21.96 19 28 9
22.63 18 31 11
19.68 17 19 8
22.76 18 28 10
21.96 13 18 9
26.95 10 32 14
26.14 18 44 15
24.29 20 34 13
24.35 16 33 12
Excel result for regression
SUMMARY OUTPUT | |||||
Regression Statistics | |||||
Multiple R | 0.724762262 | ||||
R Square | 0.525280336 | ||||
Adjusted R Square | 0.494320358 | ||||
Standard Error | 2.904395109 | ||||
Observations | 50 | ||||
ANOVA | |||||
df | SS | MS | F | Significance F | |
Regression | 3 | 429.3615464 | 143.1205155 | 16.96643112 | 1.46728E-07 |
Residual | 46 | 388.0335036 | 8.435510948 | ||
Total | 49 | 817.39505 | |||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | |
Intercept | 13.08089406 | 2.155438076 | 6.06878676 | 2.28357E-07 | 8.742222241 |
Prep | 0.055605071 | 0.113029404 | 0.491952264 | 0.625092687 | -0.171911319 |
Delivery | -0.01721792 | 0.054410003 | -0.316447699 | 0.75309431 | -0.126739575 |
Distance | 0.79827032 | 0.194996702 | 4.093763175 | 0.00016971 | 0.405762316 |
y^= 13.0808 + 0.0556 Prep -0.0172 Delivery +0.7983 Distance
when
y^= 13.0808 + 0.0556* 20 -0.0172 *30+0.7983 *12
= 23.2564
to determine that at least one regression coefficient differs from zero.
since Significance F = 1.467*10^(-7) << 0.05 (alpha)
the model is significant
hence at least one regression coefficient differs from zero.
here p-value of Prep and Delivery are more than 0.05
hence we can remove them
running regression only on Distance
SUMMARY OUTPUT | |||||
Regression Statistics | |||||
Multiple R | 0.722707398 | ||||
R Square | 0.522305983 | ||||
Adjusted R Square | 0.512354025 | ||||
Standard Error | 2.852136351 | ||||
Observations | 50 | ||||
ANOVA | |||||
df | SS | MS | F | Significance F | |
Regression | 1 | 426.9303254 | 426.9303254 | 52.48273231 | 3.10504E-09 |
Residual | 48 | 390.4647246 | 8.134681763 | ||
Total | 49 | 817.39505 | |||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | |
Intercept | 13.99914801 | 1.188341766 | 11.78040561 | 9.09031E-16 | 11.60982675 |
Distance | 0.751192207 | 0.103691428 | 7.244496691 | 3.10504E-09 | 0.542706618 |
y^= 13.9991 + 0.7512 Distance