WalMart's fiscal year starts in the first week of February. When analyzing the data, this means that week 26 is actually week 30 of 2002 (26 + 4 weeks for January), or the end of July 2002. Likewise, week 52 is actually week 4 of 2003 (52 + 4 weeks for January 2002, minus 52 weeks for 2002), or the end of January 2003. As an example, the spike in sales (revenue) at week 75 occurs in week 27 of 2003 (75 + 4 weeks for January 2002, minus 52 weeks for 2002), or the first week of July 2003. This corresponds to sales for the July 4th holiday, when people are buying barbecue-related items. Please use Excel.
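The week arithmetic above can be written as a small helper. This is a minimal sketch in Python rather than Excel; the function name `fiscal_to_calendar` and its parameters are illustrative assumptions, and it assumes, as the problem does, a 4-week offset for January and exactly 52 weeks per year.

```python
def fiscal_to_calendar(fiscal_week, offset_weeks=4, weeks_per_year=52, start_year=2002):
    """Map a fiscal week number to (calendar_year, calendar_week).

    Assumes the fiscal year starts 4 weeks into calendar 2002 and that
    every year has exactly 52 weeks, as in the problem statement.
    """
    absolute_week = fiscal_week + offset_weeks  # weeks counted from January 2002
    year = start_year + (absolute_week - 1) // weeks_per_year
    week = (absolute_week - 1) % weeks_per_year + 1
    return year, week


for fiscal_week in (26, 52, 75):
    year, week = fiscal_to_calendar(fiscal_week)
    print(f"fiscal week {fiscal_week} -> week {week} of {year}")
```

In Excel, an equivalent calendar-week formula would be `=MOD(A2+3,52)+1`, assuming the fiscal week number sits in cell A2.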
Week | Sales in $ |
26 | 15200 |
27 | 15600 |
28 | 16400 |
29 | 15600 |
30 | 14200 |
31 | 14400 |
32 | 16400 |
33 | 15200 |
34 | 14400 |
35 | 13800 |
36 | 15000 |
37 | 14100 |
38 | 14400 |
39 | 14000 |
40 | 15600 |
41 | 15000 |
42 | 14400 |
43 | 17800 |
44 | 15000 |
45 | 15200 |
46 | 15800 |
47 | 18600 |
48 | 15400 |
49 | 15500 |
50 | 16800 |
51 | 18700 |
52 | 21400 |
53 | 20900 |
54 | 18800 |
55 | 22400 |
56 | 19400 |
57 | 20000 |
58 | 18100 |
59 | 18000 |
60 | 19600 |
61 | 19000 |
62 | 19200 |
63 | 18000 |
64 | 17600 |
65 | 17200 |
66 | 19800 |
67 | 19600 |
68 | 19600 |
69 | 20000 |
70 | 20800 |
71 | 22800 |
72 | 23000 |
73 | 20800 |
74 | 25000 |
75 | 30600 |
76 | 24000 |
77 | 21200 |
Identify spikes (outliers) in the data where extreme sales values occur and correlate these spikes with actual calendar dates in 2002 or 2003 and with holidays or special events that may occur during these periods.
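One possible way to flag spikes, sketched below, is to compare each week with the median of its neighbouring weeks. This is not the method the assignment prescribes: the function name `find_spikes`, the two-week window on each side, and the 15% threshold are all assumptions chosen for illustration, and a different threshold will flag a different set of weeks.

```python
from statistics import median

# Weekly sales from the table above, fiscal weeks 26 through 77.
SALES = [
    15200, 15600, 16400, 15600, 14200, 14400, 16400, 15200, 14400, 13800,
    15000, 14100, 14400, 14000, 15600, 15000, 14400, 17800, 15000, 15200,
    15800, 18600, 15400, 15500, 16800, 18700, 21400, 20900, 18800, 22400,
    19400, 20000, 18100, 18000, 19600, 19000, 19200, 18000, 17600, 17200,
    19800, 19600, 19600, 20000, 20800, 22800, 23000, 20800, 25000, 30600,
    24000, 21200,
]
FIRST_WEEK = 26


def find_spikes(sales, first_week, half_window=2, threshold=1.15):
    """Return weeks whose sales exceed `threshold` times the median of their neighbours.

    `half_window` and `threshold` are arbitrary illustrative choices.
    Weeks too close to either end to have a full window are skipped.
    """
    spikes = []
    for i in range(half_window, len(sales) - half_window):
        neighbours = sales[i - half_window:i] + sales[i + 1:i + half_window + 1]
        if sales[i] > threshold * median(neighbours):
            spikes.append(first_week + i)
    return spikes


spike_weeks = find_spikes(SALES, FIRST_WEEK)
for week in spike_weeks:
    # Same conversion as in the problem statement: add 4 weeks, wrap at 52.
    calendar_year = 2002 + (week + 4 - 1) // 52
    calendar_week = (week + 4 - 1) % 52 + 1
    print(f"fiscal week {week}: week {calendar_week} of {calendar_year}")
```

With these settings the sketch flags fiscal weeks 43, 47, and 75; the last of these is the July 4th spike described in the problem statement.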
1. Modeling the data linearly - a. Generate a linear model for this data by choosing two points.
b. Generate a least squares linear regression model for this data.
c. How good is this regression model? Output and discuss the R² value.
d. What are the marginal sales (derivative, i.e. rate of change) for this department using the linear model with two data points and the regression model?
e. Compare the two models. Which do you feel is better?
f. Remove appropriate outliers as you deem necessary and rerun the linear regression model. What are the marginal sales? Discuss any improvements.
2. Modeling the data quadratically - a. Generate a quadratic model for this data. Also output and discuss the R² value.
b. What are the marginal sales for this department using this model?
c. Calculate the model generated relative max/min value. Show backup analytical work.
d. Compare actual and model generated relative max/min value.
e. Remove outliers and rerun the quadratic least squares model. What are the marginal sales? Discuss any improvements.
3. Comparing models - a. Based on all models run, which model do you feel best predicts future trends? Explain your rationale.
b. Based on the model selected, what type of seasonal adjustments, if any, would be required to meet customer needs?
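The linear and quadratic parts of the questions above can be sketched outside Excel as follows. This is one possible approach, not a prescribed solution: it fits both models by solving the least-squares normal equations in plain Python, and the helper names (`solve`, `fit_polynomial`, `predict`, `r_squared`, `quad_marginal`) are illustrative. The two-point line uses the first and last weeks, which is only one of many possible choices, and weeks are centred on their mean purely to keep the arithmetic well conditioned.

```python
SALES = [
    15200, 15600, 16400, 15600, 14200, 14400, 16400, 15200, 14400, 13800,
    15000, 14100, 14400, 14000, 15600, 15000, 14400, 17800, 15000, 15200,
    15800, 18600, 15400, 15500, 16800, 18700, 21400, 20900, 18800, 22400,
    19400, 20000, 18100, 18000, 19600, 19000, 19200, 18000, 17600, 17200,
    19800, 19600, 19600, 20000, 20800, 22800, 23000, 20800, 25000, 30600,
    24000, 21200,
]
WEEKS = list(range(26, 26 + len(SALES)))  # fiscal weeks 26..77


def solve(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_polynomial(xs, ys, degree):
    """Least-squares fit; returns coefficients in ascending order [c0, c1, ...]."""
    size = degree + 1
    matrix = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    rhs = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(size)]
    return solve(matrix, rhs)


def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def r_squared(coeffs, xs, ys):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


MEAN_WEEK = sum(WEEKS) / len(WEEKS)
T = [w - MEAN_WEEK for w in WEEKS]  # centred weeks

# Two-point line through the first and last observations (an arbitrary choice).
two_point_slope = (SALES[-1] - SALES[0]) / (WEEKS[-1] - WEEKS[0])

# Least-squares line: its slope is the marginal sales in $ per week.
linear = fit_polynomial(T, SALES, 1)
linear_slope = linear[1]
linear_r2 = r_squared(linear, T, SALES)

# Least-squares parabola y = c0 + c1*t + c2*t^2, with t = week - MEAN_WEEK.
quad = fit_polynomial(T, SALES, 2)
quad_r2 = r_squared(quad, T, SALES)


def quad_marginal(week):
    """Derivative of the quadratic model, in $ per week."""
    return quad[1] + 2 * quad[2] * (week - MEAN_WEEK)


# Setting the derivative to zero gives the vertex (a minimum when c2 > 0).
vertex_week = MEAN_WEEK - quad[1] / (2 * quad[2])
vertex_sales = predict(quad, vertex_week - MEAN_WEEK)

print(f"two-point slope: {two_point_slope:.1f}, regression slope: {linear_slope:.1f}")
print(f"R^2 linear: {linear_r2:.3f}, R^2 quadratic: {quad_r2:.3f}")
print(f"quadratic vertex: week {vertex_week:.1f}, sales {vertex_sales:.0f}")
```

On this data the regression slope is roughly $181 per week against about $118 per week for the first-to-last two-point line, and the quadratic coefficient comes out positive, so the model's relative extremum is a minimum rather than a maximum. Because the quadratic model contains the linear one as a special case, its R² can never be lower, so a higher R² alone does not show that it predicts future trends better.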
Since the first question has four sub-questions, I am answering only the first one.
1. Holidays or special events can cause either increases or decreases in sales. They do not have to be recurring. If an event is recurring, the same effect is applied to every instance. A one-time holiday lets the model fit a one-time spike without affecting future predictions at all.
If we have identified outliers, it is better to remove them from the historical data than to fit them as one-time holidays or special events. Every holiday or special event introduces an additional parameter into the model, so fitting and predicting become slower with no benefit over simply removing the points.
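The effect of removing an outlier before fitting can be checked directly. The sketch below refits a least-squares line with and without fiscal week 75; treating only that week as an outlier is an assumption made here for illustration (it is the one spike the problem statement ties to a holiday), and `fit_line` and `OUTLIER_WEEKS` are hypothetical names.

```python
SALES = [
    15200, 15600, 16400, 15600, 14200, 14400, 16400, 15200, 14400, 13800,
    15000, 14100, 14400, 14000, 15600, 15000, 14400, 17800, 15000, 15200,
    15800, 18600, 15400, 15500, 16800, 18700, 21400, 20900, 18800, 22400,
    19400, 20000, 18100, 18000, 19600, 19000, 19200, 18000, 17600, 17200,
    19800, 19600, 19600, 20000, 20800, 22800, 23000, 20800, 25000, 30600,
    24000, 21200,
]
WEEKS = list(range(26, 26 + len(SALES)))  # fiscal weeks 26..77


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


# Assumption: only the July 4th spike is treated as an outlier.
OUTLIER_WEEKS = {75}

all_fit = fit_line(WEEKS, SALES)
kept = [(w, s) for w, s in zip(WEEKS, SALES) if w not in OUTLIER_WEEKS]
clean_fit = fit_line([w for w, _ in kept], [s for _, s in kept])

print(f"all weeks:       slope {all_fit[0]:.1f} $/week, R^2 {all_fit[2]:.3f}")
print(f"without week 75: slope {clean_fit[0]:.1f} $/week, R^2 {clean_fit[2]:.3f}")
```

On this data, dropping week 75 lowers the fitted slope (the marginal sales) and raises R², since a single late spike no longer pulls the line upward.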
Identifying the spikes where extreme sales values occur, and correlating them with calendar dates in 2002 or 2003 and with holidays or special events, gives a count of 23, because holidays and special events are not counted as outliers. The spike at week 75 falls in week 27 of 2003 (75 + 4 weeks for January 2002, minus 52 weeks for 2002), the first week of July 2003. This corresponds to sales for the July 4th holiday, when people are buying barbecue-related items. The count is stated as 27 - 3 = 23 outliers; note that 27 - 3 is 24, so this figure is inconsistent as written.