A national chain of women’s clothing stores with locations in large shopping malls thinks that it can do a better job of planning renovations and expansions if it understands what variables impact sales. It plans a small pilot study on stores in 25 different mall locations. The data it collects consist of monthly sales, store size (sq. ft), number of linear feet of window display, number of competitors located in the mall, size of the mall (sq. ft), and distance to the nearest competitor (ft).
1. Test the individual regression coefficients. At the 0.05 level of significance, what are your conclusions?
2. If you were going to drop just one variable from the model, which one would you choose? Why?
The store planners for the women’s clothing chain want to find the best model that they can for understanding what store characteristics impact monthly sales.
3. Use stepwise regression to find the best model for the data.
4. Analyze the model you have identified to determine whether it has any problems.
5. Write a memo reporting your findings to your boss. Identify the strengths and weaknesses of the model you have chosen.
Sales | Size | Windows | Competitors | Mall Size | Nearest Competitor |
4453 | 3860 | 39 | 12 | 943700 | 227 |
4770 | 4150 | 41 | 15 | 532500 | 142 |
4821 | 3880 | 39 | 15 | 390500 | 263 |
4912 | 4000 | 39 | 13 | 545500 | 219 |
4774 | 4140 | 40 | 10 | 329600 | 232 |
4638 | 4370 | 48 | 14 | 802600 | 257 |
4076 | 3570 | 37 | 16 | 463300 | 241 |
3967 | 3870 | 39 | 16 | 855200 | 220 |
4000 | 4020 | 44 | 21 | 443000 | 188 |
4379 | 3990 | 38 | 16 | 613400 | 209 |
5761 | 4930 | 50 | 15 | 420300 | 220 |
3561 | 3540 | 34 | 15 | 626700 | 167 |
4145 | 3950 | 36 | 14 | 601500 | 187 |
4406 | 3770 | 36 | 12 | 593000 | 199 |
4972 | 3940 | 38 | 11 | 347100 | 204 |
4414 | 3590 | 35 | 10 | 355900 | 146 |
4363 | 4090 | 38 | 13 | 490100 | 206 |
4499 | 4580 | 45 | 16 | 649200 | 144 |
3573 | 3580 | 35 | 18 | 685900 | 178 |
5287 | 4380 | 42 | 15 | 106200 | 149 |
5339 | 4330 | 40 | 10 | 354900 | 231 |
4656 | 4060 | 37 | 11 | 598700 | 225 |
3943 | 3380 | 34 | 16 | 381800 | 163 |
5121 | 4760 | 44 | 17 | 597900 | 224 |
4557 | 3800 | 36 | 14 | 745300 | 195 |
1) To test each individual regression coefficient, the hypotheses are:
H0: Beta is equal to zero.
vs
H1: Beta is not equal to zero.
The test statistic for each term is T = Coef / SE Coef, with 19 error degrees of freedom.
The Minitab output is as follows:
Regression Analysis: sales versus size, window, compititors, mall size, nearest compititors
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 5 5761406 1152281 19.21 0.000
size 1 560848 560848 9.35 0.006
window 1 5946 5946 0.10 0.756
compititors 1 570069 570069 9.51 0.006
mall size 1 620772 620772 10.35 0.005
nearest compititors 1 103620 103620 1.73 0.204
Error 19 1139390 59968
Total 24 6900796
Model Summary
S R-sq R-sq(adj) R-sq(pred)
244.883 83.49% 79.14% 72.06%
Coefficients
Term Coef SE Coef T-Value P-Value VIF
Constant 1507 672 2.24 0.037
size 0.919 0.301 3.06 0.006 5.26
window 9.1 28.8 0.31 0.756 5.82
compititors -67.7 22.0 -3.08 0.006 1.40
mall size -0.000903 0.000281 -3.22 0.005 1.12
nearest compititors 2.10 1.59 1.31 0.204 1.24
Regression Equation
sales = 1507 + 0.919 size + 9.1 window - 67.7 compititors - 0.000903 mall size
+ 2.10 nearest compititors
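If the p-value is greater than the level of significance, we fail to reject H0.
Here, the p-values for window (0.756) and nearest competitor (0.204) are greater than 0.05, so these two coefficients are not significantly different from zero and both variables are candidates for removal from the model. The p-values for size (0.006), competitors (0.006), and mall size (0.005) are less than 0.05, so these coefficients are significant.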
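As a rough cross-check of the conclusions above, the T-values can be recomputed from the rounded Coef and SE Coef columns and compared with the two-sided critical value for 19 error degrees of freedom. The critical value 2.093 is taken from a standard t table and is an assumption of this sketch, not part of the Minitab output; the recomputed T-values can differ from Minitab's in the last digit because the inputs are rounded.

```python
# Coefficient estimates and standard errors copied from the Minitab output.
coefficients = {
    "size": (0.919, 0.301),
    "window": (9.1, 28.8),
    "compititors": (-67.7, 22.0),
    "mall size": (-0.000903, 0.000281),
    "nearest compititors": (2.10, 1.59),
}

# Two-sided critical value t(0.025, 19), read from a standard t table
# (assumption: alpha = 0.05 and error DF = 19 as in the ANOVA table).
T_CRITICAL = 2.093


def t_statistic(coef, se):
    """T-value for testing H0: Beta = 0."""
    return coef / se


significant = {}
for term, (coef, se) in coefficients.items():
    t = t_statistic(coef, se)
    # Reject H0 when |T| exceeds the critical value.
    significant[term] = abs(t) > T_CRITICAL
    print(f"{term:20s} T = {t:6.2f}  reject H0: {significant[term]}")
```

This is the critical-value form of the same test; it leads to the same decisions as comparing each p-value with 0.05.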
2) If I were to drop only one variable from the model, I would drop the window variable. It has the largest p-value (0.756, well above 0.05) and the highest VIF (5.82), which indicates that it is correlated with the other predictors and contributes to multicollinearity.
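To make the VIF values easier to read, they can be converted back into the R-squared obtained when each predictor is regressed on the other predictors, using the standard identity VIF = 1 / (1 - R²). The sketch below only rearranges that identity; the helper name `r_squared_from_vif` is made up for illustration.

```python
# VIF values copied from the Coefficients table of the full model.
vif = {
    "size": 5.26,
    "window": 5.82,
    "compititors": 1.40,
    "mall size": 1.12,
    "nearest compititors": 1.24,
}


def r_squared_from_vif(v):
    """Invert VIF = 1 / (1 - R^2) to recover R^2 for that predictor."""
    return 1.0 - 1.0 / v


r_squared = {term: r_squared_from_vif(v) for term, v in vif.items()}

for term, r2 in r_squared.items():
    print(f"{term:20s} VIF = {vif[term]:4.2f}  R-sq vs other predictors = {r2:.1%}")
```

For window, this gives an R-squared of roughly 83%, meaning most of its variation is already explained by the other predictors.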
3)
Regression Analysis: sales versus size, window, compititors, mall size, nearest compititors
Stepwise Selection of Terms
α to enter = 0.15, α to remove = 0.15
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Regression 3 5627674 1875891 30.94 0.000
size 1 3755185 3755185 61.94 0.000
compititors 1 856069 856069 14.12 0.001
mall size 1 514713 514713 8.49 0.008
Error 21 1273122 60625
Total 24 6900796
Model Summary
S R-sq R-sq(adj) R-sq(pred)
246.221 81.55% 78.92% 75.09%
Coefficients
Term Coef SE Coef T-Value P-Value VIF
Constant 1770 611 2.90 0.009
size 1.045 0.133 7.87 0.000 1.01
compititors -71.0 18.9 -3.76 0.001 1.03
mall size -0.000792 0.000272 -2.91 0.008 1.04
Regression Equation
sales = 1770 + 1.045 size - 71.0 compititors - 0.000792 mall size
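The Model Summary figures for the full and stepwise models can be reproduced from the sums of squares in the two ANOVA tables, which makes the comparison between the models explicit. This is a minimal sketch using the standard formulas for R-sq, R-sq(adj), and the overall F-value; the function name `model_summary` is illustrative only.

```python
def model_summary(ss_regression, ss_error, n, k):
    """Return (R-sq, R-sq(adj), F) for a model with k predictors and n rows."""
    ss_total = ss_regression + ss_error
    df_error = n - k - 1
    r_sq = ss_regression / ss_total
    r_sq_adj = 1 - (ss_error / df_error) / (ss_total / (n - 1))
    f_value = (ss_regression / k) / (ss_error / df_error)
    return r_sq, r_sq_adj, f_value


# Sums of squares copied from the two ANOVA tables (n = 25 stores).
full = model_summary(5761406, 1139390, n=25, k=5)
reduced = model_summary(5627674, 1273122, n=25, k=3)

for name, (r_sq, r_sq_adj, f_value) in [("full", full), ("stepwise", reduced)]:
    print(f"{name:9s} R-sq = {r_sq:.2%}  R-sq(adj) = {r_sq_adj:.2%}  F = {f_value:.2f}")
```

Dropping window and nearest compititors lowers R-sq by about two percentage points, while R-sq(adj) is almost unchanged (79.14% versus 78.92%), so the three-variable model explains nearly as much with fewer terms.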
4)
a) The normal probability plot implies that the residuals are normally distributed.
b) Use the residuals versus fits plot to verify the assumption that the residuals are randomly distributed and have constant variance. Here, the points fall randomly on both sides of 0, with no recognizable pattern. Therefore the residuals appear to be randomly distributed with constant variance.
c) Use the histogram of the residuals to determine whether the data are skewed or include outliers. A bar that is far from the other bars may indicate the presence of an outlier.
d) Similarly to (b), the residuals show no pattern, which indicates no problem.
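The residual plots themselves are not reproduced here, but the residuals behind them can be approximated from the data table and the stepwise regression equation. The sketch below uses the rounded coefficients from the output, so values will differ slightly from Minitab's. Dividing by S = 246.221 is only a rough scaling, not Minitab's standardized residual (which also accounts for leverage), and the cutoff of 2 is a common rule of thumb assumed here.

```python
# (sales, size, competitors, mall size) for the 25 stores, from the data table.
stores = [
    (4453, 3860, 12, 943700), (4770, 4150, 15, 532500), (4821, 3880, 15, 390500),
    (4912, 4000, 13, 545500), (4774, 4140, 10, 329600), (4638, 4370, 14, 802600),
    (4076, 3570, 16, 463300), (3967, 3870, 16, 855200), (4000, 4020, 21, 443000),
    (4379, 3990, 16, 613400), (5761, 4930, 15, 420300), (3561, 3540, 15, 626700),
    (4145, 3950, 14, 601500), (4406, 3770, 12, 593000), (4972, 3940, 11, 347100),
    (4414, 3590, 10, 355900), (4363, 4090, 13, 490100), (4499, 4580, 16, 649200),
    (3573, 3580, 18, 685900), (5287, 4380, 15, 106200), (5339, 4330, 10, 354900),
    (4656, 4060, 11, 598700), (3943, 3380, 16, 381800), (5121, 4760, 17, 597900),
    (4557, 3800, 14, 745300),
]

S = 246.221  # standard error of the regression from the Model Summary


def fitted_sales(size, competitors, mall_size):
    """Stepwise regression equation with the rounded Minitab coefficients."""
    return 1770 + 1.045 * size - 71.0 * competitors - 0.000792 * mall_size


# Residual = observed sales minus fitted sales.
residuals = [sales - fitted_sales(size, comp, mall)
             for sales, size, comp, mall in stores]

# Flag stores whose residual is more than about two S away from zero.
flagged = [(i + 1, round(r, 1)) for i, r in enumerate(residuals) if abs(r / S) > 2]

print("Residual for store 1:", round(residuals[0], 2))
print("Stores with |residual / S| > 2:", flagged)
```

Any store that is flagged would be worth checking against the histogram and the normal probability plot before concluding that there are no outliers.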