Price ($) | Promotional Exp (K) | Quality | City: 1/Suburban: 0 | Sales (K) |
949 | 5 | 100 | 1 | 168 |
941 | 4.3 | 94 | 1 | 150 |
934 | 3 | 89 | 1 | 168 |
921 | 2 | 85 | 1 | 148 |
915 | 0.75 | 79 | 1 | 152 |
909 | 4.8 | 75 | 1 | 162 |
904 | 3.6 | 70 | 1 | 160 |
1014 | 3 | 63 | 0 | 123 |
1006 | 1.5 | 60 | 0 | 130 |
990 | 0.7 | 55 | 0 | 116 |
978 | 4.7 | 51 | 1 | 142 |
962 | 3.5 | 47 | 1 | 145 |
955 | 2.8 | 42 | 1 | 134 |
953 | 1.3 | 35 | 0 | 128 |
1050 | 0.25 | 30 | 0 | 117 |
1040 | 4.5 | 26 | 0 | 118 |
1038 | 3.2 | 22 | 0 | 107 |
1022 | 2.4 | 17 | 0 | 124 |
1021 | 1.2 | 12 | 0 | 104 |
1018 | 0 | 6 | 0 | 106 |
1010 | 2.9 | 60 | 0 | 120 |
935 | 4.4 | 91 | 1 | 153 |
Case Study
Please consider the data presented above for the monthly sales of Ever-cool brand of refrigerators in 1,000s of dollars (Dependent Variable) and the four independent variables.
Independent variables are:
Price (in dollars); Promotional Expenditure (in 1,000s of dollars); Quality of service (scale of 1-100); location (categorical variable: city area: 1; suburban area: 0).
Develop a multiple linear regression equation using either Excel or Minitab and based on the relevant outputs please answer the following questions:
a. Based on the relevant residual and normality plots, do you see any evidence of violation of assumptions (Linearity, Normality, Equal variance)? You must attach the relevant plots as part of your report.
b. State the multiple regression equation and interpret the statistical meaning of the estimated slopes, b1, b2, b3, and b4 (corresponding to the four independent variables).
c. At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. Based on these results, indicate the independent variable(s) to include in this model. (Based on t - test results)
d. Construct a 95% confidence interval estimate of the population slope between the independent variable ‘Quality’ and the dependent variable ‘Monthly Sales’ (B3) (please note that Minitab can’t do this directly, however you may use the relevant information from Minitab output and then construct the confidence interval manually)
e. Perform the overall F- test and comment on the significance of the model.
Please follow the following instructions:
Let the dependent variable, Y = Sales, and the independent variables, x1 = Price, x2 = Promotional Exp, x3 = Quality, and x4 = City
The regression analysis is done in Excel using the following steps.
Step 1: Enter the data values in Excel. The screenshot is shown below.
Step 2: DATA > Data Analysis > Regression > OK. The screenshot is shown below.
Step 3: Select Input Y Range: the 'Sales' column; Input X Range: the 'Price', 'Promotional Exp', 'Quality', and 'City' columns. Tick Residual Plots and Normal Probability Plots, then click OK. The screenshot is shown below.
The result is obtained. The screenshot is shown below.
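As a cross-check outside Excel, the same least-squares fit can be sketched in plain Python. This is a minimal sketch and not Excel's own routine: it builds the normal equations and solves them by Gaussian elimination. The names `DATA`, `solve`, and `fit_ols` are illustrative, and the rows are transcribed from the data table above, so the output depends on that transcription being faithful.

```python
# Columns: price ($), promotional exp ($K), quality, city (1) / suburban (0), sales ($K)
DATA = [
    (949, 5, 100, 1, 168), (941, 4.3, 94, 1, 150), (934, 3, 89, 1, 168),
    (921, 2, 85, 1, 148), (915, 0.75, 79, 1, 152), (909, 4.8, 75, 1, 162),
    (904, 3.6, 70, 1, 160), (1014, 3, 63, 0, 123), (1006, 1.5, 60, 0, 130),
    (990, 0.7, 55, 0, 116), (978, 4.7, 51, 1, 142), (962, 3.5, 47, 1, 145),
    (955, 2.8, 42, 1, 134), (953, 1.3, 35, 0, 128), (1050, 0.25, 30, 0, 117),
    (1040, 4.5, 26, 0, 118), (1038, 3.2, 22, 0, 107), (1022, 2.4, 17, 0, 124),
    (1021, 1.2, 12, 0, 104), (1018, 0, 6, 0, 106), (1010, 2.9, 60, 0, 120),
    (935, 4.4, 91, 1, 153),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows):
    """Return (coefficients, fitted values, residuals) for Y on x1..x4 plus an intercept."""
    X = [[1.0] + [float(v) for v in row[:-1]] for row in rows]  # leading 1 = intercept
    y = [float(row[-1]) for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in X]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return coef, fitted, residuals


coef, fitted, residuals = fit_ols(DATA)
names = ["b0 (Intercept)", "b1 (Price)", "b2 (Promotional Exp)", "b3 (Quality)", "b4 (City)"]
for name, value in zip(names, coef):
    print(f"{name}: {value:.3f}")
```

Plotting `residuals` against `fitted` gives the kind of residual plot used in part (a). The normal equations are used here only because they need no third-party library; a dedicated statistics package would normally be preferred.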
a)
Linearity
From the line fit plot (scatter plot), there is a linear trend, which indicates that the linearity assumption is met.
Normality
From the normal probability plot, the residuals lie close to the straight line, which indicates that the normality assumption is met.
Equality of variance
From the residual plot, the residuals are randomly scattered about the horizontal zero line with no visible pattern, which indicates that the equal-variance assumption is met.
b)
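From the regression summary report, the estimated equation is Ŷ = b0 - 0.136 x1 + 1.759 x2 + 0.241 x3 + 12.286 x4, where b0 is the intercept given in the same report. Sales are measured in 1,000s of dollars, and each slope is interpreted holding the other variables constant.
Variable | Coefficient | Interpretation |
x1 (Price) | -0.136 | For each $1 increase in price, monthly sales decrease by 0.136 (in $1,000s) |
x2 (Promotional Exp) | 1.759 | For each $1,000 increase in promotional expenditure, monthly sales increase by 1.759 (in $1,000s) |
x3 (Quality) | 0.241 | For each 1-point increase in the quality rating, monthly sales increase by 0.241 (in $1,000s) |
x4 (City) | 12.286 | Monthly sales in a city area (x4 = 1) are 12.286 (in $1,000s) higher than in a suburban area (x4 = 0) |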
c)
From the regression summary report
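Variable | P-value | Comparison | α | Decision |
x1 (Price) | 0.086 | > | 0.05 | Not Significant |
x2 (Promotional Exp) | 0.187 | > | 0.05 | Not Significant |
x3 (Quality) | 0.014 | < | 0.05 | Significant |
x4 (City) | 0.089 | > | 0.05 | Not Significant |
Hence, based on the t-tests at the 0.05 level, only x3 (Quality) makes a significant contribution and should be included in the model.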
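The decision rule for the t-tests (a variable is significant when its p-value is below α) can be applied mechanically to the reported p-values. The short sketch below does only that; the dictionary keys and variable names are illustrative, and the p-values are copied from the regression summary rather than recomputed.

```python
ALPHA = 0.05  # level of significance stated in the question

# p-values as reported in the regression summary (not recomputed here)
p_values = {
    "x1 (Price)": 0.086,
    "x2 (Promotional Exp)": 0.187,
    "x3 (Quality)": 0.014,
    "x4 (City)": 0.089,
}

# A slope is significant when its p-value is strictly below alpha
decisions = {
    name: "Significant" if p < ALPHA else "Not Significant"
    for name, p in p_values.items()
}
keep = [name for name, decision in decisions.items() if decision == "Significant"]

for name, p in p_values.items():
    print(f"{name}: p = {p:.3f} -> {decisions[name]}")
print("Variables to include:", keep)
```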
d)
From the regression summary report, the 95% CI for the coefficient estimate of Quality is,
Variable | Lower 95% | Upper 95% |
Quality | 0.054963 | 0.426603 |
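Because the question asks for the interval to be built by hand, the construction can be sketched as follows. With n = 22 observations and k = 4 independent variables, the error degrees of freedom are n - k - 1 = 17, and the critical value t(0.025, 17) ≈ 2.110 comes from a standard t table. The standard error S_b3 ≈ 0.0881 is not quoted in the text above; it is backed out from the reported interval, so treat it as an inferred value.

```latex
\begin{aligned}
b_3 \pm t_{\alpha/2,\,n-k-1}\, S_{b_3}
  &\approx 0.241 \pm t_{0.025,\,17}\,(0.0881) \\
  &\approx 0.241 \pm (2.110)(0.0881) \\
  &\approx 0.241 \pm 0.186 \\
  &\Rightarrow\quad 0.055 \le \beta_3 \le 0.427
\end{aligned}
```

Since the interval does not contain zero, it agrees with the t-test result for Quality at the 0.05 level.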
e)
From the regression summary report,
Source | Significance F |
Regression | 2.31294E-08 |
The Significance F value is 2.31294E-08 (approximately 0.0000), which is less than 0.05. Hence, at the 5% significance level, we reject the null hypothesis that all four slopes are zero. We conclude that the model with the independent variables fits the data significantly better than the intercept-only model, i.e. at least one independent variable is linearly related to sales.
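For reference, the hypotheses and test statistic behind the overall F-test can be written out as below. The degrees of freedom follow from n = 22 observations and k = 4 independent variables; the value of the F statistic itself is not quoted in the text above, so only its form is shown.

```latex
\begin{aligned}
H_0 &: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0 \\
H_1 &: \text{at least one } \beta_j \neq 0 \quad (j = 1, \dots, 4) \\[4pt]
F &= \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)},
  \qquad k = 4,\quad n - k - 1 = 17 \\[4pt]
&\text{Reject } H_0 \text{ if the p-value (Significance F)} < \alpha:
  \quad 2.31294 \times 10^{-8} < 0.05
\end{aligned}
```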