Question

sales sqft adv_cost inventory distance district_size storecount
231 1.47 7.62 897 10.9 79.48 40
232 1.53 9.57 892 9.4 51.154 12
156 1.68 8.37 542 7.9 60.358 41
157 1.355 6.73 552 6.8 55.561 68
10 1.33 1.66 242 3.5 89.624 14
10 1.33 1.17 235 3.6 86.898 62
519 1.89 12.96 3670 18.5 108.857 56
520 1.885 12.02 3657 19.1 100.685 75
437 1.7 12.29 3345 17.4 90.138 59
487 1.86 12.5 3322 16.5 111.284 22
299 1.4 9.86 1784 11.5 75.606 26
195 1.63 7.22 1230 9.8 64.245 27
20 1.24 5.23 483 2.4 55.929 11
68 1.51 3.93 114 4.5 73.187 33
428 1.78 11.04 2829 16.4 101.192 51
429 1.725 9.43 3410 15.7 80.694 16
464 1.72 12.19 2873 15.8 105.254 84
15 1.2 1.17 289 3.2 80.937 31
65 1.47 6.56 292 3.9 80.187 97
66 1.51 5.55 312 3.8 85.897 66
98 1.24 5.79 235 6.4 90.219 75
338 1.65 3.34 1160 12.1 121.988 84
249 1.513 2.23 1184 9.7 115.277 12
161 1.4 6.95 399 7.9 50.188 14
467 1.46 13.17 2062 16.1 101.211 89
398 1.84 11.68 2103 15.9 95.406 49
497 1.68 12.11 2743 18 80.195 14
528 1.94 10.98 3779 18 110.025 58
529 1.765 11.11 3916 18.9 103.26 52
99 1.31 4.35 782 4.8 111.732 52
100 1.525 3.79 804 4.7 99.7 41
1 1.45 4.68 1116 3.4 85.882 50
347 1.65 10.08 2223 13.4 94.181 49
348 1.811 7.87 2180 12.1 95.242 50
341 1.64 10.34 1494 14.3 70.693 28
557 1.66 13.55 3522 18.5 94.329 43
508 1.698 11.53 3521 16.7 99.917 50

In the “HomeSales” dataset, the response variable, sales, depends on six potential predictor variables: sqft, adv_cost, inventory, distance, district_size, and storecount. Fit four simple linear regression (SLR) models corresponding to the four predictors sqft, adv_cost, inventory, and distance. Then, for each model, create a normal probability plot and a histogram of the residuals, together with the two residual scatterplots: residuals vs. fitted values and residuals vs. observation order.

(a) What do the residual plots for the model with sqft as the predictor indicate about the validity of this regression model and the assumptions made about the errors?

(b) What do the residual plots for the model with adv_cost as the predictor indicate about the validity of this regression model and the assumptions made about the errors?

(c) What do the residual plots for the model with inventory as the predictor indicate about the validity of this regression model and the assumptions made about the errors?

(d) What do the residual plots for the model with distance as the predictor indicate about the validity of this regression model and the assumptions made about the errors?

(e) One objective of this analysis is to obtain an appropriate simple linear regression model that can be used to estimate the average sales based on a single predictor. State your “best” choice based on your conclusions in parts (a)–(d).

(f) Complete the table below, using the regression analysis results of the four simple linear regression models considered in parts (a)–(d). Based on the table entries, would you change your “best” choice from part (e)?

Model predictor   S        R2       t-stat
sqft              110.75   66.44%   8.32
adv_cost
inventory
distance

(g) A model including the predictor variable adv_cost is of specific interest. Obtain appropriate residual plots and determine if adding either district_size or storecount as an additional predictor to the SLR model with predictor adv_cost is likely to improve its fit.

Solutions

Expert Solution

a) Model 1: sales predicted by sqft

Call:
lm(formula = sales ~ sqft, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-200.740  -80.410    7.266   47.567  277.668

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -921.65     145.55  -6.332 2.82e-07 ***
sqft          760.94      91.41   8.324 8.16e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 110.8 on 35 degrees of freedom
Multiple R-squared:  0.6644,    Adjusted R-squared:  0.6548
F-statistic: 69.29 on 1 and 35 DF,  p-value: 8.161e-10

RESIDUAL PLOTS

In general, the residuals should be scattered symmetrically around zero.

  • Residuals vs. fitted values (plot 1): there is no trend and the points fall in a horizontal band, so the constant-variance assumption for the errors is not violated.
  • Normal probability plot (plot 2): the errors appear to be approximately normally distributed. The histogram (plot 4) shows the same.
  • Residuals vs. observation order (plot 3): there is no drift in the process.
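
The four residual plots discussed here can be produced along the following lines. This is a minimal sketch, not the code used for the solution: it assumes the 37 rows listed in the question are the full dataset, and the data frame name `homes` and the helper `residual_plots` are hypothetical names chosen for illustration (the output above used a data frame called `data`).

```r
# The 37 rows from the question, 7 values per record, in the order:
# sales sqft adv_cost inventory distance district_size storecount
vals <- scan(text = "
231 1.47 7.62 897 10.9 79.48 40      232 1.53 9.57 892 9.4 51.154 12
156 1.68 8.37 542 7.9 60.358 41      157 1.355 6.73 552 6.8 55.561 68
10 1.33 1.66 242 3.5 89.624 14       10 1.33 1.17 235 3.6 86.898 62
519 1.89 12.96 3670 18.5 108.857 56  520 1.885 12.02 3657 19.1 100.685 75
437 1.7 12.29 3345 17.4 90.138 59    487 1.86 12.5 3322 16.5 111.284 22
299 1.4 9.86 1784 11.5 75.606 26     195 1.63 7.22 1230 9.8 64.245 27
20 1.24 5.23 483 2.4 55.929 11       68 1.51 3.93 114 4.5 73.187 33
428 1.78 11.04 2829 16.4 101.192 51  429 1.725 9.43 3410 15.7 80.694 16
464 1.72 12.19 2873 15.8 105.254 84  15 1.2 1.17 289 3.2 80.937 31
65 1.47 6.56 292 3.9 80.187 97       66 1.51 5.55 312 3.8 85.897 66
98 1.24 5.79 235 6.4 90.219 75       338 1.65 3.34 1160 12.1 121.988 84
249 1.513 2.23 1184 9.7 115.277 12   161 1.4 6.95 399 7.9 50.188 14
467 1.46 13.17 2062 16.1 101.211 89  398 1.84 11.68 2103 15.9 95.406 49
497 1.68 12.11 2743 18 80.195 14     528 1.94 10.98 3779 18 110.025 58
529 1.765 11.11 3916 18.9 103.26 52  99 1.31 4.35 782 4.8 111.732 52
100 1.525 3.79 804 4.7 99.7 41       1 1.45 4.68 1116 3.4 85.882 50
347 1.65 10.08 2223 13.4 94.181 49   348 1.811 7.87 2180 12.1 95.242 50
341 1.64 10.34 1494 14.3 70.693 28   557 1.66 13.55 3522 18.5 94.329 43
508 1.698 11.53 3521 16.7 99.917 50
", quiet = TRUE)
homes <- as.data.frame(matrix(vals, ncol = 7, byrow = TRUE))
names(homes) <- c("sales", "sqft", "adv_cost", "inventory",
                  "distance", "district_size", "storecount")

# Hypothetical helper: draws the four residual plots named in the question
# in a 2 x 2 grid and returns the residuals invisibly.
residual_plots <- function(fit) {
  r <- resid(fit)
  op <- par(mfrow = c(2, 2))
  on.exit(par(op))  # restore the previous plot layout when done
  plot(fitted(fit), r, xlab = "Fitted values", ylab = "Residuals",
       main = "Residuals vs. fitted")
  abline(h = 0, lty = 2)
  qqnorm(r, main = "Normal probability plot")
  qqline(r)
  plot(seq_along(r), r, type = "b", xlab = "Observation order",
       ylab = "Residuals", main = "Residuals vs. order")
  abline(h = 0, lty = 2)
  hist(r, xlab = "Residuals", main = "Histogram of residuals")
  invisible(r)
}

fit_sqft <- lm(sales ~ sqft, data = homes)
residual_plots(fit_sqft)
```

Calling `summary(fit_sqft)` gives output that can be compared with the regression summary above, and swapping `sqft` for `adv_cost`, `inventory`, or `distance` in the formula gives the plots for the other three models.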

b) Model 2: sales predicted by adv_cost

Call:
lm(formula = sales ~ adv_cost, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-147.37  -56.78  -17.33   40.86  265.56

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -72.712     37.047  -1.963   0.0577 .
adv_cost      43.458      4.145  10.486 2.42e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 93.94 on 35 degrees of freedom
Multiple R-squared:  0.7585,    Adjusted R-squared:  0.7516
F-statistic: 109.9 on 1 and 35 DF,  p-value: 2.417e-12

  • Residuals vs. fitted values (plot 1): there is a slight increasing U-shaped pattern, which may hint at mild curvature, but the residuals are symmetric about zero, so the constant-variance assumption for the errors is not violated.
  • Normal probability plot (plot 2): the errors appear to be approximately normally distributed. The histogram (plot 4) shows the same.
  • Residuals vs. observation order (plot 3): there is no drift in the process.

c) Model 3: sales predicted by inventory

Call:
lm(formula = sales ~ inventory, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-195.594  -46.620   -0.477   37.107  142.349

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 45.524456  18.640588   2.442   0.0198 *
inventory    0.135367   0.008634  15.679   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 67.49 on 35 degrees of freedom
Multiple R-squared:  0.8754,    Adjusted R-squared:  0.8718
F-statistic: 245.8 on 1 and 35 DF,  p-value: < 2.2e-16

  • Residuals vs. fitted values (plot 1): there is a slight decreasing U-shaped pattern, which may hint at mild curvature, but the residuals are symmetric about zero, so the constant-variance assumption for the errors is not violated.
  • Normal probability plot (plot 2): the errors appear to be approximately normally distributed. The histogram (plot 4) shows the same.
  • Residuals vs. observation order (plot 3): there is no drift in the process.

d) Model 4: sales predicted by distance

Call:
lm(formula = sales ~ distance, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-48.421 -21.467  -0.902  24.457  45.440

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -82.838      9.862   -8.40 6.59e-10 ***
distance      32.659      0.791   41.29  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 27.12 on 35 degrees of freedom
Multiple R-squared:  0.9799,    Adjusted R-squared:  0.9793
F-statistic: 1705 on 1 and 35 DF,  p-value: < 2.2e-16

  • Residuals vs. fitted values (plot 1): there is no trend and the points fall in a horizontal band, so the constant-variance assumption for the errors is not violated.
  • Normal probability plot (plot 2): the errors do not appear to be normally distributed, and the histogram (plot 4) shows the same. The normality assumption is violated.
  • Residuals vs. observation order (plot 3): there is no drift in the process.

e) Based on the residual plots for the four models above, model 4, the model with distance as the predictor, seems to be the best model.

f)

Model predictor   S        R2       t-stat
sqft              110.75   66.44%   8.32
adv_cost          93.94    75.85%   10.486
inventory         67.49    87.54%   15.679
distance          27.12    97.99%   41.29

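Based on the above table, the model with distance as the predictor is the best of the four: it has the smallest S (27.12), the largest R-squared (97.99%), and the largest t-statistic (41.29). The choice from part (e) therefore does not change.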
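
The table entries can be collected programmatically rather than copied from four separate summaries. The following is a rough sketch under the assumption that the 37 rows listed in the question are the full dataset. The names `homes`, `rows`, and `slr_table` are illustrative, not taken from the original solution.

```r
# The 37 rows from the question, 7 values per record, in the order:
# sales sqft adv_cost inventory distance district_size storecount
vals <- scan(text = "
231 1.47 7.62 897 10.9 79.48 40      232 1.53 9.57 892 9.4 51.154 12
156 1.68 8.37 542 7.9 60.358 41      157 1.355 6.73 552 6.8 55.561 68
10 1.33 1.66 242 3.5 89.624 14       10 1.33 1.17 235 3.6 86.898 62
519 1.89 12.96 3670 18.5 108.857 56  520 1.885 12.02 3657 19.1 100.685 75
437 1.7 12.29 3345 17.4 90.138 59    487 1.86 12.5 3322 16.5 111.284 22
299 1.4 9.86 1784 11.5 75.606 26     195 1.63 7.22 1230 9.8 64.245 27
20 1.24 5.23 483 2.4 55.929 11       68 1.51 3.93 114 4.5 73.187 33
428 1.78 11.04 2829 16.4 101.192 51  429 1.725 9.43 3410 15.7 80.694 16
464 1.72 12.19 2873 15.8 105.254 84  15 1.2 1.17 289 3.2 80.937 31
65 1.47 6.56 292 3.9 80.187 97       66 1.51 5.55 312 3.8 85.897 66
98 1.24 5.79 235 6.4 90.219 75       338 1.65 3.34 1160 12.1 121.988 84
249 1.513 2.23 1184 9.7 115.277 12   161 1.4 6.95 399 7.9 50.188 14
467 1.46 13.17 2062 16.1 101.211 89  398 1.84 11.68 2103 15.9 95.406 49
497 1.68 12.11 2743 18 80.195 14     528 1.94 10.98 3779 18 110.025 58
529 1.765 11.11 3916 18.9 103.26 52  99 1.31 4.35 782 4.8 111.732 52
100 1.525 3.79 804 4.7 99.7 41       1 1.45 4.68 1116 3.4 85.882 50
347 1.65 10.08 2223 13.4 94.181 49   348 1.811 7.87 2180 12.1 95.242 50
341 1.64 10.34 1494 14.3 70.693 28   557 1.66 13.55 3522 18.5 94.329 43
508 1.698 11.53 3521 16.7 99.917 50
", quiet = TRUE)
homes <- as.data.frame(matrix(vals, ncol = 7, byrow = TRUE))
names(homes) <- c("sales", "sqft", "adv_cost", "inventory",
                  "distance", "district_size", "storecount")

predictors <- c("sqft", "adv_cost", "inventory", "distance")

# Fit one SLR per predictor and keep S, R-squared (in percent),
# and the t-statistic of the slope.
rows <- lapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "sales"), data = homes)
  s <- summary(fit)
  data.frame(predictor  = p,
             S          = s$sigma,
             R2_percent = 100 * s$r.squared,
             t_stat     = coef(s)[p, "t value"])
})
slr_table <- do.call(rbind, rows)
print(slr_table, digits = 4, row.names = FALSE)
```

In a simple linear regression the three columns always rank the models the same way, since R-squared is the squared correlation between sales and the predictor and the slope's t-statistic is a monotone function of it. The table therefore cannot contradict itself, but it says nothing about whether the residual assumptions hold.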

g) The model with adv_cost and district_size as predictors gives the following results:

Call:
lm(formula = sales ~ adv_cost + district_size, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-130.830  -42.406    0.737   45.658  136.643 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -355.9997    57.8682  -6.152 5.47e-07 ***
adv_cost        40.9861     3.0788  13.312 4.83e-15 ***
district_size    3.4468     0.6213   5.548 3.33e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 69.05 on 34 degrees of freedom
Multiple R-squared:  0.8733,    Adjusted R-squared:  0.8658 
F-statistic: 117.1 on 2 and 34 DF,  p-value: 5.613e-16

Residual plots

The model with adv_cost and storecount as predictors gives the following results:

Call:
lm(formula = sales ~ adv_cost + storecount, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-151.68  -55.02  -18.10   41.50  262.08 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -75.89629   45.75969  -1.659    0.106    
adv_cost     43.38473    4.24679  10.216 6.73e-12 ***
storecount    0.08221    0.67404   0.122    0.904    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 95.29 on 34 degrees of freedom
Multiple R-squared:  0.7586,    Adjusted R-squared:  0.7444 
F-statistic: 53.43 on 2 and 34 DF,  p-value: 3.202e-11

Residual plots

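Comparing the two models, adding storecount does not improve the fit: its coefficient is not significant (t = 0.122, p = 0.904), R-squared barely changes (0.7585 to 0.7586), and adjusted R-squared falls from 0.7516 to 0.7444.

Adding district_size, however, improves the model considerably: its coefficient is highly significant (t = 5.548, p = 3.33e-06), adjusted R-squared rises from 0.7516 to 0.8658, and the residual standard error drops from 93.94 to 69.05. The residual plots support the same conclusion.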
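
One way to carry out the check in part (g) is to plot the residuals of the adv_cost model against each candidate predictor and then confirm the impression with a partial F-test. The sketch below is illustrative only and assumes the 37 rows listed in the question are the full dataset. The names `homes`, `fit_adv`, `fit_district`, and `fit_store` are hypothetical.

```r
# The 37 rows from the question, 7 values per record, in the order:
# sales sqft adv_cost inventory distance district_size storecount
vals <- scan(text = "
231 1.47 7.62 897 10.9 79.48 40      232 1.53 9.57 892 9.4 51.154 12
156 1.68 8.37 542 7.9 60.358 41      157 1.355 6.73 552 6.8 55.561 68
10 1.33 1.66 242 3.5 89.624 14       10 1.33 1.17 235 3.6 86.898 62
519 1.89 12.96 3670 18.5 108.857 56  520 1.885 12.02 3657 19.1 100.685 75
437 1.7 12.29 3345 17.4 90.138 59    487 1.86 12.5 3322 16.5 111.284 22
299 1.4 9.86 1784 11.5 75.606 26     195 1.63 7.22 1230 9.8 64.245 27
20 1.24 5.23 483 2.4 55.929 11       68 1.51 3.93 114 4.5 73.187 33
428 1.78 11.04 2829 16.4 101.192 51  429 1.725 9.43 3410 15.7 80.694 16
464 1.72 12.19 2873 15.8 105.254 84  15 1.2 1.17 289 3.2 80.937 31
65 1.47 6.56 292 3.9 80.187 97       66 1.51 5.55 312 3.8 85.897 66
98 1.24 5.79 235 6.4 90.219 75       338 1.65 3.34 1160 12.1 121.988 84
249 1.513 2.23 1184 9.7 115.277 12   161 1.4 6.95 399 7.9 50.188 14
467 1.46 13.17 2062 16.1 101.211 89  398 1.84 11.68 2103 15.9 95.406 49
497 1.68 12.11 2743 18 80.195 14     528 1.94 10.98 3779 18 110.025 58
529 1.765 11.11 3916 18.9 103.26 52  99 1.31 4.35 782 4.8 111.732 52
100 1.525 3.79 804 4.7 99.7 41       1 1.45 4.68 1116 3.4 85.882 50
347 1.65 10.08 2223 13.4 94.181 49   348 1.811 7.87 2180 12.1 95.242 50
341 1.64 10.34 1494 14.3 70.693 28   557 1.66 13.55 3522 18.5 94.329 43
508 1.698 11.53 3521 16.7 99.917 50
", quiet = TRUE)
homes <- as.data.frame(matrix(vals, ncol = 7, byrow = TRUE))
names(homes) <- c("sales", "sqft", "adv_cost", "inventory",
                  "distance", "district_size", "storecount")

fit_adv <- lm(sales ~ adv_cost, data = homes)
r <- resid(fit_adv)

# Residuals of the adv_cost model against each omitted predictor.
# A visible trend suggests the omitted variable explains leftover variation.
op <- par(mfrow = c(1, 2))
plot(homes$district_size, r, xlab = "district_size",
     ylab = "Residuals (sales ~ adv_cost)")
abline(h = 0, lty = 2)
plot(homes$storecount, r, xlab = "storecount",
     ylab = "Residuals (sales ~ adv_cost)")
abline(h = 0, lty = 2)
par(op)

# Partial F-tests: does adding the predictor reduce the residual
# sum of squares by more than chance would explain?
fit_district <- update(fit_adv, . ~ . + district_size)
fit_store    <- update(fit_adv, . ~ . + storecount)
test_district <- anova(fit_adv, fit_district)
test_store    <- anova(fit_adv, fit_store)
print(test_district)
print(test_store)
```

With one added predictor, the partial F-statistic equals the square of that predictor's t-statistic in the larger model, so these tests lead to the same decision as the t-tests in the two-predictor summaries.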

