Question

sales	sqft	adv_cost	inventory	distance	district_size	storecount
231	1.47	7.62	897	10.9	79.48	40
232	1.53	9.57	892	9.4	51.154	12
156	1.68	8.37	542	7.9	60.358	41
157	1.355	6.73	552	6.8	55.561	68
10	1.33	1.66	242	3.5	89.624	14
10	1.33	1.17	235	3.6	86.898	62
519	1.89	12.96	3670	18.5	108.857	56
520	1.885	12.02	3657	19.1	100.685	75
437	1.7	12.29	3345	17.4	90.138	59
487	1.86	12.5	3322	16.5	111.284	22
299	1.4	9.86	1784	11.5	75.606	26
195	1.63	7.22	1230	9.8	64.245	27
20	1.24	5.23	483	2.4	55.929	11
68	1.51	3.93	114	4.5	73.187	33
428	1.78	11.04	2829	16.4	101.192	51
429	1.725	9.43	3410	15.7	80.694	16
464	1.72	12.19	2873	15.8	105.254	84
15	1.2	1.17	289	3.2	80.937	31
65	1.47	6.56	292	3.9	80.187	97
66	1.51	5.55	312	3.8	85.897	66
98	1.24	5.79	235	6.4	90.219	75
338	1.65	3.34	1160	12.1	121.988	84
249	1.513	2.23	1184	9.7	115.277	12
161	1.4	6.95	399	7.9	50.188	14
467	1.46	13.17	2062	16.1	101.211	89
398	1.84	11.68	2103	15.9	95.406	49
497	1.68	12.11	2743	18	80.195	14
528	1.94	10.98	3779	18	110.025	58
529	1.765	11.11	3916	18.9	103.26	52
99	1.31	4.35	782	4.8	111.732	52
100	1.525	3.79	804	4.7	99.7	41
1	1.45	4.68	1116	3.4	85.882	50
347	1.65	10.08	2223	13.4	94.181	49
348	1.811	7.87	2180	12.1	95.242	50
341	1.64	10.34	1494	14.3	70.693	28
557	1.66	13.55	3522	18.5	94.329	43
508	1.698	11.53	3521	16.7	99.917	50

In the “HomeSales” dataset, the response variable, sales, depends on six potential predictor variables: sqft, adv_cost, inventory, distance, district_size, and storecount. Fit four simple linear regression (SLR) models corresponding to the four predictors sqft, adv_cost, inventory, and distance. Then, for each model, create a normal probability plot and a histogram of the residuals, together with two residual scatterplots: residuals vs. fitted values and residuals vs. observation order.

(a) What do the residual plots for the model with sqft as the predictor indicate about the validity of this regression model and the assumptions made about the errors?

(b) What do the residual plots for the model with adv_cost as the predictor indicate about the validity of this regression model and the assumptions made about the errors?

(c) What do the residual plots for the model with inventory as the predictor indicate about the validity of this regression model and the assumptions made about the errors?

(d) What do the residual plots for the model with distance as the predictor indicate about the validity of this regression model and the assumptions made about the errors?

(e) One objective of this analysis is to obtain an appropriate simple linear regression model that can be used to estimate the average sales based on a single predictor. State your “best” choice based on your conclusions in parts (a)–(d).

(f) Complete the table below, using the regression analysis results of the four simple linear regression models considered in parts (a)–(d). Based on the table entries, would you change your “best” choice from part (e)?

Model predictor	S	R²	t-stat
sqft	110.75	66.44%	8.32
adv_cost
inventory
distance

(g) A model including the predictor variable adv_cost is of specific interest. Obtain appropriate residual plots and determine whether adding either district_size or storecount as an additional predictor to the SLR model with predictor adv_cost is likely to improve its fit.

Expert Solution

a) Model 1: The predictor variable sqft to predict sales

Call:
lm(formula = sales ~ sqft, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-200.740  -80.410    7.266   47.567  277.668

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -921.65     145.55  -6.332 2.82e-07 ***
sqft          760.94      91.41   8.324 8.16e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 110.8 on 35 degrees of freedom
Multiple R-squared:  0.6644,    Adjusted R-squared:  0.6548
F-statistic: 69.29 on 1 and 35 DF,  p-value: 8.161e-10
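The estimates, S, R² and t value in this summary follow from the closed-form least-squares formulas. The sketch below is a minimal pure-Python version for checking them (the solution itself used R's lm); the helper name fit_slr is hypothetical, and the two columns are typed in from the data table above.

```python
import math

# Columns copied from the data table above (37 stores, in table order)
SALES = [231, 232, 156, 157, 10, 10, 519, 520, 437, 487,
         299, 195, 20, 68, 428, 429, 464, 15, 65, 66,
         98, 338, 249, 161, 467, 398, 497, 528, 529, 99,
         100, 1, 347, 348, 341, 557, 508]
SQFT = [1.47, 1.53, 1.68, 1.355, 1.33, 1.33, 1.89, 1.885, 1.7, 1.86,
        1.4, 1.63, 1.24, 1.51, 1.78, 1.725, 1.72, 1.2, 1.47, 1.51,
        1.24, 1.65, 1.513, 1.4, 1.46, 1.84, 1.68, 1.94, 1.765, 1.31,
        1.525, 1.45, 1.65, 1.811, 1.64, 1.66, 1.698]


def fit_slr(x, y):
    """Least-squares fit of y = b0 + b1*x (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s = math.sqrt(sse / (n - 2))   # residual standard error on n - 2 = 35 df
    se_b1 = s / math.sqrt(sxx)     # standard error of the slope
    return {"b0": b0, "b1": b1, "s": s, "r2": 1 - sse / sst,
            "t": b1 / se_b1, "resid": resid}


fit = fit_slr(SQFT, SALES)
print(f"b0={fit['b0']:.2f}  b1={fit['b1']:.2f}  S={fit['s']:.2f}  "
      f"R2={fit['r2']:.4f}  t={fit['t']:.3f}")
```

Swapping SQFT for another column gives the other three SLR fits in the same way.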

RESIDUAL PLOTS

In general, the residuals should scatter randomly and symmetrically around zero, with no systematic pattern.

From plot 1 (residuals vs. fitted values), there is no trend and the points fall in a horizontal band, so the error term has constant variance; this assumption is not violated.
From plot 2 (normal probability plot), the errors appear to be normally distributed. The histogram (plot 4) shows the same.
From plot 3 (residuals vs. order), there is no drift over the observation order, so there is no sign that the errors are dependent.
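A normal probability plot puts the ordered residuals against standard normal quantiles; points close to a straight line support the normality assumption. The sketch below computes those plot coordinates for the sqft model, plus their correlation as a one-number summary of straightness. It assumes plotting positions (i - 0.5)/n, which is what R's ppoints() uses for n > 10; other software may use slightly different positions. All helper names are hypothetical.

```python
import math
from statistics import NormalDist

# Columns copied from the data table above
SALES = [231, 232, 156, 157, 10, 10, 519, 520, 437, 487,
         299, 195, 20, 68, 428, 429, 464, 15, 65, 66,
         98, 338, 249, 161, 467, 398, 497, 528, 529, 99,
         100, 1, 347, 348, 341, 557, 508]
SQFT = [1.47, 1.53, 1.68, 1.355, 1.33, 1.33, 1.89, 1.885, 1.7, 1.86,
        1.4, 1.63, 1.24, 1.51, 1.78, 1.725, 1.72, 1.2, 1.47, 1.51,
        1.24, 1.65, 1.513, 1.4, 1.46, 1.84, 1.68, 1.94, 1.765, 1.31,
        1.525, 1.45, 1.65, 1.811, 1.64, 1.66, 1.698]


def slr_residuals(x, y):
    """Residuals from the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


def normal_plot_points(resid):
    """(standard normal quantile, ordered residual) pairs, positions (i - 0.5)/n."""
    n = len(resid)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, sorted(resid)))


def pearson_r(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


points = normal_plot_points(slr_residuals(SQFT, SALES))
z_scores, ordered = zip(*points)
r_qq = pearson_r(z_scores, ordered)
print(f"probability-plot correlation: {r_qq:.3f}")
```

The closer the correlation is to 1, the straighter the plot.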

b) Model 2: The predictor variable adv_cost to predict sales

Call:
lm(formula = sales ~ adv_cost, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-147.37  -56.78  -17.33   40.86  265.56

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -72.712     37.047  -1.963   0.0577 .
adv_cost      43.458      4.145  10.486 2.42e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 93.94 on 35 degrees of freedom
Multiple R-squared:  0.7585,    Adjusted R-squared:  0.7516
F-statistic: 109.9 on 1 and 35 DF,  p-value: 2.417e-12

From plot 1 (residuals vs. fitted values), there is a slight U-shaped pattern. A U shape points to mild curvature that the straight line does not capture, not to changing variance; the spread about zero looks roughly constant, so the constant-variance assumption is not violated.
From plot 2 (normal probability plot), the errors appear to be normally distributed. The histogram (plot 4) shows the same.
From plot 3 (residuals vs. order), there is no drift over the observation order.
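A residuals-vs-fitted plot can also be summarized numerically: sort the observations by fitted value, cut them into groups, and compare each group's mean residual and spread. A U shape shows up as positive mean residuals at both ends and a negative mean in the middle; similar spreads across groups are consistent with constant variance. This is a minimal sketch for the adv_cost model; the choice of three groups and the helper names are assumptions.

```python
from statistics import mean, pstdev

# Columns copied from the data table above
SALES = [231, 232, 156, 157, 10, 10, 519, 520, 437, 487,
         299, 195, 20, 68, 428, 429, 464, 15, 65, 66,
         98, 338, 249, 161, 467, 398, 497, 528, 529, 99,
         100, 1, 347, 348, 341, 557, 508]
ADV_COST = [7.62, 9.57, 8.37, 6.73, 1.66, 1.17, 12.96, 12.02, 12.29, 12.5,
            9.86, 7.22, 5.23, 3.93, 11.04, 9.43, 12.19, 1.17, 6.56, 5.55,
            5.79, 3.34, 2.23, 6.95, 13.17, 11.68, 12.11, 10.98, 11.11, 4.35,
            3.79, 4.68, 10.08, 7.87, 10.34, 13.55, 11.53]


def slr_fit(x, y):
    """Fitted values and residuals from the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * a for a in x]
    return fitted, [b - f for b, f in zip(y, fitted)]


def binned_residuals(fitted, resid, n_bins=3):
    """Sort by fitted value, split into n_bins near-equal groups, summarize each."""
    order = sorted(range(len(fitted)), key=lambda i: fitted[i])
    size = len(order) / n_bins
    summary = []
    for k in range(n_bins):
        idx = order[round(k * size):round((k + 1) * size)]
        group = [resid[i] for i in idx]
        summary.append((mean([fitted[i] for i in idx]), mean(group), pstdev(group), len(group)))
    return summary


fitted, resid = slr_fit(ADV_COST, SALES)
bins = binned_residuals(fitted, resid)
for f_mean, r_mean, r_sd, count in bins:
    print(f"mean fitted {f_mean:7.1f}  mean residual {r_mean:7.1f}  "
          f"residual SD {r_sd:6.1f}  n={count}")
```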

c) Model 3: The predictor variable inventory to predict sales

Call:
lm(formula = sales ~ inventory, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-195.594  -46.620   -0.477   37.107  142.349

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 45.524456  18.640588   2.442   0.0198 *
inventory    0.135367   0.008634  15.679   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 67.49 on 35 degrees of freedom
Multiple R-squared:  0.8754,    Adjusted R-squared:  0.8718
F-statistic: 245.8 on 1 and 35 DF,  p-value: < 2.2e-16

From plot 1 (residuals vs. fitted values), there is a slight inverted U-shaped pattern. As with model 2, this points to mild curvature rather than changing variance; the spread about zero looks roughly constant, so the constant-variance assumption is not violated.
From plot 2 (normal probability plot), the errors appear to be normally distributed. The histogram (plot 4) shows the same.
From plot 3 (residuals vs. order), there is no drift over the observation order.
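The residuals-vs-order plot checks whether errors are independent over the observation order. A common numeric companion is the Durbin-Watson statistic, the sum of squared successive residual differences divided by the residual sum of squares: it lies between 0 and 4, and values near 2 suggest little lag-1 autocorrelation. This sketch for the inventory model is an addition, not part of the solution; it assumes the table's row order is the observation order, and the helper names are hypothetical.

```python
# Columns copied from the data table above, in row order
SALES = [231, 232, 156, 157, 10, 10, 519, 520, 437, 487,
         299, 195, 20, 68, 428, 429, 464, 15, 65, 66,
         98, 338, 249, 161, 467, 398, 497, 528, 529, 99,
         100, 1, 347, 348, 341, 557, 508]
INVENTORY = [897, 892, 542, 552, 242, 235, 3670, 3657, 3345, 3322,
             1784, 1230, 483, 114, 2829, 3410, 2873, 289, 292, 312,
             235, 1160, 1184, 399, 2062, 2103, 2743, 3779, 3916, 782,
             804, 1116, 2223, 2180, 1494, 3522, 3521]


def slr_residuals(x, y):
    """Residuals from the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


def durbin_watson(resid):
    """Sum of squared successive differences over the residual sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)


def lag1_autocorrelation(resid):
    """Correlation-style coefficient between each residual and the previous one."""
    n = len(resid)
    m = sum(resid) / n
    num = sum((resid[t] - m) * (resid[t - 1] - m) for t in range(1, n))
    return num / sum((e - m) ** 2 for e in resid)


resid = slr_residuals(INVENTORY, SALES)
dw = durbin_watson(resid)
r1 = lag1_autocorrelation(resid)
print(f"Durbin-Watson = {dw:.3f}, lag-1 autocorrelation = {r1:.3f}")
```

The two numbers are linked: for long series, DW is roughly 2(1 - r1).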

d) Model 4: The predictor variable distance to predict sales

Call:
lm(formula = sales ~ distance, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-48.421 -21.467  -0.902  24.457  45.440

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -82.838      9.862   -8.40 6.59e-10 ***
distance      32.659      0.791   41.29  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 27.12 on 35 degrees of freedom
Multiple R-squared:  0.9799,    Adjusted R-squared:  0.9793
F-statistic: 1705 on 1 and 35 DF,  p-value: < 2.2e-16
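In each of the four summaries, the printed t value is Estimate / Std. Error, and with a single predictor the overall F-statistic equals t², which is why the slope's p-value and the F-test p-value agree. The sketch below checks both relations using numbers copied from the outputs above; small gaps come only from rounding in the printout.

```python
# (Estimate, Std. Error, printed t value, printed F-statistic) for each slope,
# copied from the four regression summaries above
SUMMARIES = {
    "sqft": (760.94, 91.41, 8.324, 69.29),
    "adv_cost": (43.458, 4.145, 10.486, 109.9),
    "inventory": (0.135367, 0.008634, 15.679, 245.8),
    "distance": (32.659, 0.791, 41.29, 1705.0),
}


def rel_diff(a, b):
    """Relative difference of a from the reference value b."""
    return abs(a - b) / abs(b)


checks = {}
for name, (est, se, t_printed, f_printed) in SUMMARIES.items():
    t_from_ratio = est / se     # t value = Estimate / Std. Error
    f_from_t = t_printed ** 2   # one predictor: F = t^2
    checks[name] = (rel_diff(t_from_ratio, t_printed), rel_diff(f_from_t, f_printed))
    print(f"{name:9s} est/se = {t_from_ratio:8.3f} (printed {t_printed})  "
          f"t^2 = {f_from_t:8.2f} (printed {f_printed})")
```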

From plot 1 (residuals vs. fitted values), there is no trend and the points fall in a horizontal band, so the error term has constant variance; this assumption is not violated.
From plot 2 (normal probability plot), the errors do not appear to be normally distributed, and the histogram (plot 4) shows the same, so the normality assumption is violated.
From plot 3 (residuals vs. order), there is no drift over the observation order.
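The histogram the question asks for is just a count of residuals per bin, which can be read directly without a plotting library. This sketch does that for the distance model; the bin edges (width 20 from -60 to 60) are an arbitrary choice that happens to cover this model's residual range, and the helper names are hypothetical.

```python
# Columns copied from the data table above
SALES = [231, 232, 156, 157, 10, 10, 519, 520, 437, 487,
         299, 195, 20, 68, 428, 429, 464, 15, 65, 66,
         98, 338, 249, 161, 467, 398, 497, 528, 529, 99,
         100, 1, 347, 348, 341, 557, 508]
DISTANCE = [10.9, 9.4, 7.9, 6.8, 3.5, 3.6, 18.5, 19.1, 17.4, 16.5,
            11.5, 9.8, 2.4, 4.5, 16.4, 15.7, 15.8, 3.2, 3.9, 3.8,
            6.4, 12.1, 9.7, 7.9, 16.1, 15.9, 18, 18, 18.9, 4.8,
            4.7, 3.4, 13.4, 12.1, 14.3, 18.5, 16.7]


def slr_residuals(x, y):
    """Residuals from the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


def histogram(values, lo, width, n_bins):
    """Count values into n_bins equal-width bins starting at lo."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - lo) // width)
        if not 0 <= k < n_bins:
            raise ValueError(f"value {v} falls outside the chosen bins")
        counts[k] += 1
    return counts


resid = slr_residuals(DISTANCE, SALES)
LO, WIDTH, N_BINS = -60, 20, 6   # hypothetical bins covering [-60, 60)
counts = histogram(resid, LO, WIDTH, N_BINS)
for k, c in enumerate(counts):
    left = LO + k * WIDTH
    print(f"[{left:4d}, {left + WIDTH:4d})  {'#' * c}")
```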

e) From the residual plots of the four models above, model 4, i.e., the model with the predictor variable distance, seems to be the best model.

f) Completed table for the four simple linear regression models:

Model predictor	S	R²	t-stat
sqft	110.75	66.44%	8.32
adv_cost	93.94	75.85%	10.486
inventory	67.49	87.54%	15.679
distance	27.12	97.99%	41.29
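S and R² in the table are tied together: with n = 37 stores, SSE = 35·S² and R² = 1 - SSE/SST. Because all four models share the same response, they share the same total sum of squares SST, so each row should imply (up to rounding) the same SST. This sketch is a consistency check on the table entries, not a step in the original solution; the helper name is hypothetical.

```python
# (S, R^2) for each single-predictor model, copied from the table above
TABLE = {
    "sqft": (110.75, 0.6644),
    "adv_cost": (93.94, 0.7585),
    "inventory": (67.49, 0.8754),
    "distance": (27.12, 0.9799),
}
DF_RESID = 35  # n - 2 with n = 37 stores


def implied_sst(s, r2, df=DF_RESID):
    """SSE = df * S^2 and R^2 = 1 - SSE/SST, so SST = df * S^2 / (1 - R^2)."""
    return df * s ** 2 / (1 - r2)


ssts = {name: implied_sst(s, r2) for name, (s, r2) in TABLE.items()}
spread = max(ssts.values()) / min(ssts.values())
for name, sst in ssts.items():
    print(f"{name:9s} implied SST = {sst:,.0f}")
print(f"max/min ratio = {spread:.4f}")
```

A ratio close to 1 means the four rows are mutually consistent.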

Based on the table, the model with the predictor variable distance remains the best of the four: it has the smallest S (27.12), the largest R² (97.99%), and the largest t-statistic (41.29). So the choice from part (e) does not change.

g) The model including the predictor variables adv_cost and district_size to predict sales gives the following results:

Call:
lm(formula = sales ~ adv_cost + district_size, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-130.830  -42.406    0.737   45.658  136.643 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -355.9997    57.8682  -6.152 5.47e-07 ***
adv_cost        40.9861     3.0788  13.312 4.83e-15 ***
district_size    3.4468     0.6213   5.548 3.33e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 69.05 on 34 degrees of freedom
Multiple R-squared:  0.8733,    Adjusted R-squared:  0.8658 
F-statistic: 117.1 on 2 and 34 DF,  p-value: 5.613e-16
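With two predictors the coefficients solve the normal equations (X'X)b = X'y, where X has a column of ones, adv_cost, and district_size. The sketch below reproduces this summary's coefficients, S and R² in pure Python using Gaussian elimination with partial pivoting on the 3×3 system; it is an illustration under that assumption (R's lm uses a QR decomposition instead), and the helper names are hypothetical.

```python
import math

# Columns copied from the data table above
SALES = [231, 232, 156, 157, 10, 10, 519, 520, 437, 487,
         299, 195, 20, 68, 428, 429, 464, 15, 65, 66,
         98, 338, 249, 161, 467, 398, 497, 528, 529, 99,
         100, 1, 347, 348, 341, 557, 508]
ADV_COST = [7.62, 9.57, 8.37, 6.73, 1.66, 1.17, 12.96, 12.02, 12.29, 12.5,
            9.86, 7.22, 5.23, 3.93, 11.04, 9.43, 12.19, 1.17, 6.56, 5.55,
            5.79, 3.34, 2.23, 6.95, 13.17, 11.68, 12.11, 10.98, 11.11, 4.35,
            3.79, 4.68, 10.08, 7.87, 10.34, 13.55, 11.53]
DISTRICT_SIZE = [79.48, 51.154, 60.358, 55.561, 89.624, 86.898, 108.857, 100.685, 90.138, 111.284,
                 75.606, 64.245, 55.929, 73.187, 101.192, 80.694, 105.254, 80.937, 80.187, 85.897,
                 90.219, 121.988, 115.277, 50.188, 101.211, 95.406, 80.195, 110.025, 103.26, 111.732,
                 99.7, 85.882, 94.181, 95.242, 70.693, 94.329, 99.917]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(predictors, y):
    """Least squares with an intercept via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty), rows


coef, rows = ols([ADV_COST, DISTRICT_SIZE], SALES)
fitted = [sum(b * v for b, v in zip(coef, r)) for r in rows]
resid = [yi - fi for yi, fi in zip(SALES, fitted)]
n, p = len(SALES), len(coef)
y_bar = sum(SALES) / n
sse = sum(e * e for e in resid)
r2 = 1 - sse / sum((yi - y_bar) ** 2 for yi in SALES)
s = math.sqrt(sse / (n - p))  # n - p = 34 residual df
print("coefficients:", [round(b, 4) for b in coef])
print(f"S = {s:.2f}, R-squared = {r2:.4f}")
```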

Residual plots

The model including the predictor variables adv_cost and storecount to predict sales gives the following results:

Call:
lm(formula = sales ~ adv_cost + storecount, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-151.68  -55.02  -18.10   41.50  262.08 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -75.89629   45.75969  -1.659    0.106    
adv_cost     43.38473    4.24679  10.216 6.73e-12 ***
storecount    0.08221    0.67404   0.122    0.904    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 95.29 on 34 degrees of freedom
Multiple R-squared:  0.7586,    Adjusted R-squared:  0.7444 
F-statistic: 53.43 on 2 and 34 DF,  p-value: 3.202e-11

Residual plots

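Comparing the two models, we can say that adding storecount does not improve the model: its coefficient is not significant (t = 0.122, p = 0.904), R² barely changes (0.7585 to 0.7586), adjusted R² falls (0.7516 to 0.7444), and S rises (93.94 to 95.29).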
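Adjusted R² penalizes each extra predictor: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) for k predictors and n observations. The sketch below recomputes the adjusted values printed in the three adv_cost summaries from their multiple R² values (n = 37), which shows why storecount lowers adjusted R² even though R² ticks up slightly. The helper name is hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for k predictors plus an intercept fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


N = 37  # stores in the data table
# Multiple R-squared values copied from the regression summaries above
adj_adv = adjusted_r2(0.7585, N, 1)       # adv_cost only
adj_store = adjusted_r2(0.7586, N, 2)     # adv_cost + storecount
adj_district = adjusted_r2(0.8733, N, 2)  # adv_cost + district_size
for label, value in [("adv_cost", adj_adv),
                     ("adv_cost + storecount", adj_store),
                     ("adv_cost + district_size", adj_district)]:
    print(f"{label:25s} adjusted R^2 = {value:.4f}")
```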

Adding district_size, by contrast, improves the model substantially: its coefficient is highly significant (t = 5.548, p = 3.33e-06), R² rises from 0.7585 to 0.8733, and S falls from 93.94 to 69.05. The residual plots point the same way.
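The "appropriate residual plots" for part (g) amount to plotting the residuals of the adv_cost SLR against each candidate predictor: a clear linear trend means the candidate carries information that adv_cost does not. As a numeric stand-in for those plots, this sketch computes the correlation between those residuals and district_size and storecount. It is an illustration, not the solution's method, and the helper names are hypothetical.

```python
import math

# Columns copied from the data table above
SALES = [231, 232, 156, 157, 10, 10, 519, 520, 437, 487,
         299, 195, 20, 68, 428, 429, 464, 15, 65, 66,
         98, 338, 249, 161, 467, 398, 497, 528, 529, 99,
         100, 1, 347, 348, 341, 557, 508]
ADV_COST = [7.62, 9.57, 8.37, 6.73, 1.66, 1.17, 12.96, 12.02, 12.29, 12.5,
            9.86, 7.22, 5.23, 3.93, 11.04, 9.43, 12.19, 1.17, 6.56, 5.55,
            5.79, 3.34, 2.23, 6.95, 13.17, 11.68, 12.11, 10.98, 11.11, 4.35,
            3.79, 4.68, 10.08, 7.87, 10.34, 13.55, 11.53]
DISTRICT_SIZE = [79.48, 51.154, 60.358, 55.561, 89.624, 86.898, 108.857, 100.685, 90.138, 111.284,
                 75.606, 64.245, 55.929, 73.187, 101.192, 80.694, 105.254, 80.937, 80.187, 85.897,
                 90.219, 121.988, 115.277, 50.188, 101.211, 95.406, 80.195, 110.025, 103.26, 111.732,
                 99.7, 85.882, 94.181, 95.242, 70.693, 94.329, 99.917]
STORECOUNT = [40, 12, 41, 68, 14, 62, 56, 75, 59, 22,
              26, 27, 11, 33, 51, 16, 84, 31, 97, 66,
              75, 84, 12, 14, 89, 49, 14, 58, 52, 52,
              41, 50, 49, 50, 28, 43, 50]


def slr_residuals(x, y):
    """Residuals from the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
          / sum((a - x_bar) ** 2 for a in x))
    b0 = y_bar - b1 * x_bar
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


def pearson_r(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


resid = slr_residuals(ADV_COST, SALES)  # residuals of the adv_cost-only model
r_district = pearson_r(resid, DISTRICT_SIZE)
r_store = pearson_r(resid, STORECOUNT)
print(f"corr(residuals, district_size) = {r_district:.3f}")
print(f"corr(residuals, storecount)    = {r_store:.3f}")
```

A sizeable correlation with district_size and a near-zero one with storecount would match the conclusion above.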

milcah answered 1 year ago
