Question

In: Statistics and Probability

Food Lion Inc. would like to investigate the feasibility and future prospects of setting up stores...

Food Lion Inc. would like to investigate the feasibility and future prospects of setting up stores in Denton County. Food Lion has provided you with a sample from a database of household financial variables. They would like you to use regression techniques to predict the monthly expenditure on groceries, of families who either rent or own their homes in Denton. The sample contains 100 records. The various fields in the sample are:

Income1 Þ Annual income of head of household or primary wage earner

Income2 Þ Annual income of secondary wage earner

Famlsize Þ Size of family (number of people permanently residing in the household)

Ownorent Þ 1 if household is owned; 0 if it is rented

Autodebt Þ Automobile related debt pending for wage earners in the household

Hpayrent Þ Household mortgage payment or rent per month

Groc Þ Monthly expenditure on groceries

Loc Þ 1 = East Denton (E); 0 = West Denton(W); -1 = North Denton (N); 2 = South Denton (S)

For this assignment you will be given minimal instructions since, having completed the first three assignments, you should now be quite familiar with Excel functions and pull down menus.

1. The sample data are contained in a file named Assgt#4.xls to be found in the folder Excel Assignments for All Sections. Download the files from the folder onto a disk.

2. Import the data into the first nine columns (A-I) of your spreadsheet.   The first row contains labels for the variables in the order outlined in the data description. The table below illustrates how the first 10 data records should look, after this step is completed.

Obs

Income1

Income2

Famlsize

Ownorent

Autodebt

Hpayrent

Groc

Loc

1

36557

20610

4

0

15290

1339

278

0

2

27045

25490

5

0

14676

1175

220

0

3

38878

0

8

0

10317

1108

456

0

4

41448

0

2

1

9504

729

253

0

5

33136

25300

6

1

17802

875

344

0

6

44308

24559

2

1

20537

1282

326

-1

7

31997

25419

4

1

11725

919

308

1

8

43437

0

6

1

12084

970

311

2

9

41625

22802

4

1

14863

945

305

0

10

40140

32158

5

0

15708

1470

432

2

3. Insert three columns prior to the column labeled “Groc”. In these columns, create three logical (dummy) variables to represent the four locations. You may use the IF function from the pull down menu, to do this. Please follow the illustration below, where: ED=1, if record is for East Denton; WD=1, if record is for West Denton; and SD=1, if the record is for South Denton (thus North Denton representing the base). After the creation of the dummy variables, the first 11 records should look as shown below.

Obs

Income1

Income2

Famlsize

Ownrent

Autodebt

Hpayrent

ED

WD

SD

Groc

1

36557

20610

4

0

15290

1339

0

1

0

278

2

27045

25490

5

0

14676

1175

0

1

0

220

3

38878

0

8

0

10317

1108

0

1

0

456

4

41448

0

2

1

9504

729

0

1

0

253

5

33136

25300

6

1

17802

875

0

1

0

344

6

44308

24559

2

1

20537

1282

0

0

0

326

7

31997

25419

4

1

11725

919

1

0

0

308

8

43437

0

6

1

12084

970

0

0

1

311

9

41625

22802

4

1

14863

945

0

1

0

305

10

40140

32158

5

0

15708

1470

0

0

1

432

11

31448

0

4

0

6160

985

1

0

0

259

4. Conduct the regression analysis with Groc as the dependent variable (Y) and Income1 (X1), Income2 (X2), Ownorent (X3), Autodebt (X4), Famlsize (X5), and Location variables (X6 through X8) as the independent variables at 1% level of significance.   Use the Regression tool accessed from the Data Tab/ Data Analysis (in 2003 use Tools/Data Analysis) pull-down menu. Check only the labels box and specify that you want the output in cell M1. (Make sure that you enter “Confidence Level = 99%”).   Also, make sure to check the Standardized residuals box to obtain outlier information. A partial output is shown below for your guidance.

SUMMARY OUTPUT

Regression Statistics

Multiple R

0.779966

R Square

xxxx

Adjusted R Square

0.5739159

Standard Error

xxxx

Observations

100

ANOVA

df

SS

MS

F

Significance F

Regression

8

760733.044

95091.6

xxxx

xxxx

Residual

91

xxxx

xxxx

Total

99

xxxx

Coefficients

Standard Error

t Stat

P-value

Lower 95%

Upper 95%

Lower 99.0%

Upper 99.0%

Intercept

59.847981

43.87639859

1.364

0.176

-27.307103

147.0031

-55.58863

175.284587

Income1

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

Income2

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

Ownrent

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

Autodebt

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

Famlsize

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

ED

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

WD

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

SD

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

xxxx

4. By substituting appropriate values directly into the sample regression equation given in your output generate a point estimate for (Alternatively, you may use the TREND function from the pull down menu, and follow directions given in Excel or use the KPK macro to run regression and for prediction as shown in class):

a) the monthly Grocery Bill for a family of 4 living in a rented home in South Denton, whose primary income is $42,457 and with the secondary wage earner having an income of $10000, and Autodebt is $6,000

b) the monthly Grocery Bill for a family of size 7 living in a home in the North Denton that is owned, whose primary income is $35,000 with the secondary wage earner having an income of $25,000, and Autodebt is $10,000.

Details on the trend function can be found by clicking the fx icon on the toolbar then selecting statistical and trend. The new values for Income1, Income2, Ownorent, Autodebt, Famlsize, and Location variables can be placed in convenient cells in the two rows below your regression output and the point estimates for Groc evaluated in the adjacent cells.

Use the output from the regression analysis and trend function to answer the following questions.

A. Write one to three sentences to interpret the meaning of your model’s R-square value.  

B. Conduct an F test for the regression model containing the eight independent variables, at the 1% significance level. State the null and alternative hypotheses, the decision, reason for the decision, and a conclusion.

C. Conduct a t-test, at the 1% significance level, for the usefulness of the dummy (indicator) variable Ownorent. State the null and alternative hypotheses, the decision, reason for the decision, and a conclusion.

D. Conduct a t-test, at the 1% significance level, for the usefulness of the variable Income2. State the null and alternative hypotheses, the decision, reason for the decision, and a conclusion.

E. State which of the eight X (independent) variables are statistically significant and which ones are not, at the 1% significance level.

F. Write one or two sentences on the interpretation of the coefficient of each of the six independent variables, making specific reference to the value of each coefficient given by your model.

Experiential Exercise

As you work this assignment think about the following questions. Then form a team of 3 to 5 and discuss each of the following. You can engage in this discussion by meeting or your group can use a Wiki to engage in an online discussion. Instructions for setting up a Wiki are provided on the excel assignment page of our course web site.

What wording tells you the alternative and null hypothesis?

What wording tells you the type of statistical test to perform? For example, is a z or t statistic appropriate?

What wording tells you that this is a one or two tail hypothesis test?

What were the steps you used to obtain the calculated value of the test statistic?

What were the steps you used to obtain the critical value of the test statistic?

How do you use the calculated and critical value to make a statistical decision about this test?

How do you obtain the p value for the test statistic?

How do you use the p value and the level of significance to make a statistical decision about this test?

How does the result of your test relate to the statistical significance of your findings?

What managerial implications can you conclude from the results of your test?

To be ready for the “Excel Quiz 43” HLS Web Test you should prepare the following:

1. A printout of the regression analysis

2. The results of the prediction/estimation of Groc (computed directly or by using the trend function) in the two rows immediately below the regression output

3. Answers for questions A-F

4. Clearly labeled or highlighted parts of the output that pertain to each answer for ques. A-F.

The Excel assignments are each graded via a short Excel Quiz in HLS Web Test that is open for about 48 hours as listed in the syllabus and in your HLS progress report. You are expected to use your output and written answers to complete the quiz. You are not required to turn in the output.

The questions below are much like the quiz you will have in WEBTEST. If you can answer these, you should have no difficulty with those that will be asked. However, these questions are not the exact questions that you will have to answer.

SAMPLE WEBTEST QUIZ: The correct answers to the sample questions are highlighted.

1. What is the p-value of the test statistic for the global F test for the regression model?

A. 0.444              B. 0.0000                C. 0.9650                   D. 50              E. 2.0696

2. What is the calculated value of the test statistic to conduct the test for the usefulness of the Autodebt variable?

A. 118.98            B. 0.9650                C. 0.234                    D. -1.314         E. 0.0079

3. The best interpretation of R-square for this regression analysis is

60.8 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X1-X8

77.80 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X1-X8

57.4 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X2-X8

72.84 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X2-X8

95 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X1-X8

4. The result of the t-test for the usefulness of the Income2 (X2) variable, at a = .05 is to

F.T.R. Ho, since the p-value is greater than .05. Conclude X2 is not useful for predicting Y

F.T.R. Ho, since the p-value is greater than .05. Conclude X2 is useful for predicting Y

Reject Ho, since the p-value is less than .05. Conclude X1 is not useful for predicting Y

F.T.R. Ho, since the p-value is less than .05. Conclude X1 is not useful for predicting Y

Reject Ho, since the p-value is less than .05. Conclude X1 is useful for predicting Y

5. What is the coefficient for Income2 in the sample regression equation?

1. 20.10           B. -1.2504       C. 0.00104      D. 118.97        E. 0.0115                                

Obs Income1 Income2 Famlsize Ownrent Autodebt Hpayrent Groc Loc
1 36557 20610 4 0 15290 1339 278 0
2 27045 25490 5 0 14676 1175 220 0
3 38878 0 8 0 10317 1108 456 0
4 41448 0 2 1 9504 729 253 0
5 33136 25300 6 1 17802 875 344 0
6 44308 24559 2 1 20537 1282 326 -1
7 31997 25419 4 1 11725 919 308 1
8 43437 0 6 1 12084 970 311 2
9 41625 22802 4 1 14863 945 305 0
10 40140 32158 5 0 15708 1470 432 2
11 31448 0 4 0 6160 985 259 1
12 44850 14878 6 1 16528 1211 333 -1
13 29334 0 2 0 7979 300 114 0
14 36579 23304 5 0 15442 1234 342 0
15 26977 20265 1 1 13920 274 184 2
16 35050 0 2 0 10538 993 174 0
17 46469 0 4 1 11580 1055 234 2
18 38446 22976 4 1 16095 981 258 0
19 45135 24819 5 1 19338 776 209 0
20 52240 0 8 1 12967 1394 454 1
21 56264 0 1 1 15849 746 160 2
22 31536 0 7 0 6965 1519 419 -1
23 36924 0 4 0 11799 690 184 -1
24 45540 0 3 0 8307 854 232 -1
25 28004 0 4 1 10336 1034 267 1
26 34971 0 4 1 15163 450 278 2
27 35327 0 6 0 6984 1057 347 -1
28 55231 0 5 0 13523 1039 367 0
29 34969 22296 2 1 12752 593 254 1
30 35766 0 1 0 10175 376 103 0
31 35716 16419 5 1 10227 1073 208 2
32 32575 0 6 0 10550 1352 393 2
33 41154 0 1 0 6096 713 150 2
34 30909 0 2 1 12811 515 191 -1
35 40450 0 4 0 9177 479 259 -1
36 42236 15492 3 0 15556 1719 313 2
37 39602 22935 1 0 15197 639 140 -1
38 24029 0 2 0 3651 834 265 -1
39 28913 0 3 1 6628 674 273 2
40 32005 0 2 1 10740 770 196 1
41 31342 0 4 0 6040 502 212 0
42 38429 22218 2 1 17750 907 257 2
43 50319 0 4 1 13279 934 187 2
44 31862 13799 3 1 9958 790 243 1
45 29308 13975 6 1 11597 1414 356 2
46 38392 18192 2 1 10501 870 155 -1
47 37146 0 6 1 8017 1145 284 -1
48 40817 23313 4 1 15722 1129 246 2
49 41019 0 4 0 11958 958 200 2
50 40040 25261 1 0 16238 875 112 1
51 36804 20519 4 0 15693 1340 290 -1
52 27033 25809 4 0 14649 1245 261 2
53 38861 0 10 0 9907 1180 596 0
54 41124 0 4 1 9711 675 341 -1
55 32837 25304 5 0 18210 913 236 -1
56 44243 24931 5 1 20602 1277 334 1
57 32116 25189 2 0 11973 963 390 2
58 43416 0 7 0 11882 899 223 -1
59 41669 22817 3 1 15011 827 155 -1
60 40407 32077 6 0 15329 1388 386 0
61 31147 0 4 0 5550 1011 165 0
62 44675 0 7 0 16936 1226 398 0
63 29445 0 3 1 8597 201 50 -1
64 36841 23274 3 1 15391 1420 341 2
65 26867 0 1 0 13564 225 144 0
66 34782 0 3 0 10345 972 168 0
67 46381 0 3 0 11409 1224 210 0
68 38261 22711 7 0 16434 1009 307 2
69 45194 24467 7 0 19378 957 677 1
70 52465 0 8 0 12420 1504 482 -1
71 55837 0 2 0 15990 665 193 -1
72 30990 0 7 0 6698 1416 366 1
73 36748 0 4 0 12120 853 202 0
74 45582 0 2 0 8575 1008 217 2
75 28123 17998 5 0 9960 1027 294 -1
76 34742 19100 5 0 14953 516 437 -1
77 35370 0 6 0 6786 1028 326 0
78 55270 0 6 0 13284 1115 346 1
79 35389 22421 2 1 12638 660 237 1
80 35821 0 2 0 10068 284 179 -1
81 35786 16557 5 0 10675 1072 356 1
82 32597 0 7 0 10432 1421 455 0
83 41172 0 1 0 6459 823 154 0
84 30620 0 3 1 12685 437 196 2
85 40369 0 4 0 9165 559 162 1
86 42289 15832 6 0 15152 1718 344 1
87 39343 23292 1 0 15137 687 87 -1
88 23937 0 4 0 3444 1003 278 2
89 29025 0 3 1 7263 655 234 2
90 31807 0 4 1 10267 693 400 1
91 31540 0 4 1 5846 564 266 1
92 38302 22313 4 1 17748 904 295 1
93 50375 0 6 1 13274 954 456 1
94 31925 13705 3 0 8818 745 256 -1
95 29371 14038 6 0 12020 1478 487 -1
96 38463 17978 4 0 10496 901 118 1
97 37039 0 8 0 8203 1086 566 0
98 40895 23136 3 1 15598 1038 262 2
99 41161 0 4 0 12087 1133 225 -1
100 39976 25391 3 0 16616 1072 235 2

Solutions

Expert Solution

We want to build the Regression Model to predict the Expenditure of people on the groceries by using the monthly incomes of the peoples and other factors.

By using R:

Regression Model:

> Model = lm(Groc~.,data = data)
> Model

Call:
lm(formula = Groc ~ ., data = data)

Coefficients:
(Intercept) Income1 Income2 Famlsize Ownrent Autodebt  
52.1572271 -0.0001095 0.0006143 39.7482604 3.4121354 0.0009417  
Hpayrent Loc  
0.0491947 5.3885533  

From above output the Fitted model is,

Groc = 52.1572271 - 0.0001095 * Income1 + 0.0006143 * Income2 + 39.7482604 * Famlsize + 3.4121354 * Ownrent + 0.0009417 * Autodebt + 0.0491947 * Hpayrent + 5.3885533 * Loc  

i.e. from the above model we can say that if you are primary wage earner then your expenditure decreses by 0.0001095 and if you are secodary earner then your expenditure is incresed by 0.0006143.

ANOVA table:

> summary(Model)

Coefficients: Estimate Std. Error t value Pr(>|t|)   

(Intercept) 52.1572271 42.2783823 1.234 0.220   

Income1 -0.0001095 0.0013181 -0.083 0.934   

Income2 0.0006143 0.0010628 0.578 0.565   

Famlsize 39.7482604 4.7320222 8.400 5.27e-13 ***

Ownrent 3.4121354 16.3228969 0.209 0.835   

Autodebt 0.0009417 0.0032897 0.286 0.775   

Hpayrent 0.0491947 0.0305795 1.609 0.111   

Loc 5.3885533 6.7730375 0.796 0.428   

Residual standard error: 72.23 on 92 degrees of freedom

Multiple R-squared: 0.6161,

Adjusted R-squared: 0.5869

F-statistic: 21.1 on 7 and 92 DF,

p-value: < 2.2e-16

From the above summary,

Adjusted R square = 0.5869

i.e. Model is the best fit.

F Statistics: 21.10

p-value: 2.2e-16

Test of Hypothesis:

H0: The model is does not effective.

against,

H1: The model is effective.

Decision Rule: If p-value greater than 0.05 level of significance then we accept the
null hypothesis.
From above results p-value(2.2e-16) < 0.05
we reject the null hypothesis here

i.e. So Model is the best Fit.

>>>>>>>>>>>>>>>>>> Best Luck >>>>>>>>>>>>>>>>>>>


Related Solutions

Macon Inc. is looking at setting up a manufacturing plant in Kirksville. This will be a...
Macon Inc. is looking at setting up a manufacturing plant in Kirksville. This will be a 6-year project. The company bought the land in Kirksville last year for $1.5 million in anticipation of using it as a warehouse. The land is now appraised for $1.7 million after-tax. In 6 years, the after-tax value of the land is estimated for $2 million. The plant and equipment will cost $9.3 million to build and will be depreciated straight-line to zero over the...
Suppose a government department would like to investigate the relationship between the cost of heating a...
Suppose a government department would like to investigate the relationship between the cost of heating a home during the month of February in the Northeast and the​home's square footage. The accompanying data set shows a random sample of 10 homes. Construct a​ 90% confidence interval to estimate the average cost in February to heat a Northeast home that is 2,900 square feet. Heating Square Heating Square Cost​ ($) Footage Cost​ ($) Footage 330 2,410 440 2,610 300 2,410 340 2,210...
Suppose a government department would like to investigate the relationship between the cost of heating a...
Suppose a government department would like to investigate the relationship between the cost of heating a home during the month of February in the Northeast and the​home's square footage. The accompanying data set shows a random sample of 10 homes. Construct a​ 90% confidence interval to estimate the average cost in February to heat a Northeast home that is 2,900 square feet. Heating Square Heating Square Cost​ ($) Footage Cost​ ($) Footage 330 2,410 440 2,610 300 2,410 340 2,210...
Suppose a government department would like to investigate the relationship between the cost of heating a...
Suppose a government department would like to investigate the relationship between the cost of heating a home during the month of February in the Northeast and the​ home's square footage. The accompanying data set shows a random sample of 10 homes. Construct a​ 90% confidence interval to estimate the average cost in February to heat a Northeast home that is 3,000 square feet. Heating Square Heating Square Cost​ ($) Footage Cost​ ($) Footage 320 2,420 440 2,610 300 2,420 330...
Suppose a government department would like to investigate the relationship between the cost of heating a...
Suppose a government department would like to investigate the relationship between the cost of heating a home during the month of February in the Northeast and the home's square footage. The accompanying data set shows a random sample of 10 homes. Construct a 90% confidence interval to estimate the average cost in February to heat a Northeast home that is 2,200 square feet. Heating Square Heating Square Cost​ ($) Footage Cost​ ($) Footage 330 2,420 450 2,610 280 2,430 320...
Suppose a government department would like to investigate the relationship between the cost of heating a...
Suppose a government department would like to investigate the relationship between the cost of heating a home during the month of February in the Northeast and the​ home's square footage. The accompanying data set shows a random sample of 10 homes. Construct a​ 90% prediction interval to estimate the cost in February to heat a Northeast home that is 3,100 square feet. Heating Cost($)   Square_Footage 340 2430 300 2430 300 2020 260 2220 310 2310 460 2630 330 2210 400...
The head of Health Services at Goodheart University (GU) would like to investigate the proportion of...
The head of Health Services at Goodheart University (GU) would like to investigate the proportion of smokers at GU. In order to do so, the head of Health Services chooses a random sample of 500 Goodheart students, and finds that 80 of them are smokers. Construct a 90% confidence interval for the proportion of smokers at Goodheart University.
An electronics retailer would like to investigate the relationship between the selling price of a digital...
An electronics retailer would like to investigate the relationship between the selling price of a digital camera and the demand for it. The table shown below gives the weekly demand for the camera in one particular market along with the corresponding price. These data have a sample correlation​ coefficient, rounded to three decimal​ places, of -0.930. Using a significance level of 0.10, test if the population correlation coefficient between the selling price and the demand for the camera is less...
The CIO of an IT company would like to investigate how a software developer’s work experience...
The CIO of an IT company would like to investigate how a software developer’s work experience (in number of years), professional certifications (number of certificates), and knowledge of various computer languages (number of programming languages) contribute to his/her work performance. The work performance is measured on the scale of 1 to 1000 so that the higher one’s score, the better his/her work performance. He collects data from 20 software developers in his company Years of experience Num of certificates Num...
An Internet retailer would like to investigate the relationship between the amount of time in minutes...
An Internet retailer would like to investigate the relationship between the amount of time in minutes a purchaser spends on its Web site and the amount of money he or she spends on an order. The table to the right shows the data from a random sample of 12 customers. Construct a 90​% confidence interval for the regression slope. Construct a 90​% confidence interval for the slope. LCL equals nothing and UC Lequals nothing ​(Round to three decimal places as​...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT