In: Statistics and Probability
Food Lion Inc. would like to investigate the feasibility and future prospects of setting up stores in Denton County. Food Lion has provided you with a sample from a database of household financial variables. They would like you to use regression techniques to predict the monthly expenditure on groceries, of families who either rent or own their homes in Denton. The sample contains 100 records. The various fields in the sample are:
Income1 Þ Annual income of head of household or primary wage earner
Income2 Þ Annual income of secondary wage earner
Famlsize Þ Size of family (number of people permanently residing in the household)
Ownorent Þ 1 if household is owned; 0 if it is rented
Autodebt Þ Automobile related debt pending for wage earners in the household
Hpayrent Þ Household mortgage payment or rent per month
Groc Þ Monthly expenditure on groceries
Loc Þ 1 = East Denton (E); 0 = West Denton(W); -1 = North Denton (N); 2 = South Denton (S)
For this assignment you will be given minimal instructions since, having completed the first three assignments, you should now be quite familiar with Excel functions and pull down menus.
1. The sample data are contained in a file named Assgt#4.xls to be found in the folder Excel Assignments for All Sections. Download the files from the folder onto a disk.
2. Import the data into the first nine columns (A-I) of your spreadsheet. The first row contains labels for the variables in the order outlined in the data description. The table below illustrates how the first 10 data records should look, after this step is completed.
Obs |
Income1 |
Income2 |
Famlsize |
Ownorent |
Autodebt |
Hpayrent |
Groc |
Loc |
1 |
36557 |
20610 |
4 |
0 |
15290 |
1339 |
278 |
0 |
2 |
27045 |
25490 |
5 |
0 |
14676 |
1175 |
220 |
0 |
3 |
38878 |
0 |
8 |
0 |
10317 |
1108 |
456 |
0 |
4 |
41448 |
0 |
2 |
1 |
9504 |
729 |
253 |
0 |
5 |
33136 |
25300 |
6 |
1 |
17802 |
875 |
344 |
0 |
6 |
44308 |
24559 |
2 |
1 |
20537 |
1282 |
326 |
-1 |
7 |
31997 |
25419 |
4 |
1 |
11725 |
919 |
308 |
1 |
8 |
43437 |
0 |
6 |
1 |
12084 |
970 |
311 |
2 |
9 |
41625 |
22802 |
4 |
1 |
14863 |
945 |
305 |
0 |
10 |
40140 |
32158 |
5 |
0 |
15708 |
1470 |
432 |
2 |
3. Insert three columns prior to the column labeled “Groc”. In these columns, create three logical (dummy) variables to represent the four locations. You may use the IF function from the pull down menu, to do this. Please follow the illustration below, where: ED=1, if record is for East Denton; WD=1, if record is for West Denton; and SD=1, if the record is for South Denton (thus North Denton representing the base). After the creation of the dummy variables, the first 11 records should look as shown below.
Obs |
Income1 |
Income2 |
Famlsize |
Ownrent |
Autodebt |
Hpayrent |
ED |
WD |
SD |
Groc |
1 |
36557 |
20610 |
4 |
0 |
15290 |
1339 |
0 |
1 |
0 |
278 |
2 |
27045 |
25490 |
5 |
0 |
14676 |
1175 |
0 |
1 |
0 |
220 |
3 |
38878 |
0 |
8 |
0 |
10317 |
1108 |
0 |
1 |
0 |
456 |
4 |
41448 |
0 |
2 |
1 |
9504 |
729 |
0 |
1 |
0 |
253 |
5 |
33136 |
25300 |
6 |
1 |
17802 |
875 |
0 |
1 |
0 |
344 |
6 |
44308 |
24559 |
2 |
1 |
20537 |
1282 |
0 |
0 |
0 |
326 |
7 |
31997 |
25419 |
4 |
1 |
11725 |
919 |
1 |
0 |
0 |
308 |
8 |
43437 |
0 |
6 |
1 |
12084 |
970 |
0 |
0 |
1 |
311 |
9 |
41625 |
22802 |
4 |
1 |
14863 |
945 |
0 |
1 |
0 |
305 |
10 |
40140 |
32158 |
5 |
0 |
15708 |
1470 |
0 |
0 |
1 |
432 |
11 |
31448 |
0 |
4 |
0 |
6160 |
985 |
1 |
0 |
0 |
259 |
4. Conduct the regression analysis with Groc as the dependent variable (Y) and Income1 (X1), Income2 (X2), Ownorent (X3), Autodebt (X4), Famlsize (X5), and Location variables (X6 through X8) as the independent variables at 1% level of significance. Use the Regression tool accessed from the Data Tab/ Data Analysis (in 2003 use Tools/Data Analysis) pull-down menu. Check only the labels box and specify that you want the output in cell M1. (Make sure that you enter “Confidence Level = 99%”). Also, make sure to check the Standardized residuals box to obtain outlier information. A partial output is shown below for your guidance.
SUMMARY OUTPUT |
||||||||||||||
Regression Statistics |
||||||||||||||
Multiple R |
0.779966 |
|||||||||||||
R Square |
xxxx |
|||||||||||||
Adjusted R Square |
0.5739159 |
|||||||||||||
Standard Error |
xxxx |
|||||||||||||
Observations |
100 |
|||||||||||||
ANOVA |
||||||||||||||
df |
SS |
MS |
F |
Significance F |
||||||||||
Regression |
8 |
760733.044 |
95091.6 |
xxxx |
xxxx |
|||||||||
Residual |
91 |
xxxx |
xxxx |
|||||||||||
Total |
99 |
xxxx |
||||||||||||
Coefficients |
Standard Error |
t Stat |
P-value |
Lower 95% |
Upper 95% |
Lower 99.0% |
Upper 99.0% |
|||||||
Intercept |
59.847981 |
43.87639859 |
1.364 |
0.176 |
-27.307103 |
147.0031 |
-55.58863 |
175.284587 |
||||||
Income1 |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
||||||
Income2 |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
||||||
Ownrent |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
||||||
Autodebt |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
||||||
Famlsize |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
||||||
ED |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
||||||
WD |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
||||||
SD |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
xxxx |
||||||
4. By substituting appropriate values directly into the sample regression equation given in your output generate a point estimate for (Alternatively, you may use the TREND function from the pull down menu, and follow directions given in Excel or use the KPK macro to run regression and for prediction as shown in class):
a) the monthly Grocery Bill for a family of 4 living in a rented home in South Denton, whose primary income is $42,457 and with the secondary wage earner having an income of $10000, and Autodebt is $6,000
b) the monthly Grocery Bill for a family of size 7 living in a home in the North Denton that is owned, whose primary income is $35,000 with the secondary wage earner having an income of $25,000, and Autodebt is $10,000.
Details on the trend function can be found by clicking the fx icon on the toolbar then selecting statistical and trend. The new values for Income1, Income2, Ownorent, Autodebt, Famlsize, and Location variables can be placed in convenient cells in the two rows below your regression output and the point estimates for Groc evaluated in the adjacent cells.
Use the output from the regression analysis and trend function to answer the following questions.
A. Write one to three sentences to interpret the meaning of your model’s R-square value.
B. Conduct an F test for the regression model containing the eight independent variables, at the 1% significance level. State the null and alternative hypotheses, the decision, reason for the decision, and a conclusion.
C. Conduct a t-test, at the 1% significance level, for the usefulness of the dummy (indicator) variable Ownorent. State the null and alternative hypotheses, the decision, reason for the decision, and a conclusion.
D. Conduct a t-test, at the 1% significance level, for the usefulness of the variable Income2. State the null and alternative hypotheses, the decision, reason for the decision, and a conclusion.
E. State which of the eight X (independent) variables are statistically significant and which ones are not, at the 1% significance level.
F. Write one or two sentences on the interpretation of the coefficient of each of the six independent variables, making specific reference to the value of each coefficient given by your model.
Experiential Exercise
As you work this assignment think about the following questions. Then form a team of 3 to 5 and discuss each of the following. You can engage in this discussion by meeting or your group can use a Wiki to engage in an online discussion. Instructions for setting up a Wiki are provided on the excel assignment page of our course web site.
What wording tells you the alternative and null hypothesis?
What wording tells you the type of statistical test to perform? For example, is a z or t statistic appropriate?
What wording tells you that this is a one or two tail hypothesis test?
What were the steps you used to obtain the calculated value of the test statistic?
What were the steps you used to obtain the critical value of the test statistic?
How do you use the calculated and critical value to make a statistical decision about this test?
How do you obtain the p value for the test statistic?
How do you use the p value and the level of significance to make a statistical decision about this test?
How does the result of your test relate to the statistical significance of your findings?
What managerial implications can you conclude from the results of your test?
To be ready for the “Excel Quiz 43” HLS Web Test you should prepare the following:
1. A printout of the regression analysis
2. The results of the prediction/estimation of Groc (computed directly or by using the trend function) in the two rows immediately below the regression output
3. Answers for questions A-F
4. Clearly labeled or highlighted parts of the output that pertain to each answer for ques. A-F.
The Excel assignments are each graded via a short Excel Quiz in HLS Web Test that is open for about 48 hours as listed in the syllabus and in your HLS progress report. You are expected to use your output and written answers to complete the quiz. You are not required to turn in the output.
The questions below are much like the quiz you will have in WEBTEST. If you can answer these, you should have no difficulty with those that will be asked. However, these questions are not the exact questions that you will have to answer.
SAMPLE WEBTEST QUIZ: The correct answers to the sample questions are highlighted.
1. What is the p-value of the test statistic for the global F test for the regression model?
A. 0.444 B. 0.0000 C. 0.9650 D. 50 E. 2.0696
2. What is the calculated value of the test statistic to conduct the test for the usefulness of the Autodebt variable?
A. 118.98 B. 0.9650 C. 0.234 D. -1.314 E. 0.0079
3. The best interpretation of R-square for this regression analysis is
60.8 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X1-X8
77.80 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X1-X8
57.4 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X2-X8
72.84 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X2-X8
95 percent of the total variation in a family's expenditure on groceries is explained by regression on the variables X1-X8
4. The result of the t-test for the usefulness of the Income2 (X2) variable, at a = .05 is to
F.T.R. Ho, since the p-value is greater than .05. Conclude X2 is not useful for predicting Y
F.T.R. Ho, since the p-value is greater than .05. Conclude X2 is useful for predicting Y
Reject Ho, since the p-value is less than .05. Conclude X1 is not useful for predicting Y
F.T.R. Ho, since the p-value is less than .05. Conclude X1 is not useful for predicting Y
Reject Ho, since the p-value is less than .05. Conclude X1 is useful for predicting Y
5. What is the coefficient for Income2 in the sample regression equation?
1. 20.10 B. -1.2504 C. 0.00104 D. 118.97 E. 0.0115
Obs | Income1 | Income2 | Famlsize | Ownrent | Autodebt | Hpayrent | Groc | Loc |
1 | 36557 | 20610 | 4 | 0 | 15290 | 1339 | 278 | 0 |
2 | 27045 | 25490 | 5 | 0 | 14676 | 1175 | 220 | 0 |
3 | 38878 | 0 | 8 | 0 | 10317 | 1108 | 456 | 0 |
4 | 41448 | 0 | 2 | 1 | 9504 | 729 | 253 | 0 |
5 | 33136 | 25300 | 6 | 1 | 17802 | 875 | 344 | 0 |
6 | 44308 | 24559 | 2 | 1 | 20537 | 1282 | 326 | -1 |
7 | 31997 | 25419 | 4 | 1 | 11725 | 919 | 308 | 1 |
8 | 43437 | 0 | 6 | 1 | 12084 | 970 | 311 | 2 |
9 | 41625 | 22802 | 4 | 1 | 14863 | 945 | 305 | 0 |
10 | 40140 | 32158 | 5 | 0 | 15708 | 1470 | 432 | 2 |
11 | 31448 | 0 | 4 | 0 | 6160 | 985 | 259 | 1 |
12 | 44850 | 14878 | 6 | 1 | 16528 | 1211 | 333 | -1 |
13 | 29334 | 0 | 2 | 0 | 7979 | 300 | 114 | 0 |
14 | 36579 | 23304 | 5 | 0 | 15442 | 1234 | 342 | 0 |
15 | 26977 | 20265 | 1 | 1 | 13920 | 274 | 184 | 2 |
16 | 35050 | 0 | 2 | 0 | 10538 | 993 | 174 | 0 |
17 | 46469 | 0 | 4 | 1 | 11580 | 1055 | 234 | 2 |
18 | 38446 | 22976 | 4 | 1 | 16095 | 981 | 258 | 0 |
19 | 45135 | 24819 | 5 | 1 | 19338 | 776 | 209 | 0 |
20 | 52240 | 0 | 8 | 1 | 12967 | 1394 | 454 | 1 |
21 | 56264 | 0 | 1 | 1 | 15849 | 746 | 160 | 2 |
22 | 31536 | 0 | 7 | 0 | 6965 | 1519 | 419 | -1 |
23 | 36924 | 0 | 4 | 0 | 11799 | 690 | 184 | -1 |
24 | 45540 | 0 | 3 | 0 | 8307 | 854 | 232 | -1 |
25 | 28004 | 0 | 4 | 1 | 10336 | 1034 | 267 | 1 |
26 | 34971 | 0 | 4 | 1 | 15163 | 450 | 278 | 2 |
27 | 35327 | 0 | 6 | 0 | 6984 | 1057 | 347 | -1 |
28 | 55231 | 0 | 5 | 0 | 13523 | 1039 | 367 | 0 |
29 | 34969 | 22296 | 2 | 1 | 12752 | 593 | 254 | 1 |
30 | 35766 | 0 | 1 | 0 | 10175 | 376 | 103 | 0 |
31 | 35716 | 16419 | 5 | 1 | 10227 | 1073 | 208 | 2 |
32 | 32575 | 0 | 6 | 0 | 10550 | 1352 | 393 | 2 |
33 | 41154 | 0 | 1 | 0 | 6096 | 713 | 150 | 2 |
34 | 30909 | 0 | 2 | 1 | 12811 | 515 | 191 | -1 |
35 | 40450 | 0 | 4 | 0 | 9177 | 479 | 259 | -1 |
36 | 42236 | 15492 | 3 | 0 | 15556 | 1719 | 313 | 2 |
37 | 39602 | 22935 | 1 | 0 | 15197 | 639 | 140 | -1 |
38 | 24029 | 0 | 2 | 0 | 3651 | 834 | 265 | -1 |
39 | 28913 | 0 | 3 | 1 | 6628 | 674 | 273 | 2 |
40 | 32005 | 0 | 2 | 1 | 10740 | 770 | 196 | 1 |
41 | 31342 | 0 | 4 | 0 | 6040 | 502 | 212 | 0 |
42 | 38429 | 22218 | 2 | 1 | 17750 | 907 | 257 | 2 |
43 | 50319 | 0 | 4 | 1 | 13279 | 934 | 187 | 2 |
44 | 31862 | 13799 | 3 | 1 | 9958 | 790 | 243 | 1 |
45 | 29308 | 13975 | 6 | 1 | 11597 | 1414 | 356 | 2 |
46 | 38392 | 18192 | 2 | 1 | 10501 | 870 | 155 | -1 |
47 | 37146 | 0 | 6 | 1 | 8017 | 1145 | 284 | -1 |
48 | 40817 | 23313 | 4 | 1 | 15722 | 1129 | 246 | 2 |
49 | 41019 | 0 | 4 | 0 | 11958 | 958 | 200 | 2 |
50 | 40040 | 25261 | 1 | 0 | 16238 | 875 | 112 | 1 |
51 | 36804 | 20519 | 4 | 0 | 15693 | 1340 | 290 | -1 |
52 | 27033 | 25809 | 4 | 0 | 14649 | 1245 | 261 | 2 |
53 | 38861 | 0 | 10 | 0 | 9907 | 1180 | 596 | 0 |
54 | 41124 | 0 | 4 | 1 | 9711 | 675 | 341 | -1 |
55 | 32837 | 25304 | 5 | 0 | 18210 | 913 | 236 | -1 |
56 | 44243 | 24931 | 5 | 1 | 20602 | 1277 | 334 | 1 |
57 | 32116 | 25189 | 2 | 0 | 11973 | 963 | 390 | 2 |
58 | 43416 | 0 | 7 | 0 | 11882 | 899 | 223 | -1 |
59 | 41669 | 22817 | 3 | 1 | 15011 | 827 | 155 | -1 |
60 | 40407 | 32077 | 6 | 0 | 15329 | 1388 | 386 | 0 |
61 | 31147 | 0 | 4 | 0 | 5550 | 1011 | 165 | 0 |
62 | 44675 | 0 | 7 | 0 | 16936 | 1226 | 398 | 0 |
63 | 29445 | 0 | 3 | 1 | 8597 | 201 | 50 | -1 |
64 | 36841 | 23274 | 3 | 1 | 15391 | 1420 | 341 | 2 |
65 | 26867 | 0 | 1 | 0 | 13564 | 225 | 144 | 0 |
66 | 34782 | 0 | 3 | 0 | 10345 | 972 | 168 | 0 |
67 | 46381 | 0 | 3 | 0 | 11409 | 1224 | 210 | 0 |
68 | 38261 | 22711 | 7 | 0 | 16434 | 1009 | 307 | 2 |
69 | 45194 | 24467 | 7 | 0 | 19378 | 957 | 677 | 1 |
70 | 52465 | 0 | 8 | 0 | 12420 | 1504 | 482 | -1 |
71 | 55837 | 0 | 2 | 0 | 15990 | 665 | 193 | -1 |
72 | 30990 | 0 | 7 | 0 | 6698 | 1416 | 366 | 1 |
73 | 36748 | 0 | 4 | 0 | 12120 | 853 | 202 | 0 |
74 | 45582 | 0 | 2 | 0 | 8575 | 1008 | 217 | 2 |
75 | 28123 | 17998 | 5 | 0 | 9960 | 1027 | 294 | -1 |
76 | 34742 | 19100 | 5 | 0 | 14953 | 516 | 437 | -1 |
77 | 35370 | 0 | 6 | 0 | 6786 | 1028 | 326 | 0 |
78 | 55270 | 0 | 6 | 0 | 13284 | 1115 | 346 | 1 |
79 | 35389 | 22421 | 2 | 1 | 12638 | 660 | 237 | 1 |
80 | 35821 | 0 | 2 | 0 | 10068 | 284 | 179 | -1 |
81 | 35786 | 16557 | 5 | 0 | 10675 | 1072 | 356 | 1 |
82 | 32597 | 0 | 7 | 0 | 10432 | 1421 | 455 | 0 |
83 | 41172 | 0 | 1 | 0 | 6459 | 823 | 154 | 0 |
84 | 30620 | 0 | 3 | 1 | 12685 | 437 | 196 | 2 |
85 | 40369 | 0 | 4 | 0 | 9165 | 559 | 162 | 1 |
86 | 42289 | 15832 | 6 | 0 | 15152 | 1718 | 344 | 1 |
87 | 39343 | 23292 | 1 | 0 | 15137 | 687 | 87 | -1 |
88 | 23937 | 0 | 4 | 0 | 3444 | 1003 | 278 | 2 |
89 | 29025 | 0 | 3 | 1 | 7263 | 655 | 234 | 2 |
90 | 31807 | 0 | 4 | 1 | 10267 | 693 | 400 | 1 |
91 | 31540 | 0 | 4 | 1 | 5846 | 564 | 266 | 1 |
92 | 38302 | 22313 | 4 | 1 | 17748 | 904 | 295 | 1 |
93 | 50375 | 0 | 6 | 1 | 13274 | 954 | 456 | 1 |
94 | 31925 | 13705 | 3 | 0 | 8818 | 745 | 256 | -1 |
95 | 29371 | 14038 | 6 | 0 | 12020 | 1478 | 487 | -1 |
96 | 38463 | 17978 | 4 | 0 | 10496 | 901 | 118 | 1 |
97 | 37039 | 0 | 8 | 0 | 8203 | 1086 | 566 | 0 |
98 | 40895 | 23136 | 3 | 1 | 15598 | 1038 | 262 | 2 |
99 | 41161 | 0 | 4 | 0 | 12087 | 1133 | 225 | -1 |
100 | 39976 | 25391 | 3 | 0 | 16616 | 1072 | 235 | 2 |
We want to build the Regression Model to predict the Expenditure of people on the groceries by using the monthly incomes of the peoples and other factors.
By using R:
Regression Model:
> Model = lm(Groc~.,data = data)
> Model
Call:
lm(formula = Groc ~ ., data = data)
Coefficients:
(Intercept) Income1 Income2 Famlsize Ownrent
Autodebt
52.1572271 -0.0001095 0.0006143 39.7482604 3.4121354
0.0009417
Hpayrent Loc
0.0491947 5.3885533
From above output the Fitted model is,
Groc = 52.1572271 - 0.0001095 * Income1 + 0.0006143 * Income2 + 39.7482604 * Famlsize + 3.4121354 * Ownrent + 0.0009417 * Autodebt + 0.0491947 * Hpayrent + 5.3885533 * Loc
i.e. from the above model we can say that if you are primary wage earner then your expenditure decreses by 0.0001095 and if you are secodary earner then your expenditure is incresed by 0.0006143.
ANOVA table:
> summary(Model)
Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.1572271 42.2783823 1.234 0.220
Income1 -0.0001095 0.0013181 -0.083 0.934
Income2 0.0006143 0.0010628 0.578 0.565
Famlsize 39.7482604 4.7320222 8.400 5.27e-13 ***
Ownrent 3.4121354 16.3228969 0.209 0.835
Autodebt 0.0009417 0.0032897 0.286 0.775
Hpayrent 0.0491947 0.0305795 1.609 0.111
Loc 5.3885533 6.7730375 0.796 0.428
Residual standard error: 72.23 on 92 degrees of freedom
Multiple R-squared: 0.6161,
Adjusted R-squared: 0.5869
F-statistic: 21.1 on 7 and 92 DF,
p-value: < 2.2e-16
From the above summary,
Adjusted R square = 0.5869
i.e. Model is the best fit.
F Statistics: 21.10
p-value: 2.2e-16
Test of Hypothesis:
H0: The model is does not effective.
against,
H1: The model is effective.
Decision Rule: If p-value greater than 0.05 level of
significance then we accept the
null hypothesis.
From above results p-value(2.2e-16) < 0.05
we reject the null hypothesis here
i.e. So Model is the best Fit.
>>>>>>>>>>>>>>>>>> Best Luck >>>>>>>>>>>>>>>>>>>