Many regions in North and South Carolina and Georgia have experienced rapid population growth over the last 10 years. It is expected that the growth will continue over the next 10 years. This has motivated many of the large grocery store chains to build new stores in the region. The Kelley’s Super Grocery Stores Inc. chain is no exception. The director of planning for Kelley’s Super Grocery Stores wants to study adding more stores in this region. He believes there are two main factors that indicate the amount families spend on groceries. The first is their income and the other is the number of people in the family. Food and income are reported in thousands of dollars per year, and the variable size refers to the number of people in the household.
a) Develop a correlation matrix. Do you see any problems with multicollinearity?
b) Determine the regression equation. Discuss the regression equation. How much does an additional family member add to the amount spent on food?
c) What is the value of R2? Can we conclude the model is significant?
d) Would you consider deleting either of the independent variables?
e) Plot the residuals in a histogram. Is there any problem with the normality assumption?
f) Plot the fitted values against the residuals. Does this plot indicate any problems with homoscedasticity?
Kelley's Super Grocery
Family | Food (in $1,000) | Income (in $1,000) | Family Size |
1 | 5.04 | 73.98 | 4 |
2 | 4.08 | 54.9 | 2 |
3 | 5.76 | 94.14 | 4 |
4 | 3.48 | 52.02 | 1 |
5 | 4.2 | 65.7 | 2 |
6 | 4.8 | 53.64 | 4 |
7 | 4.32 | 79.64 | 3 |
8 | 5.04 | 68.58 | 4 |
9 | 6.12 | 165.6 | 5 |
10 | 3.24 | 64.8 | 1 |
11 | 4.8 | 138.42 | 3 |
12 | 3.24 | 125.82 | 1 |
13 | 6.6 | 77.58 | 7 |
14 | 4.92 | 171.36 | 2 |
15 | 6.6 | 82.08 | 9 |
16 | 5.4 | 141.3 | 3 |
17 | 6 | 36.9 | 5 |
18 | 5.4 | 56.88 | 4 |
19 | 3.36 | 71.82 | 1 |
20 | 4.68 | 69.48 | 3 |
21 | 4.32 | 54.36 | 2 |
22 | 5.52 | 87.66 | 5 |
23 | 4.56 | 38.16 | 3 |
24 | 5.4 | 43.74 | 7 |
25 | 4.8 | 48.42 | 5 |
a)
Correlations: Food, Income, Size
Variable | Food | Income |
Income | 0.156 | |
Size | 0.876 | -0.098 |
The only high correlation is between Food and Size (0.876), and Food is the dependent variable. The correlation between the two independent variables, Income and Size, is only -0.098. Hence, we do not see any problem with multicollinearity. Note: Multicollinearity arises when the independent variables are highly correlated with each other.
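The correlations can be reproduced directly from the data table. The following is a minimal sketch in plain Python; the helper name `pearson` and the list names are illustrative choices, not part of the original output.

```python
from math import sqrt

# Data copied from the table above (Food and Income in $1,000 per year).
food = [5.04, 4.08, 5.76, 3.48, 4.2, 4.8, 4.32, 5.04, 6.12, 3.24, 4.8, 3.24, 6.6,
        4.92, 6.6, 5.4, 6, 5.4, 3.36, 4.68, 4.32, 5.52, 4.56, 5.4, 4.8]
income = [73.98, 54.9, 94.14, 52.02, 65.7, 53.64, 79.64, 68.58, 165.6, 64.8,
          138.42, 125.82, 77.58, 171.36, 82.08, 141.3, 36.9, 56.88, 71.82,
          69.48, 54.36, 87.66, 38.16, 43.74, 48.42]
size = [4, 2, 4, 1, 2, 4, 3, 4, 5, 1, 3, 1, 7, 2, 9, 3, 5, 4, 1, 3, 2, 5, 3, 7, 5]


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_food_income = pearson(food, income)
r_food_size = pearson(food, size)
r_income_size = pearson(income, size)  # the pair that matters for multicollinearity

print(f"Food-Income: {r_food_income:.3f}")
print(f"Food-Size:   {r_food_size:.3f}")
print(f"Income-Size: {r_income_size:.3f}")
```

Only the Income-Size value bears on multicollinearity, because those are the two independent variables.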
b) Determine the regression equation. Discuss the regression equation. How much does an additional family member add to the amount spent on food?
Ans:
Regression Analysis: Food versus Income, Size
The regression equation is
Food = 2.84 + 0.00613 Income + 0.425 Size
Predictor | Coef | SE Coef | T | P-value |
Constant | 2.8435 | 0.2617 | 10.86 | 0.000 |
Income | 0.00613 | 0.002250 | 2.72 | 0.012 |
Size | 0.42475 | 0.04222 | 10.06 | 0.000 |
S = 0.420153 R-Sq = 82.6% R-Sq(adj) = 81.0%
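Holding income constant, each additional family member increases the mean amount spent on food by 0.425 × $1,000 = $425 per year. Holding family size constant, each additional $1,000 of income increases it by 0.00613 × $1,000 = $6.13 per year.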
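As a cross-check, the coefficients can be recomputed from the data by solving the least-squares normal equations. This is a sketch in plain Python, assuming ordinary least squares with an intercept; the variable names and the helper `cross` are illustrative and do not come from the statistical package that produced the output above.

```python
# Data copied from the table above (Food and Income in $1,000 per year).
food = [5.04, 4.08, 5.76, 3.48, 4.2, 4.8, 4.32, 5.04, 6.12, 3.24, 4.8, 3.24, 6.6,
        4.92, 6.6, 5.4, 6, 5.4, 3.36, 4.68, 4.32, 5.52, 4.56, 5.4, 4.8]
income = [73.98, 54.9, 94.14, 52.02, 65.7, 53.64, 79.64, 68.58, 165.6, 64.8,
          138.42, 125.82, 77.58, 171.36, 82.08, 141.3, 36.9, 56.88, 71.82,
          69.48, 54.36, 87.66, 38.16, 43.74, 48.42]
size = [4, 2, 4, 1, 2, 4, 3, 4, 5, 1, 3, 1, 7, 2, 9, 3, 5, 4, 1, 3, 2, 5, 3, 7, 5]

n = len(food)
mean_f, mean_i, mean_s = sum(food) / n, sum(income) / n, sum(size) / n


def cross(x, mx, y, my):
    """Sum of cross-products of deviations from the means."""
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


s_ii = cross(income, mean_i, income, mean_i)
s_ss = cross(size, mean_s, size, mean_s)
s_is = cross(income, mean_i, size, mean_s)
s_if = cross(income, mean_i, food, mean_f)
s_sf = cross(size, mean_s, food, mean_f)

# Solve the 2x2 normal equations for the two slopes (Cramer's rule),
# then recover the intercept from the means.
det = s_ii * s_ss - s_is ** 2
b_income = (s_if * s_ss - s_is * s_sf) / det
b_size = (s_ii * s_sf - s_is * s_if) / det
b_const = mean_f - b_income * mean_i - b_size * mean_s

print(f"Food = {b_const:.2f} + {b_income:.5f} Income + {b_size:.3f} Size")
print(f"Extra spending per additional family member: ${b_size * 1000:.0f} per year")
```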
c) What is the value of R2? Can we conclude the model is significant?
Ans: The value of R2 is 0.826 (82.6%), so the model explains about 82.6% of the variation in the amount spent on food. We can conclude that the model is significant at the 0.05 level of significance because at least one coefficient in the table above has a p-value less than 0.05.
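The regression output shown here does not include an ANOVA table, but the global F statistic can be recovered from R2 alone. The sketch below assumes the usual relationship F = (R2/k) / ((1 - R2)/(n - k - 1)) with n = 25 families and k = 2 independent variables; the critical value is taken from a standard F table and is approximate.

```python
r_sq = 0.826  # R-Sq from the regression output above
n = 25        # number of families in the data table
k = 2         # independent variables: Income and Size

# Global F statistic for H0: all slope coefficients are zero.
f_stat = (r_sq / k) / ((1 - r_sq) / (n - k - 1))

# Adjusted R-squared, which should agree with the reported R-Sq(adj) = 81.0%.
adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)

# Approximate 5% critical value of F with (2, 22) degrees of freedom,
# read from a standard F table (an assumption, not part of the output above).
F_CRITICAL = 3.44

print(f"F = {f_stat:.1f}, adjusted R-sq = {adj_r_sq:.3f}")
print("Model significant at 0.05:", f_stat > F_CRITICAL)
```

An F value this far above the critical value supports the same conclusion as the coefficient p-values.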
d) Would you consider deleting either of the independent variables?
Ans: The p-values of both coefficients (0.012 for Income and 0.000 for Size) are below the 0.05 level of significance. Hence, we would not consider deleting either of the independent variables.
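The same decision can be checked with the t statistics, each of which is the coefficient divided by its standard error. This sketch uses the Coef and SE Coef values from the output above; the two-tailed critical value for 22 degrees of freedom comes from a standard t table and is an assumption added here.

```python
# Coefficient and standard error for each predictor, from the output above.
coefs = {
    "Income": (0.00613, 0.002250),
    "Size": (0.42475, 0.04222),
}

# Approximate two-tailed 5% critical value of t with n - k - 1 = 22 degrees
# of freedom, read from a standard t table.
T_CRITICAL = 2.074

t_stats = {name: coef / se for name, (coef, se) in coefs.items()}

for name, t in t_stats.items():
    keep = abs(t) > T_CRITICAL
    print(f"{name}: t = {t:.2f}, keep in model: {keep}")
```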
e) Plot the residuals in a histogram. Is there any problem with the normality assumption?
Ans:
Comment: The histogram is approximately symmetric. Hence, the assumption that the model residuals are normally distributed appears to be satisfied.
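The residuals behind the histogram can be recomputed from the reported regression equation. The following sketch prints a rough text histogram; the bin width of 0.25 is an arbitrary choice made here and may differ from the bins in the original plot.

```python
from math import floor

# Data copied from the table above (Food and Income in $1,000 per year).
food = [5.04, 4.08, 5.76, 3.48, 4.2, 4.8, 4.32, 5.04, 6.12, 3.24, 4.8, 3.24, 6.6,
        4.92, 6.6, 5.4, 6, 5.4, 3.36, 4.68, 4.32, 5.52, 4.56, 5.4, 4.8]
income = [73.98, 54.9, 94.14, 52.02, 65.7, 53.64, 79.64, 68.58, 165.6, 64.8,
          138.42, 125.82, 77.58, 171.36, 82.08, 141.3, 36.9, 56.88, 71.82,
          69.48, 54.36, 87.66, 38.16, 43.74, 48.42]
size = [4, 2, 4, 1, 2, 4, 3, 4, 5, 1, 3, 1, 7, 2, 9, 3, 5, 4, 1, 3, 2, 5, 3, 7, 5]

# Coefficients as reported in the regression output above.
b_const, b_income, b_size = 2.8435, 0.00613, 0.42475

fitted = [b_const + b_income * i + b_size * s for i, s in zip(income, size)]
residuals = [f - fit for f, fit in zip(food, fitted)]

BIN_WIDTH = 0.25  # arbitrary bin width for this illustration

# Count residuals per bin, keyed by the left edge of the bin.
counts = {}
for r in residuals:
    left = floor(r / BIN_WIDTH) * BIN_WIDTH
    counts[left] = counts.get(left, 0) + 1

for left in sorted(counts):
    print(f"[{left:+.2f}, {left + BIN_WIDTH:+.2f}) {'*' * counts[left]}")
```

A roughly bell-shaped, symmetric pattern of stars around zero is what the normality assumption calls for.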
f) Plot the fitted values against the residuals. Does this plot indicate any problems with homoscedasticity?
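Ans: In the plot of the fitted values against the residuals, the points are not scattered randomly; they follow a pattern close to a cubic relation. Hence, the residuals are not symmetrically distributed with a tendency to cluster toward the middle of the plot, as they should be, which suggests a problem with the homoscedasticity assumption.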
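The points for this plot can be listed in order of fitted value, which makes any pattern easier to see without the figure. The sketch below also compares the residual spread in the lower and upper halves of the fitted values. This is only a crude check chosen for illustration, not a formal test of homoscedasticity, and the coefficients are the ones reported above.

```python
from statistics import pstdev

# Data copied from the table above (Food and Income in $1,000 per year).
food = [5.04, 4.08, 5.76, 3.48, 4.2, 4.8, 4.32, 5.04, 6.12, 3.24, 4.8, 3.24, 6.6,
        4.92, 6.6, 5.4, 6, 5.4, 3.36, 4.68, 4.32, 5.52, 4.56, 5.4, 4.8]
income = [73.98, 54.9, 94.14, 52.02, 65.7, 53.64, 79.64, 68.58, 165.6, 64.8,
          138.42, 125.82, 77.58, 171.36, 82.08, 141.3, 36.9, 56.88, 71.82,
          69.48, 54.36, 87.66, 38.16, 43.74, 48.42]
size = [4, 2, 4, 1, 2, 4, 3, 4, 5, 1, 3, 1, 7, 2, 9, 3, 5, 4, 1, 3, 2, 5, 3, 7, 5]

# Coefficients as reported in the regression output above.
b_const, b_income, b_size = 2.8435, 0.00613, 0.42475

fitted = [b_const + b_income * i + b_size * s for i, s in zip(income, size)]
residuals = [f - fit for f, fit in zip(food, fitted)]

# (fitted, residual) pairs sorted by fitted value: the points of the plot.
pairs = sorted(zip(fitted, residuals))
for fit, res in pairs:
    print(f"fitted = {fit:.2f}  residual = {res:+.2f}")

# Crude spread comparison between the low and high halves of fitted values.
half = len(pairs) // 2
low = [res for _, res in pairs[:half]]
high = [res for _, res in pairs[half:]]
low_sd, high_sd = pstdev(low), pstdev(high)
print(f"residual SD (low fitted) = {low_sd:.3f}, (high fitted) = {high_sd:.3f}")
```

Under homoscedasticity, the residuals should show no trend in sign or spread as the fitted values increase.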