In: Statistics and Probability
Fran’s Convenience Marts is located throughout the Erie, Pennsylvania metro area. Fran, the owner, wants to expand her businesses to other communities in northwest Pennsylvania and southeast New York, such as Jamestown, Corry, Meadville, and Warren. To prepare your presentation to the local bank, you would like to better understand the factors that make a particular discount store productive. Fran must do all the work on her own, so she won't be able to study all the discount stores. Therefore, he selects a random sample of 15 stores and records the average daily sales, the floor space (area), the number of parking spaces and the average income of the families in the region for each of the stores. The sample information is reported below.
Sampled Mart |
Daily sales |
Store area |
Parking Spaces |
Income |
1 |
$1840 |
532 |
6 |
44 |
2 |
1746 |
478 |
4 |
51 |
3 |
1812 |
530 |
7 |
45 |
4 |
1806 |
508 |
7 |
46 |
5 |
1792 |
514 |
5 |
44 |
6 |
1825 |
556 |
6 |
46 |
7 |
1811 |
541 |
4 |
49 |
8 |
1803 |
513 |
6 |
52 |
9 |
1830 |
532 |
5 |
46 |
10 |
1827 |
537 |
5 |
46 |
11 |
1764 |
499 |
3 |
48 |
12 |
1825 |
510 |
8 |
47 |
13 |
1763 |
490 |
4 |
48 |
14 |
1846 |
516 |
8 |
45 |
15 |
1815 |
482 |
7 |
43 |
With the information above carry out the analysis required for the study that you must present to the bank regarding the best equation to estimate daily sales. Using all the information previously obtained by you:
a. indicate the correlation coefficients, identifying which is the best and the weakest among all the possible regressions and equations. b. the regression errors obtained, identifying which is the best and the weakest among all the possible regressions and equations. c. the required hypothesis tests d.Present and identify which is the best equation to predict the monthly average purchase volume, explain why it is the best equation. e.With the best estimated equation present the confidence interval to predict the monthly average purchase volume, when the Area of the store is 585, the family income is 50,000 and the parking number is 10.
Answer:
1) The regression line is
y = b1x1 + c
Where y= Daily sales
b1=slope
x1=Store area
c=Intercept
In order to perform regression analysis using the data in Excel, follow the below steps
You will get an output as below
SUMMARY OUTPUT | ||||||||
Regression Statistics | ||||||||
Multiple R | 0.658651907 | |||||||
R Square | 0.433822335 | |||||||
Adjusted R Square | 0.390270207 | |||||||
Standard Error | 22.8990747 | |||||||
Observations | 15 | |||||||
ANOVA | ||||||||
df | SS | MS | F | Significance F | ||||
Regression | 1 | 5223.221 | 5223.221 | 9.960991 | 0.007582 | |||
Residual | 13 | 6816.779 | 524.3676 | |||||
Total | 14 | 12040 | ||||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% | |
Intercept | 1363.024788 | 140.7961 | 9.68084 | 2.62E-07 | 1058.853 | 1667.196 | 1058.853 | 1667.196 |
X Variable 1 | 0.860639465 | 0.27269 | 3.156104 | 0.007582 | 0.271527 | 1.449751 | 0.271527 | 1.449751 |
As per above table Correlation Coefficient is R Square which is 43.4% indicates an week relationship.
Standers error is 22.89 and the fitted regression line is
y = 0.86x1 + 1363.02
2)
The 2nd regression line is
y = b1x1 + b2x2 + c
Where y= Daily sales
b1,b2=slope
x1=Store area
x2=Parking spaces
c=Intercept
In order to perform regression analysis using the data in Excel, follow the below steps
You will get an output as below
SUMMARY OUTPUT | ||||||||
Regression Statistics | ||||||||
Multiple R | 0.896657897 | |||||||
R Square | 0.803995384 | |||||||
Adjusted R Square | 0.771327948 | |||||||
Standard Error | 14.02347906 | |||||||
Observations | 15 | |||||||
ANOVA | ||||||||
df | SS | MS | F | Significance F | ||||
Regression | 2 | 9680.104 | 4840.052 | 24.61152 | 5.67E-05 | |||
Residual | 12 | 2359.896 | 196.658 | |||||
Total | 14 | 12040 | ||||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% | |
Intercept | 1342.490177 | 86.3319 | 15.55034 | 2.57E-09 | 1154.389 | 1530.591 | 1154.389 | 1530.591 |
X Variable 1 | 0.772651585 | 0.168016 | 4.598669 | 0.000612 | 0.406575 | 1.138728 | 0.406575 | 1.138728 |
X Variable 2 | 11.63375746 | 2.443769 | 4.76058 | 0.000464 | 6.309242 | 16.95827 | 6.309242 | 16.95827 |
As per above table Correlation Coefficient is R Square which is 80.4% indicates a strong relationship.
Standers error is 14.02 and the fitted regression line is
y = 0.77x1 + 11.63x2 + 1342.49
3)
The 3rd regression line is
y = b1x1 + b2x2 +b3x3 + c
Where y= Daily sales
b1,b2,b3=slope
x1=Store area
x2=Parking spaces
x3=Family income
c=Intercept
In order to perform regression analysis using the data in Excel, follow the below steps
You will get an output as below
SUMMARY OUTPUT | ||||||||
Regression Statistics | ||||||||
Multiple R | 0.913977941 | |||||||
R Square | 0.835355677 | |||||||
Adjusted R Square | 0.79045268 | |||||||
Standard Error | 13.4242577 | |||||||
Observations | 15 | |||||||
ANOVA | ||||||||
df | SS | MS | F | Significance F | ||||
Regression | 3 | 10057.68 | 3352.561 | 18.60356 | 0.000128518 | |||
Residual | 11 | 1982.318 | 180.2107 | |||||
Total | 14 | 12040 | ||||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% | |
Intercept | 1480.744612 | 126.3042 | 11.72364 | 1.48E-07 | 1202.751032 | 1758.738 | 1202.751 | 1758.738 |
X Variable 1 | 0.731498876 | 0.16333 | 4.478642 | 0.000933 | 0.372010879 | 1.090987 | 0.372011 | 1.090987 |
X Variable 2 | 9.991487385 | 2.599961 | 3.842938 | 0.002733 | 4.26901275 | 15.71396 | 4.269013 | 15.71396 |
X Variable 3 | -0.00230826 | 0.001595 | -1.44748 | 0.175655 | -0.005818119 | 0.001202 | -0.00582 | 0.001202 |
As per above table Correlation Coefficient is R Square which is 83.5% which indicates a very strong relationship.
Standers error is 13.42 and the fitted regression line is
y = 0.73x1 + 9.99x2 -0.002x3 + 1480.74
The hypothesis test is
Null hypothesis : There is no relationship between the x variables
Alternate hypothesis : There is a relationship between the x variables
Hence considering above three regression lines third line indicates the highest Correlation Coefficient and the lowest standard error.However the p value for variable x3 is greater than 0.05 which is our significant level, hence we cannot take this variable into consideration.
So the final best fitted line is
y = 0.73x1 + 9.99x2 + 1480.74
so the monthly average sales for store 585 and parking number 10 is,
y = 0.73x1 + 9.99x2 + 1480.74
y= 0.73*585+9.99*10+1480.74
y=2007.69
Hence the monthly average purchase volume is $2008 .
NOTE:: I HOPE THIS ANSWER IS HELPFULL TO YOU......**PLEASE SUPPORT ME WITH YOUR RATING......
**PLEASE GIVE ME "LIKE".....ITS VERY IMPORTANT FOR,ME......PLEASE SUPPORT ME .......THANK YOU