The manager of an amusement park would like to be able to
predict daily attendance in order to develop more accurate plans
about how much food to order and how many ride operators to hire.
After some consideration, he decided that the following three
factors are critical:
Yesterday’s attendance
Weekday or weekend (1 if weekend, 0 if weekday)
Predicted weather:
  Rain forecast (1 if forecast for rain, 0 if not)
  Sun (1 if mostly sunny, 0 if not)
He then took a random sample of 40 days. For each day, he recorded
the attendance, the previous day’s attendance, day of the week, and
weather forecast. An example of the first few lines of Data and the
regression output are below:
Attendance | Yest Att | I1 | I2 | I3 |
7882 | 8876 | 0 | 1 | 0 |
6115 | 7203 | 0 | 0 | 0 |
5351 | 4370 | 0 | 0 | 0 |
8546 | 7192 | 1 | 1 | 0 |
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.836766353 |
R Square | 0.700177929 |
Adjusted R Square | 0.665912549 |
Standard Error | 810.7745532 |
Observations | 40 |
ANOVA
Source | df | SS | MS | F | Significance F |
Regression | 4 | 53729535 | 13432384 | 20.43398 | 9.28E-09 |
Residual | 35 | 23007438 | 657355.4 | | |
Total | 39 | 76736973 | | | |
Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | 3490.466604 | 469.1554 | 7.439894 | 1.04E-08 | 2538.031 | 4442.903 |
Yest Att | 0.368547078 | 0.077895 | 4.731349 | 3.6E-05 | 0.210412 | 0.526682 |
I1 | 1623.095785 | 492.5497 | 3.295294 | 0.002258 | 623.1668 | 2623.025 |
I2 | 733.4646317 | 394.3718 | 1.85983 | 0.071331 | -67.1527 | 1534.082 |
I3 | -765.5429068 | 484.6621 | -1.57954 | 0.123209 | -1749.46 | 218.3734 |
Test to see if the model is valid. Use alpha = .05
Can we conclude that weather is a factor in determining
attendance?
If the manager is looking for a way to help predict attendance, is
this a good model to use? How would you suggest making this model
better?
Please give proper details for the answer. Thank you.
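Significance of the independent variable weather
From the result summary:
Variable | Coefficient | t Stat | P-value | Comparison | Alpha | Conclusion |
I2 | 733.4646317 | 1.85983 | 0.071331 | > | 0.05 | Not Significant |
I3 | -765.5429068 | -1.57954 | 0.123209 | > | 0.05 | Not Significant |
Weather enters the model through two indicator variables, I2 and I3. Their P-values, 0.071331 and 0.123209, are both greater than 0.05, and both 95% confidence intervals contain zero, so neither weather indicator is individually significant at the 5% significance level. On this evidence we cannot conclude that weather is a factor in determining attendance. These are separate t-tests; a joint (partial F) test of I2 and I3 together would require refitting the model without them, which the output does not include.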
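The t-test reasoning above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original output: the dictionary layout and the helper name `check_indicator` are made up for illustration, and the I3 coefficient is entered as negative on the assumption that it must agree with its reported t Stat and confidence interval.

```python
# Values copied from the coefficient table for the two weather indicators.
# Assumption: the I3 coefficient is negative, matching its t Stat and interval.
weather = {
    "I2": {"coef": 733.4646317, "se": 394.3718, "p": 0.071331,
           "lower": -67.1527, "upper": 1534.082},
    "I3": {"coef": -765.5429068, "se": 484.6621, "p": 0.123209,
           "lower": -1749.46, "upper": 218.3734},
}
ALPHA = 0.05


def check_indicator(row, alpha=ALPHA):
    """Return (t statistic, CI contains zero?, significant at alpha?)."""
    t_stat = row["coef"] / row["se"]          # t = coefficient / standard error
    contains_zero = row["lower"] < 0 < row["upper"]
    significant = row["p"] < alpha
    return t_stat, contains_zero, significant


results = {name: check_indicator(row) for name, row in weather.items()}

for name, (t_stat, contains_zero, significant) in results.items():
    print(f"{name}: t = {t_stat:.4f}, "
          f"95% CI contains 0: {contains_zero}, significant: {significant}")
```

Both checks agree for each indicator: the interval contains zero exactly when the P-value exceeds 0.05, which is why either one can be used to reach the conclusion.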
Overall Significance
Source | F | Significance F | Comparison | Alpha | Conclusion |
Regression | 20.43398 | 9.28E-09 | < | 0.05 | Significant |
The Significance F value, 9.28E-09, is the p-value of the overall F-test, whose null hypothesis is that all four slope coefficients are zero. Because it is far below 0.05, we reject that hypothesis at the 5% significance level: at least one independent variable is linearly related to attendance, so the model is valid.
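As a cross-check, the F statistic and the summary statistics can be rebuilt from the sums of squares in the ANOVA table. The sketch below uses only the numbers printed in the output; the variable names are illustrative, and it does not recompute the p-value itself, which would need an F-distribution routine.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression, df_regression = 53729535, 4
ss_residual, df_residual = 23007438, 35
ss_total = 76736973

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# Overall F statistic: explained variance per df over residual variance per df.
f_stat = ms_regression / ms_residual

# R Square, Adjusted R Square and Standard Error follow from the same table.
n = df_regression + df_residual + 1           # 40 observations
r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
std_error = math.sqrt(ms_residual)

print(f"F = {f_stat:.5f}")
print(f"R Square = {r_square:.6f}, Adjusted R Square = {adj_r_square:.6f}")
print(f"Standard Error = {std_error:.4f}")
```

The recomputed values should match the F, R Square, Adjusted R Square and Standard Error entries in the output to the precision shown there.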
However, the regression model can be further improved by removing the insignificant independent variables.
From the result summary:
Variable | P-value | Comparison | Alpha | Conclusion |
Yest Att | 3.60E-05 | < | 0.05 | Significant |
I1 | 0.002258 | < | 0.05 | Significant |
I2 | 0.071331 | > | 0.05 | Not Significant |
I3 | 0.123209 | > | 0.05 | Not Significant |
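Only two variables, yesterday's attendance and the weekday/weekend indicator (I1), are statistically significant at the 5% significance level. Removing the other two variables would give a simpler model, but it cannot raise the R-square value (which tells how well the regression model fits the data), because R-square never increases when variables are dropped; the refitted model should instead be compared on adjusted R-square and standard error. As it stands, the model explains about 70% of the variation in attendance (R Square = 0.700) with a standard error of about 811 visitors, so it is useful for prediction but leaves room for improvement.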
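The screening step can be written out as a small sketch. The helper name `split_by_significance` is hypothetical and the P-values are simply those listed in the table; the cutoff is passed as a parameter so the effect of a different significance level is easy to see.

```python
# P-values copied from the coefficient table (intercept omitted).
p_values = {
    "Yest Att": 3.6e-05,
    "I1": 0.002258,
    "I2": 0.071331,
    "I3": 0.123209,
}


def split_by_significance(p_values, alpha=0.05):
    """Split variable names into (significant, not significant) at level alpha."""
    keep = [name for name, p in p_values.items() if p < alpha]
    drop = [name for name, p in p_values.items() if p >= alpha]
    return keep, drop


keep, drop = split_by_significance(p_values)
print("Significant at 5%:", keep)
print("Not significant at 5%:", drop)

# The same screen at a 10% level, to show how the cutoff changes the split.
keep_10, drop_10 = split_by_significance(p_values, alpha=0.10)
print("Significant at 10%:", keep_10)
```

At the 10% level I2 would pass the screen while I3 still would not, so the choice of variables to drop depends on the significance level the manager is willing to use.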