The manager of an amusement park would like to be able to predict daily attendance in order to develop more accurate plans about how much food to order and how many ride operators to hire. After some consideration, he decided that the following three factors are critical:
1. Yesterday's attendance
2. Weekday or weekend
3. Predicted weather
He then took a random sample of 40 days. For each day, he recorded the attendance, the day of the week, and the weather forecast. The first independent variable is interval, but the other two are nominal. Accordingly, he created the following sets of indicator variables:
I1 = 1 (if weekend), 0 (if not)
I2 = 1 (if mostly sunny is predicted), 0 (if not)
I3 = 1 (if rain is predicted), 0 (if not)
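One way to write down the model these variables imply is sketched below. The symbols y (attendance), x_1 (yesterday's attendance), the coefficients beta_0 to beta_4 and the error term epsilon are notation assumed here for illustration; they do not appear in the original question.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 I_1 + \beta_3 I_2 + \beta_4 I_3 + \varepsilon
```

Under this reading, beta_2 is the expected difference in attendance between a weekend day and a weekday with the other variables held fixed, and days with neither a mostly sunny nor a rain forecast serve as the weather baseline.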
A. Conduct a regression analysis.
B. Is the model valid? Explain.
C. Can we conclude that weather is a factor in determining attendance?
D. Do these results provide sufficient evidence that weekend attendance is, on average, larger than weekday attendance?
I will attach data in another question
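# R code
# Columns of x: attendance, yesterday's attendance,
# day type (1 = weekend, otherwise weekday), forecast (1 = mostly sunny, 2 = rain, otherwise other),
# as implied by the recoding step further down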
x=matrix(c(7882,8876, 2, 1,
6115, 7203, 2, 3,
5351, 4370, 2, 3,
8546, 7192, 1, 1,
6055, 6835, 2, 3,
7367, 5469, 2, 1,
7871, 8207, 2, 1,
5377, 7026, 2, 3,
5259, 5592, 2, 1,
4915, 3190, 2, 3,
6538, 7012, 2, 3,
6607, 5434, 2, 3,
5118, 3764, 2, 3,
6077, 7575, 2, 3,
4475, 6047, 2, 3,
3771, 4430, 2, 3,
6106, 5697, 2, 3,
7017, 3928, 1, 2,
5718, 5552, 2, 3,
5966, 3142, 1, 2,
8160, 8648, 1, 2,
4717, 3397, 2, 3,
7783, 7655, 2, 3,
5124, 5920, 2, 3,
7495, 7831, 1, 2,
5848, 6355, 2, 3,
5166, 3529, 2, 3,
4487, 4220, 2, 3,
7320, 7526, 2, 1,
6925, 4083, 1, 1,
8133, 6382, 1, 1,
7929, 6459, 2, 3,
7291, 3432, 1, 2,
5419, 8077, 2, 3,
3634, 3353, 2, 3,
6859, 3803, 1, 2,
6883, 7476, 1, 2,
8352, 7075, 1, 1,
9659, 8859, 1, 1,
5627, 7696, 2, 1),ncol=4,byrow=TRUE)
# Define 0/1 indicator variables for day of the week and weather forecast
# (column 3: 1 = weekend; column 4: 1 = mostly sunny, 2 = rain)
X=data.frame(Attendance=x[,1],
Yest_Att=x[,2],
day_of_the_week=as.integer(x[,3]==1),
mostly_sunny=as.integer(x[,4]==1),
rain=as.integer(x[,4]==2))
# Regress attendance on all four predictors
reg=lm(Attendance~.,data=X)
summary(reg)
#Output
Call:
lm(formula = Attendance ~ ., data = X)

Residuals:
     Min       1Q   Median       3Q      Max
-1433.07  -492.67    23.08   404.67  2050.65

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)     3500.60602  471.86987   7.419 1.11e-08 ***
Yest_Att           0.36813    0.07839   4.696 4.00e-05 ***
day_of_the_week 1622.88274  495.72783   3.274  0.00239 **
mostly_sunny     726.34767  397.00811   1.830  0.07585 .
rain             -39.71900  608.62767  -0.065  0.94834
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 816 on 35 degrees of freedom
Multiple R-squared: 0.6964, Adjusted R-squared: 0.6618
F-statistic: 20.08 on 4 and 35 DF, p-value: 1.146e-08
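
As a rough sketch of how parts B to D could be checked in R: the first half below uses only numbers printed in the summary above, and the second half refits the model so that the two weather indicators can be tested jointly with a partial F-test. The partial F-test is one common approach and was not carried out in the original answer. The names dat, full, reduced and weather_test are made up for this illustration, and the data are re-entered from the matrix above so that the block runs on its own.

```r
# --- Part 1: checks that use only values printed in the summary above ---
n  <- 40      # sample size stated in the question
k  <- 4       # number of predictors in the fitted model
r2 <- 0.6964  # Multiple R-squared from the output

# (B) Overall F statistic recovered from R-squared, and its p-value
f_overall <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_overall <- pf(f_overall, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

# (D) One-sided test of H1: weekend coefficient > 0, from the printed t value
t_weekend <- 3.274
p_weekend_one_sided <- pt(t_weekend, df = n - k - 1, lower.tail = FALSE)

print(c(F = f_overall, p_overall = p_overall, p_weekend = p_weekend_one_sided))

# --- Part 2: joint test of the two weather indicators (C) ---
# Data re-entered from the matrix above, one vector per column.
attendance <- c(7882, 6115, 5351, 8546, 6055, 7367, 7871, 5377, 5259, 4915,
                6538, 6607, 5118, 6077, 4475, 3771, 6106, 7017, 5718, 5966,
                8160, 4717, 7783, 5124, 7495, 5848, 5166, 4487, 7320, 6925,
                8133, 7929, 7291, 5419, 3634, 6859, 6883, 8352, 9659, 5627)
yest_att   <- c(8876, 7203, 4370, 7192, 6835, 5469, 8207, 7026, 5592, 3190,
                7012, 5434, 3764, 7575, 6047, 4430, 5697, 3928, 5552, 3142,
                8648, 3397, 7655, 5920, 7831, 6355, 3529, 4220, 7526, 4083,
                6382, 6459, 3432, 8077, 3353, 3803, 7476, 7075, 8859, 7696)
day        <- c(2, 2, 2, 1, 2, 2, 2, 2, 2, 2,  2, 2, 2, 2, 2, 2, 2, 1, 2, 1,
                1, 2, 2, 2, 1, 2, 2, 2, 2, 1,  1, 2, 1, 2, 2, 1, 1, 1, 1, 2)
weather    <- c(1, 3, 3, 1, 3, 1, 1, 3, 1, 3,  3, 3, 3, 3, 3, 3, 3, 2, 3, 2,
                2, 3, 3, 3, 2, 3, 3, 3, 1, 1,  1, 3, 2, 3, 3, 2, 2, 1, 1, 1)

# Same 0/1 coding as the original code: day 1 -> weekend,
# weather 1 -> mostly sunny, weather 2 -> rain.
dat <- data.frame(
  Attendance   = attendance,
  Yest_Att     = yest_att,
  weekend      = as.integer(day == 1),
  mostly_sunny = as.integer(weather == 1),
  rain         = as.integer(weather == 2)
)

full    <- lm(Attendance ~ Yest_Att + weekend + mostly_sunny + rain, data = dat)
reduced <- lm(Attendance ~ Yest_Att + weekend, data = dat)

# Partial F-test: H0 says both weather coefficients are zero
weather_test <- anova(reduced, full)
print(weather_test)
```

For B, a very small p_overall is the usual basis for calling the model valid; it should agree with the printed F-statistic up to rounding of R-squared. For D, the one-sided p-value is half the two-sided value printed for day_of_the_week, because the estimated coefficient is positive. For C, the printed p-values for mostly_sunny (0.07585) and rain (0.94834) are each above 0.05 on their own, and the Pr(>F) entry in weather_test is what speaks to weather as a whole.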