Many movies are released each year, and it would be interesting to be able to predict the Total Gross Revenues (in $1,000,000) from the box office based on a few predictors. The following predictors have been identified for 70 movies:
The following partial output has been obtained:

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | -4.039 | 30.735 | |
| Budget | 0.803 | 0.154 | |
| Length | -0.433 | 0.242 | |
| Screens | 0.013 | 0.005 | |
| Awards | 1.390 | 1.049 | |
| Genre 1 | 4.777 | 2.032 | |
| Genre 2 | 2.732 | 13.646 | |

ANOVA

| | df | SS | MS | F |
|---|---|---|---|---|
| Regression | | | 39.5 | |
| Residual | | 170.2 | | |
| Total | | | | |

Based on the above partial printout, answer the following questions:
First, we fill in the missing values in the tables.
t Stat = Coefficient / Standard error of the coefficient
i.e. t Stat = beta_hat / SE(beta_hat), where SE = standard error.
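As a quick arithmetic check (not part of the original solution), the t Stat column can be reproduced in a few lines of Python. The sketch below simply divides each coefficient by its standard error, using the values from the printout; the names `printout`, `t_stat` and `t_stats` are illustrative only.

```python
# Coefficient estimates (beta_hat) and standard errors from the partial printout.
printout = {
    "Intercept": (-4.039, 30.735),
    "Budget": (0.803, 0.154),
    "Length": (-0.433, 0.242),
    "Screens": (0.013, 0.005),
    "Awards": (1.390, 1.049),
    "Genre 1": (4.777, 2.032),
    "Genre 2": (2.732, 13.646),
}


def t_stat(beta_hat: float, se: float) -> float:
    """t Stat = coefficient / standard error of that coefficient."""
    return beta_hat / se


t_stats = {name: t_stat(b, se) for name, (b, se) in printout.items()}

for name, t in t_stats.items():
    print(f"{name:<10} t = {t:10.6f}")
```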
ANOVA table:
Given: MS(Regression) = 39.5, SS(Residual) = 170.2
n = 70 (total number of movies)
k = 6 (number of predictors)
p = k + 1 = 7 (number of estimated parameters, including the intercept)
df(Total) = n - 1 = 69
df(Regression) = k = 6
df(Residual) = n - p = 70 - 7 = 63
MS(Regression) = SS(Regression) / df(Regression)
Therefore, SS(Regression) = MS(Regression) * df(Regression) = 39.5 * 6 = 237
MS(Residual) = SS(Residual) / df(Residual) = 170.2 / 63 = 2.701587
SS(Total) = SS(Regression) + SS(Residual) = 237 + 170.2 = 407.2
F = MS(Regression) / MS(Residual) = 39.5 / 2.701587 = 14.62103
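The ANOVA steps above can be sketched in Python as follows, starting only from the two given entries (MS Regression = 39.5, SS Residual = 170.2) together with n = 70 and k = 6. Variable names are illustrative; the sketch reproduces the arithmetic only and does not compute p-values.

```python
n = 70            # number of movies
k = 6             # number of predictors
p = k + 1         # number of estimated parameters, including the intercept

ms_regression = 39.5   # given in the partial printout
ss_residual = 170.2    # given in the partial printout

# Degrees of freedom
df_regression = k
df_residual = n - p
df_total = n - 1

# Remaining sums of squares, mean squares and the F statistic
ss_regression = ms_regression * df_regression
ms_residual = ss_residual / df_residual
ss_total = ss_regression + ss_residual
f_stat = ms_regression / ms_residual

print(f"df: {df_regression}, {df_residual}, {df_total}")
print(f"SS regression = {ss_regression:.1f}, SS total = {ss_total:.1f}")
print(f"MS residual = {ms_residual:.6f}, F = {f_stat:.5f}")
```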
Completed table:

| | Coefficient (beta_hat) | Standard Error SE(beta_hat) | Ratio beta_hat/SE(beta_hat) | t Stat | Critical t | p-value | Formula in Excel (p-value) |
|---|---|---|---|---|---|---|---|
| Intercept | -4.039 | 30.735 | -4.039/30.735 | -0.13141 | 1.998 | 0.895867 | TDIST(ABS(D3),63,2) |
| Budget | 0.803 | 0.154 | 0.803/0.154 | 5.214286 | 1.998 | 2.18E-06 | TDIST(ABS(D4),63,2) |
| Length | -0.433 | 0.242 | -0.433/0.242 | -1.78926 | 1.998 | 0.07838 | TDIST(ABS(D5),63,2) |
| Screens | 0.013 | 0.005 | 0.013/0.005 | 2.6 | 1.998 | 0.0116 | TDIST(ABS(D6),63,2) |
| Awards | 1.39 | 1.049 | 1.39/1.049 | 1.325071 | 1.998 | 0.189933 | TDIST(ABS(D7),63,2) |
| Genre 1 | 4.777 | 2.032 | 4.777/2.032 | 2.350886 | 1.998 | 0.021872 | TDIST(ABS(D8),63,2) |
| Genre 2 | 2.732 | 13.646 | 2.732/13.646 | 0.200205 | 1.998 | 0.841965 | TDIST(ABS(D9),63,2) |

Critical values at alpha = 0.05: t = TINV(0.05,63) ≈ 1.998 (Excel's TINV already returns the two-tailed critical value); F = FINV(0.05,6,63) = 2.246408.

ANOVA

| | df | SS | MS | F |
|---|---|---|---|---|
| Regression | 6 | 237 | 39.5 | 14.62103 |
| Residual | 63 | 170.2 | 2.701587 | |
| Total | 69 | 407.2 | | |

a) What is the regression model?
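Let y = TGR = Total Gross Revenue (in $1,000,000). The regression model is:
y = beta0 + beta1*Budget + beta2*Length + beta3*Screens + beta4*Awards + beta5*Genre 1 + beta6*Genre 2 + error
where beta0 is the intercept. Substituting the estimated coefficients gives the fitted regression equation:
y_hat = -4.039 + 0.803*Budget - 0.433*Length + 0.013*Screens + 1.390*Awards + 4.777*Genre 1 + 2.732*Genre 2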
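The fitted equation can be written as a small Python function for checking predictions; this is a sketch, not part of the original solution. The reference movie below is hypothetical (its values are not from the problem) and is used only to show what "holding the other predictors constant" means: two movies differing by one minute of Length differ in predicted revenue by exactly the Length coefficient.

```python
def predict_tgr(budget: float, length: float, screens: float, awards: float,
                genre1: int, genre2: int) -> float:
    """Fitted Total Gross Revenue (in $1,000,000).

    genre1 and genre2 are 0/1 indicator variables; both equal 0 for the
    baseline genre.
    """
    return (-4.039
            + 0.803 * budget
            - 0.433 * length
            + 0.013 * screens
            + 1.390 * awards
            + 4.777 * genre1
            + 2.732 * genre2)


# Hypothetical reference movie, used only to illustrate "holding the other
# predictors constant"; these values are not from the problem.
base = {"budget": 20, "length": 100, "screens": 1500, "awards": 2,
        "genre1": 0, "genre2": 0}
one_minute_longer = {**base, "length": base["length"] + 1}

# Because the model is linear, the difference equals the Length coefficient.
length_effect = predict_tgr(**one_minute_longer) - predict_tgr(**base)
print(f"Effect of one extra minute: {length_effect:.3f}")
```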
b) Interpret the following coefficients: -0.433
and 4.777.
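Here X denotes an independent variable (regressor) and Y the dependent variable (response), Total Gross Revenue.
-0.433 (Length): holding the other predictors constant, each additional minute of running time is associated with a decrease of 0.433 (in $1,000,000), i.e. about $433,000, in the expected Total Gross Revenue; conversely, a shorter movie is associated with higher expected revenue.
4.777 (Genre 1): Genre 1 is a 0/1 indicator variable. Holding the other predictors constant, a Genre 1 movie is expected to earn 4.777 (in $1,000,000), i.e. about $4,777,000, more than a Drama movie (the baseline, for which Genre 1 = Genre 2 = 0). Equivalently, with Genre 2 = 0, the intercept becomes -4.039 + 4.777 = 0.738 when Genre 1 = 1 and remains -4.039 when Genre 1 = 0.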
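To make the units concrete, here is a small Python sketch converting the two coefficients into dollars. It assumes, as part g indicates, that Length is measured in minutes, that revenues are recorded in $1,000,000, and that Drama is the baseline genre (Genre 1 = Genre 2 = 0).

```python
UNIT = 1_000_000   # revenues are recorded in units of $1,000,000

intercept = -4.039
b_length = -0.433  # per additional minute of running time (assumed minutes)
b_genre1 = 4.777   # Genre 1 indicator: 1 for Genre 1 movies, 0 otherwise

length_effect_dollars = b_length * UNIT   # change in expected revenue per extra minute
genre1_effect_dollars = b_genre1 * UNIT   # Genre 1 vs. baseline genre (Drama)
genre1_intercept = intercept + b_genre1   # intercept when Genre 1 = 1, Genre 2 = 0

print(f"One extra minute: {length_effect_dollars:+,.0f} dollars")
print(f"Genre 1 vs. Drama: {genre1_effect_dollars:+,.0f} dollars")
print(f"Intercept for Genre 1 movies: {genre1_intercept:.3f}")
```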
c) Is there sufficient evidence at the 5% level
of significance to conclude that the model is useful at predicting
Total Gross Revenues?
H0: beta1 = beta2 = ... = beta6 = 0, i.e. the regressors do not significantly contribute to the model.
H1: at least one beta_i ≠ 0 (i = 1, ..., 6).
Reject H0 if F > F(alpha, k, n-p).
Here, F(alpha, k, n-p) = FINV(0.05,6,63) = 2.246408, and the calculated F = 14.62103.
Since the calculated F > tabulated F, we reject H0.
Yes, there is sufficient evidence at the 5% level of significance to conclude that the model is useful at predicting Total Gross Revenues.
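The F test can be reproduced as below. Because the Python standard library has no F distribution, the critical value 2.246408 is copied from FINV(0.05,6,63) above rather than computed. The sketch also checks the equivalent form F = (R^2/k) / ((1 - R^2)/(n - p)), with R^2 = SS(Regression)/SS(Total).

```python
n, k = 70, 6
p = k + 1
ss_regression = 237.0   # from the completed ANOVA table
ss_residual = 170.2
ss_total = ss_regression + ss_residual

# F = MS(Regression) / MS(Residual)
f_stat = (ss_regression / k) / (ss_residual / (n - p))

# Equivalent form in terms of R^2 = SS(Regression) / SS(Total)
r2 = ss_regression / ss_total
f_from_r2 = (r2 / k) / ((1 - r2) / (n - p))

f_critical = 2.246408   # FINV(0.05, 6, 63), copied from the solution
reject_h0 = f_stat > f_critical

print(f"F = {f_stat:.5f}, critical F = {f_critical}, reject H0: {reject_h0}")
```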
d) Determine the adjusted coefficient of
determination and explain its meaning in the context of the
problem.
R-squared measures the proportion of the variation in the dependent variable (Y) that is explained by the independent variables (X) in a linear regression model. The adjusted coefficient of determination, R2Adj, adjusts this statistic for the number of independent variables in the model; importantly, its value increases only when a new term improves the model fit more than would be expected by chance alone.
R2Adj = 1 - [ (SSresidual / (n-p)) / (TotalSS / (n-1)) ] = 1 - [ (170.2/63) / (407.2/69) ] = 0.542216
Meaning: after adjusting for the number of predictors, about 54.22% of the variation in Total Gross Revenue across the 70 movies is explained by the regression model (Budget, Length, Screens, Awards and Genre).
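A short sketch computing both R^2 and the adjusted R^2 from the ANOVA sums of squares (SS Residual = 170.2, SS Total = 407.2). The second expression for the adjusted value is the standard equivalent form 1 - (1 - R^2)(n - 1)/(n - p), included only as a cross-check.

```python
n, p = 70, 7            # 70 movies, 6 predictors + intercept
ss_residual = 170.2
ss_total = 407.2

r2 = 1 - ss_residual / ss_total
r2_adj = 1 - (ss_residual / (n - p)) / (ss_total / (n - 1))

# Equivalent form written in terms of R^2
r2_adj_alt = 1 - (1 - r2) * (n - 1) / (n - p)

print(f"R^2 = {r2:.6f}, adjusted R^2 = {r2_adj:.6f}")
```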
e) What is the standard error of the
estimate?
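sigma_hat^2 = MSresidual = 2.701587
Standard error of the estimate = sigma_hat = sqrt(2.701587) = 1.643651 (in $1,000,000), i.e. about $1.64 million: the typical size of the difference between a movie's actual and predicted Total Gross Revenue.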
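The computation is a single square root of MS(Residual); a minimal Python sketch, using df(Residual) = n - p = 63 from the ANOVA table:

```python
import math

ss_residual = 170.2
df_residual = 63                          # n - p = 70 - 7
ms_residual = ss_residual / df_residual   # estimate of sigma^2
se_estimate = math.sqrt(ms_residual)      # sigma_hat, in $1,000,000

print(f"sigma_hat = {se_estimate:.6f} (in $1,000,000)")
print(f"about ${se_estimate * 1_000_000:,.0f}")
```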
f) Does the “genre” of movie have a significant
impact (at 5%) on the Total Gross Revenues? Justify.
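For each genre indicator, test H0: beta_i = 0 against H1: beta_i ≠ 0.
Genre 1: the calculated t Stat (2.350886) exceeds the tabulated t, so we reject H0 and conclude that Genre 1 is significant. [Also, p-value = 0.021872 < alpha = 0.05, so we reject H0.]
Genre 2: the calculated t Stat (0.200205) is below the tabulated t, so we fail to reject H0 and conclude that Genre 2 is not significant. [Also, p-value = 0.841965 > alpha = 0.05, so we fail to reject H0.]
Since at least one genre indicator (Genre 1) is significant, we conclude that genre does have a significant impact on Total Gross Revenues at the 5% level: Genre 1 movies differ significantly from the baseline (Drama), whereas Genre 2 movies do not.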
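The decision rule used above can be sketched in Python with the two-tailed p-values copied from the completed table. Following the text, genre is treated as significant if at least one indicator is; a joint partial F test on both indicators would be a more formal check, but it needs the reduced-model sum of squares, which the printout does not give.

```python
alpha = 0.05

# Two-tailed p-values for the genre indicators, copied from the completed table.
p_values = {"Genre 1": 0.021872, "Genre 2": 0.841965}

decisions = {
    name: "reject H0" if pv < alpha else "fail to reject H0"
    for name, pv in p_values.items()
}
genre_significant = any(pv < alpha for pv in p_values.values())

for name, decision in decisions.items():
    print(f"{name}: p = {p_values[name]} -> {decision}")
print("Genre has a significant impact:", genre_significant)
```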
g) Estimate the Total Gross Revenues for a Drama
movie of 90 minutes produced with a $25,000,000 budget with a cast
of actors who were nominated 6 times for awards. In addition, that
movie was shown on 2,000 screens in the first weekend.
Since it is a drama movie, Genre 1 = Genre 2 = 0.
Substitute Budget = 25 (in $1,000,000), Length = 90, Awards = 6, Screens = 2000, Genre 1 = 0 and Genre 2 = 0 into the regression equation:
y_hat = -4.039 + 0.803*(25) - 0.433*(90) + 0.013*(2000) + 1.390*(6) + 4.777*(0) + 2.732*(0)
= -4.039 + 20.075 - 38.970 + 26.000 + 8.340 + 0 + 0
= 11.406
The estimated Total Gross Revenue is 11.406 (in $1,000,000), i.e. $11,406,000.
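The substitution can be checked term by term with a short Python sketch; the dictionaries below just restate the fitted coefficients and the part g inputs (the intercept multiplies a constant 1).

```python
# Estimated coefficients from the fitted regression equation.
coefficients = {
    "Intercept": -4.039, "Budget": 0.803, "Length": -0.433, "Screens": 0.013,
    "Awards": 1.390, "Genre 1": 4.777, "Genre 2": 2.732,
}

# Drama movie from part g; the intercept multiplies a constant 1.
drama_movie = {
    "Intercept": 1, "Budget": 25, "Length": 90, "Screens": 2000,
    "Awards": 6, "Genre 1": 0, "Genre 2": 0,
}

contributions = {name: coefficients[name] * drama_movie[name] for name in coefficients}
y_hat = sum(contributions.values())   # in $1,000,000

for name, value in contributions.items():
    print(f"{name:<10} {value:8.3f}")
print(f"y_hat = {y_hat:.3f} (in $1,000,000) = ${y_hat * 1_000_000:,.0f}")
```

Listing each term makes it easy to see that the large negative Length contribution (-38.970) is offset mainly by Screens (26.000) and Budget (20.075).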