Question

A motion picture industry analyst is studying movies based on epic novels. The following data were obtained for 10 Hollywood movies made in the past five years. Each movie was based on an epic novel. For these data, x1 = first-year box office receipts of the movie, x2 = total production costs of the movie, x3 = total promotional costs of the movie, and x4 = total book sales prior to movie release. All units are in millions of dollars.

x1 x2 x3 x4
85.1 8.5 5.1 4.7
106.3 12.9 5.8 8.8
50.2 5.2 2.1 15.1
130.6 10.7 8.4 12.2
54.8 3.1 2.9 10.6
30.3 3.5 1.2 3.5
79.4 9.2 3.7 9.7
91.0 9.0 7.6 5.9
135.4 15.1 7.7 20.8
89.3 10.2 4.5 7.9

(a) Generate summary statistics, including the mean and standard deviation of each variable. Compute the coefficient of variation for each variable. (Use 2 decimal places.)

x̄ s CV
x1 %
x2 %
x3 %
x4 %

Relative to its mean, which variable has the largest spread of data values?

x4

x3    

x2

x1


Why would a variable with a large coefficient of variation be expected to change a lot relative to its average value? Although x1 has the largest standard deviation, it has the smallest coefficient of variation. How does the mean of x1 help explain this?

A variable with a large CV has large s relative to x̄. Here, x1 has a small CV because we divide by a small mean.

A variable with a large CV has large s relative to x̄. Here, x1 has a small CV because we divide by a large mean.

A variable with a large CV has small s relative to x̄. Here, x1 has a small CV because we divide by a large mean.

A variable with a large CV has small s relative to x̄. Here, x1 has a small CV because we divide by a small mean.



(b) For each pair of variables, generate the correlation coefficient r. Compute the corresponding coefficient of determination r^2. (Use 3 decimal places.)

r r^2
x1, x2   
x1, x3
x1, x4
x2, x3
x2, x4
x3, x4

Which of the three variables x2, x3, and x4 has the least influence on box office receipts?

x4

x3   

x2



What percent of the variation in box office receipts can be attributed to the corresponding variation in production costs? (Use 1 decimal place.)
_________________%

(c) Perform a regression analysis with x1 as the response variable. Use x2, x3, and x4 as explanatory variables. Look at the coefficient of multiple determination. What percentage of the variation in x1 can be explained by the corresponding variations in x2, x3, and x4 taken together? (Use 1 decimal place.)
_____________ %

(d) Write out the regression equation. (Use 2 decimal places.)

x1 = ____ + ____ x2 + ____ x3 + ____ x4

Explain how each coefficient can be thought of as a slope.

If we hold all explanatory variables as fixed constants, the intercept can be thought of as a "slope."

If we look at all coefficients together, each one can be thought of as a "slope."    

If we look at all coefficients together, the sum of them can be thought of as the overall "slope" of the regression line.

If we hold all other explanatory variables as fixed constants, then we can look at one coefficient as a "slope."



If x2 (production costs) and x4 (book sales) were held fixed but x3 (promotional costs) were increased by 0.6 million dollars, what would you expect for the corresponding change in x1 (box office receipts)? (Use 2 decimal places.)________________


(e) Test each coefficient in the regression equation to determine if it is zero or not zero. Use level of significance 5%. (Use 2 decimal places for t and 3 decimal places for the P-value.)

t P-value
β2
β3
β4

Conclusion

Reject the null for β3 and β4. Fail to reject the null for β2.

Reject the null for β2 and β3. Fail to reject the null for β4.   

Reject the null for all tests.

Reject the null for β2 and β4. Fail to reject the null for β3.


Explain why book sales x4 probably are not contributing much information in the regression model to forecast box office receipts x1.

From the previous tests, we can conclude that the coefficient for x4 is different than 0. Thus it does not belong in the model.

From the previous tests, we can conclude that the coefficient for x4 is not different than 0. Thus it does not belong in the model.    

From the previous tests, we can conclude that the coefficient for x4 is different than 0. Thus it belongs in the model.

From the previous tests, we can conclude that the coefficient for x4 is not different than 0. Thus it belongs in the model.



(f) Find a 90% confidence interval for each coefficient. (Use 2 decimal places.)

lower limit upper limit
β2
β3
β4


(g) Suppose a new movie (based on an epic novel) has just been released. Production costs were x2 = 11.4 million; promotion costs were x3 = 4.7 million; book sales were x4 = 8.1 million. Make a prediction for x1 = first-year box office receipts and find an 85% confidence interval for your prediction (if your software supports prediction intervals). (Use 1 decimal place.)

prediction
lower limit
upper limit


(h) Construct a new regression model with x3 as the response variable and x1, x2, and x4 as explanatory variables. (Use 2 decimal places.)

x3 = ____ + ____ x1 + ____ x2 + ____ x4


Suppose Hollywood is planning a new epic movie with projected box office sales x1 = 100 million and production costs x2 = 12 million. The book on which the movie is based had sales of x4 = 9.2 million. Forecast the dollar amount (in millions) that should be budgeted for promotion costs x3 and find an 80% confidence interval for your prediction.

prediction
lower limit
upper limit

Solutions

Expert Solution

[Used R-Software]

(a)

(See R-commands for computation)

      xbar      s       CV
x1   85.24   33.79   39.64%
x2    8.74    3.89   44.45%
x3    4.90    2.48   50.62%
x4    9.92    5.17   52.15%

Relative to its mean, which variable has the largest spread of data values? The variable with the highest coefficient of variation has the largest spread relative to its mean. That variable is x4 (CV = 52.15%).

Why would a variable with a large coefficient of variation be expected to change a lot relative to its average value? Although x1 has the largest standard deviation, it has the smallest coefficient of variation. How does the mean of x1 help explain this?
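A variable with a large CV has large s relative to x̄. Here, x1 has a small CV because we divide by a large mean: s = 33.79 is the largest standard deviation, but the mean of 85.24 is also by far the largest, so CV = 33.79/85.24 = 39.64% is the smallest.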
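The relationship between s, the mean, and CV can be checked in a few lines. The following is a minimal sketch, assuming the four columns are collected into a data frame; the names `movies`, `xbar`, `s`, `cv`, and `summary_table` are illustrative and do not appear in the original R session.

```r
# Data from the question (all values in millions of dollars)
movies <- data.frame(
  x1 = c(85.1, 106.3, 50.2, 130.6, 54.8, 30.3, 79.4, 91.0, 135.4, 89.3),
  x2 = c(8.5, 12.9, 5.2, 10.7, 3.1, 3.5, 9.2, 9.0, 15.1, 10.2),
  x3 = c(5.1, 5.8, 2.1, 8.4, 2.9, 1.2, 3.7, 7.6, 7.7, 4.5),
  x4 = c(4.7, 8.8, 15.1, 12.2, 10.6, 3.5, 9.7, 5.9, 20.8, 7.9)
)

# Sample mean, sample standard deviation, and CV = 100 * s / xbar
xbar <- sapply(movies, mean)
s    <- sapply(movies, sd)
cv   <- 100 * s / xbar

# One row per variable, rounded to 2 decimal places
summary_table <- round(cbind(xbar, s, cv), 2)
print(summary_table)

# Largest spread in absolute terms versus relative to the mean
names(which.max(s))    # "x1": largest standard deviation
names(which.max(cv))   # "x4": largest coefficient of variation
names(which.min(cv))   # "x1": smallest coefficient of variation
```

Computing all four variables in one pass makes it easy to see that x1 has both the largest s and the smallest CV, which is only possible because its mean is so much larger than the others.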

(b)

For each pair of variables, generate the correlation coefficient r. Compute the corresponding coefficient of determination r^2. (See R-commands for computation)

            r      r^2
x1, x2   0.917   0.842
x1, x3   0.930   0.865
x1, x4   0.475   0.225
x2, x3   0.790   0.624
x2, x4   0.429   0.184
x3, x4   0.299   0.089

Which of the three variables x2, x3, and x4 has the least influence on box office receipts? The correlation between x1 and x4 (r = 0.475) is the smallest of the three correlations with x1. Therefore, x4 has the least influence on box office receipts.

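What percent of the variation in box office receipts can be attributed to the corresponding variation in production costs? (Use 1 decimal place.) 84.2%. This is the coefficient of determination r^2 for x1 and x2 expressed as a percentage (0.9174^2 = 0.842), not the correlation r itself. (See R-commands for computation)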
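As a cross-check on the percentage of explained variation, r^2 for a single predictor should agree with the R-squared of the simple regression of x1 on x2. A minimal sketch, assuming only base R; the names `r`, `r2`, and `simple_fit` are illustrative.

```r
# Box office receipts and production costs (millions of dollars)
x1 <- c(85.1, 106.3, 50.2, 130.6, 54.8, 30.3, 79.4, 91.0, 135.4, 89.3)
x2 <- c(8.5, 12.9, 5.2, 10.7, 3.1, 3.5, 9.2, 9.0, 15.1, 10.2)

# Correlation and coefficient of determination
r  <- cor(x1, x2)
r2 <- r^2

# Percent of variation in x1 explained by x2: uses r^2, not r
round(100 * r2, 1)   # 84.2

# The same number comes out of a one-predictor regression
simple_fit <- lm(x1 ~ x2)
all.equal(r2, summary(simple_fit)$r.squared)   # TRUE
```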

(c)

Perform a regression analysis with x1 as the response variable. Use x2, x3, and x4 as explanatory variables. Look at the coefficient of multiple determination. What percentage of the variation in x1 can be explained by the corresponding variations in x2, x3, and x4 taken together? (Use 1 decimal place.)

The coefficient of multiple determination R^2 gives the proportion of the variation in x1 that is explained by x2, x3, and x4 taken together. Here R^2 = 0.9668, so the required percentage is 96.7%. (See R-commands for computation)
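To see where this percentage comes from, R^2 can be rebuilt from the residuals as 1 - SSE/SST. The sketch below assumes the same four data vectors as the R-commands section; `sse`, `sst`, and `r_squared` are illustrative names.

```r
x1 <- c(85.1, 106.3, 50.2, 130.6, 54.8, 30.3, 79.4, 91.0, 135.4, 89.3)
x2 <- c(8.5, 12.9, 5.2, 10.7, 3.1, 3.5, 9.2, 9.0, 15.1, 10.2)
x3 <- c(5.1, 5.8, 2.1, 8.4, 2.9, 1.2, 3.7, 7.6, 7.7, 4.5)
x4 <- c(4.7, 8.8, 15.1, 12.2, 10.6, 3.5, 9.7, 5.9, 20.8, 7.9)

fit <- lm(x1 ~ x2 + x3 + x4)

# Unexplained variation: sum of squared residuals
sse <- sum(residuals(fit)^2)

# Total variation of x1 about its mean
sst <- sum((x1 - mean(x1))^2)

# Proportion of variation explained by the three predictors together
r_squared <- 1 - sse / sst
round(100 * r_squared, 1)   # 96.7

# Agrees with the value reported by summary()
all.equal(r_squared, summary(fit)$r.squared)   # TRUE
```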

(d)

Thus, the regression equation is: x1 = 7.68 + 3.66*x2 + 7.62*x3 + 0.83*x4 (See R-commands for computation)

Explain how each coefficient can be thought of as a slope.
If we hold all other explanatory variables as fixed constants, then we can look at one coefficient as a "slope." A slope is the change in the response variable per unit change in one explanatory variable.

If x2 (production costs) and x4 (book sales) were held fixed but x3 (promotional costs) were increased by 0.6 million dollars, what would you expect for the corresponding change in x1 (box office receipts)? (Use 2 decimal places.)

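With x2 and x4 held fixed, the expected change in x1 is the x3 coefficient times the change in x3: 7.62 * 0.6 = 4.57. Box office receipts would be expected to rise by about 4.57 million dollars.

The R-commands below arrive at the same number indirectly. Refitting the model with every x3 value shifted by 0.6 leaves all three slopes unchanged and lowers the intercept from 7.68 to 3.10, a difference of 4.57.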
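The expected change in x1 for a 0.6 increase in x3 can be read directly off the x3 coefficient, with no need to refit the model. The sketch below computes it that way and then confirms it with `predict()` on two scenarios that differ only in x3. The baseline values x2 = 9, x3 = 5.0, x4 = 10 are hypothetical and chosen only for illustration; any baseline gives the same difference.

```r
movies <- data.frame(
  x1 = c(85.1, 106.3, 50.2, 130.6, 54.8, 30.3, 79.4, 91.0, 135.4, 89.3),
  x2 = c(8.5, 12.9, 5.2, 10.7, 3.1, 3.5, 9.2, 9.0, 15.1, 10.2),
  x3 = c(5.1, 5.8, 2.1, 8.4, 2.9, 1.2, 3.7, 7.6, 7.7, 4.5),
  x4 = c(4.7, 8.8, 15.1, 12.2, 10.6, 3.5, 9.7, 5.9, 20.8, 7.9)
)
fit <- lm(x1 ~ x2 + x3 + x4, data = movies)

# Slope interpretation: change in x1 = (x3 coefficient) * (change in x3)
b3     <- coef(fit)[["x3"]]
change <- b3 * 0.6
round(change, 2)   # 4.57

# Check with two hypothetical movies that differ only in x3
base   <- data.frame(x2 = 9, x3 = 5.0, x4 = 10)
raised <- data.frame(x2 = 9, x3 = 5.6, x4 = 10)
diff_pred <- unname(predict(fit, raised) - predict(fit, base))
all.equal(diff_pred, change)   # TRUE
```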

R-commands and outputs:

x1=c(85.1,106.3,50.2,130.6,54.8,30.3,79.4,91.0,135.4,89.3)
x2=c(8.5,12.9,5.2,10.7,3.1,3.5,9.2,9.0,15.1,10.2)
x3=c(5.1,5.8,2.1,8.4,2.9,1.2,3.7,7.6,7.7,4.5)
x4=c(4.7,8.8,15.1,12.2,10.6,3.5,9.7,5.9,20.8,7.9)

#(a)
summary(x1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
30.30 60.95 87.20 85.24 102.47 135.40
mean(x1)
sd(x1)
cvx1=sd(x1)/mean(x1)
cvx1

summary(x2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.100 6.025 9.100 8.740 10.575 15.100
mean(x2)
sd(x2)
cvx2=sd(x2)/mean(x2)
cvx2

summary(x3)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.20 3.10 4.80 4.90 7.15 8.40
mean(x3)
sd(x3)
cvx3=sd(x3)/mean(x3)
cvx3

summary(x4)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.50 6.40 9.25 9.92 11.80 20.80
mean(x4)
sd(x4)
cvx4=sd(x4)/mean(x4)
cvx4

xibar=c(mean(x1),mean(x2),mean(x3),mean(x4))
xibar
[1] 85.24 8.74 4.90 9.92
si=c(sd(x1),sd(x2),sd(x3),sd(x4))
si
[1] 33.786361 3.885357 2.480143 5.173393
round(si,2)
[1] 33.79 3.89 2.48 5.17
cvxi=si/xibar
cvxi
[1] 0.3963675 0.4445489 0.5061517 0.5215114
cvxip=cvxi*100
cvxip
[1] 39.63675 44.45489 50.61517 52.15114
round(cvxip,2)
[1] 39.64 44.45 50.62 52.15

#(b)
cor(x1,x2)
[1] 0.9174448
cor(x1,x3)
[1] 0.9299678
cor(x1,x4)
[1] 0.4746911
cor(x2,x3)
[1] 0.7899575
cor(x2,x4)
[1] 0.4291329
cor(x3,x4)
[1] 0.2987613

## Crosscheck:
sum((x3-mean(x3))*(x4-mean(x4))/(10-1))/(sd(x3)*sd(x4))
[1] 0.2987613

round(cor(x1,x2),3)
[1] 0.917
round(cor(x1,x3),3)
[1] 0.93
round(cor(x1,x4),3)
[1] 0.475
round(cor(x2,x3),3)
[1] 0.79
round(cor(x2,x4),3)
[1] 0.429
round(cor(x3,x4),3)
[1] 0.299

round(cor(x1,x2)^2,3)
[1] 0.842
round(cor(x1,x3)^2,3)
[1] 0.865
round(cor(x1,x4)^2,3)
[1] 0.225
round(cor(x2,x3)^2,3)
[1] 0.624
round(cor(x2,x4)^2,3)
[1] 0.184
round(cor(x3,x4)^2,3)
[1] 0.089

round(100*cor(x1,x2)^2,1)
[1] 84.2

#(c)
# x1=response
fit=lm(x1~x2+x3+x4)
fit
Call:
lm(formula = x1 ~ x2 + x3 + x4)

Coefficients:
(Intercept) x2 x3 x4
7.6760 3.6616 7.6211 0.8285

s=summary(fit)
s
Call:
lm(formula = x1 ~ x2 + x3 + x4)

Residuals:
Min 1Q Median 3Q Max
-12.4384 -3.1695 0.8499 3.5134 9.6207

Coefficients:
Estimate Std. Error t value Pr(>|t|)   
(Intercept) 7.6760 6.7602 1.135 0.2995   
x2 3.6616 1.1178 3.276 0.0169 *
x3 7.6211 1.6573 4.598 0.0037 **
x4 0.8285 0.5394 1.536 0.1754   
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.541 on 6 degrees of freedom
Multiple R-squared: 0.9668, Adjusted R-squared: 0.9502
F-statistic: 58.22 on 3 and 6 DF, p-value: 7.913e-05

# Extracting r^2 from summary:
Rsq=s$r.squared
Rsq
[1] 0.9667888
round(100*Rsq,1)
[1] 96.7

#(d)
# Regression equation:
beta=coef(fit)
beta
(Intercept) x2 x3 x4
7.6760280 3.6616044 7.6210501 0.8284682
beta=round(beta,2)
beta
(Intercept) x2 x3 x4
7.68 3.66 7.62 0.83
# Thus, regression equation is: x1=7.68 + 3.66*x2 + 7.62*x3 + 0.83*x4

newx3=x3+0.6
newx3
[1] 5.7 6.4 2.7 9.0 3.5 1.8 4.3 8.2 8.3 5.1
newfit=lm(x1~x2+newx3+x4)
newfit
Call:
lm(formula = x1 ~ x2 + newx3 + x4)

Coefficients:
(Intercept) x2 newx3 x4
3.1034 3.6616 7.6211 0.8285

news=summary(newfit)
news
Call:
lm(formula = x1 ~ x2 + newx3 + x4)
Residuals:
Min 1Q Median 3Q Max
-12.4384 -3.1695 0.8499 3.5134 9.6207
Coefficients:
Estimate Std. Error t value Pr(>|t|)   
(Intercept) 3.1034 6.9784 0.445 0.6721   
x2 3.6616 1.1178 3.276 0.0169 *
newx3 7.6211 1.6573 4.598 0.0037 **
x4 0.8285 0.5394 1.536 0.1754   
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.541 on 6 degrees of freedom
Multiple R-squared: 0.9668, Adjusted R-squared: 0.9502
F-statistic: 58.22 on 3 and 6 DF, p-value: 7.913e-05

newbeta=coef(newfit)
newbeta
(Intercept) x2 newx3 x4
3.1033980 3.6616044 7.6210501 0.8284682
nbeta=round(newbeta,2)
nbeta
(Intercept) x2 newx3 x4
3.10 3.66 7.62 0.83
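# Shifting every x3 value by 0.6 leaves the slopes unchanged and lowers the intercept by 0.6*7.62 = 4.57 (from 7.68 to 3.10).
# Thus, with x2 and x4 held fixed, the expected change in x1 is 4.57.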
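Parts (e) through (h) are not worked in the session above. The following is a sketch of how they might be approached in base R, not the original author's solution. It assumes that the "confidence interval for your prediction" in (g) and (h) means a prediction interval for a single new movie. The names `movies`, `coef_table`, `reject`, `ci90`, `new_movie`, `fit_promo`, and `planned` are illustrative.

```r
movies <- data.frame(
  x1 = c(85.1, 106.3, 50.2, 130.6, 54.8, 30.3, 79.4, 91.0, 135.4, 89.3),
  x2 = c(8.5, 12.9, 5.2, 10.7, 3.1, 3.5, 9.2, 9.0, 15.1, 10.2),
  x3 = c(5.1, 5.8, 2.1, 8.4, 2.9, 1.2, 3.7, 7.6, 7.7, 4.5),
  x4 = c(4.7, 8.8, 15.1, 12.2, 10.6, 3.5, 9.7, 5.9, 20.8, 7.9)
)
fit <- lm(x1 ~ x2 + x3 + x4, data = movies)
slopes <- c("x2", "x3", "x4")

# (e) t statistics and P-values for H0: beta_i = 0, tested at the 5% level
coef_table <- summary(fit)$coefficients
print(cbind(t = round(coef_table[slopes, "t value"], 2),
            p = round(coef_table[slopes, "Pr(>|t|)"], 3)))
reject <- coef_table[slopes, "Pr(>|t|)"] < 0.05
print(reject)

# (f) 90% confidence intervals for the three slope coefficients
ci90 <- confint(fit, parm = slopes, level = 0.90)
print(round(ci90, 2))

# (g) Point prediction and 85% prediction interval for the new movie
new_movie <- data.frame(x2 = 11.4, x3 = 4.7, x4 = 8.1)
pred_g <- predict(fit, newdata = new_movie,
                  interval = "prediction", level = 0.85)
print(round(pred_g, 1))

# (h) New model with promotional costs x3 as the response
fit_promo <- lm(x3 ~ x1 + x2 + x4, data = movies)
print(round(coef(fit_promo), 2))

# 80% prediction interval for the planned movie's promotion budget
planned <- data.frame(x1 = 100, x2 = 12, x4 = 9.2)
pred_h <- predict(fit_promo, newdata = planned,
                  interval = "prediction", level = 0.80)
print(round(pred_h, 1))
```

For part (e), the P-values already printed in the summary output above (0.0169 for x2, 0.0037 for x3, 0.1754 for x4) point to rejecting the null for β2 and β3 and failing to reject it for β4. That is why book sales x4 probably contribute little information to the model.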

