In: Math
Refer to the Baseball 2016 data, which reports information on the 2016 Major League Baseball season. Let attendance be the dependent variable and total team salary be the independent variable. Determine the regression equation and answer the following questions.
Draw a scatter diagram. From the diagram, does there seem to be a direct relationship between the two variables?
What is the expected attendance for a team with a salary of $100.0 million?
If the owners pay an additional $30 million, how many more people could they expect to attend?
At the .05 significance level, can we conclude that the slope of the regression line is positive? Conduct the appropriate test of hypothesis.
What percentage of the variation in attendance is accounted for by salary?
Determine the correlation between attendance and team batting average and between attendance and team ERA. Which is stronger? Conduct an appropriate test of hypothesis for each set of variables.
Show all work in Excel
Team | League | Year Opened | Team Salary | Attendance | Wins | ERA | BA | HR | Year | Average salary | ||
Arizona | National | 1998 | 65.80 | 2080145 | 79 | 4.04 | 0.264 | 154 | 2000 | 1988034 | ||
Atlanta | National | 1996 | 89.60 | 2001392 | 67 | 4.41 | 0.251 | 100 | 2001 | 2264403 | ||
Baltimore | American | 1992 | 118.90 | 2281202 | 81 | 4.05 | 0.250 | 217 | 2002 | 2383235 | ||
Boston | American | 1912 | 168.70 | 2880694 | 78 | 4.31 | 0.265 | 161 | 2003 | 2555476 | ||
Chicago Cubs | National | 1914 | 117.20 | 2959812 | 97 | 3.36 | 0.244 | 171 | 2004 | 2486609 | ||
Chicago Sox | American | 1991 | 110.70 | 1755810 | 76 | 3.98 | 0.250 | 136 | 2005 | 2632655 | ||
Cincinnati | National | 2003 | 117.70 | 2419506 | 64 | 4.33 | 0.248 | 167 | 2006 | 2866544 | ||
Cleveland | American | 1994 | 87.70 | 1388905 | 81 | 3.67 | 0.256 | 141 | 2007 | 2944556 | ||
Colorado | National | 1995 | 98.30 | 2506789 | 68 | 5.04 | 0.265 | 186 | 2008 | 3154845 | ||
Detroit | American | 2000 | 172.80 | 2726048 | 74 | 4.64 | 0.270 | 151 | 2009 | 3240206 | ||
Houston | American | 2000 | 69.10 | 2153585 | 86 | 3.57 | 0.250 | 230 | 2010 | 3297828 | ||
Kansas City | American | 1973 | 112.90 | 2708549 | 95 | 3.73 | 0.269 | 139 | 2011 | 3305393 | ||
LA Angels | American | 1966 | 146.40 | 3012765 | 85 | 3.94 | 0.246 | 176 | 2012 | 3440000 | ||
LA Dodgers | National | 1962 | 230.40 | 3764815 | 92 | 3.44 | 0.250 | 187 | 2013 | 3650000 | ||
Miami | National | 2012 | 84.60 | 1752235 | 71 | 4.02 | 0.260 | 120 | 2014 | 3950000 | ||
Milwaukee | National | 2001 | 98.70 | 2542558 | 68 | 4.28 | 0.251 | 145 | 2015 | 4250000 | ||
Minnesota | American | 2010 | 108.30 | 2220054 | 83 | 4.07 | 0.247 | 156 | ||||
NY Mets | National | 2009 | 100.10 | 2569753 | 90 | 3.43 | 0.244 | 177 | ||||
NY Yankees | American | 2009 | 213.50 | 3193795 | 87 | 4.05 | 0.251 | 212 | ||||
Oakland | American | 1966 | 80.80 | 1768175 | 68 | 4.14 | 0.251 | 146 | ||||
Philadelphia | National | 2004 | 133.00 | 1831080 | 63 | 4.69 | 0.249 | 130 | ||||
Pittsburgh | National | 2001 | 85.90 | 2498596 | 98 | 3.21 | 0.260 | 140 | ||||
San Diego | National | 2004 | 126.60 | 2459742 | 74 | 4.09 | 0.243 | 148 | ||||
San Francisco | National | 2000 | 166.50 | 3375882 | 84 | 3.72 | 0.267 | 136 | ||||
Seattle | American | 1999 | 123.20 | 2193581 | 76 | 4.16 | 0.249 | 198 | ||||
St. Louis | National | 2006 | 120.30 | 3520889 | 100 | 2.94 | 0.253 | 137 | ||||
Tampa Bay | American | 1990 | 74.80 | 1287054 | 80 | 3.74 | 0.252 | 167 | ||||
Texas | American | 1994 | 144.80 | 2491875 | 88 | 4.24 | 0.257 | 172 | ||||
Toronto | American | 1989 | 116.40 | 2794891 | 93 | 3.8 | 0.269 | 232 | ||||
Washington | National | 2008 | 174.50 | 2619843 | 83 | 3.62 | 0.251 | 177 |
Solution:
Required regression model by using excel is given as below:
Regression Statistics |
||||||
Multiple R |
0.706773855 |
|||||
R Square |
0.499529282 |
|||||
Adjusted R Square |
0.481655328 |
|||||
Standard Error |
427738.1077 |
|||||
Observations |
30 |
|||||
ANOVA |
||||||
df |
SS |
MS |
F |
Significance F |
||
Regression |
1 |
5.11324E+12 |
5.11324E+12 |
27.94732921 |
1.26761E-05 |
|
Residual |
28 |
5.12288E+12 |
1.8296E+11 |
|||
Total |
29 |
1.02361E+13 |
||||
Coefficients |
Standard Error |
t Stat |
P-value |
Lower 95% |
Upper 95% |
|
Intercept |
1196918.806 |
251124.0645 |
4.766244957 |
5.25468E-05 |
682514.4856 |
1711323.126 |
Team Salary |
10347.28988 |
1957.295783 |
5.286523357 |
1.26761E-05 |
6337.951267 |
14356.62848 |
Determine the regression equation and answer the following questions.
Attendance = 1196918.806 + 10347.28988*Team salary
Y = 1196918.806 + 10347.28988*X
Draw a scatter diagram. From the diagram, does there seem to be a direct relationship between the two variables?
Scatter diagram is given as below:
From above scatter diagram, it is observed that there is moderate positive linear relationship exists between the two variables.
What is the expected attendance for a team with a salary of $100.0 million?
Attendance = 1196918.806 + 10347.28988*Team salary
Attendance = 1196918.806 + 10347.28988*100
Attendance = 2231647.794
If the owners pay an additional $30 million, how many more people could they expect to attend?
Attendance = 1196918.806 + 10347.28988*Team salary
Attendance = 1196918.806 + 10347.28988*130
Attendance = 2542066.49
At the .05 significance level, can we conclude that the slope of the regression line is positive? Conduct the appropriate test of hypothesis.
H0: β = 0 versus Ha: β > 0
We are given α = 0.05
β̂ = 10347.28988
SE = 1957.295783
t = β̂/SE = 10347.28988/1957.295783
t = 5.286523357
P-value = 0.00
P-value < α = 0.05
So, we reject the null hypothesis
There is sufficient evidence to conclude that the slope of regression line is positive.
What percentage of the variation in attendance is accounted for by salary?
WE are given coefficient of determination or the value of R square is given as 0.499529282, which means about 50.0% of the variation in attendance is accounted for by salary.
Determine the correlation between attendance and team batting average and between attendance and team ERA. Which is stronger? Conduct an appropriate test of hypothesis for each set of variables.
Correlation coefficient between attendance and team batting average = 0.124065
Correlation coefficient between attendance and team ERA = -0.30198
Correlation coefficient between attendance and team ERA is stronger than Correlation coefficient between attendance and team batting average.