The National Football League (NFL) records a variety of performance data for individuals and teams. To investigate the importance of passing on the percentage of games won by a team, the following data show the conference (Conf), average number of passing yards per attempt (Yds/Att), the number of interceptions thrown per attempt (Int/Att), and the percentage of games won (Win%) for a random sample of 16 NFL teams for the 2011 season (NFL web site, February 12, 2012).
Team | Conference | Yds/Att | Int/Att | Win% |
Arizona Cardinals | NFC | 6.5 | 0.042 | 50.0 |
Atlanta Falcons | NFC | 7.1 | 0.022 | 62.5 |
Carolina Panthers | NFC | 7.4 | 0.033 | 37.5 |
Cincinnati Bengals | AFC | 6.2 | 0.026 | 56.3 |
Detroit Lions | NFC | 7.2 | 0.024 | 62.5 |
Green Bay Packers | NFC | 8.9 | 0.014 | 93.8 |
Houston Texans | AFC | 7.5 | 0.019 | 62.5 |
Indianapolis Colts | AFC | 5.6 | 0.026 | 12.5 |
Jacksonville Jaguars | AFC | 4.6 | 0.032 | 31.3 |
Minnesota Vikings | NFC | 5.8 | 0.033 | 18.8 |
New England Patriots | AFC | 8.3 | 0.020 | 81.3 |
New Orleans Saints | NFC | 8.1 | 0.021 | 81.3 |
Oakland Raiders | AFC | 7.6 | 0.044 | 50.0 |
San Francisco 49ers | NFC | 6.5 | 0.011 | 81.3 |
Tennessee Titans | AFC | 6.7 | 0.024 | 56.3 |
Washington Redskins | NFC | 6.4 | 0.041 | 31.3 |
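The answer below reads the data from a file named football.csv, which is not included here. As a minimal sketch, the same three numeric columns can be typed into R straight from the table. The column names Yds.Att, Int.Att and Win are chosen to match the head(d) output in the solution. The Team and Conference columns are left out because no regression uses them.

```r
# 2011 sample of 16 NFL teams, in the same row order as the table above.
# Column names match those shown by head(d) in the solution.
d <- data.frame(
  Yds.Att = c(6.5, 7.1, 7.4, 6.2, 7.2, 8.9, 7.5, 5.6,
              4.6, 5.8, 8.3, 8.1, 7.6, 6.5, 6.7, 6.4),
  Int.Att = c(0.042, 0.022, 0.033, 0.026, 0.024, 0.014, 0.019, 0.026,
              0.032, 0.033, 0.020, 0.021, 0.044, 0.011, 0.024, 0.041),
  Win     = c(50.0, 62.5, 37.5, 56.3, 62.5, 93.8, 62.5, 12.5,
              31.3, 18.8, 81.3, 81.3, 50.0, 81.3, 56.3, 31.3)
)

# Sanity checks: 16 rows; the sample means are 6.9, 0.027 and 54.325
nrow(d)
colMeans(d)
```

With d defined this way, each lm() call in the solution can be run with data = d instead of relying on attach(d).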
Let x1 represent Yds/Att.
Let x2 represent Int/Att.
(a) Develop the estimated regression equation that could be used to predict the percentage of games won, given the average number of passing yards per attempt. If required, round your answer to three decimal places. For subtractive or negative numbers use a minus sign even if there is a + sign before the blank. (Example: -300)
ŷ = ____ + ____ x1
What proportion of the variation in the sample values of the percentage of games won does this model explain? If required, round your answer to one decimal place.
____%
(b) Develop the estimated regression equation that could be used to predict the percentage of games won, given the number of interceptions thrown per attempt. If required, round your answer to three decimal places. For subtractive or negative numbers use a minus sign even if there is a + sign before the blank. (Example: -300)
ŷ = ____ + ____ x2
What proportion of the variation in the sample values of the percentage of games won does this model explain? If required, round your answer to one decimal place.
____%
(c) Develop the estimated regression equation that could be used to predict the percentage of games won, given the average number of passing yards per attempt and the number of interceptions thrown per attempt. If required, round your answer to three decimal places. For subtractive or negative numbers use a minus sign even if there is a + sign before the blank. (Example: -300)
ŷ = ____ + ____ x1 + ____ x2
What proportion of the variation in the sample values of the percentage of games won does this model explain? If required, round your answer to one decimal place.
____%
(d) The average number of passing yards per attempt for the Buffalo Bills during the 2011 season was 6.7, and the team's number of interceptions thrown per attempt was 0.043. Use the estimated regression equation developed in part (c) to predict the percentage of games won by the Buffalo Bills during the 2011 season. (Note: For the 2011 season, the Buffalo Bills' record was 7 wins and 9 losses.)
If required, round your answer to one decimal place. Do not round intermediate calculations.
____%
Compare your prediction to the actual percentage of games won by the Buffalo Bills. If required, round your answer to one decimal place.
The Buffalo Bills performed (better / worse) than what we predicted by ____%.
(e) Did the estimated regression equation that uses only the average number of passing yards per attempt as the independent variable to predict the percentage of games won provide a good fit?
To solve this problem, I used R.
R code and output:
> d=read.table('football.csv',header=TRUE, sep=',')
> head(d)
  Yds.Att Int.Att  Win
1     6.5   0.042 50.0
2     7.1   0.022 62.5
3     7.4   0.033 37.5
4     6.2   0.026 56.3
5     7.2   0.024 62.5
6     8.9   0.014 93.8
> attach(d)
The following objects are masked from d (pos = 3):
Int.Att, Win, Yds.Att
> fit_1=lm(Win~Yds.Att)
> summary(fit_1)
Call:
lm(formula = Win ~ Yds.Att)
Residuals:
    Min      1Q  Median      3Q     Max
-25.020 -15.072   3.643   6.847  33.531
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -58.77      26.18  -2.245 0.041423 *
Yds.Att        16.39       3.75   4.371 0.000639 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15.87 on 14 degrees of freedom
Multiple R-squared: 0.5771, Adjusted R-squared: 0.5469
F-statistic: 19.11 on 1 and 14 DF, p-value: 0.0006393
> fit_2=lm(Win~Int.Att)
> summary(fit_2)
Call:
lm(formula = Win ~ Int.Att)
Residuals:
    Min      1Q  Median      3Q     Max
-43.425  -5.277   0.274  16.172  22.883
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    97.54      13.86   7.036  5.9e-06 ***
Int.Att     -1600.49     484.63  -3.303  0.00524 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 18.3 on 14 degrees of freedom
Multiple R-squared: 0.4379, Adjusted R-squared: 0.3977
F-statistic: 10.91 on 1 and 14 DF, p-value: 0.005236
> fit_3=lm(Win~Yds.Att+Int.Att)
> summary(fit_3)
Call:
lm(formula = Win ~ Yds.Att + Int.Att)
Residuals:
    Min      1Q  Median      3Q     Max
-26.075  -3.099   1.149   6.265  17.112
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    -5.763     27.147  -0.212  0.83517
Yds.Att        12.949      3.186   4.065  0.00134 **
Int.Att     -1083.788    357.117  -3.035  0.00958 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 12.6 on 13 degrees of freedom
Multiple R-squared: 0.7525, Adjusted R-squared: 0.7144
F-statistic: 19.76 on 2 and 13 DF, p-value: 0.0001144
Que.a
ŷ = -58.770 + 16.391 x1
Proportion of variation = 57.7%
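As a cross-check on part (a), the simple-regression coefficients follow from the least-squares formulas b1 = Sxy/Sxx and b0 = ȳ - b1·x̄, with R² = Sxy²/(Sxx·Syy). The base-R sketch below recomputes them without lm(). It also shows the three-decimal values the question asks for, since summary() prints only two decimals here.

```r
# Yds/Att (x1) and Win% (y) for the 16 sampled teams
x1 <- c(6.5, 7.1, 7.4, 6.2, 7.2, 8.9, 7.5, 5.6,
        4.6, 5.8, 8.3, 8.1, 7.6, 6.5, 6.7, 6.4)
y  <- c(50.0, 62.5, 37.5, 56.3, 62.5, 93.8, 62.5, 12.5,
        31.3, 18.8, 81.3, 81.3, 50.0, 81.3, 56.3, 31.3)

# Centered sums of squares and cross-products
Sxx <- sum((x1 - mean(x1))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x1 - mean(x1)) * (y - mean(y)))

b1 <- Sxy / Sxx                  # slope
b0 <- mean(y) - b1 * mean(x1)    # intercept
r2 <- Sxy^2 / (Sxx * Syy)        # coefficient of determination

round(c(b0 = b0, b1 = b1), 3)    # -58.770 and 16.391
round(100 * r2, 1)               # 57.7 (%)
```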
Que.b
ŷ = 97.538 - 1600.491 x2
Proportion of variation = 43.8%
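With a single predictor, R² equals the square of the sample correlation r between the predictor and the response, and the slope can be written as b1 = r·s_y/s_x. A short base-R check of the part (b) figures using that identity:

```r
# Int/Att (x2) and Win% (y) for the 16 sampled teams
x2 <- c(0.042, 0.022, 0.033, 0.026, 0.024, 0.014, 0.019, 0.026,
        0.032, 0.033, 0.020, 0.021, 0.044, 0.011, 0.024, 0.041)
y  <- c(50.0, 62.5, 37.5, 56.3, 62.5, 93.8, 62.5, 12.5,
        31.3, 18.8, 81.3, 81.3, 50.0, 81.3, 56.3, 31.3)

r  <- cor(x2, y)                  # sample correlation (negative here)
b1 <- r * sd(y) / sd(x2)          # slope from the correlation
b0 <- mean(y) - b1 * mean(x2)     # intercept

round(c(b0 = b0, b1 = b1), 3)     # 97.538 and -1600.491
round(100 * r^2, 1)               # 43.8 (%)
```

The negative r matches the negative slope. More interceptions per attempt go with a lower winning percentage in this sample.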
Que.c
ŷ = -5.763 + 12.949 x1 - 1083.788 x2
Proportion of variation = 75.2% (R² = 0.752489 before rounding; summary() prints it as 0.7525)
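The part (c) coefficients solve the normal equations (XᵀX)b = Xᵀy, where X holds a column of ones plus the x1 and x2 columns. The base-R sketch below solves them directly and computes R² = 1 - SSE/SST at full precision. The unrounded R² is about 0.75249, which rounds to 75.2%. Rounding the four-digit 0.7525 printed by summary() a second time would give 75.3% instead, so the extra digits matter here.

```r
# Predictors and response for the 16 sampled teams
x1 <- c(6.5, 7.1, 7.4, 6.2, 7.2, 8.9, 7.5, 5.6,
        4.6, 5.8, 8.3, 8.1, 7.6, 6.5, 6.7, 6.4)
x2 <- c(0.042, 0.022, 0.033, 0.026, 0.024, 0.014, 0.019, 0.026,
        0.032, 0.033, 0.020, 0.021, 0.044, 0.011, 0.024, 0.041)
y  <- c(50.0, 62.5, 37.5, 56.3, 62.5, 93.8, 62.5, 12.5,
        31.3, 18.8, 81.3, 81.3, 50.0, 81.3, 56.3, 31.3)

# Design matrix: intercept column, x1, x2
X <- cbind(1, x1, x2)

# Solve (X'X) b = X'y without forming an explicit inverse
b <- drop(solve(crossprod(X), crossprod(X, y)))

# R^2 = 1 - SSE/SST, kept at full precision
y_hat <- drop(X %*% b)
r2 <- 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)

round(b, 3)           # -5.763, 12.949, -1083.788
r2                    # about 0.75249
round(100 * r2, 1)    # 75.2 (%)
```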
Que.d
> newdata <- data.frame(Yds.Att = 6.7, Int.Att = 0.043)
> predict(fit_3, newdata = newdata)
       1
34.39452
Predicted value = 34.4%
Actual win % = 7/(7+9) * 100 = 43.75 ≈ 43.8%
Since the actual 43.75% is above the predicted 34.39452%, the Buffalo Bills performed better than what we predicted, by 43.75 - 34.39452 ≈ 9.4 percentage points.
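Part (d) substitutes the Bills' values into the part (c) equation. The minimal sketch below does the substitution with the unrounded coefficients from coef(), following the instruction not to round intermediate calculations. It then compares the prediction with the actual 7-9 record. The sign of the gap shows whether the team did better or worse than predicted.

```r
# Data for the 16 sampled teams (same values as the table)
d <- data.frame(
  Yds.Att = c(6.5, 7.1, 7.4, 6.2, 7.2, 8.9, 7.5, 5.6,
              4.6, 5.8, 8.3, 8.1, 7.6, 6.5, 6.7, 6.4),
  Int.Att = c(0.042, 0.022, 0.033, 0.026, 0.024, 0.014, 0.019, 0.026,
              0.032, 0.033, 0.020, 0.021, 0.044, 0.011, 0.024, 0.041),
  Win     = c(50.0, 62.5, 37.5, 56.3, 62.5, 93.8, 62.5, 12.5,
              31.3, 18.8, 81.3, 81.3, 50.0, 81.3, 56.3, 31.3)
)

# Part (c) model; coef() keeps full precision, so nothing is rounded early
b <- coef(lm(Win ~ Yds.Att + Int.Att, data = d))

# Plug in the Bills' 2011 values: Yds/Att = 6.7, Int/Att = 0.043
y_hat <- unname(b["(Intercept)"] + b["Yds.Att"] * 6.7 + b["Int.Att"] * 0.043)

# Actual Win% from the 7-9 record
actual <- 100 * 7 / (7 + 9)      # 43.75

# A positive gap means the team won more often than the model predicted
gap <- actual - y_hat
round(c(predicted = y_hat, gap = gap), 1)   # 34.4 and 9.4
```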
Que.e
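Goodness of fit can be judged with the overall F test together with R². For this model,
F-statistic: 19.11 on 1 and 14 DF, p-value: 0.0006393
Since the p-value (0.0006393) is less than 0.05, we reject the null hypothesis that the slope is zero and conclude that Yds/Att is a significant predictor of Win%. Together with R² = 57.7% from part (a), this indicates a reasonably good fit, although about 42% of the variation in Win% remains unexplained.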
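For a model with k predictors and n observations, the overall F statistic can be written in terms of R² as F = (R²/k) / ((1 - R²)/(n - k - 1)). The significance test and the 57.7% from part (a) are therefore two views of the same fit. A minimal base-R sketch that recomputes F and its p-value for the Yds/Att-only model, then checks the result against the value stored by summary():

```r
# Yds/Att (x1) and Win% (y) for the 16 sampled teams
x1 <- c(6.5, 7.1, 7.4, 6.2, 7.2, 8.9, 7.5, 5.6,
        4.6, 5.8, 8.3, 8.1, 7.6, 6.5, 6.7, 6.4)
y  <- c(50.0, 62.5, 37.5, 56.3, 62.5, 93.8, 62.5, 12.5,
        31.3, 18.8, 81.3, 81.3, 50.0, 81.3, 56.3, 31.3)

fit <- lm(y ~ x1)
r2  <- summary(fit)$r.squared     # printed as 0.5771

n <- length(y)   # 16 teams
k <- 1           # one predictor

# Overall F statistic written in terms of R^2, and its upper-tail p-value
F_stat  <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_value <- pf(F_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

round(c(F = F_stat, p = p_value), 5)   # F about 19.11, p about 0.00064

# Same value as the F statistic reported by summary()
all.equal(unname(summary(fit)$fstatistic["value"]), F_stat)   # TRUE
```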