The National Football League (NFL) records a variety of performance data for individuals and teams. To investigate the importance of passing on the percentage of games won by a team, the following data show the conference (Conf), average number of passing yards per attempt (Yds/Att), the number of interceptions thrown per attempt (Int/Att), and the percentage of games won (Win%) for a random sample of 16 NFL teams for the 2011 season (NFL web site, February 12, 2012).
Let x1 represent Yds/Att.
#a)
> # reading the data
> df = read.csv(file.choose(),header = T)
> str(df)
'data.frame': 16 obs. of 5 variables:
 $ Team      : Factor w/ 16 levels "Arizona Cardinals",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Conference: Factor w/ 2 levels "AFC","NFC": 2 2 2 1 2 2 1 1 1 2 ...
 $ Yds.Att   : num 6.5 7.1 7.4 6.2 7.2 8.9 7.5 5.6 4.6 5.8 ...
 $ Int.Att   : num 0.042 0.022 0.033 0.026 0.024 0.014 0.019 0.026 0.032 0.033 ...
 $ Win.      : num 50 62.5 37.5 56.3 62.5 93.8 62.5 12.5 31.3 18.8 ...
> head(df)
                Team Conference Yds.Att Int.Att Win.
1  Arizona Cardinals        NFC     6.5   0.042 50.0
2    Atlanta Falcons        NFC     7.1   0.022 62.5
3  Carolina Panthers        NFC     7.4   0.033 37.5
4 Cincinnati Bengals        AFC     6.2   0.026 56.3
5      Detroit Lions        NFC     7.2   0.024 62.5
6  Green Bay Packers        NFC     8.9   0.014 93.8
> mod = lm(Win.~ Yds.Att, data = df)
> summary(mod)

Call:
lm(formula = Win. ~ Yds.Att, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-25.020 -15.072   3.643   6.847  33.531

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -58.77      26.18  -2.245 0.041423 *
Yds.Att        16.39       3.75   4.371 0.000639 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 15.87 on 14 degrees of freedom
Multiple R-squared: 0.5771, Adjusted R-squared: 0.5469
F-statistic: 19.11 on 1 and 14 DF, p-value: 0.0006393
# the fitted model is ŷ = -58.77 + 16.39*x1
Since R-squared = 0.5771, the model explains 57.71% of the total variation in Win%.
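The R-squared value above is tied to the F-statistic in the same output. The following is a minimal sketch of that link, using the standard formula F = (R²/k) / ((1 - R²)/(n - k - 1)). The variable names (r2, n, k, f_stat, p_value) are chosen here for illustration, and the inputs are the rounded figures from the printout, so the result should only match the reported 19.11 and 0.0006393 up to rounding.

```r
# Values copied from the summary(mod) output for Win. ~ Yds.Att
r2 <- 0.5771   # Multiple R-squared (rounded, as printed)
n  <- 16       # number of teams in the sample
k  <- 1        # number of independent variables

# Overall F statistic implied by R-squared
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

# Upper-tail p-value from the F distribution with k and n - k - 1 degrees of freedom
p_value <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

print(round(f_stat, 2))
print(signif(p_value, 3))
```

Because R-squared is rounded to four decimals in the printout, small differences in the last digit are expected.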
#b)
> mod = lm(Win.~ Int.Att, data = df)
> summary(mod)

Call:
lm(formula = Win. ~ Int.Att, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-43.425  -5.277   0.274  16.172  22.883

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    97.54      13.86   7.036  5.9e-06 ***
Int.Att     -1600.49     484.63  -3.303  0.00524 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.3 on 14 degrees of freedom
Multiple R-squared: 0.4379, Adjusted R-squared: 0.3977
F-statistic: 10.91 on 1 and 14 DF, p-value: 0.005236
# the fitted model is ŷ = 97.54 - 1600.49*x2, where x2 represents Int/Att
Since R-squared = 0.4379, the model explains 43.79% of the total variation in Win%.
#c)
> mod = lm(Win.~ Int.Att + Yds.Att, data = df)
> summary(mod)

Call:
lm(formula = Win. ~ Int.Att + Yds.Att, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-26.075  -3.099   1.149   6.265  17.112

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    -5.763     27.147  -0.212  0.83517
Int.Att     -1083.788    357.117  -3.035  0.00958 **
Yds.Att        12.949      3.186   4.065  0.00134 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.6 on 13 degrees of freedom
Multiple R-squared: 0.7525, Adjusted R-squared: 0.7144
F-statistic: 19.76 on 2 and 13 DF, p-value: 0.0001144
# the fitted model is ŷ = -5.763 + 12.949*x1 - 1083.788*x2, where x1 represents Yds/Att and x2 represents Int/Att
Since R-squared = 0.7525, the model explains 75.25% of the total variation in Win%.
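Since the three fits use different numbers of independent variables, adjusted R-squared is a fairer basis for comparing them than R-squared alone. The sketch below recomputes it from the R-squared values printed in the three summaries, using the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1). The data frame `fits` and its column names are made up for this illustration, and the inputs are rounded, so the results agree with the printed adjusted values only up to rounding.

```r
n <- 16  # number of teams in the sample

# R-squared values copied from the three summary() outputs;
# k is the number of independent variables in each model
fits <- data.frame(
  model = c("Yds.Att", "Int.Att", "Int.Att + Yds.Att"),
  r2    = c(0.5771, 0.4379, 0.7525),
  k     = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Adjusted R-squared penalises each additional independent variable
fits$adj_r2 <- 1 - (1 - fits$r2) * (n - 1) / (n - fits$k - 1)

# Model with the largest adjusted R-squared
best <- fits$model[which.max(fits$adj_r2)]

print(fits)
print(best)
```

On these figures the two-variable model still comes out ahead after the penalty for the extra variable.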
#d) It is given that the average number of passing yards per attempt for the Seattle Seahawks during the 2011 season was 6.8, and the team's number of interceptions thrown per attempt was 0.028,
i.e. x1 = 6.8 and x2 = 0.028.
Therefore, the predicted percentage of games won by the Seattle Seahawks during the 2011 season is
ŷ = -5.763 + 12.949*6.8 - 1083.788*0.028
ŷ = 51.94
For the 2011 season, the Seattle Seahawks' record was 7 wins and 9 losses.
Therefore, the actual winning percentage is 7/16 = 0.4375, i.e. 43.75%.
The actual winning percentage is 43.75% while the model predicts 51.94%, an overestimate of about 8.2 percentage points, so the model is not very good for this team.
#e) The estimated regression equation that uses only the average number of passing yards per attempt as the independent variable explains only 57.7% of the variation in Win%, compared with 75.25% for the two-variable model, so it is not a good fit.
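The Seattle Seahawks arithmetic can be checked with a short sketch. It plugs the rounded coefficients from the two-variable summary into the fitted equation rather than calling predict() on the model, so the last decimals may differ slightly from what the unrounded fit would give. The variable names (b0, b_yds, b_int, predicted, actual, error) are chosen here for illustration.

```r
# Rounded coefficients from lm(Win. ~ Int.Att + Yds.Att)
b0    <- -5.763      # intercept
b_yds <- 12.949      # coefficient on Yds.Att (x1)
b_int <- -1083.788   # coefficient on Int.Att (x2)

# Seattle Seahawks, 2011 season
x1 <- 6.8     # passing yards per attempt
x2 <- 0.028   # interceptions thrown per attempt

# Predicted percentage of games won
predicted <- b0 + b_yds * x1 + b_int * x2

# Actual percentage of games won: 7 wins out of 16 games
actual <- 7 / 16 * 100

# Prediction error in percentage points (positive means overestimate)
error <- predicted - actual

print(round(c(predicted = predicted, actual = actual, error = error), 2))
```

The error is positive, meaning the fitted equation overestimates Seattle's winning percentage.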