In: Math
1) Use worksheet “baseball stats” to perform a multiple regression analysis on the dataset found in BB2011 tab, using Wins as the dependent variable, and League, ERA, Runs Scored, Hits Allowed, Walks Allowed, Saves, and Errors as candidates for the independent variables. Perform the analysis at the 5% significance level.
Team | League | Wins | E.R.A. | Runs Scored | Hits Allowed | Walks Allowed | Saves | Errors |
Baltimore | 0 | 69 | 4.89 | 708 | 1568 | 535 | 32 | 110 |
Boston | 0 | 90 | 4.20 | 875 | 1366 | 540 | 36 | 92 |
Chicago White Sox | 0 | 79 | 4.10 | 654 | 1463 | 439 | 42 | 79 |
Cleveland | 0 | 80 | 4.23 | 704 | 1482 | 463 | 38 | 110 |
Detroit | 0 | 95 | 4.04 | 787 | 1406 | 492 | 52 | 103 |
Kansas City | 0 | 71 | 4.44 | 730 | 1487 | 557 | 37 | 95 |
Los Angeles Angels | 0 | 86 | 3.57 | 667 | 1388 | 476 | 39 | 93 |
Minnesota | 0 | 63 | 4.58 | 619 | 1564 | 480 | 32 | 119 |
New York Yankees | 0 | 97 | 3.73 | 867 | 1423 | 507 | 47 | 102 |
Oakland | 0 | 74 | 3.71 | 645 | 1380 | 519 | 39 | 124 |
Seattle | 0 | 67 | 3.90 | 556 | 1369 | 436 | 39 | 108 |
Tampa Bay | 0 | 91 | 3.58 | 707 | 1263 | 504 | 32 | 73 |
Texas | 0 | 96 | 3.79 | 855 | 1327 | 461 | 38 | 114 |
Toronto | 0 | 81 | 4.32 | 743 | 1433 | 540 | 33 | 110 |
Arizona | 1 | 94 | 3.80 | 731 | 1414 | 442 | 58 | 90 |
Atlanta | 1 | 89 | 3.48 | 641 | 1332 | 521 | 52 | 83 |
Chicago Cubs | 1 | 71 | 4.33 | 654 | 1439 | 580 | 40 | 134 |
Cincinnati | 1 | 79 | 4.16 | 735 | 1414 | 539 | 39 | 91 |
Colorado | 1 | 73 | 4.43 | 735 | 1471 | 522 | 41 | 98 |
Houston | 1 | 56 | 4.51 | 615 | 1477 | 560 | 25 | 116 |
Los Angeles Dodgers | 1 | 82 | 3.54 | 644 | 1287 | 507 | 40 | 85 |
Miami | 1 | 72 | 3.95 | 625 | 1403 | 500 | 40 | 93 |
Milwaukee | 1 | 96 | 3.63 | 721 | 1348 | 440 | 47 | 111 |
New York Mets | 1 | 77 | 4.19 | 718 | 1482 | 514 | 43 | 116 |
Philadelphia | 1 | 102 | 3.02 | 713 | 1320 | 404 | 47 | 74 |
Pittsburgh | 1 | 72 | 4.04 | 610 | 1513 | 535 | 43 | 112 |
St. Louis | 1 | 90 | 3.74 | 762 | 1461 | 448 | 47 | 116 |
San Diego | 1 | 71 | 3.42 | 593 | 1324 | 521 | 44 | 94 |
San Francisco | 1 | 86 | 3.20 | 570 | 1260 | 559 | 52 | 104 |
Washington | 1 | 80 | 3.58 | 624 | 1403 | 477 | 49 | 104 |
a) Create a full write up, where you write your statistical analysis step by step. In your write up, make sure you address the following points.
· Methodology and steps that you took to get to your final regression equation.
· Final regression equation output.
· What is the final regression equation?
· Interpret all the coefficients in the equation.
· Speak to whether the signs on the coefficients make sense.
· Interpret R squared.
· Include a full residual analysis.
· What is the residual of the Tampa Bay observation?
b) Now, use tab BB2012 to make predictions of wins in 2012, using the model you created with the 2011 stats.
· How many games are the Giants (SFG) expected to win in 2012?
· Which team is predicted by the model to have the worst record in 2012?
· Which team is predicted by the model to have the best record in 2012?
Team | WINS | E.R.A. | Runs Scored | Hits Allowed | Walks Allowed | Saves | Errors |
ARI | 81 | 3.93 | 734 | 1432 | 417 | 39 | 91 |
ATL | 94 | 3.42 | 700 | 1310 | 464 | 47 | 89 |
BAL | 93 | 3.9 | 712 | 1433 | 481 | 55 | 109 |
BOS | 69 | 4.7 | 734 | 1449 | 529 | 35 | 100 |
CHC | 61 | 4.51 | 613 | 1399 | 573 | 28 | 105 |
CHW | 85 | 4.02 | 748 | 1365 | 503 | 37 | 70 |
CIN | 97 | 3.34 | 669 | 1356 | 427 | 56 | 89 |
CLE | 68 | 4.78 | 667 | 1503 | 543 | 43 | 97 |
COL | 64 | 5.22 | 758 | 1637 | 566 | 36 | 122 |
DET | 88 | 3.75 | 726 | 1409 | 438 | 40 | 97 |
HOU | 55 | 4.56 | 583 | 1493 | 540 | 31 | 118 |
KCR | 72 | 4.3 | 676 | 1504 | 542 | 44 | 113 |
LAA | 89 | 4.02 | 767 | 1339 | 483 | 38 | 97 |
LAD | 86 | 3.34 | 637 | 1277 | 539 | 40 | 97 |
MIA | 69 | 4.09 | 609 | 1448 | 495 | 38 | 104 |
MIL | 83 | 4.22 | 776 | 1458 | 525 | 44 | 99 |
MIN | 66 | 4.77 | 701 | 1536 | 465 | 35 | 107 |
NYM | 74 | 4.09 | 650 | 1368 | 488 | 36 | 102 |
NYY | 95 | 3.84 | 804 | 1401 | 431 | 51 | 76 |
OAK | 94 | 3.48 | 713 | 1360 | 462 | 47 | 110 |
PHI | 81 | 3.83 | 684 | 1387 | 409 | 42 | 100 |
PIT | 79 | 3.86 | 651 | 1357 | 490 | 45 | 112 |
SDP | 76 | 4.01 | 651 | 1356 | 539 | 43 | 123 |
SEA | 75 | 3.76 | 619 | 1359 | 449 | 43 | 71 |
SFG | 94 | 3.68 | 718 | 1361 | 489 | 53 | 112 |
STL | 88 | 3.71 | 765 | 1420 | 436 | 42 | 107 |
TBR | 90 | 3.19 | 697 | 1233 | 469 | 50 | 113 |
TEX | 93 | 3.99 | 808 | 1378 | 446 | 43 | 86 |
TOR | 73 | 4.64 | 716 | 1439 | 574 | 29 | 102 |
WSN | 98 | 3.33 | 731 | 1296 | 497 | 51 | 96 |
solution:
Regression line is given as
where
y= dependent/ response variable
and so on
= random error/ residual effect and
here, the dependent variable or y= Wins
we can minimise the error by using method of least squares, ie,
hence solving further, we get Sum of squares to construct ANOVA and calculate R2 values which are as follows:
ANOVA | |||||
df | SS | MS | F | Significance F | |
Regression | 7 | 3465.793 | 495.1132 | 34.78095 | 1.80939E-10 |
Residual | 22 | 313.174 | 14.23518 | ||
Total | 29 | 3778.967 |
The R values are:
Regression Statistics89 | |
Multiple R | 0.957668 |
R Square | 0.917127 |
Adjusted R Square | 0.890758 |
Standard Error | 3.772954 |
Observations | 30 |
The regression line is given as:
Wins = 93.70706 - 1.45486 League +-9.65899 E.R.A + 0.084965 Runs
Scored -0.02043 Hits Scored -0.03499 Walks Allowed + 0.425586 Saves
- 0.04132 Errors +
Here, R2=0.917127 and adjusted R2=0.89075
that indicates that 91%of response variable is explained by the variables of the model.
Now, let us interpret the significance of coefficients
the hypothesis is H01= there is no significance difference between effects of League and similar null hypothesis for other variables respectively. The alrenative hypothesis H11= there is significance between effects of League and similar alternative hypothesis for other variables respectively.
Hence, the decision criteria is to reject H0 if the p-value < 0.05
The following table tells about the coefficients and p-values and confidence intervals of the model :
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% | |
Intercept | 93.70706 | 20.93209 | 4.476717 | 0.000188279 | 50.29655 | 137.1176 | 50.29655 | 137.1176 |
League | -1.45486 | 1.720024 | -0.84584 | 0.406749342 | -5.02197 | 2.11225 | -5.02197 | 2.11225 |
E.R.A. | -9.65899 | 4.368845 | -2.21088 | 0.037743681 | -18.7194 | -0.59856 | -18.7194 | -0.59856 |
Runs Scored | 0.084965 | 0.0097 | 8.759181 | 1.26918E-08 | 0.064848 | 0.105082 | 0.064848 | 0.105082 |
Hits Allowed | -0.02043 | 0.019296 | -1.05872 | 0.301213942 | -0.06045 | 0.019588 | -0.06045 | 0.019588 |
Walks Allowed | -0.03499 | 0.020262 | -1.72707 | 0.098172005 | -0.07702 | 0.007027 | -0.07702 | 0.007027 |
Saves | 0.425586 | 0.143276 | 2.970398 | 0.007062248 | 0.12845 | 0.722723 | 0.12845 | 0.722723 |
Errors | -0.04132 | 0.054939 | -0.75211 | 0.459958838 | -0.15526 | 0.072616 | -0.15526 | 0.072616 |
Here, the variables, League, Hit Allowed, Walks Allowed and Errors are significant and ERA, Runs Scored and Saved are insignificant.
The signs of the coefficients do not tell anything about the significance or insignificance of the variable.
The residual for the variables as per the ANOVA table is 14.23518