I need to measure how each factor (WAR, ERA, WHIP) affects salaries, and to see how likely it is that a player with a higher WAR also receives a larger paycheck, compared with a player who has a lower WHIP or ERA (for both of these, lower is better). For example, of the 10 players with the lowest WHIP, 6 are in the top 33% of paid players, because a low WHIP is a good sign. I believe the best way to show this would be with either t or z distribution formulas. Any help would be greatly appreciated.
The variables are WAR (wins above replacement; higher should help when it comes to pay increases), WHIP (walks plus hits per inning pitched; lower is better), and ERA (earned run average; lower is better). I need to know which of these affects pay the most, which affects it the least, by how much in percentage terms, and how this is worked out. I am trying to show that a higher WAR correlates with a higher salary, that a lower ERA and WHIP correlate with a higher salary, and which of the three has the greatest impact on salary.
Below are the top 30 base salaries, with each player's average statistics to support their pay.
2019 Base Salaries | WAR | WHIP | ERA |
$35,000,000 | 3.91 | 1.0976 | 3.372 |
$31,500,000 | 5.194 | 1.0562 | 3.108 |
$31,000,000 | 4.888 | 0.9328 | 2.402 |
$31,000,000 | 2.86 | 1.1864 | 3.552 |
$30,262,705 | 7.014 | 0.9454 | 2.72 |
$28,000,000 | 5.488 | 0.9914 | 2.964 |
$27,000,000 | 5.488 | 0.916 | 1.3428 |
$25,000,000 | 0.946 | 1.3782 | 5.196 |
$25,000,000 | 3.586 | 1.1854 | 3.4 |
$22,500,000 | 2.704 | 1.2548 | 3.588 |
$22,000,000 | 2.794 | 1.1334 | 3.918 |
$21,210,000 | 3.138 | 1.214 | 3.472 |
$21,000,000 | 2.474 | 1.1568 | 3.31 |
$21,000,000 | 1.714 | 1.269 | 4.512 |
$20,000,000 | 1.714 | 0.52 | 1.2832 |
$20,000,000 | 1.76 | 1.2148 | 3.86 |
$20,000,000 | 2.804 | 1.2752 | 3.774 |
$18,000,000 | 2.804 | 1.532 | 0.8556 |
$18,000,000 | 1.418 | 1.2752 | 4.616 |
$18,000,000 | 1.534 | 1.0006 | 2.66 |
$18,000,000 | 1.534 | 1.528 | 1.1982 |
$17,900,000 | 1.846 | 1.3468 | 4.608 |
$17,000,000 | 0.962 | 1.3644 | 4.668 |
$17,000,000 | 2.538 | 1.2362 | 3.768 |
$17,000,000 | 4.772 | 1.1256 | 3.548 |
$16,500,000 | 4.772 | 1.196 | 1.303 |
$16,000,000 | 1.432 | 1.269 | 4.21 |
$15,250,000 | 2.036 | 1.3172 | 4.15 |
$15,000,000 | 4.692 | 1.0102 | 3.24 |
$15,000,000 | 4.692 | 1.594 | 1.0662 |
To investigate this, the first step is to calculate the individual relationship between each variable (WAR, WHIP, and ERA) and 2019 base salary. This is done using the correlation (a measure of the linear association between two variables) and the coefficient of determination (the proportion of the variation in Salary explained by each variable on its own, which for a single predictor is the squared correlation).
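As a rough illustration of that calculation, the sketch below computes the Pearson correlation and its square in plain Python. The function name `pearson_r` is made up for this example, and the inputs are only the first five rows of the salary table, so the printed values will not match figures computed from all 30 players.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# First five rows of the salary table, used only as an illustration.
salary = [35_000_000, 31_500_000, 31_000_000, 31_000_000, 30_262_705]
war = [3.91, 5.194, 4.888, 2.86, 7.014]
whip = [1.0976, 1.0562, 0.9328, 1.1864, 0.9454]

for name, values in (("WAR", war), ("WHIP", whip)):
    r = pearson_r(values, salary)
    # With one predictor, the coefficient of determination is r squared.
    print(f"Salary vs. {name}: r = {r:.2f}, r^2 = {r * r:.2f}")
```

Passing the full 30-row columns instead of the five-row lists gives the per-variable figures for the whole data set.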
Comparison | Correlation | Coefficient of Determination |
Salary vs. WAR | 46% | 21% |
Salary vs. WHIP | -41% | 17% |
Salary vs. ERA | -1% | 0% |
As the table above shows, none of the variables on its own explains much of the variation in Salary; of the three, WAR explains the most. The signs of the correlations also show that a higher WAR goes with a higher salary and that a lower WHIP or ERA goes with a higher salary, although the ERA correlation is close to zero.
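Since the question asks about t distributions: how much variation a variable explains is a separate matter from whether its correlation can be told apart from zero. One common check is the t statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below applies it to the rounded correlations from the table and the 30 players in the data. The function name and the critical value 2.048 (two-sided 5% level, 28 degrees of freedom) are supplied for this example rather than taken from the analysis, and because the correlations are rounded the results are approximate.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for H0: the true correlation is zero (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

N_PLAYERS = 30      # rows in the salary table
T_CRITICAL = 2.048  # two-sided 5% critical value of t with 28 degrees of freedom

# Rounded correlations as reported in the table.
reported_r = {"WAR": 0.46, "WHIP": -0.41, "ERA": -0.01}

t_stats = {name: correlation_t_stat(r, N_PLAYERS) for name, r in reported_r.items()}
for name, t in t_stats.items():
    if abs(t) > T_CRITICAL:
        verdict = "differs from zero"
    else:
        verdict = "not distinguishable from zero"
    print(f"{name}: t = {t:.2f} ({verdict} at the 5% level)")
```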
As a next step, all three variables were used together in a linear regression to understand their combined impact on Salary.
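A minimal sketch of such a regression, assuming ordinary least squares solved through the normal equations, is shown here in plain Python. The helper names `fit_ols` and `r_squared` are invented for this example, and the inputs are only the first eight rows of the salary table (salary in millions of dollars), so the fitted numbers will differ from a fit on all 30 players.

```python
def fit_ols(predictors, response):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *p] for p in predictors]  # prepend the intercept column
    k = len(rows[0])
    # Augmented system (X'X | X'y), one row per coefficient.
    system = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, response))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(k):
            if r != col:
                factor = system[r][col] / system[col][col]
                system[r] = [a - factor * b for a, b in zip(system[r], system[col])]
    return [system[i][k] / system[i][i] for i in range(k)]

def r_squared(predictors, response, coefs):
    """Share of the variation in the response explained by the fitted model."""
    fitted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], p)) for p in predictors]
    mean_y = sum(response) / len(response)
    ss_res = sum((y - f) ** 2 for y, f in zip(response, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    return 1 - ss_res / ss_tot

# First eight rows of the salary table as (WAR, WHIP, ERA), copied as they appear.
stats = [
    (3.91, 1.0976, 3.372), (5.194, 1.0562, 3.108), (4.888, 0.9328, 2.402),
    (2.86, 1.1864, 3.552), (7.014, 0.9454, 2.72), (5.488, 0.9914, 2.964),
    (5.488, 0.916, 1.3428), (0.946, 1.3782, 5.196),
]
salary_millions = [35.0, 31.5, 31.0, 31.0, 30.262705, 28.0, 27.0, 25.0]

coefs = fit_ols(stats, salary_millions)
r2 = r_squared(stats, salary_millions, coefs)
for name, value in zip(("Intercept", "WAR", "WHIP", "ERA"), coefs):
    print(f"{name}: {value:.3f}")
print(f"R Square: {r2:.2f}")
```

This sketch does not compute p-values; a statistics package would normally be used for those.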
Variable | Coefficient | P-value |
Intercept | 21,753,050 | 1% |
WAR | 1,647,999 | 2% |
WHIP | -7,494,063 | 12% |
ERA | 1,146,069 | 19% |
R Square | 33% |
From the table above, the WAR coefficient can be interpreted as the change in Salary for a one-unit change in WAR, keeping the other factors (WHIP and ERA) constant. Each of the other coefficients is interpreted in the same way. The ERA coefficient is positive, which is the opposite of what is expected, since a lower ERA should go with a higher salary.
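To make that interpretation concrete, the sketch below plugs the reported coefficients into the fitted equation and compares two predictions that differ only by one unit of WAR. It assumes the coefficients are plain dollar amounts (for instance, "16,47,999" read as 1,647,999), and the pitcher's statistics are hypothetical values chosen for illustration.

```python
# Coefficients from the regression table, read as ordinary dollar amounts.
# That reading of the digit grouping is an assumption.
INTERCEPT = 21_753_050
COEF = {"WAR": 1_647_999, "WHIP": -7_494_063, "ERA": 1_146_069}

def predicted_salary(war, whip, era):
    """Salary predicted by the fitted linear equation."""
    return INTERCEPT + COEF["WAR"] * war + COEF["WHIP"] * whip + COEF["ERA"] * era

# Hypothetical pitcher, then the same pitcher with WAR raised by one unit.
base = predicted_salary(war=3.0, whip=1.2, era=3.5)
plus_one_war = predicted_salary(war=4.0, whip=1.2, era=3.5)

# With WHIP and ERA held fixed, the gap between the two predictions
# is the WAR coefficient.
print(f"Change in predicted salary: ${plus_one_war - base:,.0f}")
```

The same comparison with WHIP or ERA changed by one unit returns the corresponding coefficient, which is why the positive ERA coefficient stands out.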
Further, neither WHIP nor ERA is statistically significant at the 5% significance level (95% confidence), since both p-values are above 5%, and only 33% of the variation in Salary is explained by the three variables together.
Conclusion: WAR is the variable that best explains the variability in Salary, but on its own it explains only 21% of that variation.