Simpson's Paradox, Derek vs. David: Averaging
across categories can be misleading, but this can be resolved with
weighted averages.
In baseball, the batting average is defined as the number
of hits divided by the number of times at bat. The table below gives
the batting averages of two players over two years.
The number in parentheses gives the number of times at bat for each
player in each year.
Batting Average (# of times at bat)
Player | 1995 | 1996
Derek | 0.249 (45 times at bat) | 0.313 (575 times at bat)
David | 0.252 (415 times at bat) | 0.322 (145 times at bat)
(a) What are the averages of the two batting averages for Derek (xDerek) and David (xDavid)? Do NOT use a weighted average; just take the mean of the 1995 and 1996 batting averages. Round your answers to 3 decimal places.
(b) Who had the higher average batting average using the non-weighted average?
(c) Using a weighted average, calculate the average batting averages for Derek (xDerek) and David (xDavid). Round your answers to 3 decimal places.
(d) Who had the higher average batting average using the weighted average?
(e) What caused the discrepancy in average batting averages?
Derek's higher average occurred with more times at bat (575). David's higher average occurred with fewer times at bat (145). Derek's lower batting average was based on a small number of times at bat (45). All of these contributed to the discrepancy.
Player | 1995 average | 1995 at bats | 1996 average | 1996 at bats | Total at bats
Derek | 0.249 | 45 | 0.313 | 575 | 620
David | 0.252 | 415 | 0.322 | 145 | 560
a) | Average | Derek | 0.281
a) | Average | David | 0.287
b) | David has the higher average
c) | Weighted average | Derek | 0.308355 (0.308 to 3 decimal places)
c) | Weighted average | David | 0.270125 (0.270 to 3 decimal places)
d) | Derek has the higher average using the weighted average
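The weighted averages in part (c) can be written out term by term. As a worked sketch, using x for each player's combined average as in the questions, each season's batting average is weighted by that season's times at bat:

```latex
\[
x_{\text{Derek}} = \frac{0.249 \cdot 45 + 0.313 \cdot 575}{45 + 575}
                 = \frac{191.18}{620} \approx 0.308
\]
\[
x_{\text{David}} = \frac{0.252 \cdot 415 + 0.322 \cdot 145}{415 + 145}
                 = \frac{151.27}{560} \approx 0.270
\]
```

David's combined average is pulled toward his 1995 figure, which carries 415 of his 560 times at bat, while Derek's is pulled toward his 1996 figure, which carries 575 of his 620.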
Formula view of the same worksheet (column letters and row numbers as used in the formulas):
Player (A) | 1995 average (B) | 1995 at bats (C) | 1996 average (E) | 1996 at bats (F) | Total at bats (G)
Derek (row 4) | 0.249 | 45 | 0.313 | 575 | =SUM(C4,F4)
David (row 5) | 0.252 | 415 | 0.322 | 145 | =SUM(C5,F5)
a) | Average | Derek | =AVERAGE(B4,E4)
a) | Average | David | =AVERAGE(B5,E5)
b) | David has the higher average
c) | Weighted average | Derek | =(B4*C4+E4*F4)/G4
c) | Weighted average | David | =(B5*C5+E5*F5)/G5
d) | Derek has the higher average using the weighted average
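For readers without a spreadsheet, the same calculation can be sketched in Python. This is a minimal illustration, not part of the original worksheet: the names `seasons`, `unweighted_mean`, `weighted_mean`, and `results` are made up for this example, and the data are the batting averages and times at bat from the table.

```python
# (batting average, times at bat) for 1995 and 1996, copied from the table.
seasons = {
    "Derek": [(0.249, 45), (0.313, 575)],
    "David": [(0.252, 415), (0.322, 145)],
}


def unweighted_mean(rows):
    """Plain mean of the season averages, like =AVERAGE(B4,E4)."""
    return sum(avg for avg, _ in rows) / len(rows)


def weighted_mean(rows):
    """Mean weighted by times at bat, like =(B4*C4+E4*F4)/G4."""
    total_at_bats = sum(n for _, n in rows)
    return sum(avg * n for avg, n in rows) / total_at_bats


# Round to 3 decimal places, as the questions ask.
results = {
    name: (round(unweighted_mean(rows), 3), round(weighted_mean(rows), 3))
    for name, rows in seasons.items()
}

for name, (plain, weighted) in results.items():
    print(f"{name}: unweighted {plain:.3f}, weighted {weighted:.3f}")
```

Run as written, this reproduces the worksheet values: David is ahead on the unweighted mean (0.287 vs. 0.281), while Derek is ahead on the weighted mean (0.308 vs. 0.270).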