MVP Award Probabilities: Accounting for Sampling Variation
This week wraps up the MLB awards. In the AL, Joe Mauer took home the MVP and Zack Greinke took home the Cy Young. In the NL, the hardware will likely go to Albert Pujols for MVP. Meanwhile, in one of the tightest three-way races in recent memory, Tim Lincecum squeaked out a victory for the Cy Young. Since these four players won the awards, they must be the top players of 2009, right?
Surely, I jest. If you’re reading this, you probably figured out long ago that the Baseball Writers Association of America does not always award the MVP and Cy Young to the most productive or valuable players (this year, however, I happen to agree with all four of their picks). But even if we make the leap of assuming that the BBWAA is the population most qualified to determine the winners of these awards, there is still no guarantee that the small group of writers who actually get to vote will accurately mirror the opinions of the group they represent. The reason: simple statistical sampling variability. If we treat the actual voters as a sample of 32 voters (or 28 in the AL) drawn from a hypothetical universe of similarly qualified baseball writers, analysts, and experts, we can see that there is natural variability in the voting for the MVP and Cy Young, and that the “right” player (defined as the consensus pick among the entire universe of qualified baseball experts) may not always be chosen.
On the basis of the 32 BBWAA writers’ votes, Tim Lincecum was deemed the best NL pitcher of 2009 by the baseball establishment. But was Lincecum really the consensus pick for the NL Cy Young? Or did Lincecum just get lucky while the majority of qualified experts really preferred somebody else? Based on the results of the voting, it’s clear that some baseball experts preferred Lincecum (11 first-place votes), some preferred Wainwright (12 first-place votes), and some preferred Carpenter (9 first-place votes). When the Cy Young points were tallied, the group of 32 voters as a whole preferred Lincecum, but it was very close. Perhaps Lincecum simply got lucky and, just by chance, had more of his supporters in the sample of 32 voters. Perhaps the universe of qualified baseball experts as a whole actually thought Carpenter or Wainwright was most deserving of the award.
This article attempts to find the probability that Lincecum really did have the most support among the baseball establishment, and that the 32 voters who happened to have a vote this year really did select the “right” candidate.
Calculating the Probabilities
One way to estimate the variability associated with the MVP and Cy Young awards is to use a statistical resampling method, in which you basically take a sample of the 32 ballots with replacement. This method of essentially simulating the MVP balloting many times based upon the real MVP balloting would be great, except for one snafu: it appears very difficult, if not impossible, to find the results of each individual ballot. Without having the individual ballots, we can’t use this technique.
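For readers who want to see what that resampling would look like if the ballots were available, here is a minimal sketch. Everything in it is an assumption for illustration: the ballots are made up (candidates "A", "B", and "C"), the 5-3-1 scoring for a three-name ballot is assumed, and the function names `tally` and `bootstrap_win_shares` are my own.

```python
import random
from collections import Counter

POINTS = (5, 3, 1)  # assumed scoring: 5 for 1st, 3 for 2nd, 1 for 3rd


def tally(ballots):
    """Add up points for every name across a list of ranked ballots."""
    totals = Counter()
    for ballot in ballots:
        for name, pts in zip(ballot, POINTS):
            totals[name] += pts
    return totals


def bootstrap_win_shares(ballots, n_resamples=10_000, seed=0):
    """Resample ballots with replacement and record how often each name wins."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_resamples):
        resample = rng.choices(ballots, k=len(ballots))
        totals = tally(resample)
        top = max(totals.values())
        leaders = [name for name, pts in totals.items() if pts == top]
        for name in leaders:  # split credit evenly when the resample ties
            wins[name] += 1 / len(leaders)
    return {name: w / n_resamples for name, w in wins.items()}


# Hypothetical ballots, NOT the real 2009 voting.
toy_ballots = (
    [("A", "B", "C")] * 5 + [("B", "A", "C")] * 4 + [("C", "A", "B")] * 3
)
shares = bootstrap_win_shares(toy_ballots)
for name in sorted(shares):
    print(f"{name}: wins {shares[name]:.1%} of resampled elections")
```

Each resampled election stands in for a different panel of voters drawn from the same pool of opinion, so the win shares play the role of the probabilities this article is after.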
In the end I settled on a different kind of approach. To start with, I calculated both the mean and standard deviation of each player’s point total. I then used the normal distribution (which is applicable due to the Central Limit Theorem) to determine how likely it was that a player, given a certain “true” expected point total, would have scored as many points as he actually did in the award voting. For instance, if Lincecum’s true expected Cy Young point total among the universe of all writers was 90, what was the probability that he would have scored exactly the 100 points that he actually scored? In this case, about 2.4%. How about if Lincecum’s true average was 91? As expected, it's a little higher, at 2.7%. We do this for every potential “true” expected value of Lincecum’s point total.
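A quick sketch of that calculation, under two assumptions of mine: that Lincecum's standard error is the 9.2 points quoted later in this article, and that the "probability of scoring exactly 100" is read off as the normal density at 100 (roughly the probability of landing in a one-point-wide bin). The function name `likelihood_of_observed` is made up for this illustration.

```python
from statistics import NormalDist


def likelihood_of_observed(true_mean, observed, std_error):
    """Normal density of the observed point total, given a 'true' expected total."""
    return NormalDist(mu=true_mean, sigma=std_error).pdf(observed)


# Lincecum actually scored 100 points; 9.2 is the assumed standard error.
for true_mean in (90, 91):
    p = likelihood_of_observed(true_mean, observed=100, std_error=9.2)
    print(f"true mean {true_mean}: {p:.1%}")
```

With those assumptions, the two printed values round to the 2.4% and 2.7% quoted above.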
In the end, we want to determine the probability that Lincecum’s “true” expected point value was the highest of all the Cy Young contenders. The problem, of course, is that Lincecum’s point total is highly correlated with those of the other contenders (a point awarded to one pitcher is a point not awarded to another), so we can’t assume independence among the pitchers when computing this probability. Furthermore, determining the exact correlation between two players’ point totals is very difficult.
Instead, what we can do is estimate a point total required for victory, and calculate the odds of each player having a true value greater than or equal to this necessary total. In a two-person race, this necessary total is usually just the halfway point between the winner’s and the runner-up’s point totals. In a three-way or other type of race, the number is a little trickier to figure. In the end, we can determine the expected point value necessary to win by choosing the value for which the players’ probabilities sum to 100%. For example, in the 2009 NL Rookie of the Year voting, the point total “necessary for victory” was 100. The probability that Chris Coghlan, who actually scored 105 points with a standard error of about 12 points, had a “true” expected point value of 100 points or higher was 70%. For J.A. Happ, who scored 94 points, the probability of having a true point value of 100 or higher was just 30%. This means that based on the sample of 32 votes, there was a 70% chance that Coghlan really was the consensus choice for Rookie of the Year among a greater universe of voters, and a 30% chance that Coghlan just lucked into the award and that Happ actually had more support among all potential voters.
The 2009 Awards
How did the rest of the awards go? In the AL MVP, Joe Mauer won 27 out of 28 first-place votes and crushed Mark Teixeira with a point total of 387 to 225. In this case there was little doubt that the baseball writers as a whole preferred Mauer as the AL MVP, and this method shows Mauer with a virtually 100% chance of being the “true” writers’ choice. The same was true of the AL Cy Young, where Zack Greinke was almost certainly the writers' choice for the award.
In the NL, however, things went much differently. Lincecum scored 100 points and was the winner of the Cy Young. Carpenter scored 94 points, while Wainwright scored 90 points. If just a handful of voters had switched their first-place votes from Lincecum to either Carpenter or Wainwright, the outcome would have been different. So, what was the probability that Lincecum was truly the choice of the baseball writers as a whole? Lincecum scored 100 points with a standard error of 9.2 points. Carpenter scored 94 points with a standard error of 9.2 points, while Wainwright scored 90 points, with a slightly higher standard error of 10.5 points.
So what was the probability that each pitcher’s true point value was greater than the roughly 99 points that were required to win the award? Lincecum had a 53% chance of having a true expected point total above 99. Carpenter had a 28% chance, and Wainwright had a 19% chance. This analysis shows that because there were only 32 voters in such a close vote, the true writers’ choice could have been any of the three. In the end, Lincecum was the lucky one, garnering the most support from the 32 writers who actually had a vote. However, there is only a 53% chance that Lincecum had the most support from the hypothetical universe of all expert baseball writers. Carpenter or Wainwright may have been the one who actually “deserved” the award. Because only 32 writers are polled, we’ll never know who the true choice of the greater universe of writers was.
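The threshold search can be sketched in a few lines. This is my own reconstruction under stated assumptions, not necessarily the exact procedure used for the numbers above: each player's chance of a true total at or above a threshold is taken from a normal curve centered on his observed points, and the threshold is found by bisection so that the chances sum to 100%. The names `prob_true_at_least` and `victory_threshold` are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_true_at_least(threshold, observed, std_error):
    """Chance the 'true' total is at least `threshold` (normal approximation)."""
    return 1.0 - STD_NORMAL.cdf((threshold - observed) / std_error)


def total_probability(threshold, candidates):
    return sum(
        prob_true_at_least(threshold, pts, se) for pts, se in candidates.values()
    )


def victory_threshold(candidates, iterations=100):
    """Bisect for the point total at which the probabilities sum to 100%."""
    widest = max(se for _, se in candidates.values())
    lo = min(pts for pts, _ in candidates.values()) - 10 * widest
    hi = max(pts for pts, _ in candidates.values()) + 10 * widest
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if total_probability(mid, candidates) > 1.0:
            lo = mid  # sum too large, so the threshold must be higher
        else:
            hi = mid
    return (lo + hi) / 2


# (points, standard error) as quoted in this article for the 2009 NL Cy Young
cy_young = {
    "Lincecum": (100, 9.2),
    "Carpenter": (94, 9.2),
    "Wainwright": (90, 10.5),
}
threshold = victory_threshold(cy_young)
probabilities = {
    name: prob_true_at_least(threshold, pts, se)
    for name, (pts, se) in cy_young.items()
}
print(f"threshold: {threshold:.1f} points")
for name, p in probabilities.items():
    print(f"{name}: {p:.0%}")
```

Run on the three point totals and standard errors given above, this sketch lands on a threshold a little over 99 points and probabilities close to the 53%, 28%, and 19% quoted.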
Looking at the Rookie of the Year voting, we see similar uncertainty. The AL Rookie of the Year vote was fairly close, with Andrew Bailey winning 13 of 28 first-place votes and winning by a margin of 88-65 over Elvis Andrus and Rick Porcello. However, because of the small sample size, there is no guarantee that Bailey truly had the writers' backing. There was an 80% chance that Bailey was the true choice, but Andrus and Porcello also may have been the true RoY winners, with an 11% and 9% chance respectively. Meanwhile, the NL Rookie voting was a 70%-30% split, as I mentioned previously.
Probability of Being the True MVP/Cy Young/RoY 2003-2009
Below you can see the probability of being the “true” MVP, Cy Young, and Rookie of the Year for each league over the last several years.
As you can see from the chart, many MVP and Cy Young Award winners were not certain winners. Had a different set of writers been voting, things might have turned out differently. As a general rule, one cannot be sure that the MVP has been selected "correctly" unless one candidate has about a 70-point lead in the voting. For instance, in 2008, Albert Pujols garnered 18 first-place votes and bested Ryan Howard by 61 points in the voting. However, there was still a 2% chance that Albert won by luck and that Ryan Howard was the true writers' choice for MVP. A win of 40 points means that the winner had about a 90% likelihood of being the “true” MVP. Meanwhile, a win of 20 points corresponds to about a 75% probability of being the true consensus selection.
In the Cy Young or Rookie of the Year, the margins required are not as steep. A 50-point lead or more virtually guarantees that the right person got the award. A 20-point lead means that the winner had about a 90% chance of being the true consensus pick, a 10-point lead corresponds to a 70% probability, while a 5-point lead corresponds to about a 60% probability.
MVP Award Probability vs. MVP Award Shares
This system, which I'll call MVP Award Probabilities, is an alternative to the “MVP Award Shares” statistic, though they really measure separate things. In the Award Shares system, a player is given award shares even when it is clear that there was absolutely no chance that he was considered the most valuable player by the writers. For example, Mark Teixeira had an MVP award share of 57% this year, despite getting no first-place votes and being undeniably NOT considered the best player in the AL by the BBWAA. Additionally, players can have very similar award shares even when it is fairly clear that one player was the consensus pick. For example, in the 2008 NL MVP race, Albert Pujols had a 98% chance of being the “true” MVP, but the difference in award shares was not very great (82% to 69%).
This Award Probability system also has the advantage of handing out exactly one award: if you sum the award probability percentages, they add to exactly 100%. With this method, we can give Albert Pujols 98% of an MVP and Ryan Howard 2% of an MVP in the 2008 race. Though Howard certainly had some support among the writers for MVP, it was fairly clear that the consensus choice was Pujols, hence we give him credit for nearly an entire MVP award. In the case of the 2009 Cy Young Award, even though Lincecum won the award, there was only about a 50% chance that he “deserved” it. Hence, we can award him about 50% of a Cy Young Award. This is in contrast to 2008, when Lincecum was clearly the Cy Young choice of the writers over Brandon Webb.
In the end, these Award Probabilities are useful for giving out partial awards in years when there was no consensus award winner. Because the sample of voters is quite small, we often can't be sure who really had the biggest backing among baseball experts. Calculating these probabilities is an interesting way of accounting for this uncertainty.