Updating Preseason Predictions
We're coming up on three weeks into the 2009 season, and as usual there have been plenty of surprises. Here at Baseball Analysts, Patrick Sullivan has been breaking down the teams that have underperformed and overperformed their expectations. I'll be tackling the same subject from a purely numeric standpoint.
When a surprising start occurs, such as the Florida Marlins' remarkable first two weeks of the season, we have two strongly conflicting pieces of information. On one hand, the Marlins were predicted to be a very bad team (PECOTA's prediction had them winning 72 games) and such teams rarely turn out to be any good. On the other hand, the Marlins started the season 11-1, and teams that start 11-1 are rarely poor clubs. So how can we marry these two pieces of information to determine a ballclub's true skill level?
To do this, first we need some information about the accuracy of such preseason predictions. Baseball Prospectus' PECOTA predictions have been shown to be the most accurate out there, so let's take a look at their accuracy. From 2003 to 2008, the predictions had a root mean squared error of .053 points of WPCT, which means that the predictions were on target give or take about 9 games - not bad at all for preseason prognostication.
Next, we'll have to make sure the predictions aren't biased. PECOTA had major systematic problems in 2003 and 2004, causing the good teams to be overrated and the bad teams to be underrated. If Nate Silver had been setting the Vegas lines you could have cleaned up ('03 Yanks at 109 wins? I'll take the under, please). Eight of the top 10 predicted teams won fewer games than predicted, while eight of the bottom 10 predicted teams won more than predicted. It seems they forgot to regress their predictions to the mean, which would be a major factor in our work here. Luckily, since 2004 they've corrected the problem, and the over-under on their predictions for good and bad teams has been dead on.
So, for 2009 we can be fairly confident that the PECOTA predictions will be unbiased, and our best estimate for the error is about .053 points of WPCT (re-calculating the RMSE based on regressed 2003 and 2004 data reduces the RMSE slightly, but it's still about .053). However, a lot of this potential error in PECOTA's predictions is not PECOTA's fault. Teams play only 162 games in a year, and contrary to the old adage, it doesn't all even out over the course of a season. Even if we know the exact true WPCT of a team, there will still be substantial variation in that team's record. Using the binomial distribution, we can calculate that the standard error of a team's WPCT over a 162-game season is .039 points of WPCT (or about 6.3 games). So, even a perfect prognosticator who could tell you the true WPCT of every team in the league would be off by at least that much (this is over the long run - in the short run, of course, anything can happen).
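As a quick check on that figure, here is a minimal sketch of the binomial standard error. It assumes a true WPCT of .500, which the text does not state but which reproduces the .039 value; the function name is my own.

```python
import math

def wpct_standard_error(true_wpct: float, games: int) -> float:
    """Binomial standard error of an observed winning percentage."""
    return math.sqrt(true_wpct * (1.0 - true_wpct) / games)

# Assumption: an average team (true WPCT of .500) over a full season.
se = wpct_standard_error(0.500, 162)
print(round(se, 3))        # 0.039
print(round(se * 162, 1))  # 6.4 (rounding to .039 first gives the 6.3 quoted above)
```

The error shrinks only with the square root of games played, which is why even a full season leaves this much noise in the standings.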
So how much of the error is PECOTA's fault, and how much is random chance that can't be accounted for? If we subtract the variances, we can see that (.053)^2 - (.039)^2 = (.035)^2, meaning that PECOTA's estimate for the true winning percentage of each team has a standard error of .035.
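The subtraction works because independent sources of error add in variance, not in standard deviation. A small sketch using the rounded figures from the text (the variable names are mine):

```python
import math

rmse_total = 0.053  # PECOTA's RMSE against actual records, from the text
se_luck = 0.039     # binomial noise over 162 games, from the text

# Independent errors add in variance, so the prediction's own share is the difference.
se_prediction = math.sqrt(rmse_total ** 2 - se_luck ** 2)
print(round(se_prediction, 4))
```

With these rounded inputs the result comes out a shade under .036. The .035 used in the rest of this piece presumably comes from unrounded inputs, though that is my guess.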
Armed with this information we now have what we need to get started. When the Marlins started the season 11-1, this was indeed a very unlikely result - but now we can look at each potential true winning percentage to see the likelihood of the Marlins having that true WPCT. The following graph of WPCT distributions shows the results.
The green line indicates the distribution of the Marlins' likely true winning percentages based solely on their 11-1 record. Obviously, based on this information alone we would think the Marlins had an extremely high true WPCT - far higher than any major league team could possibly sustain. However, because relatively few games have been played, the distribution is wide, allowing for a wide range of true WPCTs. The red line indicates the likelihood that the Marlins have a particular true WPCT based on PECOTA's preseason prediction. PECOTA predicted the Marlins to have a WPCT of .444, so you can see that the distribution peaks at .444. This distribution is far narrower, reflecting the fact that we know that the true WPCT of an MLB team is almost always somewhere between .350 and .650.
The purple line takes account of both factors. By multiplying the probability of having a certain WPCT under the prediction distribution by the probability of having that WPCT under the game distribution, we can derive the probability of having a certain WPCT given both the prediction and the games played. As we can see, this final distribution is still normal-shaped, but it is shifted over, reflecting the fact that the Marlins' 11-1 start means that they are likely significantly better than we thought before the season began. The peak of this distribution is now at .471 - much better than .444, but still not over .500. Using this .471 mark to predict a win total in their remaining 150 games and adding it to their win total thus far, we would upgrade their predicted record from 72-90 to 82-80, based on their 11-1 start.
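The multiplication step can be sketched on a grid of candidate WPCTs. This is an illustration under my own assumptions, not necessarily how the graph was produced: the prediction curve is taken to be normal around 72/162 with a standard error of .035, the game curve is the binomial likelihood of an 11-1 record, and the function name is hypothetical.

```python
import math

def combined_distribution(pred_wpct, pred_se, wins, losses):
    """Multiply a prediction curve by a game-record curve on a grid of WPCTs."""
    grid = [i / 1000 for i in range(1, 1000)]  # candidate true WPCTs, .001 to .999
    weights = []
    for p in grid:
        # Assumed prediction curve: normal, centered on the preseason forecast.
        prediction = math.exp(-0.5 * ((p - pred_wpct) / pred_se) ** 2)
        # Assumed game curve: binomial likelihood of the record so far.
        # The binomial coefficient is constant in p, so normalizing removes it.
        games = p ** wins * (1 - p) ** losses
        weights.append(prediction * games)
    total = sum(weights)
    return grid, [w / total for w in weights]

grid, probs = combined_distribution(72 / 162, 0.035, wins=11, losses=1)
peak = grid[probs.index(max(probs))]
projected_wins = 11 + 150 * peak  # wins banked plus expected wins in 150 games
print(peak, round(projected_wins))
```

Under these assumptions the peak should land close to the .471 quoted above, with a projected total of about 82 wins.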
Using this methodology, PECOTA's 2009 predictions, and the current standings, we can make updated predictions for the rest of the 2009 season.
As you can see, two and a half weeks into the season, the preseason predictions still hold a lot of weight. The biggest changes in estimated true WPCT have been Toronto (+.021), Washington (-.019), Florida (+.018), and St. Louis (+.015). This changes the expected final standings as well: incredibly, the Seattle Mariners are now the favorite to win the AL West. In the AL East, we can see that Tampa has dug itself a major hole behind the Yankees and Red Sox and no longer appears to be their peer.
In the NL, we can see the toll that Florida's four-game losing streak has taken on their predicted true WPCT - when they were 11-1 their estimate was .471, but now they've been downgraded to .462. Elsewhere in the NL, the Dodgers have overtaken the Cubs as the best team in the NL, while the Pirates, despite their 9-7 start, remain baseball's worst (though Houston is now predicted to have the lowest number of wins).
So what happens as the season goes on? Obviously, the more games that have been played, the more weight they will have in the resulting distribution, and the less reliant we are on the preseason prediction. However, as we showed earlier, the standard error for the preseason prediction is .035, while the standard error due to random chance after 162 games is .039. What this means is that even after the season is over, the PECOTA prediction is still a more accurate predictor of a team's true talent than the actual record of the team over the course of 162 games! Based on the standard errors, PECOTA's predictions actually have the accuracy of about 204 major league games!
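The 204-game figure falls out of inverting the binomial standard error: find the number of games whose luck alone would produce an error of .035. A minimal sketch, again assuming a true WPCT of .500 (the function name is mine):

```python
def equivalent_games(pred_se: float, true_wpct: float = 0.5) -> float:
    """Games needed for the binomial standard error to shrink to pred_se."""
    return true_wpct * (1.0 - true_wpct) / pred_se ** 2

print(round(equivalent_games(0.035)))  # 204
```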
The following example shows the Chicago White Sox of last year. In this case PECOTA predicted a 77 win season while they actually won 89 - so what's the best estimate of their true WPCT? The following graph shows the result.
As you would expect, the best estimate of the true WPCT is somewhere in the middle (.507). Not only is the final distribution in between the other two, but it is also narrower, with a higher peak and shorter tails. This is because with both pieces of information, we now have more confidence in our estimate of the White Sox' true WPCT. The standard error of the White Sox' final true WPCT estimate is .026, which is better than either the standard error of the PECOTA estimate or the standard error from luck of playing 162 games (actually 163 games for the 2008 White Sox!).
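When both curves are treated as normal, the combined estimate reduces to a precision-weighted average. The sketch below makes that simplifying assumption, uses .25/games as the variance of the observed record, and takes the White Sox' 2008 record as 89 wins in 163 games; the function name is hypothetical.

```python
import math

def combine_estimates(pred_wpct, pred_se, wins, games):
    """Precision-weighted blend of a forecast and an observed record."""
    obs_wpct = wins / games
    obs_se = math.sqrt(0.25 / games)  # assumes a true WPCT near .500
    w_pred = 1.0 / pred_se ** 2       # precision = 1 / variance
    w_obs = 1.0 / obs_se ** 2
    mean = (w_pred * pred_wpct + w_obs * obs_wpct) / (w_pred + w_obs)
    se = math.sqrt(1.0 / (w_pred + w_obs))
    return mean, se

mean, se = combine_estimates(77 / 162, 0.035, wins=89, games=163)
print(round(mean, 3), round(se, 3))  # 0.507 0.026
```

Because precisions add, the combined standard error is always smaller than either input, which is why the final curve is the narrowest of the three.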
All in all, this is a simple yet powerful way to calculate a team's true skill level based on preseason predictions and the actual games played thus far. This would make it ideal for creating the "power rankings" that every sports-related publication seems to release. Of course, it doesn't take into account things like a team's Pythagorean WPCT, trades, or injuries (though these are built into the variance), but this gives a great quick estimate of a team's true skill level based on just two simple pieces of information.
This result also shows just how powerful good preseason predictions are. However, the weight of the preseason prediction is not limited to just PECOTA - even a casual fan's prediction will likely have a weight of over 100 MLB games, which is why fans and commentators "don't believe" in a team even after it has won a lot of games over a 162-game season. Likewise, it's why people can still consider a team dangerous even after a finish around .500. They know that their "gut" perception of a team is actually about as indicative of a team's true talent as the team's record.
As the season goes on and even after it's over, we can keep updating these estimates to keep track of how our perceptions and reality converge to get an estimate of a team's true talent level.