Spring Training, PECOTA, and the Regular Season
Over at Sports Illustrated last week, I wrote an article on how spring training records aren't all that meaningless. It's been a blast writing over at SI.com, but one of the downsides is that I can't delve into as much nitty-gritty as I can here. When I run a regression or do a study, I like to be able to report p-values, standard errors, and the other metrics baseball analysts use to assess a study's validity. I know it would be tough for me to take a study seriously without those kinds of numbers, so I'm going to provide some of that detail here. The discussion is particularly salient in light of Richard Lederer's recent criticism and discussion of PECOTA.
If you haven't read my original article, the point of my study was to determine whether spring training games have any predictive value at all. Like most fans, I was of the mind that spring stats and standings have pretty much no bearing on what happens during the regular season. David Cameron had a piece over at Fangraphs last week saying as much (though based on anecdotal evidence only). I set out to find out whether it was true.
To measure the impact of spring training, I first needed a "gold standard" prediction. For this I used Baseball Prospectus' PECOTA projections. If spring training data could improve on PECOTA's predictions, I would feel confident in saying that spring training could really be worth a second look.
To do this, I ran a regression predicting each team's regular-season WPCT, going back to 2003. Obviously the PECOTA prediction was one key variable. The second variable, the one of real interest, was how much a team over- or underperformed in spring training, measured as (Spring Training WPCT - PECOTA WPCT).
The results of the model are below:
This gives the formula:
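As a sketch, the regression setup looks like the following. The numbers here are made up purely for illustration; the actual study used real team-seasons from 2003 on, and the fitted coefficients reported above are not reproduced here.

```python
import numpy as np

# Made-up data for illustration only -- the real study used actual
# team-seasons from 2003 onward.
pecota_wpct = np.array([0.580, 0.540, 0.510, 0.490, 0.460, 0.420])
spring_wpct = np.array([0.620, 0.500, 0.560, 0.450, 0.500, 0.400])
actual_wpct = np.array([0.590, 0.530, 0.525, 0.480, 0.470, 0.415])

# The second predictor: how much each team over- or underperformed
# its projection during spring training.
spring_diff = spring_wpct - pecota_wpct

# Design matrix: intercept, PECOTA projection, spring over/underperformance.
X = np.column_stack([np.ones_like(pecota_wpct), pecota_wpct, spring_diff])

# Ordinary least squares fit of actual WPCT on the two predictors.
coefs, residuals, rank, _ = np.linalg.lstsq(X, actual_wpct, rcond=None)
intercept, b_pecota, b_spring = coefs
print(f"intercept={intercept:.3f}, PECOTA={b_pecota:.3f}, spring={b_spring:.3f}")
```

A positive, significant coefficient on the spring variable is what tells us spring records carry information beyond the projection itself.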
As we see, the spring training variable is significant and positive even after accounting for a team's expertly projected WPCT. That means spring training records really do have some predictive value: they add to our prior knowledge of a team's skill. As I wrote last week, the most surprising spring training teams should adjust their projections by about 3 games.
One important caveat: while adjusting a team's projected WPCT with spring stats is a statistically significant improvement, the boost in accuracy is small. The Root Mean Squared Error (RMSE) drops from .055 using PECOTA alone to .054 using PECOTA plus spring training records. That issue plagues any projection system: even variables that really are important and really do increase accuracy move the needle only slightly. To drive the point home, PECOTA's .055 RMSE is not even all that much better than just predicting that every team will go .500. The Everybody Plays .500 Projection System has an RMSE of .070.
PECOTA will be correct within 9 games 67% of the time, while the Everybody Plays .500 System will be correct within 11 games 67% of the time. The difference between one of the top projection systems and knowing absolutely nothing is not all that great. That's not a knock on PECOTA, it just underscores the fact that it's really difficult to predict what's going to happen. Knowing spring training records is an improvement, but it still leaves us relatively in the dark.
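Those "within N games" figures follow directly from the RMSEs: over a 162-game season, an error of one RMSE in winning percentage translates to RMSE × 162 games, and if the errors are roughly normal, about two-thirds of outcomes land within one RMSE of the prediction. A quick check of the arithmetic:

```python
# Convert an RMSE expressed in winning percentage into a "within N games"
# band for a full 162-game season. Under roughly normal errors, about
# two-thirds (~67%) of outcomes fall within one RMSE of the prediction.
GAMES = 162

def rmse_to_games(rmse_wpct: float) -> float:
    """One-RMSE error band, expressed in games."""
    return rmse_wpct * GAMES

print(rmse_to_games(0.055))  # PECOTA: ~8.9 games, i.e. "within 9 games"
print(rmse_to_games(0.070))  # Everybody Plays .500: ~11.3, "within 11 games"
```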
Do We Have to Regress PECOTA?
Another interesting thing I found along the way is that PECOTA's predictions may be overzealous. I had assumed that PECOTA did not regress to the mean in the 2003 and 2004 seasons, when it was predicting the Yankees to win 109 games. Baseball Prospectus said they made some major overhauls, and I assumed this was one of them. In the analysis above, I corrected for it by regressing the '03 and '04 projections to the mean myself. The problem was not nearly as bad in subsequent years, and I assumed it had been fixed. It seems, however, to persist.
If PECOTA's predictions were unbiased, regressing actual WPCT on PECOTA's projected WPCT would yield a slope of 1 and an intercept of 0. Using just 2005-2009 data, that is not what we see: the intercept is a quite significant .10 (p-value of .02), while the coefficient on PECOTA is .8, where it should be 1. In essence, PECOTA has been overzealous in its predictions. If it projects a team to go 10 games over .500, the best statistical estimate is that the team goes 8 games over .500. When betting against PECOTA, it pays to take the under on good teams and the over on bad teams.
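The 10-games-over-to-8-games-over claim is just the fitted line applied to a projection. Plugging the .10 intercept and .8 slope from above into a small worked example:

```python
# Shrink a projection using the regression line estimated from 2005-2009:
#     actual_wpct ~= 0.10 + 0.8 * projected_wpct
GAMES = 162

def debias(projected_wpct: float, intercept: float = 0.10,
           slope: float = 0.8) -> float:
    """Best statistical estimate of actual WPCT given a PECOTA projection."""
    return intercept + slope * projected_wpct

projected = 0.5 + 10 / GAMES            # 10 games over .500 -> ~.562
adjusted = debias(projected)            # shrunk back toward .500
games_over_500 = (adjusted - 0.5) * GAMES
print(round(games_over_500, 1))         # -> 8.0 games over .500
```

The symmetry works in the other direction too: a team projected 10 games under .500 shrinks to about 8 games under, which is why the betting advice cuts both ways.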
The chart above shows the year-by-year PECOTA-to-WPCT regression coefficient, where the ideal is 1. As you can see, from 2005-2007 PECOTA accounted well for the regression effect, but in the past two years it has gone downhill. While luck can wreak havoc with any projection system, the problem is beginning to look a little more systematic. The 2010 projections seem to pass the eyeball test (the Angels notwithstanding), but I'll be curious to see whether the problem persists in 2010 as well. And as I showed above, it wouldn't hurt for them to fold spring training stats into their projections, either.