An Unfiltered Interview with Nate Silver
I met Nate Silver in the summer of 2003 at a Baseball Prospectus Pizza Feed in Anaheim. We sat together and discussed nothing but baseball for a couple of hours, developing a mutual respect that continues to this day.
After graduating from the University of Chicago in 2000 with a Bachelor of Arts in Economics, Nate "took one of those consulting jobs that an economics grad from the U of C might be expected to take." He started working on PECOTA in 2002 and found a couple of advocates for it in Gary Huckabay and Keith Woolner. Baseball Prospectus was looking for a new projection system in anticipation of launching its premium service and purchased PECOTA from Nate in 2003. A year later, Gary left BP to pursue a series of consulting opportunities both inside and outside baseball and Nate took over as Executive Vice President of the company.
I caught up with Nate over the weekend to discuss all things PECOTA. Grab a cup of coffee, pull up a chair, and enjoy.
Rich: PECOTA - Player Empirical Comparison and Optimization Test Algorithm. My goodness, that is a mouthful. How did you come up with that?
Nate: The original version of PECOTA was just for pitchers - I felt at the time like there was much more room for improvement in the realm of pitcher forecasts. It was only after I'd pitched the idea to Gary Huckabay, who was running BP at the time, that he said "hey kid, you'd better come up with a hitter version too." So the 'P' in the garbled acronym that you see above originally stood for 'Pitcher,' and from there it was just a matter of stringing together various words and letters that seemed relevant enough to come up with a catchy acronym. I think the finalists were PECOTA and PETRY - you can see the influence of American League Baseball circa 1987. But I couldn't come up with anything to match the 'Y' in PETRY.
RL: Bill Pecota and Dan Petry. I like that Pecota won out. He seems more comparable to Mario Mendoza, the player behind - or right on top of - the Mendoza Line.
NS: Sure, but when I was growing up in Michigan and listening to Ernie Harwell, I didn't think of him that way - it seemed like Bill Pecota was always a real thorn in the side of the Tigers. And now, thanks to the magic of Retrosheet and Baseball Reference, we can look back and see that this wasn't entirely my imagination - Pecota was a .303 lifetime hitter against Detroit.
RL: Did Bill James and his similarity scores have any influence on your work?
NS: The basic idea behind PECOTA is really a fusion of two different things - James's work on similarity scores and Gary Huckabay's work on Vlad, BP's previous projection system, which tried to assign players to a number of different career paths. I think Gary used something like thirteen or fifteen separate career paths, and all that PECOTA is really doing is carrying that to the logical extreme, where there is essentially a separate career path for every player in major league history. The comparability scores are the mechanism by which it picks and chooses from among those career paths.
RL: Go on.
NS: There are some differences, though, between backward-looking similarity scores and forward-looking scores like the ones that PECOTA applies. James, I don't think, really intended for his similarity scores to be used for projection purposes; instead they were introduced in The Politics of Glory as a way to assess a player's fitness for the Hall of Fame. If you're trying to determine whether Tim Raines should be in Cooperstown, the fact that he was 5'8" shouldn't really matter. But if you're trying to figure out how Dustin Pedroia is going to develop, the fact that he's 5'8" does matter.
RL: OK...so tell me, in addition to body type, how many factors do you take into account and which ones have the most impact universally?
NS: There are currently 13 different comparability factors in place for position players and 12 for pitchers, one of which changes depending on whether we're looking at a major league or minor league player. We use MLB career length for major leaguers but since this isn't relevant for a prospect we use the level he played at instead (that is, PECOTA prefers to compare a Double-A player to other Double-A players).
The most important variables for hitters are batting average, walk rate, isolated power, strikeout rate, speed score, and position. The most important for pitchers are strikeout rate, walk rate, isolated power allowed, and usage pattern. But the weights are really fairly flat - there's not any one factor that dwarfs the others in importance.
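PECOTA's actual factors, scales, and weights are proprietary, so what follows is only a generic sketch of the idea described above: score each historical player against the target on a handful of factors with fairly flat weights, then keep the closest matches. The stat names, scale values, equal weights, and the `similarity` and `top_comps` helpers are all hypothetical and are not PECOTA's method.

```python
import math

# Hypothetical comparability factors with deliberately flat (equal) weights.
# PECOTA's real factors, scales, and weights are proprietary and not shown here.
WEIGHTS = {"avg": 1.0, "bb_rate": 1.0, "iso": 1.0, "k_rate": 1.0, "speed": 1.0}
# Hypothetical "typical spread" of each stat, used to put all factors on one scale.
SCALES = {"avg": 0.030, "bb_rate": 0.030, "iso": 0.050, "k_rate": 0.050, "speed": 1.5}


def similarity(a: dict, b: dict) -> float:
    """Return a 0-100 score; 100 means the two players match on every factor."""
    total = sum(WEIGHTS.values())
    # Weighted average of scaled absolute differences across all factors.
    dist = sum(WEIGHTS[f] * abs(a[f] - b[f]) / SCALES[f] for f in WEIGHTS) / total
    return 100.0 * math.exp(-dist)


def top_comps(player: dict, pool: dict, n: int = 3) -> list:
    """Rank the historical players in `pool` by similarity to `player`."""
    scored = [(name, similarity(player, stats)) for name, stats in pool.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]
```

Because no single weight dominates, a player who differs sharply on just one factor (say, isolated power) can still drop well down the list, which mirrors the "fairly flat" weighting Nate describes.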
RL: Why have you chosen to keep the detailed formulas proprietary?
NS: The short answer is that we're trying to make a living off this stuff, and we're reluctant to give away trade secrets. I know that there's a strong tradition of 'open source' in the sabermetric community, which goes all the way back to Bill James, and I recognize and appreciate that. BP at times has been guilty of being too aloof from the sabermetric community, which is something that we're trying to reverse. We've debuted the Unfiltered blog, and we hope to add some community and forum features within the next couple of months. I think we've gotten a lot better about citing other good work in the field, whether it's work that you've done or the Hardball Times does or John Dewan does or what we might read at Baseball Think Factory or on Sons of Sam Horn. We're talking to people like Ron Shandler and Bill James on our radio program. We're running excerpts from 'The Book' on our website and telling everyone within earshot that they needed to buy it six months ago. But we are trying to run a business - for many of us, Baseball Prospectus is all that we do.
RL: As an owner and operator of a business myself, I can appreciate that.
NS: I would also argue that, although we haven't literally given away the formulas and algorithms, PECOTA is perhaps the best-explained projection system in history. There were long essays about PECOTA's methodology in the 2003, 2004 and 2006 annuals, and another in the Kevin Maas chapter of Baseball Between the Numbers. There's a comprehensive glossary up on the website and there have been numerous questions we've fielded about it in chats and articles over the years. The largest barrier to reverse engineering PECOTA, frankly, is not the ingeniousness of the formulas or anything like that but really just the amount of work that it has required in terms of fitting all the puzzle pieces together. I've personally put well over one thousand man-hours into PECOTA, and that's before accounting for things like the Davenport Translations or VORP that feed into the system.
RL: How often do you tweak the system? Can you isolate how much these revisions have improved the overall projections from one year to the next?
NS: I'm a perfectionist about this stuff, and so we're making improvements pretty much constantly. A lot of this is accomplished through trial and error. For example, the forecast that we ran for Homer Bailey based on the previous version of the system looked unduly pessimistic to me, so I went back and said "hmm, are there any assumptions that we're making about Bailey that might not be treating him fairly?" It turned out that there was one such assumption. Bailey improved a lot from 2005 to 2006, and if a veteran pitcher had experienced that sort of improvement, you'd want to regress it back to the mean fairly heavily. But we found out that a 21-year-old pitcher shouldn't be treated the same way as a 31-year-old pitcher - if a 21-year-old improves markedly from year to year, there's a much better chance of most of that improvement sticking. So we made this fix, ran the pitcher projections again, and I think Bailey ended up with something like 30 points shaved off his ERA. PECOTA still thinks that there's a much bigger difference between Bailey and, say, Phil Hughes than most people give credit for, but they're closer than they were before.
So in some sense, there's as much art in PECOTA as there is science - you need to be able to ask questions and test assumptions based on watching baseball games, watching players develop, and talking to people both inside and outside BP. If you outsourced PECOTA to a bunch of tech geeks in Bangalore who didn't have the horse sense to say "Homer Bailey's forecast looks wrong to me," you wouldn't have the same system.
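To illustrate the Bailey fix in the abstract, here is a minimal sketch of age-dependent regression to the mean. It assumes a hypothetical linear ramp in which a 21-year-old keeps 70% of a one-year ERA improvement and a 31-year-old keeps only 30%. Those numbers and the `retained_fraction` and `projected_era` helpers are illustrative assumptions, not PECOTA's actual adjustment.

```python
def retained_fraction(age: int) -> float:
    """Hypothetical share of a one-year change that is treated as real.

    Younger pitchers keep more of a jump; the linear ramp between ages 21
    and 31 is purely illustrative, not PECOTA's actual curve.
    """
    young_age, old_age = 21, 31
    young_keep, old_keep = 0.7, 0.3
    if age <= young_age:
        return young_keep
    if age >= old_age:
        return old_keep
    # Interpolate linearly between the young and veteran retention rates.
    t = (age - young_age) / (old_age - young_age)
    return young_keep + t * (old_keep - young_keep)


def projected_era(prior_era: float, latest_era: float, age: int) -> float:
    """Move from the earlier ERA level toward the latest one, scaled by age."""
    keep = retained_fraction(age)
    return prior_era + keep * (latest_era - prior_era)
```

With a prior ERA of 4.50 and a new ERA of 3.50, this sketch projects 3.80 at age 21, 4.00 at age 26, and 4.20 at age 31: the same improvement is regressed more heavily the older the pitcher is.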
RL: I would agree. You need to know the ins and outs of your business to know if something passes the smell test.
NS: Probably the most challenging part of running an independent business like BP is that you necessarily need to be a jack of all trades. It seems to me the only way to get anything done is to be willing to trust your intuition.
RL: What have been the best improvements since the original formula was introduced in 2003?
NS: Some of the improvements are simply a matter of gaining access to new types of data. Being able to look at groundball/flyball numbers for pitchers, for example, which we didn't do originally, is a great help. Or, we can look at play-by-play data to develop a better version of speed scores, since we know exactly how many opportunities a guy had to ground into a double play instead of just approximating.
Still, the improvements that I'm proudest of are things related to long-term player valuation, such as MORP and the detailed five-year forecasts and the Upside score. There are a lot of systems that can put together a pretty reasonable forecast for a player but fewer that can give you a sound idea of what that forecast really means in the bigger scheme of things. And besides, it's kind of cool to know how many triples Howie Kendrick is going to hit in 2010.
RL: What's the current over/under?
RL: Are you taking any futures bets on this one?
NS: No, but I'd take the over. I remember seeing a game at Comiskey Park last year where Kendrick went 0-for-5. And even in that performance, he looked like a future star; I loved his stroke, his plate coverage, the way that he got out of the box. But now I'm starting to sound like a scout.
RL: If I understand correctly, your database goes back to 1946. Is that a matter of availability or is there another reason why you have chosen to use post-World War II players only?
NS: There are some kinds of data, like the groundball-flyball stuff, that just aren't available for the years before 1957 that Retrosheet hasn't covered yet. Still, you can always guesstimate this data where you don't have it, as we do for the 1946-1956 players in our system.
Really, the decision not to look at the pre-WWII data has mostly to do with the feeling that baseball after WWII is more similar to modern baseball than it is different, and that baseball before WWII was more different than it is similar. If you look at World War II and maybe the ten or fifteen years that followed it, you have a huge number of different things that are happening. The integration of the game, both to black players and to Latin America. Night baseball. Relief pitching. Expansion. The evolution of the farm system. Increased consistency in ballpark architecture. The improvements in nutrition and the standard of living made possible by the prosperity boom of the '50s. The professionalization of baseball. The very earliest work in sabermetrics that people like Allan Roth were doing. All of these things were happening more or less at the same time, and World War II is as convenient a cutoff point as anything.
RL: You incorporate both minor league and international baseball statistics into your forecasts. Have you given consideration to using college stats, adjusted for level of competition, strength of schedule, ballparks, and pitchers/batters faced, in your projections for younger players?
NS: I'd love to look at college stats, but thus far, neither Clay nor I have put in the work to build credible translations. I also worry a bit about the aluminum bat thing. I know that Kevin Goldstein is convinced that there are a lot of good aluminum bat hitters that just won't make the transition to wood. It also perhaps changes pitching philosophy, requiring you to work around hitters a bit more. Someone like Justin Verlander, for example, had some pretty high walk rates in college, but that control got much better once he was pitching to wood bats and with a professional defense behind him.
RL: Comparing college and professional baseball may not be apples to apples but it's not fruits to vegetables either. Scouting is obviously important when it comes to evaluating amateur talent. But I wouldn't dismiss performance analysis, especially when play-by-play data becomes standard. Put me in charge and I would place a lot of weight on strikeout, walk, and groundball rates at the college level and even in high school, for that matter. Combining this information with the 20-80 scouting reports would be very helpful in my mind.
NS: We could do some incredibly interesting things if we had a complete set of 20-80 scouting reports to look at. As long as you're able to quantify something, we should be able to incorporate it into PECOTA, and we're already emphasizing things like speed score and body type and age relative to league that were once considered more in the scouting domain. But I'm not keeping my fingers crossed waiting for some team to gift their scouting database to us.
RL: How do splits come into play when it comes to PECOTA? Aside from using a pinpoint ballpark factor, is the system sophisticated enough to differentiate between RHB vs. LHB and RHP vs. LHP when analyzing player performance at home and on the road?
NS: Our park factors are very detailed, but we could probably do a bit more with LH/RH splits. If you look at someone like Jason Michaels, about half his at-bats while he was with the Phillies were against left-handed pitchers. That percentage went way down in Cleveland since he was being used as more of an everyday player and - surprise, surprise - his numbers got a lot worse. I was hoping to get around to this for this year and didn't, but it's high on the agenda for 2008.
RL: Drilling down even deeper, can PECOTA take into account spray charts to determine if a player is a pull hitter vs. an opposite-field type as a factor in determining how a player might perform at one home ballpark vs. another, especially in the event of a trade?
NS: What we're going to see over the next three or five years is a whole revolution in the way that data on the baseball field is described and quantified - we'll go beyond simply recording the outcome of every play into literally tracking the movement of every object on the baseball field from start to finish. I'd certainly hope to incorporate as much of that stuff as possible into PECOTA, but it's a lot of work, and I'd probably like to wait a year or two for the data to standardize before we do so.
RL: Switching gears here a bit, explain the idea behind upside, as well as the percentages assigned to breakout, improvement, collapse, and attrition.
NS: Upside is explained at great length in this article. The basic idea is to look at the probability of a player being an above-average performer at the major league level in the years during which he's still under club control (that is, before he becomes a free agent). This is really what you're hoping for when you're investing in scouting and development - that a player will be both very good and comparatively cheap. Upside doesn't give you any credit just for "being there," like Luis Rivas or somebody.
What the breakout, collapse and improvement numbers do is look at how a player's performance is likely to change relative to his 'baseline.' This can be confusing because 'baseline' implies looking at his last three years of performance, rather than just what he did in 2006. So Hanley Ramirez, for example, has a breakout rate of 31%, which is very high, even though PECOTA actually expects his performance to be a bit worse than last year. His 'breakout' is not in achieving a new level of performance (although this is possible) so much as it is sustaining a performance that might seem unlikely based on his longer track record. We've experimented with a lot of different definitions of breakout rates and I'm convinced that this is the most helpful version, even though it can sometimes be counter-intuitive.
'Attrition' is really in a different family than breakout, improvement and collapse, in that it measures prospective changes in playing time rather than performance. Specifically, it's attempting to estimate the probability of a radical decrease in playing time. This could be because of injury, but it could also be because the player gets benched or suspended, retires, starts spending too much time hanging out with Jeff Juden, and so forth.
RL: Does it make sense for the percentages to exceed 100?
NS: Yes, because breakout rate is a subset of improvement rate. Improvement rate is the chance that a player's performance improves at all relative to his baseline; breakout rate is the chance that it improves a lot.
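A minimal sketch of why the percentages can add up to more than 100, assuming the rates are read off a set of simulated next-season outcomes for some performance metric, and using a hypothetical 20% swing from baseline as the cutoff for a breakout or a collapse (the real thresholds aren't given in the interview). Every breakout also counts as an improvement, so those two rates overlap. Attrition is left out because it measures playing time rather than performance. The `change_rates` helper is illustrative only.

```python
def change_rates(outcomes, baseline, big=0.20):
    """Return (improve, breakout, collapse) as percentages.

    `outcomes` are simulated next-season values of a hypothetical performance
    metric (higher is better); `big` is an assumed 20% cutoff for a large swing.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one simulated outcome")
    # Any gain over baseline counts as improvement...
    improve = sum(o > baseline for o in outcomes) / n
    # ...and a breakout is just the subset of improvements that are large.
    breakout = sum(o > baseline * (1 + big) for o in outcomes) / n
    collapse = sum(o < baseline * (1 - big) for o in outcomes) / n
    return 100 * improve, 100 * breakout, 100 * collapse
```

For a baseline of 100 and outcomes of 70, 90, 105, 110, 125, and 130, the sketch reports about 67% improvement, 33% breakout, and 17% collapse, which together come to roughly 117% because the two breakout cases are counted twice.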
RL: Which competitive projection systems do you value the most?
NS: I'm reluctant to name too many names because I feel like I'll unwittingly leave someone out, but if the PECOTAs weren't around, the first system I'd look at would probably be the projections that Tom Tippett does for Diamond Mind.
RL: Last year, PECOTA outperformed the other methodologies in predicting hitting (using OPS as the gauge) but it fell a bit short on the pitching side of the ledger. Is that a one-year aberration or is there something in the former or latter that makes PECOTA better or worse than the others in forecasting hitting and pitching results?
NS: We were pleased with the results that we saw in the study you're referencing - PECOTA had a large lead for position players and was a very close second for pitchers. With that said, it wasn't our study, and if you change the assumptions, you might get a different result. In particular, that study looked only at pitchers who threw at least 100 innings, which creates a pretty substantial selection bias and tends to favor systems that err on the optimistic side. The one system that did better in that study on the pitching side was ZiPS. While I like the work that Dan has done a great deal, if there's one criticism I have of ZiPS it's that it seems systematically to be too optimistic for pitchers. I know that when we've done our own studies on forecast accuracy, PECOTA seems to have a larger comparative advantage for pitchers than it does for position players.
RL: Speaking of pitchers, PECOTA missed Jered Weaver so badly (6-9, 5.03 ERA with less than a 2:1 K/BB ratio vs. actuals of 11-2, 2.56, and better than 3:1 K/BB), perhaps that deviation alone caused it to perform not as well as others in the pitching department? [laughs]
NS: Yeah, that's not one of the projections we're proudest of.
RL: With respect to Weaver, why do you suppose the system broke down as it did?
NS: Weaver is a hard pitcher to find comparables for. He's extremely tall, and usually pitchers who are very tall tend to generate a lot of downward break and post strong groundball numbers. Instead Weaver is one of the more extreme flyball pitchers in recent memory. Since PECOTA is a comparables-driven system, its results always need to be interpreted carefully when this kind of thing comes up.
RL: I mentioned Chris Young was a good comp for Weaver a couple of years back. Both are very tall righthanded pitchers with outstanding control and high flyball rates.
NS: There seems to be a whole generation of Very Tall Pitchers these days. You've got Weaver and Young and Andy Sisco and Jon Rauch, and a whole host of guys in the minor leagues. I think that shows you how much a single success story like Randy Johnson can influence the conventional wisdom on this sort of subject.
RL: There is an old saying, "If you have to forecast, forecast often." Is there something inherently wrong when your five-year forecasts vary as widely from one year to the next as in the case of Weaver (who went from not winning more than six games in any one season with an average ERA over 5 to winning 11-13 games per year with ERAs ranging from 3.62-3.82 over the next five campaigns)?
NS: Well, I'm a big believer in the fact that a pitcher has a very large wall to climb between the minors and the majors. So to see that Weaver held up so well in the majors - that his flyball tendencies didn't translate into significant problems with the longball, for example - was very important. PECOTA doesn't give a lot of credit to any minor league pitchers unless they're Philip Hughes or Felix Hernandez good, since the attrition rates are pretty damn high. It's much less stubborn once they've made the leap to the majors.
RL: For Jered's sake, I hope he doesn't follow the same course as Don Wilson, one of his four comps this year.
NS: Yeah, that's one of those things that's hard to wrestle with. What happens when a player draws Don Wilson or J.R. Richard or Lyman Bostock as a comparable? What happens when a player whose primary vice is Yoo Hoo! gets comped to one of the 1980s players who ruined his career because of cocaine? In other words, should there be some sort of exception for tragic circumstances? Right now, the only exception we make is for players that served in the Korean War, who are treated as 'missing' in the dataset rather than zeroes. There's an argument that we should expand that definition. On the other hand, tragic circumstances are a part of life, and there's probably that residual 1% or 2% chance that a player's career gets ruined for reasons having nothing to do with what takes place on the field.
RL: Alrighty. I'll get off my Weaver bandwagon here. The variability in forecasting is obviously huge, especially among younger players. Let's take a look at Elvis Andrus as an example. How in the world can you list Luis Rivas and Miguel Cabrera as two of his four best comps? Isn't that like saying a college co-ed might wind up looking like either Rosie O'Donnell or, then again, Jessica Alba?
NS: Rich, I don't know if you've been to your high school reunions - I've managed to skip mine - but I wouldn't be the least bit surprised if the prom queen had put on 70 pounds of weight, or the awkward, bookish-looking girl had turned into a hottie. A lot of things can happen between the time a person is 17 or 18 and the time they reach their mid-20s.
RL: I like your idea of attaching a beta to help explain the variability in the comps. But you've got Garrett Atkins as either the next coming of Cal Ripken or Ken McMullen with a beta of just 0.93. What am I missing here?
NS: A lot of it is simply that extreme variation is really the norm. Even for a relatively established player like Atkins, you're going to have some Cal Ripken scenarios (at least on the offensive side - Atkins leaves a lot to be desired with his glove) and some Ken McMullen scenarios. It's good that this is the case, because otherwise baseball wouldn't be much fun.
RL: I think I can speak for most readers and say it has been a pleasure reading you more often via the Unfiltered blog.
NS: Thanks, Rich. As I've said, this was something that was long overdue.
RL: In your latest post, you wrote about the limitations of major league equivalencies (MLE) and discussed the term Major League Pace or MLP for short. I like the latter approach because I maintain that the MLE PECOTA forecasts are overly optimistic for younger players and believe the MLP is a better way of thinking about them.
NS: Yeah, I have some trouble with the notion that Clayton Kershaw, say, could post a 3.91 ERA in Dodger Stadium *next year*. I have less trouble with the notion that he could post a 3.51 ERA in Dodger Stadium in 2011.
RL: Yes, I agree wholeheartedly with those notions. Would it be worthwhile to attach a 'confidence' factor to the PECOTA projections?
NS: We do have something called the 'Similarity Index' that serves to accomplish this function. An unremarkable major league pitcher like Ted Lilly or Matt Morris will usually have a similarity score in the mid-50s. Kershaw's similarity score is a big, whopping ZERO. You just don't see a lot of 18-year-old kids that post a 54/5 strikeout-to-walk ratio in their first professional season.
RL: Well, Nate, I'm confident that we have covered the ins and outs of PECOTA. Thank you for sharing your projection system with us.
NS: Thanks, Rich.