The Great Discussion
Sabermetrics. The term, derived from the acronym for the Society for American Baseball Research, was coined by Bill James in the 1980 Baseball Abstract: "A year ago I wrote in this letter that what I do does not have a name and cannot be explained in a sentence or two. Well, now I have given it a name: Sabermetrics, the first part to honor the acronym of the Society for American Baseball Research, the second part to indicate measurement. Sabermetrics is the mathematical and statistical analysis of baseball records."

James admitted to me in our interview in December 2004 that his original meaning was "not a very good definition." Bill said he had recently stumbled across an even worse definition in a dictionary ("the computerized use of baseball statistics") because "computers don't have anything to do with it." The Senior Baseball Operations Advisor for the Boston Red Sox was pleased to learn that he had written, in the 1981 Baseball Abstract, that "good sabermetrics respects the validity of all types of evidence, including that which is beyond the scope of statistical validation."

"I'm glad to know I wrote that back then," James said. "In the wake of Moneyball, some people have tried to set up a tension in the working baseball community between people who see the game through statistics and people who see it through scouting. There is no natural tension there. There's only tension there if you think that you understand everything. If you understand that you're not really seeing the whole game through the numbers, or you're not seeing the whole thing described through your eyes, there is no real basis for tension, and there's no reason for scouts and analysts not to be able to talk and agree on things."

A year after The Great Debate, hosted by Alan Schwarz of Baseball America, I gathered three top baseball minds in the hopes of advancing the discussion beyond the idea that sabermetricians are nothing more than statheads. Joining me today are Tom Tango, Mitchel Lichtman, and Eric Van.
Tom (aka Tangotiger) and Mitchel (MGL), along with Andy Dolphin, recently published The Book: Playing The Percentages in Baseball. The Book is aimed at coaches, managers, and front office executives, as well as baseball fans interested in strategies such as batter/pitcher matchups, platooning, the sacrifice bunt, base stealing, and much more.

All three of my guests are noted sabermetricians. Tom works full-time in computer systems development and part-time as a consultant to major-league teams (currently in the NHL, formerly in MLB); Mitchel is currently senior analyst for the St. Louis Cardinals; and Eric, whose lifelong dalliance with sabermetrics turned serious in 1999 when he started posting analysis to Usenet, was hired as a consultant in 2005 by the Boston Red Sox after his work on Sons of Sam Horn caught the eye of John Henry and his management team. Please feel free to pull up a chair, listen in, and enjoy.

Rich: It's March 2006. Nearly 30 years have passed since Bill James wrote his first Baseball Abstract. The sabermetric community has grown significantly in numbers and respect over the last few decades. Our voices are now being heard more than ever. Let's take a few minutes to assess where we've been, where we are, and where we're going.

Eric: If a team is spending nothing on analysis, there are obviously hundreds of guys in the field who could do a solid, competent job...and because it's a fun job with hundreds of candidates for thirty positions, they're not going to pay much. The bang for buck here is off the scale. The much more interesting question is how many analysts there are who are way more than competent, who can do more than just prevent their team from making Saber 101 mistakes, but can come up with great stuff, stuff that gives their team a real edge over rivals whose analysis is pedestrian.
It will be very interesting to see how much money such elite analysts can eventually make, if they can establish a track record of adding that kind of value. I know I'm working on it.

Mitchel: I have no idea what teams should or do spend on scouting. I have never asked the Cardinals and no matter what they said, it wouldn't mean much to me anyway. As far as what teams do or should spend on "analysis," I ditto what Eric just said. And I don't think it is an "either/or" thing, although teams may perceive it that way, at least for now. At the present time and probably in the near future, teams can get a more than competent sabermetrician for pennies on the dollar. As more teams recognize the value of a good analyst or two (or three), the supply and demand balance will change, competition will likely heat up, and analysts will make more money. There is a limit, however, for various reasons. For one thing, as the "baseline" rises, analysts will be able to save their teams less and less money, as compared to other teams or the average team. For another, geeks and nerds will always make a lot less than athletes. I guess eventually we will have to set the value of a "replacement-level sabermetrician" and go from there. Perhaps we should also form a union and start hiring agents like Boras or Moorad. Without collective bargaining or some other powerful force in the market (like extreme competition), it is difficult for anyone to make a whole lot of money.

Eric: I think we've stumbled on a question that had never occurred to me before - just how much value can a top analyst add, above a replacement-level one? What kinds of new discoveries are out there, and how exploitable might they be in terms of getting a competitive advantage? And a thorny, related question: let's say an analyst crunches some pitch-type data from BIS and discovers some wonderful new platoon pattern.
A pattern that could be exploited to get a competitive edge, but also a pattern that every fan would be interested in and would add to everyone's appreciation of the game. Is it fair to sell this finding to an MLB club for their exclusive use, or is there a scientific obligation to publish?

Tom: I think you should publish, after a couple of years. One thing that I negotiate in all my contracts is that I maintain IP rights to all my work, and that I grant the team or individual a non-exclusive, non-transferable, perpetual-use license. I don't want what happened to Kramer to happen to me.

Rich: Well, Bill James has said that he wishes he could talk about certain studies, but that the Sox now own the rights to some of his recent findings. In the 1988 Baseball Abstract, James released the formulas and theories behind his old work in Breakin' The Wand. I guess it comes down to whether you are independent or employed by a team.

Mitchel: As Tom said, or at least implied, if you are employed by a team or work for them as an IC, it is up to the two parties to decide how to deal with the IP rights. Obviously, teams would like as much exclusivity and ownership as possible. It is certainly a little frustrating and disappointing when James says something like, "I would love to talk about X, but I can't." In my case with the Cardinals, I have an agreement which is very fair and balanced. With some of my work I retain ownership and there is no exclusivity agreement, and with other stuff the team acquires most of the rights. I also have a limited non-compete clause in my contract. To be honest, I have not looked at the contract in a while and there have never been any disagreements between us. The Cardinals are a very pleasant organization to work with and we have a very deferential, almost informal, relationship.

Eric: Having studied neuroscience, I was trying to work out an analogy with pharmaceutical research and, unfortunately, it just doesn't fly.
If you find a new serotonin receptor subtype, and think you can design a drug to target it, you have to publish the scientific finding as an eventual justification for the drug's efficacy. You probably have a year or two head start on the competition in terms of developing the drug, and once you beat everyone to the market with it, you patent it! So there are no incentives against making scientific findings public. If we do unionize, we might want to consider a policy whereby all our contracts state that such research becomes public domain after, say, 10 years (via the rights reverting back to us for publication). It's nice to give your employer a competitive edge, but I'd hate to see the scientific understanding of the game suffer as a result.

Mitchel: I don't think that sabermetricians have any responsibility whatsoever to publish or release any of their work in the public domain. It is their work and it is up to them to decide what suits them best. We are not talking about the cure for cancer or global warming here.

Eric: C'mon. There's a profound correlation this century between global temperatures and strikeout rates. And it's not like we lack a causal explanation in terms of hot air.

Tom: Right, I agree. Some people expect strikeout rates to jump 1% based solely on our discussion here today.

Rich: Tom, you have stated before that sabermetrics includes both quantifiable and qualifiable measures. Do you care to elaborate on that point?

Tom: I think people like to associate "numbers" and performance analysis with sabermetrics, and relegate scouting and observation to ugly-duckling status. Sabermetrics is about the search for truth about baseball. And, at its core, baseball is about the physical and mental abilities of its players, which manifest themselves in explosions a handful of times in a game. Since we have limited samples in which to evaluate a player by his performance, we need to supplement that with some keen observations.
The pinnacle of sabermetrics is the convergence of performance analysis and scouting.

Mitchel: Tom, I know it is not politically correct to "bash" traditional scouting and observation, so I won't. I will say, however - and you and I have had this discussion before - that the more data we have - the "explosions" you refer to - the less we need scouting and other "subjective" data in order to reach the correct conclusions. To a large extent, an infinite amount of unbiased data always yields perfect results. This is an important point that is often missed, or at least misunderstood, by even good analysts.

Tom: There is no question that if we had an infinite sample, we would have no need for observational analysis. It's essentially a scale, where good scouting can be worth 300 at bats, just to use an illustration. That is, if I had a player with a 300 AB season, and I had a good scout who watched him for 5 or 10 games, I would "weight" his analysis by 300 AB. However, after a couple of seasons, my player will now have 1200 or 1500 AB, and the scout is still worth 300 AB. So, the scout becomes less and less relevant as the player piles up more AB.

Eric: The convergence of sabermetrics and scouting has me as juiced as Tom, but for a different reason. When I dream at night I dream of spreadsheets, and they have not just the columns we're all used to from The Bill James Handbook, but all the scouting-style data that BIS gathers: who throws what pitches how fast, all that. And I'm running correlations between that data and the standard numbers, and looking for career patterns and so forth. And Liv Tyler is lending a hand with the thornier linear regressions. They're pretty good dreams.

Tom: Yes, the scouting-style data is exactly what I'm talking about, as anyone who follows my Fans' Scouting Report project knows.
We need to capture all these traits of players, all the little things, so that we can better appreciate the context of the performance, and properly assign a value to the performance.

Eric: I want to return to something Mitchel said earlier: "I consider anything which cannot be measured or supports the null hypothesis with a high degree of certainty to be essentially a non-issue, at least in a practical sense." And I think that's irrefutable. But the question is, are the things that are unmeasurable going to stay that way? Some very real and important things can be unmeasurable if enough noise is added. Who's to say that the right noise filter doesn't exist?

Mitchel: Eric, sure, heretofore unused statistical techniques, as well as new methodologies, can reduce background noise and otherwise enable us to measure things that we were previously unable to measure. But, to tell you the truth, if quality researchers have had difficulty measuring something in the past, it is most likely not worth a whole lot even if it can eventually be measured. That is not an absolute statement, of course. We are talking about a relatively simple environment to study (with all due respect to Bill James, who generally refers to baseball as a complex dynamic), as compared with, say, quantum physics or cosmology.

Rich: Well, Mitchel, I would rather talk about baseball than the structure of the universe any day. With that in mind, I'd like to go around the room and hear the most interesting topic you are working on right now.

Eric: Hmm . . . I actually did recently send a letter to New Scientist about the structure of the universe (some unappreciated implications of Heim's Grand Unified Theory). This may be why it took me 35 years to get a career going in sabermetrics . . .

Tom: I've started a few things, and they are all based off the play-by-play and pitch-by-pitch logs.
Studes at Major League Baseball Graphs did a sensational job with what I was dipping my toes in, with his Batted Ball Index project. And I was also dipping my toes into what David Pinto already did with his fielding graphs chart. David Appelman did the third thing that I've been working on and off with: understanding pitch-by-pitch. There are plenty of great minds out there working their butts off. I think the Holy Grail centers around understanding the pitch-by-pitch process. This is what baseball is all about, this is where performance analysis can do the most damage, this is where you can have a real impact on the approach to hitters and pitchers themselves, and this is where scouting and game theory really come to the forefront. It's the center of the baseball universe. My guess is that top baseball game designers may have cracked this nut already, and I would bet that Tom Tippett may be ahead of everyone on this. Just a guess. This is a journey I'd love to take, if I had time.

Mitchel: Well, I can't really say, as it is all proprietary, but I can say that in 10 or 12 years when it becomes public, it will rock the baseball world! Just kidding! I'm not really working on anything earth-shattering right now. I have recently revamped my entire UZR methodology, which doesn't really mean anything to too many people, as I haven't published any wholesale results in a long time anyway. And, of course, I've been "scooped" by John Dewan in terms of any future public disclosure of UZR ratings in the form of a book. That is fair, as the original concept of a "zone rating" and even an "ultimate zone rating" was originally published by John and STATS Inc (although I developed my own "zone rating" independently and at about the same time - along with several other people that I know of - remember DeCoursey's and Nichols' "defensive average" back in the late 80's or early 90's?).

Eric: You kids! The adjective "back" should be reserved for the early 70's.
I had to hand-calculate league OBP's and emend my copy of the 1974 MacMillan Encyclopedia in ballpoint ink. And walk a mile to school, too. Carrying the book.

Mitchel: I don't think I'm that much younger than you, Eric! Anyway, I am also working on an "ultimate, ultimate zone rating (UUZR)" which, rather than using distinct zones or vectors and the probabilities of catching a certain type of ball within them, uses a smooth function such that we can basically plug in the x, y coordinates of a batted ball (along with the usual characteristics - speed, type, etc.) and come up with the probability of that ball being caught, regardless of whether we already have an historical "baseline" for that particular type of ball at those coordinates. I am also going to incorporate into the UUZR methodology subjective ratings on all plays made (which STATS routinely provides) to improve the integrity of the data. As well, I am working on better ways of "park adjusting" player stats in order to do better context-neutral projections, as well as to determine the future value of a player in a specific park, especially when that player changes home teams. I am continually working on improving my projection models, as these are really at the heart of what a sabermetrician can do for a team. Tom might disagree with this, as he tends to think that one projection system is basically as good as another.

Tom: For established big-league hitters, that's pretty much true. You can more or less prove that the maximum r possible for a forecasting system is around .75, while a group of fans can get you .65, and these sophisticated forecasting systems are at the .70 level (as a basic illustration). That's for hitters. For pitchers and fielders, that's not true, of course. As for park factors, I've been talking about this for years. I find it extremely disappointing that we always talk about a single park factor, when that's simply not reality.
Busch Stadium cannot possibly affect Coleman, McGee, and Jack Clark the same way, and we should not pretend that it does. Same for Coors. Yes, using something is better than nothing. But there's been very little published on this subject and very little innovation.

Eric: The overall park factors work fine for evaluating past value, but can be close to worthless for predicting future value. And there's a whole breakthrough project waiting to be done correlating weather and park data. Look at the year-to-year PF variation for Dodger Stadium vs. a place that actually has weather, like Wrigley Field.

Mitchel: My next big project is delving into the pitch-by-pitch data (TLV data - type, location, and velocity) that Tom just mentioned. He is right that it is one of the Holy Grails left in baseball analysis with respect to evaluating and "scouting" players (and understanding and incorporating game theory into the analysis) in a very different way than we have been doing for the last 20 years.

Mitchel: I'm all for that (BPA). BABIP is way too long. Almost as bad as TINSTAAPP!

Eric: For instance, team BPA depends significantly on team K and BB rates. So good pitchers do allow a lower BPA, and differences between pitchers must be reasonably large. It also means that when you use BPA as a team defensive metric (and all the best people do), you want to tweak it to adjust for the quality of the staff as evidenced by the K and BB rates. I'm also just wrapping up my other recent project. I'm about to send the Hardball Times an article that, I believe, proves that RISP hitting differences are real rather than random (a question so settled in the other direction that Keith Woolner omitted it from "Baseball's Hilbert Problems" in the 2000 Baseball Prospectus). I'm not talking about "clutch hitting," but real and reasonably common variations in performance by hitters with RISP in response to the different tactics of the batter/pitcher matchup.
I hope it will open up that topic for a good deal of further analysis.

Rich: Thanks for the chat, guys. Based on our discussion, I think it is safe to say that there is a good deal of further analysis ahead of us in a number of areas.

[Additional reader comments and retorts at Baseball Primer.]
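Tom's illustration of counting a good scout's evaluation as 300 AB is, at heart, a weighted average of observed performance and the scout's estimate of true talent. A minimal sketch (the function name and all numbers are illustrative, not from the discussion):

```python
def blend(observed_rate, ab, scout_rate, scout_weight_ab=300):
    """Blend a player's observed rate over `ab` at-bats with a scout's
    estimate of his true talent, counting the scout's opinion as
    `scout_weight_ab` at-bats (Tom's 300-AB illustration)."""
    return (observed_rate * ab + scout_rate * scout_weight_ab) / (ab + scout_weight_ab)

# A .320 hitter over 300 AB whom a good scout pegs as a .360 talent:
blend(0.320, 300, 0.360)    # -> 0.340; the scout counts as much as the season
# After 1500 AB, the same scouting report moves the estimate far less:
blend(0.320, 1500, 0.360)   # -> ~0.327
```

The fixed 300-AB weight captures exactly what Tom describes: as the player's sample grows, the scout's share of the total weight shrinks toward irrelevance.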
Comments
"I think the Holy Grail centers around understanding the pitch-by-pitch process."
You mean like the great stuff Rich came up with last week? ('Strikeout Proficiency') I really enjoyed those two articles -- and the tons of mileage that came out of it over at BTF and Pinto's site -- as well as this roundtable.
Posted by: J.W. at March 5, 2006 11:50 PM
"The slowdown will happen if MLB and the data owners considers it more important if 30 analysts look at this data instead of 30,000."
I think this is a real issue, especially with the current TLV data. This stuff isn't publicly accessible. As far as I know, the closest you can get is Retrosheet, and while a lot can be gleaned from their play-by-play data, it doesn't have the same granularity as TLV data.
You can’t exactly blame the companies that collect this type of data for not publicly disseminating it because I’m sure it’s expensive to collect and for companies like BIS it’s their bread and butter. On the other hand, the data is far too expensive for almost all hobbyists and if you can afford it, it comes with understandable distribution restrictions.
Over at FanGraphs, we’ve agreed with BIS to freely distribute raw TLV data for retired players only, but for any league wide studies to be done it may take over a decade for enough players to retire. And it certainly doesn’t help with the evaluation of current players.
Unfortunately, I think for the foreseeable future it may be that there are only the 30 analysts looking at the data. I’m not exactly sure how the problem is going to be fixed either unless there's a Retrosheet style project to collect this type of data.
Posted by: Appelman at March 6, 2006 10:51 AM
"we’ve agreed with BIS to freely distribute raw TLV data for retired players only"
Well, that is very refreshing! At least one company cares enough about the researchers that they will allow data to enter the public domain. Since BIS only has data since 2002, this doesn't apply to them yet, but I'd also say that companies should release data that is 5 years old or older. After all, that data is pretty much worthless to a team or outfits like Yahoo, etc. Even setting up a nominal charge would be great. But for an analyst, it is really irrelevant whether the data is from 2002 or 2007. We need data to understand the behaviours of players. Releasing old data that's not generating any revenue also makes good business sense, in that it may bring you a larger customer base, some of which may start to buy your data.
Posted by: tangotiger at March 6, 2006 11:04 AM
The 5-year statute of limitations is a terrific idea to promulgate (and Tom makes a strong argument for why it will be good for the stat companies.) I've never done a study where data that old correlated to present performance.
It's in the interest of teams to get this old data into the hands of the general saber community, too. An analyst for a team can do a much better job with the recent, proprietary data if he can draw on a large body of public work on its general interpretation. Instead of 30 (or fewer) analysts separately reinventing the wheel, you've got something resembling a normal scientific community, where the general work is in the public domain and the applications are for profit.
Posted by: Eric Van at March 6, 2006 11:59 AM
Thank GOD somebody -- i.e., MGL -- has finally come out and said the obvious about James's "Fog" article. (I mean, besides me.) James makes too much out of the "fog"; the effects it supposedly hides are of no practical importance. To pretend otherwise is to stand the scientific method on its head.
Posted by: Rob McMillin at March 6, 2006 2:01 PM
Rob, I couldn't disagree with you more. As I've said in several places now (and for many years), the strength of a correlation doesn't tell you the size of the signal being measured; it tells you the size of the signal less the size of the noise. And a signal of any strength can be obscured by sufficient noise, hence the fog argument. There are certain things that are important to the game that are inevitably swamped by noise and hence difficult to measure, but that doesn't mean we give up on understanding them. The initial findings about the weakness of the correlation of BPA (BABIP) led many people to assume that the individual range in BPA was not "baseball significant" (even after I, Tom Tippett, and others showed that it was statistically significant). Well, it turns out that if you don't attempt to measure a pitcher's true BPA, your estimate of his true ERA is likely to be off by 0.20 or even 0.40 runs, which is to say $1 - $3 million a year of salary in terms of value on the FA market. It's swamped in fog and it's of immense practical importance. Not everything on Bill's list of fog-shrouded phenomena is going to prove as real and important, but each needs to be looked at more closely.
Posted by: Eric Van at March 6, 2006 3:53 PM
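Eric's signal-versus-noise point can be illustrated with a small simulation: give pitchers a real spread in BABIP-allowed talent, add one season's worth of binomial noise on balls in play, and the year-to-year correlation of observed BABIP comes out weak even though the underlying skill is real. A rough sketch (all parameters are illustrative assumptions, not measured values):

```python
import random
import statistics

def simulate_yty_correlation(n_pitchers=2000, bip_per_season=500,
                             talent_sd=0.010, seed=1):
    """Simulate pitchers whose true BABIP-allowed talent really varies
    (SD of ~10 points around .300), observe two 500-BIP seasons each,
    and return the year-to-year Pearson correlation of observed BABIP."""
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_pitchers):
        talent = rng.gauss(0.300, talent_sd)  # real but modest skill spread
        for season in (year1, year2):
            hits = sum(rng.random() < talent for _ in range(bip_per_season))
            season.append(hits / bip_per_season)
    # Pearson r between the two observed seasons
    m1, m2 = statistics.fmean(year1), statistics.fmean(year2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(year1, year2))
    return cov / (statistics.pstdev(year1) * statistics.pstdev(year2) * len(year1))
```

With these assumptions the binomial noise variance (about .3 x .7 / 500) is roughly four times the talent variance, so r lands near 0.2: a weak correlation produced by a perfectly real skill, which is the fog argument in miniature.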
I never said we should stop trying to find useful information, but neither should we pretend that something exists simply because we haven't disproven it. The problem I see with the "Fog" document is that it very much implies the existence of things that quite frankly haven't been proven. Unlike James and his hypothetical picket, the existence of the army is not a given, let alone its presence. His choice of metaphors was extremely poor, as it would seem to make it incumbent on those who say "there is no proof of X as a skill" to prove that negative assertion, rather than on those who believe X is a skill to prove their positive assertion. Well, negative assertions aren't provable, by definition.
Posted by: Rob McMillin at March 6, 2006 4:53 PM
I think this (the "fog" issue) is more a matter of degree and semantics than anything else.
James' statement:

"Cramer was using random data as proof of nothingness and I did the same, many times, and many other people also have done the same. But I'm saying now that's not right; random data proves nothing and it cannot be used as proof of nothingness.

Why? Because whenever you do a study, if your study completely fails, you will get random data. Therefore, when you get random data, all you may conclude is that your study has failed. Cramer's study may have failed to identify clutch hitters because clutch hitters don't exist as he concluded, or it may have failed to identify clutch hitters because the method doesn't work as I now believe. We don't know. All we can say is that the study has failed."

That was a poor way of representing his point of view, and I agree with Rob's assessment.
One, every good scientist understands that when he does an "experiment" and it "fails," the best he can say is, "We found no evidence of whatever it is we were looking for," or some such thing, and that, "Given the test we did, there is such and such chance that we made a Type I or Type II error," etc. That is banal.
It is also banal to state that an experiment might fail because it was poorly designed, etc., as James does in the above statement. Actually, it is more than banal. It is downright ridiculous to state that, "We shouldn't get all excited or jump to any conclusions when a scientist finds no evidence of something because the study may have been bad." Well, no shit! That applies equally when we do find evidence of something. And that is why we have things like peer review and duplicated, independent research before we in fact jump to any wholesale conclusions. That really has nothing to do with the "fog" issue, per se.
So while I don't necessarily disagree with Eric's point of view, I think that we are in some sense arguing about angels dancing on the head of a pin.
Obviously if someone uses poor statistical techniques (including too-small samples, etc.), we take their finding with a grain of salt. However, when we do accept a certain thesis, we operate on the assumption that it was derived in a responsible, scientific manner, always leaving open the possibility that the results are "wrong." So what? That is the nature of science. I forget who said it, and I am probably butchering what he said anyway, but science is discovering truths through the scientific method until such time as someone with better tools or data disproves those truths.
Bottom line is that I think this whole "fog" issue is overrated. Eric can flippantly state that "a large signal can be obscured by noise such that it is difficult to measure," but the fact of the matter is that that is exceedingly rare in baseball. If a signal is hard to measure, it is almost without a doubt not very important in a practical sense. DIPS is a poor example of something that supports Eric's hypothesis. Voros made an initial blanket statement that pitchers have little or no control over BABIP, as opposed to other outcomes (BB, K, and HR). He was right then and is right now. How little (as well as other related stuff) is another issue altogether.
Anyway I'll get off the soapbox and return you to your regularly scheduled programming.
Posted by: MGL at March 6, 2006 4:56 PM
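For readers following the DIPS thread: BABIP (the "BPA" of the roundtable) is batting average on balls in play, conventionally hits minus home runs over at-bats minus strikeouts minus home runs plus sacrifice flies. A quick helper, with made-up numbers in the example:

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF).
    Home runs are excluded because they are never 'in play' for fielders."""
    return (h - hr) / (ab - k - hr + sf)

# Illustrative season line: 180 H, 20 HR, 600 AB, 100 K, 5 SF
babip(180, 20, 600, 100, 5)   # -> ~0.330
```

The small denominator (typically only ~400-500 balls in play per pitcher-season) is exactly why single-season BABIP is so noisy, which is the crux of the Voros/DIPS debate above.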
The problem I see with the "Fog" document is that it very much implies the existence of things that quite frankly haven't been proven.
Does it? Or does it merely caution people not to jump too quickly to the conclusion that noise = insignificance? (Which, to my mind, is a much more useful reminder to the rest of us than it is to a working scientist like MGL.)
If I were managing or assembling a team, I'd rather allow myself to be open to the possibility that Gary Sheffield really *does* hit better in the clutch, or that Davey Lopes really *did* hit lefties better than the platoon spread for righties would predict. It would be near or at the bottom of the list of things I'd look for or rely on, but I personally see far less harm in keeping the door open for such data than in slamming it shut.
But then, I'll never interact with a baseball team in any meaningful way, so who cares?
Posted by: Matt Welch at March 6, 2006 10:41 PM
Matt, all due respect, but you don't post sentries to look for the Iraqi army in Arizona. That's my problem with James's metaphor.
Posted by: Rob McMillin at March 7, 2006 9:14 AM
Oh, and finally: if the Angels -- for this is just a part of this discussion -- shouldn't take RISP and RISP2 hitting seriously, why did they let slip that they do, that it's the most important stat they keep?
Posted by: Rob McMillin at March 7, 2006 9:16 AM
Bill James' Fog piece was actually useful, because a lot of non-scientists reading sabermetric work don't understand the distinction between proving something doesn't exist and failing to prove that it exists. This is very problematic in the practice of the work, for a couple of reasons.
I work in basketball, and it is extremely common for well-trained statisticians jumping into this field to set up their basketball study completely wrong. The most common mistake leads to the result that offensive rebounds are useless. That happens because they construct the study wrong -- as soon as you tell them how to craft it, they see a true value showing up in their results. And that value isn't small. It is not a small signal, but a lot of noise introduced by the study. I wouldn't doubt that this sort of thing happens in the more difficult topics of baseball as well.
So there is a lot of fog that makes the practice of sabermetrics (in baseball or basketball) a little tough.
Dean Oliver
Consultant to the Seattle Supersonics
Author, Basketball on Paper
Posted by: Dean Oliver at March 8, 2006 5:11 AM
James' real power is his language. We talk about the Fog piece, and we don't even have to go back to read it. We *remember* it, like a song. I say "Michelle", and you're singing the entire Beatles song. We say "Fog", and we think of the James piece.
I agree with Dean that the effect of the Fog piece was to remind many people of the difference between something that doesn't exist and something that you haven't found. It's not the same thing. At the same time, if you are like Mitchel, and you have looked high and low for it and still haven't found it, then for all intents and purposes, even if you do end up finding it, it probably will be useless to you.
Unless you are meticulous, it's hard to tell whether what you are looking for is a bomb or a grain of sand.
Posted by: tangotiger at March 8, 2006 6:59 AM