Designated Hitter
January 18, 2007
The Greeks, Bill James and the Beauty of Baseball Stats
By Dave Studeman

You've heard of Pythagoras, right? If you're a fan of baseball stats, you might associate Pythagoras with Bill James's Pythagorean Formula, RS^2/(RS^2+RA^2), which calculates a team's expected winning percentage. It's a sublime formula, really. It captures critical information in a simple way and expresses the relationship between runs scored, runs allowed and winning just so.

If you're not a baseball analyst, you probably associate Pythagoras with right triangles, as in A^2+B^2=C^2, where C is the length of the hypotenuse. It's another beautiful formula. From what I've read, Pythagoras didn't exactly invent it, but he did popularize it. Still, it wasn't Pythagoras's greatest contribution to mankind.

Pythagoras actually invented the musical scale we use today. If you place your finger exactly halfway up a guitar string, the note of the string is an octave higher. Put your finger on a spot that leaves two-thirds of the string vibrating, and you get a perfect fifth. It's said that Pythagoras discovered this, and he found that the simplest ratios of string length created the most harmonious notes.

Reportedly, this was a huge revelation to the Greek. He felt that he had discovered a fundamental Truth, something that uncovered the deepest meanings of the universe. In a way, he had.

Pythagoras had discovered the power and beauty of ratios. He became convinced that mathematical ratios were the foundation of all beauty in the universe. He conceived of the music of the spheres, in which all planets orbit the earth in a circle, set in a specific ratio from the earth, which emits its own tone throughout the universe.

Pythagoras took his findings seriously. He developed a following - a cult, really - that believed that universal truths could be found in numbers. His disciples considered him a kind of god and followed him loyally.

I don't know anyone who thinks of Bill James as a kind of god, but there are many of us who feel that our eyes were opened by his Abstracts. He didn't just discuss baseball and its numbers, he uncovered the beauty in its numbers. Take that Pythagorean Formula...

James found that you can reasonably predict a team's performance by its runs scored and allowed. He also found that the relationship is geometric; runs aren't just doubled in the formula, they're squared.

The power of two is everywhere in life. E=MC squared, after all. When you move closer to a light, cutting the distance in half, the light doesn't become twice as bright. The brightness is squared. When you double the sides of a square, its size doesn't just double, it's squared.

So when Bill James discovered that the nature of runs to winning is squared, it seemed as though something essential and fundamental had been discovered. And he didn't stop there.

Take any league in modern baseball history and multiply its On-Base Percentage by its total bases. Know what you'll usually get? A number that is very, very close to the total number of runs scored in that league. I mean, how amazing is that?

League  Year     OBP    TB       OBP*TB   Runs    Diff    %Diff
NL      1968    .300    18,737    5621    5577      44      1%
NL      1954    .335    17,106    5731    5624     107      2%
NL      1925    .348    17,751    6177    6195     -18      0%
AL      1997    .340    33,495   11388   11164     224      2%
AL      1977    .330    31,307   10331   10247      84      1%
AL      1959    .323    16,118    5206    5391    -185     -3%

I don't know if Bill James is the person who discovered this relationship but, like Pythagoras and his theorem, he will forever be associated with it because it was the basis of the very first Runs Created formula: A*B/C, where A is times on base, B is total bases and C is plate appearances.
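As a rough check on the table above, here is a minimal Python sketch that recomputes OBP times total bases for the six league seasons and compares the result with actual runs. The figures are copied from the table; the function names and the row layout are illustrative choices, not anything James published.

```python
# League-season figures copied from the table: (league, year, OBP, TB, runs).
ROWS = [
    ("NL", 1968, 0.300, 18737, 5577),
    ("NL", 1954, 0.335, 17106, 5624),
    ("NL", 1925, 0.348, 17751, 6195),
    ("AL", 1997, 0.340, 33495, 11164),
    ("AL", 1977, 0.330, 31307, 10247),
    ("AL", 1959, 0.323, 16118, 5391),
]


def estimated_runs(obp, total_bases):
    """League-level A*B/C: (times on base / plate appearances) * total bases."""
    return obp * total_bases


def compare(rows):
    """Return (league, year, estimate, difference, relative difference) per row."""
    results = []
    for league, year, obp, total_bases, runs in rows:
        estimate = round(estimated_runs(obp, total_bases))
        diff = estimate - runs
        results.append((league, year, estimate, diff, diff / runs))
    return results


if __name__ == "__main__":
    for league, year, estimate, diff, pct in compare(ROWS):
        print(f"{league} {year}: estimate {estimate}, diff {diff:+d} ({pct:+.1%})")
```

OBP is already A/C (times on base over plate appearances), so multiplying it by total bases is the same calculation as A*B/C at the league level.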

Once again, James had found a simple formula and ratio, multiplicative in nature, that expressed the fundamental nature of baseball.

Of course, James created other metrics, too. He created Game Scores, Defensive Efficiency Record, Secondary Average and Isolated Power. He developed points systems for Hall of Fame and award eligibility. He created his own ways to project player careers (the Brock system), major league performance from minor league performance (MLE's) and the Favorite Toy.

James's findings were simple and beautiful. They were something new in the baseball firmament and they created a new kind of baseball fan, a bit like Pythagoras's cult. But, as with Pythagoras, questions began to undermine the beauty of the numbers.

One of Pythagoras's followers, an unfortunate man named Hippasus, discovered that some numbers are irrational. That is, they can't be written as a ratio of two whole numbers, and their digits continue infinitely without repeating, like Pi (3.14159...) or the square root of two (1.41421...). Hippasus developed a proof showing that irrational numbers exist. Pythagoras considered this sacrilege, and reportedly had him drowned.

But the truth couldn't be held back, and the logic of Hippasus's finding was eventually recognized. Thousands of years later, a guy named Copernicus came along and established, once and for all, that the planets don't revolve around earth. They revolve around the sun. Pythagoras's music of the spheres doesn't really exist at all.

Pythagoras's math wasn't wrong, really. The trouble was that, for all of its beauty, it wasn't fundamentally sound enough to take future mathematicians where they needed to go. Newton and Einstein could never have conceived of calculus and relativity (relatively) if they had stuck to Pythagoras's mathematical ideals. Sometimes, progress requires a revision of the fundamentals.

Early in his career, Bill James really wasn't interested in creating the most precise statistics. He was interested in the framework, in the insights that would lead to revolutionary thinking about baseball and its players. So he didn't include counting stats like stolen bases and sacrifice hits in Runs Created. Like Pythagoras, he was most interested in the beauty and insight.

As time moved on, however, he became more interested in accuracy, and his formulas became more complex. He eventually added stolen bases, situational hitting and lots of other things to Runs Created. In fact, the current Runs Created formula is virtually unrecognizable compared to its original version, even though it still follows the A*B/C format.

The Pythagorean Formula has changed too. James recognized that squaring runs scored and allowed wasn't quite accurate enough, and changed the formula's exponent from 2 to 1.83. I remember my disappointment when he did that, thinking that Pythagoras wouldn't approve.

Subsequent researchers have gone further, and found that the correct exponent depends on the overall run environment. In other words, the impact of runs scored and allowed changes according to the average number of runs scored in each league each year.
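To make the progression concrete, here is a small Python sketch of the Pythagorean calculation with a swappable power. The 2 and the 1.83 come from the text above. The run-environment version is an assumption on my part: it follows the shape often called "Pythagenpat", with a commonly quoted constant of 0.287, and neither that name nor that constant appears in this article. The 800/700 team is hypothetical.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def environment_exponent(runs_scored, runs_allowed, games, power=0.287):
    """ASSUMED run-environment exponent: (total runs per game) ** power.

    The 0.287 default is a commonly quoted value, not a figure from the article.
    """
    return ((runs_scored + runs_allowed) / games) ** power


if __name__ == "__main__":
    # Hypothetical team: 800 runs scored, 700 allowed, 162 games.
    rs, ra, games = 800, 700, 162
    print(pythagorean_pct(rs, ra))          # original squared version
    print(pythagorean_pct(rs, ra, 1.83))    # James's later 1.83 version
    x = environment_exponent(rs, ra, games)
    print(x, pythagorean_pct(rs, ra, x))    # environment-dependent version
```

A smaller power pulls every team's expected record toward .500, which is why the choice matters most for teams with lopsided run differentials.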

Just think how Pythagoras would have responded to that.

Many years ago, Pete Palmer built his own runs estimator formula called Linear Weights, in which each offensive event (singles, home runs, walks, outs, etc.) is weighted by a specific amount. James didn't like Linear Weights. He once criticized Palmer's system because the weights of each event were computed after the end of the year (and he also doesn't like stats that use averages as a baseline).

However, Tangotiger showed, in a persuasive article called "How Runs are Really Created" a few years ago, that context really does matter. You can't really know the impact of each type of batting event unless you know how many times every event occurred.

In fact, Tango went one step further and showed that the format of James's original Runs Created formula wasn't quite right. He advocates the use of a formula developed by David Smyth called Base Runs. And if you take some time to think about it, you have to agree with him.
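The structural difference can be sketched in a few lines of Python. The Base Runs shape used here, A*B/(B+C)+D, is the general form usually attributed to Smyth; the article doesn't spell it out, so treat it as an assumption. Versions of Base Runs differ in how the advancement factor B is computed, so the sketch takes B as an input rather than deriving it. The all-home-run lineup is a hypothetical stress test, not real data.

```python
def basic_runs_created(on_base, total_bases, plate_appearances):
    """Original Runs Created shape: A * B / C."""
    return on_base * total_bases / plate_appearances


def base_runs(baserunners, advancement, outs, home_runs):
    """Assumed Base Runs shape: A * B / (B + C) + D.

    A = baserunners (home runs excluded), B = advancement factor,
    C = outs, D = home runs. B is passed in because its formula
    varies between published versions.
    """
    if baserunners == 0:
        # Nobody is on base to be driven in; only the homers score.
        return float(home_runs)
    return baserunners * advancement / (advancement + outs) + home_runs


if __name__ == "__main__":
    # Hypothetical lineup: 100 plate appearances, every one a home run.
    print(basic_runs_created(on_base=100, total_bases=400, plate_appearances=100))
    print(base_runs(baserunners=0, advancement=50.0, outs=0, home_runs=100))
```

In the extreme case, the A*B/C shape credits four runs per solo home run, while the Base Runs shape credits exactly one. Real teams are nowhere near that extreme, but it shows where the multiplicative format bends the wrong way.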

When you look at things in more detail, sometimes the fundamental structures that have gotten you so far have to change. That's what Hippasus meant to Pythagoras, and that's what has happened to James's original formulas, too.

Baseball writers like Rich and me aren't really researchers. We're communicators. We want to reach out to fans who are curious about the game of baseball and describe to them how the "inner game" of baseball statistics works. We are truly following in the footsteps of James, who is a fantastic writer, and we want to express the same joy at the beauty of baseball stats.

On the other hand, hardcore researchers are finding new ways of describing the game's statistics, and we want to share that with general baseball fans too. So we're in a curious bind. We want to continue to talk about the music of the spheres, but we also want to acknowledge the Copernican solar system.

At Baseball Graphs and the Hardball Times, I've helped keep Bill James's Win Shares in the public's eye. At the same time, however, I've conducted my own research and tried to improve his system. Some researchers have told me that trying to correct Win Shares isn't possible, that the framework is too flawed. But there is much I like about Win Shares, so I soldier on.

In the end, my quest may be quixotic, but as long as I help a few fans see a bit more in the numbers, and help a few researchers get a little more visibility for their efforts, I'll be happy. At least, hopefully, no one will try to drown me.

Dave Studeman is a writer at the Hardball Times, and also the manager of the Baseball Graphs website.

Comments

"In fact, Tango went one step further and showed that the format of James's original Runs Created formula wasn't quite right. He advocates the use of a formula developed by David Smyth called Base Runs. And if you take some time to think about it, you have to agree with him."

Uh no, I agree with James on this: if you have to tweak a formula after every year, as in LWTS or BaseRuns, it has its own problems. The disparity in individual components (outs, 2Bs, SBs, HRs) among individual players within a year is vastly greater than the disparity among such individual components for whole teams or leagues across years.

If a formula has to be tweaked to adjust for 2004 versus 2006, then how do I know it's evaluating both a Jason Giambi and a Juan Pierre accurately in 2006? I'd much rather find the unchanging formula that generates the lowest mean error across teams across multiple years, and several formulas, including most of James's more complex RC formulas, beat any unchanging BaseRuns formula.

That the idea behind BaseRuns seems philosophically right to its adherents is really irrelevant to me.

jpwf: you are confusing two things.

BaseRuns does not need to be tweaked, any more than Runs Created does. Whatever basic formula for Runs Created exists for 1908 and 1988, there'd be the same basic formula for BaseRuns that can be used every year. The superiority of BaseRuns is that it ensures the run value of the HR works in a much wider and more logical fashion than does Runs Created.

I can guarantee you one thing: if BaseRuns was created first, there's no way in the world that Runs Created would have been the adopted choice of run evaluation of Bill James. When you look at how much Bill James jumps through hoops to keep changing his Runs Created formula by changing the coefficients of each term, but sticking to the flawed A*B/C model, you'd realize this.

***

You are completely wrong about how RC beats BsR in RMSE. Quite the opposite actually:
http://gosu02.tripod.com/id108.html

"As to the claim that Base Runs does not have comparable accuracy when applied to regular teams as other run estimators, this is simply not true. The Stolen Base version of BsR presented above has a lower RMSE when applied to 1961-2004 data(excluding the strike-shortened seasons of 1981 and 1994) than does Stolen Base RC, ERP, Equivalent Runs."

***

Now, the other thing is about the changing value of each component. This is reality. If you want to model reality, then a custom linear weights process is what you want. If you want to stick to the basics, then BaseRuns is what you want.

***

This may be of interest to some:
http://www.tangotiger.net/markov.html

A fun read. Thanks, Dave.

As legend has it, Pythagoras was chased and ultimately killed by his enemies after his refusal to cross through a bean field allowed his capture. James once worked in a bean factory.

Blows your mind, doesn't it?

Kent, that's hilarious. I should have thought of that; it completes the analogy.

jpwf, I obviously agree with Tango on this. The fact that individuals vary a lot in the individual components is exactly why a team-based analysis doesn't get you close enough to the "right" answer.

Of course, James' Win Shares now tweaks itself in about a million ways after each season, so presumably James has abandoned that criticism of Linear Weights.

James still has an irrational prejudice against the use of average as a baseline. Alas, he doesn't seem to get that baselines are a matter of perspective, and that they generally aren't intrinsic to statistics themselves. There aren't good or bad baselines, just more useful and less useful baselines.

Re Win Shares again, I'm not a big fan, but since James appears to be totally reworking Win Shares - including Loss Shares, as James has confirmed he's doing, is obviously going to require a major rewrite which is going to totally change the results - it doesn't seem to me that tweaking the current Win Shares is going to accomplish very much, unless the tweaking results in a generally better way of measuring performance (as opposed to just a minor improvement of the current Win Shares itself).

Minor quibble:

Of course, James created other metrics, too. He created Game Scores, Defensive Efficiency Record, Secondary Average and Isolated Power.

I believe Branch Rickey and Allan Roth created Isolated Power, as this statistics glossary suggests.

Defensive Efficiency Record: while James may have popularized it, I don't think he invented that either. In this book by High Boskage's Walker:
http://www.amazon.com/dp/0890873356/
He uses DER. Book was in 1982, and I seem to remember him writing that he "invented" it.

Same deal with RC. James was not the first, but he certainly popularized it.

***

Greg: I'll be very interested to see how many Loss Shares Bonds, Pedro, et al are going to get. My guess is that if he gives them positive numbers, he's going to bump up their Win Shares by the same amount, to stick to the "positive only" rule. And, if that's the case, he may simply move toward Win Probability.

Good point about Isolated Power. However, I'm surprised someone is saying that James didn't invent DER. I think that he feels he came up with it independently, and I believe he published it in his self-published Abstracts, before 1982. However, I'm not as familiar with Eric Walker's work as I could be.

Regarding tweaking Win Shares, I like to play with the system because that's how I learn things. Also, I believe Win Shares Above Bench (see today's Hardball Times) is a major upgrade over straight Win Shares. And if so, shouldn't we improve it in relatively minor ways while we're at it?

Studes - Is the minor tweaking BJ has done shown in the Bill James Handbook? Aren't the results a bit different, at least for modern players?

Stuff, I'm not aware that James has made any tweaks for the Handbook. I know that the major changes he's working on (like Loss Shares) aren't in the Handbook.

"When you double the sides of a square, its size doesn't just double, it's squared."

Actually, its size (area) is multiplied by four, not squared.

Branch Rickey invented "Power Percentage" or what Bill James later termed Isolated Power. James had this to say about power percentage in the 1977 Baseball Abstract:

Power percentage is that part of slugging percentage which is accounted for by the extra bases; in simple terms, slugging percentage minus batting average. To me, power percentage is the natural form of the statistic, and slugging percentage is what you get if you add a player's batting average to his power percentage. Slugging percentage is a summary statistic, while power percentage is a descriptive statistic that applies to power alone.

James introduced Defensive Efficiency Ratio in the 1978 Baseball Abstract although he may have discussed it in a Baseball Digest article even earlier than that. Here is what he wrote in the '78 Abstract:

I've figured it for years, and it always correlates highly with team wins. It is simply the best team defensive statistic there is.

James also wrote a two-page essay on "The Defensive Record" in the 1979 Baseball Abstract. His conclusion: "(1) the more important measure of a player's defensive ability is not his fielding average, but his range factor, which is simply the number of plays per game that the fielder makes, and (2) the important measure of a defensive team is the percentage of all balls put into play against it that it can get to and make a play on."

James attempted to prove his point by stating that the good defensive teams based on Defensive Efficiency Record (DER) allowed fewer runs during the 1978 season than poor defensive teams, whereas there were no clear patterns based on fielding averages. "DER, always, correlates well with W/L Pct."

"When you double the sides of a square, its size doesn't just double, it's squared."

Actually, its size (area) is multiplied by four, not squared.

Oops! Brain freeze.

Studes:
A question/observation on WSAB. It seems to me that using 70% of avg WS sets the replacement level very high. Let's take a fulltime hitter who creates 68 runs vs. an average of 80, or .85. That player will be awarded hitting WS of about .70 of average, since the WS baseline is .52. So your bench hitter is only 15% below average. If a real replacement hitter is about .75, you would need to set replacement at about 50% of avg. WS. Am I calculating this incorrectly?
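The arithmetic in this comment can be sketched in Python. The sketch assumes, as the comment does, that batting Win Shares scale linearly with runs created above the .52 baseline for hitters with the same playing time. The function names are made up for illustration.

```python
WS_BASELINE = 0.52  # Win Shares batting baseline, as a share of average run creation


def ws_share_of_average(rc_ratio, baseline=WS_BASELINE):
    """Share of an average hitter's batting Win Shares earned by a hitter
    who creates rc_ratio times the average number of runs."""
    return (rc_ratio - baseline) / (1.0 - baseline)


def rc_ratio_for_ws_share(ws_share, baseline=WS_BASELINE):
    """Inverse: the run-creation ratio implied by a given share of average WS."""
    return baseline + ws_share * (1.0 - baseline)


if __name__ == "__main__":
    print(ws_share_of_average(68 / 80))   # the 68-run hitter vs. an average of 80
    print(rc_ratio_for_ws_share(0.70))    # what a 70% WS bench line implies
    print(ws_share_of_average(0.75))      # a .75 replacement hitter
```

Under that assumption, a bench line at 70% of average Win Shares corresponds to a hitter at roughly 85% of average run creation, and a .75 hitter lands a little under half of average Win Shares.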

Also, if you look at the guys who are 0-1 WSAB in 2006, it includes guys like Everett, Griffey, L. Gonzalez, and Lamb. None of them hugely valuable, to be sure, but somewhat above replacement I think.

* *

I think DER is a very important concept, but one that has never really caught on except among serious analysts. Given the scales fans are used to working with -- BA, and now OBP/SLG to some extent -- it's not very intuitive. .715 doesn't look much different than .695, but that translates into about 16 points of BAA, or 8 wins (i.e. huge). In hindsight, it might have been better to use BABIP (1-DER, if you include ROE) thus scaling it closer to BA. For example, the 1973 Orioles had a .731 DER vs. lg avg of .701. If we instead say the Os allowed opponents a .269 hit rate, 30 points below league average, the point is a little clearer. Or convert to BAA (assuming average Ks and HRs), and the Os' defense lowered opponents' BA to .234, vs. lg avg of .259.

Rich, very good, thanks. I'll have to see what Walker wrote exactly in his book. I seem to remember him saying he was the first, but that may be my memory.

Rich: have you ever asked Bill why he does not republish his Abstracts? It would be a simple matter to put it in an online catalog for POD.

Studes has a total of 2424 WSAB in a league with 7290 win shares. That's an average of 81 WSAB per team, or 27 wins per team above bench.

So, for an 81-win team, that team would have 27 wins above bench, making that team's bench-line at 54 wins (.333).

Seems reasonable. If you want to argue that the bench-line should be at 48.6 wins instead (.300), that's ok too.
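Here is the same back-of-the-envelope calculation as a Python sketch. It assumes 30 teams, which is what 7290 league Win Shares implies at 3 Win Shares per win and 81 wins per average team, and a 162-game schedule. The function name is illustrative.

```python
WIN_SHARES_PER_WIN = 3


def bench_line(total_wsab, teams=30, games=162):
    """Return (WSAB per team, wins above bench, bench-level wins, bench-level pct)."""
    wsab_per_team = total_wsab / teams
    wins_above_bench = wsab_per_team / WIN_SHARES_PER_WIN
    average_wins = games / 2
    bench_wins = average_wins - wins_above_bench
    return wsab_per_team, wins_above_bench, bench_wins, bench_wins / games


if __name__ == "__main__":
    per_team, above, wins, pct = bench_line(2424)
    print(f"{per_team:.1f} WSAB per team, {above:.1f} wins above bench")
    print(f"bench line: {wins:.1f} wins ({pct:.3f})")
```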

Hey Dave,

Great article.

It kept reminding me of a term I read in an economics textbook once:

"Standing on the shoulders of giants."

Tango: I don't see how that addresses the issue I'm raising. Suppose that James had used .75 as his offensive benchmark rather than .52. Then, WS would already be an above-replacement metric. In that case, a player who had 70% of avg WS would only be 7.5% below average, far above replacement. But it would still be true that Studes' method would yield 2424 WSAB in a league with 7290 WS.

Let's take a very good player who's 140% of the mean in RC (OPS+ of around 120). If replacement is 75%, he's 2.6 X as valuable as an average hitter (140-75=65, 100-75=25). But if you set "bench" at 85%, as WSAB does, the better hitter will instead appear to be 3.7 X as valuable. The ratio of player values is wrong.
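The ratio in this example is easy to check with a short Python sketch. It uses only the percentages from the comment; the function name is made up for illustration.

```python
def value_ratio(player_pct, replacement_pct, average_pct=100.0):
    """How many times more valuable than an average hitter a player looks,
    measured above a given replacement level (all figures as % of mean RC)."""
    return (player_pct - replacement_pct) / (average_pct - replacement_pct)


if __name__ == "__main__":
    print(value_ratio(140, 75))   # replacement at 75% of the mean
    print(value_ratio(140, 85))   # "bench" at 85% of the mean
```

The higher the baseline, the smaller the denominator, so the same player's apparent value relative to an average hitter climbs as the replacement level moves up.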

Basically, WS already gets you about half-way to true replacement. So if you then want to get to replacement, you want to use something like 50% (or maybe 60%) of WS. I don't have a problem with a .350 replacement level -- which I think Studes is shooting for -- but WSAB is actually setting it at .425.

I appreciate this article perhaps more than any other I have read on Sabermetrics. Though I am keenly interested in having an adequate toolkit with which to evaluate players, I often find myself arguing with hardcore statheads about the inadequacies of some of the formulas. Some of these arguments stem from my interlocutors' rudimentary understandings of the state of the stat art; others with the limits of practicality in formula generation.

It's a fine line between a useful and an adequate formula. Many formulas are as close to adequate as we will ever come, but they are simply too complex to be very useful. Those that are quite elegant and useful are really not adequate to fine-grained analysis, such as the comparison between two players who are quite close in skills except for arm strength. How many wins does that translate into over the course of a season? Yours is the first article I have read that acknowledges the trade-off.

Baseball is a complex and at times chaotic game. But I don't think formula refinement has reached the point of diminishing returns quite yet. When it does, perhaps that's a sign that it's time to move beyond the Copernican understanding of the solar system and to an Einsteinian understanding. That might require starting from scratch with some of James' assumptions.

As with Copernicus, applying Occam's razor initially helped create a more elegant theory than Ptolemy's (which was based on Pythagoras’). But it ultimately led to anomalies that could not be explained. So Einstein scrapped those elegant but simplistic assumptions and started with some rather ungainly assumptions that nonetheless explained all the phenomena in the solar system without anomalies.

Baseball theory might need to go through that same transformation at some point. In the meantime, James' theories are useful enough of the time to continue to use them and refine them.

I'm actually a big fan of the 3 versions of XR myself (extrapolated runs). But if you really want excellent Runs Created estimates, then I think you should use several of the best formulas and then average the result.

For players from many years ago where we don't have a complete statistical record (such as baserunning outs and situational hitting), I actually think you should use runs scored as one of your runs created estimates.

Anyway Studes is the man, and this site and The Hardball Times rule.

Guy, great point.

His top WSAB is Pujols, at 25 (meaning +8.3 wins above bench). The top 10 averaged 22 (meaning +7.3 wins above bench). Do those make sense?

If we look at PMLV:
http://www.baseballprospectus.com/statistics/sortable/index.php?cid=99976
Which is batting runs above the average for that position, the top 10 come out to +54 runs, or +5 wins above average. Assuming those guys are, overall, average fielders for their position, that seems like a reasonable place to be. 5 wins above average would be around 7 to 7.5 wins above replacement. Bill James has +7.3 wins, so I think the overall scale is correct.

Do you disagree?

It does look like WSAB tracks VORP pretty well, so I take your point. However, VORP sets replacement at 80% of mean offense, so that's a replacement level of .400, not the .350 Studes indicates, which was my point.

Of course, the premise of WS is that position players create 65% of value, with fielding about one-quarter of that. So it should produce higher win estimates for position players than VORP (which values hitting and pitching each at 50% in the aggregate, with nothing for fielding). I think what's happening here is that WSAB is in a sense setting a too-high replacement level for hitting, but offsets that by giving some fielding WS to just about everyone.

Separate issue, but in doing this I noticed that A. Everett and Betancourt both get rated as replacement players by WSAB. Seems like those players may not be getting enough fielding credit.

I should have said earlier that this is a very nicely done article. Win Shares debate #452 shouldn't distract from that.

One interesting aspect of James' writing, to me, is a change in focus over the years. James' earlier writing focused on methods for gaining a clearer picture of players' contributions by extracting them from their context (team, park, etc.). In contrast, Win Shares focused on evaluating players within the context of their team. For example, to the extent a team got lucky or unlucky in terms of wins vs. pythag estimate, he deliberately allocates the luck to players. Runs Created began as a pure context-neutral measure, but now includes 'clutch' homerun performance.

Similarly, his more recent writing -- such as "Underestimating the Fog" and his 2006 THT annual articles (Blyleven, non-random hitting clusters) -- seems devoted to looking for evidence that there's more than just luck involved in certain variations in baseball performance that are commonly thought to be random/luck. Ironically, for many of us that skepticism about something like clutch performance was learned from James himself.

One gets the sense that James is responding to, or has sympathy for, the argument that statistical analysis has gone too far in removing players from the "real game," and too far in downgrading the importance of "human" factors. I wonder whether he has misgivings about his own contributions in this regard.

But personally, I liked the "young" James better.

I have the repl level for nonrelievers at .380, and for relievers at .470. That'll give you a team repl level of .300.

Win Shares is notorious for undercrediting fielders. One of its many flaws.

And yes, an excellent article by studes!

Tango: You're welcome. Re republishing the old Baseball Abstracts, yes, I have asked Bill about that and even suggested he sell them as a boxed set.

Interesting point about WSAB, Guy. Thanks. The issue is that James does use a replacement level, but then layers absolute wins on top of them. So, the system basically assumes that a team of batters at the 52% replacement level would win no games.

WSAB "works" because the numbers add up for a specific team, but it's missing some part of the iceberg below the water line.

Aargh. I'll have to think this through.

I think that the best way to find the WS for a replacement level player would be to do it specifically for each team.

Say we have a team with 120 offensive win shares, and the total claim points are 360. So there's 1 WS for every 3 CP. The league runs/out is .18.

Say your replacement level as a percentage of average runs/out is 73% (this would be a .350 OW%). He would get (.73-.52)*.18 = .038 marg runs/out.

So if we had a batter who made 300 outs, his replacement would get .038*300 = 11.4 claim points, or 3.8 win shares. If he actually created 80 runs, then he himself would get 80-300*.18*.52 = 52 claim points, or 17.3 WS. So he'd be +13.5 WS v. replacement.

If we skip the WS stuff and compare him directly to replacement, he would be:
80-300*.73*.18 = +40.6 RAR
.18 R/O is .18*25.2*2 = 9.07 runs/game, so he's approximately 40.6/9.07 = +4.48 WAR. Compare this with 13.5/3 = +4.5 WAR from the Win Share approach.
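The worked example can be reproduced with a short Python sketch. All constants are taken from the comment: 3 claim points per Win Share comes from 360 claim points over 120 Win Shares, and 25.2 outs per game is the commenter's figure. The function names are made up for illustration.

```python
# Figures from the worked example in the comment.
LEAGUE_RUNS_PER_OUT = 0.18
WS_BASELINE = 0.52               # Win Shares marginal-runs threshold
REPLACEMENT = 0.73               # replacement level as a share of average runs/out
CLAIM_POINTS_PER_WS = 360 / 120  # this team: 3 claim points per Win Share
WS_PER_WIN = 3
OUTS_PER_GAME = 25.2             # value used in the comment


def claim_points(runs_created, outs):
    """Runs created above the Win Shares baseline for this many outs."""
    return runs_created - outs * LEAGUE_RUNS_PER_OUT * WS_BASELINE


def war_via_win_shares(runs_created, outs):
    """Wins above replacement computed inside the Win Shares framework."""
    player_ws = claim_points(runs_created, outs) / CLAIM_POINTS_PER_WS
    replacement_runs = outs * LEAGUE_RUNS_PER_OUT * REPLACEMENT
    replacement_ws = claim_points(replacement_runs, outs) / CLAIM_POINTS_PER_WS
    return (player_ws - replacement_ws) / WS_PER_WIN


def war_via_runs(runs_created, outs):
    """Wins above replacement computed directly from runs."""
    runs_above_replacement = runs_created - outs * REPLACEMENT * LEAGUE_RUNS_PER_OUT
    runs_per_win = LEAGUE_RUNS_PER_OUT * OUTS_PER_GAME * 2
    return runs_above_replacement / runs_per_win


if __name__ == "__main__":
    print(round(war_via_win_shares(80, 300), 2))
    print(round(war_via_runs(80, 300), 2))
```

The .52 baseline cancels when the replacement player's Win Shares are subtracted, so on this team the two routes differ only in the runs-per-win conversion: 9 claim points per win on the Win Shares side against about 9.07 runs per win on the other.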

My point is that a batter's value should be essentially the same if you evaluate him under the WS framework or the more standard framework. Doing this on a team-by-team level may be cumbersome, but I think it would be the most logically sound approach.

Unless I'm missing something, that's pretty much what I do, only with a lower threshold. I take 70% of expected Win Shares; your approach would use 21% of expected Win Shares (which is essentially what Guy was suggesting).

The problem is that the combined WP% of your players will be .200 instead of .350. This is a problem inherent with what Win Shares does: stretching total wins over a continuum of performance that only includes that above a "replacement" threshold.

IOW, the relative WAR you'd get with this approach would be different from a more straightforward approach based only on runs and converted to wins.

Not that I could add much to this discussion but...with regard to accuracy and run estimators I've written a bit about it at

http://www.hardballtimes.com/main/article/a-closer-look-at-run-estimation/

and

http://www.hardballtimes.com/main/article/ops-for-the-masses/

Great article Studes. Love the mix of science history and sabermetrics. Count me in for a republication of the abstracts. They would go like hotcakes. ACTA should have tried to get that going in conjunction with their book on James coming out later this spring.

Studes - Did you notice that the totals in the BJ Handbook and the hardballtimes were a bit different? They are a bit different, at least for certain players - Bonds's career total, for example, according to the BJ Handbook is 686, but according to the hardballtimes, it's 693.

Right, stuff. THT doesn't completely follow the original James formula. We've tweaked it a bit. You can find the explanation on our site.

Studes - I feel like a moron. I completely forgot about that, thanks for reminding me. I enjoyed your recent article, BTW, though I still feel this system (WS) tends to overrate Cobb a bit.