Introducing Monte Carlo Win-Loss
The flaws in Pythagorean win-loss percentage (commonly, the square of runs scored divided by the sum of the square of runs scored and the square of runs allowed) are fairly well known. A 20-2 blowout counts as only one win, but it can affect a team's Pythagorean win-loss percentage dramatically. One-run wins and losses count as a whole win or loss, while Pythagorean win-loss treats them as nearly half a win and half a loss. All of this is true, but the method is still pretty darn good. There has been a fair amount of work on just what the best exponent is, and I've settled on 1.83 for Baseball-Reference, but other choices abound, and some have even resorted to variable exponents to squeeze out those last three to four wins of error. I'm not going to go down that path here. Instead, I would like to look at a different way to approach this issue, one that accepts that teams have blowouts and one-run wins and incorporates them into the method. At the 2004 SABR convention in Cincinnati, I presented a talk on Monte Carlo simulation of pennant races (http://www.bb-ref.com/sabr/). The idea behind Monte Carlo Win-Loss Percentage is similar: randomly re-order a team's game-by-game runs scored and runs allowed, re-pair them, count the wins and losses that result, and repeat the process 1000 times. (Monte Carlo techniques are common computational techniques for simulating complicated systems. Basically, you run a lot of simulations and aggregate the data.)
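For reference, the Pythagorean estimate is WP = RS^x / (RS^x + RA^x), with x = 2 in the classic version and x = 1.83 as used on Baseball-Reference. A minimal Python sketch follows; the function name and the run totals in the example are hypothetical.

```python
def pythag_wp(runs_scored, runs_allowed, exponent=1.83):
    """Pythagorean winning percentage: RS**x / (RS**x + RA**x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical season totals: 800 runs scored, 600 allowed.
print(round(pythag_wp(800, 600, exponent=2), 3))  # 0.64
print(round(pythag_wp(800, 600), 3))  # 1.83 exponent
```

Lowering the exponent from 2 to 1.83 pulls the estimate slightly toward .500; with these totals it gives roughly .629 instead of .640.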
What are the flaws? Well, this method assumes that runs scored and runs allowed are independent of each other, and that clearly is not the case. Managers manage to the score, and the four runs allowed by mop-up relievers in the bottom of the ninth could turn a real win into a Monte Carlo loss (the same thing happens with Pythagorean records to a lesser degree). However, I think this method more correctly handles teams with a lot of one-run wins or many blowouts. Suspended games are somewhat problematic and tie games are troubling, but I think all of this evens out over the long run. Does it work? Yes, but not well enough to supplant Pythagorean win-loss records. I've computed Monte Carlo Win-Loss Percentages (mcWL%) for every team from 1901 on, and the method does a little better than Pythagorean Win-Loss Percentage (pythWL%) with a 1.83 exponent. Root-mean-square error between mcWL%, pythWL%, and actual WL% for 2076 seasons since 1900. So one measly point of winning percentage (.001), or about one-sixth of a game over the course of a season. Also, mcWL% was at least as close to the actual WL% as pythWL% was in 53% of the cases. So not great, but competitive. What can you do with this? I have a couple of ideas, but I'll expand on those later. One neat thing about these simulations is that you can count how many times the team's actual wins exceeded the simulated season's wins. For instance, a team that exceeded the simulation all 1000 times was probably very lucky to do so, and a team that never did was very unlucky (I call this percentile). We can also track their best and worst results along with the average.
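The post doesn't include code, so what follows is only a minimal Python sketch of the idea as described: randomly re-pair a team's game-by-game runs scored and runs allowed, count the wins, repeat 1000 times, and record the average, the spread, the best and worst seasons, and the percentile. Counting ties as half a win, using the population standard deviation for stdW, and the function and field names (borrowed from the tables and the data dump) are all assumptions, not the author's actual implementation.

```python
import random
import statistics


def game_result(rs, ra):
    """1 for a win, 0 for a loss, 0.5 for a tie (tie handling is an assumption)."""
    if rs > ra:
        return 1.0
    if rs < ra:
        return 0.0
    return 0.5


def simulate_season(runs_scored, runs_allowed, actual_wins, n_sims=1000, seed=None):
    """Randomly re-pair per-game runs scored and allowed, n_sims times.

    runs_scored and runs_allowed are equal-length lists, one entry per game.
    Shuffling only the runs allowed is enough to produce a random pairing.
    """
    rng = random.Random(seed)
    allowed = list(runs_allowed)  # copy so the caller's list is not reordered
    sim_wins = []
    for _ in range(n_sims):
        rng.shuffle(allowed)
        sim_wins.append(sum(game_result(rs, ra) for rs, ra in zip(runs_scored, allowed)))
    mc_w = statistics.mean(sim_wins)
    return {
        "mcW": mc_w,
        "mcWP": mc_w / len(runs_scored),
        "stdW": statistics.pstdev(sim_wins),
        "bstW": max(sim_wins),
        "wstW": min(sim_wins),
        # fraction of simulated seasons that the actual win total exceeded
        "percentile": sum(actual_wins > w for w in sim_wins) / n_sims,
    }


# Hypothetical 10-game season (made-up numbers, for illustration only).
runs_scored = [5, 5, 5, 11, 4, 4, 6, 10, 12, 2]
runs_allowed = [0, 1, 1, 1, 3, 3, 4, 5, 7, 8]
print(simulate_season(runs_scored, runs_allowed, actual_wins=9, seed=42))
```

Because only the pairing changes, every simulated season uses exactly the same per-game run totals; the spread between bstW and wstW comes purely from how those totals are matched up.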
In the tables below, W and L are actual wins and losses; mcW and mcL are the average simulated wins and losses; HighW and LowW are the best and worst simulated win totals; WP, mcWP and pythWP are the actual, Monte Carlo and Pythagorean winning percentages; lucky is WP minus mcWP; and percentile is the fraction of the 1000 simulations in which the team's actual wins exceeded the simulated wins.

Luckiest teams by WL% - mcWL%
team_ID  year_ID    W    L    mcW    mcL  HighW   LowW     WP   mcWP  pythWP   lucky  percentile
BOS         1946  104   50   93.4   62.6  102.0   84.5  0.675  0.599   0.629   0.076       1.000
NYG         1909   92   61   83.1   74.9   93.5   74.5  0.601  0.526   0.560   0.075       0.998
NYG         1913  101   51   92.6   63.4  102.5   83.5  0.664  0.594   0.627   0.070       0.998
NYY         2004  101   61   89.8   72.2   98.5   78.5  0.623  0.554   0.548   0.069       1.000
BRO         1954   92   62   81.3   72.7   92.0   71.0  0.597  0.528   0.523   0.069       1.000
CHW         1959   94   60   84.9   71.1   94.0   74.5  0.610  0.545   0.559   0.065       1.000
CIN         1981   66   42   59.1   48.9   67.0   51.5  0.611  0.547   0.524   0.064       0.999
CIN         1944   89   65   80.0   75.0   91.5   71.0  0.578  0.516   0.530   0.062       0.999
NYY         1943   98   56   88.9   66.1   99.0   80.5  0.636  0.574   0.595   0.062       0.999
PIT         1908   98   56   88.9   66.1   99.5   79.0  0.636  0.574   0.600   0.062       0.997
SLB         1902   78   58   71.6   68.4   80.0   62.0  0.574  0.512   0.509   0.062       0.990
NYM         1972   83   73   73.5   82.5   83.0   65.0  0.532  0.471   0.459   0.061       1.000
PHA         1931  107   45   98.4   54.6  106.5   88.0  0.704  0.643   0.640   0.061       1.000
STL         1917   82   70   73.9   80.1   83.5   65.0  0.539  0.480   0.470   0.059       0.996
NYG         1925   86   66   77.0   75.0   86.0   66.5  0.566  0.507   0.522   0.059       1.000
CHC         1907  107   45  100.1   54.9  109.5   91.0  0.704  0.646   0.670   0.058       0.992
NYG         1906   96   56   88.0   65.0   97.5   74.5  0.632  0.575   0.592   0.057       0.997
PIT         1905   96   57   88.4   66.6   98.0   77.0  0.627  0.570   0.588   0.057       0.996
PIT         1909  110   42  102.0   51.0  113.5   94.0  0.724  0.667   0.694   0.057       0.997
BRO         1924   92   62   83.3   70.7   94.0   74.5  0.597  0.541   0.528   0.056       0.999

Unluckiest teams by WL% - mcWL%
team_ID  year_ID    W    L    mcW    mcL  HighW   LowW     WP   mcWP  pythWP   lucky  percentile
BSN         1935   38  115   53.3   99.7   64.5   43.0  0.248  0.348   0.327  -0.100       0.000
NYM         1993   59  103   71.4   90.6   81.5   61.5  0.364  0.441   0.454  -0.077       0.000
CIN         1937   56   98   68.2   86.8   79.5   59.0  0.364  0.440   0.434  -0.076       0.000
PHI         1936   54  100   65.6   88.4   74.5   55.0  0.351  0.426   0.416  -0.075       0.000
STL         1909   54   98   66.1   87.9   76.0   55.0  0.355  0.429   0.398  -0.074       0.000
SLB         1905   54   99   66.5   89.5   78.0   58.0  0.353  0.426   0.421  -0.073       0.000
PIT         1917   51  103   63.2   93.8   71.5   54.5  0.331  0.403   0.388  -0.072       0.000
BSN         1912   52  101   63.6   91.4   73.0   53.5  0.340  0.410   0.402  -0.070       0.000
DET         1952   50  104   61.3   94.7   71.0   49.5  0.325  0.393   0.374  -0.068       0.001
PHA         1945   52   98   63.3   89.7   72.5   53.5  0.347  0.414   0.385  -0.067       0.000
NYM         1962   40  120   50.9  110.1   61.0   41.0  0.250  0.316   0.313  -0.066       0.000
WSH         1907   49  102   60.3   93.7   69.0   51.0  0.325  0.391   0.361  -0.066       0.000
BRO         1912   58   95   67.9   85.1   79.0   58.5  0.379  0.444   0.433  -0.065       0.000
HOU         1975   64   97   74.8   87.2   83.5   64.5  0.398  0.462   0.469  -0.064       0.000
SDP         1994   47   70   54.5   62.5   63.5   46.0  0.402  0.466   0.453  -0.064       0.003
PHA         1946   49  105   59.0   96.0   68.0   50.5  0.318  0.381   0.387  -0.063       0.000
PHI         1930   52  102   62.4   93.6   73.0   50.5  0.338  0.400   0.392  -0.062       0.002
SLB         1911   45  107   54.5   97.5   63.5   42.5  0.296  0.358   0.341  -0.062       0.002
PHI         1923   50  104   59.8   95.2   69.5   47.5  0.325  0.386   0.367  -0.061       0.002
BSN         1911   44  107   54.9  101.1   63.0   43.5  0.291  0.352   0.333  -0.061       0.001

Teams for which pythWL% and mcWL% differ the most
team_ID  year_ID    W    L    mcW    mcL  HighW   LowW     WP   mcWP  pythWP  percentile
BRO         1918   57   69   57.0   69.0   64.5   47.0  0.452  0.452   0.387       0.543
CHC         1905   92   61   96.2   58.8  106.5   86.0  0.601  0.621   0.680       0.083
BSN         1904   55   98   57.8   97.2   67.5   49.5  0.359  0.373   0.316       0.187
CHW         1905   92   60   91.8   66.2  100.0   81.5  0.605  0.581   0.636       0.563
BSN         1906   49  102   53.6   98.4   63.0   45.0  0.325  0.353   0.300       0.059
STL         1908   49  105   50.5  103.5   63.0   42.0  0.318  0.328   0.277       0.328
WSH         1903   43   94   49.3   90.7   59.0   40.0  0.314  0.352   0.302       0.015
CIN         1901   52   87   54.0   88.0   62.5   45.5  0.374  0.381   0.334       0.264
SDP         1972   58   95   62.6   90.4   73.0   53.0  0.379  0.409   0.362       0.075
WSH         1947   64   90   63.0   91.0   72.5   51.0  0.416  0.409   0.363       0.660
CHC         1909  104   49  102.9   52.1  111.5   92.0  0.680  0.664   0.709       0.651
CLE         1908   90   64   86.9   70.1   96.5   77.5  0.584  0.553   0.598       0.871
NYY         1939  106   45  104.8   47.2  114.0   94.0  0.702  0.690   0.734       0.683
BRO         1909   55   98   60.6   94.4   70.0   52.5  0.359  0.391   0.347       0.031
BSN         1905   51  103   54.6  101.4   63.0   46.0  0.331  0.350   0.306       0.119
DET         1905   79   74   72.4   81.6   81.5   61.5  0.516  0.470   0.426       0.993
HOU         1963   66   96   64.8   97.2   73.0   55.0  0.407  0.400   0.357       0.686
PIT         1918   65   60   64.6   61.4   72.0   55.0  0.520  0.513   0.556       0.581
STL         1944  105   49  102.8   54.2  112.5   93.5  0.682  0.655   0.697       0.784
BRO         1910   64   90   68.6   87.4   79.0   57.5  0.416  0.440   0.398       0.065

I've also made available a dump of my simulation results (Simulation Data). The fields are tab-delimited, and you can import them into Excel easily using the Text to Columns command (the most useful command for any stathead, well, after sorting). The columns are straightforward, except for stdW, which is the standard deviation of the win totals across the 1000 simulations, and bstW and wstW, which are the best and worst win totals of all 1000 simulations. [Additional reader comments and retorts at Baseball Primer.]
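If you'd rather script it than use Excel, here is a minimal Python sketch using the standard csv module to load tab-delimited rows, rank teams by luck, and compute a root-mean-square error like the comparison above. The column names (WP, mcWP, pythWP and so on) are assumed to match the tables; the real file's header row may differ (it uses bstW and wstW, for example), so check it first. The sample rows are copied from the tables.

```python
import csv
import io
import math


def load_simulation_dump(text):
    """Parse tab-delimited rows; the first row is assumed to hold the column names."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def lucky(row):
    """Actual minus simulated winning percentage (the 'lucky' column in the tables)."""
    return float(row["WP"]) - float(row["mcWP"])


def rmse(rows, estimate_col, actual_col="WP"):
    """Root-mean-square error of one winning-percentage estimate against actual WP."""
    errors = [float(r[estimate_col]) - float(r[actual_col]) for r in rows]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Three rows copied from the tables above, laid out as an assumed tab-delimited file.
sample = (
    "team_ID\tyear_ID\tWP\tmcWP\tpythWP\n"
    "BOS\t1946\t0.675\t0.599\t0.629\n"
    "BSN\t1935\t0.248\t0.348\t0.327\n"
    "BRO\t1918\t0.452\t0.452\t0.387\n"
)
rows = load_simulation_dump(sample)
for row in sorted(rows, key=lucky, reverse=True):
    print(row["team_ID"], row["year_ID"], round(lucky(row), 3))
print("mcWP RMSE:", round(rmse(rows, "mcWP"), 4))
print("pythWP RMSE:", round(rmse(rows, "pythWP"), 4))
```

An RMSE difference of .001 in winning percentage is about 0.15 to 0.16 games over a 154- or 162-game schedule, which is where the "one-sixth of a game" figure comes from.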
Comments
The big advantage of Pythagorean records is that they're good at predicting future performance. Can the Monte Carlo method be used as a forecasting tool?
Posted by: edd at July 1, 2005 05:42 AM
Do you have a hypothesis on why there are so many teams from the first half of the twentieth century? If the opportunity to have a lucky or unlucky season were truly random, there would be more teams from recent times. This would be due to the fact that there are more teams in the league in recent years, and therefore more teams have an opportunity to be either lucky or unlucky. Whew, that's confusing. I hope you get where I'm coming from.
Posted by: Wimbo at July 1, 2005 07:44 AM
>>The big advantage of Pythagorean records is that they're good at predicting future performance. Can the Monte Carlo method be used as a forecasting tool?
Sean can provide real #s, but it's clear that the pythag and the MC% are very highly correlated (probably .95 or higher). In that case, anything you can predict with pythag you can predict with MC (but maybe a little better or worse).
But there doesn't seem to be a big advantage of MC over pythag in terms of accuracy and pythag is much easier to calculate. And I'm too lazy to download Sean's simulated data, but I'd bet that the empirical standard deviation from the sims is very close to the expected standard deviation you'd get using the pythag winning percentage.
Do you have a hypothesis on why there are so many teams from the first half of the twentieth century?
I was wondering about that myself. My best guess is that either pythag or MC doesn't work that well in the tails, perhaps especially the low-scoring tails, of the distribution. Given that 12 of the teams in the last table are from 1901-1909, it seems that the two methods must diverge in low-scoring eras. Which one is better I have no idea.
Posted by: Walt Davis at July 1, 2005 09:34 AM
I am curious where the 2001 Seattle Mariners would stand in any discussion about the luckiest team ever. Observationally, I have never seen a team get as many breaks as that team. I remember going to a game that year at Safeco where they scored 5 runs for a win and didn't hit a ball hard all game long. I remember laughing to myself all the way home after that game. Unfortunately, there are not too many laughs in Seattle these days after a Mariner game. The luckiest single performance I have ever seen was in 1965 when Robin Roberts shut out the Phillies in a complete game while giving up a ton of hits and rockets that were turned into outs. The last out of the game was a shot by Dick Allen that I thought was going to hit the roof of the Astrodome but instead settled into the glove of the centerfielder, who caught it at the wall in straight-away centerfield.
Posted by: stan at July 2, 2005 01:10 PM
Thanks for the comments.
I agree, pythag is far easier to compute. What I like about this method is that it tends to view things in terms of a distribution of possible outcomes. If I had more time I would have looked a bit more closely at how the outcomes are distributed.
I also found it interesting that a team's win totals could range from 102 to 84 wins just by randomly re-ordering the runs scored and allowed. That in no way changes how many runs were allowed and scored in a season or even in each game.
Also, I checked the 1900-1909 seasons and pythag is very slightly better.
Posted by: Sean Forman at July 2, 2005 09:33 PM
Bill James tried something similar in the 1986 Baseball Abstract, and got similar conclusions; the gain wasn't worth the extra work.
He computed the won-lost percentage of all NL teams for each number of runs scored. Each team was credited with an offensive won-lost percentage based on the number of runs it scored in each game. For example, since NL teams in 1985 won 60% of the time they scored four runs, a team which scored four runs 20 times would be credited with 12 offensive wins and 8 offensive losses for those games. Defensive wins were calculated similarly.
The offensive and defensive won-lost percentages were then combined to get an estimated total of wins for each team. The standard error by this method was slightly less than projected by the Pythagorean formula, but the difference was trivial.
Posted by: David Grabiner at July 7, 2005 04:41 PM
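The crediting step David describes can be sketched in Python as below. This is an illustration under assumptions, not James's own procedure: the function names are hypothetical, ties count as half a win, and because the comment doesn't say exactly how the offensive and defensive percentages were combined, that step is left out.

```python
from collections import defaultdict


def result(rs, ra):
    """1 for a win, 0 for a loss, 0.5 for a tie (tie handling is an assumption)."""
    return 1.0 if rs > ra else 0.5 if rs == ra else 0.0


def league_wp_by_runs(team_games, by="scored"):
    """League winning percentage for each run total.

    team_games: (runs_scored, runs_allowed) pairs, one per team per game.
    by="scored" groups by runs scored (offense); by="allowed" by runs allowed (defense).
    """
    wins = defaultdict(float)
    games = defaultdict(int)
    for rs, ra in team_games:
        runs = rs if by == "scored" else ra
        wins[runs] += result(rs, ra)
        games[runs] += 1
    return {runs: wins[runs] / games[runs] for runs in games}


def credited_record(team_runs, wp_by_runs):
    """Credit each game with the league WP for that run total; return (wins, losses)."""
    wins = sum(wp_by_runs[r] for r in team_runs)
    return wins, len(team_runs) - wins


# The comment's example: NL teams in 1985 won 60% of the time they scored four runs,
# so 20 four-run games are worth 12 offensive wins and 8 offensive losses.
w, l = credited_record([4] * 20, {4: 0.60})
print(round(w, 1), round(l, 1))  # 12.0 8.0
```

A team's defensive record works the same way: build the table with by="allowed" and pass the team's runs allowed per game to credited_record.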
I don't know how to do html, so I'm afraid all this is going to come out in one big paragraph. Sorry.
Why do a simulation when you can easily compute the result over all pairings?
Let's look at your 10-game data.
When they score 5 runs, they'll go 7-1-2, that is, 7 wins, 1 tie and 2 losses (you note that ties are troubling, but you don't say how you handled them; I'll count them as half-a-win, half-a-loss, though I won't make any effort to defend this choice), which I'll read as 7.5-2.5.
When they score 11 runs, they go 10-0.
Here's the full table in the format (runs scored, record when scoring this many runs, number of times scoring this many runs, total result of scoring this many runs):
(5, 7.5-2.5, 3, 22.5-7.5),
(11, 10-0, 1, 10-0),
(4, 6.5-3.5, 2, 13-7),
(6, 8-2, 1, 8-2),
(10, 10-0, 1, 10-0),
(12, 10-0, 1, 10-0),
(2, 4-6, 1, 4-6),
The total is 77.5-22.5, so the expected winning percentage is
.775, no simulations necessary.
You could probably even find a way to compute standard deviations, and thus compile your tables of lucky & unlucky teams.
I note that 4 of your luckiest teams were McGraw teams (John, not Tug). If most of McGraw's teams were "lucky" then it's just possible you've stumbled on a good measure of what a manager contributes to a team.
I guess BOS must be the AL team, but if I hadn't seen BSN farther down the page I wouldn't have known. Similarly the existence of SLB implies STL must be the NL team. It's unfortunate that the reader has to figure these things out.
Good stuff.
Posted by: Gerry at July 7, 2005 07:20 PM
Well, it came out in one big paragraph in preview, but then it came out in separate paragraphs in real life. Go figure.
Posted by: Gerry at July 7, 2005 07:22 PM
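Gerry's point holds for the average: under a random re-pairing, each game's runs scored is matched with each game's runs allowed with probability 1/n, so the mean of the simulations equals his all-pairings average. His suggestion about standard deviations can also be carried out exactly, using the variance formula for a sum over a random permutation (Hoeffding's combinatorial formula). The Python sketch below assumes, as Gerry does, that ties count as half a win, and that both lists have one entry per game; the function names are hypothetical, and the runs allowed in the example are invented to reproduce Gerry's table, since the original 10-game data is not shown.

```python
import itertools
import math
import statistics


def result(rs, ra):
    """1 for a win, 0 for a loss, 0.5 for a tie (Gerry's half-and-half convention)."""
    return 1.0 if rs > ra else 0.5 if rs == ra else 0.0


def expected_wp_all_pairings(runs_scored, runs_allowed):
    """Average result over every (runs scored, runs allowed) pairing."""
    total = sum(result(rs, ra) for rs in runs_scored for ra in runs_allowed)
    return total / (len(runs_scored) * len(runs_allowed))


def exact_win_sd(runs_scored, runs_allowed):
    """Exact SD of wins when runs allowed are randomly re-paired with runs scored.

    Wins = sum_i a[i][p(i)] for a uniformly random permutation p, where
    a[i][j] = result(runs_scored[i], runs_allowed[j]). Hoeffding's formula gives
    Var = sum_ij d_ij**2 / (n - 1), d_ij = a_ij - row_mean_i - col_mean_j + grand_mean.
    """
    n = len(runs_scored)
    if n != len(runs_allowed) or n < 2:
        raise ValueError("need equal-length lists with at least two games")
    a = [[result(rs, ra) for ra in runs_allowed] for rs in runs_scored]
    row_mean = [sum(row) / n for row in a]
    col_mean = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    ss = sum(
        (a[i][j] - row_mean[i] - col_mean[j] + grand_mean) ** 2
        for i in range(n)
        for j in range(n)
    )
    return math.sqrt(ss / (n - 1))


def brute_force_win_sd(runs_scored, runs_allowed):
    """Cross-check for small n: enumerate every re-pairing (n! of them)."""
    totals = [
        sum(result(rs, ra) for rs, ra in zip(runs_scored, perm))
        for perm in itertools.permutations(runs_allowed)
    ]
    return statistics.pstdev(totals)


# Runs scored as in Gerry's table; the runs allowed are made up so that they
# reproduce his per-run-total records (7.5-2.5 when scoring 5, and so on).
rs = [5, 5, 5, 11, 4, 4, 6, 10, 12, 2]
ra = [0, 1, 1, 1, 3, 3, 4, 5, 7, 8]
print(expected_wp_all_pairings(rs, ra))  # 0.775
print(exact_win_sd(rs, ra))
# Small hypothetical check against full enumeration; the two numbers should agree.
print(exact_win_sd([3, 5, 2, 7, 4], [4, 1, 6, 4, 2]), brute_force_win_sd([3, 5, 2, 7, 4], [4, 1, 6, 4, 2]))
```

The closed form costs n² work instead of n! enumerations or thousands of shuffles, which is what makes Gerry's "no simulations necessary" approach practical for full seasons.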