F/X Visualizations, February 05, 2010
Thoughts on a New Box Score
By Dave Allen

I have fond memories of reading box scores in the newspaper as a child. In the pre-internet days (or at least the pre-internet days in my house), newspaper box scores were the medium through which I, and I assume most people, consumed baseball data. The data were all there, tightly and efficiently packed in a format that let you pull out any or all of what you wanted without feeling overwhelmed. Each was small enough that the box scores for all of the day's games fit on one page.

I still read box scores. The medium has changed to the internet, but the box score itself is largely the same; I guess the format has stayed largely the same since the mid-1800s. Some of the stats are different, but the layout is very similar. More than 150 years with little change shows that the format is remarkably successful, but that does not mean there cannot be innovations. FanGraphs's WPA charts are not box scores per se, but they are a very effective way of presenting what happened in a game.

I thought it would be an interesting exercise to try to create a new box score. I wanted it to retain the original box score's quality of presenting a relatively large amount of information in a relatively small space, while keeping those data accessible rather than overwhelming. Beyond that, I hoped my new method would give a more immediate feel for the pace and tenor of the game, as the WPA chart does.

Here is my attempt. The image may be too small, but I kept it that way so that it didn't push out the right margin of the page. You can click on it for a larger version. I used game one of the 2009 World Series for the example.
[Figure: New Box Score]
Each at-bat is represented by a bar, the height of which denotes the base the batter reached. White bars are for outs, black for hits or walks. The batter's progression around the rest of the bases that inning is indicated in gray (steals have a vertical black line through them). Runners on base during an at-bat are indicated in red: circles for those not moved over in the at-bat, lines to show their progression as a result of the at-bat, and an X if they were thrown or tagged out in that at-bat.
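The encoding just described can be written down as a small data structure. What follows is a minimal Python sketch under stated assumptions, not the code behind the chart: the outcome codes and the `Bar` and `bar_for` names are invented for illustration, and the red runner marks, hit-by-pitches and errors are left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical outcome codes for this sketch; the chart itself shows only
# bar height and color. Outs reach no base, so their height is 0 (the
# renderer is assumed to draw them as an empty white box).
BASE_REACHED = {"OUT": 0, "BB": 1, "1B": 1, "2B": 2, "3B": 3, "HR": 4}


@dataclass
class Bar:
    height: int            # base reached on the play itself (4 = home)
    color: str             # "white" for outs, "black" for hits or walks
    gray_to: int           # last base reached later in the inning (gray part)
    stolen: bool = False   # steals get a vertical black line through the gray part


def bar_for(outcome: str, later_base: Optional[int] = None, stolen: bool = False) -> Bar:
    """Map one plate appearance to the bar encoding described above."""
    height = BASE_REACHED[outcome]
    color = "white" if outcome == "OUT" else "black"
    # Only a batter who reached base can advance further, so outs get no gray part.
    gray_to = max(height, later_base or 0) if height > 0 else 0
    return Bar(height, color, gray_to, stolen)


# A double, with the batter coming around to score later in the inning.
print(bar_for("2B", later_base=4))
# prints: Bar(height=2, color='black', gray_to=4, stolen=False)
```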

The score can be counted along as the black or gray bars reach the top. That also lets you count an individual batter's runs scored or a pitcher's runs allowed. Red lines that reach the top are RBIs.
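Counting runs this way is mechanical enough to automate. A minimal sketch, assuming each bar has already been reduced to a hypothetical (batter, pitcher, final base) record; the names and data below are placeholders, not read off the chart.

```python
from collections import Counter

# Hypothetical records: (batter, pitcher, final_base), where final_base is the
# top of the black-plus-gray bar, i.e. the last base the batter reached that
# inning (4 = scored).
records = [
    ("Batter A", "Pitcher X", 4),  # home run: the black bar reaches the top
    ("Batter B", "Pitcher X", 2),  # double, stranded at second
    ("Batter C", "Pitcher Y", 4),  # walk, then the gray bar carries him home
    ("Batter D", "Pitcher Y", 4),
]


def count_runs(records):
    """Count one run for every bar, black or gray, that reaches home."""
    team_runs = 0
    scored = Counter()   # runs scored per batter
    allowed = Counter()  # runs charged per pitcher
    for batter, pitcher, final_base in records:
        if final_base == 4:
            team_runs += 1
            scored[batter] += 1
            # The run goes to the pitcher who faced this batter in that at-bat.
            allowed[pitcher] += 1
    return team_runs, scored, allowed


runs, scored, allowed = count_runs(records)
print(runs, dict(allowed))
# prints: 3 {'Pitcher X': 1, 'Pitcher Y': 2}
```

Charging the run to the pitcher of that at-bat roughly mirrors the usual rule that a run belongs to the pitcher who let the runner reach base; earned and unearned runs are not distinguished here.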

Compared to a traditional box score, it is harder to find an individual player's line. For example, to see that Chase Utley went 2-4 with 2 HRs, 2 runs, 2 RBIs, a strikeout and a walk, you have to find each of his at-bats and count all of the events. But the trade-off is that, I think, this format gives a better feel for the pace of the game and allows the events to be easily recreated: in the top of the first, CC Sabathia escaped a bases-loaded, two-out jam; Phil Hughes took over to start the eighth and walked the only two batters he faced, both of whom came around to score on Raul Ibanez's single; Utley's two solo HRs were the only runs through the first seven innings; Cliff Lee didn't allow a runner past first until the ninth, and up to that point faced just three batters over the minimum; the Yankees burned through five relievers, who gave up four runs, in the last two innings; the top of the ninth ended with Shane Victorino getting thrown out at home on a Ryan Howard double; and the game ended with two more Cliff Lee strikeouts. All of this can be easily seen through a close, but not difficult, reading of the chart.
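The counting the reader has to do by eye is mechanical, so a generated version of the chart could print each player's line next to his bars. A minimal Python sketch, assuming the bars have already been reduced to outcome codes; the codes, the tuple format and `batting_line` are hypothetical names for illustration.

```python
from collections import Counter

# Utley's five plate appearances as (outcome, rbi, scored). The outcomes
# follow the post (a walk, two solo HRs, a strikeout, one other out); the
# order and the tuple format are assumptions for this sketch.
utley = [
    ("BB", 0, False),
    ("HR", 1, True),
    ("HR", 1, True),
    ("K", 0, False),
    ("OUT", 0, False),
]

HITS = {"1B", "2B", "3B", "HR"}
NOT_AT_BATS = {"BB"}  # walks are not at-bats (HBP and sacrifices would join them)


def batting_line(pas):
    """Collapse a player's plate appearances into a traditional box-score line."""
    counts = Counter(outcome for outcome, _, _ in pas)
    return {
        "AB": sum(1 for outcome, _, _ in pas if outcome not in NOT_AT_BATS),
        "H": sum(counts[h] for h in HITS),
        "HR": counts["HR"],
        "R": sum(1 for _, _, scored in pas if scored),
        "RBI": sum(rbi for _, rbi, _ in pas),
        "BB": counts["BB"],
        "SO": counts["K"],
    }


print(batting_line(utley))
# prints: {'AB': 4, 'H': 2, 'HR': 2, 'R': 2, 'RBI': 2, 'BB': 1, 'SO': 1}
```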

What do you think of this format: Complicated and poorly laid out? Hard to read? Brilliant? I welcome constructive criticism in light of what you want from a representation of a baseball game.

Comments

Kind of like it, but don't hold your breath getting it adopted. I LOVED the Bill James boxscore idea, but the editor of The Sporting News told me "Forget it! It'd be a nightmare to proofread." I imagine he'd say the same thing in this case. Of course, where, exactly, is TSN these days?

There's definitely promise in your re-design. These things stood out to me as an illustration of game flow.

1. Cliff Lee dominated across a complete game (good story hook), while Yankees juggled a half-dozen relievers late.
2. Yankees didn't do anything until the 3rd time through the lineup

Text in traditional box scores all flows in the same direction, which helps the reader move from area to area. Unfortunately, the crisscrossing info obscures other compelling events:

1. Utley's 2 homers (as you said)
2. Player substitutions
3. Fantasy value

A box score that's both fantasy-friendly (for individual players) and indicates game flow would probably catch on. We've already seen the NFL significantly re-shape their stat reporting to match fans' fantasy interest.

I applaud the creativity. I think it would make a fantastic scorecard. I wonder how difficult it would be to do this same type of thing with just pencil and paper. Simulating the multiple colors would be the trick.


This reminds me of another very creative chart. It's a History of the New York Yankees chart.

First, I love the project, as I'm sure many feel the same about box scores being so accessible and concise, while at the same time wanting more information. The inclusion of pace and "aesthetics" is a great idea. (On that point, I do think Fangraphs WPA charts give a good impression of "swings" in games, if not necessarily pace.)

My reservation is that I can't imagine the traditional box score totals ever being wholly replaced, especially when they have proven to be so useful for collecting data while maintaining brevity and clarity.

I guess some key questions arise up front about just what data are truly necessary/desirable. Only then can we do our best Edward Tufte impression and come up with a better box score.

i thoroughly enjoyed this. it gives the reader a finger on the pulse of the game. visually appealing and easy enough to read once you remember what the indicators mean.

agreed, it is a tad more difficult to find an individual player's line, but i enjoyed looking ahead every 8 or so batters... it gives you a better feel for how the game is progressing.

in terms of adoption: i don't see why not. tradition aside, we're fast becoming a different kind of fan base. we have more room in our newspapers (not to mention the internet) and an increasing desire to understand the game better. this is an inventive and streamlined solution that speaks to a visually oriented culture.

I think you mean "Brian" Bruney.

I love the boxes showing base advancing. It's clear, simple, easy to understand at a glance. Really nice. I'm less pleased with the red lines for moving runners. In highly active innings like the Phillies' ninth, it's too busy and tough to decipher without stopping to think about it. An ideal box score should be immediately comprehensible. But this is a great start. Kudos.

I read this post via a link from Baseball Primer.

This is more of a play-by-play graphic than a substitute for a box score, in my opinion.

As I commented at Baseball Primer, if you wanted to condense your graphic, you could use a "sparkline." Edward Tufte has done a lot to popularize sparklines, or word-sized graphics useful for depicting sequences and time-series of data, such as the W-L and standings progress of a team over a season. On his message board in 2006, he posted:

Here's a sketch of the results of the last 25 or 30 at bats for a baseball player. A period is an out; a slash a walk; the verticals singles, doubles, triples, and home runs; and the whiskers beneath are the resulting runs batted in from that at bat.

It's actually 27 plate appearances. I'll use S, D, T, H instead of verticals for the base hits and Arabic numerals instead of whiskers for the RBI:

...../../.S...S.DD..T...H..
        1     1  2  1   3 1
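Because the notation is one character per plate appearance, with the RBI row aligned underneath, it is as easy to read by machine as by eye. A minimal Python sketch that summarizes the transcription above; `tally` and `rbi_events` are hypothetical helper names.

```python
from collections import Counter

# The transcription above: "." out, "/" walk, S/D/T/H single through home run,
# with each RBI count printed under the plate appearance that produced it.
outcomes = "...../../.S...S.DD..T...H.."
rbi_row = "        1     1  2  1   3 1"


def tally(outcomes: str, rbi_row: str) -> dict:
    """Summarize a one-character-per-plate-appearance sparkline."""
    counts = Counter(outcomes)
    rbis = rbi_row.ljust(len(outcomes))  # pad so columns stay aligned
    return {
        "PA": len(outcomes),
        "H": sum(counts[c] for c in "SDTH"),
        "HR": counts["H"],
        "BB": counts["/"],
        "RBI": sum(int(ch) for ch in rbis if ch.isdigit()),
    }


def rbi_events(outcomes: str, rbi_row: str) -> list:
    """Pair each RBI digit with the plate appearance directly above it."""
    rbis = rbi_row.ljust(len(outcomes))
    return [(o, int(r)) for o, r in zip(outcomes, rbis) if r.isdigit()]


print(tally(outcomes, rbi_row))
# prints: {'PA': 27, 'H': 6, 'HR': 1, 'BB': 2, 'RBI': 9}
print(rbi_events(outcomes, rbi_row))
# prints: [('/', 1), ('S', 1), ('D', 2), ('T', 1), ('H', 3), ('.', 1)]
```

The second function shows why the alignment matters: shift the RBI row right by one space and the three RBIs move from the home run to the out after it.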

I would think you could adapt such a scheme to present play-by-play sequences in a very small space. Here's my best shot of a portion of the posted example, the Phillies' batting innings from game one of the 2009 World Series, with each inning separated by a space:


../D/. ... ..H. ... ... .H.S. /:. //../S. .DSS.D.
             1           1             2    1  1

I added : to represent a double play (and I tried to underline the final out, which was a baserunning out, to distinguish it from a batter's out). It's not all the information, but it's a quick and dirty start, presented in far less space.

Other changes I thought of making, which I tried out below:
- change the / to a W for walk
- change the strikeouts from . to K
- add a row representing the batter's place in the batting order

123456 789 1234 567 891 23456 789 1234567 8912342
..WDW. ... ..HK KK. K.. .HKSK W:. WWK.WS. .DSS.D. 
             1           1             2    1  1 
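Adding the batting-order row also makes it possible to recover each player's line, which several commenters ask for. A Python sketch using the three rows above with the trailing spaces dropped; `slot_events` is a hypothetical helper, and innings are inferred from the spaces.

```python
# The three rows from the extended sparkline. Spaces separate innings; every
# other character is one event, and the batting-order row names its lineup slot.
order = "123456 789 1234 567 891 23456 789 1234567 8912342"
events = "..WDW. ... ..HK KK. K.. .HKSK W:. WWK.WS. .DSS.D."
rbi_row = "             1           1             2    1  1"


def slot_events(slot: str):
    """Return (inning, event, rbi) for every event credited to one lineup slot."""
    rbis = rbi_row.ljust(len(events))  # pad so every event has an RBI column
    result = []
    inning = 1
    for o, e, r in zip(order, events, rbis):
        if o == " ":  # a space separates one inning from the next
            inning += 1
            continue
        if o == slot:
            result.append((inning, e, int(r) if r.isdigit() else 0))
    return result


print(slot_events("3"))
# prints: [(1, 'W', 0), (3, 'H', 1), (6, 'H', 1), (8, 'K', 0), (9, '.', 0)]
```

One caveat: the final "." of the ninth is Victorino's baserunning out, credited to slot 2, and without the underline the plain-text rows cannot tell it apart from a plate appearance.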

I find this to be, in essence, an interesting failure. A lot of people are going to have a hard time translating that graphic into anything meaningful. There are people in the world who are intimidated by very basic graphs. This . . . this will overwhelm a lot of people.

Also, it doesn't tally up totals, which is a fundamental problem.

I always want to see how each player did in the game. The traditional box score gives me that at a glance. This makes me work far too hard to find those numbers. Does your graph add something? Yes. But not what I want.

My initial reaction is that it is too visually busy. It really is more like a play-by-play, and frankly a three-paragraph AP article can capture the essence of a game. I don't think people look at a box score to get a sense of the flow of the game, which an article can do just as well or better, but to get a quick sense of how players did.

mcb,

could you link an example of Bill James's box score?

ewdewlad,

I guess you are right that it is more scorecard than box score. And thanks for linking that History of the New York Yankees chart. That is very cool.

Cory,

I like the way that you address the problem, by starting off thinking about which data are important to include and then, based on that, thinking about a structure. I guess for me recreating the narrative of the game is more important than listing each player's stats, but, based on the comments, it looks like I am in the minority on this point.

Bob,

thanks for noting this mistake. I corrected it. I do think that the base runners could be done better. I really wanted to include them, but I don't feel this way is totally natural.

bobm,

I am a huge fan of Tufte's work and I like to think his ideas inform my work. The sparklines you present are interesting. They are smaller and easier to make than mine while keeping maybe 75% of the data. I wonder if there is an easy way to add the base runner information in another line or two.

Linus,

The Tufte-ian in me would say that almost anyone has the capacity to understand graphs, charts, tables and maps with huge quantities of data as long as they are well laid out. So the fundamental problem is not that some people cannot understand a chart with too much data, but that people cannot understand a poorly laid-out chart. By that principle, if this chart is overwhelming, it is not because it is a chart or because it includes too much data, but because I designed it poorly.

Travis,

Fair enough. I will try to find a way for a future version to have both what you want, an easy way to see how each player did, and what I want, an easy way to recreate the narrative of the game in your head.

DB,

I think what I like about this format is that it allows someone to get a sense of the flow of the game without having to read an AP article. There is no way a three-paragraph AP article could contain all of the data in that graphic, so the author distills what he thinks are the salient events and narratives in the game. But in the time it took you to read that article -- two or three minutes -- you could inspect this chart and find those events and narratives on your own, as well as many minor ones that might only interest you. As Tufte would say, a well-designed chart allows the viewer to engage the data with his own 'cognitive style.'

Anyway, thanks to everyone for their comments.

Glad to see so many fans of Tufte! Great job taking initiative with the sparklines bobm.

Another simple but useful graphic could be graph lines showing the teams' respective scores rather than digits in grid form. This could be good, first, because the visual representation could easily emphasize pace and magnitude, and second, because you could overlay more data (e.g., win probability and leverage) without losing clarity.
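The score line suggested here is just a running total of runs by inning. A small Python sketch using the Phillies' per-inning runs from the "Runs" row of the James-style box later in the comments; the Yankees' inning-by-inning runs are not given in the post, so only one line is drawn, and the text rendering stands in for a real plot.

```python
from itertools import accumulate

# Phillies' runs per inning in game one, from the "Runs" row (001001022)
# of the James-style box in the comments.
per_inning = [int(c) for c in "001001022"]

# A score line plots the running total after each inning, so a steep
# segment shows a big inning and a flat one shows a quiet stretch.
running = list(accumulate(per_inning))

# Crude text rendering of the line: one '#' per run scored so far.
for inning, total in enumerate(running, start=1):
    print(f"{inning}: {'#' * total}")
```

Win probability or leverage could be overlaid as a second series indexed the same way, by plate appearance or by inning.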

In partial agreement with some other commenters, I think the totals for players remain important, but luckily new media isn't as constrained as newspapers regarding size, color, etc. There should be a way to combine the best of both scoresheets and box scores.

The great thing about the new world of information is that there doesn't have to be only one type of boxscore. There can be many. And proofreading is no longer an issue, if they are generated automatically.

I agree that aligning a single player's effort is important. So one suggestion would be to keep the graphical improvements but put them back into a format more like Bill James's idea, with the names on the left like a traditional box and the play-by-play items progressing down the column and wrapping to the top where necessary. That's something anyone who has kept a scorecard knows how to interpret. Displaying the pitchers becomes difficult in this scenario, but it's not intractable.

SJ - by Bill James's idea, do you mean something like this?

                INNING
              123456789  AB R H RBI BB SO
JRollins   SS . . .  WS   4 2 1  0   1  0
SVictorino CF . .  . WS-  4 1 1  1   1  0
CUtley     2B W H  H K.   4 2 2  2   1  1
RHoward    1B D K  K .D   5 0 2  1   0  2
JWerth     RF W  K S W    2 0 1  0   2  1
RIbanez    DH .  K K S    4 0 1  2   0  2
BFrancisco LF  . .  W.    3 0 0  0   1  0
PFeliz     3B  .  K : .   4 0 0  0   0  1
CRuiz       C  .  . . D   4 1 1  0   0  0 
Runs          001001022  34 6 9  6   6  7
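On the proofreading worry raised earlier: a box like this has internal checks that a generator, or a careful typist, can run automatically. A Python sketch with the numbers copied from the box above; `column_totals` is a hypothetical helper.

```python
# (AB, R, H, RBI, BB, SO) for each batter, copied from the box above.
LINES = {
    "JRollins": (4, 2, 1, 0, 1, 0),
    "SVictorino": (4, 1, 1, 1, 1, 0),
    "CUtley": (4, 2, 2, 2, 1, 1),
    "RHoward": (5, 0, 2, 1, 0, 2),
    "JWerth": (2, 0, 1, 0, 2, 1),
    "RIbanez": (4, 0, 1, 2, 0, 2),
    "BFrancisco": (3, 0, 0, 0, 1, 0),
    "PFeliz": (4, 0, 0, 0, 0, 1),
    "CRuiz": (4, 1, 1, 0, 0, 0),
}


def column_totals(lines):
    """Sum each stat column across all batters."""
    return tuple(sum(col) for col in zip(*lines.values()))


totals = column_totals(LINES)
line_score_runs = sum(int(c) for c in "001001022")
# A hand-typed box is consistent only if the R column matches the line score.
assert totals[1] == line_score_runs
print(totals)
# prints: (34, 6, 9, 6, 6, 7)
```

All six column totals match the bottom row of the box, and the R column agrees with the 001001022 line score.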

I quite like this approach, but that's partly because I'm more interested in getting the feel of the game at first glance rather than the player by player stats. If this were to exist at some of the sites reporting box scores but not all, then I could get the player by player data when I want it from the sites with more traditional box scores.

Dave,
I personally enjoyed your format. Outside the static print world, I believe it is a better supplement to a player's totals (which I also believe are still important) than the Bill James style.

What I would suggest for improvements:

- Indicate what type of out it is in the "out" box (since it is blank):
G - Ground out
F - Fly out
C - Caught Stealing
S - Sacrifice
etc.

- Remove the red dots and the red x's (including the red lines to them). Even though current box scores (like those on MLB.com) indicate how many runners each player left on base (and yours supplements that with where), I personally do not care for that information. I am much more concerned about what a player did than what he did not do.

If a player is thrown out at a certain base, that would be indicated by the "out" box and an indication of C (caught stealing) or S (sacrifice). To me, it does not matter in whose at-bat that out took place.

- Even though the red lines are helpful in indicating that there was player movement during an at-bat, the only thing I found of benefit was that they denoted that the batter advanced runners or had an RBI.

Trying to determine which runners advanced (or scored) was somewhat confusing. Personally, I don't know what the best way to indicate that graphically would be.
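If the chart were generated from play-by-play text, the letter for each out box could be picked automatically. A minimal Python sketch; the keyword rules, the sample descriptions and the `classify_out` name are hypothetical, and real play descriptions would need more cases.

```python
from enum import Enum


class OutType(Enum):
    # Letter codes from the suggestion above; "etc." would add more.
    GROUND_OUT = "G"
    FLY_OUT = "F"
    CAUGHT_STEALING = "C"
    SACRIFICE = "S"


def classify_out(description: str) -> OutType:
    """Pick a letter for the blank 'out' box from a play description.

    The keyword matching is a hypothetical heuristic: line drives and
    pop-ups are lumped in with fly outs, and unmatched plays raise ValueError.
    """
    text = description.lower()
    # Check sacrifices first so "sacrifice fly" is not read as a plain fly out.
    if "sacrifice" in text:
        return OutType.SACRIFICE
    if "caught stealing" in text:
        return OutType.CAUGHT_STEALING
    if "grounds out" in text:
        return OutType.GROUND_OUT
    if any(phrase in text for phrase in ("flies out", "lines out", "pops out")):
        return OutType.FLY_OUT
    raise ValueError(f"unrecognized out: {description!r}")


print(classify_out("Batter A out on a sacrifice fly to right").value)
# prints: S
```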

You are to be congratulated for your efforts. I, too, am more interested (based on how much time I have to break down a game) in what actually occurred during the course of a game rather than just know what the totals were. Your method definitely gives one more of a sense of actually being at the game.

Part of the box score format (including the James one) is being able to type it in. Presumably you could use a spreadsheet program with a template for yours and copy and paste what you want. And the player's line could be put in next to his name at his last appearance.

Oh, and "Some of the stats are different but the layot is very similar" should have said "layout".

I like it. These are a few recommended changes:

1. The red lines and dots were not intuitive, and judging from the comments above they can be left out, since the information-to-clutter ratio is a bit low.

2. If a player got an RBI, as someone suggested, that should be indicated by a number above the player's at-bat box. The number would be the team's run count once that run scored, so if a player got three RBIs there would be three numbers, separated by commas. Then it would be easier to tell when the runs scored during a game.

3. I don't think there is any real need to show how runners advanced, especially if they don't score. If this information is to be included, it would be better shown as a line drawn between the base the runner advanced to and the at-bat in which they advanced. For example, on the right, there would be two diagonal left-sloping lines leading from the middle and the top of the Jeter grey stack, the first sloping to the Cabrera at-bat and the second sloping to the Teixeira at-bat.

4. Stolen bases should be indicated in the black and grey stacks.

5. You may want to indicate home runs and triples with an "HR" and a "T" in the black or grey stack.
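The RBI suggestion amounts to numbering a team's runs in order and printing each number over the at-bat that drove it in. A Python sketch, assuming every run is an RBI; the `rbi_labels` helper and the sample sequence are hypothetical.

```python
def rbi_labels(plate_appearances, runs_before=0):
    """Label each RBI with the team's run total at the moment that run scored.

    plate_appearances is a list of (batter, rbi) pairs in game order. This
    sketch assumes every run is driven in; runs that score on errors or wild
    pitches would also advance the count and need a separate field.
    """
    total = runs_before
    labels = []
    for batter, rbi in plate_appearances:
        run_numbers = range(total + 1, total + rbi + 1)
        total += rbi
        labels.append((batter, ", ".join(str(n) for n in run_numbers)))
    return labels


# Hypothetical sequence: a solo shot, an out, then a three-run double.
print(rbi_labels([("Batter A", 1), ("Batter B", 0), ("Batter C", 3)]))
# prints: [('Batter A', '1'), ('Batter B', ''), ('Batter C', '2, 3, 4')]
```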

But good work on the whole. Am I correct in assuming that the pitching lines, and the totals for things like home runs and RBIs amassed by each player would still be shown separately?

Why should this replace the traditional box score, rather than supplement it? With the internets, this would be a tremendous "quick look" at the play-by-play, and incredibly informative. Then, below this image, the traditional box score (or whatever innovations we come up with to keep it relevant) could allow for the easy reading of Chase Utley or CC Sabathia's game line. Regardless, this is a fantastic way to visualize the key events in a game. How difficult was this to create? Could it be easily automated using MLB's play-by-play data? I'd love to see this on the Web.

BobK and Ed,

Thanks for the suggestions. If I do another version I will implement many of these.

Gilbert,

Thanks for pointing out that mistake.

Mitch,

Yeah, it is completely automatable from MLB's GameDay XML play-by-play data.
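For a sense of what that automation involves, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The XML below is an invented stand-in, not a sample of MLB's actual GameDay files, whose element and attribute names may differ.

```python
import xml.etree.ElementTree as ET

# A hypothetical, heavily simplified play-by-play fragment. The element and
# attribute names are assumptions for this sketch and may not match the
# real GameDay files.
SAMPLE = """
<inning num="3">
  <top>
    <atbat batter="Batter A" event="Home Run"/>
    <atbat batter="Batter B" event="Strikeout"/>
    <atbat batter="Batter C" event="Walk"/>
  </top>
</inning>
"""

# Bar height per event: the base the batter reached (0 = out).
BAR_HEIGHT = {"Walk": 1, "Single": 1, "Double": 2, "Triple": 3, "Home Run": 4}


def bars_from_xml(xml_text: str):
    """Return (batter, bar height) for each at-bat, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [
        (ab.get("batter"), BAR_HEIGHT.get(ab.get("event", ""), 0))
        for ab in root.iter("atbat")
    ]


print(bars_from_xml(SAMPLE))
# prints: [('Batter A', 4), ('Batter B', 0), ('Batter C', 1)]
```

Events missing from the table, such as hit-by-pitches or reaching on an error, fall through as outs here; a real version would need a complete mapping plus the runner data for the gray and red marks.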

Baseball and its box scores are for the average fan, not for sabermetricians and writers. You as a baseball writer might lose sight of that, but I as a fan have not. What average fan wants to feel he's studying science when he reads the newspaper or checks a website for yesterday's game? The way it is now is fine. Here's a suggestion to all you baseball analysts - if you think YOUR fans are interested in all that, put it on YOUR website and the fans can check it each day. No need to campaign for a NEW box score across baseball.
The number one mistake of people who try to change the world is that they try to change the world. Change the people who want to be changed first and work your way up.

Do you play fantasy sports, Eric? Do you look at more than just a final score? Those come from "average fans" who wanted a bit more information than "5-4 Cardinals."

Personally, I'm no BBWAA writer or member of SABR, but as an average fan with a brain, I appreciate it when I'm not pandered to as a luddite.

Anyway, the entire point of this piece is to devise a more informative box score that still maintains clarity for luddites like yourself. If it can be done, fans will follow. That's why current box scores have thrived despite using symbols and so forth: they provide interesting information in a reasonably clear way to all fans.

I don't like it at all. I can't tell from Utley's line that he scored at all. Nothing reaches the top...

Why not just reproduce a mini-scorecard? Yes it would take up marginally more space, but it would be completely clear as to what happened in the game.

What an enjoyable post! I really like how the density and spacing of the half-innings gives a feel for the pace of the game. It does seem immediately clear to me that Utley has two HRs; the two full-height black bars make that obvious.

For me, the weaker point is deciphering the advancement of baserunners...

I might print this horizontally rather than vertically. That would make reading player names easier, and high-scoring games wouldn't be squeezed to the margins.

Ryan,

thanks for your comments. I actually addressed your baserunner concern in my next version here.