Designated Hitter
March 29, 2007
Enhanced Gameday
By Joe P. Sheehan

MLB.com's Gameday application is a blessing for people trying to follow out-of-market games. It's easy to use and clearly presents play-by-play information about the game. That's why I was excited when Enhanced Gameday debuted during the 2006 playoffs. Enhanced Gameday keeps the basic aspects of Gameday and adds detailed type, location, and velocity information about each pitch. I'm not entirely sure how the application works, but the basic idea is that high-speed cameras and motion-capture software track every pitch and determine various data points for each one.

If you're motivated to sift through the XML, you can do tons of neat things with this data. Joel Zumaya was my motivation to sift through the XML. Like many other people during the playoffs, I was captivated by Zumaya's ability to throw baseballs harder than anyone I had ever seen. According to Gameday, there were 23 pitches thrown faster than 100 MPH in the playoffs, and Zumaya threw 15 of them. (Amazingly, Justin Verlander threw the other eight.) The Guitar Hero's fastest pitch left his hand at almost 105 MPH and his average velocity during the playoffs was 96.6 MPH.
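For readers who want to try this themselves, here is a minimal Python sketch of the kind of tally behind those numbers. It assumes, without any documentation to confirm it, that each pitch appears as a `<pitch>` element with a `start_speed` attribute in MPH. The sample XML and its values are made up for illustration and are not Zumaya's real pitches.

```python
import xml.etree.ElementTree as ET

# Made-up snippet. Assumption: each pitch is a <pitch> element whose
# start_speed attribute holds the velocity in MPH.
SAMPLE_XML = """
<inning>
  <atbat pitcher="Zumaya">
    <pitch start_speed="101.2"/>
    <pitch start_speed="98.4"/>
    <pitch start_speed="104.8"/>
  </atbat>
</inning>
"""


def pitch_speeds(xml_text):
    """Return the start_speed of every <pitch> element, skipping pitches with no value."""
    root = ET.fromstring(xml_text)
    speeds = []
    for pitch in root.iter("pitch"):
        value = pitch.get("start_speed")
        if value:  # missing or empty when the tracking data is absent
            speeds.append(float(value))
    return speeds


def velocity_summary(speeds, threshold=100.0):
    """Return (pitches faster than threshold, fastest pitch, average speed)."""
    if not speeds:
        return 0, None, None
    fast = sum(1 for s in speeds if s > threshold)
    return fast, max(speeds), sum(speeds) / len(speeds)


speeds = pitch_speeds(SAMPLE_XML)
print(velocity_summary(speeds))
```

Skipping empty `start_speed` values matters because, as noted later in the article, some games and innings are missing tracking data.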

Obviously Zumaya can throw hard, but where were those pitches going? This graph shows where he threw all of his pitches, with each pitch colored by speed, along with an estimated strike zone. The graph is drawn from the catcher's perspective. It clearly shows Zumaya's reliance on his fastball and his struggle to throw it for strikes consistently.
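The article doesn't say exactly how the estimated strike zone was drawn. One simple sketch, assuming (as a commenter guesses further down) that `px` and `pz` give the pitch's horizontal and vertical position in feet as it crosses the plate, with `px` measured from the center of the plate, and that `sz_top` and `sz_bottom` are the batter's zone limits in feet, is to call a pitch a strike when any part of the ball passes over the 17-inch plate between those heights. The ball radius is an approximation.

```python
PLATE_HALF_WIDTH_FT = 17 / 2 / 12  # home plate is 17 inches wide
BALL_RADIUS_FT = 1.45 / 12         # approx. radius of a baseball (~2.9 in diameter)


def in_estimated_zone(px, pz, sz_top, sz_bottom):
    """Rough strike call: any part of the ball over the plate, between sz_bottom and sz_top.

    Assumes px is the horizontal distance (ft) from the center of the plate and
    pz the height (ft) of the ball as it crosses the plate.
    """
    horizontal_ok = abs(px) <= PLATE_HALF_WIDTH_FT + BALL_RADIUS_FT
    vertical_ok = sz_bottom - BALL_RADIUS_FT <= pz <= sz_top + BALL_RADIUS_FT
    return horizontal_ok and vertical_ok
```

Coloring each plotted pitch by this flag (or by speed, as in the graph) is then a plotting detail.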

Zumaya appeared to have a one-pitch plan when attacking hitters, but what about someone who can't rely on speed alone to retire hitters? How did Kenny Rogers go after batters? Here are velocity graphs for Rogers, split by batter handedness. Rogers threw 87 pitches to left-handed batters and 162 to righties. He appeared to pound the outside part of the plate against both types of hitters and seemed focused on not coming inside and over the plate to righties. He also threw different off-speed pitches on the outside part of the plate, depending on the batter's handedness. Lefties got a slower off-speed pitch that consistently missed the strike zone, as if he were tempting the hitter to chase a ball. Righties faced an off-speed pitch that was closer to the strike zone. While these graphs are interesting, they don't show anything new.

[Figures: Kenny Rogers' pitch velocity and location vs. left-handed batters; vs. right-handed batters]
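Splitting by batter handedness is just a grouping step. The sketch below assumes each pitch has already been pulled out of the XML into a dictionary; the field name `stand` for the batter's side is a hypothetical choice, and the speeds are invented.

```python
from collections import defaultdict


def split_by_batter_hand(pitches):
    """Group pitch speeds by batter handedness; returns {hand: (count, mean_speed)}."""
    groups = defaultdict(list)
    for p in pitches:
        groups[p["stand"]].append(p["start_speed"])  # "stand" is a hypothetical field name
    return {hand: (len(s), sum(s) / len(s)) for hand, s in groups.items()}


# Made-up example pitches, not Rogers' real data.
pitches = [
    {"stand": "L", "start_speed": 88.0},
    {"stand": "L", "start_speed": 76.0},
    {"stand": "R", "start_speed": 89.0},
    {"stand": "R", "start_speed": 81.0},
    {"stand": "R", "start_speed": 85.0},
]
summary = split_by_batter_hand(pitches)
```

Each group can then be plotted separately, which is all the two graphs above need.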

Information pinpointing the release points of pitchers is also included in the XML files. To my knowledge, accurate information regarding release points has never been available to the public. This graph shows the release point for Barry Zito, compared with all other pitchers who appeared in the 2006 playoffs. Zito's bizarre BABIP patterns have been discussed at Inside the Book and Catfish Stew. Perhaps one reason for these weird splits is Zito's release point, which is closer to that of a right-hander than a left-hander. Chad Bradford's inclusion on the graph emphasizes how different he is from a "normal" pitcher. Even Cla Meredith, who has the second-lowest release point, is more than two feet higher than Bradford.

[Figure: Release point of Barry Zito compared with all other pitchers in the 2006 playoffs]
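Comparisons like "more than two feet higher" come down to averaging each pitcher's release coordinates. Here is a minimal sketch, assuming rows of `(pitcher, x0, z0)` in feet (the units a commenter infers below). The pitcher names and numbers are placeholders. The sign of `x0` depends on an undocumented origin, so only differences between pitchers are meaningful here.

```python
from statistics import mean


def mean_release_point(points):
    """Average (x0, z0) release point, in feet, over one pitcher's pitches."""
    return mean(p[0] for p in points), mean(p[1] for p in points)


def release_points_by_pitcher(rows):
    """rows: iterable of (pitcher, x0, z0). Returns {pitcher: (mean_x0, mean_z0)}."""
    grouped = {}
    for name, x0, z0 in rows:
        grouped.setdefault(name, []).append((x0, z0))
    return {name: mean_release_point(pts) for name, pts in grouped.items()}


# Placeholder rows, not real pitchers.
rows = [
    ("Pitcher A", -2.0, 6.0),
    ("Pitcher A", -2.2, 6.2),
    ("Pitcher B", 1.5, 3.0),
    ("Pitcher B", 1.7, 3.2),
]
points = release_points_by_pitcher(rows)
height_gap = points["Pitcher A"][1] - points["Pitcher B"][1]  # vertical gap in feet
```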

Here are the release points for four individual pitchers. Besides Zito, another interesting thing on this graph is the two distinct release points for Mike Mussina. Rogers and Chris Carpenter both had pitches where they changed their release point, but Mussina appears to have two deliberate release points. The release points for Mussina changed depending on the type of pitch that he threw.

[Figure: Release points for Barry Zito, Mike Mussina, Kenny Rogers, and Chris Carpenter]
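One hedged way to check whether a pitcher like Mussina really has two arm slots, rather than one smeared-out release point, is to split his release heights (`z0`) into two groups with a tiny one-dimensional k-means (k = 2), then see whether the groups line up with pitch type. The heights below are invented for illustration.

```python
def two_means_1d(values, iterations=20):
    """Split 1-D values into two clusters (k-means with k=2).

    Returns (low_center, high_center, labels), where label 0 means the lower group.
    """
    lo, hi = min(values), max(values)  # start the centers at the extremes
    labels = []
    for _ in range(iterations):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        low_group = [v for v, lab in zip(values, labels) if lab == 0]
        high_group = [v for v, lab in zip(values, labels) if lab == 1]
        if not low_group or not high_group:
            break  # all values identical: no second slot to find
        new_lo = sum(low_group) / len(low_group)
        new_hi = sum(high_group) / len(high_group)
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return lo, hi, labels


# Invented release heights in feet.
heights = [5.6, 5.7, 5.65, 6.2, 6.3, 6.25]
low_slot, high_slot, slot_labels = two_means_1d(heights)
```

A large gap between the two centers, with each pitch type falling almost entirely in one group, would support the "two deliberate release points" reading. With one start's worth of pitches it stays suggestive at best.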

To classify the pitches Mussina threw, you need the horizontal and vertical "break" values given for every pitch. According to the Enhanced Gameday blog, break is defined as "the measurement of the distance between the location of the actual pitch thrown over the plate, and the calculated location of a ball thrown by the pitcher in the same way, with no spin." No measurement scale is given, but every pitch has a "fingerprint" consisting of its speed and two breaks, which identifies how the pitch spun through the air en route to home plate. These fingerprints can be used to identify pitch types. Not every fastball will be exactly 89 MPH with identical breaks, but all fastballs from a pitcher will have similar speeds and breaks, and those will differ from the speed and breaks of that pitcher's curveball.
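If you already know roughly what a pitcher's fastball and curveball fingerprints look like, a new pitch can be labeled by whichever reference fingerprint it sits closest to. This nearest-centroid sketch is one possible approach, not necessarily how the graphs below were made. The reference values are invented, and because break units are undocumented, the `scales` argument is there to keep MPH from swamping the breaks.

```python
import math


def classify_pitch(fingerprint, centroids, scales=(1.0, 1.0, 1.0)):
    """Assign a (speed, break_x, break_y) fingerprint to the nearest reference centroid.

    centroids: {label: (speed, break_x, break_y)}. Each dimension is divided by
    the matching entry in scales before measuring Euclidean distance.
    """
    def dist(a, b):
        return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))

    return min(centroids, key=lambda label: dist(fingerprint, centroids[label]))


# Invented reference fingerprints for one hypothetical pitcher.
centroids = {"fastball": (89.0, -6.0, 10.0), "curveball": (72.0, 8.0, -8.0)}
label = classify_pitch((87.0, -5.0, 9.0), centroids)
```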

This graph shows the horizontal and vertical breaks vs. speed. The blue dots show the horizontal break for every pitch, while the red dots represent the vertical break. Each pitch has two dots, and from the graph, you can pick out clusters of Mussina's pitches.

[Figure: Horizontal (blue) and vertical (red) break vs. speed for Mike Mussina's pitches]

Here's the same graph, but with each type of pitch colored differently. Mussina has four pitches, two of which, B and C, he threw exclusively from his higher arm slot. The second graph is a close-up of Mussina's release points, colored by pitch type.

[Figures: Mussina's breaks vs. speed, colored by pitch type; Mussina's release points, colored by pitch type]
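The article doesn't describe exactly how the pitches were sorted into types A through D. When no reference fingerprints exist, one common option is plain k-means clustering on the `(speed, break_x, break_y)` fingerprints, sketched below with a deterministic farthest-point start. The sample fingerprints are invented, and in practice you would rescale the columns first, since the break units are unknown.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain k-means on equal-length tuples. Returns (centers, labels).

    Initial centers: the first point, then repeatedly the point farthest from
    all centers chosen so far (deterministic farthest-point heuristic).
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's old center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Invented fingerprints: three fastball-like and three curveball-like pitches.
fingerprints = [(90, -6, 10), (89, -5, 9), (91, -6, 11), (72, 8, -8), (73, 7, -9), (71, 9, -7)]
centers, labels = kmeans(fingerprints, k=2)
```

The number of clusters `k` still has to be chosen by looking at the graph, which is essentially what picking out Mussina's four clusters by eye does.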

Once the pitches are classified, you can examine the "stuff" of a pitcher and how he uses each pitch. I only have data for 75 pitches from Mussina, so I'm going to use Kenny Rogers again. I have data for 249 pitches from Rogers, and he threw four types of pitches, shown in the graph.

[Figure: Kenny Rogers' four pitch types]

Rogers threw pitch A 68 times and got the batter to swing and miss 11 times (16%), the highest percentage of any of his pitches. Granted, this sample is too small to mean much, but pitch A could be Rogers' strikeout pitch. With a full season of data, you could establish not only which pitcher is best at creating swings and misses (or ground balls, or poorly hit balls, or whatever), but also which pitch he uses to get those results.
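Computing a swing-and-miss rate per pitch type is a simple count once each pitch carries a type label and an outcome. In this sketch the outcome string `"swinging_strike"` is a hypothetical label, not necessarily how Gameday encodes it. The example rebuilds only the pitch-A tally quoted above (11 whiffs in 68 pitches).

```python
from collections import Counter


def whiff_rates(pitches):
    """pitches: iterable of (pitch_type, outcome). Returns {pitch_type: share of swinging strikes}."""
    totals = Counter()
    whiffs = Counter()
    for pitch_type, outcome in pitches:
        totals[pitch_type] += 1
        if outcome == "swinging_strike":  # hypothetical outcome label
            whiffs[pitch_type] += 1
    return {t: whiffs[t] / totals[t] for t in totals}


# Only the pitch-A numbers from the article: 11 whiffs out of 68 pitches.
pitch_a = [("A", "swinging_strike")] * 11 + [("A", "other")] * 57
rates = whiff_rates(pitch_a)
print(f"{rates['A']:.0%}")  # prints 16%
```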

The biggest problem with the data currently is that it is incomplete. For whatever reason, Gameday didn't have data for every playoff game, either missing the game completely or just missing certain innings. The recording of data did get more reliable as the playoffs progressed. Hopefully that will be fixed for the 2007 season.

The other problem is that there is only one month's worth of data. There just isn't enough information from this trial run to make any definitive statements. Mussina made only one start in the playoffs, so the dual release points could have just been a coincidence. Barry Zito could have been struggling with his delivery in his starts, so his usual release point might actually resemble that of a typical left-hander. This analysis is just scratching the surface of what Gameday has to offer. With more than a month of data, you could better visualize how pitchers approach left-handed hitters compared to right-handed hitters, see which pitcher has the most movement on his pitches, see which pitch is the hardest to make contact with, hit hard, or hit in the air, and possibly even expand the analysis to hitters as well.

Here's the link to the XML from Kenny Rogers' start vs. the Yankees in the ALDS. The web directory is organized intuitively, and with a little poking around, you can find the XML files for any playoff game. There's a lot more information contained in these XML files that I didn't use in any of the graphs in this article because I wasn't able to figure out what it meant. If you have any ideas about what parts of this information may mean, I'd love to hear from you.
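Poking around the directory by hand works, but if the listing pages are plain HTML, the XML file names can also be collected programmatically. This sketch only parses listing HTML you have already fetched (for example with `urllib.request`). The sample listing and its file names are hypothetical, not the actual Gameday layout.

```python
from html.parser import HTMLParser


class XmlLinkCollector(HTMLParser):
    """Collect href targets ending in .xml from an HTML directory listing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.endswith(".xml"):
                self.links.append(href)


def xml_links(listing_html):
    """Return the .xml links found in a directory listing page, in page order."""
    parser = XmlLinkCollector()
    parser.feed(listing_html)
    parser.close()
    return parser.links


# Hypothetical listing page; real file and folder names may differ.
SAMPLE_LISTING = (
    '<a href="../">Parent Directory</a>'
    '<a href="inning_1.xml">inning_1.xml</a>'
    '<a href="players.xml">players.xml</a>'
    '<a href="pitchers/">pitchers/</a>'
)
```

Links that end in `/` are subdirectories, so the same function can be applied recursively to walk the tree.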


Joe P. Sheehan graduated from Oberlin College last May. A 22-year-old Red Sox fan, Joe would like to work for an MLB team. He played baseball at Oberlin and has followed the game for as long as he can remember. He is not to be confused with the Joe Sheehan of Baseball Prospectus.

Comments

Cool stuff, thanks for the research. I'd be curious, though, what your thoughts are on Mussina. If his release point is different for each type of pitch he throws, couldn't hitters pick up on that and know what type of pitch is going to be thrown? Obviously he's been pretty successful thus far in his career, so history suggests that batsmen aren't doing this. BTW, I suggest you use Young Joe Sheehan or some moniker like that to avoid confusion for the rest of us in the SABR community.

Now here's two things I love, slapped together: XML and Baseball stats. YES!! YEESSSS!!

I don't know what's going on with Mussina's release points. I agree that the different release points would appear to tip his pitches. I originally thought the differences were related to his tiring as the game went on, or that he was struggling with his mechanics and couldn't throw strikes from one place. When I tested these ideas, though, neither looked as plausible as the pitch-type idea. He could also be physically moving side to side on the mound before he throws, which would change his release point, although I think that would tip hitters too. Another possibility is that in his one playoff start he just happened to have two release points.

Very cool stuff. Are there any sites that explain how to access and use the XML files?


P.S. Maybe you should go by Joseph Sheehan.

I remember back when you could use Gameday for mundane matters like finding out what transpired in a game. Now MLB.com has a bloated application that eats up RAM and processing power.

I just need the facts when I'm following online. Remember that Gameday is the application of choice if you are at work and just want to dip in from time to time to see what's happening. When I'm at work, I really don't care what the break on Kenny Rogers' curve is. If I really cared, I would watch the game.

I don't want a scattergram. I just want to know who's ahead. You know. Stuff like that.

Bob, your concern is best addressed on the MLB.com blog. There's nothing stopping MLB.com from providing the Classic Gameday, and the enhanced one.

What Joe has done here is beyond fabulous, and at the same time, barely scratches the surface. Huge applause.

I don't mean to denigrate Mr. Sheehan's work here, but he was giving a positive review of a consumer product that I didn't think worked well at all. It may have worked well for what Mr. Sheehan wanted, but my expectations are different than his. Obviously many of the readers of this site seem to enjoy the Gameday features, I just don't happen to be one of them.

Joe, how do you convert the coordinate system provided on the XML file to feet?

Max, I believe that the data given in the XML is in feet. The fields that represent the release point are x0, y0, and z0. y0 is 55 for every pitch, and x0 and z0 are the x and y values that I graphed. Fifty-five feet from home plate is a good place to begin tracking the pitch; you would usually pick it up after the pitcher drives off the rubber. z0 is the height of the ball at the release point, and the values given for z0 (roughly 6 to 7) are the right size if they were measured in feet. x0 is the horizontal distance of the ball from a starting point at release. The values for x0 also make sense if they were measured in feet.

There really isn't any documentation about what's in the fields, so I made educated guesses about what the different fields mean. In the words of Charles Barkley, I could be wrong about them, but I doubt it. Still, if anyone has ideas about what a field means, or other things they'd like to see done with the data, I'd love to hear from them.

MLB.com has now set up the application so you can turn off the more sophisticated pitch data when viewing the game in real time, but you can go back and get it later.

This would seem to be a good compromise.

Joe, I want to thank you for showing us just some of the uses of this data. I have been looking at the xml data in the past few days trying to break down what all the data means.

Your explanation of the x0,y0 and z0 values being in feet has helped quite a bit.

For anyone else who wants to break some of this data down here are some other things that I think I have figured out.

vx0, vy0, and vz0 are the velocity in feet per second in each direction. I confirmed this by comparing the start speed in mph of various pitches to the value of vy0, and they always matched up (1 fps = 0.6818 mph).
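Assuming these fields really are velocity components in feet per second, the release speed in MPH is the length of the velocity vector times 3600/5280 (about 0.6818). Comparing vy0 alone works because it is by far the largest component. Using all three gives a slightly larger value, and squaring each component makes the sign convention of vy0 irrelevant. This is a sketch of that arithmetic, not a documented Gameday formula.

```python
import math

FPS_TO_MPH = 3600 / 5280  # 1 ft/s = 0.6818 mph


def speed_mph(vx0, vy0, vz0):
    """Speed in MPH from velocity components in ft/s (units assumed, not documented)."""
    return math.sqrt(vx0 ** 2 + vy0 ** 2 + vz0 ** 2) * FPS_TO_MPH
```

For example, a pitch with vy0 = -132 ft/s and no sideways or vertical velocity works out to 90 MPH.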

The values for x and y for the pitch location are in centimeters. I believe this because pitches on the right edge of the strike zone and pitches on the left edge are roughly 40 apart, and 17 inches, the width of home plate, is about 43 centimeters.

sz_top and sz_bottom are the top and bottom of the strike zone for the current batter, again in feet.

I am still looking into what pfx, pfy, px, and py are. I'm hoping to crack that this weekend.

Just thought that I would share what I have learned. Again I might be completely off on these but they all seem to make sense and work with the pitches I have watched.

I'm thinking the following: px and pz are the location of the pitch in the strike zone in feet (with p standing for pitch). x and y would also be the location in the strike zone, but in the old Classic Gameday coordinate system. I don't know how those values are arrived at: whether they are converted to feet to find px and pz, or whether they are determined entirely separately by someone watching the game and inputting the location of the ball when it crosses the plate. I don't think this coordinate system is actually in centimeters, though that was a good guess.

I made a couple of equations in Excel using Harden's Wednesday pitching data. With y being px in feet and x being the x coordinate, I got the equation y = -0.03x + 2.9657, with those values in feet. There are 0.0328 feet in one centimeter, so you were close with that assumption, Derek. Then, with y being pz and x being the y coordinate, I got the equation y = -0.0428x + 8.7438. Note that there was imprecision, and the equations only describe the general trend; there is not an exact correlation between px and x, or between pz and y. The R-squared value for the first was 0.9393 and for the second was 0.9171. If the correlation were exact, those values would be 1.00. So what accounts for this error? Do the values change with changes in the strike zone? Or are px and pz found by the camera while x and y are determined by someone watching the game, so that the R-squared value is actually measuring human error?

I didn't know what the vx0, vy0, and vz0 values meant, but what you said makes sense. And if not for Joe's article, I would never have run across this stuff and would have had little idea what it meant.

I've been looking for this data online for a few days and haven't been able to find it. Perhaps someone here will be kind enough to direct me to a link where I can download it.

To clarify, yes, I found the link in Joe's article. That helps. But is there an html version of a page that makes navigation easier? Also, is there a complete set that could be easily downloaded?

I'm pretty sure that the regular x and regular y are the pitch coordinates when not using Enhanced Gameday. These are available for pretty much all of last season, even before Enhanced Gameday. I don't know whether they translate to centimeters at all or whether they are just plotting coordinates. If you look in the hit location XML file, there are x and y coordinates there as well, also scaled from 0 to 250.