Designated Hitter April 19, 2007
More Fun With Enhanced Gameday

Enhanced Gameday is back for the 2007 season. The information in the XML files remains very similar to what was available during last year's playoffs, with a couple of additions. Two fields have been added: (1) break length, which quantifies the break of a pitch differently from the vector sum that was used in the playoffs, and (2) break angle, which indicates how the ball appears to the hitter as it approaches the plate. I'm still trying to figure those fields out and incorporate them into my analysis. Although the enhanced system is currently running in only eight stadiums, there are still plenty of interesting things to study.
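For readers who want to work with the files directly, here is a minimal Python sketch that pulls pitch records out of a Gameday-style XML snippet using the standard library. The element name `pitch` and the attribute names (`type`, `start_speed`, `pfx_x`, `pfx_z`, `px`, `pz`, `x0`, `z0`, `break_angle`, `break_length`) are assumptions about the feed's layout, and the sample values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet in the style of a Gameday inning file. The element and
# attribute names are assumptions about the feed; the values are invented.
SAMPLE_XML = """<inning num="1">
  <top>
    <atbat num="1">
      <pitch des="Called Strike" type="S" start_speed="93.1"
             pfx_x="-6.2" pfx_z="10.4" px="0.12" pz="2.71"
             x0="-2.10" z0="6.05" break_angle="24.5" break_length="4.1"/>
      <pitch des="Ball" type="B" start_speed="81.7"
             pfx_x="3.9" pfx_z="-4.8" px="1.35" pz="1.02"
             x0="-2.02" z0="6.11" break_angle="-8.0" break_length="10.3"/>
    </atbat>
  </top>
</inning>
"""

# Attributes to convert from strings to floats (assumed names).
NUMERIC_FIELDS = ("start_speed", "pfx_x", "pfx_z", "px", "pz",
                  "x0", "z0", "break_angle", "break_length")


def parse_pitches(xml_text):
    """Return one dict per <pitch> element, with numeric fields as floats."""
    root = ET.fromstring(xml_text)
    pitches = []
    for el in root.iter("pitch"):
        row = {"des": el.get("des"), "type": el.get("type")}
        for field in NUMERIC_FIELDS:
            value = el.get(field)
            # Missing or empty attributes become None instead of raising.
            row[field] = float(value) if value not in (None, "") else None
        pitches.append(row)
    return pitches


for p in parse_pitches(SAMPLE_XML):
    print(p["des"], p["start_speed"], p["break_length"], p["break_angle"])
```

Keeping every pitch as a plain dict makes it easy to filter by pitcher, pitch type or game before doing any of the comparisons below.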

One thing I didn't have time to include in my first article was a look at how consistent pitchers are from start to start, in terms of the movement on their pitches, their release point and their pitch location. Ideally, I could look at many starts from a pitcher and find common features of good outings and bad ones. However, the season is only two weeks old, so I have no more than a couple of starts for most pitchers. Still, two starts are better than none.

When a pitcher has an unusually good (or bad) outing, the reason occasionally given is that his pitches were "on" (or "off") that day. Part of a pitch being "on" is locating it for a strike, but the movement of a pitch is also important. The adjoining graph shows all the pitches that John Lackey threw over his first two starts this season, colored by pitch type. Each pitch thrown is represented by two points: the solid circle is the break in the horizontal direction and the hollow circle is the vertical break. A quick look at the graph shows that he threw pitches 'A' and 'B' much more than pitch 'C.' You can also see the basic movement of each pitch. Pitches 'A' and 'C' had roughly the same movement, breaking about 10 units vertically and slightly less than 0 units horizontally.
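One way to put numbers on the clusters in a graph like this is to group pitches by type and compute the average break and its spread in each direction. The sketch below assumes each pitch is already a `(label, horizontal_break, vertical_break)` tuple; the labels and values in the example are hypothetical, not Lackey's actual data.

```python
from statistics import mean, pstdev


def movement_summary(pitches):
    """Group (label, h_break, v_break) tuples by pitch label and report the
    count, mean break and population standard deviation in each direction."""
    groups = {}
    for label, h, v in pitches:
        groups.setdefault(label, []).append((h, v))
    summary = {}
    for label, pts in groups.items():
        hs = [h for h, _ in pts]
        vs = [v for _, v in pts]
        summary[label] = {
            "n": len(pts),
            "mean_h": mean(hs), "mean_v": mean(vs),
            # A small spread means the pitch moved the same way every time.
            "sd_h": pstdev(hs), "sd_v": pstdev(vs),
        }
    return summary


# Hypothetical pitches, labeled in the same 'A'/'B'/'C' style as the graph.
sample = [("A", -2.0, 10.0), ("A", -1.0, 10.0), ("B", 4.5, -3.0), ("C", -0.5, 9.0)]
for label, stats in sorted(movement_summary(sample).items()):
    print(label, stats)
```

The standard deviation is a rough stand-in for "how tight the cluster looks"; with only a handful of pitches of a given type it should be read loosely.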

Lackey's first two starts this season were good. On April 2, he pitched 5.0 innings, allowing four hits and an unearned run. He also walked four batters and struck out five. On April 7, he went 7.0 innings and allowed seven hits and one run, while striking out six and walking no one. One thing I noticed from these graphs was that the movement on his fastball was very consistent in both starts. The vertical break was virtually identical, and the horizontal break was very close as well. Pitch 'C' was not as consistent, however. He threw more of pitch 'C' in his first start than in his second, and its horizontal break was several units bigger in the first start. The patterns of the clusters for pitch 'B' changed as well. I'm not sure what to make of the differences for pitch 'B,' but the biggest difference between Lackey's two starts was his ability to avoid walks in the second one, so perhaps the change in pitch 'B' had something to do with that.
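The start-to-start comparison described above (how often each pitch was thrown, and whether its average break shifted) can be sketched as follows. It assumes each start is a list of `(label, horizontal_break, vertical_break)` tuples; the function name and the sample numbers are hypothetical.

```python
from statistics import mean


def compare_starts(start1, start2):
    """For each pitch label seen in either start, report the pitch counts and
    the change in mean horizontal/vertical break from start 1 to start 2."""
    def by_label(pitches):
        out = {}
        for label, h, v in pitches:
            out.setdefault(label, []).append((h, v))
        return out

    g1, g2 = by_label(start1), by_label(start2)
    report = {}
    for label in sorted(set(g1) | set(g2)):
        a, b = g1.get(label, []), g2.get(label, [])
        row = {"n1": len(a), "n2": len(b), "dh": None, "dv": None}
        # Differences only make sense if the pitch was thrown in both starts.
        if a and b:
            row["dh"] = mean(h for h, _ in b) - mean(h for h, _ in a)
            row["dv"] = mean(v for _, v in b) - mean(v for _, v in a)
        report[label] = row
    return report


# Hypothetical data: pitch 'C' thrown more often, with more horizontal break, in start 1.
first = [("A", -1.5, 10.2), ("C", -4.0, 9.0), ("C", -6.0, 9.0)]
second = [("A", -1.4, 10.1), ("C", -1.0, 9.0)]
print(compare_starts(first, second))
```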

I'm also interested in how consistent a pitcher's release point is. A visual of Lackey's release point for his two starts, colored by start, is presented below. This graph is a close-up of the release point graphs from my previous article, and it shows how consistent a Major League pitcher can be with his release point. In his second start, he threw all but three pitches from a roughly 4.2x5.9" window. In his first start, however, the window was roughly 4.2x10.1". I haven't looked in depth at other pitchers' release points over different starts, but releasing every pitch from an area the size of an index card seems pretty consistent. My first thought on seeing the difference in release points was that it could have caused Lackey's wildness in his first start. However, of the 26 pitches in the first start that were thrown outside the smaller window, 12 were strikes and 11 were balls, with three put into play, so the difference in release point doesn't seem to affect balls and strikes.
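Both numbers in that paragraph (the size of the release window and the ball/strike split for pitches released outside a smaller window) are simple to compute. This sketch assumes release coordinates are given in feet (as in the `x0`/`z0` attributes assumed earlier) and that results use Gameday-style codes 'B' (ball), 'S' (strike) and 'X' (in play); the window bounds in the example are hypothetical.

```python
from collections import Counter


def window_inches(points):
    """Width x height, in inches, of the box enclosing (x, z) release points
    given in feet (assumed units)."""
    xs = [x for x, _ in points]
    zs = [z for _, z in points]
    return (max(xs) - min(xs)) * 12, (max(zs) - min(zs)) * 12


def outcomes_outside(pitches, x_range, z_range):
    """Tally result codes ('B', 'S', 'X') for pitches released outside the box
    x_range x z_range (both in feet). pitches: (x, z, result) tuples."""
    (xlo, xhi), (zlo, zhi) = x_range, z_range
    tally = Counter()
    for x, z, result in pitches:
        if not (xlo <= x <= xhi and zlo <= z <= zhi):
            tally[result] += 1
    return tally


# Hypothetical release points from one start.
start = [(-2.00, 6.00, "S"), (-1.80, 6.30, "B"), (-1.70, 6.90, "X"), (-2.10, 5.95, "B")]
print(window_inches([(x, z) for x, z, _ in start]))
print(outcomes_outside(start, (-2.05, -1.75), (5.9, 6.4)))
```

If the outside-the-window pitches split between balls and strikes about the same way as all pitches do, that is consistent with the conclusion above that the wider window wasn't driving the walks.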

I read another analysis on release points at Lookout Landing, which measured the release point of Felix Hernandez as 1.6x3". I have release point data from Hernandez' season opening start versus the A's, so I can measure his release point too. Here's a visual of Hernandez' release points (below left), which shows he released his pitches in a roughly 11.3x7.1" box. While I found Hernandez to have a bigger release point window, I'd still say that releasing every pitch from a window the size of a piece of paper is consistent. The study from Lookout Landing only looked at when Hernandez threw a fastball and a curve, but even when I just look at fastballs, I still get an 8.6x5.6" window.
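Window sizes are sensitive to a few stray releases (the "all but three pitches" figure for Lackey already sets some aside), so when comparing numbers from different studies it may help to state how outliers were handled. Below is one possible rule, sketched under the assumption that release points are `(x0, z0)` pairs in feet: drop the `k` releases farthest from the median release point, then measure the box around the rest. The function name and the choice of the median as the center are my own, not the method used at Lookout Landing.

```python
import math
from statistics import median


def trimmed_window(points, k=0):
    """Box size (inches) around release points after dropping the k points
    farthest from the median release point. points: (x, z) pairs in feet."""
    if k >= len(points):
        raise ValueError("cannot drop every point")
    mx = median(x for x, _ in points)
    mz = median(z for _, z in points)
    # Sort by distance from the median release point; keep the closest ones.
    ranked = sorted(points, key=lambda p: math.hypot(p[0] - mx, p[1] - mz))
    kept = ranked[:len(points) - k]
    xs = [x for x, _ in kept]
    zs = [z for _, z in kept]
    return (max(xs) - min(xs)) * 12, (max(zs) - min(zs)) * 12


# Hypothetical fastball releases with one stray point.
fastballs = [(0.0, 6.0), (0.1, 6.1), (0.2, 6.0), (2.0, 8.0)]
print(trimmed_window(fastballs))        # every pitch
print(trimmed_window(fastballs, k=1))   # all but one
```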

Regarding release points, being consistent isn't necessarily a good thing. Consistent mechanics would seem to lead to a consistent release point, but pitching machines have totally consistent release points and are designed to get hit hard by batters. Having a little variety is definitely a good thing, as it keeps the hitter guessing and prevents him from always looking for the pitch in a specific spot. Jeff Weaver might not be the best example of a good pitcher, but he has a release point that looks like the blast pattern of a shotgun. His release points from the 2006 playoffs are shown in the above graph on the right. Weaver had a successful run during the playoffs, so I would be interested in comparing this to his release points during a similar run of failure.

The consistency of pitch location is another feature of a pitcher's start that I want to analyze. The following two graphs show the location of all of Lackey's pitches in his two starts, colored by the speed of the pitch. Both graphs are from the catcher's perspective, and both show the same basic pattern for pitches outside the strike zone. In both starts, he threw pitches outside the strike zone primarily to two places: up and in to right-handed batters, and low and away to right-handed batters. I imagine these results reflect how Lackey sets up hitters and how his pitches break. Green dots (pitches at 80-84 mph) appear in the lower right of each graph, while his fastball, which was in the 90-95 mph range, was never thrown below the strike zone. Both graphs also show a lack of pitches at the top of the strike zone, as well as inside to lefties. These patterns could be chance events, but they could also reflect real decisions that Lackey made during his starts. I would suspect that the handedness of the batter affects pitch selection and location, but Lackey hasn't thrown enough pitches to lefties to notice any change.
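Observations like "the fastball was never thrown below the strike zone" amount to cross-tabulating speed against location. Here is a minimal sketch, assuming each pitch is a `(speed_mph, px, pz, sz_bot, sz_top)` tuple with `px` and `pz` in feet (horizontal distance from the center of the plate and height), and treating the zone as the 17-inch width of the plate between `sz_bot` and `sz_top`. Those field names and the zone definition are assumptions; the speed bands come from the paragraph above.

```python
from collections import Counter

# Half of a 17-inch plate, in feet (assumes px is measured in feet).
PLATE_HALF_WIDTH_FT = 17 / 24


def zone_region(px, pz, sz_bot, sz_top, half_width=PLATE_HALF_WIDTH_FT):
    """Coarse location of a pitch relative to the strike zone."""
    if pz < sz_bot:
        return "below"
    if pz > sz_top:
        return "above"
    if abs(px) > half_width:
        return "outside"
    return "zone"


def speed_band(mph):
    """Bucket speeds into the bands discussed in the text."""
    if 80 <= mph < 85:
        return "80-84"
    if 90 <= mph <= 95:
        return "90-95"
    return "other"


def tally_by_speed(pitches):
    """Count pitches by (speed band, zone region)."""
    return Counter((speed_band(s), zone_region(px, pz, bot, top))
                   for s, px, pz, bot, top in pitches)


# Hypothetical pitches: (speed, px, pz, sz_bot, sz_top).
sample = [(82.0, 0.5, 1.0, 1.6, 3.4), (93.0, 0.0, 2.5, 1.6, 3.4), (94.0, 1.2, 2.5, 1.6, 3.4)]
print(tally_by_speed(sample))
```

Splitting the same tally by batter handedness would be a one-field extension once there are enough pitches to lefties.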

These next two graphs are more for fun than anything else right now. They are the same as the two above, except that they quantify the pitch density in different areas. The strike zone is the red square in the middle, and the numbers show the percentage of pitches thrown in each region of the strike zone by Lackey for that start. These graphs reinforce what the above graphs show, but are pretty cool to look at. With more data, the patterns in these graphs will become more statistically meaningful and could even be expanded to show BABIP for pitches in different sections of the strike zone.
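The percentages in density graphs like these come from binning pitch locations into a grid and dividing each cell's count by the total. The sketch below is one way to do that, assuming `(px, pz)` locations and caller-chosen cell boundaries; cells are half-open (a pitch exactly on the right or top outer edge counts as off the grid), and the grid in the example is hypothetical rather than the regions in the graphs.

```python
from bisect import bisect_right
from collections import Counter


def density_grid(locations, x_edges, z_edges):
    """Percent of pitches in each grid cell.

    locations: list of (px, pz) pairs; x_edges, z_edges: sorted boundaries.
    Returns {(col, row): percent}; pitches beyond the outer edges are
    reported under the key 'off_grid'.
    """
    counts = Counter()
    for px, pz in locations:
        col = bisect_right(x_edges, px) - 1
        row = bisect_right(z_edges, pz) - 1
        if 0 <= col < len(x_edges) - 1 and 0 <= row < len(z_edges) - 1:
            counts[(col, row)] += 1
        else:
            counts["off_grid"] += 1
    total = len(locations)
    return {cell: 100.0 * n / total for cell, n in counts.items()}


# Hypothetical locations on a 2x2 grid spanning px in [-1, 1) and pz in [1.5, 3.5).
locs = [(0.0, 2.0), (0.1, 2.1), (-0.5, 3.0), (2.0, 2.0)]
print(density_grid(locs, [-1.0, 0.0, 1.0], [1.5, 2.5, 3.5]))
```

Keeping per-cell counts alongside the percentages would also make it straightforward to compute something like BABIP per cell later, once there are enough balls in play.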

Joe P. Sheehan played baseball at Oberlin College and graduated in May 2006.

How do you get the XML feeds for the games, and is there a program to parse the data or import it into a spreadsheet?

Astonishingly cool stuff, Joe. Thanks.

MGL,
To get XML files into Excel, I copy the XML text into Notepad and put the XML declaration at the beginning of the file. Then I save the Notepad file with an .xml extension, and all you do is open the .xml file in Excel. Hope that works for you, and if there's a better way to do it, someone please tell me.

Well, the site interpreted what I put after "put" as an HTML tag, so I'll try again and hopefully this makes sense: it's the standard XML declaration, "less-than sign, ?xml version="1.0"?, greater-than sign".

What I do is save the webpage as an XML file in either IE or Firefox. Then I go to Data > XML > Import in Excel. Excel then automatically gives you the data in columns.

I've had good luck going into the Data menu in Excel and using Import External Data > New Web Query. You can pull in the XML directly that way. Some versions of Excel don't mind if the file lacks the XML version tag.

In your charts you have the strike zone width at 2 feet. Isn't home plate 17 inches wide?

Outstanding work. Even for an Oberlin grad. Sorry to hear you couldn't get into Kenyon. ;)

Dan, how do you save the Gameday webpage? When I open it in a new window, I have no option to save it as an .xml file or anything else. Thanks for any help.

Here's a web page with help on how to get & use the MLB Gameday data ...

http://www.friarwatch.com/2007/04/23/how-to-use-mlb-gameday-data/#comment-40

Amazing stuff!