Command Post
August 31, 2007
And Now for Something Completely Different...
By Joe P. Sheehan

Rich wrote an article last week about Ryan Braun and included a link to Braun's hit chart on Fox Sports. MLB.com provides a similar chart, and both show where the balls that Braun has put in play this year have landed. Both charts only let you look at one stadium at a time for each player, though, which doesn't give a complete picture of his spray patterns. If the player has enough at-bats in one park you can get a general idea of where he hits the ball, but it would be ideal to see where every ball he hit ended up. Another feature that would improve these hit charts would be an indication of how the ball was hit, either on the ground or in the air. MLB.com does have the ability to show fly-outs and ground-outs, but it doesn't split up the hits based on their flight path, which is important too.

I try not to complain unless I have a solution (yeah, right), and after Rich's article prompted me to start playing around with the XML files that support MLB.com's hit chart, I made my own hit charts that added these features. I think adding them makes the hit charts much more informative and valuable: you can get a more accurate idea of a hitter's hitting pattern and potentially visualize some other cool things.
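As a rough illustration of combining balls in play from every park into one chart, the sketch below reads hit records from a couple of per-game XML snippets and collects all of one batter's points. The element name `hip`, the attribute names, the player IDs and the sample values are assumptions made up for this example, not a documented MLB.com format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element names, attribute names, IDs and
# coordinates are all invented for illustration.
SAMPLE_GAMES = [
    """<hitchart>
         <hip des="Single" x="88.4" y="120.5" batter="1001" pitcher="2001"/>
         <hip des="Groundout" x="110.2" y="160.0" batter="1001" pitcher="2001"/>
         <hip des="Flyout" x="150.0" y="90.0" batter="1002" pitcher="2001"/>
       </hitchart>""",
    """<hitchart>
         <hip des="Home Run" x="60.0" y="40.0" batter="1001" pitcher="2002"/>
       </hitchart>""",
]


def balls_in_play(xml_text):
    """Parse one game's XML into a list of ball-in-play records."""
    root = ET.fromstring(xml_text)
    return [
        {
            "batter": hip.get("batter"),
            "pitcher": hip.get("pitcher"),
            "des": hip.get("des"),
            "x": float(hip.get("x")),
            "y": float(hip.get("y")),
        }
        for hip in root.iter("hip")
    ]


def spray_for_batter(games, batter_id):
    """Collect every ball in play for one batter across all games (all parks)."""
    points = []
    for xml_text in games:
        points.extend(b for b in balls_in_play(xml_text) if b["batter"] == batter_id)
    return points


all_parks = spray_for_batter(SAMPLE_GAMES, "1001")
print(len(all_parks), "balls in play for batter 1001")
```

Filtering on the `pitcher` attribute instead of `batter` would give the equivalent list for a pitcher.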

Looking at an individual player is a good place to begin examining the new hit charts, and below are two charts for Kevin Millar. The chart on the left shows how each ball was hit, regardless of whether it was a hit or an out, with black dots being ground balls, red being line drives, and blue being fly balls. Millar has a reputation as a pull hitter, which is apparently well deserved judging from his line drives and deep fly balls to left field. He only has five line drives to the right side of the field, which suggests that when he does hit to right, he isn't driving the ball at all.

The results of his at-bats, shown in the chart on the right, confirm that Millar doesn't have much success hitting to right field. On this chart, the black circles represent all outs, while the green dots are singles, the yellow dots are doubles, blue are triples, and red are home runs. You can see that when he does go the other way, the ball is usually not very well hit and results in an out. If there were ever a right-handed hitter to use an over-shift against, Millar is the perfect candidate. (I had a problem adding legends to the charts, so any chart using just red, blue and black dots is showing how each ball was hit, regardless of whether it was a hit or not, while any chart with red, blue, yellow and green dots shows the result of a ball in play, such as a single or double.)
millar.png millar2.png
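Since the charts have no legends, the two color schemes described above can be written down as lookup tables. This is only a sketch: the record fields (`trajectory`, `result`), the category labels and the sample coordinates are assumptions for illustration, and the output is a plain list of points that any plotting tool could draw.

```python
# Color schemes as described in the text. The dictionary keys are
# hypothetical labels chosen for this example.
TRAJECTORY_COLORS = {"ground": "black", "line": "red", "fly": "blue"}
RESULT_COLORS = {
    "out": "black",
    "single": "green",
    "double": "yellow",
    "triple": "blue",
    "home_run": "red",
}

# Invented sample records, not real data.
SAMPLE_BALLS = [
    {"x": 80.0, "y": 100.0, "trajectory": "ground", "result": "out"},
    {"x": 60.0, "y": 70.0, "trajectory": "line", "result": "double"},
    {"x": 55.0, "y": 30.0, "trajectory": "fly", "result": "home_run"},
]


def chart_points(balls, mode):
    """Return (x, y, color) tuples for a 'trajectory' chart or a 'result' chart."""
    if mode == "trajectory":
        key, palette = "trajectory", TRAJECTORY_COLORS
    elif mode == "result":
        key, palette = "result", RESULT_COLORS
    else:
        raise ValueError("mode must be 'trajectory' or 'result'")
    return [(b["x"], b["y"], palette[b[key]]) for b in balls]


left_chart = chart_points(SAMPLE_BALLS, "trajectory")
right_chart = chart_points(SAMPLE_BALLS, "result")
```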
David Ortiz, on the other hand, frequently faces an over-shift and still hits very well. Based on the locations where he hits balls, Ortiz seems to be almost a mirror image of Millar, although Ortiz hits to left more than Millar hits to right. The big difference between Ortiz and Millar is how they hit the ball the other way. While Ortiz hasn't hit many home runs to left, he does have a bunch of doubles that way. One reason for this difference is the Green Monster, but opposite-field hitting and power are important for Ortiz, if for no other reason than to make teams slightly wary of using the shift. You can actually see some of the results of the shift, with an extra cluster of groundball outs behind where the 2nd baseman usually plays.
ortiz.png ortiz2.png
Moving on from batters, the same hit charts can be created for pitchers, in this case for teammates Fausto Carmona and Paul Byrd who, respectively, have the highest and 2nd lowest groundball percentages in the American League. Carmona and Byrd are as far apart as you can get in terms of how they get outs and their graphs reflect the differences in their styles, with Carmona relying heavily on infielders and Byrd mostly using his outfielders.

Carmona:
fausto.png fausto2.png
Byrd:
byrd.png byrd2.png
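The groundball percentages that separate Carmona and Byrd can be computed from the same trajectory labels used in the charts. The sketch below assumes a simple definition, ground balls divided by all balls in play, and uses invented lists for two unnamed pitchers; published groundball rates may be defined differently.

```python
from collections import Counter


def groundball_pct(trajectories):
    """Percentage of balls in play labeled 'ground' (assumed definition)."""
    counts = Counter(trajectories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * counts["ground"] / total


# Invented trajectory lists for two hypothetical pitchers.
pitcher_a = ["ground", "ground", "ground", "fly", "line", "ground"]
pitcher_b = ["fly", "fly", "line", "ground", "fly", "fly"]

for name, balls in (("pitcher_a", pitcher_a), ("pitcher_b", pitcher_b)):
    print(name, round(groundball_pct(balls), 1))
```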
Another interesting thing to do with the hit charts is get a rough idea of the defensive ranges of players. Below is a chart showing every ball in play that Yankee pitchers have allowed at Yankee Stadium this year. You can actually see where the outfield wall should be, based on the location of the doubles and home runs, and while it's tough to see fine details on a chart like this, you can almost make out the deeper fence in left field compared to right field.
yankee.png
Continuing to look at the outfield, you can get an idea of where the Yankees' defense has allowed hits this year. Using the outfield hits and outs as a guide, there appear to be three zones where hits don't occur in the outfield, one for each fielder. These areas are surrounded by hits of all types, which give a rough idea where the zones end. There is some overlap between the zones, caused by different outfielders being in the game, different positioning by the outfielders and probably the different scorekeepers tracking the balls, but even with these three problems you can get an idea of the range shown by Yankee outfielders. The only problem with those ranges is they don't really mean much for individual players, except in right field for Abreu, because the other positions have been manned by several players for the Yankees this season.
sizemore.png
However, if you have a fielder who has played in all of his team's games, the ranges become meaningful on an individual level. Above is a chart showing every ball in play that Indians pitchers have allowed this year, and while the Indians have used several outfielders at the corner positions, Grady Sizemore has been a fixture in center the whole year. Sizemore is a great defensive outfielder, which is shown several ways on the chart, most obviously in that there are few hits to center field. This could be due to a scorer bias of somehow mis-marking hits (which I don't think is happening), but it seems that Sizemore simply covers a lot of ground, especially compared to the Indians' left fielders, where there appear to be some doubles and triples on balls that are hit right at them. The range of the right fielders appears to be slightly larger than that of the left fielders, but still smaller than Sizemore's. Another bit of evidence for Sizemore's defensive prowess is the lack of hits directly over his head. A ball hit over the head of an outfielder is one of the hardest plays to make, but Sizemore has made virtually all of those plays. (There is one possible explanation for the lack of hits behind Sizemore that is not related to his defensive skills but rather based on the two clumps of doubles at the wall, on either side of Sizemore. Because balls are marked where they are picked up and not where they land, these balls could have landed directly behind Sizemore and been picked up off to the side. With the current data, you can't really tell for certain which actually happened, but comparing the Sizemore chart to the Yankee Stadium chart, it seems like the mis-marking is happening to some degree.)

The next step in analyzing where balls are hit is to look at what pitches were hit to certain areas. In order to answer this question I needed to merge my hit location database with my pitch database. With this "super-database", I can show hitting charts based on any conceivable split. Want to see how and where balls have been put in play against Paul Byrd when he has two strikes on a left-handed batter? Look no further. Below on the left is Byrd vs. left-handed batters with two strikes. The same situation for right-handed batters vs. Byrd is on the right. Neither graph is drastically different from what you would expect, with more balls being pulled than hit to the opposite field. Balls that are hit the other way are not hit as far and tend to be fly balls as opposed to line drives.
byrd3.png byrd4.png
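One way to picture the merge behind these splits: join each ball in play to the final pitch of the same at-bat, then filter on whatever fields the split needs. The sketch below is not the actual database; the join key (game plus at-bat number), the field names and every sample value are assumptions for illustration.

```python
# Invented hit-location records, keyed by hypothetical game and at-bat ids.
hits = [
    {"game": "g1", "ab": 1, "x": 80.0, "y": 100.0, "trajectory": "fly"},
    {"game": "g1", "ab": 2, "x": 150.0, "y": 120.0, "trajectory": "ground"},
    {"game": "g2", "ab": 1, "x": 95.0, "y": 60.0, "trajectory": "line"},
]

# Invented records for the last pitch of each of those at-bats.
final_pitches = [
    {"game": "g1", "ab": 1, "pitcher": "P1", "bats": "L", "strikes": 2, "pitch_type": "fastball"},
    {"game": "g1", "ab": 2, "pitcher": "P1", "bats": "R", "strikes": 2, "pitch_type": "changeup"},
    {"game": "g2", "ab": 1, "pitcher": "P1", "bats": "L", "strikes": 1, "pitch_type": "fastball"},
]


def merge_on_at_bat(hit_rows, pitch_rows):
    """Inner-join hit locations to pitches on the (game, at-bat) key."""
    by_key = {(p["game"], p["ab"]): p for p in pitch_rows}
    merged = []
    for h in hit_rows:
        p = by_key.get((h["game"], h["ab"]))
        if p is not None:
            merged.append({**p, **h})
    return merged


def split(rows, **conditions):
    """Keep only rows whose fields equal every given condition."""
    return [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]


merged = merge_on_at_bat(hits, final_pitches)
two_strike_lhb = split(merged, pitcher="P1", bats="L", strikes=2)
fastballs_vs_rhb = split(merged, pitch_type="fastball", bats="R")
```

Because `split` takes arbitrary keyword conditions, the same call covers a count split, a handedness split, or a pitch-type split without new code for each one.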
Getting a little more in depth, how about looking where different pitches are hit? Below are charts for where Justin Verlander's fastball has been hit by left-handed hitters (on the left) and right-handed hitters (on the right). Generally when hitters pull his fastball it is on the ground, but if it is hit in the air, it goes to the opposite field. This distribution of flyballs and groundballs doesn't appear to be unique for Verlander.
verlander3.png verlander2.png
Going a little further, here's a chart showing every fastball, thrown by right-handed pitchers, that has been put in play by a right-handed hitter.
rhb.png
This is overkill, and if you can read anything into this graph you're a better man than me. I don't have the ability to sort every pitch based on reaction distance yet, but using reaction distances would probably be a better solution than just using "fastballs" and "change-ups". Using static definitions of a pitch, you run into the problem of grouping Jamie Moyer's 84 MPH fastball with Verlander's 95 MPH fastball. Hitters are going to react to and hit those pitches differently, and this chart doesn't show that.
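To get a feel for how different those two "fastballs" are, here is a back-of-the-envelope sketch of time to the plate. It is not a definition of reaction distance, which the article leaves unspecified. It assumes a constant speed (ignoring the pitch slowing down in flight) and an assumed 55-foot distance from release to the plate.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600
RELEASE_TO_PLATE_FT = 55.0  # assumed release distance, for illustration only


def flight_time_seconds(speed_mph, distance_ft=RELEASE_TO_PLATE_FT):
    """Time for a pitch to cover distance_ft at a constant speed_mph."""
    feet_per_second = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return distance_ft / feet_per_second


slow = flight_time_seconds(84)
fast = flight_time_seconds(95)
print(f"84 MPH: {slow:.3f} s, 95 MPH: {fast:.3f} s, gap: {slow - fast:.3f} s")
```

Under these assumptions the slower pitch gives the hitter roughly five hundredths of a second longer, which is why lumping the two together under one label hides real differences.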

There are some problems with the MLB.com hit location data, primarily that the balls are marked based on where they are picked up by a fielder, not where they first hit the ground or where they go through the infield. By marking where a ball was picked up, you lose the information about where it should have/could have been fielded. Knowing where an outfielder picked up a ground ball is nice, but knowing exactly where that ground ball went through the infield or where a fly ball actually landed would be better. Another possible problem with the data is the ability of the scorekeeper to really know where the ball landed. There aren't any landmarks in the outfield to gauge where a ball was picked up which makes it harder to accurately plot the data.

These hit charts can help create informative profiles on hitters, pitchers and stadiums, and on a large scale they can even help visualize players' defensive ranges. One big advantage of the hit location data over the pitch data is that the hit chart data is complete for all stadiums for the whole year. Scorekeepers manually enter this information for every ball in play, and it even goes back for several years, allowing for possible comparisons across years.

Comments

Nice data, but the presentation needs an overhaul. (See Tufte, and remember that red + green is bad news for the color blind portion of your audience).

Good stuff, Joe. You keep adding value converting the GameDay data to charts. The possibilities seem limitless. Thanks.

Cool stuff. I had been thinking of merging the hit chart and pitch data for a batter or two to see if the batter could hit a certain pitch farther than another (just for my own fun). It looks like you could do it a lot quicker than me and do a more comprehensive bunch.

All the minor leagues also have the hit-to locations (although no pitch-by-pitch data), but I don't know the accuracy of it. They might just be put in a random spot according to the scorekeeper (e.g. "fly out to right field" is put in a random spot in right field).

This is really grand