Touching Bases
April 15, 2010
Clusters in the Outfield
By Jeremy Greenhouse

"I waved in my outfielders. When they got in around me, I said, 'Sit down there on the grass right behind me. I'm pitching this last guy without an outfield.'" -- Satchel

Outfielder positioning has been a hot topic over at The Book Blog recently. Max Marchi has done some great research on defensive positioning.

Using MLBAM data, which reports the location where each ball was fielded, as well as Peter Jensen's Gameday translations, I queried the hit locations of balls in the air that left the infield but stayed in the ballpark. I restricted my sample to hitters who had at least 100 balls in the air from one side of the plate over 2008-2009. I then ran a k-means algorithm that split each spray chart into three clusters. I wouldn't say that the center of each cluster indicates where a fielder should be positioned, since a lot more than just getting to balls goes into positioning, but one might put it that the centers indicate the middle of each fielder's area of responsibility. I think of it as a tidy way to quantify someone's spray chart.
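For readers who want to try the idea, here is a minimal sketch of three-cluster k-means on a spray chart. It is not the code behind this article. It assumes hit locations are already converted to feet, with home plate at (0, 0) and the y-axis pointing toward center field. The synthetic data, the function names, and the angle-based starting centers are all assumptions made for illustration.

```python
import math
import random


def mean_point(pts):
    """Centroid of a list of (x, y) tuples."""
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def dist2(a, b):
    """Squared Euclidean distance between two (x, y) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k=3, max_iter=100):
    """Plain Lloyd's algorithm; returns (centers, labels). Needs len(points) >= k."""
    # Assumed initialisation (hypothetical, not from the article): sort balls
    # by spray angle, left field to right field, and cut into k equal slices.
    ordered = sorted(points, key=lambda p: math.atan2(p[0], p[1]))
    size = len(ordered) // k
    centers = []
    for i in range(k):
        stop = (i + 1) * size if i < k - 1 else len(ordered)
        centers.append(mean_point(ordered[i * size:stop]))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each ball goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: dist2(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its assigned balls.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean_point(members)
    return centers, labels


# Synthetic spray chart (made-up numbers): an opposite-field-heavy lefty
# with three times as many balls to left field as to right field.
random.seed(1)
true_centers = [(-150.0, 220.0), (0.0, 300.0), (150.0, 220.0)]
counts = [150, 100, 50]
balls = [
    (random.gauss(cx, 25), random.gauss(cy, 25))
    for (cx, cy), n in zip(true_centers, counts)
    for _ in range(n)
]

centers, labels = kmeans(balls)
# Share of balls in each cluster, i.e. each fielder's share of responsibility.
shares = [labels.count(j) / len(balls) for j in range(3)]
```

The cluster centers play the role of the "middle of a fielder's area of responsibility" described above, and the shares correspond to statements like one fielder being responsible for three times as many fly balls as another.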

For example, Joe Mauer hits the ball in the air the other way a lot. The left fielder is responsible for three times as many fly balls off Mauer's bat as the right fielder. Conversely, Carlos Pena pulls a fair share of his fly balls. Assigning each ball to a fielder yields the following chart:


Logically, a fielder would get to the most balls the fastest by standing in the middle of his zone. Again, that often doesn't align with the actual job of the fielder, which is to prevent runs. Averaging the clusters produces the following centers:


So the difference in the average hit locations between a great pull hitter and a great opposite-field hitter comes out to around 30 feet.

The most interesting and informative chart is probably the one that splits batters by handedness.


On average, corner outfielders have to move 15-20 feet depending on the handedness of the batter. This is the result of pulled balls traveling farther than opposite-field balls. The center fielder only moves five feet in general. Grouping by pitcher handedness didn't produce any visibly different results.

Now, I'll look at some of the most extreme differences in cluster centers. While Pena and Mauer have an extreme difference in the rate of balls they put in play to each field, their clusters were in close proximity as compared to Scott Podsednik and Ray Durham, whose centers were 50-100 feet apart.


As for right-handed batters, Derek Jeter is the only player who hits a higher rate of balls in the air to the opposite field than Joe Mauer. Jeter leaves the right fielder responsible for over half of his fly balls, and he forces the right fielder to play closer to the line than any other right-handed batter. I'll compare him to Jesus Flores.


All of the previous charts have dealt with fly ball angle, but fly ball distance is just as important in outfield positioning. The first pair I noticed was Cody Ross and Gregor Blanco.


Here, we see some of the unreliability in either the GameDay location data or the pixels-to-feet conversion. Cody Ross has power, and power to center, but something is off. He doesn't routinely hit 400-foot flies that stay in the ballpark. Oh, well.

Looking at pull power to left, there's a more realistic difference between Chris Iannetta and Ryan Roberts.


And the obvious choice for the final coupling is Luis Gonzalez and Paul Bako.


The only player for whom my clustering algorithm spat out something funky was Clete Thomas. His spray chart is unusual in that he appears to have decent power to left-center, but not so much to right-center, which creates a distinct region in left-center where no fielder would ever play, and leaves a neighboring vacancy where the center fielder is traditionally positioned.



Great stuff Jeremy! You ought to be selling this stuff to teams, although they can probably come up with pretty much the same stuff just by eyeballing the spray charts.

The real trick is to figure out how to regress the spray charts. Clearly a hitter's spray charts, especially in small samples, are not what he is going to do in the future, which is all teams care about. For example, I doubt that the TRUE total angle (from LF to RF) of a hitter's spray pattern varies much among hitters. So when you see an OF alignment where the fielders are bunched up (against a so-called "gap hitter"), or another one where both corners are close to the line, my guess is that the OF is not in an optimal configuration.

Great, great stuff, Jeremy!

Jeremy - After I wrote the article that included the distance multipliers for translating the hit locations, I found that it was necessary to have separate multipliers for balls hit to the outfield. This doesn't affect the angular measurements in your article but may account for some of the differences in outfield distance in your Ross-Blanco graphs. A more important consideration is that Gameday hit locations are where the ball is picked up on hits, not where it hits the ground. That will make a big difference on line drive hits in the gap that manage to make it to the wall, where the wall angle will cause the ball to kick toward the center fielder. It probably would make your graphic analysis more useful to distinguish between balls caught in the air and FB and LD hits.

Thanks for the kind words, MGL and David.

MGL, I agree that it would be much trickier to take into account regression and out values.

Peter, yes, I knew that the multipliers were different for the outfield, which is why I didn't report the differences in distance for a few graphs. I think the illustration still does the trick, though.

Good idea using only balls that are caught. I think I might experiment more with this subject.

Good article. Next time, though, use different colors than red and green so that us colorblind people can have an easier time reading your charts.

Sorry, Alex. Red and black and blue from now on.

I agree about the great stuff, and also about the lousy color selection. :) Better tip: just use a grayscale instead of colors and use different symbols instead of always using circles.

Thanks, Studes. Will do.

Are the sample sizes for switch-hitters large enough to run a comparison of, say, Victor Martinez against himself, from each side of the plate? Do some switch-hitters show power to the same areas of the field regardless of which way they're swinging at the moment?

Excellent work, Jeremy. Blanco is especially interesting.

These are some really cool graphics, well done.

Would you be able to look at the spray charts depending on the pitch? Say if they are pitching someone inside or outside how it affects them.

I look forward to whatever you do next with this data, it should be very interesting.

Nightfly, good idea to compare switch hitters to themselves. Might try that next post.

Fat Ted, looking at pitch velo and pitch locations is the topic I'm thinking of trying for next post as well.