Touching Bases November 05, 2009
Visual Scouting Reports (Beta)

What if I could just punch a couple of lines into my computer and see a player's strengths and weaknesses in graphical form? Harry Pavlidis does a good job using pitchf/x data to give brief summaries of pitchers, and Dave Allen is like King Midas graphing with R. I've set out to develop my own set of hitter graphs, and I'd like your help improving them for future, more in-depth player analysis.

Here's what I've got so far, using Jayson Werth's 2008-2009 data as an example.

I'll break down the three components one by one. For now, the graphs cover the three most meaningful locations in the baseball's flight: the pitcher's hand, the strike zone, and the hit location. Here's Werth's "Batter Zone."

These are from the batter's perspective. Werth's expected run value is worst against pitches up in the zone and down and away, which, as you know, holds true for most hitters. The graph on the left compares Werth to himself: where you see blue, he performs worse than his own average. The graph on the right compares him to the league average. He excels on pitches down and in, but is worse than average when challenged up.
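The post doesn't show how the two batter-zone surfaces are built. As a rough sketch, assuming pitch-level arrays of horizontal and vertical location in feet (the pitchf/x `px` and `pz` fields) plus a run value for each pitch, one could smooth run value over a grid and then subtract either the hitter's own average (left graph) or a league surface built the same way (right graph). The Gaussian kernel average below is a swapped-in stand-in for whatever smoother the original graphs use; the function names and the 0.25-foot bandwidth are hypothetical.

```python
import numpy as np


def smooth_surface(px, pz, run_value, grid_x, grid_z, bandwidth=0.25):
    """Gaussian-kernel weighted average of run value at every grid point.

    px, pz: pitch locations in feet; run_value: run value of each pitch.
    bandwidth: kernel width in feet (a hypothetical, untuned choice).
    Returns an array of shape (len(grid_z), len(grid_x)).
    """
    px, pz, run_value = (np.asarray(a, dtype=float) for a in (px, pz, run_value))
    gx, gz = np.meshgrid(grid_x, grid_z)
    # Squared distance from each grid point to each pitch: shape (nz, nx, n_pitches)
    d2 = (gx[..., None] - px) ** 2 + (gz[..., None] - pz) ** 2
    weights = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Grid points far from every pitch get near-zero total weight; keep the frame
    # close to the data or widen the bandwidth to avoid dividing by ~0.
    return (weights * run_value).sum(axis=-1) / weights.sum(axis=-1)


def batter_zone(player, league, grid_x, grid_z):
    """Return (vs_self, vs_league) surfaces for one hitter.

    player, league: (px, pz, run_value) tuples of pitch-level arrays.
    """
    player_surface = smooth_surface(*player, grid_x, grid_z)
    league_surface = smooth_surface(*league, grid_x, grid_z)
    vs_self = player_surface - np.mean(player[2])  # blue = worse than his own average
    vs_league = player_surface - league_surface    # blue = worse than league average
    return vs_self, vs_league
```

Plotting `vs_self` and `vs_league` with a blue-to-red diverging colormap would give the two panels described above.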

So, how can these visualizations be improved? I'm using a standard strike zone, but I'd like to add contour lines showing each batter's individual strike zone and swing zone, i.e., where he's most likely to let it fly. I'm also unsure how large the plotting frame should be. Right now it is set at four feet by three feet, which captures the intricacies within the strike zone, but it might leave out some information for players like Vlad. The downside to expanding the frame is that, for most hitters, the extra space would be filled almost entirely by the average run value of a ball, which would drown out the details of the visual. Lastly, for the graph on the right comparing Werth to the league average, I don't know whether to fix the color bar so that great hitters like Chase Utley, who is above average at everything, appear red everywhere, or to let it float so that locations where he is a mere .01 runs better than average show up blue, since he's not as awesome there as he is elsewhere.
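The color-bar question comes down to how a run-value difference is mapped onto the blue-white-red scale. A minimal sketch, assuming the values are run-value differences like those in the right-hand graph: "relative" stretches each player's own minimum-to-maximum range over the whole bar, while "fixed" centers white on zero with a limit shared by every player (the 0.10-run limit is a hypothetical choice). The function name is illustrative, not from the original graphs.

```python
import numpy as np


def color_position(diff, mode="relative", fixed_limit=0.10):
    """Map run-value differences to [0, 1] on a blue (0) - white (0.5) - red (1) bar.

    mode="relative": stretch this player's own min..max over the full bar, so even
        a hitter who is above average everywhere shows blue somewhere.
    mode="fixed": center on 0 with a shared +/- fixed_limit (hypothetical 0.10 runs),
        so above average is always red and players are directly comparable.
    """
    diff = np.asarray(diff, dtype=float)
    if mode == "relative":
        lo, hi = diff.min(), diff.max()
        if hi == lo:
            # A perfectly flat surface has no relative strengths; paint it white.
            return np.full_like(diff, 0.5)
        return (diff - lo) / (hi - lo)
    if mode == "fixed":
        # Values beyond +/- fixed_limit saturate at full blue or full red.
        return np.clip(0.5 + diff / (2.0 * fixed_limit), 0.0, 1.0)
    raise ValueError(f"unknown mode: {mode}")
```

For a hitter whose differences run from 0.01 to 0.15 runs, "relative" paints the 0.01 location fully blue, while "fixed" puts it just on the red side of white. The fixed scale lets readers compare players at a glance, at the cost of detail for hitters whose surface spans only a narrow range.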

Here is how Werth fares by pitcher release point, which is informative in showing his platoon split.

It appears to me that Werth has a normal platoon split, but struggles a fair bit against righties with a lower arm slot.

Lastly, Werth's spray charts.

Werth pulls his grounders at a high rate. In the outfield, depending on the precision of the data, the center fielder should shade a bit towards left.
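One way to put a number on "pulls his grounders at a high rate" is to classify each batted ball by spray angle. This sketch assumes hit locations have already been converted to feet with home plate at (0, 0), +y pointing to straightaway center and +x toward right field; the post doesn't say how the GameDay coordinates were converted, and the ±15° "center" band and the function names are hypothetical.

```python
import math


def spray_angle(x, y):
    """Degrees from the center-field line: negative = left side, positive = right side."""
    return math.degrees(math.atan2(x, y))


def field_direction(x, y, bats="R", center_band=15.0):
    """Classify a batted ball as 'pull', 'center', or 'opposite' (hypothetical cutoffs)."""
    angle = spray_angle(x, y)
    if abs(angle) <= center_band:
        return "center"
    # A right-handed hitter pulls to the left side; a left-handed hitter to the right.
    pulled = angle < 0 if bats == "R" else angle > 0
    return "pull" if pulled else "opposite"


def pull_rate(hits, bats="R"):
    """Share of (x, y) batted balls hit to the pull side."""
    directions = [field_direction(x, y, bats) for x, y in hits]
    return directions.count("pull") / len(directions)
```

Werth bats right-handed, so his grounders would be passed with `bats="R"`.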

I'd appreciate any input on how to improve this set of graphs. I'd also like to come up with graphs showing how hitters fare based on velocity and movement, but nothing comes to mind yet. I also have ideas for how to present hitf/x data if we ever get more of it.

I ran through the Phillies lineup, excluding switch-hitters, so here they are with brief comments. A quick glance at these graphs certainly won't give you any answers, but it might give you some food for thought.

Chase Utley:

Utley is an insanely good hitter no matter where you pitch him. However, don't try to brush him back, as Buster Olney suggested, because he will take his HBPs, which I'm guessing is what the red portion in the upper right of that graph consists of. He pulls almost everything.

Ryan Howard:

Howard also famously pulls his ground balls. Shifting against him is an obvious strategy, but the real question is where the third baseman should play.

Raul Ibanez:

Ibanez's batter zones look similar to Utley's, but he's not as good anywhere.

Pedro Feliz:

Feliz is actually a good hitter on pitches away. I'd imagine that's because he lays off most of them, since he can't hit them anyway. But he can be beaten on the inner half. Feliz shows no platoon split and a normal spray chart.

Carlos Ruiz:

Boy, did Ruiz have a great series. He hits most of his fly balls the other way, but all of his home runs have gone to his pull field.

This is trivial, but more meaningful axis labels would be nice. You can keep the px and pz stuff if you want, but I'd put them in parentheses after an English description of what they are.

Jeremy,

Fantastic work here. If Dave Allen's the King Midas of R, then you must be the prince. I especially like the creativity of the 'butterfly' release points although I wonder how the sample size on the edges skews the visual. Are those points meaningful?

I was a little confused at first by the hit location visualizations only because I thought they were plotted on the same background. I figured it out once I realized that grounders shouldn't be hitting the wall very often. Are you using GameDay x,y coordinates for these? If so, how did you handle grounders that went through the infield? Did you ignore them like Jeter apologists do? Zing.

Thanks, dockmo. Will do. I meant to label the colorbar as "run value." I'll improve the labeling for sure.

I must be missing something. Utley's chart is mostly dark blue. Isn't dark blue bad?

No. Blue is bad relative to other locations. For him, the low end is 0 and the high end is .15 or so, so blue actually represents good. Do you think it would be better to make Utley's entire chart red, which would make it more difficult to differentiate places where he's super awesome from places where he's merely great?

I'd definitely go with changing the color over changing the scale. The color scale should be constant so you can easily compare player to player and not have to look at the scale every time. The way it's set up right now, Ruiz and Feliz look better than Utley and Ibanez if you're just taking a quick look.

Also, I don't see a label anywhere for what the different colors are showing exactly. Maybe I just missed it, though.

Other than that, these are awesome. Great job.

I think it would be helpful if the magnitude of the difference between blue and red was consistent between players. For example, Utley's second chart has a range of >0.15, while Feliz's second has a range of only 0.10. What if there is a player who hits all balls nearly equally? It would be weird to have a whole blue-red range displayed when the difference between the extremes is only, say, 0.02.

That is totally awesome.

With the analysis you provided, that is a service I would be willing to pay for. Let me know if you ever make it available for all players.

This is pretty cool. It would be interesting to look at batter-pitcher matchups based on the strengths and weaknesses of both the batter and the pitcher in the strike zone. For example, when A.J. Burnett pitches to Chase Utley, which pitches in which areas of the strike zone should he use, given Utley's weaknesses and his own strengths? I'm not sure how you could merge that into one graphic rather than showing them side by side, but I'd imagine it would be pretty informative.

DavidH, I will try that out. I think I'll show the graph that compares his performance against himself on a relative blue-red scale, and the one that compares his performance against the league on a fixed scale, so above average is always red.

Albert, thanks for stopping by. I could make the same graphs only using breaking balls from right-handed pitchers, but right now I'm not trying to look at anything so specific.

What program do you use to make these charts?

Those colors are far out, man!

Kidding, of course. I'd like to see the splits for Howard vs. RHP and LHP (if anything shows up at all against LHP!); he might have been the Yankees' series MVP.

RZ, I'm using R.

Eric, yeah, the Yankees' LHPs did a great job neutralizing Howard. If he keeps slipping, a platoon next year might actually be the right move.

Makes sense.

For the spray charts showing the outfield, it would be nice to get some range rings or something. This would make it easier to compare distances and relate them to different fields.
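Range rings are straightforward to generate once hit locations are in feet. A minimal sketch, assuming home plate at (0, 0) with the foul lines at ±45° from the center-field line; `range_ring` is a hypothetical helper that returns points to draw as a line over the spray chart.

```python
import math


def range_ring(radius, n_points=46):
    """(x, y) points on a fair-territory arc `radius` feet from home plate."""
    points = []
    for i in range(n_points):
        # Sweep from the left-field foul line (-45 degrees) to the right-field line (+45).
        angle = math.radians(-45.0 + 90.0 * i / (n_points - 1))
        points.append((radius * math.sin(angle), radius * math.cos(angle)))
    return points


# Rings every 100 feet out to 400 feet.
rings = {r: range_ring(r) for r in (100, 200, 300, 400)}
```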

Tom and Dan,

Thanks for commenting. Your comments must have gotten caught in the queue, so sorry it's taken me a while to respond.

Tom, because of the way the local regression works, those outer points won't skew the rest of the results, but in turn they are less influenced by most of the other points. The grounder graph goes out to 160 feet. For any grounder that got past 160 feet, I used some trig to backtrack to where it would have passed through the infield at the 160-foot mark.
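The backtracking step reduces to a little polar-coordinate trig. A sketch, assuming the ball travels in a straight line from home plate at (0, 0), with +y toward center field and +x toward right field, and that the recorded location is in feet; `crossing_point` is a hypothetical name, not the author's code.

```python
import math


def crossing_point(x, y, radius=160.0):
    """Where a ground ball recorded at (x, y) crossed the arc `radius` feet from home.

    Assumes a straight-line path from home plate at (0, 0). Points already
    inside the arc are returned unchanged.
    """
    if math.hypot(x, y) <= radius:
        return x, y
    theta = math.atan2(x, y)  # spray angle measured from the center-field line
    return radius * math.sin(theta), radius * math.cos(theta)
```

Because the point is only pulled back along its own ray, the spray angle is preserved, which is what a grounder chart needs to show where the ball got through.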

Dan, I'm going to improve the labeling for sure. And I think I'll have one graph with the ranges set, and the other determined by the hitter's strengths and weaknesses. See how that looks.