F/X Visualizations, February 05, 2010
Thoughts on a New Box Score
By Dave Allen

I have fond memories of reading box scores in the newspaper as a child. In the pre-internet days, or at least the pre-internet days in my house, box scores in newspapers were the medium by which I, and I assume most people, consumed baseball data. The data were all there, tightly yet efficiently packed in a format that allowed you to pull out any or all of what you wanted without feeling overwhelmed. Each was small enough that the box scores for all of the day's games fit on one page.

I still read box scores; the medium has changed to the internet, but the box score itself is largely the same. I guess the format has stayed mostly unchanged since the mid-1800s. Some of the stats are different, but the layout is very similar. Over 150 years with little change shows that the format is remarkably successful, but that does not mean there cannot be innovations. FanGraphs's WPA charts are not box scores per se, but they are a very effective way of presenting what happened in a game.

I thought it would be an interesting exercise to attempt to create a new box score. I wanted it to retain the original box score's quality of presenting a relatively large amount of information in a relatively small space, while keeping that data accessible and not overwhelming. Beyond that, I hoped my new method would give a more immediate feeling for the pace and tenor of the game, like the WPA chart does.

Here is my attempt, using game one of the 2009 World Series as the example.
New Box Score
Each at-bat is represented by a bar, the height of which denotes the base the batter reached. White bars are for outs, black for hits or walks. The batter's progression around the rest of the bases that inning is indicated in gray (steals have a vertical black line through them). Runners on base during an at-bat are indicated in red: circles for those who did not advance in the at-bat, lines showing their progression as a result of the at-bat, and an X if they were thrown or tagged out in that at-bat.

The score can be counted along as the black or gray bars reach the top. That also allows you to count individual batters' runs scored or pitchers' runs allowed. Red lines that reach the top are RBIs.
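To make the encoding concrete, here is a minimal Python sketch of how the chart's bars could be stored and how runs and RBIs fall out of them. The `AtBat` fields and the sample inning are hypothetical illustrations, not the data or code behind the chart above; the sketch assumes each at-bat records the base reached on the play and the furthest base reached later that inning.

```python
from collections import Counter
from dataclasses import dataclass

HOME = 4  # bar height meaning the runner reached the top of the chart (scored)


@dataclass
class AtBat:
    batter: str
    safe: bool        # True: black bar (hit or walk); False: white bar (out)
    reached: int      # base reached on the play itself, 0-4
    advanced_to: int  # furthest base reached later that inning (gray extension)
    rbi: int = 0      # runners whose red lines reach the top on this play


def scored(ab):
    """A batter scores when his black or gray bar reaches home."""
    return ab.safe and max(ab.reached, ab.advanced_to) >= HOME


def runs(at_bats):
    return sum(1 for ab in at_bats if scored(ab))


def runs_by_batter(at_bats):
    return Counter(ab.batter for ab in at_bats if scored(ab))


# A hypothetical inning, not taken from the World Series game.
inning = [
    AtBat("A", True, 4, 4, rbi=1),  # solo home run
    AtBat("B", False, 0, 0),        # out
    AtBat("C", True, 2, 4),         # double, scores later in the inning
    AtBat("D", True, 1, 1, rbi=1),  # single that drives in C
    AtBat("E", False, 0, 0),        # out
]
total_rbi = sum(ab.rbi for ab in inning)
print(runs(inning), dict(runs_by_batter(inning)), total_rbi)
```

Counting a pitcher's runs allowed would work the same way if each at-bat also carried a (hypothetical) pitcher field to filter on.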

Compared to a traditional box score, it is harder to find an individual player's line. For example, to see that Chase Utley went 2-4 with 2 HRs, 2 runs, 2 RBIs, a strikeout and a walk, you have to go through the chart, find his at-bats and count all of the events. But the trade-off is, I think, that this formulation gives a better feel for the pace of the game and allows the events to be easily recreated: in the top of the first, CC Sabathia escaped a bases-loaded, two-out jam; Phil Hughes took over to start the eighth and walked the only two batters he faced, both of whom came around to score on Raul Ibanez's single; Utley's two solo HRs were the only runs through the first seven innings; Cliff Lee didn't allow a runner past first until the ninth, and up to that point faced just three batters over the minimum; the Yankees burned through five relievers, who gave up four runs, in the last two innings; the top of the ninth ended with Shane Victorino getting thrown out at home on a Ryan Howard double; and the game ended with two more Cliff Lee strikeouts. All of this can be easily seen through a close, but not difficult, reading of the chart.

What do you think of this format: Complicated and poorly laid out? Hard to read? Brilliant? I welcome constructive criticism in light of what you want from a representation of a baseball game.

F/X Visualizations, January 22, 2010
How Do Pitchers Change Their Approach Against Good Hitters?
By Dave Allen

Nick Steiner, who over the last couple of months has been producing some great pitchf/x content, had an interesting piece asking how many HRs Albert Pujols would hit if he saw the same pitches as Juan Pierre. He wrote the piece in mid-September and concluded Pujols would have hit 62 HRs up to that point in the season. It is a very cool question, and implicit in it is the understanding that pitchers pitch differently to good hitters than they do to not-quite-as-good hitters.

I think this is a very interesting idea to explore further, and the pitchf/x data set is a great tool for it. To do that I created two groups of hitters: first, the twenty regulars with the top wOBAs in 2009 (wOBA is a stat of Tangotiger's construction that measures overall offensive impact), and second, the twenty regulars with the lowest wOBAs in 2009.

One common assumption is that good hitters see fewer fastballs, and this analysis bears that out. The top-wOBA group saw 58.4% fastballs versus 61.5% for the bottom-wOBA group. But that actually understates the difference: the top group saw many more pitches in hitter's counts, and pitchers throw more fastballs in hitter's counts. It is best to consider the difference in each count.
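Why would the overall numbers understate the gap? The overall fastball rate is a weighted average of the per-count rates, weighted by how often each group is in each count. The sketch below uses made-up numbers for two broad count types (not the data in this post) to show how a group that spends more time in fastball-heavy counts can look closer to the other group overall than it does count by count, and how re-weighting both groups with a common, hypothetical count mix (direct standardization) exposes the larger gap.

```python
def overall_rate(rates, mix):
    """Overall rate as the mix-weighted average of per-count rates."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("count mix must sum to 1")
    return sum(rates[c] * mix[c] for c in rates)


# Hypothetical numbers for illustration only; not the data in this post.
top_rates = {"even": 0.55, "hitter": 0.65}     # fastball rate by count type
bottom_rates = {"even": 0.60, "hitter": 0.80}
top_mix = {"even": 0.50, "hitter": 0.50}       # good hitters reach more hitter's counts
bottom_mix = {"even": 0.80, "hitter": 0.20}

raw_gap = overall_rate(bottom_rates, bottom_mix) - overall_rate(top_rates, top_mix)

common_mix = {"even": 0.65, "hitter": 0.35}    # same (hypothetical) mix for both groups
std_gap = overall_rate(bottom_rates, common_mix) - overall_rate(top_rates, common_mix)

print(f"raw gap {raw_gap:.3f}, standardized gap {std_gap:.3f}")
```

In this toy case the per-count gaps are 5 and 15 points, yet the raw overall gap is only 4 points; weighting both groups by the same count mix raises it to 8.5.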

Fastball frequency by count (proportion of pitches that were fastballs)
count   top   bottom
0-0   0.626    0.663
0-1   0.551    0.545
0-2   0.549    0.511

1-0   0.587    0.664
1-1   0.542    0.559
1-2   0.497    0.484

2-0   0.659    0.780
2-1   0.579    0.679
2-2   0.530    0.528

3-0   0.717    0.848
3-1   0.735    0.823
3-2   0.591    0.705

Here you can see the difference is largely driven by hitter's counts (e.g., 1-0, 2-0, 2-1, 3-0, 3-1), where the top group saw on average about 10 percentage points fewer fastballs than the bottom group. Interestingly, in pitcher's counts (e.g., 1-2, 2-2) the differences are very small.
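The hitter's-count figure can be checked directly from the table. This short Python sketch copies the table's values and averages the bottom-minus-top gap over the counts given as examples in the text; that grouping of counts is just the text's examples, not a formal definition.

```python
# Fastball frequency by count, copied from the table above: count -> (top, bottom).
FASTBALL_FREQ = {
    "0-0": (0.626, 0.663), "0-1": (0.551, 0.545), "0-2": (0.549, 0.511),
    "1-0": (0.587, 0.664), "1-1": (0.542, 0.559), "1-2": (0.497, 0.484),
    "2-0": (0.659, 0.780), "2-1": (0.579, 0.679), "2-2": (0.530, 0.528),
    "3-0": (0.717, 0.848), "3-1": (0.735, 0.823), "3-2": (0.591, 0.705),
}

# Count groupings follow the examples given in the text.
HITTERS_COUNTS = ["1-0", "2-0", "2-1", "3-0", "3-1"]
PITCHERS_COUNTS = ["1-2", "2-2"]


def mean_gap(table, counts):
    """Average of (bottom - top) over the given counts, as a proportion."""
    gaps = [table[c][1] - table[c][0] for c in counts]
    return sum(gaps) / len(gaps)


hitters_gap = mean_gap(FASTBALL_FREQ, HITTERS_COUNTS)
pitchers_gap = mean_gap(FASTBALL_FREQ, PITCHERS_COUNTS)
print(f"hitter's counts: {hitters_gap:+.3f}, pitcher's counts: {pitchers_gap:+.4f}")
```

The hitter's-count average comes out to about 0.103 (roughly 10 percentage points), while in the two pitcher's counts the gap is under one point, with the top group actually seeing slightly more fastballs.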

The next thing we can look at is where those pitches end up. Here I plot the location of fastballs to the two groups. Areas where the top-wOBA group sees more pitches are red, and areas where the bottom-wOBA group sees more are blue.

FA_rr_loc.png

Not surprisingly, the top group sees many fewer fastballs in the strike zone. The extra out-of-zone pitches end up inside more often than outside, which is a little surprising to me. This also shows that the pattern of good hitters seeing fewer pitches in the zone is not just a result of their seeing fewer fastballs, which are more likely to be in the zone. That is, good hitters see fewer fastballs AND the ones they do see are less likely to be in the strike zone.

Overall the top group saw 47.6% of their pitches in the strike zone, compared with 51.8% for the bottom group. But again this 4.2-percentage-point difference understates the gap, because the top group gets more hitter's counts, in which pitchers should be around the zone. Breaking it up by count we see:

Proportion of pitches in the strike zone, by count
count   top   bottom
0-0   0.507    0.548
0-1   0.428    0.473
0-2   0.325    0.325

1-0   0.505    0.575
1-1   0.478    0.526
1-2   0.376    0.424

2-0   0.505    0.592
2-1   0.545    0.580
2-2   0.443    0.489

3-0   0.471    0.554
3-1   0.607    0.646
3-2   0.553    0.598

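Here the difference grows: in most counts it is roughly four to five percentage points, it reaches seven to nine points in 1-0, 2-0 and 3-0, and only at 0-2 do the two groups see the zone equally often. It is clear the pitchers avoid the heart of the zone, and the zone as a whole, against the better batters.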
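The same per-count comparison can be made for the zone table. This Python sketch copies those values and reports the bottom-minus-top gap for each count, along with the widest and narrowest gaps; it only re-derives figures already in the table.

```python
# Proportion of pitches in the strike zone, from the table above: count -> (top, bottom).
ZONE_RATE = {
    "0-0": (0.507, 0.548), "0-1": (0.428, 0.473), "0-2": (0.325, 0.325),
    "1-0": (0.505, 0.575), "1-1": (0.478, 0.526), "1-2": (0.376, 0.424),
    "2-0": (0.505, 0.592), "2-1": (0.545, 0.580), "2-2": (0.443, 0.489),
    "3-0": (0.471, 0.554), "3-1": (0.607, 0.646), "3-2": (0.553, 0.598),
}

# Gap in each count: how much more often the bottom group sees the zone.
gaps = {count: bottom - top for count, (top, bottom) in ZONE_RATE.items()}
widest = max(gaps, key=gaps.get)
narrowest = min(gaps, key=gaps.get)
average_gap = sum(gaps.values()) / len(gaps)

for count, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{count}: {100 * gap:4.1f} points")
print(f"widest {widest}, narrowest {narrowest}, mean {100 * average_gap:.1f} points")
```

Sorted this way, the gap is widest at 2-0 (about 8.7 points), disappears at 0-2, and averages about 4.9 points across the twelve counts, versus the 4.2-point overall gap.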

This is another example where the pitchf/x data support the prevailing assumptions: good hitters see fewer fastballs and fewer pitches in the zone. But there are some interesting patterns: the smaller frequency of fastballs seen by good batters is largely driven by a much smaller frequency in hitter's counts -- not all counts across the board -- and the out of zone fastballs that good hitters see are more likely to be inside than outside.

F/X Visualizations, January 15, 2010
The Tigers and Pirates Sign Probable Closers
By Dave Allen

Yesterday the Tigers and Pirates signed their probable closers. Both teams had question marks at the back end of their bullpens and found free agents who should have no problem sliding into the closing roles.

Octavio Dotel

The Pirates -- who had non-tendered Matt Capps, leaving their closer position empty -- signed Octavio Dotel. Judged by a fielding-independent pitcher-evaluation framework that gives pitchers credit for strikeouts, ground balls and avoiding walks (a framework Rich used to rank pitchers back in February), Dotel succeeds by striking out just under 11 batters per nine innings, in spite of giving up a lot of walks and not getting many grounders.

Although he also throws a slider and a curveball, Dotel throws his fastball almost exclusively. Last year he threw it over 82% of the time, and you have to go back to 2003 to find a year he threw it less than eight times out of ten. Relievers who throw a fastball that often usually bring the heat -- think David Aardsma, Mike MacDougal or Matt Thornton -- but Dotel's fastball averages just 92 MPH. In fact, among the ten relievers who throw a fastball most often, Dotel has the slowest fastball.

Still, this slow fastball is very good. Batters miss a quarter of the time they swing at it, compared to an average whiff rate of just 14%. The result is that over the past three years he is in the top fifteen among relievers for whiff rate (or the lowest fifteen for contact rate).

Part of the reason for this is that Dotel pitches up in the zone, where batters whiff more often but rarely hit grounders. I broke the zone into bins and compared the fraction of his fastballs in each bin to that of the average RHP's fastball to RHBs; redder bins are those where Dotel throws his fastball more frequently than average, and bluer bins those where he throws it less.
dotel_FA_loc.png
Dotel has a consistent swath, from up-and-in to down-and-away, where he throws his fastball. Inside that swath he throws the ball more often than the average righty, and outside it he throws the ball less. This is a pretty good place to be, as up-and-in and down-and-away are the most successful locations for a fastball.

The Pirates get a very good relief pitcher in Dotel: his career ERA out of the pen is 3.11, supported by a FIP of 3.36. This should make him a solid closer. (Thanks to Rich for noting my error here: I had originally included his innings as a starter in his ERA.)
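For reference, FIP is commonly computed with the formula in this Python sketch. It is a sketch under stated assumptions: the league constant changes every season (it is what puts FIP on the ERA scale), so the 3.2 default is only a placeholder, and some versions also handle intentional walks differently. The sample pitching line is hypothetical, not Dotel's.

```python
def fip(hr, bb, hbp, k, ip, constant=3.2):
    """Fielding Independent Pitching on the ERA scale.

    ip is innings as a real number (60 + 1/3, not the box-score "60.1").
    constant is the season's league constant; 3.2 is only a placeholder.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical reliever line, not Dotel's actual numbers.
print(round(fip(hr=5, bb=20, hbp=2, k=70, ip=60.0), 2))
```

Because only homers, walks, hit batters and strikeouts enter the formula, it is the kind of number that can "support" an ERA that might otherwise owe something to defense or luck.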

Jose Valverde

Valverde has a good pedigree of closing games for the Diamondbacks and then the Astros. He should take the Tigers' closing role, as they had three flame throwers, Ryan Perry, Daniel Schlereth and Joel Zumaya, who can rack up strikeouts but give up too many walks.

Valverde is a little bit better than Dotel. He strikes out just as many batters but is a little better at limiting walks and gets a few more grounders, though he is still predominantly a fly-ball pitcher.

Valverde brings the heat with a 96-mph fastball, but mixes in a splitter, which he throws about a quarter of the time. The splitter is a very good pitch. He throws it slightly more to lefties, and the pitch, like a changeup, has a very small platoon split. In fact, over the past three years -- before that he did not throw it as often -- he has had small to negative platoon splits.

Also, while his fastball is an extreme fly-ball pitch, getting just 31% of balls in play on the ground, the splitter, which 'sinks' in comparison to his fastball and is thrown lower in the zone, gets 57% ground balls per ball in play. So the pitch keeps him from being as extreme a fly-ball pitcher as Dotel.

Valverde is also a very good relief pitcher; he solidifies the back end of the Tigers' bullpen and should be a good closer. Still, some found the price, a two-year, $14 million deal plus a draft pick, a little high.

F/X Visualizations, January 08, 2010
Looking at Some BBWAA Vote Trajectories
By Dave Allen

First off, congratulations to Andre Dawson on his election to the Hall of Fame.

In this post I want to look at some of the other players on the ballot and see what we can say about their possible vote trajectories based on historic comparables. But just to be clear from the outset, these are not predictions, as the sample sizes are quite small.

In these plots I show how the vote share of each player changed over his subsequent ballots. Along the x-axis is the number of times on the BBWAA ballot, and on the y-axis the proportion of votes he got on that ballot. Circles indicate when a player reached the 75% level. I do not indicate players elected by any manner other than the BBWAA. For each graph I highlight a group of comparable players in red.

First off let's look at Roberto Alomar. He got 73.7%; that is the closest a first-year player has come to 75%. So I compared him to all players who received less than 75% but greater than 60%.
alomar.png
These players were all elected relatively quickly, with most elected the next year. Phil Niekro lost some support in his second year and had to wait the longest, five years, before induction.

Barry Larkin was next among first-year players. He got 51.6%, and I looked at players who received within 5 percentage points of that total.
larkin.png
These players were also all eventually inducted. Cy Young, the only player in this group to get in on the second try, was not in the Hall's first class, but his vote total shot up the next year and he was elected. I don't think this is the best comparison for Larkin; he will probably have to wait more than one more year. In this group Tony Perez took the longest, as his votes meandered upward for a number of years before he was elected in 2000 on his ninth ballot. But history looks good for Larkin.
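Choosing comparables this way amounts to a simple window filter on first-ballot vote share. The Python sketch below assumes a dictionary of first-ballot shares; the player names and numbers in it are hypothetical placeholders, not the historical data behind these plots. A one-sided range, like the 60% to 75% used for Alomar, is the same filter with different bounds.

```python
def comparables(first_ballot, lo, hi):
    """Names whose first-ballot share s satisfies lo <= s < hi, sorted."""
    return sorted(name for name, s in first_ballot.items() if lo <= s < hi)


# Hypothetical first-ballot shares (0-1 scale), for illustration only.
first_ballot = {
    "Player A": 0.49, "Player B": 0.58, "Player C": 0.55,
    "Player D": 0.30, "Player E": 0.70,
}

larkin_like = comparables(first_ballot, 0.516 - 0.05, 0.516 + 0.05)  # within 5 points of 51.6%
alomar_like = comparables(first_ballot, 0.60, 0.75)                  # between 60% and 75%
print(larkin_like, alomar_like)
```

The narrower 2.5-point windows used for Martinez and McGriff are the same call with 0.025 in place of 0.05.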

Next among first-year players is Edgar Martinez, who had 36.2%. There are a lot more players in this range, so I just looked at those within 2.5 percentage points.
martinez.png
Four of these players shot up quickly and were elected before their tenth time on the ballot. The other three never reached 75% -- although Jim Bunning came painfully close on his 12th time on the ballot -- but all were, ultimately, inducted by the Veterans Committee.

The only other first-year player to reach 5% was Fred McGriff. I highlight others within 2.5 percentage points of his 21.5%.
mcgriff.png
This presents a more muddled picture. Three players reached 75%, in as few as six ballots or as many as 13. Roger Bresnahan was elected by the Old Timers Committee, a precursor to the Veterans Committee. Red Schoendienst was elected by the Veterans Committee. Three of McGriff's comparables are still on the ballot, and the last four didn't make it in.

Finally I am going to turn my attention to the two saber-darlings on the ballot: Tim Raines and Bert Blyleven. For these two we have more data than just their first year vote total so I am going to construct their comparables differently.

Blyleven, as Rich covered yesterday, came very close, falling just five votes shy. Here I highlight all other players who received over 70% but less than 75% on a ballot late in the process (tenth ballot or after).
blyleven.png
Going by initial vote, starting from the highest: Bunning's first total was just under 40%, and he -- as noted above -- reached 74.2%, but that was his high-water mark and he was ultimately inducted by the VC. Next is Jim Rice, who got 74.2% in his 14th year on the ballot and was elected on his 15th ballot last year. Duke Snider started with a total very close to Blyleven's and rode steady growth to the fastest induction among this group. Bill Terry started at just 4% -- this was before players who did not reach 5% were dropped -- and his vote share increased steadily until he was elected in his 14th year. Finally, Red Ruffing, who also started below 5%, increased steadily and in his 14th year, 1967, received 72.6%. At the time, if no player was elected, the BBWAA would hold a runoff and the top vote getter would be inducted. Ruffing was elected in the 1967 runoff.

Tim Raines has been on the ballot for three years with the following totals: 24.3%, 22.6% and 30.4%. For his group I chose players who got between 15 and 35% in each of their first three years.
raines.png
This is a mixed bag. Jimmy Collins and Bresnahan were inducted by the Old Timers Committee, and Schoendienst by the VC. Two guys reached 75%, four are still on the ballot and the other four didn't make it. We will see how it goes for Raines.
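For Raines, the filter runs over a vote history rather than a single ballot. Here is a minimal Python sketch, assuming each player maps to a list of vote shares in ballot order; only Raines's three totals come from this post, and the other entries are hypothetical.

```python
def steady_comparables(histories, lo=0.15, hi=0.35, years=3):
    """Players whose share was within [lo, hi] on each of their first `years` ballots."""
    return sorted(
        name for name, shares in histories.items()
        if len(shares) >= years and all(lo <= s <= hi for s in shares[:years])
    )


histories = {
    "Tim Raines": [0.243, 0.226, 0.304],   # from the post
    "Player F": [0.20, 0.40, 0.45],        # hypothetical: leaves the band in year two
    "Player G": [0.18, 0.22, 0.31, 0.50],  # hypothetical: qualifies, later years ignored
    "Player H": [0.30, 0.33],              # hypothetical: only two ballots so far
}
print(steady_comparables(histories))
```

Requiring at least three ballots keeps players with shorter histories out of the group rather than letting them qualify on fewer years.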

Again, given the small sample sizes, these are in no way predictions, but an attempt to put these players' vote totals in some historical perspective.

F/X Visualizations, December 18, 2009
The Pitchers of the Next 'Big Trade'
By Dave Allen

Last week I looked at the pitchers involved in the Winter Meetings' 'big trade,' and then this week an even bigger trade went down. The Blue Jays sent Roy Halladay to the Phillies for a package of three prospects. The Phillies then turned around and sent Cliff Lee to the Mariners for a slightly lesser group of three prospects. By now there has been extensive analysis of the trade, and the emerging consensus is that the Blue Jays needed to trade Halladay before he became a free agent, after failing to do so last season; the Phillies took a slight hit to their farm system for an upgraded ace willing to sign a long-term deal rather than test the free agent waters; and the Mariners, looking to compete for the AL West title in 2010, picked up one of the game's best pitchers.

As I did last week, I am going to take a pitchf/x look at the major league pitchers in the deal. They are two of the best pitchers in baseball. Over the past two years they are two of just seven starters to post an ERA below three. They did so throwing 482 (Halladay) and 455 (Lee) innings; only CC Sabathia has thrown more over that period. They rank one and two in lowest BB/9 and one and three in highest K/BB ratio over that period. Halladay adds the third leg to the stool by also inducing over 50% GB per BIP, which makes him a little bit better than Lee. Still, these are two of the best pitchers in the game. Additionally, by limiting walks they are able to go deep in games, which helps their teams by reducing bullpen strain.

Roy Halladay

I have written two articles at FanGraphs looking at Halladay's pitchf/x numbers. The first broke down his pitches to RHBs and LHBs. It showed that he has a very even pitch distribution, throwing each of three pitches -- two-seam fastball, cutter and curve -- often to both LHBs and RHBs.

+-------------------+------+------+
|                   | vRHB | vLHB |
+-------------------+------+------+
| Two-Seam Fastball | 0.34 | 0.31 |
| Cutter            | 0.39 | 0.43 |
| Curveball         | 0.26 | 0.20 |
| Changeup          | 0.01 | 0.06 |
+-------------------+------+------+

Batters cannot go up to the plate expecting one specific pitch more than 60% of the time, as they can against some pitchers.

The second post showed that he uses his cutter and two-seam fastball to give him a pitch to go inside and outside against both LHBs and RHBs. His two-seam fastball is used inside against RHBs and outside against LHBs, and his cutter is the opposite. This allows him to avoid the middle of the plate while varying the location of his pitches -- inside and outside -- to both RHBs and LHBs.

This helps explain his strikeouts and walks, but whence the grounders? The obvious place to look is pitch height. Here Halladay's pitches are in red and the average in gray.

hallady_pheight_1218.png

His cutter is much lower in the zone than the average cutter, probably leading to his great groundball rate. His two-seam fastball is not much lower than average, but it is much more often in the zone, a further reason for his low walk rate.

Cliff Lee

Over the past two years -- since Lee really emerged as a dominant pitcher -- no pitcher has had a more successful fastball, which is a surprising fact. Part of this is that no one is better than Lee at getting his pitches, and his fastball particularly, in the zone.

To look at this in a spatially explicit manner, I broke the strike zone and the area around it into a number of bins. I calculated the frequency of Lee's fastballs in each bin and then compared that to the frequency for the average lefty's fastball. Bins in which Lee had a higher frequency than the average lefty are red, and those with a lower frequency are blue; the intensity of the color indicates the size of the difference. As always, the images are from the catcher's perspective, so RHBs stand at -2 and LHBs at 2.

lee_loc_1218.png

Against RHBs, Lee locates his pitches up and away more than average, and as a whole more in the zone and just out of the zone than average. This is as expected. Against LHBs the pattern is even more extreme: in every strike zone bin he has a higher frequency than average, and his frequency is much lower in the bins farthest from the zone.
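A minimal NumPy sketch of this kind of binned comparison follows. It assumes arrays of horizontal location (`px`) and height (`pz`) in feet from the catcher's view; the bin edges and the synthetic pitch locations are hypothetical, and the original charts may have used different bins or smoothing. Pitches outside the bin edges are dropped before normalizing.

```python
import numpy as np

# Hypothetical bins in feet, catcher's view: px across the plate, pz height.
X_EDGES = np.linspace(-2.0, 2.0, 9)
Z_EDGES = np.linspace(0.5, 4.5, 9)


def location_freq(px, pz):
    """Fraction of (in-range) pitches that fall in each px/pz bin."""
    counts, _, _ = np.histogram2d(px, pz, bins=[X_EDGES, Z_EDGES])
    total = counts.sum()
    return counts / total if total > 0 else counts


def freq_difference(px, pz, avg_px, avg_pz):
    """Pitcher minus average frequency: positive bins would be red, negative blue."""
    return location_freq(px, pz) - location_freq(avg_px, avg_pz)


# Synthetic locations standing in for real pitchf/x data.
rng = np.random.default_rng(0)
avg_px, avg_pz = rng.normal(0.0, 0.8, 5000), rng.normal(2.5, 0.8, 5000)
pit_px, pit_pz = rng.normal(0.0, 0.6, 1000), rng.normal(2.5, 0.6, 1000)  # tighter to the zone
diff = freq_difference(pit_px, pit_pz, avg_px, avg_pz)
print(np.round(diff, 3))
```

Coloring `diff` with a diverging colormap (red for positive, blue for negative) would reproduce the style of these charts; plotting is left out to keep the sketch to NumPy alone.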

Overall it was another exciting week. The Blue Jays cashed in on Halladay to continue their rebuilding process. The Phillies got one of the best pitchers in the game, whose grounders should play very well in their small park, and locked him up for years. And the Mariners picked up a great pitcher who, as a lefty, mitigates opposing LHBs' advantage at Safeco as they look to compete in 2010.

F/X Visualizations, December 11, 2009
The Pitchers of the 'Big Trade'
By Dave Allen

In terms of excitement the Winter Meetings were underwhelming, particularly compared to their intense coverage. But, for three teams there was excitement in spades. As you surely know the Tigers, Diamondbacks and Yankees pulled off a big trade. Here I will give a pitchf/x-based look at some of the pitchers in the trade as an introduction to their new fans.

Edwin Jackson

Jackson had a breakout year in 2009. For the first time he got his BB/9 below three, and also for the first time the value was below league average. He was probably the beneficiary of some BABIP-based luck, but he was still a very good pitcher.

He is, for the most part, a two-pitch pitcher.

+-------------+-----+-----+
| Pitch Type  | RHB | LHB |
+-------------+-----+-----+
| Fastball    | 60% | 67% |
| Slider      | 37% | 20% |
| Curve       |  2% |  4% |
| Change      |  1% |  9% |
+-------------+-----+-----+
Righties see the slider or fastball 97% of the time, and lefties 87% of the time. That is what you can do if you throw your fastball in the mid-90s and have a devastating slider.

It looks to me like the big step forward for Jackson was the out-of-zone swing rate on his fastballs. In 2007 the rate was 21%, then 25% in 2008 and 28% in 2009. Swings at out-of-zone pitches turn balls into strikes or weak contact. Jackson's in-zone percentage did not change much this year, so I think the decrease in walks came from batters swinging at his out-of-zone fastballs at a greater rate. It would take a little more digging to see exactly why they did that.
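Out-of-zone swing rate is simply swings divided by pitches outside the zone. This Python sketch assumes each pitch already carries hypothetical `in_zone` and `swing` flags; how the zone is defined (for example from the pitchf/x location and the batter's strike zone top and bottom) is left as an assumption.

```python
def o_swing_rate(pitches):
    """Share of out-of-zone pitches the batter swung at (None if there were none)."""
    out_of_zone = [p for p in pitches if not p["in_zone"]]
    if not out_of_zone:
        return None
    return sum(p["swing"] for p in out_of_zone) / len(out_of_zone)


# Hypothetical pitches; in practice in_zone would come from pitchf/x location.
pitches = [
    {"in_zone": False, "swing": True},
    {"in_zone": False, "swing": False},
    {"in_zone": False, "swing": False},
    {"in_zone": False, "swing": False},
    {"in_zone": True, "swing": True},   # ignored: inside the zone
]
print(o_swing_rate(pitches))
```

Running the same function on each season's fastballs alone would give the year-over-year comparison described above.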

Ian Kennedy

In his 60 MLB innings Kennedy has not lived up to his incredible minor league numbers; Jeff Sackmann's Minor League Splits gives him a major league equivalent FIP of 3.83 based on his minor league career. The refrain is that his meager stuff can get the job done in the minors, but will not translate directly to the Bigs. But just 60 innings is not enough to make such a designation and, anyway, the Diamondbacks would be happy with a lot worse than a 3.83 FIP.

Kennedy throws a fastball that averages just south of 90 mph, a slider, a curve and a change that is about 10 mph slower than his fastball. In limited time in the majors he did a good job of keeping his fastball away from LHBs and his change down and away.

ken_lhb_1211.png

In Arizona he should get a solid shot to establish himself as a starter on a longer leash than when he was in New York.

Max Scherzer

Scherzer is an exciting pitcher, striking out over a batter an inning while walking just 3.34 per nine. At 25 he is one of the game's top young pitchers. The consensus is that Arizona was concerned about his long-term health and wanted to cash in on him while he is still healthy.

He throws three pitches.

+-------------+-----+-----+
| Pitch Type  | RHB | LHB |
+-------------+-----+-----+
| Fastball    | 70% | 72% |
| Slider      | 20% |  7% |
| Change      | 10% | 21% |
+-------------+-----+-----+

Scherzer's fastball works in the mid-90s. His secondary pitch is a slider to RHBs and a change to LHBs. What makes Scherzer an exciting and potentially elite pitcher is his ability to miss bats, as evidenced by his strikeout per inning and also by his bottom-15 contact rate (in other words, top-15 whiff rate). The extra whiffs come courtesy of his excellent fastball.

Whiff Rate (whiffs per swing)
+-------------+----------+---------+
| Pitch Type  | Scherzer | Average |
+-------------+----------+---------+
| Fastball    |      20% |     14% |
| Slider      |      26% |     27% |
| Change      |      26% |     29% |
+-------------+----------+---------+

You can see that the only pitch on which Scherzer is better than average is his fastball. But because most pitchers, Scherzer included, throw mostly fastballs, having a fastball that is far above average is going to lead to tons of strikeouts.
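A rough way to see why the fastball carries the load is to weight each pitch's whiff rate by how often it is thrown, using the two tables above (versus RHBs). This Python sketch is an approximation, not the method behind the contact-rate ranking: whiff rate is per swing, and batters swing at different rates on different pitches, so the blend is only indicative.

```python
# Pitch mix vs. RHBs and whiff rates (whiffs per swing), from the tables above.
MIX_VS_RHB = {"Fastball": 0.70, "Slider": 0.20, "Change": 0.10}
WHIFF_SCHERZER = {"Fastball": 0.20, "Slider": 0.26, "Change": 0.26}
WHIFF_AVERAGE = {"Fastball": 0.14, "Slider": 0.27, "Change": 0.29}


def blended_whiff(mix, whiff):
    """Pitch-share-weighted whiff rate; rough, since swing rates differ by pitch."""
    return sum(mix[p] * whiff[p] for p in mix)


def contribution_by_pitch(mix, whiff, baseline):
    """How much each pitch adds to the blended rate relative to the baseline."""
    return {p: mix[p] * (whiff[p] - baseline[p]) for p in mix}


scherzer = blended_whiff(MIX_VS_RHB, WHIFF_SCHERZER)
average = blended_whiff(MIX_VS_RHB, WHIFF_AVERAGE)
edge = contribution_by_pitch(MIX_VS_RHB, WHIFF_SCHERZER, WHIFF_AVERAGE)
print(f"{scherzer:.3f} vs {average:.3f}", {p: round(v, 3) for p, v in edge.items()})
```

The blend comes to about 21.8% for Scherzer against about 18.1% for an average pitcher with his mix, and the fastball term alone (0.70 times 6 points, or 4.2 points) is larger than the whole difference, since his slider and change are slightly below average.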

Daniel Schlereth

Schlereth is an electric reliever: over the course of his minor league career he averaged 12.8 K/9, but also 4.9 BB/9. He joined the Diamondbacks' pen partway through the season and pitched about as one would expect, with 22 Ks and 15 BBs in just 18 innings. If he can cut down on the walks while keeping the big strikeouts, he will be an elite reliever.

The most interesting thing about Schlereth's usage so far, and be warned this is based on just 18 innings, is that he throws curveballs over 40% of the time. No full-time reliever threw that many curves in 2009. The curve is nasty, with a 40% whiff rate. It will be interesting to see his pitch usage over a full year coming out of Detroit's bullpen.

Both Detroit and Arizona have two very interesting new pitchers to follow next year. In addition we recently heard that Detroit might try Phil Coke as a starter, which is another intriguing aspect of the trade.

F/X Visualizations, December 04, 2009
Pitchf/xing Passed Balls and Wild Pitches: Part Two
By Dave Allen

Two weeks ago I introduced the idea of evaluating catchers' ability to prevent wild pitches and passed balls using the pitchf/x data. In that post I presented the idea and some preliminary findings.

Here I will present that evaluation. I constructed a model which gives the probability that a pitch gets past the catcher based on the pitch type, its location and the handedness of the batter and pitcher.

Before presenting how the catchers ranked under this model I will address some questions posed by commenters. First MGL:

Obviously most WP are pitches thrown in the dirt (I assume), and almost no PB are pitches in the dirt. That is important. Also, a fastball in the dirt is extremely difficult to catch. A slider is somewhat difficult and a curve ball is not all that difficult

The pitchf/x system gives pz, the height of a pitch as it crosses the plate. Negative values are possible: those pitches hit the ground before they got to the plate, and if they could have kept going down they would have crossed somewhere below it. Other pitches that are very low, but positive, when they cross the plate will end up in the dirt. Someone less lazy than me could go back and roughly calculate whether a pitch will have a negative height before it reaches the catcher. I did not do this, but just looked at the reported height as it crosses the plate. Anyway, here is how PB+WP% varies by height for sliders, curves and fastballs.

height_pb.png

It looks like MGL is correct: low fastballs are much more likely to get by the catcher than low sliders or curves.
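For anyone less lazy, here is a rough Python sketch of the calculation described above. It assumes the constant-acceleration trajectory fit that pitchf/x reports (initial position, velocity and acceleration components such as `y0`, `z0`, `vy0`, `vz0`, `ay`, `az`), with `y` measured toward the mound and pz reported where the pitch crosses the front of the plate, about 17/12 ft out, as I understand the coordinate system. The catcher's glove depth of 2 ft behind the plate and the pitch parameters are hypothetical values for illustration. Only height is projected, and the bounce is ignored, so a negative result just means the pitch would have hit the dirt first.

```python
import math

PLATE_FRONT_Y = 17 / 12  # ft; plane where pz is reported (assumed, see lead-in)
CATCHER_Y = -2.0         # hypothetical glove depth behind the plate, ft


def time_to_y(y0, vy0, ay, y_target):
    """Time (s) for the pitch to travel from y0 to y_target under constant acceleration."""
    # Solve 0.5*ay*t^2 + vy0*t + (y0 - y_target) = 0, with vy0 < 0 (toward the plate).
    c = y0 - y_target
    if abs(ay) < 1e-12:
        return -c / vy0
    disc = vy0 ** 2 - 2 * ay * c
    if disc < 0:
        raise ValueError("pitch never reaches y_target")
    return (-vy0 - math.sqrt(disc)) / ay  # the first (smallest positive) crossing


def height_at(y_target, y0, z0, vy0, vz0, ay, az):
    """Projected height (ft) of the pitch when it reaches y_target."""
    t = time_to_y(y0, vy0, ay, y_target)
    return z0 + vz0 * t + 0.5 * az * t ** 2


# Hypothetical low pitch (ft, ft/s, ft/s^2), loosely in the pitchf/x style.
pitch = dict(y0=50.0, z0=5.8, vy0=-125.0, vz0=-8.0, ay=25.0, az=-28.0)
plate_h = height_at(PLATE_FRONT_Y, **pitch)
catcher_h = height_at(CATCHER_Y, **pitch)
print(f"pz at plate {plate_h:.2f} ft, height at catcher {catcher_h:.2f} ft")
```

For this made-up pitch the reported pz would be slightly positive, yet the projected height at the catcher is below zero, which is exactly the "very low, but positive" case that ends up in the dirt.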

Now Guy:

Dave: You mention that catchers have more trouble with inside pitches. While that could be the presence of the hitter, it might also be that catchers have more trouble with balls on the glove side of their body. What does this pattern look like for RHP vs. LHB? And with LHPs?

Another great question. In my post I showed just the RHB/RHP image and inferred that inside pitches were harder because of the batter, but without looking at the other handedness combinations, the pattern could have had other causes.

Here is the rate by horizontal location. RHBs are in black, LHBs in gray. RHPs are solid and LHPs dotted.

width_pb.png

First off, since the black lines both increase sharply toward the left of the graph and the gray lines toward the right, inside pitches do in fact get by the catcher at the highest rates regardless of the handedness of the pitcher or batter. For some reason, outside pitches get by the catcher more often in same-handed at-bats than in opposite-handed ones. [On the left side the dotted gray line (LHB/LHP) is above the solid gray (LHB/RHP), and on the right side the solid black line (RHB/RHP) is above the dotted black (RHB/LHP).]

OK, now for the catcher evaluations. I went through each pitch a catcher saw with men on base and, based on its location and pitch type, gave it a probability that the average catcher lets it by. First off, there is considerable variation in the expected number of passed balls/wild pitches a given catcher sees. Over the course of the pitchf/x era (part of 2007 and all of 2008 and 2009), Gregg Zaun saw the toughest pitches, with an expected 10.2 getting by him for every 1000 pitches with men on base. On the other hand, Jason Varitek saw the easiest pitches: the average catcher would let only 7.1 of them by per 1000 pitches with men on. So the model does project some variation.

It turns out that both these catchers do a good job. Here are the leaders and laggards in difference between expected and actual WP+PBs in the pitchf/x era. Each one is worth 0.28 runs, so over about two and a half years the best catcher is only about one win over average and the worst only one win below average.

+--------------------+------------------+
| Catcher            | WP+PB - expected |
+--------------------+------------------+
| Zaun, Gregg        |            -32.1 |
| Suzuki, Kurt       |            -32.1 |
| Ruiz, Carlos       |            -30.2 |
| Molina, Yadier     |            -26.7 |
| McCann, Brian      |            -24.5 |
| Varitek, Jason     |            -23.7 |
| Coste, Chris       |            -21.1 |
| Quintero, Humberto |            -18.1 |
| Barajas, Rod       |            -14.8 |
| Torrealba, Yorvit  |            -14.7 |
+--------------------+------------------+
| Iannetta, Chris    |             11.4 |
| Montero, Miguel    |             11.7 |
| Doumit, Ryan       |             14.4 |
| Snyder, Chris      |             14.7 |
| Burke, Jamie       |             15.4 |
| Navarro, Dioner    |             15.7 |
| Molina, Jose       |             16.5 |
| Shoppach, Kelly    |             16.7 |
| Olivo, Miguel      |             29.9 |
| Molina, Bengie     |             30.2 |
| Posada, Jorge      |             36.4 |
+--------------------+------------------+
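The run and win figures quoted above follow from simple arithmetic on this table. This Python sketch uses the post's 0.28 runs per WP/PB and assumes the common rule of thumb of about 10 runs per win; the per-pitch probabilities in the example are hypothetical stand-ins for the model's output.

```python
RUNS_PER_EVENT = 0.28  # run value of one WP/PB, as stated in the post
RUNS_PER_WIN = 10.0    # common rule of thumb; an assumption here


def events_vs_expected(pass_probs, actual_events):
    """Actual WP+PB minus the model's expected total (negative is better)."""
    return actual_events - sum(pass_probs)


def wins_cost(events_minus_expected):
    """Wins a catcher costs his team relative to average (negative = wins saved)."""
    return events_minus_expected * RUNS_PER_EVENT / RUNS_PER_WIN


# Hypothetical: 1000 pitches with men on, each modeled at a 0.8% chance of getting by.
print(events_vs_expected([0.008] * 1000, actual_events=5))
print(round(wins_cost(36.4), 2), round(wins_cost(-32.1), 2))  # Posada, Zaun
```

Posada's +36.4 works out to about 10.2 runs, or roughly one win, and Zaun's -32.1 to about 9 runs saved, matching the "about one win" range described above.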
F/X Visualizations, November 20, 2009
A Pitchf/x Look at Passed Balls and Wild Pitches
By Dave Allen

Catcher defense is one of the more enigmatic areas of baseball study. It has developed relatively independently of other position player defensive analysis. This is probably because, although catchers field some ground balls and pop ups, their main defensive contribution is very different from that of all other position players. This contribution is mostly in preventing stolen bases, passed balls and wild pitches.

The difference in ability to do those things, as well as not make fielding and throwing errors, resulted in a range of 13 runs above average (Gerald Laird) to ten runs below (Mike Napoli) in 2009 by devil_fingers' calculation. This is about the same range of catcher performance that Brian Cartwright predicted before the 2009 season: about one extra win picked up by the best defensive catcher, and about one win given up by the worst.

These analyses are based on Tangotiger's WOWY method. He calculates each pitcher's rate of PBs and WPs and then predicts how many PBs and WPs a specific catcher should expect to have based on how many PAs he has with each pitcher. The difference between these predictions and the actual number he gave up is a measure of his ability to prevent PBs and WPs. David Gassko takes a similar approach but uses pitching staff numbers (strikeouts, earned runs and hit batsmen), which predict PBs and WPs quite well, and then finds the difference between expected, based on these numbers, and actual for each catcher.
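A minimal Python sketch of the with-or-without-you idea follows. It is my reading of the approach, not Tangotiger's code: each pitcher's PB+WP rate per PA is taken from his PAs with other catchers (the "without you" part), and the catcher's expected total sums that rate over his PAs with each pitcher. The records are hypothetical.

```python
from collections import defaultdict


def wowy(records, catcher):
    """Return (actual, expected) PB+WP for `catcher`.

    records: (pitcher, catcher, plate_appearances, pb_plus_wp) tuples.
    Each pitcher's rate comes from his PAs with other catchers.
    """
    events_without = defaultdict(float)
    pas_without = defaultdict(float)
    for pitcher, c, n_pa, n_events in records:
        if c != catcher:
            events_without[pitcher] += n_events
            pas_without[pitcher] += n_pa

    actual = expected = 0.0
    for pitcher, c, n_pa, n_events in records:
        # Pitchers who never threw to anyone else give no baseline, so skip them.
        if c == catcher and pas_without[pitcher] > 0:
            expected += n_pa * events_without[pitcher] / pas_without[pitcher]
            actual += n_events
    return actual, expected


# Hypothetical records, not real data.
records = [
    ("P1", "C1", 400, 4), ("P1", "C2", 100, 3),
    ("P2", "C1", 200, 1), ("P2", "C2", 200, 2),
]
actual, expected = wowy(records, "C1")
print(actual, expected, actual - expected)
```

Here C1 allows 5 against an expected 14, so he looks well above average; the simpler variant described in the text would pool each pitcher's rate over all of his catchers, including the one being evaluated.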

With the availability of the pitchf/x data we can take the same idea, but on a per pitch basis. By examining the pitchf/x characteristics of each pitch we can create a model which predicts how often the average catcher lets a pitch pass (as a PB or WP). From there we can predict the number of PBs and WPs that the average catcher gives up if he saw the pitches seen by a given catcher, and then how many more or fewer PBs and WPs that catcher gave up.

One limitation here, which has been discussed before, is that we do not know where the pitch was supposed to go. Maybe a catcher called for the pitch on the outside and it ended up on the inside edge, a place where most catchers do not give up a PB, but since he was expecting it elsewhere it gets by. A model based on pitch location alone would penalize the catcher in such a scenario.

In this post I will briefly summarize some findings concerning pitchf/x and PBs and WPs, and then present a full model and catcher evaluation in a future piece.

The first thing we can look at is the difference between a passed ball and a wild pitch, which is obviously a subjective decision of the scorer. Here I plot the frequency distribution for the distance from the center of the plate for all pitches, passed balls and wild pitches.

dist.png

You can see that passed balls are a little farther from the center of the plate than the average pitch, but that wild pitches are drastically so. Thus scorers are calling pitches far out of the zone wild pitches and those that look more like a normal pitch passed balls, but there is considerable overlap.
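A sketch of how such a frequency distribution could be built, assuming PITCHf/x-style coordinates in feet (`px` horizontal from the center of the plate, `pz` height above the ground). The post does not say which vertical point it treats as the center, so `Z_CENTER_FT = 2.5` is a placeholder assumption, as is the quarter-foot bin width.

```python
import math
from collections import Counter

Z_CENTER_FT = 2.5  # placeholder vertical center of the zone (assumption)
BIN_FT = 0.25      # histogram bin width in feet (assumption)


def dist_from_center(px, pz, z_center=Z_CENTER_FT):
    """Straight-line distance in feet from the assumed center of the zone."""
    return math.hypot(px, pz - z_center)


def freq_distribution(locations, bin_ft=BIN_FT):
    """Share of pitches in each distance bin, keyed by the bin's lower edge."""
    counts = Counter(int(dist_from_center(px, pz) // bin_ft) for px, pz in locations)
    total = sum(counts.values())
    return {b * bin_ft: n / total for b, n in sorted(counts.items())}
```

Running it separately on all pitches, passed balls and wild pitches gives three normalized curves that can be overlaid, since each sums to one regardless of sample size.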

Next we can look at the probability that a pitch gets by the catcher, as either a passed ball or a wild pitch, based on its location. I think this is going to depend on the handedness of the batter and pitcher, so here I show the graph for RHB v RHP. The image is from the catcher's perspective, so the batter stands at roughly -2.

heatmap_passed.png

There is a strong directionality: inside pitches are more likely to get by the catcher than outside ones. This could have to do with the batter being in the way, making inside pitches harder to see, or it could be the pitch location versus expected location issue I talked about above. Also, catchers miss balls in the dirt more often than they miss high pitches.

Finally we can see the wild pitch/passed ball rate on each pitch type. This is the rate of these occurrences per non-contacted pitch of each type.

+-------------+-------+
| Pitch Type  |  Rate |
+-------------+-------+
| Fastball    | 0.24% |
| Changeup    | 0.49% |
| Curve       | 0.60% |
| Slider      | 0.73% |
| Knuckleball | 1.37% |
+-------------+-------+

Again the results here are not very surprising. Fastballs have a very small rate, while knuckleballs are off the charts. There is definitely an interaction here between pitch type and pitch location, since fastballs are less likely to be far out of the zone than a curve or knuckleball. In addition, it would be interesting to see how the spin deflection and break of a pitch affect it. In the next post I will combine all of these into a larger model to predict passed ball and wild pitch rates, and then use that model to evaluate catchers.
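The rates in the table are PBs plus WPs divided by non-contacted pitches of each type, presumably because a pitch that is fouled off or put in play cannot get by the catcher. A sketch of that calculation, assuming hypothetical `(pitch_type, contacted, passed)` records:

```python
from collections import defaultdict


def pass_rate_by_type(pitches):
    """pitches: iterable of (pitch_type, contacted, passed) tuples.

    Returns {pitch_type: (PB + WP) / non-contacted pitches}.
    """
    eligible = defaultdict(int)
    passed = defaultdict(int)
    for ptype, contacted, was_passed in pitches:
        if contacted:
            continue  # fouls and balls in play are excluded from the denominator
        eligible[ptype] += 1
        passed[ptype] += int(was_passed)
    return {t: passed[t] / eligible[t] for t in eligible}
```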

F/X VisualizationsNovember 06, 2009
The Best Pitch of 2009
By Dave Allen

Everyone loves end-of-the-season superlatives, so I thought I would join the fun and present 2009's best pitch. First let me say that this is a shameless rip-off of John Walsh's original 'Searching for the game's best pitch', in which he looked at the 2007 season's best. I use Walsh's metric, which values each pitch by the change in run expectancy from before to after the pitch. Walsh came up with the idea and describes it in that article; it is the way I have always done it and the way FanGraphs values pitches. The caveat is that it is not stripped of the influence of ballpark, defense or luck. Harry Pavlidis has addressed that with his Expected Run Value, but here I am sticking with the original.
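A minimal sketch of valuing a pitch by the change in run expectancy: run expectancy after the pitch, plus any runs that score on it, minus run expectancy before it, so negative values favor the pitcher. The state encoding `(outs, runners, balls, strikes)` and every number in the table are placeholders for illustration, not a real run-expectancy table.

```python
# Placeholder run-expectancy table keyed by a hypothetical state encoding
# (outs, runners, balls, strikes). Values are illustrative only.
RUN_EXPECTANCY = {
    (0, "---", 0, 0): 0.50,
    (0, "---", 0, 1): 0.45,
    (2, "---", 0, 2): 0.08,
    "INNING_OVER": 0.0,  # no more runs can score once the third out is made
}


def pitch_run_value(state_before, state_after, runs_scored=0):
    """Change in run expectancy caused by one pitch (negative favors the pitcher)."""
    return RUN_EXPECTANCY[state_after] + runs_scored - RUN_EXPECTANCY[state_before]


# A called first strike with nobody on and nobody out:
print(pitch_run_value((0, "---", 0, 0), (0, "---", 0, 1)))
```

Summing these values over every pitch of a given type gives that pitch's total run value.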

Another way to look at and value pitches is the one Chris Moore and Jeremy Greenhouse have taken here at Baseball Analysts, looking at the process rather than the results. They value fastballs by their expected value based on movement, speed and location. But, again, I am going to go 'old fashioned' and just use the change in run expectancy.

A second thing to note is that the owner of the best pitch of 2009 was in the news yesterday for something other than having the best pitch in baseball. I didn't notice until reading about it over at Shysterball this morning. The timing is purely coincidental; my only hope is that this news will provide the pitcher a degree of solace if he is feeling down about the recent events.

Anyway, the best pitch of 2009 is Tim Lincecum's changeup. By FanGraphs' reckoning it reduced the run expectancy of the Giants' opponents by 35 runs; no other single pitch was above 30 runs. On a rate basis (per pitch) it is the best of any pitch thrown over 200 times. I get similar results (different numbers, but Lincecum's change still comes out on top) when I run it with my own pitch classifications and run values (FanGraphs goes with the BIS pitch identifications).
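On a rate basis the comparison is run value per pitch, usually scaled to 100 pitches, restricted to pitches thrown more than 200 times. A sketch, assuming a hypothetical mapping from `(pitcher, pitch_type)` to `(total run value, pitches thrown)`; the names and numbers in the check are invented, not Lincecum's.

```python
MIN_PITCHES = 200  # "thrown over 200 times"


def best_by_rate(totals, min_pitches=MIN_PITCHES):
    """totals: {(pitcher, pitch_type): (run_value, count)}.

    Returns (key, run value per 100 pitches) pairs sorted best-first
    (more negative is better for the pitcher), ignoring pitches thrown
    `min_pitches` times or fewer.
    """
    rates = {
        key: 100 * rv / n
        for key, (rv, n) in totals.items()
        if n > min_pitches
    }
    return sorted(rates.items(), key=lambda kv: kv[1])
```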

With each year since his debut Lincecum has thrown fewer fastballs, thrown them slower and thrown more changeups. It looks like he is really getting more comfortable throwing other pitches and taking a little bit off his fastball. In 2009 he threw the changeup 13% of the time to RHBs and 26% of the time to LHBs. It is nine mph slower than his fastball.

Here are Lincecum's pitches based on their spin deflection (Mike Fast has told me this is a better term for pfx_x and pfx_z than horizontal movement and vertical movement).
pt_move_116.png
This spin deflection is more like that of a splitter than a normal changeup. Most pitchers' changeups 'drop' more than their fastballs, but also tail in more to same-handed batters (have more horizontal spin deflection). I think that is the movement of a circle change. Lincecum, I think, uses a split-finger grip for his change, and the result is that its horizontal spin deflection is similar to his fastball's and the difference in deflection is only in the vertical component. It is interesting to note that Josh Kalk found that a splitter performs better after a fastball than a changeup performs after a fastball.

Here is how that looks in regards to where the pitch ends up.
lince_rhb_116.png
lince_lhb_116.png
Just as the spin deflection would suggest, the pitch ends up down in the zone or below it compared to his fastball.

A big reason for the pitch's success is its whiff rate, that is, the percentage of the time that a batter misses it when he swings at it. Lincecum's changeup has a whiff rate of 43%, while the average changeup's is just 28%. The rate at which batters swing at his change, and whiff when they do swing, is highly dependent on the height of the pitch. Here are those rates for Lincecum's change (orange) and the average change (gray).
lince_sw_116.png
Lincecum is better at inducing swings on his change and better at getting whiffs, particularly low in and below the strike zone.
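Swing rate is swings per pitch and whiff rate is misses per swing, both computed within height bins. A sketch, assuming hypothetical `(pz, swung, missed)` records with `pz` in feet; the quarter-foot bin width is an assumption.

```python
from collections import defaultdict

BIN_FT = 0.25  # assumed bin width for pitch height


def swing_and_whiff_by_height(pitches, bin_ft=BIN_FT):
    """pitches: iterable of (pz, swung, missed).

    Returns {bin_lower_edge: (swing_rate, whiff_rate)}, where
    swing_rate = swings / pitches and whiff_rate = misses / swings
    (NaN when a bin has no swings).
    """
    n = defaultdict(int)
    swings = defaultdict(int)
    misses = defaultdict(int)
    for pz, swung, missed in pitches:
        b = int(pz // bin_ft)
        n[b] += 1
        if swung:
            swings[b] += 1
            misses[b] += int(missed)
    return {
        b * bin_ft: (swings[b] / n[b], misses[b] / swings[b] if swings[b] else float("nan"))
        for b in sorted(n)
    }
```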

In the middle of the season I checked in on Lincecum's changeup over at FanGraphs and noted that its value depended on its speed differential from the preceding fastball and on the number of fastballs preceding it in the at-bat. So we cannot say that Lincecum's changeup succeeds in a vacuum; its success is predicated on his fastball. That is one of the limitations of this pitch valuation system. Another is that Lincecum gets credit for the Giants' excellent defense, for pitching in the NL and for pitching half his games in a pitcher's park.

With those caveats in mind, there you have it: Tim Lincecum's changeup, 2009's best pitch.

F/X VisualizationsOctober 23, 2009
Angels Send the Series Back to New York
By Dave Allen

Unfortunately Rich's Dodgers are out of it, but they had a solid season and reached the NLCS for a second year in a row. His Angels, on the other hand, won two of three in Anaheim, and the series heads back to New York. It resumes this weekend on Saturday and potentially Sunday, with the Angels needing to win both. Rich was treated to quite the game on Monday from the first row, as he watched the Angels beat the Yankees in extra innings in a bullpen extravaganza. The two teams used fourteen pitchers, the kind of game you get when there are so many extra rest days between games and so much rides on every out. I am sure Rich enjoyed the game thoroughly.

The Yankees dominated Tuesday's Game 4, but the Angels won last night. The seventh inning was key. In the top half, the Yankee bats came alive after being shut down by John Lackey all night, scoring six to take the lead. A.J. Burnett started the bottom of the seventh by giving up a hit and a walk, and was relieved by Damaso Marte, who gave up a sacrifice bunt and then a ground out that scored a run. That left the Yankees one out from ending the inning, up by one, with Erick Aybar on third. So they turned to Phil Hughes, who has been a dominating reliever for them, posting a FIP of 1.83 on the strength of amazing strikeout (11.4 per 9) and walk (2.28 per 9) rates. Unfortunately he was not at his best last night.

He walked Torii Hunter, throwing all fastballs and cutters. Since he got behind early he could not go to his very good curve. He then gave up a single to Vladimir Guerrero on a fastball in a four pitch at-bat. Then another single to Kendry Morales also on a fastball in a five pitch at-bat. Finally he struck out Maicer Izturis, throwing three curves in a four pitch at-bat.

Hughes usually has great command of his fastball, but last night, whether because of nerves or just randomly, he did not. Here are the fastballs:
fa_loc.png
In gray are all his fastballs over the season, in black those from last night, with the two hits in red. You can see how he usually gets a high percentage in the zone, but last night only three out of eight were. The single to Guerrero was on a pitch right down the middle of the plate, and the pitch to Morales (a switch hitter who was batting lefty) was up and away, a good but not great location for a fastball to a LHB.
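A rough sketch of counting pitches in the zone. It assumes PITCHf/x-style fields in feet (`px` from the center of the plate, `pz` height, and the per-batter `sz_top` and `sz_bot`). Treating the plate as 17 inches wide and counting a pitch as a zone pitch if any part of the ball (radius roughly 1.45 inches) crosses the zone box is an assumption, not necessarily how the figure was drawn.

```python
PLATE_HALF_WIDTH_FT = 17 / 2 / 12  # home plate is 17 inches wide
BALL_RADIUS_FT = 1.45 / 12         # approximate baseball radius (assumption)


def in_zone(px, pz, sz_top, sz_bot):
    """True if any part of the ball passes through the zone box."""
    return (abs(px) <= PLATE_HALF_WIDTH_FT + BALL_RADIUS_FT
            and sz_bot - BALL_RADIUS_FT <= pz <= sz_top + BALL_RADIUS_FT)


def zone_share(pitches):
    """pitches: list of (px, pz, sz_top, sz_bot). Returns the fraction in the zone."""
    return sum(in_zone(*p) for p in pitches) / len(pitches)
```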

The Angels capitalized on a rare off-night for Hughes and sent the series back to New York.

F/X VisualizationsOctober 06, 2009
Porcello Versus the Twins' Lineup
By Dave Allen

Yesterday I wrote about the Porcello/Baker pitching matchup. Another interesting facet of tonight's game is the matchup between Rick Porcello and the Twins' lineup. Porcello succeeds by getting lots of ground balls, over 54% of his balls in play, fifth best in the league. The Twins, on the other hand, have a high ground ball (3rd highest), high BABIP (7th best) offense. It seems this matchup would play into the Twins' favor, as their hitters hit lots of grounders and either beat them out for singles or hit them through the gaps for extra-base hits.

I wanted to see how much this is the case for individual Twins. So here are the career BABIP on grounders and SLG on grounders for some probable Twins starters. I also included the 2009 AL average for these values for comparison. I left out Jose Morales and Matt Tolbert as they had too few grounders. I sorted by SLG on grounders. All these numbers are from Baseball Reference.

                BABIPgb   SLGgb
Carlos Gomez     0.268    0.317
Denard Span      0.275    0.302
Delmon Young     0.260    0.281
Michael Cuddyer  0.252    0.277
Orlando Cabrera  0.240    0.263
Joe Mauer        0.253    0.261
Nick Punto       0.245    0.260
AL AVERAGE       0.240    0.260
Jason Kubel      0.197    0.211

With the exception of Kubel all these hitters have average or better slugging on ground balls. It looks like this may partially neutralize Porcello's main strength.
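For reference, both columns share a denominator: BABIP on grounders is hits per ground ball, and SLG on grounders is total bases per ground ball. A sketch with a hypothetical list of ground-ball outcomes; Baseball Reference's exact handling of edge cases such as reaching on an error may differ.

```python
TOTAL_BASES = {"out": 0, "1B": 1, "2B": 2, "3B": 3}


def grounder_rates(outcomes):
    """outcomes: results on ground balls, each 'out', '1B', '2B' or '3B'.

    Returns (BABIP on grounders, SLG on grounders).
    """
    n = len(outcomes)
    hits = sum(o != "out" for o in outcomes)
    bases = sum(TOTAL_BASES[o] for o in outcomes)
    return hits / n, bases / n
```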

F/X VisualizationsOctober 05, 2009
Baker-Porcello: A Study in Batted Ball Contrasts
By Dave Allen

Tomorrow's one-game playoff between the Tigers and Twins features an interesting pitching matchup. Rick Porcello and Scott Baker exist on opposite ends of the fly ball/ground ball spectrum. Porcello throws a 'sinking' two-seam fastball over 60% of the time and gets grounders on 54% of his balls in play, compared to just 29% fly balls. Baker throws a 'rising' four-seam fastball and gets grounders on just 34% of his balls in play, to 47% fly balls. That puts Porcello in the top five in GB% among starting pitchers and Baker in the bottom five. You can see an explanation for this difference by looking at the frequency distribution of the heights of their fastballs.

height.png


Recall that there is a tradeoff between ground ball rate and whiff rate for fastballs based on the height of the pitch. Porcello works down in the zone, where he gets grounders but not many whiffs, and consequently has one of the lowest strikeout rates in baseball. Baker, meanwhile, works up in the zone and gives up tons of fly balls (a good number of the desirable infield variety), but has an above-average strikeout rate.

So tomorrow's game is not only an exciting one-game playoff of utmost importance to both teams, but a nice demonstration of the strikeout/ground ball trade off based on fastball height.

F/X VisualizationsOctober 02, 2009
Mariano Rivera: Another Appreciation
By Dave Allen

For my last post of the regular season I wanted to examine one of the most singular and interesting players in major league baseball, Mariano Rivera. I know I have written about him before but the amazing Sports Illustrated cover of him inspired me to look deeper into his pitchf/x numbers.

In two months Rivera will turn forty, and the average speed on his cutter is down a couple of mph over the last couple of years, but his performance is still amazing. Unless something ridiculous happens in the next couple of days, he will finish up his ninth consecutive year with a FIP under 3, his sixth out of the last seven years with an ERA under 2 and his 12th of the last 13 years with at least 30 saves.

Rivera, famously, throws a cutter almost exclusively. He mixes in a four-seam fastball about 15% of the time to RHBs, but only 1% to lefties. So against RHBs it is about 85% cutters and against LHBs almost all cutters. As I have mentioned before his cutter has an incredible bimodal horizontal location distribution, which I have seen in no other pitch. Here it is to lefties, about 58% of the pitches inside to LHBs (Rivera's glove-side):

fc_lhb_1.png

Here are his cutters to RHBs, 64% outside (Rivera's glove-side):

fc_rhb_1.png

His fastball is thrown extremely inside to RHBs.

fa_rhb_1.png

Effectively he has two pitches to LHBs (inside and outside cutter) and three to RHBs (inside and outside cutter and an inside four-seam fastball). Throughout this article I classify each pitch as either inside (x<0 to RHBs, x>0 to LHBs) or outside (x>0 to RHBs, x<0 to LHBs). Here is how the five pitches do by run value and some other per-pitch metrics. FA denotes the fastball, FCi the inside cutter and FCo the outside cutter; rv100 is the run value per 100 pitches, with negative good for the pitcher; whiff is the proportion of swings that miss the ball; oswing is the proportion of pitches out of the zone swung at; called is called strikes per pitch; gb% is ground balls per ball in play; iff% is infield flies per ball in play; and slgcont is slugging on contacted pitches.

          rhb-FA   rhb-FCi  rhb-FCo  lhb-FCi  lhb-FCo  
rv100     -1.3      -0.2     -1.8     -3.6     -2.5    
whiff      0.10      0.25     0.26     0.17     0.21   
oswing     0.43      0.29     0.36     0.50     0.18   
called     0.11      0.21     0.16     0.11     0.36 
gb%        0.63      0.42     0.44     0.55     0.69   
iff%       0.04      0.15     0.06     0.20     0.00   
slgcont    0.333     0.597    0.408    0.273    0.408  

Generally he gets more whiffs against RHBs, but much poorer contact against LHBs. His slugging on contact against lefties with his inside pitch is 0.273, lower even than the average BABIP. Amazing. The result is his remarkable reverse platoon split, evident in the run value numbers. The glove-side version of his cutter is better than the arm-side version; that is, inside to lefties (Rivera's glove side) is better than outside to lefties (Rivera's arm side), and outside to righties (Rivera's glove side) is better than inside to righties (Rivera's arm side).
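A sketch of the inside/outside rule and a few of the table's per-pitch metrics, assuming a hypothetical pitch record with a catcher's-view `px` (negative toward a RHB). Grouping pitches by batter hand and `side()` and then calling `summarize()` on each group would give columns shaped like the table; gb%, iff% and slgcont are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Pitch:  # hypothetical per-pitch record
    px: float            # horizontal location in ft; negative is toward a RHB
    batter_hand: str     # "R" or "L"
    run_value: float     # change in run expectancy (negative favors the pitcher)
    swung: bool
    missed: bool         # swung and missed
    in_zone: bool
    called_strike: bool


def side(p: Pitch) -> str:
    """Inside: x < 0 to RHBs, x > 0 to LHBs. Everything else, including x == 0, is outside."""
    inside = p.px < 0 if p.batter_hand == "R" else p.px > 0
    return "inside" if inside else "outside"


def summarize(pitches):
    """rv100, whiff (misses per swing), oswing and called for a group of pitches."""
    n = len(pitches)
    swings = [p for p in pitches if p.swung]
    out_zone = [p for p in pitches if not p.in_zone]
    return {
        "rv100": 100 * sum(p.run_value for p in pitches) / n,
        "whiff": sum(p.missed for p in swings) / len(swings) if swings else float("nan"),
        "oswing": sum(p.swung for p in out_zone) / len(out_zone) if out_zone else float("nan"),
        "called": sum(p.called_strike for p in pitches) / n,
    }
```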

Rivera also provides a really interesting place to start looking at pitch sequencing. I think that pitch sequencing is the next big area for pitchf/x analysts to examine. It is something that Joe Sheehan, Josh Kalk, Max Marchi and Jonathan Hale have looked at, but for the most part it is understudied.

Rivera offers a relatively simple jumping-off point since he has so few pitch types. In this case I am going to look at how the location of the last pitch influences the success of the next one. To keep things even simpler I am going to lump together his inside four-seam fastball and inside cutter to RHBs.

Proportion of pitches thrown inside
                          vLHB    vRHB
all                       0.58    0.45
following inside pitch    0.63    0.55
following outside pitch   0.40    0.37

Against both RHBs and LHBs he is more likely to throw inside after an inside pitch and more likely to throw outside after an outside pitch. This could be because Rivera knows that certain batters have trouble with inside or outside pitches and throws them one or the other more frequently. Alternatively, he might be playing a reverse expectation game: after coming inside he thinks the batter expects it outside, so he goes back inside again. I am not sure which it is.
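A sketch of how the proportions in the table could be tabulated, assuming each at-bat is a hypothetical ordered list of `"inside"`/`"outside"` labels from the classification rule above. Only consecutive pitches within the same at-bat are paired.

```python
from collections import Counter


def inside_share_after(at_bats):
    """at_bats: list of lists of 'inside'/'outside' labels in pitch order.

    Returns {previous_label: share of following pitches thrown inside}.
    """
    pairs = Counter()
    for seq in at_bats:
        for prev, cur in zip(seq, seq[1:]):
            pairs[(prev, cur)] += 1
    shares = {}
    for prev in ("inside", "outside"):
        total = pairs[(prev, "inside")] + pairs[(prev, "outside")]
        if total:
            shares[prev] = pairs[(prev, "inside")] / total
    return shares
```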

Here is how the location of the last pitch affects the current one.

rv100 vLHBs
                           inside    outside
following inside pitch     -3.9      -3.4
following outside pitch    -2.8      -2.3 

rv100 vRHBs
                           inside    outside
following inside pitch     -1.7      -0.4
following outside pitch     1.4      -2.7 

Against LHBs the difference is not statistically significant, but against RHBs it is. In that case an inside pitch does better after an inside pitch and an outside pitch does better after an outside pitch. So Rivera is correct in his sequencing. I am not sure why the pattern shows up only for RHBs. This is just the tip of the iceberg in terms of how pitch sequencing affects success, but is an interesting first step.
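The rv100 grids come from grouping each pitch's run value by (previous side, current side). The post does not say which significance test it used, so this sketch, with hypothetical input, stops at each cell's mean and standard error, both scaled to 100 pitches.

```python
import math
import statistics
from collections import defaultdict


def rv100_grid(pairs):
    """pairs: iterable of (prev_side, cur_side, run_value) for consecutive
    pitches in the same at-bat. Returns {(prev, cur): (rv100, se100)}."""
    cells = defaultdict(list)
    for prev, cur, rv in pairs:
        cells[(prev, cur)].append(rv)
    grid = {}
    for key, vals in cells.items():
        mean = 100 * statistics.fmean(vals)
        # Standard error needs at least two pitches in the cell.
        se = 100 * statistics.stdev(vals) / math.sqrt(len(vals)) if len(vals) > 1 else float("nan")
        grid[key] = (mean, se)
    return grid
```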

Another great season for Rivera means more data for folks like me to see just how he does it.

F/X VisualizationsSeptember 25, 2009
A Last Look at Home Runs
By Dave Allen

In my last post I looked at how the horizontal location of a pitch hit for a HR related to the angle of that HR in play. I thought the result was aesthetically pleasing and did a good job of showing the strongest trend (most HRs are pulled), but I thought that it may have hidden some other underlying structures or patterns.

So in this short post I want to take a last look at the data in a slightly different way. I broke up the plate into 10 bins and the angle of HRs in play into 10 bins, then counted the number of HRs that went from each of the 10 plate bins into each of the 10 angle bins. Here is the result for RHBs.
cdm_r.png

Here you can see that most HRs are hit from pitches middle-in and pulled to left field, just as the previous figure showed. What this shows even better, though, is that the majority of opposite field HRs come on pitches away. In fact inside pitches are very rarely hit for opposite field HRs.
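A sketch of the counting step, assuming hypothetical `(px, angle)` pairs per HR, with `px` in feet and the angle in degrees from -45 (third base line) to 45 (first base line). The post gives only the number of bins, so the plate range of -1.5 to 1.5 ft is an assumption; values outside a range are clamped into the end bins.

```python
def bin_index(value, lo, hi, n_bins):
    """Index of the equal-width bin containing value, clamped to [0, n_bins - 1]."""
    i = int((value - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)


def hr_grid(hrs, x_range=(-1.5, 1.5), angle_range=(-45.0, 45.0), n_bins=10):
    """Returns grid[i][j]: number of HRs from plate bin i landing in angle bin j."""
    grid = [[0] * n_bins for _ in range(n_bins)]
    for px, angle in hrs:
        i = bin_index(px, *x_range, n_bins)
        j = bin_index(angle, *angle_range, n_bins)
        grid[i][j] += 1
    return grid
```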

The same overall trend is seen for LHBs.
cdm_l.png

F/X VisualizationsSeptember 22, 2009
Correction to Last Friday's Post
By Dave Allen

I made a rather large error in my post last Friday about home runs. The error was in the last two figures that showed the relationship between the horizontal location of a pitch and the horizontal angle in play of the resulting HR. This error led to an incorrect conclusion. Here is what the graph should look like for RHBs.
c_rh_all_hr.png
And here for LHBs.
c_lh_all_hr.png
I want to thank Mike Fast who pointed out the problem to me and also apologize to the readership here at Baseball Analysts for this error. I have edited the original post with the correct graphs and text.

F/X VisualizationsSeptember 18, 2009
Home Runs: Where Did You Come From, Where Are You Going?
By Dave Allen

Last week I looked at Carlos Pena's HRs, examining the angle in play based on the horizontal location of the pitch. Today I am going to do so for all batters. First off it is important to understand how pitchers pitch differently to RHBs and LHBs. Here is the frequency of fastballs thrown to RHBs and LHBs by horizontal location. I flipped the horizontal location for lefties so the inside of the plate is on the same side of the graph for both groups.
fast_loc.png
As you can see, pitchers throw much farther away to lefties than to righties. This is true of both LHPs and RHPs, so it is not an artifact of, say, opposite-handed at-bats tending to be pitched farther away combined with there being more RHPs. Pitches to RHBs are centered only slightly away from the center of the plate. Strangely, the power profiles of lefties and righties suggest that pitchers should do the exact opposite.
pow_anglr.png
Although both have more power inside, the difference is more pronounced for RHBs than for LHBs. So RHBs have slightly more power inside than LHBs do, while LHBs have much more power away than RHBs do.

So we have a situation in which LHBs see most of their pitches far away in the zone and have relatively good power there, while RHBs see most of their pitches closer to the center of the plate, maybe shifted slightly outside, but their power is much greater middle-in.
This section is a correction of the original version.
The result for RHBs is that most of their HRs come from the middle of the plate, where they see a lot of pitches and still have good power.
c_rh_all_hr.png
The highest density of HRs is on pitches middle-in, and most of those are pulled to left field. Even pitches that are slightly away are generally pulled. It is a little hard to see, but most of the opposite field HRs are on away pitches; that is, there are few steep lines going from the bottom left of the graph to the top right.

Now, recall that lefties see mostly outside pitches, and that they have fairly good power on those pitches. The result is that most of their HRs come from pitches away.
c_lh_all_hr.png
You can see the higher density of HRs middle-away, compared to the RHBs' higher density middle-in. With that exception the image is largely a mirror image of the RHBs' image, with most of the HRs pulled to right field. This graph also shows that my conclusion from last week was probably wrong: Carlos Pena is really not that extreme in his HRs. I do think that Pena's HRs came even more on away pitches than most lefties', but this does show that Pena is just an exaggerated version of what the average lefty looks like, not a major outlier.

F/X VisualizationsSeptember 11, 2009
Another Look at Carlos Pena's HRs
By Dave Allen

Before the season I looked at HR rate by pitch location and noted batters who hit HRs in locations where most do not. One batter I profiled was Carlos Pena, as he hits HRs predominantly on pitches away, while most batters hit them on middle-in pitches. Part way through the season he was one of the league leaders in HRs, so I did a follow-up. Now he is out for the season with two broken fingers, but he is still leading the AL in HRs. So I thought I should check where in the strike zone the pitches he hit for his 39 HRs were.

Here are all of his HRs plotted over the grayscale rate for all LHBs.
hr_loc_pt.png
Most of his HRs are on the outer half of the plate, with a big number on the outer quarter, where most LHBs hit very few. In the middle-in section, where most LHBs hit the most HRs, he has surprisingly few.

I talked in the original post about one problem with this method: I am comparing the rate of HRs hit by the average batter to the actual number for Pena, not his rate. Maybe he has few HRs inside because he gets few pitches there. To get around this, below I plot the HR/FB rate for an average lefty and for Pena based on the horizontal location of the pitch.
pow_ang2.png
So it does in fact look like Pena gets much more power on the outside of the plate than the average lefty, and actually less than the average lefty on the inside quarter. In my post in the middle of the season Rich asked how he did this. Most batters have more power on pulled balls and pull more inside pitches. So is Pena's outside power from opposite field power or from an ability to pull outside pitches? To examine this I took inspiration from Max's work looking at the relationship between the horizontal location of a pitch and the horizontal angle of the resulting ball in play. In this case I just looked at Pena's HRs. Remember that -45 is the third base line and 45 is the first base line.

x_ang.png

Pena has hit only a handful of opposite field HRs, all on pitches away (I checked; those are all fastballs). But the bulk of his power comes from hitting pitches on the outer half, and even the outer quarter, of the plate to right field. He routinely pulls HRs on pitches on the outer edge of the plate.
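A sketch of turning batted-ball coordinates into the horizontal angle used here, assuming a Gameday-style image system in which x grows toward the first base side and y grows toward home plate. The home plate pixel coordinates are left as parameters because their exact values are an assumption; the sign convention matches the post (-45 is the third base line, 45 the first base line).

```python
import math


def spray_angle(x, y, home_x, home_y):
    """Horizontal angle in degrees: 0 is straightaway center,
    -45 the third base line and +45 the first base line.
    Assumes image coordinates, where y decreases toward the outfield."""
    return math.degrees(math.atan2(x - home_x, home_y - y))


def is_pulled(angle, batter_hand):
    """RHBs pull toward third base (negative angles), LHBs toward first (positive)."""
    return angle < 0 if batter_hand == "R" else angle > 0
```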

Pena is a great story. He kicked around for years before busting out with the Rays two years ago. We will see if the lead he has in HRs in the AL holds up over the next couple weeks.

F/X VisualizationsSeptember 04, 2009
Break versus Movement
By Dave Allen

As we all know by now, the pitchf/x data is an incredible resource for baseball analysts. For each pitch thrown in a major league game we get scads of data, so much so that it is hard to even know where to begin. And once we have begun it is easy to just go with the flow of analyzing what others have analyzed. At the PITCHf/x summit Alan Nathan noted that one piece of information, the break of a pitch, is rarely looked at in pitchf/x studies.

In my posts examining the movement of pitches I have used the word 'break', but done so incorrectly, using it to describe the movement of a pitch. So I thought it was important to make a post clearing up the difference between the two pitchf/x terms and to make a preliminary examination of pitch break.

Here is how MLB's GameDay defines movement (images and descriptions from MLB Advanced Media):

pitchfx.jpg
The Pitch-f/x or 'PFX' value is the distance between the location of the actual pitch, and the calculated location of a ball thrown by the pitcher in the same way but with no spin; this is the amount of 'movement' the pitcher applies to the pitch. A faster, straighter pitch like a fastball will have a higher Pitch-F/x value than a slower, breaking ball like a curveball.

As stated this leads to the counterintuitive result that fastballs 'move' more than curveballs. Here is a histogram of the movement of the four main pitch types.
move_pt.png
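A simplified sketch of the PFX idea, assuming constant acceleration over the flight and ignoring drag: the spin-induced deflection is the extra displacement produced by the non-gravity part of the acceleration, 0.5 × a × t². The official PITCHf/x calculation differs in detail, so this is only a rough stand-in for `pfx_x` and `pfx_z`.

```python
G_FT_S2 = 32.174  # gravitational acceleration, ft/s^2


def spin_deflection_in(ax, az, flight_time_s):
    """Approximate (horizontal, vertical) spin deflection in inches.

    ax, az: constant accelerations in ft/s^2 (az includes gravity).
    Drag is ignored, so with no spin both components are zero.
    """
    spin_ax = ax               # a spinless pitch has no horizontal acceleration here
    spin_az = az + G_FT_S2     # remove gravity from the vertical acceleration
    scale = 0.5 * flight_time_s ** 2 * 12  # 0.5 * t^2, converted from feet to inches
    return spin_ax * scale, spin_az * scale
```

Because backspin on a fastball opposes gravity, its vertical deflection is large and positive, which is why a fastball can "move" more than a curve under this definition.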

Here is how GameDay defines the break of a pitch.

break.jpg
Break is the greatest distance between the trajectory of the pitch at any point between the release point and the front of home plate, and the straight line path from the release point and the front of home plate. Curve balls and sliders will have larger break value than fastballs. Pitch trajectories shown in blue indicate breaking pitches.


This leads to a more intuitive result that fastballs break the least and curves the most. Here is a histogram of the 'break' in inches of the four major pitch types.
break_pt.png
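Because break is a geometric quantity, it can be sketched directly: draw the straight line from the release point to the point at the front of the plate and take the largest perpendicular distance from any trajectory point to that line. The trajectory is a hypothetical list of `(x, y, z)` points in feet; how GameDay actually samples or fits the path is not described here.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b (3-D)."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    # |ab x ap| / |ab|
    cross = (
        ab[1] * ap[2] - ab[2] * ap[1],
        ab[2] * ap[0] - ab[0] * ap[2],
        ab[0] * ap[1] - ab[1] * ap[0],
    )
    return math.hypot(*cross) / math.hypot(*ab)


def pitch_break_in(trajectory):
    """trajectory: (x, y, z) points in feet from release to the front of the plate.

    Returns break in inches: the largest gap between the path and the straight line.
    """
    release, front = trajectory[0], trajectory[-1]
    return 12 * max(point_line_distance(p, release, front) for p in trajectory)
```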

In my posts where I have examined the results of a pitch by its movement I have exclusively used the PFX, or movement, value, which is often broken up into its vertical (pfx_z) and horizontal (pfx_x) components. These are often used to produce the horizontal versus vertical movement graphs that show the different pitch types of a given pitcher.

Since break is a more intuitive value I wanted to know if it did as well at predicting the results of a pitch as movement. Here I will just look at curveballs, which I assume is the pitch whose outcome is most impacted by its break.

Here is the run value (again negative is good for the pitcher) of a curveball based on its break, on the left, and movement, on the right. The gray indicates the error.

rv_bvm.png


As you can see, if you could choose just one piece of information about a curve, the break or the movement, in order to predict its success, you would definitely choose movement. The error bars are smaller and non-overlapping. That is, if you have a curve with 6 inches of movement, it is quite likely to have a different run value than one with, say, 10 inches of movement. On the other hand, if you have a pitch with 10 inches of break, on average its run value is lower than one with 14 inches of break, but we are nowhere near as certain.
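A sketch of the binned comparison, assuming hypothetical `(inches, run_value)` pairs where the first value is either break or movement. Each bin reports its mean run value and standard error (standard deviation over the square root of the count), one common way to draw error bands; the post does not say exactly how its bands were computed, and the 2-inch bin width is an assumption.

```python
import math
import statistics
from collections import defaultdict


def binned_mean_se(pairs, bin_in=2.0):
    """pairs: iterable of (inches, run_value).

    Returns {bin_lower_edge: (mean_run_value, standard_error, n)}.
    """
    bins = defaultdict(list)
    for inches, rv in pairs:
        bins[math.floor(inches / bin_in)].append(rv)
    out = {}
    for b in sorted(bins):
        vals = bins[b]
        se = statistics.stdev(vals) / math.sqrt(len(vals)) if len(vals) > 1 else float("nan")
        out[b * bin_in] = (statistics.fmean(vals), se, len(vals))
    return out
```

Running it once with break and once with movement on the same curveballs gives the two panels; narrower standard errors relative to the spread of the bin means indicate the more informative variable.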

It is too bad that the intuitive value is not as good a predictor as the non-intuitive value. Still, break is an interesting piece of information, which is currently not often reported or examined.

F/X VisualizationsAugust 28, 2009
The Interaction of Speed and Location on Fastball Success
By Dave Allen

One thing I have been interested in is how pitch location and speed interact. Are there pitch locations where it is especially important for a fastball to be fast (up in the zone) and others where a slow fastball does just as well as a fast one (the outside edge)? We have some assumptions going in, but I wanted to see what the data have to say. I am going to restrict my attention here to four-seam fastballs.

We know about fastball success by speed. Josh Kalk showed the faster the better for fastballs, not too surprising. And Max Marchi gave us the success of a fastball by location. For horizontal location you get a 'W' shaped graph. That is pitches outside the zone and down the middle of the plate result in higher run outcomes (the outer branches and middle of the 'W'), while pitches on the edge of the zone result in lower run outcomes.

To see how these two factors interacted I plotted fastball success by horizontal location for three groups of four-seam fastballs: all fastballs, those over 95 mph and those under 87.5 mph. The result below is just for those pitched to RHBs, so the inside is negative numbers and outside is positive numbers. The error bars are the shaded bands. The run value is the change in run expectancy so negative is better for the pitcher.

rv_x.png

Outside of the zone there is no difference between the three groups. So a batter's ability to lay off a fastball inside or outside the zone is, seemingly, unaffected by the pitch speed.

The difference is in pitches over the plate, with the largest difference in the middle of the plate. The slower the pitch, the more pronounced the 'W', so the bigger the penalty for hitting the fat part of the plate. Pitches on the edges of the zone are fairly close; slow and average fastballs do almost as well as fast ones.
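A sketch of the grouping behind the three curves, assuming hypothetical `(speed_mph, px, run_value)` records for four-seamers to RHBs. The 95 and 87.5 mph cutoffs come from the post; the quarter-foot location bins are an assumption. Every pitch counts toward "all", and only the extremes count toward "fast" or "slow".

```python
from collections import defaultdict

FAST_MPH = 95.0
SLOW_MPH = 87.5


def speed_group(mph):
    """'fast' above 95 mph, 'slow' below 87.5 mph, otherwise None."""
    if mph > FAST_MPH:
        return "fast"
    if mph < SLOW_MPH:
        return "slow"
    return None


def rv_by_location_and_speed(pitches, bin_ft=0.25):
    """Returns {group: {bin_lower_edge: mean run value}} for 'all', 'fast' and 'slow'."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for mph, px, rv in pitches:
        b = int(px // bin_ft)
        for group in ("all", speed_group(mph)):
            if group is None:
                continue  # mid-speed pitches only count toward "all"
            sums[group][b] += rv
            counts[group][b] += 1
    return {
        g: {b * bin_ft: sums[g][b] / counts[g][b] for b in sorted(counts[g])}
        for g in counts
    }
```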

Let's look at the same pattern for vertical location. I normalized the zone so that each batter had the average top and bottom of the zone, which are indicated. I also flipped the graph so that the explanatory variable (pitch height) is along the vertical axis.

rv_y.png

Here pitch speed can cover up an inability to hit the zone, but only just above the strike zone. Fast fastballs above the zone do much better than slow or average fastballs. This difference between fast and average is maintained through the top third of the zone, and between fast and slow through all but the bottom fifth of the zone. For fastballs low in the zone there is no difference based on pitch speed.
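The zone normalization described above can be done as a linear rescaling: express each pitch's height as a fraction of the way from that batter's `sz_bot` to `sz_top`, then map the fraction onto the average zone. A sketch, assuming PITCHf/x-style `pz`, `sz_top` and `sz_bot` in feet; the average-zone values in the check are placeholders, not the ones in the figure.

```python
def normalize_height(pz, sz_top, sz_bot, avg_top, avg_bot):
    """Map pz from a batter's own zone onto the average zone (all in feet).

    A pitch at the batter's sz_bot maps to avg_bot, one at sz_top to avg_top,
    and heights outside the zone are extrapolated linearly.
    """
    frac = (pz - sz_bot) / (sz_top - sz_bot)
    return avg_bot + frac * (avg_top - avg_bot)
```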

Generally we do see some interesting interactions of fastball speed and location on fastball success. A faster fastball will not save someone who cannot get the ball in the zone, but fastball speed gives a pitcher a lot of leeway to hit the fat part of the plate and pitch up in the zone.

F/X VisualizationsAugust 21, 2009
Do Batters Swing Too Often in a Full Count?
By Dave Allen

A while ago iamawesomer wrote an interesting piece about the game theory of swinging at 3-2 pitches, and MGL often talks about how he thinks batters swing too often in a full count. The idea intrigued me and I wanted to examine it.

First, a little background: batters tend to swing more as they get more strikes. This makes sense; with no strikes they can be selective and wait for their pitch, but with two strikes letting a strike go by ends the at-bat. Similarly, batters tend to swing less when they have three balls than when they have fewer. Again this is a good strategy. The benefit of going from 3 to 4 balls is greater than that of going from 0 to 1 balls, so taking a pitch that could be a ball or a strike is better with three balls than with fewer.

It seems like this trend breaks down when the count is full. Consider the two counts 2-2 and 3-2. In both counts the penalty for taking a strike is the same (a strikeout), but the benefit from taking a ball is greater at 3-2. Taking a ball at 3-2 results in a walk, while taking a ball at 2-2 just brings the count full. If a pitch is right on the border between a strike and a ball, a batter has more incentive to take that pitch at 3-2 than at 2-2. But that is not what batters do. They swing at more pitches at 3-2, and the trend holds both for pitches in the zone and for pitches out of the zone. Also, for pitches in a given location, batters swing at that pitch more often at 3-2 than at 2-2. So batters are either swinging too often at 3-2, too rarely at 2-2, or both. For this post I am going to look at the full count.

I am going to restrict my attention to RHB/RHP. I think the results would be similar in other cases, but I have not checked. Here is the swing rate by pitch location at 3-2.

swing.png

In other counts batters swing at inside pitches more often than outside ones, but this preference breaks down when the count is full. Overall, batters swing at pitches over a huge area.

I took the run value by location of 3-2 pitches swung at (swinging strikeouts, foul balls and balls in play) and subtracted the run value of 3-2 pitches taken (walks and called strikeouts). I plotted that difference in color, with red negative (a penalty for swinging) and blue positive (better to swing). On top I plotted the 50%, 75% and 90% swing contours.

sw_no_rv.png

The white band is the break-even point. If the average batter knew exactly where a pitch would end up and performed optimally, he would swing at pitches inside that white band and take those outside it.
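A minimal sketch of the swing-minus-take map, assuming pitch-level arrays `px` and `pz` (location in feet), a boolean `swung` and a `run_value` signed from the batter's side, so that a positive difference means swinging was the better choice (blue) and a negative one means taking was (red). The function names, bin edges and the `min_n` cutoff are my own illustrative choices, not the settings behind the plot.

```python
import numpy as np

def binned_mean(px, pz, values, x_edges, z_edges):
    """Mean of `values` in each (x, z) location bin, plus the bin counts."""
    sums, _, _ = np.histogram2d(px, pz, bins=[x_edges, z_edges], weights=values)
    counts, _, _ = np.histogram2d(px, pz, bins=[x_edges, z_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts  # empty bins become NaN
    return means, counts

def swing_minus_take(px, pz, swung, run_value, x_edges, z_edges, min_n=20):
    """Per-bin mean run value of swings minus mean run value of takes.

    run_value is assumed to be from the batter's side (positive is good for
    the batter), so a positive result means swinging beat taking in that bin.
    """
    px, pz, run_value = (np.asarray(a, dtype=float) for a in (px, pz, run_value))
    swung = np.asarray(swung, dtype=bool)
    sw_mean, sw_n = binned_mean(px[swung], pz[swung], run_value[swung], x_edges, z_edges)
    tk_mean, tk_n = binned_mean(px[~swung], pz[~swung], run_value[~swung], x_edges, z_edges)
    diff = sw_mean - tk_mean
    # Hide bins with too few swings or takes to say anything.
    diff[(sw_n < min_n) | (tk_n < min_n)] = np.nan
    return diff
```

The zero level of `diff` plays the role of the white break-even band.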

In the blue region batters swing over 75% of the time, and for most of it over 90% of the time. So batters do a good job of swinging at the pitches they need to. In the red region just outside the break-even band, batters swing between 50% and 75% of the time. So they swing at a large number of pitches they should take; they do not do a good job of taking pitches they should take.

Ideally a batter would always swing inside the blue and always take inside the red. It is not possible to do this perfectly, because the batter does not know where the ball will end up when he decides to swing. Most likely, if he tried to be more selective and take more balls (those in the red area), he would also end up taking some additional strikes (those in the blue). Right now it looks like batters are too swing-happy: they should be more selective, and give up some called third strikes in exchange for more walks.

F/X VisualizationsAugust 14, 2009
One of the Game's Stranger Hitters
By Dave Allen

One of the things that I, and I assume most of us, love about baseball is its peculiarities and oddities: historical oddities, like when a pitcher last gave up two triples in the first inning of his first major league start; strange park dimensions, like the Green Monster; and players who succeed in atypical ways. One such player is Pablo Sandoval.

He seemingly takes a horrible approach at the plate, swinging at tons of pitches out of the zone, yet he is a very productive hitter. He is not particularly fast, nor does he hit that many line drives, but he has sustained a high BABIP over his major league and minor league career. He has two great nicknames. In this post I want to highlight what makes Sandoval such a strange hitter.

The most remarkable fact is that he swings at almost 45% of pitches out of the strike zone, second only to his teammate Bengie Molina. I wanted to show just how extreme this is, so below I have his 50% swing contour compared to the average hitter's. What I mean by this is that I plotted all the pitches he swung at and took, then had the computer draw a smooth line so that pitches inside the line are more likely to be swung at and those outside are more likely to be taken. I discuss the methodology more specifically in the comments section of this post. Sandoval is a switch hitter, so I broke it up into his at-bats as a lefty and as a righty. Sandoval is in orange and the average hitter is in gray.
swing_sand.png
There is a drastic difference between his swing zone and the average. Remember that the images are from the catcher's perspective, so as a RHB he stands to the left of the zone and as a LHB to the right. So it seems he is particularly fond of low and inside pitches. The only place where he is close to league average is away when batting as a righty; he lays off those pitches. But everywhere else his swing zone is much larger than the average batter's. It also looks like he swings at more pitches batting lefty than righty.
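The post leaves the smoothing details to the comments, so here is one stand-in I am assuming rather than the method actually used: a Gaussian kernel-weighted average of the swing/take outcomes (a Nadaraya-Watson estimate) evaluated on a grid. Cells with a smoothed rate of at least 0.5 lie inside the 50% swing contour. The quarter-foot `bandwidth` is a hypothetical choice.

```python
import numpy as np

def smoothed_swing_rate(px, pz, swung, grid_x, grid_z, bandwidth=0.25):
    """Gaussian-kernel-weighted swing rate at every (grid_x, grid_z) point.

    Returns an array of shape (len(grid_x), len(grid_z)). Cells far from every
    pitch can underflow to 0/0 and come back as NaN.
    """
    px = np.asarray(px, dtype=float)
    pz = np.asarray(pz, dtype=float)
    swung = np.asarray(swung, dtype=float)
    gx, gz = np.meshgrid(grid_x, grid_z, indexing="ij")
    # Squared distance from each grid point to each pitch: shape (nx, nz, n_pitches).
    d2 = (gx[..., None] - px) ** 2 + (gz[..., None] - pz) ** 2
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    with np.errstate(invalid="ignore"):
        return (w * swung).sum(axis=-1) / w.sum(axis=-1)
```

With matplotlib, `plt.contour(grid_x, grid_z, rate.T, levels=[0.5])` would draw the boundary; the transpose is needed because the function puts x along the first axis. Running it once on a hitter's pitches and once on league-wide pitches gives the two outlines to compare. The distance array grows with grid size times pitch count, so league-wide data would need to be processed in chunks.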

He can get away with this because, somehow, he can make contact while swinging at these pitches far out of the zone. He makes contact on 76% of his out-of-zone swings, well above the league average of 62%. And not only does he make contact, he makes good contact out of the zone. Check out the location of his extra-base hits.
xbh_pitches_loc.png
If you compare that to my HR heat chart and the HR locations of other specific hitters, you will see this is a very strange pattern. He is hitting lots of HRs out of the zone: below the zone, above it and off the inside of it. Batting lefty he has a large number of doubles off pitches outside the zone away, opposite-field doubles I would guess. Sandoval is leading the league in out-of-zone HRs and out-of-zone extra-base hits. That is not surprising, I guess, since he swings at so many pitches out of the zone. It all shows that he can regularly swing at pitches way out of the zone and not only make contact, but make very solid contact.

A batter's job is to score runs, and to do that you need some combination of hitting for power and not making outs. Sandoval goes about that in one of the strangest ways possible. He hits for power even when swinging at pitches way out of the zone. He avoids outs because he rarely strikes out, he has good contact skills even on pitches way out of the zone, and it seems he can sustain a high BABIP. All of this from someone who just celebrated his 23rd birthday. San Francisco fans, and baseball fans, have lots more of Sandoval's strange ways to enjoy.

F/X VisualizationsAugust 07, 2009
Regression and Pineiro
By Dave Allen

Recently there has been some discussion about estimating a player’s true talent level. The idea is that a player's true talent, and how we should expect him to perform going forward, is not his current level of production. Rather, it is a weighted average of his current and past production (with more recent production weighted more heavily), and this average is then regressed toward league average, with the amount of regression depending on how many plate appearances (for batters), batters faced (for pitchers) or innings played (for fielders) it is based on. The details of how to do the weighting and toward which population’s mean to regress are important, and are discussed at the Book Blog and THT.
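A minimal sketch of the weight-then-regress idea, assuming per-season counts of some event (walks, say) and opportunities (batters faced). The weights, the league rate and the `regression_n` ballast are hypothetical placeholders; choosing them well is exactly the detail discussed at the Book Blog and THT.

```python
def regressed_rate(events, opportunities, weights=(1.0, 0.8, 0.6),
                   league_rate=0.08, regression_n=500):
    """Weighted multi-season rate, regressed toward a league mean.

    events/opportunities: per-season counts, most recent season first.
    weights, league_rate and regression_n are HYPOTHETICAL placeholders.
    """
    w_events = sum(w * e for w, e in zip(weights, events))
    w_opps = sum(w * n for w, n in zip(weights, opportunities))
    # Adding regression_n opportunities of league-average performance pulls the
    # estimate toward the mean by regression_n / (w_opps + regression_n), so the
    # pull shrinks as the weighted sample grows.
    return (w_events + regression_n * league_rate) / (w_opps + regression_n)

# Hypothetical walks and batters faced, current season first.
print(regressed_rate((10, 40, 45), (400, 600, 650)))
```

The weighted raw rate here is 69/1270, about 5.4%; regression pulls it partway toward the 8% league placeholder.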

I wanted to look at an example of a player whose current-year production is far out of line with a long career of established production. Joel Pineiro leads all starters in ground ball rate, at 60.4% ground balls per ball in play. Since 2002 his GB rate has ranged between 44% and 48%. In addition, Pineiro has the lowest rate of BB per batter faced among all starters, at 2.6%, again far out of his previous range of 5.4% to 8.5%. This is a huge shift in his numbers.

Here are his five pitches.

pitches.png

The movement on these pitches is fairly standard. It is important to note his two-seam fastball ‘sinks’ compared to his four-seam fastball. Here is the breakdown of his pitch usage over the past three years, those covered by PITCHf/x.

+--------------------+------+------+------+
| Pitch              | 2007 | 2008 | 2009 |
+--------------------+------+------+------+
| Four-Seam Fastball | 0.54 | 0.36 | 0.11 |
| Two-Seam Fastball  | 0.03 | 0.23 | 0.60 |
| Slider             | 0.16 | 0.20 | 0.11 |
| Curve              | 0.16 | 0.09 | 0.09 |
| Changeup           | 0.11 | 0.12 | 0.09 |
+--------------------+------+------+------+

His two-seam fastball is hit on the ground just under 68% of the time it is put in play, so his increased use of that pitch explains the jump in grounders. He gets his fastballs in the zone about 54% of the time, while his breaking and off-speed pitches are in the zone under 50% of the time (47% for his change, 42% for his curve and 49% for his slider). Finally, batters swing at and make contact with his fastballs more often than with his off-speed and breaking pitches. As a result he has many fewer walks and strikeouts (he has struck out just 10% of batters, the lowest rate of his career).
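As a rough illustration of how much of this the pitch mix alone can explain, the sketch below weights the zone rates quoted above by the usage shares in the table. It assumes the 54% fastball zone rate applies to both the four-seamer and the two-seamer, and that each pitch's zone rate was the same in all three years; neither is established by the data above.

```python
# Pitch usage from the table above (share of all pitches).
usage = {
    2007: {"FF": 0.54, "FT": 0.03, "SL": 0.16, "CU": 0.16, "CH": 0.11},
    2008: {"FF": 0.36, "FT": 0.23, "SL": 0.20, "CU": 0.09, "CH": 0.12},
    2009: {"FF": 0.11, "FT": 0.60, "SL": 0.11, "CU": 0.09, "CH": 0.09},
}
# Zone rates quoted in the text. ASSUMPTION: 54% applies to both fastballs
# and every rate is held fixed across all three years.
zone_rate = {"FF": 0.54, "FT": 0.54, "SL": 0.49, "CU": 0.42, "CH": 0.47}

def mix_weighted_zone_rate(mix, rates):
    """Expected share of pitches in the zone given a pitch mix."""
    total = sum(mix.values())
    return sum(share * rates[pitch] for pitch, share in mix.items()) / total

for year, mix in usage.items():
    print(year, round(mix_weighted_zone_rate(mix, zone_rate), 3))
```

Under those assumptions the expected zone rate moves only from about 50.5% in 2007 to about 51.7% in 2009, so the mix shift by itself is a modest part of the story; the swing and contact differences described above would presumably carry much of the rest.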

I think this is an interesting example in which the PITCHf/x data partially explain a recent abrupt change in numbers. Obviously we do not expect Pineiro to continue to walk under 3% of batters faced and get over 60% of his balls in play on the ground. An estimate of true talent and expectation going forward must include some weighting of past performance and regression to the mean. But I think the PITCHf/x data, just like scouting data, can be used to adjust the weighting (maybe weighting this year even more heavily if we expect him to keep this pitch breakdown going forward) or to regress toward a different pool, one with this breakdown of pitches, to get a better estimate of his true talent going forward.

F/X VisualizationsAugust 06, 2009
Strikeouts and Ground Balls
By Dave Allen

The main tenet of defense-independent pitching theory is that pitchers can only control strikeouts, walks and the types of batted balls (grounders, fly balls, line drives, pop ups) they give up. Under such a theory the best pitchers are those who give up few walks, line drives (likely to be hits) and fly balls (likely to be HRs), while getting lots of strikeouts, pop ups (almost always outs) and ground balls (rarely extra-base hits). In this short post I want to consider the relationship between strikeouts and ground balls. The holy grail is a pitcher who can get tons of strikeouts and ground balls while giving up few walks. Why is this combination so rare?

In black below is the relationship between whiffs (misses per swings) and the vertical location of a four-seam fastball. Also on the graph in blue is the relationship between ground ball per ball in play and vertical location. The graph is a little hard to understand because vertical location is the independent variable so it is along the horizontal axis, and there are two dependent variables displayed at the same time. The red lines indicate the average top and bottom of the strike zone.

whiff_ground_byy.png


The overwhelming trend within the strike zone is for whiff rate to increase with vertical location and for ground ball rate to decrease with it. This is why it is rare to find an extreme ground ball pitcher who also gets a lot of strikeouts. The one exception is the bottom half foot of the strike zone, where ground ball rate is very high and whiff rate has bottomed out and starts to rise again. If a pitcher could regularly locate in that bottom half foot he could get both whiffs and grounders, but as I noted last Friday it is important to consider just how accurately a pitcher can locate his pitches. Most likely few pitchers could regularly hit that spot.

F/X VisualizationsJuly 31, 2009
Measuring a Pitcher's Ability to Locate a Pitch
By Dave Allen

In many of my past posts I have displayed heat maps showing how a specific value (HR rate, run value, BABIP) varies over pitch location. One thing I mentioned in passing in the BABIP post, but probably should have been mentioning all along, is that just because a location is the best to pitch to does not mean a pitcher should attempt to throw there. We must think about a pitcher's ability to locate and what happens if he misses his spot. MGL put it best in asking this question in this post at the Book Blog:

Let’s say that pitch f/x data tells us the following about a particular pitcher or group of pitchers:

On the average, the run value of a high inside fastball is -.001 where minus is good for the pitcher. The run value of a low outside fastball is +.001. In other words, the run value of the former is better than the run value of the latter.

Now, put all pitch sequence and game theory stuff aside.

In an average situation against an average batter, where those run values above absolutely apply, which pitch should a pitcher attempt to throw, and why? We are just talking about one pitch, and again, put aside anything to do with pitch sequences and game theory.

Zach Sanders provided the answer.

Low and away.

Your phrasing: “which pitch should a pitcher attempt to throw, and why?” The key word is attempt. If you make a mistake down and away, you probably won’t get burned as much as if you make a mistake going up.

If he has perfect control, then by all means take the one which the better value, but there is human error involved.

And MGL's further explanation.

You CANNOT use the run values of pitch locations based on hit f/x data to make any decisions about what pitches to throw unless you consider what happens when you miss your exact location (and the distribution of those misses, location-wise), which will happen some non-trivial percentage of the time.

I was thinking about the pitch f/x article or two a while back that told us exactly what I told you - that the high inside fastball was a very effective pitch. What the data and article did NOT tell you was the run value of a pitch that was ATTEMPTED to be thrown high and inside. ...

In general the reason why pitchers do NOT throw high and/or inside that much in this day and age is not because they are not man enough anymore as some broadcasters would have you believe, but it is not necessarily because a high inside fastball is a bad pitch (if it hits that location). It is because a miss on that pitch will more often result in a HR (or extra base hit) or a hit batter. As well, batters will take a difficult to hit high and inside pitch more often now than they would in the old days when the strike zone was higher than it is now.


Here is a visual representation of what he is talking about. Below is the run value of a pitch from a right handed pitcher to a left handed batter.

example1.png

Suppose location B, up and in, has a slightly better run value for the pitcher than location A. If a pitcher could hit location B exactly, that would be the best place to pitch. But when throwing to B he will miss some fraction of the time, and those misses end up in less favorable places than his misses when pitching to location A. Depending on how often he hits his spot, and by how far he misses, he might be better off pitching to the spot with the worse run value.

Ultimately what we would want to know is for a particular pitcher, pitch type and pitch location the probability density function of where the pitch will end up. This combined with the run value map would give us an expectation of the run value if that pitcher attempts to throw to a given location.
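Here is a toy, one-dimensional version of that calculation, assuming a Gaussian miss distribution around the intended spot. The run value map is made up: spot B (inside edge) is slightly better than spot A (outside edge), but misses off B land either over the middle or inside, which costs more than a miss off the outside corner. The code averages the map over the miss distribution for each target.

```python
import numpy as np

def run_value(x):
    """HYPOTHETICAL 1-D run value map (negative is good for the pitcher).

    x is horizontal location in feet; spot B is the inside edge (x = -1) and
    spot A the outside edge (x = +1).
    """
    x = np.asarray(x, dtype=float)
    return np.select(
        [x < -1.2, x <= -0.8, x < 0.8, x <= 1.2],
        [0.010, -0.002, 0.020, -0.001],  # miss inside, B, middle, A
        default=0.0,                     # miss outside: a mild cost
    )

def expected_run_value(target, sigma, step=0.001):
    """Expected run value when aiming at `target` with Gaussian misses of sd `sigma` (ft)."""
    x = np.arange(-6.0, 6.0, step)
    density = np.exp(-0.5 * ((x - target) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(np.sum(run_value(x) * density) * step)

B, A = -1.0, 1.0
for sigma in (0.05, 0.5):
    better = "B" if expected_run_value(B, sigma) < expected_run_value(A, sigma) else "A"
    print(f"sigma={sigma}: aim at {better}")
```

With these invented numbers, tight control (`sigma=0.05` ft) favors aiming at B, while looser control (`sigma=0.5` ft) favors A, which is the point MGL and Zach Sanders make. A real version would need the two-dimensional run value map and a pitcher-specific miss distribution, which is exactly the information we lack.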

We do not have that information now, and we will probably never have anything that specific. But if we knew the location of the catcher's mitt, we would have some indication of where a pitch was intended to go. This was brought up at both PITCHf/x summits, and Marv White of Sportvision said that it is possible with the current technology, but not at the top of their list of things to do. There is some discussion over at the Book Blog about how hard it would be to collect this data and how much information it would give us. Either way, I add my vote to those of other analysts interested in that data.

Without that though, I wanted to see if I could estimate how close a pitcher comes to hitting his spots. Again, without knowing where each pitch was intended to go this is impossible, but I think we can get an estimate for at least one pitcher. Again I turn to Mariano Rivera. Check out the location of his pitches to LHBs.

riv_1.png

The vertical location varies quite a bit, but there are two clear horizontal areas he pitches to. If we assume that he intends to throw every pitch either just inside the right edge of the zone or just inside the left edge, we can then see how close he comes, along the horizontal axis, to hitting his spot.

I do think he probably varies the intended horizontal location by count, likely intending to pitch closer to the zone when he has three balls and even farther toward the edge when he is ahead in the count. So I am going to restrict my attention to pitches in 0-0, 1-0, 0-1 and 1-1 counts.

Since the horizontal location varies by vertical location I am going to look at the deviation from the black lines below.

riv_2.png

Here is a histogram of the deviations from these black lines.

hist.png

Over 75% of his pitches are within half a foot to either side of the target along the horizontal axis. In other words, 75% of the time he can get his pitch within a 1-foot horizontal strip. Over 50% of his pitches are within 1/3 of a foot to either side of his target along the horizontal axis, so half the time he gets it within an 8-inch horizontal strip.
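The deviation measure can be sketched as follows, assuming each pitch was aimed at whichever of two target lines is nearer horizontally. The coefficients in `TARGETS` are hypothetical stand-ins for the black lines, and the count filter (0-0, 1-0, 0-1, 1-1) is assumed to have been applied already.

```python
import numpy as np

# HYPOTHETICAL target lines x = a + b * z (feet), one near each edge of the zone;
# the slope lets the target drift with pitch height, like the black lines.
TARGETS = [(-0.75, 0.05), (0.80, -0.05)]

def horizontal_deviations(px, pz, targets=TARGETS):
    """Signed horizontal miss (ft) of each pitch from its nearer target line."""
    px = np.asarray(px, dtype=float)
    pz = np.asarray(pz, dtype=float)
    offsets = np.stack([px - (a + b * pz) for a, b in targets])  # (n_targets, n_pitches)
    nearest = np.argmin(np.abs(offsets), axis=0)
    return offsets[nearest, np.arange(px.size)]

def share_within(deviations, half_width):
    """Fraction of pitches within +/- half_width feet of their target."""
    return float(np.mean(np.abs(deviations) <= half_width))
```

`share_within(dev, 0.5)` and `share_within(dev, 1/3)` correspond to the 1-foot and 8-inch strips above. Assigning each pitch to its nearer line is itself an assumption: a pitch that misses badly enough to land nearer the other target is counted as a small miss of that one.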

This all assumes that you believe he is always throwing at one of two targets. If you think he aims at a range of horizontal locations, then the variation I have measured comes partly from that range of targets and partly from his ability to locate. In that case I am ascribing some variation in intended location to his ability to locate, so I think you can treat these numbers as the least accurate he could possibly be. They also say nothing about how far he is from his intended target along the vertical axis, because I have no way of knowing his intended vertical target.

I think of this as a first attempt at measuring how close a pitcher is to hitting his intended location. Catcher mitt location data will get us closer to measuring it, but it is probably something we will never be able to fully measure.

F/X VisualizationsJuly 17, 2009
Can Pitchers Control Their BABIP by Controlling Pitch Location?
By Dave Allen

At the PITCHf/x summit I gave a presentation about making the type of contour and heat maps that I often show here. In the presentation I listed some of the things one could do with such maps, and I said 'for example, you can see how BABIP varies by pitch location.' A questioner at the end of the talk asked if I had done so. He thought that if BABIP did in fact vary by pitch location, and pitchers can control the location of their pitches, then pitchers could control their BABIP. At that point I had to fess up that it was just an example and I had not in fact looked at it. Unfortunately I don't know the name of the person who asked the question, but here it is.

There is a long history of examining how much control a pitcher has over his BABIP (batting average on balls in play). The first major work was by Voros McCracken who, in 2001, suggested that pitchers do not have the ability to prevent hits on balls in play. In 2003, Tom Tippett found that some pitchers, in particular knuckleballers, had the ability to suppress hits on balls in play throughout their careers. In addition, the BABIP of a ground ball is higher than that of a fly ball, and we know pitchers do control their ground ball rate. So we should expect BABIP differences between ground ball and fly ball pitchers. The general understanding, at this point, is that pitchers have some, but probably a very small, amount of control over their BABIP beyond their control over batted ball type.

Obviously pitchers control the location of their pitches, so if BABIP varies by pitch location, could this be how some pitchers have the ability to depress their BABIP? Let's see how BABIP varies by location. Here I am just looking at RHB.

babip_xy.png

There is some trend for pitches down in the zone to have a higher BABIP. I am sure this is driven by the fact that high-BABIP ground balls are more likely on pitches low in the zone, while low-BABIP fly balls are more likely on pitches up in the zone.

EDIT: In my initial post I had the outside/inside orientation flipped in my interpretation. Below I have corrected that. I would like to thank Mike Fast for bringing this to my attention and apologize for any confusion this might have caused. As always the images are from the catcher's perspective.

Along the horizontal axis, pitches in the middle of the plate have the highest BABIP, which is not surprising. Beyond that, though, among pitches low in the zone those inside have a higher BABIP than those away, and among pitches up in the zone those away have a higher BABIP than those inside. Pitches down in the zone will most likely be hit as ground balls, and inside pitches will be pulled; pulled ground balls to the left side of the infield are more likely to be hits. Pitches up and in are the most likely to be hit for home runs, which are not counted as balls in play, and this might be partially responsible for the drop in BABIP there. Also, maybe these pitches 'tie up' the hitter, causing popups, which have a near-zero BABIP.

I wanted to examine the horizontal gradient further, so I took a one-foot-high band of pitches centered at y = 2.5. My hope is to see how much the BABIP changes by horizontal location to see if it is reasonable for a pitcher to depress his BABIP based on the location of his pitches. Again this is just for RHB.

babip_x.png

So there is definitely a trend: the farther inside a pitch is hit, the lower the BABIP. But look at the error bars: the BABIP is effectively unchanged from x = 2 to x = -0.5. A pitch really has to be on the inside fourth of the plate before there is a significant drop in BABIP. From there to well off the inside edge of the zone there is a big drop in BABIP.
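To show how a curve with error bars like this can be built, here is a sketch that bins balls in play by horizontal location and reports BABIP with a simple binomial standard error, sqrt(p(1 - p)/n). It assumes home runs are already excluded and the pitches already restricted to the one-foot band; that the plotted bars are binomial errors is my assumption.

```python
import numpy as np

def babip_by_x(px, is_hit, edges):
    """BABIP and binomial standard error in horizontal-location bins.

    px: horizontal location of each ball in play (home runs excluded, already
    limited to the vertical band of interest); is_hit: True if it fell for a hit.
    Returns one (lo, hi, n, babip, se) tuple per bin.
    """
    px = np.asarray(px, dtype=float)
    is_hit = np.asarray(is_hit, dtype=bool)
    idx = np.digitize(px, edges) - 1  # bin i holds edges[i] <= x < edges[i + 1]
    rows = []
    for i in range(len(edges) - 1):
        hits = is_hit[idx == i]
        n = hits.size
        if n == 0:
            rows.append((edges[i], edges[i + 1], 0, float("nan"), float("nan")))
            continue
        p = hits.mean()
        rows.append((edges[i], edges[i + 1], n, float(p), float(np.sqrt(p * (1 - p) / n))))
    return rows
```

Because the standard error shrinks only with the square root of n, thinly populated bins at the edges of the plot will have noticeably wider bars than bins over the plate.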

It looks to me like, for a pitcher to seriously decrease his BABIP based on the horizontal location of his pitches, he needs either to induce swings (and contact) on pitches off the inside of the zone or to be able to locate on the inner fourth of the plate.

If a pitcher could regularly locate pitches in the strike zone, but just on the inner edge, he could drastically lower his BABIP. I am not sure there are a lot of pitchers with the control to pitch with the speed and movement required to get major league hitters out AND locate the ball that finely. If they miss too much to one side it is a ball; too much to the other and it is over the heart of the plate. The one pitcher, off the top of my head, who I think might be able to do this is Mariano Rivera. Check out the location of his cutters to RHBs.


riv_to_rhb.png


While most of the pitches are on the outer half, he locates a good number on the inner quarter. Exactly the type of pitches that are in the zone AND can depress BABIP.

F/X VisualizationsJuly 09, 2009
Felix Hernandez's Power Change
By Dave Allen

A while ago I looked at the success of a changeup based on its speed separation from the preceding fastball. Since then I had the pleasure of answering some of Dave Cameron's questions on the Mariners. He asked me about Felix Hernandez's changeup, which keeps getting faster.

change_sp.png

At the same time his fastball has actually slowed slightly, so the separation between the two pitches has gotten smaller. The difference averaged 9 mph in 2007 and is down to 5 mph this year. My work suggests that on a pitch-by-pitch basis a separation between 5 mph and 10 mph is optimal, while others have shown that on an overall average basis the bigger the separation, the better. Either of these results would suggest that Hernandez's changeup should be getting worse every year. But that is not the case. Remember that the run value is the change in run expectancy, so negative is good for Hernandez.

+------+--------------------+----------------------+
| Year | Changeup Run Value | Avg FB-CH Speed Diff |
+------+--------------------+----------------------+
| 2007 |             -0.017 |              9.0 mph |
| 2008 |             -0.022 |              7.1 mph |
| 2009 |             -0.032 |              5.2 mph |
+------+--------------------+----------------------+

At this point Hernandez's changeup is amazing, one of the top few in the game. It is interesting that his success runs counter to the prevailing trend. To examine it further I plotted the run value of his changeup based on its speed.

felix_sp.png

Overall, Felix's changeup gets better with increasing speed, which is very unlike the average pitcher's changeup. As most pitchers' changeups get faster they start looking just like slow fastballs and get crushed, but since Hernandez throws such a fast fastball he can succeed throwing his changeup as fast as some pitchers throw their fastballs. Next I wanted to check out the success of his changeup based on how much slower it was than the preceding fastball.

felix_dif.png

Whereas for the average pitcher there is a plateau in which the changeup is equally successful anywhere between 5 and 10 mph slower than the preceding fastball, for Hernandez success peaks at 5 mph and falls off rapidly if the changeup gets any slower. This again shows that Hernandez is succeeding with a fast changeup.
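The speed-gap analysis can be sketched like this, assuming pitch-by-pitch records in the order thrown. I am assuming "preceding fastball" means the most recent fastball in the same at-bat, and the set of fastball pitch codes is also my choice. Run values follow the convention above, where negative is good for the pitcher.

```python
import math
from collections import defaultdict

# Pitch codes treated as fastballs; this set is an assumption (cutters excluded).
FASTBALLS = {"FF", "FT", "FA", "SI"}

def changeup_separation(pitches):
    """Pair each changeup with its speed gap to the last fastball in the same at-bat.

    pitches: iterable of (at_bat_id, pitch_type, speed_mph, run_value) in the
    order thrown. Changeups with no earlier fastball in the at-bat are skipped.
    Returns a list of (fastball_speed - changeup_speed, run_value).
    """
    last_fastball = {}
    pairs = []
    for at_bat, pitch_type, speed, run_value in pitches:
        if pitch_type in FASTBALLS:
            last_fastball[at_bat] = speed
        elif pitch_type == "CH" and at_bat in last_fastball:
            pairs.append((last_fastball[at_bat] - speed, run_value))
    return pairs

def mean_run_value_by_gap(pairs, width=1.0):
    """Average run value in bins of speed gap (mph); keys are bin lower edges."""
    sums, counts = defaultdict(float), defaultdict(int)
    for gap, run_value in pairs:
        b = math.floor(gap / width) * width
        sums[b] += run_value
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

Run separately on one pitcher's pitches and on league-wide pitches, the two binned curves are the kind of comparison shown above.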

There are important limitations to studies that show trends for all pitchers averaged together; all pitchers are different. In this case Felix Hernandez succeeds with a power change that has little separation from his fastball. That same separation for the average pitcher, with a slower fastball, would be big trouble.

F/X VisualizationsJuly 03, 2009
Angle of Ball in Play by Pitch Type and Speed
By Dave Allen

Last week I looked at the horizontal angle of a ball in play as a function of the location in the zone where it was hit. Although there is some trend for lower pitches to be pulled more, most of the trend is dictated by the horizontal location of the pitch. As expected inside pitches tend to be hit to the pull field and outside pitches more to the opposite field.

Below I reproduce the trend for just the horizontal location. I found the average angle of a ball in play as a function of the horizontal location of the pitch. The center of the strike zone is 0, negative numbers indicate pitches inside to right-handed batters and positive numbers pitches outside. The strike zone extends from -1 (the inside edge to a RHB) to 1 (the outside edge to a RHB). The angle of a ball in play follows the -45/0/45 convention (-45 is the third base line, 0 is straight over second base and 45 is the first base line), so negative numbers indicate the pull field for a righty.

all.png

Starting away and moving towards the batter more and more balls are pulled, with the trend slowing and stopping at about the inside edge of the plate. Here you can see the overall pull tendency. At x=0, the middle of the plate, the average ball is hit to about 7.5° to the pull field and at x=1, the outside edge of the plate, the average ball is hit right up the middle.
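The -45/0/45 angle can be computed from where a batted ball lands or is fielded. The sketch below assumes field coordinates in feet with home plate at the origin; raw Gameday hit locations would first need translating into such coordinates, which is not shown here. The second function flips the sign for lefties so that negative always means pulled.

```python
import math

def spray_angle(x, y):
    """Spray angle in the -45/0/45 convention.

    ASSUMED coordinates (feet): home plate at the origin, +y toward second
    base, +x toward the first-base side. -45 is the third-base line, 0 is
    straight over second base and +45 is the first-base line.
    """
    return math.degrees(math.atan2(x, y))

def pull_signed(angle, bats):
    """Re-sign an angle so negative always means pulled (RHB as is, LHB flipped)."""
    return angle if bats == "R" else -angle
```

Averaging `spray_angle` within bins of horizontal pitch location gives a curve like the one above.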

I was interested in how this varied by pitch type. I expected that slower pitches would be pulled more, as hitters have more time to 'get around' on such pitches.

fig1.png

The results confirm our expectations. The slower a pitch type the more it is pulled, so that through much of the strike zone the average curveball or changeup is pulled 10° more than the average fastball in the same horizontal location. This shows part of the danger of coming inside with breaking and off-speed pitches. These pitches, if they are hit, will tend to be pulled heavily, which is where most hitters have the greatest power.

I also wanted to see how much speed affected pull, regardless of pitch type. Here I plot the average angle of a ball in play by pitch speed for three horizontal locations, away (but in the zone), down the middle and inside (but in the zone).

sp_pull.png

The effect of pitch speed is strong, nonlinear and interacts with location. So for inside pitches there is not much effect of speed, the pull rate of a very slow and very fast pitch are not that far off. Similarly there is not a lot of difference in the pull rate of very slow pitches across location, they are all pulled heavily. But outside pitches are strongly affected by pitch speed, with slow ones being pulled and fast ones going to the opposite field. And very fast pitches are strongly influenced by location, with inside ones being pulled and outside ones going to the opposite field.

The results here are not that surprising, but nicely confirm long-held baseball expectations.

F/X VisualizationsJune 26, 2009
Do the Red Sox Get More Hits than Visitors Off the Green Monster?
By Dave Allen

Two months ago, when Sky was looking at predicting home field advantage based on ballpark qualities, he determined that a 'quirky' ballpark generally had a larger home field advantage than a non-quirky one. I thought that was a very interesting result and wanted to try to see it in a specific example. Obviously the most famous quirky feature in any ballpark is the Green Monster at Fenway Park.

Maybe Red Sox hitters are better able to take advantage of the Green Monster, thereby giving Fenway a larger home field advantage because of its quirky dimensions. Like much of my work this is heavily indebted to earlier work on the subject by John Walsh. In early 2007 he looked at which Red Sox hitters take the most advantage of the Green Monster and also which non-Red Sox would benefit most by hitting at Fenway.

Percent of balls in the air toward the Green Monster

If Red Sox hitters do take advantage of the Green Monster, then you would expect them to hit more balls in its direction when they are home than when they are away, with RHBs trying to pull more balls and LHBs trying to go the other way more.

Here is the frequency distribution of the angle of fly balls and line drives to the outfield by RHBs. The plot on the left is for Red Sox hitters at home and away, and on the right for all visitors at Fenway and all non-Red Sox teams when they are at home. I use the same -45 (3B line), 0 (2nd base), 45 (1st base) orientation as in my last post. The Green Monster is indicated in green.

r_bia.png

It looks like visitors at Fenway change their approach much more than Red Sox hitters. Red Sox hitters' home and away spray patterns are virtually indistinguishable, but for visitors the spray pattern is shifted a degree or two toward third base when batting at Fenway. I assume this is caused by these hitters trying to pull the ball more, but it could also be a result of Red Sox pitching (maybe they pound the inside of the zone more than the average pitcher).

Here is the same figure for left handed batters.

l_bia.png

Both Red Sox and visiting lefties hit slightly more balls in play down the left field line at Fenway than elsewhere. Going along with that is a slight drop in the number of pulled balls in play at Fenway for both groups. The effect is subtle, but it looks like lefties might make some effort to go the other way more often at Fenway.

Here is an overview.

Proportion of outfield fly balls and line drives in direction of the Monster
+---------------------+--------+--------+
|                     |    RHB |    LHB |
+---------------------+--------+--------+
| Red Sox at Fenway   |  0.503 |  0.266 |
| Red Sox Away        |  0.511 |  0.257 |
| Visitors at Fenway  |  0.507 |  0.287 |
| Non-Red Sox at Home |  0.477 |  0.278 |
+---------------------+--------+--------+

Here you can see that Red Sox righties actually hit fewer balls in play in the direction of the Monster at Fenway than on the road. That is very surprising. Visiting RHBs see a big jump in their balls in play to that direction. For both Red Sox and visiting lefties there is a small increase in balls in play to that direction at Fenway compared to elsewhere.
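The proportions in the table amount to a simple count, sketched here assuming each outfield fly ball or line drive already has a spray angle in the -45/0/45 convention and a group label. The angular span assigned to the Monster is a hypothetical placeholder, not a measurement of Fenway.

```python
from collections import defaultdict

# HYPOTHETICAL angular span of the Green Monster in the -45/0/45 convention
# (third-base line out toward left-center); not measured from Fenway.
MONSTER_LO, MONSTER_HI = -45.0, -15.0

def share_toward_monster(records, lo=MONSTER_LO, hi=MONSTER_HI):
    """records: (group, spray_angle) pairs for outfield fly balls and line drives.

    Returns {group: share of that group's balls with lo <= angle <= hi}.
    """
    toward = defaultdict(int)
    total = defaultdict(int)
    for group, angle in records:
        total[group] += 1
        toward[group] += int(lo <= angle <= hi)
    return {g: toward[g] / total[g] for g in total}
```

Feeding it RHB and LHB records separately, with groups such as "Red Sox at Fenway" and "Visitors at Fenway", fills in the table column by column.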

Percent that actually hit monster

OK, so visiting hitters are hitting more balls in play toward the Green Monster, but are they actually getting more hits off it? I used the same technique as John Walsh and classified a ball in play as one off the Monster if it was a fly ball or line drive that went for a hit and was fielded within 25 feet of the Monster (I am using the Gameday batted ball locations with Peter Jensen's translation factors). When John went back and checked this, he found that only about 60% of the 'hits off the Monster' actually were, so these numbers will be overestimates. But I don't think they will systematically over- or underestimate Red Sox hitters compared to visitors.
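John Walsh's rule can be written as a point-to-segment distance test. This is a sketch under stated assumptions: field coordinates in feet with home plate at the origin, the Monster modeled as a single straight segment with hypothetical endpoints, and made-up batted-ball type labels.

```python
import math

# HYPOTHETICAL wall endpoints (feet; home plate at the origin, +y toward
# center field, +x toward the first-base side). Not surveyed values.
MONSTER = ((-220.0, 220.0), (-90.0, 360.0))

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def is_monster_hit(batted_type, is_hit, fielded_at, wall=MONSTER, radius=25.0):
    """Fly ball or line drive that went for a hit and was fielded near the wall."""
    return (batted_type in {"fly_ball", "line_drive"}
            and is_hit
            and distance_to_segment(fielded_at, *wall) <= radius)
```

As noted above, this rule also catches hits that never touched the wall, so counts built from it are best read as upper bounds.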

Using that definition, here is the proportion of balls in play that I classified as 'hits off the Monster.'

Proportion of balls in play that are hits fielded within 25 ft of the Monster
+---------------------+--------+--------+--------+
|                     |    RHB |    LHB |    All |
+---------------------+--------+--------+--------+
| Red Sox at Fenway   |  0.054 |  0.037 |  0.046 |  
| Visitors at Fenway  |  0.060 |  0.041 |  0.052 |
+---------------------+--------+--------+--------+

These numbers seem very high, so I am sure that I am overestimating the number of Monster hits by quite a bit. Still, it seems that visitors, both lefties and righties, get more hits off the Green Monster than Red Sox hitters do. This seems very counterintuitive. If these hits would have been outs elsewhere, then the Green Monster is giving visitors an advantage. On the other hand, if visitors are changing their approach at the plate to get more hits off the Monster, maybe their contact to other areas of the field is weaker.

Home Runs Over the Monster

The other thing the Green Monster offers is a short, but high, porch to hit HRs over. If Red Sox hitters can adapt their swings to hit more HRs over it, that could be where the advantage shows up. Here is the HR rate per ball in the air by angle, just in Fenway.

hr_ang.png

Now here is a big advantage for Red Sox hitters. Along the length of the Green Monster, Red Sox righties have a big HR/BIA advantage over visitors. In the rest of the field, except for a strip along the right-field foul line, there is little difference in HR rate. Does it look to you like Red Sox righties tailor their swings to hitting HRs over the Green Monster?
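A sketch of how an HR-per-ball-in-the-air curve by angle could be tabulated, assuming each ball in the air comes with a spray angle in degrees and a home-run flag. The 10° bin width and the -45° to 45° range are arbitrary choices here, not necessarily what the figure used; to compare groups (say, Red Sox righties versus visiting righties), call it once per group.

```python
import numpy as np


def hr_rate_by_angle(angles, is_hr, bin_width=10.0, lo=-45.0, hi=45.0):
    """Return bin centers and HR per ball in the air for each spray-angle bin.
    Bins with no balls are NaN; angles outside [lo, hi) are ignored."""
    angles = np.asarray(angles, dtype=float)
    is_hr = np.asarray(is_hr, dtype=bool)
    edges = np.arange(lo, hi + bin_width, bin_width)
    idx = np.digitize(angles, edges) - 1  # bin index for each ball
    centers = (edges[:-1] + edges[1:]) / 2.0
    rates = np.full(len(centers), np.nan)
    for i in range(len(centers)):
        in_bin = idx == i
        if in_bin.any():
            rates[i] = is_hr[in_bin].mean()
    return centers, rates
```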

The next step would be to put it all together. How much do the additional HRs by Red Sox hitters weigh against the additional hits off the Monster by visitors? Could we calculate the value of the Monster to the Red Sox in such a calculation? Maybe another day.

F/X Visualizations
June 19, 2009
How Strong is the Tendency to Pull the Ball?
By Dave Allen

Last week I took my first look at the HITf/x data, examining how the location of a pitch influences the speed of the ball off the bat and the vertical angle of the resulting hit. In this post I am going to do the same for the horizontal (or spray) angle of the resulting hit, which is the angle at which a batted ball heads into the field. Sportvision reports this angle with 45° corresponding to the 1st base line, 90° straight up the middle (2nd base and center field) and 135° the 3rd base line. Based on the discussion here, it seems others find a -45/0/45 orientation more intuitive, so I shifted to that orientation: 45° is the 1st base line, 0° straight up the middle and -45° the 3rd base line.
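The re-orientation is just a shift: subtracting the Sportvision angle from 90° maps 45° to 45°, 90° to 0° and 135° to -45°. Here is a small sketch with hypothetical function names that also expresses the angle relative to the batter's pull side, since RHBs pull toward the negative (third-base) side and LHBs toward the positive side.

```python
def sportvision_to_centered(angle_deg):
    """Map 45 (1B line) / 90 (center) / 135 (3B line) to 45 / 0 / -45."""
    return 90.0 - angle_deg


def pull_angle(centered_deg, bats):
    """Positive = pull side, negative = opposite field.
    RHBs pull toward 3B (negative centered angles), LHBs toward 1B."""
    if bats == "R":
        return -centered_deg
    if bats == "L":
        return centered_deg
    raise ValueError("bats must be 'R' or 'L'")
```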

Max Marchi already looked at this topic using the GameDay hit location to determine the horizontal angle of the ball in play. He examined the tendency of hitters to pull inside pitches and go the other way with outside pitches. He also looked at the possibility of defensive realignment based on a given hitter's spray chart. Here I am going to look at the first topic and ignore the second, which led to an at-times heated discussion over at the Inside the Book blog.

In Max's work he looked at how much individual hitters pulled the ball based on pitch location. Here I am going to average over all hitters to find a baseline. Below I show the horizontal angle of a batted ball based on the location of the pitch. Remember that negative angles correspond to the left side of the field and positive angles to the right. In this case I chose a red-to-blue color scheme to highlight the difference between pulled and opposite-field balls in play. I also flip the colors between RHBs and LHBs so that red is always pulled and blue is always opposite field. As always, the images are from the catcher's perspective.
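A minimal, unsmoothed version of these maps can be built by binning pitches by location and averaging the spray angle in each cell. This sketch assumes pitch coordinates in feet (pitchf/x `px`, `pz`), angles already expressed with whatever sign convention you want to plot, and grid edges of your choosing. The figures in this post were smoothed, which this simple binning does not do.

```python
import numpy as np


def mean_angle_grid(px, pz, angle, x_edges, z_edges):
    """Average spray angle of balls in play in each pitch-location cell.
    Rows follow x_edges, columns follow z_edges; empty cells are NaN."""
    px, pz, angle = (np.asarray(a, dtype=float) for a in (px, pz, angle))
    # Sum of angles per cell, and number of balls per cell.
    sums, _, _ = np.histogram2d(px, pz, bins=[x_edges, z_edges], weights=angle)
    counts, _, _ = np.histogram2d(px, pz, bins=[x_edges, z_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts
```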

Horizontal angle by pitch location

pull_ang.png

As expected, inside pitches result in the most strongly pulled balls, and it is not until you get to the outside edge of the strike zone that the average ball in play goes to the opposite field. So batters have a tendency to pull the ball: a pitch down the middle is, on average, hit about 5° to the pull side. In addition, there is a slight trend for pitches low in the zone to be pulled more. It looks like RHBs pull the ball more than LHBs.

Horizontal angle by pitch location for ground balls versus balls in air

I was also interested in how strongly ground balls are pulled compared to balls in the air (fly balls, pop ups and line drives). Conventional wisdom is that ground balls are pulled more, as evidenced by the infield shifts that hitters like David Ortiz experience. In addition, Matt Lentzner set up a simple bat-ball collision model that predicted most ground balls go to the pull side and more balls in the air to the opposite field side.

So we have conventional wisdom and theory telling us what to expect; let's see what the data say. I redid the above analysis, first with ground balls and then with balls in the air. Instead of using the GameDay classification of GB versus LD or FB, I used the HITf/x vertical angle. Based on Harry Pavlidis' work here, it looks like 7° is a rough cutoff between a ground ball and a ball in the air, so that is how I separated the batted balls.
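The split itself is a single threshold on the vertical angle. A sketch, assuming each batted ball is a dict with a hypothetical `vangle` key holding the HITf/x vertical angle in degrees, and assuming a ball at exactly 7° counts as in the air (the text doesn't say which side of the boundary it falls on):

```python
GB_CUTOFF_DEG = 7.0  # rough ground ball / ball-in-air cutoff from the text


def split_by_launch_angle(batted_balls, cutoff=GB_CUTOFF_DEG):
    """Split batted balls into (ground balls, balls in the air) by vertical angle."""
    ground = [b for b in batted_balls if b["vangle"] < cutoff]
    in_air = [b for b in batted_balls if b["vangle"] >= cutoff]
    return ground, in_air
```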

gb_bia_ang.jpg

Just as expected, ground balls go to the pull side much more often than balls in the air. For about the inside two thirds of the plate, the average ground ball goes at least 10° to the pull side. Again RHBs show a stronger tendency to hit to the pull side. This could be because infield hits are more likely on the left side of the infield than on the right, so RHBs have an incentive to pull ground balls while LHBs have an incentive to go the other way with them.

Fly balls, pop ups and line drives have a much smaller tendency to be pulled, and again it is weaker in LHBs. In fact, there is almost no overall pull tendency for LHBs on balls in the air; they simply pull inside pitches and go the other way with outside ones.

Speed of ball off bat by horizontal angle

Finally, I was interested in how much more power a pulled ball has than one hit the other way. Mike Fast showed that pulled balls are more likely to be home runs, more likely to be line drives and have a higher BABIP than opposite-field balls in play. In fact, Mike showed, a pulled fly ball is ten times more likely to be a home run than an opposite-field fly ball. I wanted to see the difference in speed off the bat responsible for this huge effect. Here is the horizontal speed of the ball off the bat by horizontal angle for LHBs and RHBs.

spray_mph.png

Batted-ball horizontal speed reaches a maximum roughly between 5 and 25 degrees to the pull side. Pulled balls are roughly 10 to 20 mph faster than those hit at the same angle to the opposite field.

Of course all of this analysis averages over all hitters. We know there are hitters who are assumed to be 'dead-pull' hitters or those with power to all fields. The data are now there, in a small sample with more coming, to examine these classifications. Do such hitting syndromes exist? How consistent are they for an individual hitter year to year? How does it impact a hitter's performance? It will be very interesting when enough HITf/x data become available to look at individual hitters at this level.

F/X Visualizations
June 11, 2009
Bat Meets Ball: Checking in on the HitF/X data
By Dave Allen

To begin with I want to say great work to all my colleagues here on their draft coverage. The interviews they all posted were first rate, Marc's coverage has been exhaustive and Marc and Rich's liveblog was a perfect way for me to follow along with the first round. So great work team.

The draft was probably the most exciting baseball event of the past week, but a not-too-distant second, for some of us, was the release of the first batch of hitf/x data. This is the analogous data for batted balls that pitchf/x gave us for pitches. Like pitchf/x, it is captured by two high-speed cameras at each stadium. From images of the ball just as it is struck by the bat and fractions of a second afterwards, the batted ball's initial speed and trajectory are estimated. For a technical discussion of how this is done and the accuracy of the method, check out this post at Tango's and MGL's Inside the Book blog.

This first release of hitf/x data covers all batted balls from this past April and gives the speed of the ball just as it leaves the bat, its vertical angle (or launch angle) and its horizontal angle (or spray angle). Analyses of this week-and-a-half-old data have already poured in. Ryan Howard crushes the ball. The optimal vertical angle to hit the ball at is around 11 degrees (with 0 degrees being parallel to the ground). David Ortiz is in trouble: balls came off his bat at the same speed as balls off the bats of Alexi Casilla and Endy Chavez.

It has been a little while since I have had a really nice heat-map-heavy visualization post, and I thought this data would be a great opportunity to rectify the situation. Since there is only one month of data available, the heat maps presented here are more 'smoothed' than ones I have presented previously. For this reason I am not 100% comfortable with the conclusions at the outer edges of the images. But in and around the strike zone, where there have been lots of batted balls, I think the results are good.

Vertical angle of a hit based on pitch location

First off, let's look at the average vertical angle of a batted ball based on the location in the strike zone where it was hit. We know that batted balls with a low vertical angle tend to be ground balls and that pitches lower in the zone are more often hit for ground balls. Thus, we should expect pitches down in the zone to be hit at a low vertical angle. Is that the case?

The vertical angle ranges from 90 degrees (popped straight up) to -90 degrees (driven straight into the ground), with a zero-degree hit being parallel to the ground. Also remember that the images are from the catcher's perspective, so negative x-values are inside to RHBs and positive x-values inside to LHBs.

vang_loc.png

As expected, the lower in the zone, the lower the vertical angle of the average batted ball. In opposite-handed at-bats there is an additional trend for away pitches to have a lower vertical angle off the bat. So pitches down-and-away are the most likely to be ground balls and pitches up-and-in are the most likely to be fly balls and pop ups. In same-handed at-bats this inside-outside trend is much weaker and the gradient is largely just based on the vertical location of the pitch.

Horizontal speed off bat based on pitch location

The initial speed of the ball off the bat is not as important in determining the success of a batted ball as the initial horizontal speed. A ball popped straight up very fast is just as bad as one popped straight up a little more slowly. On the other hand, the horizontal speed (the speed of the ball in the horizontal plane) is important in determining how hard a ball is to field and how far it goes. So below I plot the average speed of a batted ball in the horizontal plane (in mph) versus pitch location. Based on my HR heat maps, I expect the highest-speed hits to be slightly up-and-in.
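Horizontal speed in this sense is the component of the launch speed in the horizontal plane, i.e. total speed times the cosine of the vertical angle. A minimal sketch (it decomposes the initial velocity only and ignores drag and spin after contact):

```python
import math


def horizontal_speed(total_mph, vertical_angle_deg):
    """Horizontal-plane component of the initial speed off the bat."""
    return total_mph * math.cos(math.radians(vertical_angle_deg))
```

For example, a 90 mph ball launched at 80° has a horizontal speed of only about 16 mph, while a 95 mph ball at 10° keeps about 94 mph, which is why pop ups score poorly on this measure no matter how hard they are hit.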

speed_loc.png

Wow, that is the opposite of my assumption. The peak speed is up-and-away, and far up-and-away. There is a large peak speed out of the strike zone. The area of high speed hits extends from up-and-away to down-and-in through the strike zone. This is actually the same trend we previously saw with the highest run value of contacted pitches. Remember this is just based on batted balls, so there could be something of a selection bias. Maybe the only pitches up-and-away that are swung at and hit get crushed. Still this result is very surprising to me.

Edit

Peter Jensen made the following comment:

I think you may want to choose actual SOB to graph instead of horizontal SOB. Balls hit with a greater vertical angle will have a smaller proportion of their speed as a horizontal component. A batter hitting a high inside fastball is almost forced to hit it in the air because he is hitting it during a portion of his swing where the bat angle has the head above the handle. That portion of the swing also is near the maximum swing speed, so the batter will be trying to undercut the ball slightly to raise the vertical angle of the ball off the bat even more and maximize his distance and the possibility of a home run. So the batter is sacrificing horizontal speed off the bat to gain maximum hit ball distance.

A batter hitting an outside high fastball will be doing just the opposite. His bat angle still has the head lower than the hands, causing a lower vertical angle. Most batters should be trying to hit the ball as a line drive to the opposite field, since their chances of hitting a home run are relatively small and a line drive to the opposite field maximizes their run value. It also lowers the overall vertical angle of the hit ball and maximizes the horizontal component of the total speed off the bat. That is why your second set of graphs look the way they do. Change from HSoB to SOB and they should look very different. Love the graph images by the way.

Here is the total speed off the bat by pitch location.

speed_loc.png

Just as Peter suggests, this pulls the location of the fastest balls off the bat closer to the batter and up. It is still slightly outside, but not far outside like before. The hits down in the zone that had high horizontal speed were, not surprisingly, somewhat slow in total speed.

End of Edit

The next couple of weeks will be very exciting as this new wealth of data is examined. It affords a novel way to examine questions about baseball, and a potentially valuable tool to evaluate batters. If you have any general questions about the hitf/x data or any specific questions you think the data could answer feel free to post them in the comments. Also, make sure to check out Mike Fast's and Harry Pavlidis' early work with the data that I linked above.

F/X Visualizations
May 28, 2009
PitchF/X Detective: Has Bradley's Strike Zone Been Widened?
By Dave Allen

Last weekend Milton Bradley claimed that his strike zone had been expanded in retaliation for his early season run-in with umpire Larry Vanover.

Bradley believes his strike zone is being widened, forcing him to chase pitches he normally doesn't swing at or risk being called out on strikes.

Asked if there have been repercussions from Vanover's fellow umpires since the incident, Bradley didn't mince words.

"There always is," he replied. "No matter what, I'm the type of guy [where] I don't care what somebody does to a colleague of mine. I'm not going to treat him any differently. I do things straight up, because I'm a straight-up, honest individual.

"Unfortunately, I just think it's a lot of 'Oh, you did this to my colleague,' or 'We're going to get him any time we can. As soon as he gets two strikes, we're going to call whatever and see what he does. Let's try to ruin Milton Bradley.'

"It's just unfortunate. But I'm going to come out on top. I always do."

This claim was brought to my attention in Craig Calcaterra's ShysterBall blog where he suggested that someone with "PITCHf/x-fu" could check this assertion. I am not 100% sure what "PITCHf/x-fu" is, but I like to think I have it. Either way I thought this was an exciting new application of the pitchf/x data, so I decided to take Craig up on it and see if Bradley's strike zone has been any different this year.

First off, we need the smallest bit of background on the strike zone. It is called differently to right- and left-handed batters; the outside edge is extended out a couple of inches to lefties. In addition, its size is count-dependent, expanding in hitter's counts and shrinking in pitcher's counts. These two facts make an assessment of Bradley's claims a little tricky. He is a switch hitter, so we have to break up the analysis for him as a LHB and as a RHB. And any differences could be the result of differences in the fraction of time he is in hitter's versus pitcher's counts this year compared to the past.

The pitchf/x system was phased in during 2007 and has been operational in every game since, so I am going to compare pitches Bradley took in the covered part of 2007 and all of 2008 to those he took in 2009 thus far (ignoring the count issue for now). Here are the pitches he took as a RHB. Remember, the images are from the catcher's perspective, so negative values of x are inside to a RHB and positive values inside to a LHB. The gray dots are balls and the black dots called strikes.

as_rhb.png

There are too few taken pitches in 2009 as a righty to draw a firm conclusion, but nothing looks terribly out of whack. There are two called strikes on the inside edge, but right below them are four balls also along the inside edge.

Here are pitches he took as a LHB.

as_lhb.png

Bradley has far more at-bats as a lefty, and thus more taken pitches. These additional pitches allowed me to draw called-strike contours: closed lines such that a pitch inside the line is called a strike 50% of the time or more and a pitch outside the line is called a ball 50% of the time or more. Here you can see how the outside edge of the strike zone is shifted farther outside for Bradley as a lefty, as is the case for all LHBs. The inside edges of the pre-2009 and 2009 zones are almost exactly the same. Up and outside, the pre-2009 zone is larger, but down and outside, the 2009 zone is larger. As a whole the two are almost exactly the same size.
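The post doesn't say how the contours were drawn. One way to approximate them, sketched here as an assumption rather than the method used for these images, is to smooth the 0/1 called-strike outcomes with a Gaussian kernel (a Nadaraya-Watson estimate), treat the region where the estimate is at least 0.5 as the zone, and compare zone sizes by counting grid cells. Locations are assumed to be in feet, and the 0.25 ft bandwidth is arbitrary.

```python
import numpy as np


def called_strike_prob_grid(px, pz, is_strike, xs, zs, bandwidth=0.25):
    """Gaussian-kernel (Nadaraya-Watson) estimate of P(called strike) at every
    grid point (xs[i], zs[j]). Pitch locations and bandwidth in feet."""
    px, pz = np.asarray(px, float), np.asarray(pz, float)
    y = np.asarray(is_strike, float)
    gx, gz = np.meshgrid(np.asarray(xs, float), np.asarray(zs, float), indexing="ij")
    # Squared distance from each grid point to each pitch: shape (nx, nz, n).
    d2 = (gx[..., None] - px) ** 2 + (gz[..., None] - pz) ** 2
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    with np.errstate(invalid="ignore", divide="ignore"):
        return (w * y).sum(axis=-1) / w.sum(axis=-1)  # NaN where no pitches nearby


def zone_area(prob, xs, zs, threshold=0.5):
    """Approximate area (sq ft) where P(strike) >= threshold, by counting
    grid cells; assumes evenly spaced xs and zs."""
    cell = (xs[1] - xs[0]) * (zs[1] - zs[0])
    return float(np.count_nonzero(prob >= threshold) * cell)
```

Running this separately on the pre-2009 and 2009 pitches and comparing `zone_area` gives a rough numeric version of the "same size" comparison; the regression that follows is the more rigorous test.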

To make this conclusion statistically explicit, and to correct for the count, I ran a binomial logistic regression. This is a regression in which the dependent variable takes only two values, in this case 1 if a taken pitch is called a strike and 0 if it is called a ball. The dependent variable is regressed against any number of continuous and/or categorical variables. In effect, this binomial logistic model uses these regressors to calculate the probability that a taken pitch is called a strike, and tells you which of the regressors are statistically significant in determining that probability. The technique is identical to that in my earlier strike zone post, but this time I restrict the analysis to Bradley's data.

I regressed Bradley's strike/ball taken pitches against the horizontal distance between the pitch and the horizontal middle of the zone (with a different middle for Bradley as a LHB and as a RHB), the vertical distance between the pitch and the vertical middle of the zone, the interaction of these two distances, the numbers of balls and strikes (to control for the count) and a categorical factor for pre-2009 versus 2009.

 Binomial Logistic Regression
+-----------------+----------+------------+---------+------------+
|                 | Estimate | Std. Error | z Value |    P(>|z|) |
+-----------------+----------+------------+---------+------------+
| (Intercept)     |    5.995 |      0.370 |   16.21 |  < 2e-16 * |
| x Dist.         |   -0.364 |      0.022 |  -16.37 |  < 2e-16 * |
| y Dist.         |   -0.526 |      0.031 |  -17.48 |  < 2e-16 * |
| x*y Interaction |    0.012 |      0.000 |   13.87 |  < 2e-16 * |
| Num. Strikes    |   -0.897 |      0.178 |   -5.03 |  4.8e-07 * |
| Num. Balls      |    0.251 |      0.085 |    2.96 |    0.003 * |
| 2009            |   -0.023 |      0.217 |   -0.10 |    0.914   |
+-----------------+----------+------------+---------+------------+

Regressors with a negative estimate decrease the likelihood of a pitch being called a strike. So as the x or y distance increases, the probability of a strike decreases, as expected. As the number of strikes increases, the probability of a strike decreases (the strike zone shrinks in pitcher's counts), and as the number of balls increases, the probability of a strike increases (the strike zone expands in hitter's counts). All of these effects are strongly significant and mirror the results for all hitters.

The difference between the pre-2009 and 2009 zones is very slight, and if anything the 2009 zone is slightly smaller: taken pitches in 2009, correcting for distance and count, are slightly less likely to be called strikes. But this effect is far from statistically significant (p = 0.914): if Bradley's zone had not changed at all, a difference at least this large would show up more than 90% of the time by chance alone. There is no statistical difference between Bradley's zone this year and his zone in 2007 and 2008.
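The table's numbers can be translated into more intuitive terms. Exponentiating an estimate gives an odds ratio, and the two-sided p-value follows from z = estimate / standard error under a normal approximation. A small sketch using values copied from the regression table (the function names are illustrative):

```python
import math

# (estimate, std. error) copied from the regression table.
COEFS = {
    "Num. Strikes": (-0.897, 0.178),
    "Num. Balls": (0.251, 0.085),
    "2009": (-0.023, 0.217),
}


def odds_ratio(estimate):
    """Multiplicative change in the odds of a called strike per unit of the regressor."""
    return math.exp(estimate)


def two_sided_p(estimate, std_error):
    """Two-sided Wald p-value, using the normal approximation for z."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


for name, (est, se) in COEFS.items():
    print(f"{name}: odds ratio {odds_ratio(est):.3f}, p = {two_sided_p(est, se):.3g}")
```

With these inputs, each extra strike multiplies the odds of a called strike by about 0.41, each extra ball by about 1.29 and the 2009 indicator by about 0.98. The recomputed p-value for 2009 comes out near 0.92, matching the table's 0.914 up to rounding of the printed estimate and standard error.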

I can understand why Bradley was frustrated on Sunday. The Cubs had just lost seven straight games, and in five of those games they scored either zero or one run. He is hitting a meager .196/.322/.373 this season, but he has his decreased BABIP and LD% and increased GB% to blame for that, not the umpires.

F/X Visualizations
May 22, 2009
Optimal Fastball-Changeup Speed Separation
By Dave Allen

A large part of the success of a changeup is assumed to come from its deceptive nature. Hitters expect a fastball based on the changeup's delivery and movement, but the pitch is about 10% slower. This throws off the hitter's timing, hopefully causing him to whiff or make poor contact. If this is the case, we should expect the success of the changeup to depend at least partly on the difference in velocity between it and the fastballs that precede it. In this post I am going to examine this assumption. Is the success of a changeup tied to this difference? What is the optimal difference in speed?

Josh Kalk examined this question in a slightly different manner, looking at the relationship between the success of a pitcher's changeup over the course of a season and the difference in speed between his average changeup and average fastball. He found a linear relationship with increasing success based on increasing difference. I wanted to take a more granular approach and look at the success of a changeup based on the difference in its speed from the last fastball thrown to the batter, all the fastballs thrown to the batter in that at-bat and all the fastballs thrown to the batter in that game.

Here is the run value of a changeup based on how much slower (in release speed) it was than the most recent fastball thrown to the batter in the at-bat in which the changeup was thrown. Changeups thrown before any fastball in an at-bat were excluded from this analysis.
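A sketch of that calculation for a single at-bat, assuming pitches arrive in order as (pitch type, release speed) pairs. The set of pitch-type codes treated as fastballs is an assumption, since the post doesn't list them; changeups thrown before any fastball are skipped, as described above.

```python
# Assumed pitchf/x-style codes counted as fastballs (including cutters is a guess).
FASTBALL_TYPES = {"FA", "FF", "FT", "FC", "SI"}


def changeup_speed_gaps(at_bat):
    """Percent by which each changeup is slower than the most recent fastball.
    at_bat: list of (pitch_type, release_speed_mph) in pitch order."""
    gaps = []
    last_fb = None
    for pitch_type, speed in at_bat:
        if pitch_type in FASTBALL_TYPES:
            last_fb = speed  # remember the latest fastball's speed
        elif pitch_type == "CH" and last_fb is not None:
            gaps.append(100.0 * (last_fb - speed) / last_fb)
    return gaps
```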

last_fa_fig.png

This suggests that the optimal changeup is between 5% and 12% slower than the previous fastball. The gray lines show the standard error. The results are similar if you compare the changeup to all previous fastballs thrown in the at-bat or to all previous fastballs the hitter has seen in the game. The results are highly non-linear. There is little difference between throwing a changeup 5% or 12% slower, but if it is less than 5% or more than 12% slower, its effectiveness rapidly drops off. This rapid drop-off is not surprising; changeups that are too fast are effectively slow fastballs, and changeups that are too slow don't look enough like fastballs. But I am very surprised by how flat the graph is between 5% and 12%.
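The curve and its standard-error band amount to grouping changeups by percent slower and taking the mean run value and the standard error of the mean in each group. A sketch with caller-supplied bin edges; the binning itself is an assumption, and bins with fewer than two changeups are left as NaN.

```python
import numpy as np


def binned_mean_se(gap_pct, run_value, edges):
    """Mean run value and standard error of the mean per percent-slower bin."""
    gap_pct, rv = np.asarray(gap_pct, float), np.asarray(run_value, float)
    idx = np.digitize(gap_pct, edges) - 1
    means, ses = [], []
    for i in range(len(edges) - 1):
        vals = rv[idx == i]
        if len(vals) < 2:
            means.append(np.nan)
            ses.append(np.nan)
            continue
        means.append(vals.mean())
        ses.append(vals.std(ddof=1) / np.sqrt(len(vals)))  # SE of the mean
    return np.array(means), np.array(ses)
```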

These results are seemingly at odds with Kalk's. He found that pitchers who average only 5 mph difference between their fastball and changeup over the course of a season have less successful changeups than those who average 10 or more mph difference. My results suggest that an individual changeup has about the same success if it is preceded by a fastball that is 5 mph or 10 mph faster. I am not sure how to reconcile these two different conclusions, but I am going to think about it more in the future and welcome any comments.

F/X Visualizations
May 15, 2009
What Does a Fastball Hitter Look Like?
By Dave Allen

So far most of the pitchf/x analysis has focused on the pitcher, but each at-bat says just as much about a hitter as it does a pitcher. Thus, the pitchf/x data offers a wealth of information about batters that is currently underutilized. There have been some exceptions: Max Marchi's look at how the location in the zone of a hit pitch correlates with the location in the field of the resulting ball in play and Josh Kalk's look at how different hitters respond to first pitch fastballs. There have also been some great pitchf/x analyses of individual hitters: Jeremy's look at Micah Owings as a hitter, Trip Somers' look at Nelson Cruz's plate discipline and Mike Fast's examination of Jack Cust's performance against fastballs. In this post I want to continue this application of pitchf/x data to hitter analysis.

You often hear certain hitters referred to as 'fastball hitters.' I wanted to see if this is justified. Is there a certain subset of batters who do particularly well against fastballs? The stereotype is that fastball hitters are high-strikeout HR hitters. Is this the case? More generally, what can we say about the offensive performance of good fastball hitters versus good non-fastball hitters?

For every hitter in the pitchf/x database I found the average run value for all fastballs and all non-fastballs thrown to him during part of 2007 and all of 2008 (the pitchf/x system was added incrementally to different ballparks during the 2007 season). Here are the leaders and laggards:

+-------------------+--------+------------+-------------------+--------+------------+
| Name              | num FA | FA run val | Name              |num nFA |nFA run val |
+-------------------+--------+------------+-------------------+--------+------------+
| Albert Pujols     |   1973 |     0.0348 | Jody Gerut        |    412 |     0.0332 |
| Shin-Soo Choo     |    813 |     0.0313 | Lance Berkman     |   1284 |     0.0329 |
| Mark Teixeira     |   2657 |     0.0260 | Manny Ramirez     |   1351 |     0.0311 |
| Chipper Jones     |   2068 |     0.0251 | Magglio Ordonez   |   1121 |     0.0309 |
| Jack Cust         |   2337 |     0.0229 | Chris Davis       |    480 |     0.0298 |
| Alfonso Soriano   |   1545 |     0.0223 | Vladimir Guerrero |   1525 |     0.0290 |
| David Ortiz       |   1938 |     0.0217 | Milton Bradley    |    891 |     0.0272 |
| Josh Hamilton     |   1687 |     0.0217 | Nomar Garciaparra |    708 |     0.0261 |
| Carlos Quentin    |   1242 |     0.0215 | Alex Rodriguez    |   1147 |     0.0258 |
| Ryan Howard       |   2030 |     0.0210 | Matt Holliday     |   1178 |     0.0213 |
+-------------------+--------+------------+-------------------+--------+------------+
| Omar Vizquel      |   1227 |    -0.0178 | Craig Monroe      |    564 |    -0.0162 |
| Nomar Garciaparra |    936 |    -0.0180 | John McDonald     |    495 |    -0.0167 |
| Jose Molina       |    894 |    -0.0199 | Brad Ausmus       |    427 |    -0.0171 |
| Carlos Gonzalez   |    625 |    -0.0204 | Adam Kennedy      |    534 |    -0.0176 |
| Chris Burke       |    892 |    -0.0205 | Brandon Inge      |   1062 |    -0.0179 |
| Tony Pena         |    894 |    -0.0218 | Jacque Jones      |    605 |    -0.0180 |
| John McDonald     |   1026 |    -0.0236 | Yorvit Torrealba  |    715 |    -0.0204 |
| Omar Quintanilla  |    638 |    -0.0260 | Endy Chavez       |    429 |    -0.0230 |
| Andy LaRoche      |    686 |    -0.0261 | Corey Patterson   |    653 |    -0.0267 |
| Wily Mo Pena      |    549 |    -0.0290 | Tony Pena         |    504 |    -0.0348 |
+-------------------+--------+------------+-------------------+--------+------------+

Of course the leaders of both lists are going to be amazing hitters; this is almost by definition, since we searched for the best fastball and non-fastball hitters. But there are some interesting names among the leaders, with Shin-Soo Choo surprisingly the second-best fastball hitter of the pitchf/x era. Amazingly, Jody Gerut was the best non-fastball hitter. Nomar Garciaparra was a great non-fastball hitter and a horrid fastball hitter. The laggards are mostly no-hit middle infielders and catchers. Tony Pena and John McDonald, mercilessly, end up on both laggard lists.

About 60% of pitches thrown are fastballs, so the overall performance (against all pitches) of the best fastball hitters should be better than the overall performance of the best non-fastball hitters. That is the case: they have a higher walk rate (13% versus 11%), a higher HR per fly rate (21% versus 17%) and a higher OPS (.942 versus .920). The non-fastball hitters strike out less (16% versus 18%) and have a higher batting average on balls in play (.337 versus .322). This begins to bear out the stereotype that fastball hitters tend to be high-K, high-HR hitters. But I don't consider Albert Pujols a fastball hitter; he is an all-around amazing hitter. I think a better metric of "fastball hitterness" is the difference between the average run value of fastballs and of non-fastballs thrown to a given hitter. Here are the leaders (perform better against fastballs) and laggards (perform better against non-fastballs) for this metric.
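A sketch of the per-hitter averages behind both tables and the difference metric, assuming each pitch arrives as a (batter, is_fastball, run value) triple. Any minimum-pitch cutoff used for the leaderboards isn't stated, so none is applied here beyond requiring at least one pitch of each kind. As a check against the next table, Choo's 0.0313 - 0.0004 = 0.0309.

```python
from collections import defaultdict


def fastball_hitterness(pitches):
    """pitches: iterable of (batter, is_fastball, run_value).
    Returns {batter: (avg FA run value, avg non-FA run value, difference)}."""
    # Per batter: [fastball sum, fastball count, non-fastball sum, non-fastball count]
    totals = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for batter, is_fb, rv in pitches:
        t = totals[batter]
        if is_fb:
            t[0] += rv
            t[1] += 1
        else:
            t[2] += rv
            t[3] += 1
    out = {}
    for batter, (fa_sum, fa_n, nfa_sum, nfa_n) in totals.items():
        if fa_n and nfa_n:
            fa, nfa = fa_sum / fa_n, nfa_sum / nfa_n
            out[batter] = (fa, nfa, fa - nfa)
    return out
```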

+-------------------+--------+------------+------------+------------+
| Name              |    num | run val FA |run val nFA |        dif |
+-------------------+--------+------------+------------+------------+
| Shin-Soo Choo     |   1369 |     0.0313 |     0.0004 |     0.0309 |
| Jack Cust         |   4224 |     0.0229 |    -0.0027 |     0.0256 |
| Gary Matthews     |   3209 |     0.0099 |    -0.0144 |     0.0242 |
| Brandon Moss      |   1067 |     0.0069 |    -0.0149 |     0.0218 |
| Travis Hafner     |   2060 |     0.0089 |    -0.0128 |     0.0217 |
| Brian Schneider   |   1662 |     0.0059 |    -0.0153 |     0.0212 |
| Reed Johnson      |   2101 |     0.0089 |    -0.0123 |     0.0211 |
| Michael Young     |   4299 |     0.0097 |    -0.0113 |     0.0211 |
| Chris Young       |   3910 |     0.0107 |    -0.0093 |     0.0200 |
| Jason Bay         |   3378 |     0.0164 |    -0.0031 |     0.0196 |
+-------------------+--------+------------+------------+------------+
| Mike Jacobs       |   2296 |    -0.0045 |     0.0198 |    -0.0243 |
| Austin Kearns     |   1859 |    -0.0126 |     0.0121 |    -0.0247 |
| Willie Bloomquist |   1295 |    -0.0128 |     0.0133 |    -0.0261 |
| Clint Barmes      |   1505 |    -0.0100 |     0.0171 |    -0.0271 |
| Kenji Johjima     |   2718 |    -0.0174 |     0.0103 |    -0.0277 |
| Omar Infante      |   1441 |    -0.0139 |     0.0156 |    -0.0295 |
| Chris Davis       |   1143 |     0.0001 |     0.0298 |    -0.0297 |
| Omar Quintanilla  |   1012 |    -0.0260 |     0.0039 |    -0.0300 |
| Jody Gerut        |   1249 |    -0.0005 |     0.0332 |    -0.0337 |
| Nomar Garciaparra |   1644 |    -0.0180 |     0.0261 |    -0.0442 |
+-------------------+--------+------------+------------+------------+

A casual glance confirms our picture of fastball hitters as high strikeout, high power guys (Chris Davis seems really out of place among the non-fastball hitters). But it is hard to make any conclusions about what fastball hitters are like generally because fastball hitters are on average better hitters (since most pitches are fastballs). The measure of fastball hitterness (average fastball run value minus average non-fastball run value) is positively correlated with almost any offensive measure: HR per fly, BB rate, OBP, SLG, wOBA, BABIP, LD%. What I need to do is compare fastball hitters against non-fastball hitters who are just as good, and see in what respects they differ.

In order to make this comparison I am going to look at the relationship between a hitter's fastball run value minus non-fastball run value and a number of offensive metrics (K rate, HR per fly, BABIP, BB rate, GB%, LD%) relative to the hitter's overall offensive level. I use wOBA as my measure of a hitter's offensive level (wOBA, another TangoTiger creation, is one of the best metrics of a player's offensive value). The first thing to do is find the linear relationship between wOBA and all these measures (it is positively correlated with just about any meaningful offensive metric). Then for each batter I look at the difference between his value for a given measure and that expected based on his wOBA. This gives the hitter's performance for that measure relative to his overall offensive level.
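The residual step can be sketched as an ordinary least-squares line of each measure on wOBA, followed by actual minus fitted values. A hitter whose walk rate sits 3.4 points below the line gets a residual of -0.034, as in the Jermaine Dye example in this post. This assumes a straight-line fit, which is what the trend line described here suggests, but the exact fitting procedure isn't stated.

```python
import numpy as np


def woba_residuals(woba, measure):
    """Residual of each hitter's measure (e.g. BB rate) from a least-squares line on wOBA."""
    woba, measure = np.asarray(woba, float), np.asarray(measure, float)
    slope, intercept = np.polyfit(woba, measure, 1)  # highest power first
    return measure - (intercept + slope * woba)
```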

An example would be helpful. The graph below displays the relationship between wOBA and walk rate. Generally the more a player walks the higher his wOBA, as you can see by the trend line I drew in. For each hitter I calculate the residual, which is how much more or less that player walks compared to his wOBA peers. The red line is the residual for Jermaine Dye. His walk rate was 3.4 percentage points lower than expected based on his wOBA, so his residual is -0.034. The blue line is Gregor Blanco, who walked much more than his wOBA would suggest, so his residual is 0.059. The green dot is Carlos Quentin. His residual is just below zero.
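The residual calculation can be sketched in a few lines of Python. This is a minimal sketch, assuming the trend line is an ordinary least-squares fit of walk rate on wOBA; the function names and the four hitters below are hypothetical, not the data behind the graph.

```python
def linear_fit(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def residuals(woba, metric):
    """Each hitter's metric minus the value his wOBA predicts.
    Positive: more of that metric than his wOBA peers (like Blanco's walks)."""
    a, b = linear_fit(woba, metric)
    return [m - (a + b * w) for w, m in zip(woba, metric)]


# Hypothetical hitters: wOBA and walk rate (made-up numbers).
woba = [0.300, 0.330, 0.360, 0.390]
bb_rate = [0.060, 0.095, 0.080, 0.120]
print(residuals(woba, bb_rate))
```

The same function applies unchanged to K rate, HR per fly, BABIP, GB% or LD%; only the `metric` list changes.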

residuals.png

These residuals tell me if a player gets a greater than average amount of his offensive value from walks (like Blanco), or, on the other hand, if he gets less value from walks and gets his excess value elsewhere (like Dye does with his power). I calculated these residuals for all the offensive measures mentioned above. Now I am ready to see if fastball hitters get their value from walks, home runs, avoiding strikeouts (contact skills), having a high BABIP, or anything else, by seeing how my "fastball hitterness" correlates with each of these residuals.
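The correlation step might look like the sketch below. It assumes a plain Pearson correlation between each hitter's fastball hitterness and one residual series; the post does not say which correlation measure was used, and all numbers are hypothetical.

```python
import math


def fastball_hitterness(fb_rv, non_fb_rv):
    """Per-hitter fastball run value minus non-fastball run value."""
    return [f - n for f, n in zip(fb_rv, non_fb_rv)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical hitters: run values per pitch and HR-per-fly residuals.
fb = [0.020, 0.005, -0.010, 0.015]
non_fb = [-0.010, 0.000, 0.005, -0.005]
hr_fb_resid = [0.030, 0.000, -0.020, 0.010]
print(pearson(fastball_hitterness(fb, non_fb), hr_fb_resid))
```

Python 3.10 and later also ship `statistics.correlation`, which computes the same coefficient.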

The results confirm our initial assumptions. There is a strong positive correlation between fastball run value minus non-fastball run value and the HR per fly, BB% and K% residuals. So hitters who perform better against fastballs than non-fastballs hit more HRs, take more walks and strike out more than the average hitter of the same offensive level. Fastball hitters tend to be power hitters. This would suggest that pitchers should throw fewer fastballs to power hitters, which is exactly what they do. It seems MLB pitchers knew all of this already, but I am happy to confirm it for them.

F/X VisualizationsMay 13, 2009
Platoon Splits for Three Types of Fastballs
By Dave Allen

On Friday I looked at the run value of four-seam, two-seam and cutter fastballs based on pitch movement. In that post I noted that two-seam fastballs seemed to have a very extreme platoon split and cutters almost none. This comment was offhand, and I did not demonstrate that it was the case. In this short post I will do that.

A month ago I looked at the platoon splits of fastballs, changeups, sliders and curves. My results reconfirmed what John Walsh showed in the 2008 Hardball Times Annual: fastballs have an intermediate platoon split, sliders a very extreme one, and changeups and curves none. In that post I grouped all fastballs together. Based on those results and the results of last week's post I was very curious to see the platoon splits for the different fastball types.

fa_platoon_08.png

These results are consistent with the remarks I made on Friday:

  • Two-seam fastballs have an extremely large platoon split, as big as the slider platoon split.
  • The platoon split for cutters is not statistically significant.
  • Four-seam fastballs have a small yet significant split.
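The post does not say which significance test sits behind these statements. One simple way to check a split, sketched here as an assumption rather than the method actually used, is to compare mean run value per pitch in opposite- and same-handed at-bats and divide by a two-sample standard error; the run values below are made up.

```python
import math
from statistics import mean, variance


def platoon_split(same, opp):
    """Opposite-handed minus same-handed mean run value per pitch, plus an
    approximate z-score from the two-sample (unpooled) standard error.
    A positive split means batters do better with the platoon advantage."""
    diff = mean(opp) - mean(same)
    se = math.sqrt(variance(same) / len(same) + variance(opp) / len(opp))
    return diff, diff / se


# Hypothetical per-pitch run values for one pitch type.
same_handed = [0.0, -0.02, 0.01, -0.01]
opp_handed = [0.02, 0.03, 0.01, 0.04]
print(platoon_split(same_handed, opp_handed))
```

Under this sketch, a |z| above about 1.96 would count as significant at the 5% level; with real data each list would hold thousands of pitches.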

Interestingly, pitchers do not tend to throw these pitches in different proportions to lefties and righties. Approximately 48% of all pitches are four-seam fastballs, 8% are two-seam fastballs and 4% are cutters, with almost no difference between same- and opposite-handed at-bats for either RHPs or LHPs. This is very strange; it would seem pitchers would do well to throw two-seam fastballs much more in same-handed at-bats, as they do with sliders, and cutters more in opposite-handed at-bats, as they do with changeups.

One pitcher who does this, and I would guess this is a big reason for his success, is Jon Lester. Lester, a lefty, throws all three of these fastballs. Here are the proportions of his pitches to RHBs and LHBs that are each of the three fastball types.

+------------------+---------+---------+
| Fastball Type    |     RHB |     LHB |
+------------------+---------+---------+
| Four-Seam        |   0.317 |   0.322 |
| Two-Seam         |   0.155 |   0.290 |
| Cutter           |   0.133 |   0.077 |
+------------------+---------+---------+

This is the type of breakdown I think pitchers should use: way more cutters to opposite-handed batters and more sinkers/two-seamers to same-handed batters. I am surprised that this is not the case generally. It would be interesting to see if successful pitchers, like Lester, are more likely to show this breakdown than the average pitcher.
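Proportions like those in Lester's table can be tallied from pitch-level records. This sketch assumes each record is a (batter hand, pitch type) pair; the pitch-type codes are only illustrative labels. Shares are taken over all pitches to that side, which is why Lester's three fastball rows do not sum to 1.

```python
from collections import Counter, defaultdict


def type_share_by_hand(pitches):
    """pitches: iterable of (batter_hand, pitch_type) pairs.
    Returns {hand: {pitch_type: share of all pitches to that hand}}."""
    counts = defaultdict(Counter)
    for hand, ptype in pitches:
        counts[hand][ptype] += 1
    shares = {}
    for hand, c in counts.items():
        total = sum(c.values())
        shares[hand] = {t: n / total for t, n in c.items()}
    return shares


# Hypothetical pitch log; FF/FT/FC stand for four-seam/two-seam/cutter here.
log = [("R", "FF"), ("R", "FC"), ("R", "CU"), ("R", "FF"), ("L", "FT"), ("L", "FF")]
print(type_share_by_hand(log))
```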

F/X VisualizationsMay 08, 2009
Fastball and Changeup Run Value by Movement
By Dave Allen

Two weeks ago I looked at the run value of curveballs, sliders and knuckleballs based on their movement. Today I am going to do the same for changeups and three kinds of fastballs: four-seam fastballs, two-seam fastballs and cutters. This work was motivated by Sky Kalkman's Understanding Pitch f/x Graphs piece in which commenters suggested they have a hard time putting pitch movement in perspective.

Here is how the pitchf/x system measures movement, from my post two weeks ago:

The movement of a pitch is the difference between where you would expect the pitch to end up as it crosses the plate based solely on its velocity, trajectory and gravity and where it actually ends up as it crosses the plate. This difference is broken up into its horizontal and vertical components. Then you can plot the horizontal and vertical movements of a number of pitches together in a scatter plot to see the movement of a particular pitch type or from a particular pitcher.
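As a rough illustration of "movement = where the pitch ends up minus where gravity alone would put it", here is a sketch assuming the pitch is fit with constant accelerations (in ft/s^2, with the vertical one including gravity). Which stretch of the flight the pitchf/x system measures the break over is not specified here, so the flight time is left as an input; the function name and numbers are hypothetical.

```python
GRAVITY_FT_S2 = 32.174  # standard gravity in ft/s^2


def movement_inches(ax, az, flight_time):
    """Break in inches over flight_time seconds under constant acceleration.

    ax: horizontal acceleration (ft/s^2), positive toward the catcher's right.
    az: vertical acceleration (ft/s^2) including gravity, so usually negative.
    Adding back gravity leaves only the spin-induced part of the vertical break.
    """
    pfx_x = 0.5 * ax * flight_time ** 2 * 12
    pfx_z = 0.5 * (az + GRAVITY_FT_S2) * flight_time ** 2 * 12
    return pfx_x, pfx_z


# Hypothetical fastball: drops slower than gravity alone, so pfx_z > 0 ('rise').
print(movement_inches(-10.0, -15.0, 0.4))
```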

As in the previous post I used all the pitches in the pitchf/x database to do the analysis. This presented a problem; in 2007 and 2008 the pitchf/x system classified almost all fastballs as generic fastballs, making no distinction between four- or two-seam fastballs, sinkers, or cutters. Starting this year the system made these finer fastball classifications. So the first thing I had to do was go back and reclassify each pre-2009 fastball as a four-seam, a two-seam/sinker or a cutter. Although sinkers and two-seam fastballs are different pitches, I had a hard time differentiating them using the pitchf/x data, so I lumped them together here.

I used a k-means clustering algorithm that assigned each pitch to a cluster based on its vertical and horizontal acceleration and its speed. I am fairly confident in my classifications. The average horizontal and vertical movement and speed of each of the three types of fastballs I classified are quite close to the values Josh Kalk found when he classified the pitches. One slight discrepancy is that my RHPs' cutters do not have as much positive horizontal movement as Kalk's (and my LHPs' cutters do not have as much negative horizontal movement as Kalk's). I think that Kalk reclassified some sliders as cutters, and I am missing those since I am only reclassifying fastballs, not all pitches.
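A bare-bones version of that clustering step is sketched below with Lloyd's k-means algorithm on (horizontal acceleration, vertical acceleration, speed). Two details are my assumptions, not necessarily what was done here: each feature is standardized so mph and ft/s^2 carry similar weight, and the starting centers are picked deterministically by farthest-point selection. The nine "pitches" are made up.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with farthest-point initialization.
    Returns one integer cluster label per point."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical RHP fastballs: (ax, az, mph) around three made-up centers.
base = [(-8.0, -14.0, 93.0), (-14.0, -22.0, 91.5), (2.0, -20.0, 89.5)]
jitter = [(0.0, 0.0, 0.0), (0.3, -0.2, 0.1), (-0.2, 0.3, -0.1)]
rows = [tuple(b + j for b, j in zip(bp, jp)) for bp in base for jp in jitter]
labels = kmeans(standardize(rows), 3)
print(labels)
```

The labels are arbitrary integers; deciding which cluster is the four-seamer, the two-seamer or the cutter still means looking at each cluster's average speed and movement.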

Four-Seam Fastballs

For each pitch type I first show the range of movement for all RHPs throwing that pitch in gray, and then some specific examples in green, blue and red.

four_movement.png

Four-seam fastballs are, on average, the fastest pitches (about 1.5 mph faster than two-seam fastballs and 3.5 mph faster than cutters); they 'rise' (drop less than expected from gravity) more than any other pitch and tail in to same-handed batters (away from opposite-handed batters) by about 5 inches. These fastballs include what are thought of as 'high-heat' fastballs. Chris Young has a very effective four-seam fastball that 'rises' more than a foot on average. Dan Haren as of last week had the best four-seam fastball of all starters. Four-seam fastballs have a large variation in horizontal movement, both between different pitchers and between pitches thrown by the same pitcher; for example, some of Ubaldo Jimenez's four-seam fastballs tail over 10 inches in to RHBs and others have almost no horizontal movement whatsoever.

The run value images were created in the same way as described in the first post in this series. I only show the RHP images to keep the post from data overload.

run_mov_FA.png

In same-handed at-bats the more vertical 'rising' movement the better. This trend is not unexpected, but strikingly consistent. For these same-handed at-bats horizontal movement has very little effect. In opposite handed at-bats a large central region of pitches has a very high run value. These fastballs have 'average' movement, and left handed batters have no trouble with them.

Two-Seam Fastballs

sinker_movement.png

Two-seam fastballs are a little slower, tail in more to same-handed batters, and have much less, sometimes even negative, vertical movement than four-seam fastballs. As I said before, this group of pitches includes both two-seam fastballs and sinkers. These fastballs, when they are effective, induce lots of groundballs. As of last week Derek Lowe had the best two-seam fastball. It has nice 'sink' and a wide range of horizontal movement. Brandon Webb's sinker is one of the best in the game; it has even more 'sink' than Lowe's. Justin Masterson pitches from a three-quarters arm slot and is able to get negative vertical movement on his sinker (it drops more than expected from gravity).

run_mov_FT.png

Two-seam fastballs have an incredible platoon split. Against same-handed batters they tend to be very good pitches, improving slightly with more horizontal movement towards the hitter and greatly with more downward movement or 'sink'. Against opposite-handed batters two-seam fastballs are not very effective, and those with intermediate levels of vertical movement get crushed.

Cutter

cutter_movement.png

Cutters are, on average, slower than four- and two-seam fastballs by about 3.5 and 2 mph respectively. Their movement is intermediate to a four-seam fastball and a slider. You can't talk about cutters without mentioning Mariano Rivera's. It is amazingly successful, almost the only pitch he throws and one of the most unique pitches in the game. It has a wide range of vertical break and breaks away from RHBs. Roy Halladay has a very successful cutter with lots of 'sink'. Jake Peavy doesn't throw as many cutters as Halladay or Rivera, but his have very interesting movement too.

run_mov_FC.png

Cutters seem to have almost no platoon split. In fact the patterns look the same and are not mirror images of each other as is usually the case. So cutters from RHPs that break to the catcher's left do poorly against RHBs and LHBs, while those that break to the catcher's right do well against RHBs and LHBs. This is quite strange, and helps explain how Rivera can be so successful with just the one pitch.

Changeups

change_movement.png

In 2008 no pitcher threw more changeups than Edinson Volquez. His changeups have very extreme down and in movement. Jair Jurrjens was also in the top five in percentage of changeups thrown; his has intermediate movement. Jered Weaver's change has more 'rise' than any other.

Changeups are predominately thrown in opposite-handed at-bats, so I just present those images below.

run_mov_CH.png

Changeups that have very little movement (close to 0,0) get crushed. Those with extreme vertical movement, either lots of rise or lots of sink, are very successful. Since changeups are thrown in opposite handed at-bats even those with neutral run values are good pitches.

Speed

The elephant in the room here is pitch speed. The success of a fastball or a changeup is very much tied to its speed, which this analysis ignores. In addition, pitch movement and speed are not independent. John Walsh showed that a fastball's speed is positively correlated with its vertical movement. So the success of four-seam fastballs with lots of rise might be because these tend to be faster pitches. In a future post I hope to examine this relationship between speed and movement, and see how they jointly affect a pitch's outcome.

F/X VisualizationsMay 01, 2009
Pena and Quentin: Home Runs from Down and Away
By Dave Allen

Before the season I looked at home run rate (per pitch) by pitch location. In that post I found that the highest home run rate was slightly up and in within the strike zone, a finding which has since been confirmed and expanded by Jonathan Hale. That post also presented some hitters who hit lots of home runs outside of that up and in region. Two examples I gave were Carlos Pena and Carlos Quentin. Here are the images I presented, with the average HR rate of all LHBs for Pena and RHBs for Quentin in gray and their 2007 and 2008 home runs plotted over that in red. Remember these images are from the catcher's perspective so Pena, a LHB, stands to the right of the strike zone and Quentin to the left of the zone.

quentin.png
pena.png

Both hit most of their home runs down and away, and very few in the traditional power region up and in. They also happen to be at the top of this year's early HR leader board, Pena tied for the lead with nine and Quentin just one behind with eight. It was interesting for me to see the two of them at the top of the list after profiling their abnormal home run hitting patterns before the season, so I wanted to check the pitch locations of their home runs so far this year. I used the images from above, shrunk the 2007 and 2008 home run indicators a little and plotted the 2009 home runs with larger circles.

quentin.png
pena.png

The home run locations are still fairly different from the average hitter and pretty close to the 2007 and 2008 locations. The centroid of Pena's 2007 and 2008 home runs was (-0.10,2.39) and of his 2009 home runs (-0.16,2.56). So his home runs so far have been even more outside than the last two years and slightly higher. Quentin's '07/'08 home run centroid was (0.18,2.33), and his '09 home run centroid is (0.03,2.26). So his home runs have moved in, but are even lower in the zone than the last two years. Both are still hitting more home runs in the outside half than in the inside half of the zone, which is very different than the average hitter. It is interesting that these two top home run hitters generate so much power in a location where most hitters have a near zero home run rate.
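The centroid comparison is simple to sketch. This assumes the locations are (horizontal, vertical) coordinates in feet from the catcher's perspective, as in the images; the three sample locations are hypothetical, while the two comparisons use the centroids quoted above.

```python
def centroid(locations):
    """Mean horizontal and vertical location (feet, catcher's perspective)."""
    xs = [x for x, _ in locations]
    zs = [z for _, z in locations]
    return sum(xs) / len(xs), sum(zs) / len(zs)


def moved_outside(old_x, new_x, batter_hand):
    """True if the horizontal centroid moved away from the batter.
    From the catcher's view a RHB stands on the left (negative x) side,
    so 'outside' is +x for a RHB and -x for a LHB."""
    return new_x > old_x if batter_hand == "R" else new_x < old_x


print(centroid([(0.0, 2.0), (-0.3, 2.5), (0.0, 3.1)]))  # hypothetical HRs
print(moved_outside(-0.10, -0.16, "L"))  # Pena, '07/'08 vs '09
print(moved_outside(0.18, 0.03, "R"))    # Quentin, '07/'08 vs '09
```

With the quoted numbers, Pena's centroid moved outside and Quentin's moved in, matching the reading in the paragraph above.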

EDIT: In the comments Rich asked a great question about what type of pitches Quentin and Pena are hitting for home runs. Here is the breakdown of home run rate per pitch by pitch type for each of them and the overall league average.

+-------------------+-------------+-------------+-------------+
| HR rate per pitch |     Quentin |        Pena | League Avg. |
+-------------------+-------------+-------------+-------------+
| Fastballs         |      0.0174 |      0.0163 |      0.0071 |
| Changeups         |      0.0132 |      0.0068 |      0.0075 |
| Sliders           |      0.0104 |      0.0180 |      0.0056 |
| Curveballs        |      0.0275 |      0.0089 |      0.0049 |
+-------------------+-------------+-------------+-------------+

Pena's per-pitch rates are mostly lower than Quentin's (sliders are the exception), but his overall number of home runs is higher because he sees more pitches per plate appearance (4.0 versus 3.6). For almost every pitch type they both hit home runs at a higher rate than the league average, but the difference is very high for Pena with sliders and for Quentin with curves. So I graphed their home runs by pitch type.
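The arithmetic behind "lower per pitch but more overall" is just a multiplication: home runs per plate appearance equal home runs per pitch times pitches per plate appearance. In this sketch the 4.0 and 3.6 pitches per PA come from the text, while the two per-pitch rates are hypothetical round numbers, not Pena's or Quentin's actual overall rates.

```python
def hr_per_pa(hr_per_pitch, pitches_per_pa):
    """Home runs per plate appearance from a per-pitch home run rate."""
    return hr_per_pitch * pitches_per_pa


# Hypothetical per-pitch rates; pitches per PA are the figures quoted above.
pena = hr_per_pa(0.013, 4.0)
quentin = hr_per_pa(0.014, 3.6)
print(pena, quentin)  # the lower per-pitch rate still yields more HR per PA
```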

quentin.png
pena.png

It looks like sliders for Pena and curves for Quentin are really pulling their average location down and away. Their fastballs are a little bit more away and down than the average hitter's, but I think what makes their home run locations particularly distinctive is the large number of breaking pitches down and away that they hit for home runs. From Hale's article it does not look like most hitters hit sliders and curves for home runs in these locations. Great question Rich.

EDIT 2: Rich made another great suggestion: looking at where all these home runs landed. First Quentin:

Picture%205.png

Rich's take, which I agree with:

I was surprised how many home runs he's pulling given your findings. I think it shows how strong he is as the average hitter wouldn't be able to turn on those breaking balls on the outer half of the plate like Carlos.

Now Pena:

Picture%206.png

Pena is hitting lots to dead center. It would be interesting to combine the two data sets, and see how the location of the pitch corresponds to the location of the home run, like Max Marchi did here. Or look at how the location of home run corresponds to the pitch type.

F/X VisualizationsApril 29, 2009
Looking Back at Burrell's Defense
By Dave Allen

I mentioned a couple of weeks ago how this offseason teams placed a greater emphasis on defense, and particularly outfield defense. Some teams went out of their way to create powerhouse outfield defenses, and, on the other hand, poor-fielding outfielders got much smaller contracts than expected. I have already checked in with an example of the former; now I want to look back at an example of the latter.

From 2005 to 2008 Pat Burrell cost the Phillies about 48 runs with his defense in left field, or almost 5 wins. I wanted to see if we could visualize this defensive ineptitude. I employed the run value by field location technique I first introduced here. This time I took all balls in play at Citizens Bank Park, split up by when the Phillies were in the field and when the visitors were in the field. That way you can compare the defense of the Phillies' left fielders from 2005 to 2008 (mostly Pat Burrell) to all visiting left fielders in that time.

burrell_fielding.png

I had hoped that the results would be more dramatic, but you can definitely see that the red blob for the Phillies is smaller than the blob for the visitors. In addition there is much more deep green in left field for the Phillies than for the visitors. Good thing Burrell is now predominately a DH; too bad the Phillies replaced him with Raul Ibanez.

EDIT: In the comments LarryinLA suggested graphing the difference between the two images as a better way of displaying the information. In the image below, positive areas (blue) are where the Phillies' defense did better than the visitors' defense, and negative areas (red) are where the Phillies' defense did worse.
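A difference map like this can be built cell by cell. The sketch assumes both images share one grid of field locations, with each cell holding the average run value per ball in play allowed by that defense; subtracting home from visitors makes a positive cell mean the Phillies allowed fewer runs there. The 2x2 grids are hypothetical.

```python
def defense_difference(home_rv, visitor_rv):
    """Visitors' minus home team's run value per ball in play, cell by cell.
    Positive cells: the home defense allowed fewer runs (did better)."""
    return [[v - h for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(home_rv, visitor_rv)]


# Hypothetical 2x2 grids of run value per ball in play.
phillies = [[0.1, 0.3], [0.2, 0.0]]
visitors = [[0.2, 0.3], [0.1, 0.0]]
print(defense_difference(phillies, visitors))
```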

burrell_dif_2.png

I think this shows the difference even better. It looks like Burrell was particularly bad on balls hit down the foul line.

F/X VisualizationsApril 27, 2009
Best Pitches of the Year So Far
By Dave Allen

After the 2007 season John Walsh looked at the best pitches of each type for 2007. For example, that year Heath Bell had the best fastball: for every 100 fastballs he threw, the opposing team scored 2.7 fewer runs than expected. For this quick post I wanted to check in on pitchers so far this year and see who had the best of each pitch type. Like John, I am going to measure a pitch by its run value (in the link John has a great description of the run value of a pitch).
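Once each pitch has a run value, the "Run Value per 100" column below is a scaled average. A minimal sketch, assuming a list of per-pitch run values for one pitcher and pitch type (negative is good for the pitcher); the four values are hypothetical.

```python
def run_value_per_100(pitch_run_values):
    """Average run value per pitch, scaled to 100 pitches."""
    return 100 * sum(pitch_run_values) / len(pitch_run_values)


# Hypothetical run values for four fastballs.
print(run_value_per_100([-0.05, 0.02, -0.01, -0.04]))
```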

+-----------------------+--------+-------------------+
| Four-Seam Fastball    | Number | Run Value per 100 |
+-----------------------+--------+-------------------+
| David Aardsma         |    101 |              -4.6 |
| Jonathan Broxton      |     89 |              -4.3 |
| Brian Stokes          |     75 |              -4.2 |
| Frank Francisco       |     76 |              -4.1 |
| Dan Haren             |    201 |              -4.1 |
+-----------------------+--------+-------------------+

It is incredible that Dan Haren's four-seam fastball, over roughly twice as many pitches and as a starter, is right up there with those of four hard-throwing relievers. Heath Bell's fastball is still very good, checking in at ninth.

+-----------------------+--------+-------------------+
| Two-Seam/Sinker       | Number | Run Value per 100 |
+-----------------------+--------+-------------------+
| Derek Lowe            |     44 |              -7.8 |
| Josh Beckett          |     32 |              -7.8 |
| Jamie Shields         |     37 |              -6.3 |
| Rick Porcello         |     64 |              -6.3 |
| Ramon Ramirez         |     32 |              -5.3 |
+-----------------------+--------+-------------------+

It is my understanding that the new pitchf/x pitch classification system calls two-seam fastballs sinkers for some pitchers, so I grouped both of them here. Tigers fans must be thrilled to see Porcello's name on any list that includes Lowe, Beckett and Shields.

+-----------------------+--------+-------------------+
| Changeups             | Number | Run Value per 100 |
+-----------------------+--------+-------------------+
| Dallas Braden         |     79 |              -6.5 |
| Shairon Martis        |     45 |              -6.1 |
| Anthony Reyes         |    100 |              -5.2 |
| Jered Weaver          |     44 |              -4.8 |
| Johan Santana         |     74 |              -4.4 |
+-----------------------+--------+-------------------+

Shairon who? Luckily Harry Pavlidis broke down his stuff for us about a month ago.

+-----------------------+--------+-------------------+
| Curves                | Number | Run Value per 100 |
+-----------------------+--------+-------------------+
| Javier Vazquez        |     62 |              -6.5 |
| Wandy Rodriguez       |    133 |              -5.1 |
| Jeff Niemann          |     44 |              -4.9 |
| Jose Veras            |     42 |              -4.6 |
| Paul Maholm           |     48 |              -3.9 |
+-----------------------+--------+-------------------+

Wandy had the top curveball in 2007. Erik Bedard just missed the top 5 with -3.6 runs per 100 on his 127 curves, so on a total run value basis he is second only to Rodriguez.
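The total run value comparison just multiplies each rate by the number of pitches. This sketch uses the curveball numbers from the table plus Bedard's figures quoted above; it ranks only these six pitchers, so it checks the claim within this group rather than across the whole league.

```python
# (number of curves, run value per 100) from the table and the text above.
curves = {
    "Javier Vazquez": (62, -6.5),
    "Wandy Rodriguez": (133, -5.1),
    "Jeff Niemann": (44, -4.9),
    "Jose Veras": (42, -4.6),
    "Paul Maholm": (48, -3.9),
    "Erik Bedard": (127, -3.6),
}


def total_run_value(number, per_100):
    """Total runs saved (negative is good) from a per-100-pitch rate."""
    return number * per_100 / 100


# Most negative total first.
ranked = sorted(curves, key=lambda name: total_run_value(*curves[name]))
for name in ranked:
    print(name, round(total_run_value(*curves[name]), 2))
```

Rodriguez comes out at about -6.8 runs and Bedard at about -4.6, ahead of Vazquez's -4.0.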

+-----------------------+--------+-------------------+
| Sliders               | Number | Run Value per 100 |
+-----------------------+--------+-------------------+
| John Danks            |     55 |              -6.0 |
| Kyle Davis            |     32 |              -5.1 |
| Santiago Casilla      |     34 |              -4.8 |
| Yovani Gallardo       |     29 |              -4.8 |
| Mark Lowe             |     30 |              -4.6 |
+-----------------------+--------+-------------------+

This is an interesting list with mostly younger pitchers.

One HUGE caveat here is that I did not adjust for the strength of the batters faced. So if a pitcher has only faced poor batters his numbers could be artificially inflated. Also if a pitcher tends to throw a particular pitch only against very good or very bad batters that could throw things off. When I make these lists again at the all-star break or at the end of the year I will properly adjust for the batters faced.

F/X VisualizationsApril 24, 2009
The Breaking and the Knuckling: Run Value by Pitch Movement
By Dave Allen

Over at Beyond the Box Score Sky Kalkman posted an introduction to understanding pitchf/x graphics. It is a great post for people who are having a hard time understanding these graphics. I also liked the comments section where there is some discussion of the state of pitchf/x analysis. In particular some commenters noted areas of the current analysis they found lacking.

Trey Hilman's Chin commented:

I do have one question to go along with all this. For any particular pitch, is there a range of movement that is generally recognized as “good” for that pitch classification? I am terrible at judging “stuff” simply by watching a pitch, but it would be nice to look at some of these charts and intuitively see that a particular pitch had a “nasty slider” tonight, etc.

Similarly, azruavatar wrote:

5 inches of break is absolutely meaningless to me in the context of a slider. I also question whether all 5 inches are created the same. Rivera’s cutter is notorious for late movement. If a pitch moves 5 inches over 20 feet compared to 5 inches over 60 feet that’s an incredible difference.

It seems that people are having the hardest time intuitively understanding pitch movement and putting an individual pitch's movement in perspective. Another commenter suggested Josh Kalk's two-part Anatomy of a League Average Pitcher series. The first broke down the league average fastball, sinker and cutter by presenting the frequency distribution of speed and movement for these pitches, and the second did so for off-speed and breaking pitches. These allow one to see if, say, a pitcher's curveball breaks more than the average curveball. But we are still left wondering if that additional movement makes the pitch any more successful. I will begin to address this question here for the breaking (and knuckling) pitches, and look at fastballs and changeups in a future post.

The pitchf/x system measures pitch movement in a number of ways but the two easiest to understand are the horizontal movement (pfx_x) and the vertical movement (pfx_z) of a pitch. Alan Nathan has a helpful description of the meaning behind these two values:

pfx_x,pfx_z: The deviation (in inches) of the pitch trajectory from a straight-line in the x (horizontal) and z (vertical) directions...[T]he effect of gravity has been removed from pfx_z, so that both parameters are the "break" of the pitch due to the Magnus force on a spinning baseball...[A positive value of pfx_x corresponds to] a deviation to the catcher's right and a negative value to the catcher's left. Similarly, a positive value of pfx_z is a pitch that drops less than it would from gravity alone (most pitches fall in this category), whereas a negative value is a pitch that drops more than from gravity alone (e.g., a "12-6" curveball).

So the movement of a pitch is the difference between where you would expect the pitch to end up as it crosses the plate based solely on its velocity, trajectory and gravity and where it actually ends up as it crosses the plate. This difference is broken up into its horizontal and vertical components. Then you can plot the horizontal and vertical movements of a number of pitches together in a scatter plot to see the movement of a particular pitch type or from a particular pitcher.
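Nathan's sign conventions can be turned into batter-relative language with a small helper. This is a sketch; the function and its labels are hypothetical. It relies on the fact that a RHB stands on the catcher's left, so positive pfx_x moves away from a RHB and in toward a LHB. It also returns the total break, the straight-line size of the movement, which is the quantity that matters for the knuckleball discussion later on.

```python
import math


def describe_break(pfx_x, pfx_z, batter_hand):
    """Put pfx_x/pfx_z (inches, catcher's perspective) in batter terms.

    Positive pfx_x (catcher's right) moves away from a RHB and in to a LHB.
    Positive pfx_z: drops less than gravity alone ('rise'); negative: 'sink'.
    """
    toward_catchers_right = pfx_x > 0
    if batter_hand == "R":
        horizontal = "away" if toward_catchers_right else "in"
    else:
        horizontal = "in" if toward_catchers_right else "away"
    vertical = "rise" if pfx_z > 0 else "sink"
    total = math.hypot(pfx_x, pfx_z)  # overall break in inches
    return horizontal, vertical, total


# A typical RHP curveball: about 5 in to the catcher's right and 5 in down.
print(describe_break(5.0, -5.0, "R"))
```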

Curveballs

curve_movement.png

In gray are all curveballs thrown by RHPs. You can see that most tail to the catcher's right by about 5 inches (meaning they tail away from RHBs) and break down by about 5 inches. On top I plotted the curveballs of three pitchers with distinctive and successful curves. Bronson Arroyo's curve has almost no vertical movement, but far and away the most horizontal movement of any curveball in the game. A.J. Burnett's curve, on the other hand, has some of the most downward movement of any pitcher's curve, but average horizontal movement. (Arroyo's curve's dependence on its heavy horizontal movement, compared to Burnett's on its heavy vertical movement, may partially explain Arroyo's more extreme platoon split compared to Burnett's.) Zack Greinke combines intermediate levels of horizontal and vertical movement in his very successful curveball.

I am using the pitchf/x given pitch classifications and you can see three strange 'blobs' off of the central cluster of pitches. These are not curveballs. I think they are misclassified changeups. One cluster comes from sidearm pitchers and another from pitchers who throw sinking fastballs and changeups.

Now that we have seen the range of movement for all curves and for a select group of individual pitchers' curves, we can look at how curveball success varies by movement. In the images below I show the run value of a curve based on its movement. I decided to take a slightly different approach from my run value by location heat maps. I wanted to show not only the run value by movement, but also roughly the number of pitches with that movement. So I plotted the heat map colors on top of the scatter plot of pitches. Note that I change the color scale in each image; while this makes it harder to compare across images, it makes it easier to highlight differences within a particular image.

cu_move_run_value.png

These are pretty messy, complicated images. Studes suggests that at times these heat maps are too messy to be very informative. I think that is the case here (although I cannot agree too much or I lose my raison d'être). So I took a more traditional route below and plotted run value first against the vertical movement (averaging over the horizontal) and then against the horizontal movement (averaging over the vertical).

cu_rr_summary.png cu_lr_sum.png
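"Averaging over the horizontal" can be read as binning pitches by one movement component and taking the mean run value in each bin, whatever the other component is. This sketch assumes fixed-width bins (2 inches here), which is my choice, not necessarily the one behind the plots; the five pitches are hypothetical.

```python
import math
from collections import defaultdict


def binned_mean(movement, run_values, bin_width=2.0):
    """Mean run value within fixed-width bins of one movement component
    (inches). Bins are keyed by their lower edge, in ascending order."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for m, rv in zip(movement, run_values):
        edge = math.floor(m / bin_width) * bin_width
        sums[edge] += rv
        counts[edge] += 1
    return {edge: sums[edge] / counts[edge] for edge in sorted(sums)}


# Hypothetical curveballs: vertical movement (in) and run value per pitch.
pfx_z = [-5.0, -4.5, -1.0, 0.5, 1.5]
rv = [-0.02, -0.04, 0.0, 0.01, 0.03]
print(binned_mean(pfx_z, rv))
```

Passing pfx_x instead of pfx_z gives the second, horizontal, summary.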

These figures reveal an interesting dichotomy between same handed versus opposite handed at-bats. In opposite handed at-bats the success of the curveball is mostly determined by its vertical break. The greater the downward break the more successful the curve. Conversely, in same handed at-bats the horizontal movement of the pitch largely drives the pattern. The more a curveball tails away from a batter the more successful it is.

Sliders

slider_movement.png

RHPs' sliders, on average, have slight tailing away movement from RHBs and slight rising movement, although there is considerable variation. Greg Maddux's slider, for example, tailed in to RHBs. Justin Duchscherer's slider has little horizontal movement but above average rising movement. Carlos Marmol's slider is in the top five among sliders for both horizontal and downward movement, which makes it the slider with the most overall movement in the game.

I use the same technique described above for curveballs to produce the run value by movement images for sliders below. Since sliders are thrown overwhelmingly in same handed at-bats I only present those.

sl_move_run_value.png

Here, I think, the heat maps show a relatively clear gradient, with sliders that tail away from the hitter the most being the most successful.

Knuckleballs

There are fewer knuckleballs thrown than sliders or curves, but I really wanted to include them. John Walsh wrote the seminal pitchf/x article on the knuckleball. He found that, unlike other pitches, knuckleballs do not have a consistent pattern of movement, but a random horizontal and vertical movement, each anywhere from -15 to 15 inches (for Wakefield, at least). The success of an individual knuckleball varies directly with its seemingly random amount of movement; batters make less and poorer contact the more movement a knuckleball has. Using the method described above I am able to make one slight addition to Walsh's conclusion.

rv_knuckle.png

Outside of the north-west quadrant we get a confirmation of Walsh's results; there is a lower run value as the break increases. But knuckleballs with positive vertical movement and negative horizontal movement have even higher run values than those with no movement. Thus knuckleballs that break up and in to batters, even if they have a lot of movement, are very unsuccessful. This makes knuckleballs even more random; even if a pitcher can get lots of movement on his knuckleball, if it happens to break up and in he could be in trouble.

In a future post I will look at fastball and changeup movement.

F/X VisualizationsApril 17, 2009
What Did We Know This Time Last Year?
By Dave Allen

This early in the season the leader and laggard boards often have some interesting names, and it is fun to theorize which of these are legitimate breakouts (or breakdowns) and which are small sample size flukes. The pitchf/x data adds a powerful tool in helping with this classification. It allows us to look deeper into why a pitcher may have struggled or succeeded in a start. We have already seen some great analysis along these lines. RJ Anderson has a series of posts looking at Lincecum's, Sabathia's and Wheeler's performances thus far based on pitch speed and movement and release point. River Avenue Blues broke down Wang's first two games to see what might be up.

These are good examples of using all the data pitchf/x offers to assess recent performance. Of course what often happens is people just look at fastball speed and ignore movement, location, and release point data. For example, after Cole Hamels's first poor start everyone focused on his 86 mph fastball, but, as Hamels said himself, he started off with a fastball in the mid-80s early last year too. The image below shows Hamels's average fastball speed by start. The x-axis is not scaled by date, but by start (so no matter how far apart in time two consecutive starts are, they are always the same distance apart along the x-axis). The division between seasons is marked with a red line.

hamels_sp_start.png

Hamels's fastball speed is right where it was last year (not to say that we should be worry-free about Hamels; last year he pitched 261 innings after just 189 in 2007). A chart like this provides a useful way to see whether a pitcher's speed is within his normal variation. Consider Wang:

wang_sp_start.png

His fastball in his injury-shortened 2008 was 2 mph slower than his fastball in 2007. In his first two starts of 2009 it is at the low end of his already low 2008 numbers. That could mean trouble.

As I noted earlier, the best pitchf/x analysis will take into account all the data, but most people will be lazy and, like I just did, look only at fastball speed. So I wanted to know how much we could learn from that alone. More specifically, what can we say about performance for the rest of the season from fastball speed this early in the season? I looked back at last year to find out. Most starters have started two games, with about 100 pitches per start, about half of them fastballs. So what can we know from 100 fastballs' worth of data?

I started with the average speed of every pitcher's first 100 fastballs in 2008 and compared that with his average fastball speed for all of 2007. I wanted to see how well each pitcher performed from that point forward, so I found his FIP from the game after he reached his 100th fastball through the end of the 2008 season. (FIP stands for fielding independent pitching. Developed by Tangotiger, it roughly gives the expected ERA of a pitcher if he pitched in front of an average defense.) From that I subtracted the pitcher's preseason CHONE projected FIP (CHONE, created by Sean Smith, is one of the best projection systems). The result is how the pitcher performed over the rest of the season relative to his projection. Here are the pitchers with the biggest increases and decreases in fastball speed.
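The two quantities in the table below can be sketched as follows. This is a minimal illustration, not the code behind this post: it assumes you have each pitcher's 2008 fastball speeds in the order thrown plus his 2007 fastball speeds, and it uses the commonly published FIP form, (13·HR + 3·(BB + HBP) − 2·K)/IP plus a league constant. The constant of 3.10 and all function names are placeholders of my own.

```python
from statistics import mean

# Placeholder league constant; the real FIP constant is recalibrated each season.
FIP_CONSTANT = 3.10

def first_n_speed_diff(speeds_2008, speeds_2007, n=100):
    """Mean speed (mph) of the first n 2008 fastballs minus the 2007 mean.

    speeds_2008 must be in the order the pitches were thrown.
    """
    if len(speeds_2008) < n:
        raise ValueError("pitcher has not reached the fastball cutoff")
    return mean(speeds_2008[:n]) - mean(speeds_2007)

def fip(hr, bb, hbp, k, ip, constant=FIP_CONSTANT):
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.

    ip is true innings (180 2/3 innings = 180.667, not the box-score 180.2).
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

def fip_minus_projection(hr, bb, hbp, k, ip, projected_fip):
    """Rest-of-season FIP minus projected FIP; negative = beat the projection."""
    return fip(hr, bb, hbp, k, ip) - projected_fip
```

The rest-of-season counting stats passed to `fip` would be those accumulated from the game after the 100th fastball onward.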

The second column is how much faster (or slower), in mph, the pitcher's first 100 2008 fastballs were compared to his 2007 fastballs; a positive number is a faster fastball in 2008. The third is FIP minus projected FIP. As with ERA, a low FIP is good, so a negative difference means the pitcher outperformed his projection.

+---------------------+---------------+----------------+
| Name                | FB speed diff | FIP - proj FIP |
+---------------------+---------------+----------------+
| Ervin Santana       |          2.28 |          -1.16 |
| Tim Lincecum        |          1.65 |          -0.83 |
| Josh Beckett        |          1.36 |          -0.45 |
| John Maine          |          1.07 |           0.10 |
| Santiago Casilla    |          1.06 |           0.92 |
| Wandy Rodriguez     |          0.96 |          -0.84 |
| Manny Delcarmen     |          0.89 |          -0.86 |
| Wilfredo Ledezma    |          0.82 |          -0.05 |
| Shaun Marcum        |          0.79 |          -0.26 |
| Leo Nunez           |          0.77 |           0.05 |
+---------------------+---------------+----------------+
| Francisco Rodriguez |         -2.34 |           0.05 |
| Mike Mussina        |         -2.34 |          -1.37 |
| Daniel Cabrera      |         -2.49 |           0.82 |
| Brad Lidge          |         -2.51 |          -1.10 |
| Jeff Suppan         |         -2.61 |           0.80 |
| Oliver Perez        |         -2.81 |           0.21 |
| Chris Young         |         -3.42 |           0.41 |
| Bob Howry           |         -3.89 |           0.84 |
| Cole Hamels         |         -3.90 |           0.15 |
| Heath Bell          |         -4.01 |           0.30 |
+---------------------+---------------+----------------+

Although there is considerable variation, seven of the ten pitchers with the largest increases in fastball speed outperformed their projections, and eight of the ten with the largest decreases underperformed theirs. In addition, the top two had two of the biggest breakout pitching performances of last year, and you could have seen it just 100 fastballs into the season. Of course the trend is not perfect: 100 fastballs into the season, Brad Lidge, Mike Mussina, Hamels and Francisco Rodriguez were way below their 2007 averages, and they all had great seasons (although Hamels's and Rodriguez's performances were slightly worse than projected). Here are the results for all pitchers.
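The seven-of-ten and eight-of-ten counts can be checked directly against the table above. This small sketch just copies the FIP-minus-projection column into two dictionaries and counts signs; the variable names are my own.

```python
# FIP minus projected FIP for the ten biggest fastball-speed gainers and
# decliners, copied from the table above (negative = beat the projection).
gainers = {
    "Ervin Santana": -1.16, "Tim Lincecum": -0.83, "Josh Beckett": -0.45,
    "John Maine": 0.10, "Santiago Casilla": 0.92, "Wandy Rodriguez": -0.84,
    "Manny Delcarmen": -0.86, "Wilfredo Ledezma": -0.05, "Shaun Marcum": -0.26,
    "Leo Nunez": 0.05,
}
decliners = {
    "Francisco Rodriguez": 0.05, "Mike Mussina": -1.37, "Daniel Cabrera": 0.82,
    "Brad Lidge": -1.10, "Jeff Suppan": 0.80, "Oliver Perez": 0.21,
    "Chris Young": 0.41, "Bob Howry": 0.84, "Cole Hamels": 0.15,
    "Heath Bell": 0.30,
}

# Gainers who beat their projection, decliners who fell short of theirs.
outperformed = sum(1 for d in gainers.values() if d < 0)
underperformed = sum(1 for d in decliners.values() if d > 0)
print(f"{outperformed} of {len(gainers)} gainers beat their projection")
print(f"{underperformed} of {len(decliners)} decliners fell short of theirs")
```

Running it prints "7 of 10 gainers beat their projection" and "8 of 10 decliners fell short of theirs".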

fip_sp.png

The relationship is very significant (p < 0.01) but explains little of the variation (r² = 0.05). The equation for the best-fit line is y = -0.24 - 0.15x, where x is the difference in fastball speed (first 100 2008 fastballs minus 2007 fastballs) and y is rest-of-2008 FIP minus projected FIP. So each one-mph increase is worth a 0.15 decrease in FIP (and each one-mph decrease is worth a 0.15 increase). Also, if a pitcher throws his first 100 fastballs of the season just as fast as he threw all of last season (x = 0), you expect him to outperform his projection by almost 0.25 runs. If you thought going into the season he was a 4.00 FIP (or ERA) pitcher and his first 100 fastballs are just as fast as his fastballs the year before, you would expect him to be a 3.75 FIP (or ERA) pitcher. But there is so much unexplained variation (95%, in fact) that this pitcher could end up performing very well or very poorly.
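To make the arithmetic concrete, here is a minimal sketch that plugs numbers into the fitted line y = -0.24 - 0.15x. The function names are my own; the coefficients are the ones reported above, and with r² = 0.05 the output is only a rough central estimate for any single pitcher.

```python
# Best-fit line from the post: y = -0.24 - 0.15x, where x is the mph change in
# fastball speed and y is rest-of-season FIP minus projected FIP.
INTERCEPT = -0.24
SLOPE = -0.15

def expected_fip_change(speed_diff_mph):
    """Predicted FIP minus projected FIP for a given fastball speed change."""
    return INTERCEPT + SLOPE * speed_diff_mph

def expected_fip(projected_fip, speed_diff_mph):
    """Projection adjusted by the fitted line (r^2 = 0.05, so only a rough guide)."""
    return projected_fip + expected_fip_change(speed_diff_mph)

print(round(expected_fip(4.00, 0.0), 2))   # no change in fastball speed
print(round(expected_fip(4.00, 2.0), 2))   # two mph faster than last season
```

The first call gives 3.76, the "roughly 3.75" in the text; the second gives 3.46.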

So, although the trend is significant, there is so much unexplained variation that I would say that with just the speed of the first 100 fastballs we don't know much more than before. But that will not stop me from posting this season's leaders and laggards in fastball speed difference. Some of these pitchers have not yet reached the 100-fastball cutoff used in the above analysis. Remember that someone at the top of the list could end up with very poor performance relative to projection, like Santiago Casilla last year, and a pitcher at the bottom could end up like Mussina.

 Greatest difference between 2009 fastball speed thus far and 2008 fastball speed (mph)

+-------------------+-----------+--------+
| Name              | Fastballs |   Diff |
+-------------------+-----------+--------+
| Todd Coffey       |        61 |   1.93 |
| Justin Verlander  |       119 |   1.81 |
| Kevin Correia     |       109 |   1.23 |
| Jonathan Sanchez  |        74 |   1.14 |
| Josh Johnson      |       163 |   1.14 |
| Matt Albers       |        55 |   1.13 |
| Chris Volstad     |       117 |   1.09 |
| Adam Eaton        |        55 |   1.09 |
| Armando Galarraga |        97 |   0.98 |
| Jason Marquis     |       105 |   0.94 |
+-------------------+-----------+--------+
| Geoff Geary       |        63 |  -2.04 |
| Matt Harrison     |        59 |  -2.05 |
| Daniel Cabrera    |       131 |  -2.25 |
| Manny Delcarmen   |        68 |  -2.26 |
| Oliver Perez      |       126 |  -2.39 |
| Joe Saunders      |       128 |  -2.44 |
| Daisuke Matsuzaka |        62 |  -2.44 |
| Hideki Okajima    |        55 |  -2.66 |
| Dana Eveland      |        91 |  -2.88 |
| Dennis Sarfate    |        67 |  -3.12 |
+-------------------+-----------+--------+

With all those caveats, I will still venture that the pitchers at the top of the list will, as a group, outperform their projections and the pitchers at the bottom will underperform theirs. It will be interesting to see if any of the names at the top of this list turn out to be this season's Tim Lincecum or Ervin Santana.

Sorry this post was a little light on visualizations. I promise my next post will make up for it.

F/X VisualizationsApril 10, 2009
Checking in on Seattle's New Outfield
By Dave Allen

With about half a week's worth of games played, I wanted to check in on a major story from the offseason: the increasing importance teams put on defense when acquiring players. We saw some all-hit, no-glove guys get much smaller contracts than expected, and we saw the Seattle Mariners trade for Franklin Gutierrez and Endy Chavez, two defensive standouts not known for their offense, and promptly make them two-thirds of their starting outfield. The outfield hasn't reached its full defensive glory yet because Ichiro is on the DL for a couple more days. But in the first few games the Ms still started a pretty good outfield, with Gutierrez and Chavez playing every game and the third spot given to one of Ken Griffey Jr., Wladimir Balentien and Ronny Cedeno.

Their play has already received rave reviews from Ms fans, so I wanted to see just how good it has been. Small sample size be damned, I thought I would check it out.

Again I am using Peter Jensen's Gameday defense metric as my guide (and his invaluable translation factors as my tool). In this case I took all balls in play at the Metrodome from 2005 to 2008 and computed the out percentage (1 - BABIP) by location; those are the colors in the image. Over that I plotted all the non-home-run fly balls and line drives that Seattle's outfield saw in its first series: filled circles are hits and open circles are outs. Now you can compare how Seattle's outfield did versus the average outfield at the Metrodome. A filled circle in the middle of blue is a hit in a location that most outfields turn into an out, and an open circle in yellow/red is an out that most outfields would let drop in for a hit.
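The background map (out percentage by location) amounts to binning balls in play on a grid and taking outs divided by balls in play in each cell. Below is a minimal sketch of that step, not Jensen's actual method: it assumes locations are already in feet relative to home plate (x toward the first-base side, y toward center field), and the 10-foot cell size, 450-foot extent and function name are my own choices.

```python
import numpy as np

def out_rate_grid(x_ft, y_ft, is_out, cell_ft=10.0, extent_ft=450.0):
    """Out percentage (1 - BABIP) of balls in play, binned by fielded location.

    x_ft, y_ft: fielded location in feet from home plate (assumed convention:
    +x toward the first-base side, +y toward center field).
    is_out: 1 if the ball in play was turned into an out, else 0.
    Returns (rate, x_edges, y_edges); cells with no balls in play are NaN.
    """
    x_ft = np.asarray(x_ft, dtype=float)
    y_ft = np.asarray(y_ft, dtype=float)
    weights = np.asarray(is_out, dtype=float)
    x_edges = np.arange(-extent_ft, extent_ft + cell_ft, cell_ft)
    y_edges = np.arange(0.0, extent_ft + cell_ft, cell_ft)
    total, _, _ = np.histogram2d(x_ft, y_ft, bins=[x_edges, y_edges])
    outs, _, _ = np.histogram2d(x_ft, y_ft, bins=[x_edges, y_edges],
                                weights=weights)
    with np.errstate(invalid="ignore", divide="ignore"):
        rate = outs / total  # 0/0 in empty cells becomes NaN
    return rate, x_edges, y_edges
```

In practice the raw cell rates would also need smoothing before plotting, since many outfield cells see only a handful of balls per season.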

seattle_out.png

The Mariners' outfield looks pretty good: a couple of hits in the blue/green region (one of those in right is Griffey's fault), but a ton of outs in the yellow/green region. As a quick check I added up the expected number of outs and compared that to the number the Mariners actually made. There have been 40 balls in play to Seattle's outfield so far, and the average outfield would make 21.75 outs on them. The Mariners made 25 outs. They are 3.25 outs above average just four games into the season (how many over Raul?).
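The quick check above is a sum: each ball in play is worth the historical out rate at its location, and the outfield's credit is actual outs minus that expectation. Here is a minimal sketch, assuming the per-ball out probabilities have already been looked up from the park's map; the function name and the four-ball example are hypothetical.

```python
def outs_above_average(out_probs, caught):
    """Compare actual outs with the outs an average outfield would make.

    out_probs: for each ball in play, the historical out rate at its location.
    caught: matching 0/1 flags (1 = the outfield turned it into an out).
    Returns (expected_outs, actual_outs, outs_above_avg, per_ball_credit).
    """
    if len(out_probs) != len(caught):
        raise ValueError("need one out probability per ball in play")
    # A catch earns 1 - p; a ball that drops in costs p.
    credit = [c - p for p, c in zip(out_probs, caught)]
    expected = sum(out_probs)
    actual = sum(caught)
    return expected, actual, actual - expected, credit

# Hypothetical four balls: an easy out, a tough catch, a makeable drop, a likely hit.
print(outs_above_average([0.9, 0.2, 0.5, 0.15], [1, 1, 0, 0]))
```

With the post's totals (21.75 expected, 25 actual) the same subtraction gives the +3.25 quoted above. The tough catch (p = 0.2) earns the most credit, which is the "open circle in yellow/red" case.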

Huge caveats apply here. 1) Jensen's translation factors that let you go from Gameday's pixels to feet sometimes change from year to year, and I am using the 2008 factors for the 2009 hits, so the location of the hits could be off by a couple of feet. 2) Gameday records where the ball is fielded, not where it lands, which would be more informative. 3) This should in no way be viewed as a substitute for or peer of the real fielding metrics. Once they come out, you can ignore these results.

F/X VisualizationsApril 07, 2009
Saying Goodbye
By Dave Allen

I know this post is supposed to be about opening day, but there was one more thing I wanted to do before turning my attention to the current season. Peter Jensen's amazing series on using the Gameday data to build a fielding metric prompted me to get that data and play around with it. The first thing I wanted to do was make a map of run value by hit location. It seems only right to present such maps for the two closed New York parks as a way of saying goodbye before really getting into the new season.

ny_parks

I used Jensen's hit factors to translate Gameday's pixel coordinates into feet, so the two images should be to scale. The run values should include all hits, outs, foul outs and home runs since 2005.
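A pixel-to-feet translation of this kind is a shift to home plate plus a scale factor. The sketch below shows only the shape of the transform: the home-plate pixel location (125, 200) and the 2.5 feet-per-pixel scale are hypothetical placeholders, not Jensen's factors, which differ by park and sometimes by year. It also assumes Gameday's y axis grows downward, so y is flipped to point toward center field.

```python
import math

def gameday_to_feet(px, py, home_px=125.0, home_py=200.0, feet_per_px=2.5):
    """Convert Gameday hit coordinates (pixels) to feet from home plate.

    home_px, home_py and feet_per_px are hypothetical placeholders; real
    translation factors are park-specific. The y axis is flipped so that
    +y points toward center field.
    """
    x_ft = feet_per_px * (px - home_px)
    y_ft = feet_per_px * (home_py - py)
    return x_ft, y_ft

def distance_ft(px, py, **factors):
    """Straight-line distance from home plate, in feet."""
    x_ft, y_ft = gameday_to_feet(px, py, **factors)
    return math.hypot(x_ft, y_ft)
```

Keeping the factors as keyword arguments makes it easy to swap in a different set per park and season, which matters for the year-to-year changes mentioned in the Seattle post.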

F/X VisualizationsApril 06, 2009
Does the Umpire Know the Count?
By Dave Allen

In my previous posts I have averaged over all counts, but intuitively and empirically we know that pitchers and batters behave differently in different counts: Joe Sheehan showed that pitch location and batters' swing rates vary by count, John Walsh that pitch type frequency does, and Jonathan Hale that the size of the called strike zone does. In this post I build on, combine, and present in a visual manner some of these previous results.

Below I reproduce the first panel from my posts deconstructing the run value maps, but here separated by count and averaged over pitch types. The heat map is the batter swing rate, the percentage of pitches in a given location that the batter swings at. Over that are the 25%, 50% and 75% strike contours for taken pitches. This means taken pitches inside the smallest contour are called strikes over 75% of the time, pitches between the smallest and middle contours are called strikes between 50% and 75% of the time, and so on. The strike zone is called differently to RHBs and LHBs, so I restricted this analysis to RHBs.
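Both layers of each panel are simple per-cell rates: swings over all pitches for the heat map, and called strikes over taken pitches for the contours. Here is a minimal, unsmoothed sketch. It assumes plate locations in feet (pitch f/x's `px` and `pz` fields) and 0/1 flags for swings and called strikes; the function name and bin edges are my own. Contours like the 25%/50%/75% lines could then be drawn from the second grid with a contour-plotting routine.

```python
import numpy as np

def swing_and_strike_maps(px, pz, swung, called_strike, bins):
    """Per-cell swing rate and called-strike rate of taken pitches.

    px, pz: plate location in feet (horizontal, vertical).
    swung: 1 if the batter swung at the pitch.
    called_strike: 1 if a taken pitch was called a strike (ignored on swings).
    bins: (x_edges, z_edges). Empty cells come back as NaN.
    """
    px = np.asarray(px, dtype=float)
    pz = np.asarray(pz, dtype=float)
    swung = np.asarray(swung, dtype=bool)
    taken = ~swung
    strike = taken & np.asarray(called_strike, dtype=bool)

    n_all, _, _ = np.histogram2d(px, pz, bins=bins)
    n_swing, _, _ = np.histogram2d(px[swung], pz[swung], bins=bins)
    n_taken, _, _ = np.histogram2d(px[taken], pz[taken], bins=bins)
    n_strike, _, _ = np.histogram2d(px[strike], pz[strike], bins=bins)

    with np.errstate(invalid="ignore", divide="ignore"):
        return n_swing / n_all, n_strike / n_taken
```

Splitting by count, as in the panels below, just means calling this once per ball-strike combination on the matching subset of pitches.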

strike_0 strike_1 strike_2

Swing Rate

Batters swing more as the number of strikes increases (going down a column). In favorable counts batters swing slightly more inside, but that tendency is lost in pitchers' counts. To see the trends in swing rate better, I averaged over all locations in and out of the strike zone (using the 50% strike contour, not the rule book zone).

in_swing
 Swing rate inside the zone
+-----------+---------+---------+---------+---------+
|           | 0 Balls |  1 Ball | 2 Balls | 3 Balls |
+-----------+---------+---------+---------+---------+
| 0 Strikes |   0.405 |   0.587 |   0.559 |   0.096 |
| 1 Strike  |   0.727 |   0.762 |   0.795 |   0.742 |
| 2 Strikes |   0.850 |   0.880 |   0.898 |   0.927 |
+-----------+---------+---------+---------+---------+
out_swing
 Swing rate outside the zone
+-----------+---------+---------+---------+---------+
|           | 0 Balls |  1 Ball | 2 Balls | 3 Balls |
+-----------+---------+---------+---------+---------+
| 0 Strikes |   0.171 |   0.249 |   0.232 |   0.049 |
| 1 Strike  |   0.330 |   0.350 |   0.385 |   0.325 |
| 2 Strikes |   0.414 |   0.478 |   0.484 |   0.568 |
+-----------+---------+---------+---------+---------+

There is no uniformly increasing or decreasing swing rate trend with the number of balls like there is with the number of strikes. Batters swing at roughly the same rate with one and two balls, and less often than that with zero or three balls. But the size of this effect varies a lot with the number of strikes: it is very pronounced with no strikes and quite small with one or two. Interestingly, batters swing more in 3-2 counts than in 2-2 counts (or any other count, for that matter), which runs counter to the above trend. Intuitively this seems like a mistake on the part of batters, and it would be interesting to see if this is the case, perhaps taking a game-theoretic approach like iamawesomer recently did.
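The patterns described above can be checked mechanically against the two swing rate tables. This sketch copies the tables into arrays (rows are strikes, columns are balls) and tests three things: swing rate rises with every added strike, the ball effect is largest with no strikes, and the overall maximum falls in the 3-2 count. The variable names are my own.

```python
import numpy as np

# Rows: 0, 1, 2 strikes. Columns: 0, 1, 2, 3 balls. Copied from the tables above.
inside = np.array([[0.405, 0.587, 0.559, 0.096],
                   [0.727, 0.762, 0.795, 0.742],
                   [0.850, 0.880, 0.898, 0.927]])
outside = np.array([[0.171, 0.249, 0.232, 0.049],
                    [0.330, 0.350, 0.385, 0.325],
                    [0.414, 0.478, 0.484, 0.568]])

for name, table in (("inside", inside), ("outside", outside)):
    # True if swing rate increases with each added strike in every ball column.
    rises_with_strikes = bool(np.all(np.diff(table, axis=0) > 0))
    # Spread of swing rate across ball counts within each strike row.
    ball_spread = table.max(axis=1) - table.min(axis=1)
    strikes, balls = np.unravel_index(table.argmax(), table.shape)
    print(f"{name}: rises with strikes={rises_with_strikes}, "
          f"ball spread by strikes={np.round(ball_spread, 3)}, "
          f"max at {balls}-{strikes}")
```

For both tables the check reports a strike-driven increase, the widest ball spread in the 0-strike row, and a maximum at 3-2.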

Strike Zone

The size of the strike zone changes dramatically in the way that Hale previously demonstrated: as the number of strikes increases the strike zone shrinks, and as the number of balls increases the strike zone expands. One thing we can do here, beyond Hale's original analysis, is see where this expansion and contraction take place. As the number of balls increases, the top of the strike zone gets higher and the bottom lower, but the outside and inside edges do not change very much. As the number of strikes increases, the inside edge moves in slightly, but most of the change is the top moving down and the bottom moving up. So most of the change is a vertical, not horizontal, expansion or contraction of the zone.

In addition, this analysis allows us to measure just how big the strike zone is in each count. The measurements below are in square feet. (In the image, the number of strikes runs in the opposite direction from the swing rate images.)

zone_area
 Area of the strike zone (sq ft)
+-----------+---------+---------+---------+---------+
|           | 0 Balls |  1 Ball | 2 Balls | 3 Balls |
+-----------+---------+---------+---------+---------+
| 0 Strikes |    3.01 |    3.02 |    3.18 |    3.26 |
| 1 Strike  |    2.46 |    2.59 |    2.71 |    2.74 |
| 2 Strikes |    2.06 |    2.34 |    2.45 |    2.49 |
+-----------+---------+---------+---------+---------+
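One way to turn a called-strike map into areas like those in the table is to count the grid cells at or above the 50% contour and multiply by the cell area. The sketch below assumes a regular grid of called-strike rates with known cell dimensions in feet; it is an approximation of my own, not necessarily how the areas above were computed, and finer cells give a closer estimate of the contour's true area.

```python
import numpy as np

def zone_area(strike_prob, cell_w_ft, cell_h_ft, threshold=0.5):
    """Approximate area (sq ft) where taken pitches are called strikes at
    least `threshold` of the time, by counting cells of a regular grid.

    strike_prob: 2-D grid of called-strike rates; NaN (empty) cells are
    treated as outside the zone.
    """
    p = np.nan_to_num(np.asarray(strike_prob, dtype=float), nan=0.0)
    return float(np.count_nonzero(p >= threshold)) * cell_w_ft * cell_h_ft
```

For the table itself, the largest zone (3.26 sq ft, 3-0) divided by the smallest (2.06 sq ft, 0-2) is about 1.58.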

There is a substantial change; at its largest the strike zone is over 1.5 times the size of the zone at its smallest. But are these changes statistically significant? I noted in a past post that different pitch types seemed to be called differently, and we know that the frequency of pitch types thrown differs by count. So maybe the changes we see are an interaction of these two facts. For example, 3-0 pitches are overwhelmingly fastballs; maybe umpires call a larger strike zone for fastballs than for other pitches, and the differences we see are driven not by count but by pitch type.

To address this, and the overall significance of the zone size changes, I ran a binomial logistic regression. This is a regression in which the dependent variable takes only two values, in this case 1 if a taken pitch is called a strike and 0 if it is called a ball. The dependent variable is regressed against any number of numeric and/or categorical explanatory variables. I regressed strike/ball against horizontal distance from the middle of the zone (in inches), vertical distance from the middle of the zone, the interaction of these two distances, length of pitch break (in inches), the number of strikes, the number of balls and the pitch type (the analysis uses fastballs as the baseline and compares the other pitches to them). I used x distance, y distance and their interaction rather than just overall distance so the strike zone isn't forced to be a circle.
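For readers who want to try a regression like this, here is a self-contained sketch that fits a binomial logistic regression by Newton-Raphson (equivalently, iteratively reweighted least squares) and returns estimates, standard errors and z values like the columns in the table below. It is not the software used for this post. The column layout mirrors the table; treating the distances as absolute values and the pitch-type codes "CH", "CU", "SL" (with "FA" as the fastball baseline) are my own assumptions.

```python
import numpy as np

def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Binomial logistic regression by Newton-Raphson (IRLS).

    X: (n, k) design matrix whose first column is all ones (the intercept).
    y: (n,) array of 0/1 outcomes, 1 = taken pitch called a strike.
    Returns (estimates, standard errors, z values).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))        # current P(strike)
        w = p * (1.0 - p)                             # Bernoulli variances
        info = X.T @ (X * w[:, None])                 # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))   # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se, beta / se

def design_matrix(x_in, y_in, break_in, strikes, balls, pitch_type):
    """Columns in the order of the regression table; fastballs are the baseline."""
    x = np.abs(np.asarray(x_in, dtype=float))  # assumed: absolute distances (in.)
    y = np.abs(np.asarray(y_in, dtype=float))
    cols = [np.ones_like(x), x, y, x * y,
            np.asarray(break_in, dtype=float),
            np.asarray(strikes, dtype=float),
            np.asarray(balls, dtype=float)]
    types = np.asarray(pitch_type)
    for code in ("CH", "CU", "SL"):  # hypothetical pitch-type codes
        cols.append((types == code).astype(float))
    return np.column_stack(cols)
```

Usage would be `fit_logit(design_matrix(...), strike_flags)` on the taken pitches. A statistics package would give the same estimates; the hand-rolled version just makes the mechanics visible.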

 Binomial Logistic Regression
+-----------------+----------+------------+---------+------------+
|                 | Estimate | Std. Error | z Value |    P(>|z|) |
+-----------------+----------+------------+---------+------------+
| (Intercept)     |    7.887 |      0.050 |  157.72 |  < 2e-16 * |
| x dist.         |   -0.570 |      0.003 | -163.49 |  < 2e-16 * |
| y dist.         |   -0.693 |      0.004 | -173.08 |  < 2e-16 * |
| x*y Interaction |    0.029 |      0.000 |  111.84 |  < 2e-16 * |
| Break           |    0.027 |      0.005 |    5.51 |  3.6e-08 * |
| Num. Strikes    |   -0.575 |      0.013 |  -44.91 |  < 2e-16 * |
| Num. Balls      |    0.213 |      0.010 |   21.76 |  < 2e-16 * |
| Changeups       |    0.012 |      0.039 |    0.31 |     0.76   |
| Curves          |    0.037 |      0.049 |    0.77 |     0.44   |
| Sliders         |   -0.038 |      0.026 |   -1.43 |     0.15   |
+-----------------+----------+------------+---------+------------+

So the effect of count is indeed significant. In fact, all else equal, each strike in the count decreases the likelihood of a pitch being called a strike by about the same amount as the pitch being one inch farther horizontally from the center of the zone (the estimates are roughly equal). The number of balls is also significant, but its effect is less than half that of the number of strikes (you can see this in the strike zone area results above: area decreases more as strikes increase than it increases as balls increase). The length of break is also significant; pitches with lots of break are slightly more likely to be called strikes. Once we control for break and count, there is no significant difference in how the strike zone is called for different pitch types.
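The "one strike is worth about one inch" comparison can be seen by plugging the table's estimates into the logistic formula. This sketch assumes x and y are absolute distances in inches from the center of the zone and uses hypothetical pitch codes; the function name is my own. Because of the x*y interaction term, the one-inch equivalence holds best near the vertical middle of the zone: at a vertical distance y, each horizontal inch is worth 0.570 - 0.029y in log-odds.

```python
import math

# Estimates copied from the regression table above (fastball baseline).
COEF = {"intercept": 7.887, "x": -0.570, "y": -0.693, "xy": 0.029,
        "break": 0.027, "strikes": -0.575, "balls": 0.213,
        "CH": 0.012, "CU": 0.037, "SL": -0.038}

def p_called_strike(x_in, y_in, break_in=0.0, strikes=0, balls=0, pitch="FA"):
    """Model probability that a taken pitch is called a strike.

    Assumes x_in, y_in are absolute distances (inches) from the zone center.
    Pitch codes "CH", "CU", "SL" are hypothetical; anything else is a fastball.
    """
    x, y = abs(x_in), abs(y_in)
    logit = (COEF["intercept"] + COEF["x"] * x + COEF["y"] * y
             + COEF["xy"] * x * y + COEF["break"] * break_in
             + COEF["strikes"] * strikes + COEF["balls"] * balls
             + COEF.get(pitch, 0.0))
    return 1.0 / (1.0 + math.exp(-logit))

# Ten inches off center with no strikes versus nine inches off center with one strike.
print(round(p_called_strike(10, 0), 3), round(p_called_strike(9, 0, strikes=1), 3))
```

Both calls land at roughly 0.90, so adding a strike offsets moving the pitch one inch closer to the center.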

MLB is still interested in monitoring umpire performance, and this year it will replace QuesTec with a new Zone Evaluation system (which seems to be just the pitchf/x system). So I am sure MLB is aware, or will soon be aware, of the variable zone size based on count. I wonder if it is something they will try to change or if it is appreciated as part of the fabric of the game.

F/X VisualizationsMarch 30, 2009
Deconstructing the Non-Fastball Run Maps
By Dave Allen

In this post I continue, and finish, my series deconstructing the pitch-specific run value maps that I first presented here. In the first entry I broke down the different events that contributed to the run value maps for fastballs; here I will do the same for the remaining three pitches I looked at: curveballs, changeups and sliders.

Recall, from the fastball post, the methodology I use:

The run value of a pitch is determined by the outcome of four events.

  1. If the batter swings at the pitch or not.
  2. If no to 1, whether the taken pitch is called a ball or a strike.
  3. If yes to 1, whether the batter makes contact.
  4. If yes to 3, the run value of that contact.

Below I present a series of three images for each handedness combination that show how the outcomes of these four events vary by location for each pitch type. Reading left to right:

  • The first image addresses events 1 and 2. The heat map is the swing percentage by location, addressing 1. On top of that are three contour lines where 75%, 50% and 25% of taken pitches were called strikes, addressing 2. So if a batter took a pitch inside the smallest circle, it was called a strike over 75% of the time. If he took a pitch in the doughnut between the smallest and middle circles, it was called a strike between 50% and 75% of the time, and so on.
  • The second image addresses 3 showing the contact percentage of pitches swung at.
  • The final image addresses 4 showing the run value of a contacted pitch (including foul balls).
At the top of each image is the average value over all locations.

Since there are fewer curveballs, changeups and sliders than fastballs, I smoothed and regressed the data more to make the images below. Thus they are not as finely resolved as the fastball images but, I think, still convey the patterns well.
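The exact smoothing and regression used for these maps isn't described, so the sketch below shows one common way to "regress" sparse cells: add k pseudo-observations at the overall rate to each location bin, so cells with few pitches are pulled strongly toward the average while well-populated cells barely move. The function name and the choice of k are my own assumptions; a larger k means more regression, which is the trade-off behind the coarser non-fastball images.

```python
import numpy as np

def regress_to_mean(successes, trials, k):
    """Shrink per-cell rates toward the overall rate.

    successes, trials: per-cell counts (e.g. swings and pitches).
    k: number of pseudo-trials at the overall rate added to every cell
       (k > 0; a tuning choice, not a value from the original analysis).
    Cells with zero trials come back equal to the overall rate.
    """
    successes = np.asarray(successes, dtype=float)
    trials = np.asarray(trials, dtype=float)
    overall = successes.sum() / trials.sum()
    return (successes + k * overall) / (trials + k)
```

For example, with the overall rate at 0.5 and k = 10, a cell with 9 swings in 10 pitches is pulled from 0.9 down to 0.7.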

For each pitch I first present the original run value map. Recall the number at the top of each image is the percentage of time that pitch type is thrown in those at-bats.

Curveballs are thrown roughly equally in the different handedness combinations and have a large area of negative to zero run valued pitches below the strike zone.

cu_rr

Batters swing less at curveballs than at fastballs, and the swing map is much less coincident with the strike zone for curveballs than for fastballs. So batters take more curveballs for strikes and swing at more curveballs out of the zone than they do with fastballs. In addition, batters whiff more against curveballs than fastballs. But when they do make contact, the run value is positive, whereas contact against fastballs has a negative run value.

Batters tend to swing more at curveballs down and slightly away, but they make contact more often, and make better contact, on curveballs up and in. Most likely this is a result of the down-and-away break of curveballs. Pitches that do break (or break a lot) end up down and away, and batters miss them or make poor contact. Pitches that don't break (or don't break enough) end up up and in, and batters rarely miss them and make good contact.

Another interesting aspect of these images is how the strike zone is called for curveballs. The top, bottom and away edges are called in the same manner as they are for fastballs to RHBs, but the inside edge seems different. Recall that fastballs were called correctly along the inside edge, but for curveballs that edge is called considerably away (the 25% strike contour is inside the rule book edge). So umpires are calling inside fastballs strikes against RHBs, but not inside curveballs. I am not sure if this is a statistically significant difference, but I will look at that in a future post.

cu_lr

As expected, RHBs make more and better contact against curveballs from LHPs than against curveballs from RHPs. The orientation of the contact percentage gradient has shifted: it now runs from high up and away to low down and in. This is a result of LHPs' curveballs breaking in to RHBs.

cu_rl

The swing percentage and contact rates are similar to RHBvLHP, but the run value of contacted pitch is, strangely, much lower. The orientation of the contact percentage gradient is the same as the one we saw in RHBvRHP.

cu_ll

As with fastballs, lefties facing lefties have the lowest contact rate by a large margin. But surprisingly, the run value of contacted pitches is highest here, which was not the case for fastballs.

The orientation of the contact percentage gradient here looks like that seen in RHBvLHP, not like the one seen in LHBvRHP. With fastballs the contact percentage and run value location patterns were determined by the hitter (RHBvRHP was more similar to RHBvLHP than to LHBvRHP), but with curveballs it is the pitcher's handedness that determines the pattern (RHBvRHP is more similar to LHBvRHP than to RHBvLHP). It seems that the break of the pitch (determined by the handedness of the pitcher) matters more in determining these patterns than the inside/outside preference of the batter, which drove the fastball patterns.

Now we turn our attention to changeups. Here are the overall run value maps.

Changeups are thrown mostly in at-bats when the pitcher and batter have opposite handedness, so I will only present and comment on those images. But you can see the righty/righty one here and the lefty/lefty one here.

ch_lr ch_rl

Batters swing at changeups more than at either fastballs or curveballs, and the swing percentage map is more coincident with the strike zone contours for changeups than for fastballs and curveballs. This means batters take fewer changeups for strikes and swing at fewer changeups out of the zone than with the previous pitch types. The highest swing percentage is slightly away and down, rather than up and in as it is for fastballs.

Although batters swing at a lot of changeups and swing at the right pitches (in terms of the strike zone), they whiff on changeups at a relatively high rate. The highest contact rate and run value of contact are both up and in. Contacted pitches have a very slightly negative run value.

The strike zone for RHBs is called away on both the inside and outside edges and high on both the bottom and top edges. For lefties it is called away only on the outside edge and high only on the bottom edge. Again, I am not sure these are statistically significant differences.

Finally, looking at sliders, here are the overall run value maps.

Sliders are thrown mostly in at-bats when the pitcher and batter have the same handedness, so I will only present and comment on those images. But you can see the other ones here and here.

sl_rr sl_ll

In same-handed at-bats sliders are just nasty pitches. Batters swing at sliders slightly more often than at fastballs (less than at changeups and more than at curves). But they are swinging at the wrong pitches: the swing percentage map is considerably off from the strike zone (almost as badly as with curveballs). The whiff rate on sliders is enormous, considerably higher than for any other pitch type; only a small part of the zone, middle-in, has a contact rate of over 85%. And even when batters make contact, the result has a negative run value.


Wrapping Up

We are now in a position to make some broad statements about what makes the different pitch types successful.
  • Fastballs: With the exception of those directly above the strike zone, batters tend to swing at fastballs in the zone and take those out of it. They also whiff on fastballs at the lowest rate of any pitch. But contacted fastballs have very negative run values, the lowest of all pitches.
  • Curveballs: Batters routinely take curveballs in the strike zone and swing at a high rate at curveballs below it. They whiff at a moderate rate. But when they make contact the run value is positive and higher than for any other pitch.
  • Changeups: Batters tend to swing at changeups in the zone and take those out of it. But batters whiff against changeups at a moderate rate, and contacted changeups have slightly negative run values.
  • Sliders: They seem to combine the best aspects of each pitch: the swing rate map is only slightly more coincident with the strike zone than that for curveballs, the whiff rate is higher than for any other pitch, and contacted sliders have a negative run value (although not as low as contacted fastballs).

Below I present the overall run value per pitch, separated by pitch type, in a table and a figure. The figure also shows the standard errors.

 Run value per pitch
+------------+------------+------------+------------+------------+
| B/P hand   |  Fastballs | Curveballs |  Changeups |    Sliders |
+------------+------------+------------+------------+------------+
| RHB/RHP    |    -0.0032 |    -0.0009 |     0.0014 |    -0.0057 |
| RHB/LHP    |     0.0030 |     0.0031 |     0.0011 |     0.0056 |
| LHB/RHP    |     0.0034 |    -0.0008 |     0.0012 |     0.0013 |
| LHB/LHP    |    -0.0035 |     0.0005 |     0.0003 |    -0.0092 |
+------------+------------+------------+------------+------------+
error_bar.png

Fastballs and sliders show a statistically significant platoon split: the run value is significantly lower when the pitcher and batter have the same handedness than when they have different handedness. For sliders this fits their usage pattern, as they are thrown more in at-bats when the batter and pitcher have the same handedness. You can also see here just how nasty sliders are to same-handed batters: significantly lower than any other pitch.

Curveballs are interesting: there is no significant platoon split, and there is a trend (although not significant) for curveballs from LHPs to have higher run value outcomes than curveballs from RHPs. This is strange, as lefties throw curveballs more often than righties.

Changeups show no statistically significant platoon split, which, again, is in line with what we expect based on their usage pattern: they are mostly thrown in opposite-handed at-bats, when fastballs or sliders would have a relatively higher run value.
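The size of each platoon split can be read off the run value table with a little arithmetic. This sketch averages the two opposite-handed rows and the two same-handed rows for each pitch (unweighted, since pitch counts per matchup aren't given) and subtracts; a positive number means the pitch yields fewer runs in same-handed matchups. The standard errors aren't in the table, so this shows the size of the splits only, not whether they are significant. The pitch codes and names are my own.

```python
# Run value per pitch from the table above, keyed by (batter hand, pitcher hand).
RV = {
    ("R", "R"): {"FA": -0.0032, "CU": -0.0009, "CH": 0.0014, "SL": -0.0057},
    ("R", "L"): {"FA": 0.0030, "CU": 0.0031, "CH": 0.0011, "SL": 0.0056},
    ("L", "R"): {"FA": 0.0034, "CU": -0.0008, "CH": 0.0012, "SL": 0.0013},
    ("L", "L"): {"FA": -0.0035, "CU": 0.0005, "CH": 0.0003, "SL": -0.0092},
}

def platoon_split(pitch):
    """Unweighted mean run value in opposite-handed at-bats minus same-handed.

    Positive = the pitch yields fewer runs when batter and pitcher share a hand.
    """
    same = [v[pitch] for (bat, pit), v in RV.items() if bat == pit]
    opp = [v[pitch] for (bat, pit), v in RV.items() if bat != pit]
    return sum(opp) / len(opp) - sum(same) / len(same)

for pitch in ("FA", "CU", "CH", "SL"):
    print(pitch, round(platoon_split(pitch), 5))
```

Sliders show the largest gap and changeups the smallest, consistent with the discussion above.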

This analysis has some serious limitations. I am using the MLB pitch classifications, which are far from perfect. There has been some work on developing better classification algorithms and I hope to incorporate one such algorithm in my future analysis. The pitches in this analysis are averaged over all pitch speeds and breaks, which is a major limitation. Just recently Dan Turkenkopf looked at how pitch speed impacted at-bat outcomes, and it would be interesting to see how pitch speed affects at-bat outcomes for each pitch type separately. Finally I average over all pitch counts. My next post will begin to address this last concern.

F/X VisualizationsMarch 23, 2009
Deconstructing the Fastball Run Value Map
By Dave Allen

In a previous post I presented a map showing the run value of a fastball based on its location. In this post I will examine that map in more depth. Consider the two locations, A and B, in the figure below.

immarked1

These locations have about the same run value, just below 0, but for different reasons. Taken pitches at location A are called strikes, while taken pitches at location B are called balls. For the two locations to have the same run value, pitches swung at in location A must have, on average, higher run value outcomes than pitches swung at in B. Not brain surgery so far: swinging at fastballs down the middle is better than swinging at fastballs a foot above the strike zone. We could try to explain the rest of the pattern intuitively in a similar manner, but why guess when we have the data to explain it properly? I will present that data in this post.

The run value of a pitch is determined by the outcome of four events.

  1. If the batter swings at the pitch or not.
  2. If no to 1, whether the taken pitch is called a ball or a strike.
  3. If yes to 1, whether the batter makes contact.
  4. If yes to 3, the run value of that contact.
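The four events combine into an expected run value for a location as a probability-weighted average. Here is a minimal sketch of that bookkeeping. The function name is my own, and it assumes a swinging miss carries the same run value as a called strike; the example inputs are hypothetical numbers loosely shaped like location B above the zone, not values measured in this post.

```python
def expected_run_value(p_swing, p_contact, p_called_strike,
                       rv_contact, rv_strike, rv_ball):
    """Expected run value of a pitch at one location, built from the four events.

    p_swing:          event 1, probability the batter swings
    p_called_strike:  event 2, probability a taken pitch is called a strike
    p_contact:        event 3, probability a swing makes contact
    rv_contact:       event 4, average run value of contact (fouls included)
    rv_strike, rv_ball: run values of a strike and a ball; a swinging miss is
                      assumed to be worth the same as a called strike.
    """
    swung = p_contact * rv_contact + (1.0 - p_contact) * rv_strike
    taken = p_called_strike * rv_strike + (1.0 - p_called_strike) * rv_ball
    return p_swing * swung + (1.0 - p_swing) * taken

# Hypothetical inputs for a spot above the zone: frequent swings, ~70% contact,
# negatively valued contact, and nearly every taken pitch called a ball.
rv_b = expected_run_value(p_swing=0.6, p_contact=0.7, p_called_strike=0.05,
                          rv_contact=-0.04, rv_strike=-0.06, rv_ball=0.06)
print(round(rv_b, 4))
```

With these made-up inputs the negative swing outcomes slightly outweigh the positive value of the taken balls, which is the same mechanism the post uses to explain location B.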

Below I present a series of three images for each handedness combination that show how the outcomes of these four events vary by location for fastballs. Reading left to right:

  • The first image addresses events 1 and 2. The heat map is the swing percentage by location, addressing 1. On top of that are three contour lines where 75%, 50% and 25% of taken pitches were called strikes, addressing 2. So if a batter took a pitch inside the smallest circle, it was called a strike over 75% of the time. If he took a pitch in the doughnut between the smallest and middle circles, it was called a strike between 50% and 75% of the time, and so on.
  • The second image addresses 3 showing the contact percentage of pitches swung at.
  • The final image addresses 4 showing the run value of a contacted pitch (including foul balls).
At the top of each image is the average value over all locations.

fa_rr

There is a lot going on in this series of images, and they might be intimidating at first. My suggestion is to focus on the leftmost image, spend some time looking at it, and once you understand it move on to the next. Do the same with the middle one before moving on to the rightmost one.

With these images we can better explain the pattern in the overall fastball run value map. Consider location B in the first graph, the area of slightly negative run valued fastballs above the strike zone. Batters swing at pitches in this location over 50% of the time and make contact only around 70% of the time, and the result of that contact is negatively valued. So the swung-at pitches have a fairly low, negative run value. The taken pitches are almost all called balls (this location is outside the largest strike contour), and balls have a very high positive run value. The result is the slightly negative value we see in the first image. Similar explanations can be made for any part of the run value map.

The region of highest swing percentage overlaps with the regions of highest contact percentage and run value of contacted pitches, and the 75% called strike contour, but is not entirely coincident with any of these. This means that hitters are not making entirely optimal swing decisions based on their ability to make contact, the value of that contact or how the strike zone is called.1

Contact percentage and run value of contacted pitches both reach their maximum slightly down and in from the center of the zone. But the overall regions of high contact percentage and run value of contacted pitches are not exactly the same. The region of high contact percentage is a diagonal swath from the top-in corner of the zone to the middle of the bottom of the zone. The region of high run value of contacted pitches is a diagonal swath from the bottom-in corner of the zone to the middle of the top of the zone.

Another interesting result is how the called strike zone compares to the rulebook strike zone. The inside and the top of the zone are called fairly well (the 50% contour runs along the rulebook zone on these edges), but the outside edge is shifted away a couple of inches (the 75% contour runs along the rulebook zone's outside edge) and the bottom of the zone is shifted significantly up (the 25% contour is ABOVE the bottom edge). In addition, the strike zone is rounded rather than rectangular. These results are not new. John Walsh, David Pinto and Jonathan Hale have each shown all or some of these before, but it is nice to see that my analysis reproduces their results.

[Figure: fastballs, RHB vs LHP]

For the most part these are quite similar to the righty/righty images. One interesting thing we can address with these images is why RHBs do better against LHPs than RHPs. First, compare the location of the highest swing percentage relative to the strike contours in the RHB vs LHP and RHB vs RHP images. In the RHB vs LHP images it is much more coincident with the called zone along the horizontal axis, although it is still too high along the vertical axis. That means RHBs swing at more pitches in the called strike zone and take more pitches outside it against lefties than against righties, which begins to explain their success. In addition, RHBs have a higher contact percentage and a higher run value on contacted pitches versus LHPs than versus RHPs. So righties are better at each component of the at-bat against LHPs than against RHPs.

[Figure: fastballs, LHB vs RHP]

These are almost mirror images of RHB vs LHP above, and the overall averages are very close. It is interesting to see how the strike zone is called differently for LHBs. The top is called well and the bottom is called very high, just as for RHBs. The outside edge is shifted away as it is for RHBs, but that shift is larger, with the 75% contour extending outside the rulebook zone. The inside of the zone is also shifted toward the outside a couple of inches (the 25% contour runs along the rulebook edge), which was not the case for RHBs. Walsh and Pinto also observed these results.

[Figure: fastballs, LHB vs LHP]

While LHBs' success against RHPs is very similar to RHBs' success against LHPs, LHBs fare much worse against LHPs than RHBs do against RHPs. Lefties swing at even more pitches outside the called zone, take more pitches inside the zone, and make less frequent and poorer contact against LHPs than RHBs do against RHPs.

Overall I was very surprised to see that in every case the average run value of a contacted fastball is negative. This is probably because I included foul balls in this group, but it is still surprising.
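A quick way to see how fouls could pull the average below zero: the contacted-pitch value pools fouls and balls in play. The counts and run values below are hypothetical, picked only to illustrate the arithmetic, not measured from the data.

```python
def mean_contact_rv(n_foul, foul_rv, n_in_play, in_play_rv):
    """Average run value over contacted pitches, fouls and balls in play pooled."""
    total = n_foul + n_in_play
    return (n_foul * foul_rv + n_in_play * in_play_rv) / total


# Hypothetical split: half of contact is fouled off at a strike-like negative
# value (two-strike fouls would actually be worth zero), and half is put in
# play at a modestly positive value.
print(round(mean_contact_rv(50, -0.04, 50, 0.02), 3))  # -0.01
```

Even when balls in play are worth something positive on average, a large enough share of fouls valued like strikes drags the pooled mean negative.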

With these images one can understand the fastball run value maps in this post. Now if you go back, look at these maps and see something surprising, you can use the images presented here to understand what is going on.

In future posts I will present similar images for the other pitch types.



1. Brian Cartwright made the following comment in this post:
One idea I never followed thru on is first identify hr% by location (and pitch type and count), as you have done here, then for each hitter (his favorite zones and pitches to go deep) then finally see how well each player recognizes the mashable pitches - what are the swing% for batters when they see a pitch in the best hitting zone? My opinion is that Barry Bonds and Brian Giles hit a high pct of homers because of superior pitch recognition, and putting the bat on the ball when they swung, not because of hitting the ball an extra-ordinary distance.
This suggests an interesting way of evaluating batters: how well does their swing percentage map coincide with their home run rate map, contact percentage map or run value of contacted pitches map? It would be interesting to see whether Giles' region of highest swing percentage is more in line with his region of highest run value than the average hitter's, presented above.
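One simple way to put a number on how well two maps coincide (a swapped-in metric, not one used in this post) is to take the top quarter of grid cells in each map and compute the Jaccard overlap of those two regions. The grids here are toy examples standing in for a swing percentage map and a run value map on the same location grid.

```python
def top_region(grid, quantile):
    """Set of (row, col) cells whose value is at or above the given quantile."""
    values = sorted(v for row in grid for v in row)
    cutoff = values[int(quantile * (len(values) - 1))]
    return {(i, j) for i, row in enumerate(grid)
            for j, v in enumerate(row) if v >= cutoff}


def region_overlap(grid_a, grid_b, quantile=0.75):
    """Jaccard overlap (0 = disjoint, 1 = identical) of the two top regions."""
    a, b = top_region(grid_a, quantile), top_region(grid_b, quantile)
    return len(a & b) / len(a | b)


# Toy 3x3 maps: the swing map peaks one cell to the left of the value map.
swing_map = [[1, 2, 3], [9, 8, 4], [5, 6, 7]]
value_map = [[1, 2, 3], [4, 9, 8], [5, 6, 7]]
print(region_overlap(swing_map, value_map))  # 0.5
```

A hitter whose swing region lines up better with his own high-value region would score closer to 1. The quartile cutoff is an arbitrary choice.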

F/X VisualizationsMarch 17, 2009
Home Run Rate by Pitch Location
By Dave Allen

So far I have looked at the run value of a pitch based on its location as it passes the batter's plane. Today I am going to take a slightly prosaic break from that and look at everyone's favorite contributor to run value: the home run. Below are maps of HR rate per pitch by pitch location. Again I average over pitch type, count and speed, so there are some obvious limitations to the analysis. The number presented at the top of each figure is the average HR rate per pitch.
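The quantity mapped here is home runs divided by pitches at each location. The maps themselves are smooth; the sketch below uses square bins instead, a simpler stand-in, and assumes each pitch is a `(px, pz, is_hr)` tuple with `px`/`pz` in feet, as in the PITCHf/x location fields. The 0.5 ft bin size is arbitrary.

```python
import math
from collections import defaultdict


def hr_rate_grid(pitches, cell=0.5):
    """Binned HR rate per pitch.

    pitches: iterable of (px, pz, is_hr). Returns ({(col, row): rate}, overall_rate).
    """
    counts = defaultdict(int)
    hrs = defaultdict(int)
    total = total_hr = 0
    for px, pz, is_hr in pitches:
        key = (math.floor(px / cell), math.floor(pz / cell))  # square bin index
        counts[key] += 1
        hrs[key] += int(is_hr)
        total += 1
        total_hr += int(is_hr)
    rates = {k: hrs[k] / counts[k] for k in counts}
    return rates, total_hr / total
```

The second return value corresponds to the number shown at the top of each figure: the average HR rate per pitch over all locations.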

These figures confirm a number of assumptions:

  • The highest home run rate is slightly in from the center of the strike zone.
  • The extreme inside of the strike zone has a higher home run rate than the extreme outside.
  • The home run rate is higher above the strike zone than it is below.
  • The home run rate location is determined by the handedness of the batter and not the pitcher (the images are more similar going across a row than they are going down a column).
There are a couple of things that I found surprising.
  • There is a considerable area down-and-away within the strike zone that has a near-zero home run rate.
  • There is a relatively large region in which the HR rate per pitch is over 2.5%, which seems high to me. For pitchers, this reinforces the importance of being able to locate a pitch in a corner of the zone.
As stated above, this analysis is limited by the fact that it averages over all pitch types. It would be interesting to see, for example, how the home run rate map differed for fastballs and curveballs. I hope to address this in a future post. Until then the current analysis allows for comparison between an individual hitter's home run map and the composite map.

Since the batter's handedness is more important than the pitcher's I averaged across the rows above to create just two maps, one for RHBs and one for LHBs. Over the composite map I plotted all the home runs for an individual hitter to see how he compares to his peers. Here are the HRs of everyone's favorite HR hitter, Jack Cust, plotted over the composite LHB map. Cust's home runs are, for the most part, where you expect for a left-handed batter: the highest density slightly in from the center of the zone, none in the down and away corner of the zone and more above the zone than below.





I made such images for a number of last year's top HR hitters, and most resemble Cust's, with the given player's HRs largely mapping to the regions of high home run rate in the composite map. But a handful of batters had quite different maps. Carlos Quentin's HRs are overwhelmingly away and down in the zone, and a large portion of the inside of the strike zone, where the average right-handed batter has a high HR rate, is completely devoid of HRs. Since this is aggregated over all pitch types, our insight is limited here. It will be interesting to see whether players with HR maps very different from the composite map also tend to have a skewed distribution of which pitch types their HRs come from compared to average.

Here are two other batters I thought were particularly interesting. Alfonso Soriano is almost a caricature of a right-handed batter with his highest HR rate region even more down and in than expected. Carlos Pena, on the other hand, mashes outside pitching and the inside half of the zone has surprisingly few HRs. A possible explanation for this pattern could be that Pena just gets very few inside pitches because pitchers know he is a dangerous HR hitter. This shows one problem with my analysis. I am comparing the composite HR rate to a player's raw HRs not adjusted for the number of pitches a player sees in that region. I should be comparing that player's HR rate to the composite rate. For two reasons I did not do this: (1) I am having a hard time creating rate maps for individual players based on so few HRs and (2) even if I had such a map I cannot think of an effective way to overlay the two rate maps (individual player and composite) as nicely as I can overlay the actual HRs on the composite rate map. But it is something I am going to think about and work on in the future.
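One standard way around the small-sample problem in (1), not something tried in this post, is to shrink a player's raw HR rate in a region toward the composite rate, as if he had also seen `prior_pitches` extra pitches there at the composite rate. This is ordinary additive (beta-binomial style) smoothing, and the prior weight of 500 pitches is an arbitrary placeholder.

```python
def shrunk_hr_rate(player_hr, player_pitches, composite_rate, prior_pitches=500):
    """Player's HR rate in one region, shrunk toward the composite rate."""
    return (player_hr + prior_pitches * composite_rate) / (player_pitches + prior_pitches)


def rate_ratio(player_hr, player_pitches, composite_rate, prior_pitches=500):
    """Above 1 means the player homers more often than average in this region."""
    shrunk = shrunk_hr_rate(player_hr, player_pitches, composite_rate, prior_pitches)
    return shrunk / composite_rate


# Hypothetical region: 6 HRs on 200 pitches where the composite rate is 2%.
print(round(rate_ratio(6, 200, 0.02), 3))  # 1.143
```

With few pitches the ratio stays near 1 and only moves away as the player's own sample in that region grows, so a player who simply sees few inside pitches is not penalized for a lack of inside HRs.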

Oh and I have to assume the home run in Pena's map around (2,4.5) is a mistake.

F/X VisualizationsMarch 16, 2009
Run Value by Pitch Type and Location
By Dave Allen

In my first post, I noted Tango and Lichtman's comment that run value by pitch location analysis was limited when averaged across pitch types and pitch counts. In this post, I will address the first concern by looking at the run value by pitch location of the different pitch types separately (but again averaging across count).

I split the data by handedness of the batter and the pitcher and then split this information into four different pitch types (based on the pitch fx classification). As in the first post, all images are from the catcher's perspective so that a right-handed batter stands to the left of the strike zone and a left-handed batter stands to the right of the strike zone. At the top of each image is the proportion of pitches in the given handedness combination that are of the given pitch type (out of the four pitch types considered). Counting just these four pitch types, 60.9% of pitches from a right-handed pitcher to a right-handed batter are fastballs.
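The proportions at the top of each image are shares within one handedness combination, with anything outside the four pitch types dropped from the denominator. A minimal sketch, assuming each pitch is a `(pitcher_hand, batter_hand, pitch_type)` tuple; the type labels are hypothetical stand-ins, not the actual pitch fx codes.

```python
from collections import Counter, defaultdict

# Hypothetical labels for the four pitch types considered.
FOUR_TYPES = {"FB", "CH", "CU", "SL"}


def pitch_type_shares(records):
    """records: iterable of (pitcher_hand, batter_hand, pitch_type) tuples.

    Returns {(pitcher_hand, batter_hand): {pitch_type: share}}, counting only
    the four types, so the shares in each handedness combination sum to 1.
    """
    counts = defaultdict(Counter)
    for p_hand, b_hand, ptype in records:
        if ptype in FOUR_TYPES:  # other pitch types are excluded entirely
            counts[(p_hand, b_hand)][ptype] += 1
    shares = {}
    for combo, c in counts.items():
        total = sum(c.values())
        shares[combo] = {t: n / total for t, n in c.items()}
    return shares


records = [("R", "R", "FB")] * 3 + [("R", "R", "SL"), ("R", "R", "KN")]
print(pitch_type_shares(records))  # {('R', 'R'): {'FB': 0.75, 'SL': 0.25}}
```

The 60.9% figure above is this kind of share for fastballs in the right-handed pitcher, right-handed batter combination.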

Of the pitches considered, fastballs made up over 60% of pitches in each handedness combination. Thus, the overall run value maps in the first post largely reflect the run values for fastballs. But there are some small differences:

  • In the overall maps, there was no region inside the strike zone with the deep blue >.04 run value. But, for fast balls, a bottom corner in each image has >.04 run value. I wonder if fast balls in this region of the strike zone are less likely to be called as strikes than other pitches.
  • The region of negative to neutral run valued pitches directly above the center of the zone is even more pronounced for fastballs. The region of deep red <-.04 run valued pitches above the top of the strike zone is larger than the corresponding region in the overall map.
  • The region of negative to neutral run valued pitches below the zone is much smaller than in the overall map and extends below just one side of the zone. The side to which it extends is determined by the pitcher's handedness not the batter's. In the overall map, this region extended below the entire strike zone not just one side.
  • Fast balls are thrown in roughly the same proportion in all handedness combinations.

Changeups are overwhelmingly thrown when the pitcher is of the opposite handedness of the batter. Additionally, the few times when changeups are thrown when the pitcher and batter have the same handedness may be a highly non-random sample: pitchers with outstanding changeups and good pitcher's counts (this is just speculation). Because of this and the small sample size, we should not read too much into the same-handedness changeup maps.
  • In opposite handedness at-bats the changeup has a large region of negative to neutral run valued pitches low and away extending far outside the strike zone.


Curves are thrown in relatively constant proportion in all handedness combinations, except for lefty/lefty, where they are thrown a little bit more.
  • Compared to overall, the negative to neutral region for curves is much larger extending down and away predominately.
  • With fewer curves thrown, it is hard to get as good resolution, but it seems that compared to other pitches there is less discernible structure within the strike zone (i.e. there are not as clear large regions of very low run value separated by large regions of larger run value).

Sliders are thrown more when the batter and pitcher have the same handedness (the opposite of changeups), thus the same caveats apply to reading too much into the opposite-handedness maps.

  • A very large region of negative to neutral pitches extends below and away out of the strike zone.
  • Sliders up and in have a higher run value compared to overall pitches up and in.

These pitch-type maps offer some additional insight into the overall maps in the first post. The negative to neutral region above the strike zone is mostly the result of fastballs, while the negative to neutral region below the strike zone is mostly the result of non-fastball pitches. Within the strike zone, most pitches have the same overall structure, with the center of the zone and down and in having the highest run value, although the pattern is not quite as apparent with curveballs.

F/X VisualizationsMarch 16, 2009
Run Value by Pitch Location
By Dave Allen

[Editor's note: Dave Allen has agreed to join Baseball Analysts. He is a graduate student whose research involves analysis of spatial data and spatially explicit modeling. He also loves baseball. Dave will combine these two interests in the F/X Visualizations series.]

A lot of interesting new sabermetric work has become possible over the past two years with the availability of the pitch fx data. In this new series, I will continue this analysis and present the results in a simple, yet hopefully effective, visual manner.

This first post builds on work that Joe Sheehan did a year ago looking at the run value of each pitch based on its location. He placed each pitch into one of 25 bins and calculated the average run value in each bin. In the post he suggested that it would be interesting to get rid of the bins and take a continuous approach. A year later, it seems no one has accomplished that so I thought it would be a good way to launch my work.
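To make the bins-versus-continuous distinction concrete, here is a sketch of both estimates at a query point. The continuous version is a Gaussian-kernel weighted (Nadaraya-Watson) average, one common choice; the post does not say which smoothing method produced its maps, and the 0.5 ft bin and 0.25 ft bandwidth are arbitrary. Pitches are assumed to be `(px, pz, run_value)` tuples.

```python
import math


def binned_mean(pitches, x0, z0, cell=0.5):
    """Average run value of pitches in the same square bin as (x0, z0)."""
    key = (math.floor(x0 / cell), math.floor(z0 / cell))
    vals = [rv for px, pz, rv in pitches
            if (math.floor(px / cell), math.floor(pz / cell)) == key]
    return sum(vals) / len(vals) if vals else None


def kernel_mean(pitches, x0, z0, bandwidth=0.25):
    """Gaussian-kernel (Nadaraya-Watson) weighted mean run value at (x0, z0)."""
    num = den = 0.0
    for px, pz, rv in pitches:
        d2 = (px - x0) ** 2 + (pz - z0) ** 2
        w = math.exp(-d2 / (2 * bandwidth ** 2))  # nearby pitches count more
        num += w * rv
        den += w
    # If every weight underflows to zero there is no data near (x0, z0).
    return num / den if den > 0 else None


pitches = [(0.45, 2.5, 0.10), (0.55, 2.5, -0.10)]
print(binned_mean(pitches, 0.49, 2.5), binned_mean(pitches, 0.51, 2.5))  # 0.1 -0.1
```

Two query points a fiftieth of a foot apart on either side of a bin edge get opposite binned estimates, while the kernel estimate changes smoothly as the query point moves.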

Using the first table in this post, I assigned a run value to every pitch in the pitch fx database, not just pitches that ended an at-bat, and then averaged the run value of all the pitches in each location. I split the data up by handedness of the pitcher and batter. The number in parentheses is the average run value for all pitches regardless of location. The images are from the catcher's perspective so that a right-handed batter stands to the left of the strike zone and a left-handed batter stands to the right of the strike zone.
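Assigning a run value to every pitch, not just the last one, can be done by differencing a table of run values by count. The sketch below assumes the table maps each (balls, strikes) count to a run value from the batter's point of view, and that the pitch ending the plate appearance is credited with the outcome's run value minus the starting count's value. The toy numbers are placeholders, not the values from the linked table.

```python
def pitch_run_value(count_before, count_value, count_after=None, event_value=None):
    """Run value of one pitch, from the batter's point of view.

    count_value maps (balls, strikes) -> run value of being in that count.
    Pass count_after for a pitch that continues the plate appearance, or
    event_value for the pitch that ends it (ball in play, strikeout, walk, ...).
    """
    start = count_value[count_before]
    if event_value is not None:
        return event_value - start
    if count_after is None:
        raise ValueError("need count_after or event_value")
    return count_value[count_after] - start


# Placeholder count values, for illustration only.
toy_counts = {(0, 0): 0.0, (1, 0): 0.04, (0, 1): -0.04}
print(pitch_run_value((0, 0), toy_counts, count_after=(1, 0)))  # ball: 0.04
print(pitch_run_value((0, 0), toy_counts, count_after=(0, 1)))  # strike: -0.04
```

A two-strike foul leaves the count unchanged and so scores zero. Averaging these per-pitch values over all pitches at a location gives the kind of map described here.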

[Figure: run value by pitch location, split by handedness of pitcher and batter]

This method reproduces some of Sheehan's results:

  • Pitches outside the strike zone have a higher run value than those inside the strike zone.
  • Pitches down the middle of the zone have the highest run value of pitches in the strike zone.
  • Inside pitches have higher run values than outside pitches.
  • Pitches down and in have higher run values than those that are up and in.

This continuous approach also gives some additional insights beyond Sheehan's:
  • Of outside pitches, those high in the zone have a slightly higher run value than those down in the zone. This is interesting as it seems hitters prefer inside pitches down in the zone and outside pitches up in the zone.
  • The area of negative to zero to just slightly positive run value pitches (the red, yellow and green colored area) extends well beyond the defined strike zone.
  • This zone of negative to zero valued pitches extends far above the strike zone peaking at x=0 over a foot above the top of the strike zone.

Tango and Lichtman made some important comments on the limitations of Sheehan's original work, which did not split the data by swung/taken or pitch type. These critiques apply equally, if not more so, here because I did not split the data by count as Sheehan did.

I hope to address these points in future posts. For example, I assume the peak of negative to zero valued pitches a foot above the center of the zone is mostly the result of 'high heat' fastballs in pitcher's counts. By analyzing the run value of pitch locations for just fast balls in specific counts, I will be able to confirm or deny this assumption.