Finish Him
I've been looking at the run values of different pitch locations for the last couple of weeks, and today I want to examine how often pitches are thrown to a particular location. The frequency with which a pitch is thrown plays a huge part in its effectiveness, and I believe the frequency with which it is thrown to a certain location is a further refinement on overall frequency. Last week I found some interesting things regarding the success against fastballs in certain areas, and I thought that looking at frequency might help clarify some of those findings. To examine the locational frequencies, I created density plots that show how often a pitch is thrown in a certain area. The dots on the plot are individual pitches and are colored based on the local frequency. The color scale follows the standard convention of a density plot, with "hotter" colors representing areas where events are more frequent. Another thing to keep in mind when looking at these graphs is that the scales are relative for each situation. This isn't ideal, because you can't easily compare frequencies across situations, but it works fine for each situational graph individually. Starting in an 0&0 count, let's see how pitchers start right-handed hitters off. The four graphs below show how often fastballs, changeups, sliders and curveballs are thrown to each location in that situation.
Again, you can't directly compare the scales from graph to graph, but you can get a good idea of where the different types of pitches are thrown. One thing that was somewhat interesting, especially after looking at these graphs, was how often pitchers worked inside to RHH. 0&0 is a neutral count, so the pitcher has some choice in where he throws a pitch, but what's interesting is how the locations for different pitches in an 0&0 count compare to the locations for the same pitches in an 0&2 count.
This is pretty neat. The locations are pretty much what we would expect, with more pitches being thrown out of the zone and at the corners than before. You can see that pitchers do go up in the zone with 0&2 fastballs and that 0&2 breaking balls are thrown down and out of the zone. There is a ton more to learn from these graphs and similar pictures; however, I'm not going to be the person who does the majority of that discovering, at least not online. I've taken an internship with an MLB team and this is my last article for Baseball Analysts. Sure, the pay is low and the hours are long, but for a 23-year-old baseball fiend, there's no cooler feeling than going to work at the ballpark every day. Working in professional baseball is what I want to do. I'm deeply indebted to Rich for giving me the opportunity and space to write these articles on the pitch f/x system, and I'm also in debt to the readers who forced me to be at (or near) the top of my game when I was writing articles. Writing for Baseball Analysts has been a fantastic experience and I'm going to miss it, but I'm moving on and couldn't be happier with what the future holds. To quote The Boss, "good luck goodbye" (and thanks).
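For readers who want to try the coloring idea themselves, here is a minimal sketch in Python. It is not the method behind the plots in this article: it swaps in simple grid-cell counting for a real density estimate, and the cell size, the function names and the sample locations are all made up for illustration. Each pitch gets the share of pitches that landed in its cell, and the values are then rescaled within one situation, which is why the scales can't be compared from graph to graph.

```python
from collections import Counter

def local_frequency(pitches, cell=0.25):
    """For each (x, z) location in feet, return the share of all pitches
    that landed in the same square grid cell (a crude density stand-in)."""
    def key(pitch):
        x, z = pitch
        return (int(x // cell), int(z // cell))

    counts = Counter(key(p) for p in pitches)
    total = len(pitches)
    return [counts[key(p)] / total for p in pitches]

def relative_scale(freqs):
    """Rescale to 0..1 within one plot, so 1.0 is the 'hottest' color.
    This is what makes the scale relative to each situation."""
    top = max(freqs)
    return [f / top for f in freqs]

# Hypothetical pitch locations, not real pitch f/x data.
pitches = [(0.1, 2.4), (0.2, 2.3), (0.15, 2.45), (-0.9, 1.2), (0.8, 3.4)]
heat = relative_scale(local_frequency(pitches))
```

The three pitches clustered together end up with the hottest value, while the two isolated pitches are colored much cooler.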
Tidying Up
I had some comments/requests for additional context about the charts I showed last week and other aspects of my linear weights articles, so I wanted to present those and clear up some confusion about the charts from last week. Among others, Richard Aronson commented here last week about my statement that left-handed hitters liked the ball down and in, and pointed out that the linear weights in those areas were still negative. He suggested that I break up the charts by balls in play and balls not in play and see if the statement still held true. The chart below shows how left-handed hitters fared against all pitch types in any count, but only when they swung at the pitch. The chart shows that pitches in the middle of the strike zone, both horizontally and vertically, benefit the hitter, while pitches on the corners, especially the lower ones, favor the pitcher. In addition to only looking at swings, this chart differs from the one I presented last week in that it looks at all pitch types, not just fastballs. Maybe left-handed hitters are able to hit down and in fastballs very well. We can test that and... crap. They still can't hit pitches in that location very well, and it's interesting to see that they are able to hit fastballs on the outside half of the plate much better than they can hit fastballs on the inside. Generally, inside fastballs are thought of as pitches where a pitcher can get hurt, while outside fastballs are encouraged. One reason left-handed batters are able to hit outside fastballs better than inside fastballs could be the extra fraction of a second an outside pitch affords the batter. An outside pitch is hit slightly after it crosses the plate, giving the batter an extra 'beat' to track the ball. In order to be driven, inside fastballs need to be hit in front of the plate, and the batter has slightly less time to react.
This probably isn't a meaningful reason for the inside/outside difference, but with a fastball, the extra split-second could help the hitter. The chart below shows the run value for fastballs that are put in play by right-handed hitters. Looking at all pitch types, right-handed hitters actually hit all down and in pitches very well. I also wanted to quickly go over the way I calculate the run value for each pitch. I take every event that resulted from a pitch being thrown and assign it a weight, based on the count it occurred in. Different events are worth more in different counts; for an extreme example, a 3&0 strike isn't worth as much to the pitcher as a strike thrown in an 0&2 count. By the same logic, any base hit in an 0&2 count hurts the pitcher more than the same hit would have in a 3&0 count. The process and weights are explained a little more in depth here. There are some loose ends that I need to tidy up, such as whether called strikes and swinging strikes should be weighted the same (currently I weight all strikes, including fouls with less than two strikes, the same amount), and what to do with pitches that result in a steal or caught stealing (currently I'm ignoring this, but a pitcher is partially responsible for the running game, so his pitches should get some penalty/benefit if the runner steals or is caught stealing).
Locational Run Values
In the last couple of weeks there have been several great articles written about the run value of different pitches. These articles have explored how much every pitch in baseball is worth on a per-pitch basis, and while some of the math behind the scenes might be slightly different from article to article, the general idea is the same. You need to find out how much every event is worth in a given environment (based on the count, pitcher, stadium, or any other type of environment you're working with), and then multiply those weights by the number of events caused by a given pitch to find the total number of runs above average that the pitch saved. One thing that none of these articles has discussed is exactly how location impacts the value of a pitch. Clearly the location of a pitch matters in determining its value, but how big is the impact? I split up the strike zone (and the surrounding area) into bins, and in each bin, I found the number of runs above average that were saved per pitch thrown to that area. Below is a chart showing the value of different regions for right-handed pitchers throwing fastballs against left-handed hitters. My calculations are based on the hitter's perspective, so negative values are saving runs compared to an "average location" and are good for the pitcher, while positive ones are the opposite. The most obvious thing I noticed on the graph is the value of the strike zone. Eight of the nine regions prevent runs from being scored compared to an average location, which initially seems high. This actually makes sense, though, if you think about how often batters get out and the fact that when a batter doesn't swing at a pitch in the strike zone, it always puts him in a less advantageous position to hit from. In this chart, which is from the pitcher's perspective, you can see regions where, as a group, left-handed hitters are more vulnerable to a right-handed pitcher's fastball.
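The binning step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bin edges, the function name and the sample pitches are hypothetical, and each pitch is assumed to already carry a run value relative to an average pitch in its count (positive favors the hitter, matching the sign convention above).

```python
from collections import defaultdict

def bin_run_values(pitches, x_edges, z_edges):
    """pitches: iterable of (x, z, run_value) tuples.
    Returns {(col, row): mean run value per pitch} for the rigid grid."""
    def index(value, edges):
        # Bin i covers edges[i] <= value < edges[i + 1]; None if off the grid.
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                return i
        return None

    totals = defaultdict(float)
    counts = defaultdict(int)
    for x, z, run_value in pitches:
        col, row = index(x, x_edges), index(z, z_edges)
        if col is None or row is None:
            continue  # pitch landed outside the area being charted
        totals[(col, row)] += run_value
        counts[(col, row)] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}

# Hypothetical 3x3 grid (feet) and made-up pitches.
x_edges = [-1.5, -0.5, 0.5, 1.5]
z_edges = [1.0, 2.0, 3.0, 4.0]
sample = [(0.0, 2.5, 0.10), (0.1, 2.4, 0.20), (-1.0, 1.5, -0.05), (3.0, 2.0, 0.0)]
grid = bin_run_values(sample, x_edges, z_edges)
```

A pitch sitting right on an edge has to go into one bin or the other, which is exactly the weakness of rigid bins.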
The idea that left-handed hitters like the ball low and inside seems to be backed up a little bit, as the bins in that region of the strike zone have a higher value than the rest of the zone. Using rigid bins isn't the best method for looking at the strike zone because you run into problems with deciding where to put the edges of bins, and a continuous approach is probably the ideal way to do this in the future. Even with this limitation, what else can we learn from this chart? One thing to notice is that left-handed batters are either swinging at pitches low and outside, or umpires are calling this pitch a strike against lefties. Either way, it appears to be an area that pitchers can possibly exploit. Looking at all fastballs thrown by a pitcher-batter grouping is interesting, but exploring how the count and location impact an at-bat is more interesting. The chart below has the same group of batters and pitchers, but is now showing the linear weights per pitch of each section in an 0&2 count (this includes all pitches, not just fastballs). When reading this chart, you need to remember that the weights used to calculate the value of each region are based on an 0&2 count. The middle region being .154 runs means that compared to an "average" location on an 0&2 count, that area allows .154 runs per pitch more. This isn't saying that overall, a pitch down the middle is worth .154 more runs than an average pitch, just on an 0&2 count. With this in mind, the chart makes a ton of sense. You can see the expansion of the strike zone, as virtually all the regions around the strike zone now allow fewer runs than average. The increased ability for a pitcher to work outside the strike zone makes any miss into the strike zone hurt that much more. Using the same logic that a hit in a 0&2 count hurts the pitcher more than giving up the same hit in an 0&0 count, throwing a pitch right down the middle in an 0&2 count is a worse idea than doing the same thing in an 0&0 count. 
The idea is reversed on a 3&0 pitch, which is plotted below. A pitch outside the strike zone is now a tremendous advantage for the hitter, so the pitcher is forced to throw a strike. Somewhat counterintuitively, even though hitters "know" a strike is coming, pitches thrown in the strike zone in 3&0 counts still favor pitchers. This just speaks to how hard hitting actually is. One other point I wanted to mention is the magnitude of the impact of location. Using 50 pitches to a type of batter as a rough cutoff point, I found that the best and worst pitches range from roughly -.07 runs/pitch for the best to .07 runs/pitch for the worst. The spread between the best and worst locations varies, and depends on the count, but it can be as large as almost 1 run/pitch. Obviously this will have a huge impact on the value of a pitch, and potentially could negate any value a pitch has. You could have the best pitch in baseball, but if you can't locate it very well, it won't do you any good. Creating these plots for every pitcher could give a good indication of how much location actually helps and hurts a pitcher, depending on the situation.
More Run Values
In the time I've been looking at the pitch f/x data I've occasionally stumbled onto something I thought was so interesting and so cool that I couldn't wait to share it with someone. The run value of different pitches is one of these things, and whatever enjoyment you've gained from reading and discussing these articles, you can probably double it for me. The research I did for last week's article was some of the most interesting work I've done with the pitch f/x data, and without any more introduction, here's this week's article. In the comments on last week's article and elsewhere, there were some questions about the methods I employed for calculating the run value of each pitch. There were some suggestions made, and while I'm not here to talk about the past and explain how I made the calculations last week, in the interest of transparency, here's what I did this week and will be doing in the future. Starting with the wOBA for every ball-strike count, I subtracted the league average wOBA (.332) from each count to determine how much above or below average each count was. Using those wOBA values, I then determined how many runs were added in every count if the pitcher threw a ball or a strike. This is the same process I used last week, but instead of averaging the run values of a ball and a strike, this time I kept the data separate, so that a strike thrown in an 0&2 count has a different value than a strike thrown in an 0&1 count. I repeated the same process for balls in play, which is something I didn't do last week, and kept them separated by count as well. This way, if the batter is up 2&0 but grounds out, the pitch that created the groundout gets more credit than if he had grounded out in an 0&0 count. When I was done with this process I had the value of almost anything that could happen to a pitch after it left the pitcher's hand, and if you're interested, a table with the data is presented below.
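The by-count idea can be sketched as follows. Only the .332 league average comes from the text; the wOBA-by-count numbers are invented, and the 1.15 divisor used to turn wOBA into runs is an assumption, since the article does not say how that conversion was done. Counts that end the at-bat (ball four, strike three) are left out of this sketch.

```python
LEAGUE_WOBA = 0.332  # league average quoted in the article
WOBA_SCALE = 1.15    # ASSUMED wOBA-to-runs divisor, not taken from the article

# Hypothetical wOBA for plate appearances passing through each (balls, strikes) count.
woba_by_count = {(0, 0): 0.332, (1, 0): 0.360, (0, 1): 0.290, (1, 1): 0.315}

def runs_above_average(count):
    """Runs relative to league average implied by being in this count."""
    return (woba_by_count[count] - LEAGUE_WOBA) / WOBA_SCALE

def pitch_value(count, result):
    """Run value (hitter's perspective) of a ball or strike thrown in `count`.
    Only handles results that keep the at-bat alive."""
    balls, strikes = count
    next_count = (balls + 1, strikes) if result == "ball" else (balls, strikes + 1)
    return runs_above_average(next_count) - runs_above_average(count)

first_pitch_ball = pitch_value((0, 0), "ball")
first_pitch_strike = pitch_value((0, 0), "strike")
strike_on_1_0 = pitch_value((1, 0), "strike")
```

With these made-up inputs a first-pitch ball is worth a little over .02 runs to the hitter and a first-pitch strike a little under .04 runs to the pitcher, and the same strike thrown at 1&0 gets a different value, which is the whole point of keeping counts separate.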
Once I knew the values of events by count, I just counted the number of events that each pitch created and multiplied them by their value to get the overall value of the pitch. One huge benefit to finding the value of pitches using this 'by count' method is that it automatically accounts for the usage of every pitch. One thing I neglected to include in the article last week was any information about global averages. There's no such thing as an overall 'average' pitch, but I found the averages for all the different subgroups of pitches I had. Now, when comparing pitches, there's a handy reference for what an average pitch thrown by a certain type of pitcher to a certain type of hitter is worth. The table below has identifying information about the pitch, the frequency that the given group of pitchers threw it to the given group of batters, and the average run value for each type of pitch. The way to read the first line of the table is that of all pitches thrown to LHH by LHP, 14% were curveballs. A LHP to LHH curveball prevents .0117 runs more than an 'average' pitch, and given 100 pitches from a LHP to a LHH, distributed via the frequencies for his pitches, the curveball would prevent .20 runs more than an average pitch.
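As a quick sketch of the arithmetic behind the per-100 column (frequency times average value, scaled to 100 pitches), here is a small Python example. The repertoire below is hypothetical and is not taken from the table; the function name is also made up.

```python
def runs_per_100(frequency, runs_saved_per_pitch, scale=100):
    """Runs saved versus average by one pitch type over `scale` total pitches,
    given how often the pitch is thrown and its average value per pitch."""
    return frequency * runs_saved_per_pitch * scale

# Hypothetical repertoire: pitch -> (share of pitches, runs saved per pitch).
repertoire = {
    "fastball": (0.60, 0.002),
    "curveball": (0.25, 0.020),
    "changeup": (0.15, -0.010),
}
per_100 = {name: runs_per_100(freq, value) for name, (freq, value) in repertoire.items()}
total_per_100 = sum(per_100.values())
```

In this made-up case the curveball saves about half a run per 100 pitches, even though the fastball is thrown far more often, because the per-pitch value is ten times larger.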
Not surprisingly, a curveball thrown by a LHP to a LHH saves the most runs compared to an average pitch. However, when examining ...
However, without knowing how often Zito actually throws curveballs to left-handed hitters, it's impossible to get a feel for how effective the pitch truly is. It could be a really nasty pitch, but if part of the effectiveness is due to how infrequently it's thrown, it won't be a great deal of help to the pitcher in preventing runs overall. The Per 100 field incorporates the pitcher's usage of every pitch to gauge how good the pitch is at preventing runs. To calculate this value, I multiplied the frequency a pitch was thrown by its average value. Multiplying that number by a constant, in this case 100, gives the total number of runs the pitch would have saved compared to an average pitch of that type, for 100 pitches split up by the pitcher's normal pitch selection. I used 100 as the constant to have some internal consistency with Rich's work on strikeouts/100 pitches. 100 is fairly easy to calculate in your head too. Last week I mentioned that collectively, ...
The next step with this type of analysis lies in refining the linear weights value of every event. Adjusting for park is probably the next easiest adjustment to make, and after that, the next adjustment would be for individual pitchers so that every pitcher is his own universe. I think some of those adjustments are overkill based on the amount of data in my database right now, but over the course of the 2008 season it's something to look for. Properly regressing the pitch values and finding out how much of the value is based on skill and how much is based on luck is another very important adjustment to make. I've roughly regressed the LWTS/pitch values to account for different sample sizes, but actually determining how many of the runs that Kazmir's fastball prevents are due to qualities of the pitch and how many are due to luck is important.
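The article does not spell out how the rough regression was done, so the following is only one common way to shrink a rate toward a group average based on sample size. The constant `k` (the number of pitches at which the observed value and the group mean get equal weight) is a hypothetical placeholder, not a fitted value.

```python
def regress(observed, n_pitches, group_mean=0.0, k=200):
    """Shrink an observed LWTS/pitch toward the group mean.
    k is a HYPOTHETICAL constant; a real one would be estimated from data."""
    weight = n_pitches / (n_pitches + k)
    return weight * observed + (1 - weight) * group_mean

small_sample = regress(-0.05, 50)   # thrown a handful of times
large_sample = regress(-0.05, 800)  # thrown all season
```

The same raw rate of -.05 runs/pitch is pulled most of the way back to average after 50 pitches but keeps most of its value after 800, which is the behavior you want when a list is dominated by pitches that were thrown only a few times.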
Weighing In
Finding the run value of a pitch is not as hard as I initially thought it might be. Using Tango's linear weights generator I found the run value of a single, double, triple, home run and out. Using those values, I was easily able to find the value of each pitch for balls that were put in play, but I also needed to account for pitches that weren't put into play. To find the value of an average ball and strike, I converted the wOBA for each count into runs for that count, and then found out how much adding one ball changed those values for every count. I did the same thing for strikes, with the end result being that a ball is worth about .097 runs and a strike is worth about -.124 runs. There's a huge difference in the value of a ball or strike depending on what the count is, but I used these average values for my analysis because I didn't want to slice my already somewhat small sample of pitches into 12 smaller samples. As I continue to sift through this topic, I'm going to have to account for the different counts. Below are the 10 pitches that saved the most runs in the 2007 season. In addition to the run value of each pitch, the Sw% (swings and misses/total swings) and SLGBIP (includes home runs) are also shown. I broke the pitches up by batter hand to give a more accurate portrayal of exactly who is impacted by a pitch.
This list has some crossover from the first list, and the new list confirms that Looking a little closer at Webb's pitch repertoire you can see the effectiveness of each of his pitches. He's tougher on right-handed hitters overall, although lefties have a tough time hitting his curveball. Against righties, his changeup is twice as effective as his sinker, although that could be because he throws it infrequently relative to the sinker.
One thing that piqued my curiosity when looking at this list of pitches was whether the 18 runs that Webb's pitches prevented could be something larger. Was Webb 2 wins above average in the starts that he made in Gameday parks? Could those wins be directly attributed to his pitches? Webb's pitches prevented 18 runs over what a set of average pitches would have done, so his pitches could be said to be responsible for 1.8 wins more than an average pitcher. Counting the playoffs, Webb made 16 starts in stadiums with the pitch f/x system in place, pitching 113 innings and posting an ERA of 2.55. 113 innings with a 2.55 ERA in the NL makes a pitcher 5 wins above average in his starts at enhanced parks. Perhaps fielding made up the 3 win difference over this time period, or perhaps Webb leveraged his pitches effectively, throwing strikes when it was important and throwing outside the strike zone when it wouldn't hurt him too much. Exploring this topic in more detail probably deserves a whole column at some point. Getting back to all pitchers, I wasn't very happy with the list of LWTS/pitch that I showed earlier. There were a lot of pitches that had great rates but had only been thrown a handful of times, making me wonder if the pitcher had just gotten lucky throwing them. I'm sure
This list makes much more sense. Gabbard's changeup (vs. RHH) remains at the top, which is something that bears watching in 2008. The rest of the list is filled with most of the usual suspects, So where does all this leave us with Santana's changeup against right-handed hitters? Compared to other left-handed changeups thrown to right-handed hitters, Santana's changeup is exactly average, with a regressed LWTS/pitch of 0. Last year, the swings and misses the pitch created were counterbalanced by the pounding the ball took when it was put in play. Against righties the pitch Santana was most effective with was his fastball, which was worth -.03 runs every time he threw it (it also fell just outside the top-10). There are a ton of factors that impact how effective a pitch is, and maybe right-handed batters have started to sit on Santana's changeup more at the expense of hitting his fastball, but for last year at least, his changeup was pedestrian while his fastball was tremendous.
Splitsville: Take 2
Last week I looked at different splits, and found some interesting things about Rivera's cutter is ridiculously effective, especially against left-handed hitters. Nearly every single pitch he throws to a LHH is a cutter, yet they still swing and miss at the pitch. After writing about Rivera's cutter, I wondered if there were other pitchers who approached left-handed and right-handed hitters with only one specific pitch. Somewhat surprisingly, there were other pitchers who, perhaps unwittingly, were going after certain hitters with only one pitch. The table below shows these pitchers and how often they throw that pitch to LHH and RHH. The two columns labeled Freq. show the frequency that a particular pitch is thrown and Diff is just the Freq. LHH column subtracted from the Freq. RHH column.
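The Freq. and Diff columns are simple to reproduce from a pitch log. Here is a minimal sketch with a made-up log; the pitch codes ("FC" for cutter, "FF" for four-seam fastball) and the function name are just labels chosen for this example.

```python
def pitch_frequency(pitches, pitch_type, batter_hand):
    """pitches: list of (pitch_type, batter_hand) tuples.
    Returns the share of pitches to that hand that were of the given type."""
    to_hand = [ptype for ptype, hand in pitches if hand == batter_hand]
    if not to_hand:
        return 0.0
    return to_hand.count(pitch_type) / len(to_hand)

# Hypothetical log: 10 pitches to lefties, 10 to righties.
log = [("FC", "L")] * 9 + [("FF", "L")] + [("FC", "R")] * 6 + [("FF", "R")] * 4

freq_lhh = pitch_frequency(log, "FC", "L")
freq_rhh = pitch_frequency(log, "FC", "R")
diff = freq_rhh - freq_lhh  # Freq. LHH subtracted from Freq. RHH
```

A negative Diff, as in this example, means the pitch is thrown more often to left-handed hitters.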
All of the pitchers on the list would be considered fastball pitchers, but one thing to keep in mind when looking at the table is the different pitches each pitcher has and how that impacts pitch frequency. In I mentioned earlier that I thought it was interesting to look at cases where pitchers drastically altered their pitching style to different handed hitters, and the next step in examining those cases is to look at which pitches had the biggest differential.
These pitches all have different reasons for being thrown so much to hitters on one side. Putz's cutter/2-seam fastball gets a lot of swinging strikes when he throws it against both RHH and LHH, but his regular fastball and changeup aren't as effective against RHH as they are against LHH, which could be causing him to use more cutters at the expense of his changeup and 4-seamer vs. RHH.
I created the list below by eyeballing my list of pitches and picking the ones that had both a high swing and miss rate and a high SLGBIP. The pitches are based on the handedness split, so for the line with Haren's changeup, you would read it as, against right-handed hitters, he threw a total of 819 pitches, 22% of which were changeups. When batters swung, they missed 47% of the time and when the ball was put in play, the slugging percentage was .652. For some perspective, the average amount of misses when the batter swings at a changeup or slider is 25% and the average SLGBIP for those pitches is right around .500.
Wow, there are some good pitches and pitchers on that list. This is partly because half of the criteria to be included is to have a high swing and miss rate on a certain pitch. However, the other criteria is that the pitch is hit hard when it is put in play, so it's somewhat surprising that I have multiple
Splitsville
Several weeks ago, I used similarity scores to compare the movement on pitches. Using those scores, here are the most similar fastballs to Saito's, along with how often the pitches are swung and missed at.
All those pitches look similar, both in terms of speed and movement, but batters miss when they swing (Sw%) at Saito's fastball more often than at the similar pitches. The similar pitches mostly have an above average Sw% (the league average Sw% is 13%), but nobody is close to Saito. Moving outside the top-5 most similar pitches, there still aren't any pitches that can compare to the results that Saito gets with his fastball. The different results that come about from pitches that move almost identically further highlight the importance of the "hidden" aspects of pitching that are slightly harder to quantify, like deception, arm angle and pitch selection. Anyway, let's look closer at Saito, especially his fastball, and how left-handed hitters and right-handed hitters fared against him. The table below shows Saito's splits for his different pitches. For the most part the column headings are self-explanatory, but as a reminder, Sw% is swings and misses/total swings, SLGBIP includes home runs, and Tot. is the total number of pitches against that side hitter.
The thing that really stands out here is how effective Saito's fastball is against right-handed hitters. 60% of the time, when a RHH swings against Saito's fastball, he misses it, which is an amazingly high rate of misses for any type of pitch. Saito's fastball is still really good against LHH, but it's unbelievable (twice as good) against RHH. You can also see how Saito approaches LHH vs. RHH in this chart, and it's interesting that while his fastball is so effective against RHH, due to the relative ineffectiveness of his off-speed pitches against lefties, he actually throws it more often against LHH. Saito's split is cool, but what about other cases where splits are involved? One of my favorite splits to look at is
The thing to notice here is that Rivera throws only cut-fastballs when facing LHH. Of the 188 pitches he threw to LHH, 187 were cutters. Wow. Up in the count, down in the count, with runners on, or with the bases empty, LHH know with almost total certainty that Rivera is coming with a cutter. There is no other pitch in the back of their mind that they might see...yet they still can't hit it. They miss 23% of the time they swing and even when the ball is put in play, it isn't hit with any type of authority. I'm completely mystified at how Rivera is able to be a one pitch pitcher to lefties. I'm open to suggestions, but I think Rivera's cutter to a left-handed hitter is the best pitch in baseball. I'm going to close with Rivera's reverse split because my head is still spinning with how bizarre it is. I think this type of analysis could be extended to examine if pitchers get different types of movement of pitches depending on the batter and different pitching patterns as well. Certain types of pitchers are able to survive with a suspect fastball by replacing fastballs with sliders depending on the hand of the batter. Examining the splits, based on pitch type, is another huge avenue for potential research with the pitch f/x data.
First Things First
The first pitch is thought to be very important in an at-bat. Young pitchers are taught to get ahead in the count and that the balance of an at-bat hinges on whether this pitch is a strike or ball. Throwing first pitch strikes is a mark of a good pitcher, and one of the most infuriating things to watch is a pitcher who can't throw first pitch strikes. Today I want to look at the value of the first pitch and what happens to those pitches after they leave the pitcher's hand. Of the twelve counts, there are six (anything without three balls or two strikes) where the at-bat is guaranteed to continue if the batter does not swing at the pitch. Assuming no swing, here are the chances of seeing a fastball in a subsequent count, based on whether the pitch is a ball or a strike. The chart shows what will happen in the next count based on what happens in the current count. So starting in an 0&0 count, if the pitch is a ball, there is a 59% chance the next pitch (in the 1&0 count) will be a fastball, but if the first pitch is a strike, there is a 48% chance of a fastball being thrown in an 0&1 count. The swing of 11% measures how valuable a strike is in each count, in terms of potentially seeing fastballs.
The first pitch of an at-bat sets the tone of the at-bat due to the conditions it creates for ensuing pitches. In terms of seeing a fastball, there is relatively little difference between an 0&1 count and a 1&0 count, but if the first pitch is a strike the pitcher has put himself in a good position as the count progresses. An 0&1 count is a clear pitcher's count and even if he throws a ball in that count, a 1&1 count is still a pitcher's count and the pitcher arrived there through pitcher's counts. However, if the first pitch is a ball, the pitcher is now at a slight disadvantage because while 1&0 is a neutral count, it has the potential to turn into an extreme hitter's count. If the pitcher does throw a strike and evens the count at 1&1, he would presumably have been under more pressure to throw a strike after the first pitch. Sal Baxamusa explores this type of pitch sequencing in more detail here and actually finds that when batters put a 1&1 pitch into play, they do better when the order was strike-ball, despite apparently having an advantage in the other sequence. Anyway, that tangent was just to establish the importance of the first pitch of an at-bat. Now that we have a rough idea of its importance, let's look at what actually happens on the first pitch. The table below shows all first pitches, broken up by pitch type, along with certain measurements about each pitch type. Freq. is how often the pitch was thrown, S% is strike frequency, or (strikes + balls in play)/all pitches, Called% is called strikes/total pitches, Swing% is how often the batter swung at a pitch, Sw% is how often batters swung and missed when they swung, Fo% is how often batters fouled balls off when they swung, and SLGBIP is slugging percentage on balls in play, including home runs.
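To make the column definitions concrete, here is a rough Python sketch that computes them from a list of pitch outcomes. The outcome labels and the sample data are hypothetical, and S% is read here as strikes plus balls in play over all pitches, which is an interpretation of the definition above rather than something the article confirms.

```python
def first_pitch_rates(pitches):
    """pitches: list of (outcome, bases) tuples, where outcome is one of
    'ball', 'called_strike', 'swinging_strike', 'foul', 'in_play' and
    bases is total bases on a ball in play (0 for an out, 4 for a home run)."""
    n = len(pitches)
    outcomes = [outcome for outcome, _ in pitches]
    called = outcomes.count("called_strike")
    missed = outcomes.count("swinging_strike")
    fouls = outcomes.count("foul")
    in_play = outcomes.count("in_play")
    swings = missed + fouls + in_play
    total_bases = sum(bases for outcome, bases in pitches if outcome == "in_play")
    return {
        "S%": (called + missed + fouls + in_play) / n,  # assumed reading of S%
        "Called%": called / n,
        "Swing%": swings / n,
        "Sw%": missed / swings if swings else 0.0,
        "Fo%": fouls / swings if swings else 0.0,
        "SLGBIP": total_bases / in_play if in_play else 0.0,
    }

# Ten made-up first pitches.
sample = ([("ball", 0)] * 4 + [("called_strike", 0)] * 2 +
          [("swinging_strike", 0), ("foul", 0), ("in_play", 0), ("in_play", 2)])
rates = first_pitch_rates(sample)
```

Note that Swing% uses all pitches as the denominator while Sw% and Fo% use swings, so a pitch can have a low Swing% and a high Sw% at the same time.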
Fastballs are thrown slightly more often as first pitches than overall (60% on first pitches vs. 56% overall) which makes sense with pitchers trying to throw a strike and get ahead in the count, but generally, the rates are pretty similar for how often each pitch is thrown as a first pitch and overall. The most interesting thing to me on this chart is how often batters swing at a first pitch curveball. As a batter, a curveball isn't necessarily a pitch you would expect to see at the start of an at-bat, which probably explains the low number of swings because batters would only swing if it were a very hittable curve. This seems like a great example of how not being predictable helps a pitcher tremendously though. By occasionally throwing a curve as the first pitch, the pitcher is sometimes able to get a free strike because the batter swings so rarely. A first pitch slider would also come as somewhat of a surprise from most pitchers, yet batters swing at that pitch relatively frequently. A slider looks more like a fastball immediately out of a pitcher's hand, so perhaps batters are fooled into swinging because of this. This would explain the low SLGBIP, because unlike curveballs where a batter is swinging preferentially at pitches he likes, with sliders, batters are swinging at a pitch they think is a fastball, but are forced to adjust their swing once the slider breaks. Overall, curveballs that are put in play lead to a SLGBIP of .484, but on the first pitch their SLGBIP jumps to .552, similar to SLGBIP for fastballs on first pitches, which supports the idea that batters are good at selecting which curves to swing at on the first pitch. One other interesting thing in the table is what happens when batters swing at certain pitches. Batters rarely swing and miss at first pitch fastballs, but they foul off those pitches so frequently that fastballs are only slightly less likely to be put in play than the other three pitch types. 
I'm unsure why batters foul off so many fastballs, but it might be because batters are be willing to swing at a wider range of locations and speeds if they recognize the pitch as a fastball. In the past, I've looked at how batters of different quality are approached by pitchers. Using that method again, I wanted to see if there are differences in how these batters were pitched to on the first pitch as well. In the table below, columns labeled with -1 are the frequencies for first pitches while the columns labeled with -R are the frequencies for all other pitches.
I grouped hitters based on their Marcel projected SLG for the 2007 season and while the windows I used to group hitters are wider than in my previous examination, the overall idea is almost identical. Narrower windows would just show a more gradual increase in off-speed pitches as batters improved, but one other thing that's interesting is that it is almost as if there is a plateau for batters with a .400 SLG. A .400 SLG seems to be the level of hitter that prompts a pitcher to alter his first pitch repertoire. Recently, I've been looking at different groups of pitchers and seeing if there are differences in the way they pitch based on their age and the quality of their fastball. I created two groups, those pitchers 34 and older and those 24 and younger, and then split those two groups into pitchers with an average fastball speed of more than 91 MPH and an average speed less than 91 MPH. The table below shows just the first pitch fastball frequency for each type of pitcher throwing to each type of hitter, along with the average of all first pitches for each pitcher type.
The same pattern is evident here as well, with the bad hitters seeing a lot more fastballs than the other two groups of hitters. This trend holds regardless of the age of a pitcher or the quality of his fastball and the big difference between groups of pitchers is how many extra fastballs they throw to bad hitters. Even though there isn't a tremendous amount of difference between a 1&0 count and an 0&1 count, the first pitch is a crucial pitch in setting the tone of an at-bat and the importance placed on it is probably justified because of this.
Grouping Madness
Last week, I wrote about different age groups and differences in the way they pitch. I received a couple of comments about certain ways to further create groups and try to isolate the differences I saw, and in doing that, I came up with some interesting new material for this week's article. In last week's article, I had two groups: old and young pitchers. This week, I split my age groups into two groups based on the speed of their fastball. The "young-slow" group was young pitchers who had an average fastball speed of 90.5 MPH or lower, and the "young-fast" group was comprised of the rest of the pitchers originally in the young group. I did the same thing with my group of old pitchers, and ended up with 4 different groups, which are summarized in the table below, along with the groups from last week for perspective.
There are a couple of really interesting bits in the table, the first being the FB% of the old-fast group being lower than the FB% of the old-slow group. One reason for this apparent inconsistency is that the fast group is made up of players who have retained a very effective breaking ball even as they aged (mostly sliders and cutters), which they rely heavily on. Here's a chart that highlights some important features about the sliders in each group. The old-fast group actually has the fastest slider, but the important parts of this table are the last two columns. One quick way of judging the "nastiness" or effectiveness of a pitch is to see how often a pitcher is able to get a swing and miss from it. The final two columns show the swing and miss percentage for sliders and fastballs in each group. These break down pretty nicely along speed lines, with the faster groups getting more swings and misses than the slower ones. What is a little bit surprising, especially in light of the frequency table, is how similar the speed groups are to each other for sliders and fastballs. The pitches move slightly differently for the two fast groups (and slow ones), but there isn't a whole lot of difference in how often batters swing and miss at them. The similarity is surprising because of how differently the two fast groups use their fastball, with the hard-throwing old pitchers throwing the fewest fastballs and their younger counterparts throwing the most. Some of that difference is explained by difficulty controlling the slider vs. fastball, but it seems like hard-throwing young pitchers are over-reliant on fastballs as a group. The flip side to this is that hard-throwing old pitchers could be throwing fastballs at closer to the optimal rate and preferentially throwing them when needed.
This possibility of old hard-throwers leveraging their fastballs better than younger ones shows up in the results as well. The young-fast group had the highest SLGBIP on their fastballs while the old-fast group had the lowest. This isn't the strongest evidence for the old pitchers picking their spots with their fastballs, but it's a start. Looking at fastball selection either by count or hitter quality is the next step here. I mentioned last week how the younger population was made up of both players who would eventually join the old group and players who wouldn't. This is a "duh" statement, but I think the pitchers who will survive and eventually make it into the old group would tend to come out of the young-fast group. That group can afford to lose some velocity on their pitches and still be effective, but the young-slow group is already on the edge of being very hittable and has nowhere to go if they suffer a drop in velocity. Obviously the attrition doesn't just come from the slow group, but everything else being equal, I would rather bet on a hard thrower having a longer career than a slow thrower. Looking at the list of names in each group reinforces this idea too. The slow group has only 22 names on it, and most of them wouldn't be considered top prospects. The highlights include
Old Man River
Glavine's willingness to sacrifice walks for a decrease in power provided the spark behind this article, so the first thing I wanted to see was if there was any difference in the location of pitches between the age-groups. Overall, there was very little difference between where the two groups located their pitches, but looking at specific situations some differences could be seen. Hitter's counts are times when nibbling would be especially advantageous, and when you compare the two groups of pitchers in hitter's counts, the differences become clearer. The images below are for extreme hitter's counts (3&0, 3&1 and 2&0) and only include fastballs. I included only fastballs because I wanted to see where pitches were located even when the pitcher "gave in" to the hitter's count and threw a fastball.

The older pitchers have a higher percentage of fastballs in almost all of the border regions at the edges of the strike-zone. The differences aren't huge in any one area, which is probably more of a result of the fairly large regions used, but the older group appears to be throwing more at the margins. Not surprisingly, older pitchers fared a little worse when balls were put into play, which is one reason they are nibbling more than younger pitchers. Despite the older pitchers throwing fewer pitches in the strike-zone, batters swung at almost the same percentage of pitches from older pitchers as they did for younger pitchers and older pitchers didn't get any more called strikes than younger pitchers.

All FB-Hitter's Counts

Looking at all pitches in hitter's counts, it's unclear how much nibbling is going on or how effective it actually is. However, if you just look at pitches thrown within a 4 inch window, centered on the black of both sides of the plate, the picture changes. In these windows, which I think is where the nibbling largely takes place, old pitchers dominate their younger counterparts.
Not only do they get a higher percentage of called strikes, but the slugging average on balls in play is almost .200 points lower.

FB within 4 inches of either corner-Hitter's counts

If you expand the chart above to cover all pitches in all counts, but still only look at that limited region, the old pitcher advantage almost completely disappears. Older pitchers still get more called strikes, which could be the older pitchers throwing more to the strike-zone as it is called, but the SLGBIP and BABIP values get much closer, with younger pitchers doing a little better overall.

All pitches within 4 inches of either corner-All counts

Without a larger sample, I don't think you can draw any huge conclusions about the power of nibbling, but there are fundamental differences between the two groups of pitchers. Getting back to the extreme hitter's counts again, the pitchers in the young group threw 79% fastballs in those counts, which is a totally different approach than the pitchers in the older group, who only threw 63% fastballs in those counts. To put those values into some type of perspective, I previously found that in hitter's counts, the number of fastballs thrown was very dependent on the quality of the hitter, with better hitters seeing fewer fastballs than bad hitters. Hitters with a SLG above .550 saw roughly 61% fastballs, while those with a SLG below .350 saw 74% fastballs. My older group was pitching to every hitter like they were facing

The differences in how the groups pitched are at least partially due to differences in the repertoire of the groups. The table below shows the frequency that they threw each pitch, with the big difference being the amount of time they threw fastballs. This is in all counts, not just hitter's counts, but the older pitchers still are more cautious throwing their fastballs than the younger ones are.
One reason for this could be the quality of the pitch. The table below shows the average values for fastballs for each group, (the pfx values are the average of the absolute values to put LHP and RHP on the same scale), and the average fastball for the older pitchers is slower, probably making it a little easier to hit. Another interesting tidbit from this table is that the older group has less vertical drop on their curveball.
It would be interesting to see if there was a steady decrease in velocity or movement as a pitcher gets older, but the biggest problem with having just one year worth of data is that there is no good way to compare a player to himself at a younger age. Dividing them by age is a good start, but I'm really comparing two groups of pitchers, one group made up of players who have survived 10+ years in the major leagues (and possess certain traits that let them survive) and another group that is made up of some players with those traits (who will eventually make it into the old group) and some without those traits. When comparing the groups, I can't say that younger pitchers have certain traits, but rather that the younger group in my sample have certain traits. This selection bias is going to be present in any study that looks at aging (only the players who do well will survive to be included in subsequent samples), but I think that the pitch f/x data is well suited to minimize the problem. If a certain number of pitches (say 100) is enough to establish how a pitch moves, the prior success needed for a pitcher to throw that many pitches in the future is much lower than the prior success needed to throw enough innings to show a realistic portrayal of skill as a pitcher ages. This won't eliminate the problem but in certain cases it could help minimize it.
Winter Wonderland
John Walsh wrote a fantastic piece on Thursday about the differences between fastballs, sliders, changeups and curveballs, and what happens when those pitches are put in play. I've done some research into this area myself and wanted to graphically present some of my findings. One point that John made was that fastballs, especially non-sinking fastballs, are hit on the ground the least often of any pitch. You can take this a step further, and look at the impact the location of a pitch has on how it is hit. The graph below looks at the percentage of each pitch type that is hit on the ground at different heights. The most obvious thing is the huge advantage a sinker has in generating grounders compared to any other pitch. (I found sinkers the same way John did, by using all pitches with a pfx_z value of less than 6 inches). This isn't surprising, but what was a little surprising to me is how the groundball percentage of every pitch decreases at almost the same rate with increasing height. I would have thought that certain pitch types, especially curveballs, would have been much better, relative to other pitch types, when they were thrown low in the zone vs. high in the zone. I thought a curve would have a higher ratio of gb% on low pitches to gb% on high pitches than other pitch types did. This wasn't the case, so maybe the idea of a high curveball being a terrible pitch isn't totally accurate. To get a better idea of what happens to high curveballs (and all pitch types), I looked at the slugging percentage for balls in play (including homers) based on which region of the strike-zone the pitch was thrown to. The table below shows those slugging percentages for the three vertical sections of the strike-zone. (The averages at the bottom are only for the pitches in the strike-zone and are higher than the averages in Walsh's article.)

          FB     SL     CH     CB     Sinker | Avg.
Top       0.564  0.565  0.692  0.579  0.580  | 0.596
Middle    0.622  0.590  0.612  0.559  0.558  | 0.588
Bottom    0.554  0.496  0.498  0.458  0.481  | 0.497
==================================================
Avg.      0.580  0.550  0.601  0.532  0.540  | 0.561

For pitches low in the strike-zone, batters have the lowest SLGBIP against curveballs, but if a curve is thrown at the top of the strike-zone, batters greatly increase their SLGBIP. Curveballs are hard pitches to hit, but the difference in SLGBIP between a low curve and a high curve is second only to the difference between a low changeup and a high changeup. Everything else being equal (speed, spin, movement, expectations of the batter, if the batter swings, etc.) a pitcher is increasing the batter's SLGBIP by roughly .100 points if he throws a curveball that isn't at the bottom of the strike-zone. A changeup is potentially a great pitch, but changeups that aren't at the bottom of the strike-zone are hit much better than average. Low changeups are hit about as well as low sliders, but as the two pitches are elevated, the changeup gets hit much harder than the slider. A changeup above the knees is essentially a meat-ball and by throwing a changeup that isn't down in the strike-zone, the pitcher is increasing the batter's SLGBIP by at least .115 points.
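The elevation penalties quoted above can be checked directly from the table. The snippet below is only a sketch of that arithmetic: the dictionary simply copies the table's SLGBIP values, and the helper name `elevation_penalty` is made up for illustration.

```python
# SLGBIP by vertical third of the strike zone, copied from the table above.
slgbip = {
    "FB":     {"Top": 0.564, "Middle": 0.622, "Bottom": 0.554},
    "SL":     {"Top": 0.565, "Middle": 0.590, "Bottom": 0.496},
    "CH":     {"Top": 0.692, "Middle": 0.612, "Bottom": 0.498},
    "CB":     {"Top": 0.579, "Middle": 0.559, "Bottom": 0.458},
    "Sinker": {"Top": 0.580, "Middle": 0.558, "Bottom": 0.481},
}


def elevation_penalty(zones):
    """Return (middle - bottom, top - bottom), rounded to three decimals."""
    return (
        round(zones["Middle"] - zones["Bottom"], 3),
        round(zones["Top"] - zones["Bottom"], 3),
    )


penalties = {pitch: elevation_penalty(zones) for pitch, zones in slgbip.items()}

# Rank pitch types by how much SLGBIP rises from the bottom to the top third.
ranked = sorted(penalties, key=lambda p: penalties[p][1], reverse=True)

for pitch in ranked:
    mid, top = penalties[pitch]
    print(f"{pitch:<7} middle-bottom {mid:+.3f}   top-bottom {top:+.3f}")
```

The changeup and curveball come out first and second in the ranking, which is the ordering described in the text.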
The Same Things
Every pitch has a unique fingerprint that differentiates it from all other pitches. There are many factors that give every pitch a different identity, such as speed, how much movement it has, the handedness of the batter and pitcher, the location of the pitch, as well as the sequence of pitches that led to the pitcher throwing it. This week I want to look at how similar different pitches are. Do Using the pitch classifications from my database, I found the average speed and pfx values for every pitch I had data for. For example,
Webb's sinker is slightly more unique than Lowe's, primarily due to the spin he imparts on the ball (he has the smallest pfx_z number for a fastball and combines it with a large absolute pfx_x value). One cool thing to notice is that the fifth most similar pitch to Webb's sinker is Accardo's changeup. Changeups typically have a smaller pfx_z value than fastballs, sinking more than a fastball thrown by the same pitcher, and Accardo's mirrors Webb's sinker. Overall though, I would classify the similar pitches in both cases (as well as other similar pitches that fell outside the top-5) as sinkers, giving some confidence that the system is actually finding similar pitches. I wanted to look at breaking balls too. Just from observing the two,
The first thing to realize is that Zito's curve is much more unique than either of the two sinkers. The reason for this is the lack of horizontal spin. Zito throws almost a true 12-to-6 curveball, and as a result of that, a right-handed pitcher's pitch shows up on his list of most similar pitches. I'm not saying that Vanden-Hurk's curve is going to look like Zito's to a batter, but Zito's curve is so unique that there aren't many similar pitches to it, thrown by either LHP or RHP. Hill's curve doesn't show up at the top of Zito's list because Hill's is thrown faster, has a smaller pfx_z value, and has a larger pfx_x value. Zito's curveball is really a unique pitch. Speaking of unique pitches, let's talk about
Again, these aren't necessarily pitches that will look like Rivera's cutter to hitters, but pitches that move like it. The release point a pitcher throws with plays a huge role in what a pitch looks like, but for right now, don't worry about that. I think this is a cool way to look at pitches and see similarities that might have otherwise gone unseen. Right now, the similarity scores I'm using are based more on how the pitch moves, independent of how the batter perceives it, which isn't the ideal solution. In addition to just the movement and speed, the sequencing and location of pitches has a large impact on how they are viewed by the batter. For
The pitch I called his fastball could be 2 different pitches, one of which behaves like a regular 4-seamer and one of which behaves almost exactly like Rivera's cutter. The red cluster in the chart below is what I initially called Burton's fastball and if you look at the far left of the cluster, you can see a somewhat separate cluster that could be a regular 4-seam fastball, with the cutter occurring more on the right. Without having first-hand information about the types of pitches a pitcher throws I wouldn't be comfortable making a distinction between 2 such similar groupings, but it looks like this might be something. I have Burton throwing the cutter around 50% of the time, the 4-seamer 25%, and the slider and changeup being the other 25%...Justin, do you know if Burton throws his cutter that often? If you're curious, here are the values of the 2 cutters...pretty much a dead on match, with Burton's actually having a higher (more "movement") pfx_x value. I would kill for data on Rivera's cutter when he was at his absolute peak though and I wonder maybe if he's lost an inch or two off his cutter since then. Name MPH, pfx_x, pfx_z
Dirty Jobs: part 2
Last week I looked at how pitchers approached each count, based on the amount of fastballs thrown and where they were thrown. Today I'm going to wrap up the topic, looking at what generally happens after the pitcher releases the ball and the hitter has to make a decision. The most basic decision a hitter has to make at the plate (after determining what pitch is coming) is whether to swing or not, so the next facet of each count I looked at was how often hitters took a pitch in each count. To remain consistent with the other results I've found, I only looked at fastballs and the table below shows how often fastballs were taken in each count, along with how often the pitch was either a ball or a strike. The most obvious thing is how often 3&0 fastballs are taken, especially for strikes. I realize there are a lot of good explanations/reasons for this behavior, but it seems that hitters are sacrificing a huge opportunity by taking so many pitches in these situations. A 3&1 count is still a hitter's count, so the actual loss of the strike doesn't hurt the batter too much, but they are ceding one of their most potentially productive counts by showing pitchers they rarely swing in it. A generic 3&0 pitch is a strike only 60% of the time, compared with the average across all counts of 63%, but that's not nearly enough of a difference to justify taking 93% of pitches.
If the batter is able to recognize a 3&0 pitch as a fastball out of the pitcher's hand he's at even more of an advantage. 3&0 fastballs are strikes 67% of the time, which is higher than the average for fastballs among all counts (64%) and when batters do swing at 3&0 fastballs, they are very successful, posting the highest Slugging Percentage by swings (TB/Total Swings) for any count. I would think that success would encourage more swinging on 3&0, but it apparently doesn't. I know that I'm making this sound overly simplistic, and there are certainly valid reasons why different hitters might not swing at a 3&0 fastball (among others, they could be looking for a specific pitch or a specific location), but I think there's an element of risk-aversion on the part of the batter to avoid "wasting" a 3&0 count and making a visible out right then. I'm not sure how much more I'm advocating swinging at 3&0 fastballs, but if the whole point of a hitter's count is to force the pitcher into throwing more fastballs, then taking almost all of those fastballs can't be a good decision, especially when the pitch is about twice as likely to be a strike as a ball. Taking the pitch might not be as big of a problem as I'm making it out to be because even though a 3&1 count is a (slightly) worse hitter's count than 3&0, in terms of seeing fastballs, the two counts are very similar. This leads to the question: in which count is it worst to take a strike? The table below has the FB% for each count, along with the FB% for the count that results from taking an additional strike and the difference between the two. Obviously it's suicide to take a called third strike, so those bottom four counts aren't very interesting. What is interesting is the top of the chart. Taking a 3&0 strike leaves the batter in roughly the same position he started in, at least in terms of possibly seeing a fastball.
The lack of a "penalty" for taking a strike combined with the potential of getting a walk might contribute to the higher than normal take-rates in 3&0. The similarity in terms of seeing fastballs between 0&1 and 0&2 further emphasizes how important first pitch strikes are for a pitcher. 0&2 is obviously a better pitcher's count because the batter has a smaller margin for error, but in terms of fastball selection, once that first strike happens, the batter has a huge hole to dig out of.
Going back to the first table for a second, another interesting element is how the frequencies of taking a fastball for a called strike organize the counts based on the number of strikes a hitter has. When a hitter has two strikes, regardless of the number of balls he has, there is only about a 5% chance of him looking at strike three. When he has one strike, there is about a 12% chance of taking strike two and with zero strikes and zero, one or two balls, there is about a 28% chance of the batter taking strike one, but in a 3&0 count, that percent more than doubles to 59%. I mentioned that batters had the best results in 3&0 counts, and I based that on the slugging percentage per swing in each count. This is very similar to slugging percentage for balls in play, except swinging strikes and foul balls are added to the denominator. This is a more granular metric than anything else I've seen and measures the value of a swing. To give a feel for the size of these values, the league average (for all types of pitches) is .273,
The original question that prompted this article asked about classifying 2&2 and 0&1 counts and the way hitters and pitchers approached each count. I would call both counts pitcher's counts but in an 0&1 count, the fewer strikes gives hitters a much bigger margin for error and allows them to be relatively selective about which pitch they swing at. However, an 0&1 count also allows pitchers to be less concerned with forcing a strike than they are in a 2&2 count. 0&1 has some advantages for both batters and pitchers, although the pitcher's advantage is dominant. In a 2&2 count, the batter and pitcher are under different pressures. A batter can't afford to be very selective because he only has one strike left, but a pitcher doesn't want to throw a ball and go to 3&2. The batter is again in a worse spot, making it a pitcher's count, but if 0&1 is a count where both the batter and pitcher are under pressure to maximize their advantage, in a 2&2 count it seems like both players are under pressure not to screw up.
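For reference, the slugging percentage per swing used in this article (TB/Total Swings) can be sketched next to ordinary SLGBIP. The function names and the counts in the example are made up for illustration and are not taken from the pitch f/x data.

```python
def slgbip(total_bases, balls_in_play):
    """Slugging percentage on balls in play: total bases per ball in play."""
    return total_bases / balls_in_play


def slg_per_swing(total_bases, balls_in_play, swinging_strikes, fouls):
    """Total bases per swing.

    Same numerator as SLGBIP, but swinging strikes and foul balls are
    added to the denominator, so every swing counts.
    """
    swings = balls_in_play + swinging_strikes + fouls
    return total_bases / swings


# Hypothetical counts for one count/pitch-type bucket (not real data).
example = dict(total_bases=50, balls_in_play=100, swinging_strikes=25, fouls=75)

bip_value = slgbip(example["total_bases"], example["balls_in_play"])
swing_value = slg_per_swing(**example)
print(f"SLGBIP {bip_value:.3f}, SLG per swing {swing_value:.3f}")
```

Because whiffs and fouls only enlarge the denominator, the per-swing value can never exceed SLGBIP for the same set of pitches.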
Dirty Jobs
I've looked at pitcher's counts vs. hitter's counts before, and prompted by this comment on The Book's blog, I decided to revisit the topic. When doing research of any kind, the hardest thing to do is to find an interesting question or topic to examine, and Tango's comment had a whole lot of interesting questions, so I'm going to tackle some of those, pseudo-blog style, throughout the day. Anyway, without any more introduction, let's see some results. The reason certain counts are considered hitter's counts or pitcher's counts is partially due to the likelihood of a fastball being thrown on that pitch. For most pitchers, a fastball is their least effective strikeout pitch, as well as the pitch they have the most control over. In an extreme example, on 3&0, most of the time a pitcher will throw a fastball to get a strike, but in doing so, gives the batter a good pitch to hit. The chart below shows the percentage of fastballs thrown in each count, and gives a slightly different view of what makes up a hitter's count vs. a pitcher's count.
With an average FB% of 59% and the number of pitches thrown in each count, there are four counts that see an "average" number of fastballs, while the others could be grouped into hitter's counts and pitcher's counts. Most of these percents make sense, and the top of the list corresponds very well to the top of the pass-through table in terms of ranking the counts by hitter friendliness. Not surprisingly, hitters see the most fastballs in 3&0 and also have the best results if they pass through that count during their at-bat. The ranking of pitcher's counts doesn't match up as well, with 0&2 surprisingly not seeing the lowest FB%. I'm not sure exactly why this is, but the important thing is that the differences between groups are much bigger than any differences within the groups. The "ownership" of counts changes slightly using FB% as a guide. It makes intuitive sense that 1&1 should be a neutral count, the results of plate appearances that end in a 1&1 count make it a neutral count, and the pass-through results say it's a neutral count, yet pitchers throw fewer fastballs in that count than in other ones. Pitchers don't seem to agree that 1&1 is actually a neutral count, and have responded by throwing almost as few fastballs as they do for 0&1 and 0&2 counts. 1&0, 2&1 and 3&2 change hands too. Prior to looking at this table, I would have bet any amount of money that there were a lot of fastballs thrown in these counts, making them hitter's counts. All of them have more balls than strikes and it just seems like they favor the hitter. Tango's pass-through data labels them as hitter's counts, but pitchers treat them like 0&0 counts, throwing an "average" amount of fastballs. The two gray-area counts that Tango mentions (0&1, 2&2) are both pitcher's counts by this metric.
Now that we know a little about what pitchers throw in different counts, let's look at where they throw it. The table above shows the vertical locations in the strike-zone for fastballs thrown in each count. In a 3&0 count, 27% of fastballs thrown are higher than 6 inches below the top of the strike-zone, 29% are lower than 6 inches above the bottom of the strike-zone and 44% are thrown between those two lines. This doesn't account for the horizontal position of the pitch and there really isn't anything interesting to see in most cases. 0&2 has the lowest percent of pitches in the middle, which is expected, and it seems that when a pitcher is going to throw a waste pitch on 0&2 and 1&2, it is usually thrown high.
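The three vertical bands described above (within 6 inches of the top of the zone or higher, within 6 inches of the bottom or lower, and everything in between) can be sketched as a small classifier. The function name, the argument names and the sample pitches are hypothetical, and heights are assumed to be in feet.

```python
from collections import Counter


def vertical_band(height, zone_top, zone_bottom, margin=0.5):
    """Classify a pitch height (feet) as 'high', 'middle' or 'low'.

    'high' is anything above a line 6 inches (0.5 ft) below the top of the
    strike zone; 'low' is anything below a line 6 inches above the bottom.
    """
    if height > zone_top - margin:
        return "high"
    if height < zone_bottom + margin:
        return "low"
    return "middle"


# Hypothetical fastballs: (height, top of zone, bottom of zone), all in feet.
pitches = [
    (3.4, 3.5, 1.5),
    (2.5, 3.5, 1.5),
    (1.2, 3.5, 1.5),
    (2.7, 3.5, 1.5),
]

bands = Counter(vertical_band(h, top, bottom) for h, top, bottom in pitches)
shares = {band: count / len(pitches) for band, count in bands.items()}
print(shares)
```

Running the same tally separately for each count would give the kind of high/middle/low percentages quoted in the text.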
Post-Thanksgiving Quickie
I didn't have much planned today, but I was playing around with these conditional probability plots this week, and thought I'd share them. Conditional probability charts show the probability of an event happening, given one condition. In this case, they show the chance of a ball in play being hit on the ground given the height it crossed home plate. The graph below shows the probability of a fastball (that is put in play) either being hit in the air or the ground, given the vertical height where it crossed the plate. The dark gray region is the probability of the ball being hit in the air, while the lighter region is the corresponding chance of the ball being hit on the ground. The curve is smoothed slightly and the general pattern of low pitches producing more groundballs is what you would expect. This isn't surprising, but what’s cool is that you can see the continuous relationship between height and the chance of a groundball. Moving on, the graph below on the left shows the same thing as the graph above (the chance of a random pitch to be hit in the air or on the ground), but only for fastballs with a pfx_z value of less than 5 inches. This means that the pitch ended up 5 inches higher than a non-spinning pitch would have, and while that value doesn’t mean anything by itself, that’s the cutoff point I used to define sinking fastballs. The graph below on the right is for all fastballs with a pfx_z value greater than or equal to 5 inches and just looking at the two graphs, you can tell that there is a big difference in the chance of a sinker being turned into a groundball compared to a regular fastball.
Very roughly, the strike zone goes from a height of 2 feet to 4 feet, so a sinker at the knees that is put in play has a 65% chance of being a groundball, while a non-sinking fastball at the same spot has a 45% chance to be a grounder if it is put in play. At the top of the strike zone, a sinker has a 40% chance of being a grounder, while a regular, non-sinking fastball has only a 25% chance, so a sinker up in the zone is almost as likely to get a grounder as a regular fastball at the knees. At almost every height, sinkers are 15-20 percentage points more likely to be hit on the ground than a regular fastball. There are a ton of other considerations to take into account if you were finding the true chance of a ball-in-play being a grounder, like the horizontal position of the ball and exactly how much a pitch "sinks" (or breaks or spins or whatever you call it), but this is just another illustration of why sinkers can be so valuable for a pitcher.
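A rough way to reproduce this kind of conditional probability is to bin balls in play by height and take the share of grounders in each bin. The sketch below uses simple half-foot bins rather than the smoothed curves in the plots, and both the function name and the sample data are invented for illustration.

```python
from collections import defaultdict


def groundball_rate_by_height(balls_in_play, bin_width=0.5):
    """Estimate P(groundball | height bin) from (height_ft, is_grounder) pairs.

    Returns a dict mapping the lower edge of each height bin (feet) to the
    fraction of balls in play in that bin that were hit on the ground.
    """
    counts = defaultdict(lambda: [0, 0])  # bin edge -> [grounders, total]
    for height, is_grounder in balls_in_play:
        edge = int(height // bin_width) * bin_width
        counts[edge][0] += int(is_grounder)
        counts[edge][1] += 1
    return {edge: g / n for edge, (g, n) in sorted(counts.items())}


# Hypothetical balls in play: (height in feet, True if hit on the ground).
sample = [
    (2.1, True), (2.3, True), (2.4, False),
    (3.6, False), (3.7, True), (3.9, False), (3.8, False),
]

rates = groundball_rate_by_height(sample)
for edge, rate in rates.items():
    print(f"{edge:.1f}-{edge + 0.5:.1f} ft: {rate:.0%} grounders")
```

Splitting the input by pfx_z before calling the function would give separate curves for sinkers and regular fastballs.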
Predicting Pitches
Last time I checked in, I looked at the percentages of fastballs thrown to different types of hitters based on the count. Toward the end of that article, I threatened to try to predict via regression when a pitcher would throw his fastball and this article is the preliminary result of that threat. What I wanted to do was find whether a pitcher threw a fastball or not, a binary variable, based on a particular list of factors, which was made up of both continuous and discrete variables. Regular linear regression can't handle binary dependent variables, but there is a special type of regression, logistic regression, that is designed for just this type of analysis. Given a dependent variable and one or more independent ones, a logistic regression will solve for the logarithm of the odds that a binary event is going to occur. Unlike linear regressions, where the relationship between the dependent variable and independent variables is somewhat obvious based on the generated coefficients, the coefficients created from logistic regressions are more confusing because they're really referring to the log of the odds of the event happening. The methods of a logistic regression are similar to a linear one, in that it models the relationship between several variables; it just does so in a less straightforward fashion. While that's sinking in, I'm going to backtrack a little. Before getting into the messiness of regressions, I wanted to see if there were any easy correlations to spot. The conditional probability charts below give a good idea of the magnitudes for possible ranges for FB%. Getting back to the regression, the first variable I tested was the 2006 slugging percentage of the batter. Clearly there is a relationship between the amount of fastballs a hitter sees and his quality (I've beaten this point into the ground), but how strong is it? The coefficient for SLG was -.77, so for every .010 increase in SLG, the likelihood of seeing a fastball decreases by about .19 percentage points.
This doesn't seem like that big of an impact, but SLG is still a significant predictor of FB%. According to my regression, the factors that relate to the quality of a pitcher's fastball, the strike% and swing-and-miss%, are also both significant predictors of whether a pitcher threw a fastball. Categorical variables, such as the count or the situation with base-runners, are also important. This is, again, a very obvious point, but as opposed to just looking at hitter's counts vs pitcher's counts and saying certain types of batters see more fastballs in each type of count, with the regression I can estimate what percentage of fastballs any type of hitter will see in any specific count. The chart below, which is a little confusing, attempts to do exactly that and also account for the quality of the fastball being thrown. The green lines represent the estimated FB% in each count over a range of hitter abilities, for a fastball that gets a below average number of swings-and-misses. Looking just at the green lines, there are three relatively distinct bands. The top three lines (roughly starting around .8) are 3&0, 3&1 and 2&0, which are the three biggest hitter's counts. There are actually four separate counts in the next two distinct green lines (starting around .7), 3&2, 1&0, 0&0 and 2&1. The bottom cluster of lines has the remaining counts, 1&1, 0&1, 0&2, 2&2 and 1&2. These groupings end up matching pretty well with the groupings of counts found here. The black lines on the graph are estimates of the exact same thing (FB% in a given count over a range of SLG), but they are for pitches that have a higher than average swing-and-miss%. The ranges of different counts are the same, so this just shows the range where most MLB pitchers would lie. The differences I'm looking at right now are mostly marginal, especially at the ranges MLB players perform at. The three bands of counts are distinct in the FB% that pitchers throw, but within each band, it's very tough to see any differences. 
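To make the coefficient arithmetic concrete, here is a minimal sketch of how a logit coefficient like the -.77 for SLG turns into a change in fastball probability. Only the coefficient comes from the regression described above; the baseline FB% values and the function names are assumptions picked for illustration. Because the coefficient is negative, the estimated probability of a fastball moves down as SLG goes up, and the size of the move depends on the baseline.

```python
import math


def logistic(z):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Map a probability to log-odds."""
    return math.log(p / (1.0 - p))


def fb_prob_after_slg_change(baseline_fb, coef, delta_slg):
    """Shift the baseline log-odds by coef * delta_slg and convert back."""
    return logistic(logit(baseline_fb) + coef * delta_slg)


SLG_COEF = -0.77  # coefficient reported in the article

# Baselines are assumed values, not regression output.
for baseline in (0.55, 0.60, 0.67):
    new_p = fb_prob_after_slg_change(baseline, SLG_COEF, 0.010)
    change = (new_p - baseline) * 100
    print(f"baseline {baseline:.2f} -> {new_p:.4f} ({change:+.2f} points)")
```

The same function works for any other continuous predictor in the model, as long as the other variables are held at whatever values produce the chosen baseline.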
The next step with this type of analysis is to break down pitch selection based on potential swings in win expectancy. Win expectancy would account for score difference, base-runners, and outs, which are very important in determining how a pitcher pitches. The quality of the on-deck hitter is probably important as well. On an individual pitcher level you could also potentially see more variation within a specific count. If
Pitching to the Hitter
In my previous article, I looked at the decisions pitchers make about what pitches to throw. One thing I didn't look at, and was reminded of by a comment from MGL, was how this pertained to hitters. Do certain types of hitters see more fastballs than other types? I had some slight difficulties trying to determine if pitchers deviated from their normal pitching patterns in certain situations because I didn't have the ability to know what their "true" pitching patterns in different situations were. Since hitting is the reaction to the action of pitching, looking at hitters is much easier. I can look at how pitchers pitched in a given situation against certain hitters and then compare that to how the exact same pitchers pitched in the exact same situations, but against different hitters. Including the post-season, I have 189 pitches in my database when When Ortiz is in a hitter's count, pitchers throw him 66% fastballs, which puts him right at the league average of 67% fastballs seen in those situations. However, Ortiz is far from a league average batter in terms of his power potential. How do pitchers approach other elite sluggers when they find themselves behind in the count? My definition of an elite slugger might be a little loose, but I took everyone with 300 ABs and a slugging average of .550 or higher this year and looked at how pitchers approached them in hitter's counts. Not surprisingly, pitchers gave these hitters fewer fastballs as a group in these situations. Instead of seeing 67% fastballs, elite sluggers only see 61% fastballs when in a hitter's count. Ortiz sees more fastballs than the other hitters in this group, but within a reasonable amount. Teammates At the other end of the spectrum lie
Keep in mind, the FB% values listed in the table are only for hitter's counts, and while the chart isn't too revealing, I just think it's interesting to see the different ways each hitter was approached. I was surprised to see Soriano see so many fastballs, as he's a hacker, but maybe there's a good reason for it. Braun got a lot of fastballs, presumably even after he started dominating offensively, so maybe there wasn't a good scouting report on him yet, although I'm not sure why there wouldn't be. Getting back to how pitchers approached different types of hitters, I split up every batter (with a minimum of 300 ABs) based on their slugging average, and then found the FB% for that class of batters in hitter's counts. The table below shows the number of hitters in each group, the number of fastballs seen and total number of pitches seen in hitter's counts, the average slugging average for the group, and the percentage of fastballs the group saw.
This table has a lot of things going on, but the most obvious one is that in hitter's counts, as the caliber of a batter increases (slugging goes up), FB% goes down. This isn't true for every batter individually, but the overall trend is really clear. I don't know exactly why pitchers are behaving this way, (maybe bad hitters as a group can't hit fastballs very well and there is less of a cost to the pitcher's stamina for throwing a fastball), but they do throw fewer fastballs to each progressive range of hitters. It makes sense that pitchers would avoid throwing fastballs to better hitters and try to fool them with junk, while getting after the weak hitters and not worrying about home runs and doubles. Even though all these batters are in hitter's counts, some got many more fastballs than others. Not every hitter's count is created equal. The last column in the table, PFB%, is the other big thing to see. For every hitter, I found the different pitchers they faced in hitter's counts, and then found out what those pitchers had thrown in all other hitter's counts they were in during the season, regardless of hitter quality. The next table uses the same hitter groupings, but looks at the pitches they saw in pitcher's counts. This chart tells a much different story than the first one. In hitter's counts, pitchers seem to be aware of the type of hitter at the plate and pitch accordingly. Ortiz gets fewer fastballs than
MGL's comment that prompted this article, about whether When looking at the relationship between slugging average and FB% for hitters, I thought about trying to predict the FB% of a pitcher, given any situation. For a pitcher with a given set of pitches, you could possibly figure out how often he should throw his fastball in a situation and then compare how often he actually threw it. I’m not sure exactly what factors I would use to predict this, but I think the quantity of pitches a pitcher throws, the nastiness of those pitches, the batter, and some measure of the pitcher’s control would play a big role. For a batter, I think the FB% that he should see would be primarily impacted by his quality as a hitter, in terms of batting eye, ability to make contact and ability to hit the ball hard, as well as any holes in his swing.
Pitch Frequency
There are many variables that impact what type of pitch a pitcher will throw on any given pitch. The type of hitter, the count, if there are runners on base, what the score is, what pitch was just thrown, as well as the different types of pitches a pitcher has in his arsenal all play a big part in what pitch will be thrown next. Given any situation that a pitcher is in, be it a close game or a blowout, facing Ryan Freel, in a hitter's count or pitcher's count, there is a certain frequency that he should throw each of his pitches for optimal results. These frequencies are dependent on the situation and pitcher, and even though we don't know exactly what they may be in each situation, they do exist. A pitcher can't let a hitter get too comfortable in any situation, so even if the pitcher has an amazing slider, he is still going to have to occasionally throw a fastball to keep a hitter honest.
Last week I looked at the sequencing of pitches in an at-bat and used the overall percentage that a pitcher threw his fastball as a proxy for his true rate of throwing a fastball on any particular pitch. Prompted by these two threads on The Book's website, I went back into my database, and for every pitcher with at least 100 pitches, I found out how often they threw their fastball. I've created lists like this before, but this time I created splits based on the count the pitch was thrown in, either hitter's counts, pitcher's counts, or neutral counts. Using the overall percentage of pitches that were fastballs (FB%) for a pitcher as their true rate of throwing fastballs, I then looked to see if pitchers were throwing a significantly different number of fastballs in each type of count. I used the frequencies of fastballs thrown because it is the easiest pitch to look at. Every pitcher throws a fastball, and while they don't all move the same, fastballs have much more in common across different pitchers than any other pitch does. 
I have 421 pitchers in my sample, and in hitter's counts 299 of them threw significantly more fastballs than their overall average, while only 4 threw significantly fewer. In pitcher's counts, 286 pitchers threw significantly fewer fastballs, while only 9 threw significantly more. This is pretty much what we would expect to happen. One reason why hitter's counts are considered advantageous to hitters is because they see lots of fastballs (more than the overall average), which are generally easier to hit than breaking balls. Results like that also make me think that the overall fastball frequency of a pitcher isn't a good substitute for his frequency in different counts. In my article last week I looked at
All three pitchers throw a lot of fastballs overall, and two of them throw more fastballs than their overall average when in hitter's counts. This pattern holds true for almost all the pitchers in my sample, with the average FB% going from 55% overall to 68% in hitter's counts. In light of this difference, using the overall FB% doesn't seem like the best proxy for the true FB% in hitter's counts. One way to estimate the true amount of "skill" involved in an act is to regress it toward the population mean. In this case, I'm looking to estimate the true level of decision making that impacts the FB% in hitter's counts (basically finding the amount of "skill" for a measurement given the observed frequencies, random standard deviation, population average and population standard deviation). Once the regressed FB% values are found, you've got a much more accurate idea about what to expect in a given count from a pitcher. The overall FB% of a pitcher doesn't really matter to a hitter because a hitter will always find himself in a situation that alters the base frequency. Here's a table showing the eight pitchers who throw the most and least fastballs in hitter's counts.
The first thing I noticed about the list is that the top group are almost all relievers, while the bottom group is almost all starting pitchers. There are other starters besides Cabrera and Wang at the top of the list, but for the most part, relievers are more likely to throw a fastball in a hitter's count. This is probably because they don't usually have a good second or third pitch that they can throw strikes with. Fastballs for relievers are also usually faster than those of starters, so even if the batter knows the pitch is coming, they might not be able to do anything with it. Starters generally have more pitches than relievers, so they become less reliant on one pitch in any count, although as Cabrera shows, this isn't always the case. I wouldn't take too much from that list as there are good and bad pitchers at both ends of the list. However, if you were to take absolute difference between the FB% in hitter's counts and the FB% in pitcher's counts, you would get a list of pitchers who are throwing their fastballs equal amounts in both counts. Name FB%-hitter FB%-pitcher Difference The guys at the top of this list usually have the reputations for being "smart" or "crafty", willing to throw any pitch at any time. Without looking at their other pitches, I can't verify that they will throw anything in any count, but according to this list, they don't alter the amount of fastballs they throw based on the count, which means at least that they throw the same total frequency of off-speed pitches in the different counts. The bottom of the list is populated with pitchers who drastically change the amount of fastballs they throw depending on the count. Someone like Lidge, who just has two pitches, primarily throws fastballs in hitter's counts and sliders in pitcher's counts. Even if Lidge is throwing his fastball and slider at their optimal frequencies in these counts, the difference between frequencies gives hitters very good information about what pitch is coming. 
Comparing the pitch frequencies for the same pitcher in two different time periods, like
This is more of a backwards looking analysis that explains what happened rather than why it happened or what will happen in the future. Even still, it's fun to look at. I think of the frequencies that pitches are thrown like the slices on a circular spinner. Making the correct decision about what pitch to throw is easy for a pitcher, just spin the Wheel-of-Pitches and throw whatever comes up. Knowing how big to make the slices for each pitch in different situations is much harder than actually deciding what pitch to throw. I didn't really look at this, but I'm curious how much the catcher contributes to setting the frequencies and spinning the wheel. At the top of the list of pitchers who throw fastballs in any count (the "smart" pitchers) were two pitchers on the Red Sox, with a third, Dice-K, just missing the cut.
Pitch Sequencing
I wanted to look at pitch sequencing this week and see how pitchers pitch in certain situations. What happens after a certain pitcher starts a hitter off with a fastball? What pitch do they throw for the second pitch? What if they start him off with a curve? What's the most common first pitch to a batter? Do certain pitchers follow predictable patterns of pitches? Of the 1016 pitches that PITCH f/x has recorded for Beckett, he has thrown 67% fastballs, 27% curves and 6% changeups. He throws his fastball more than an average pitcher does, partly because he only has three pitches and partly because his fastball is such a good pitch. On the first pitch to a batter, Beckett pretty much throws his pitches at their normal frequencies (69% FB/23% CB/8% CH). It gets a little more interesting after he has thrown one pitch though. If Beckett starts the hitter off with a fastball (and the batter doesn't put it into play), the second pitch that Beckett throws is slightly more likely to be another fastball. Of the 155 pitches he has thrown after a first pitch fastball, 73% of them have also been fastballs. When Beckett throws a curve on the first pitch (and it isn't put in play, which happened on 61 pitches), his second pitch is a fastball only 53% of the time. This is where I start to get a little hazy with the math, but if the decision to throw a fastball or not on every pitch were independent and Beckett had a 67% chance of throwing a fastball on any pitch, then given 155 pitches, you would be 95% confident that the range of fastball frequencies would be between 61-73%, which is what happened for the pitches after a first pitch fastball. However, when looking at the same 95% confidence interval for the pitch after a first pitch curveball (61 pitches), you get a range of 57-77% fastballs, but he actually only threw his fastball 54% of the time in those situations. 
Beckett significantly deviates from his "normal" pattern of pitching and throws fewer fastballs after he starts a hitter with a curveball. This is easier to understand in a table, so here's a table with all the information from the previous paragraph. The numbers quoted above were frequencies that he threw different pitches. The way to read the table is that after a first pitch fastball that wasn't put in play, Beckett threw 155 pitches, 73% fastballs, 21% curveballs and 6% changeups.
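The interval check described above can be sketched with the normal approximation to the binomial. This is an illustration under stated assumptions, not necessarily the calculation used for the article: the function name and the two z values are my own choices. With z = 1.645 the sketch reproduces the quoted 61-73% and 57-77% ranges; the usual two-sided 95% value of z = 1.96 gives somewhat wider ranges (roughly 60-74% and 55-79%), and the fastball rate after a first pitch curveball still falls below the lower bound either way.

```python
import math


def normal_approx_range(p, n, z):
    """Expected range of observed frequencies for n independent pitches,
    each a fastball with probability p: p +/- z * sqrt(p * (1 - p) / n)."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


P_FASTBALL = 0.67  # Beckett's overall fastball rate from the article

situations = (
    ("after first pitch fastball", 155),
    ("after first pitch curveball", 61),
)

for label, n_pitches in situations:
    for z in (1.645, 1.96):  # assumed critical values, see note above
        low, high = normal_approx_range(P_FASTBALL, n_pitches, z)
        print(f"{label}, n={n_pitches}, z={z}: {low:.1%} to {high:.1%}")
```

With samples as small as 61 pitches, an exact binomial interval would be a reasonable alternative to the normal approximation.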
There are plenty of obscure relationships between Beckett's pitches, such as what happens when he starts a batter off with two fastballs or curveballs, but before looking at those relationships, I need to make sure that my assumption of independence between pitches isn't going to be a problem. There are plenty of reasons why Beckett would throw more curves and changeups on the second pitch to a batter that he started off with a curveball. If a batter had a tough time hitting off-speed pitches, it would make sense that Beckett would give him several in a row. In fact, if he starts a hitter off with two curveballs in a row, the chance that the third pitch is a fastball is 58%. The assumption that his decision to throw each pitch is independent isn't totally realistic, because the situation and type of hitter will impact his decision about which pitches to throw, but it doesn't really impact my results. The distributions will be different depending on the situation (I'd be more surprised if they weren't), but I'm more interested in how he changes his pitching patterns in certain situations, rather than if he changes or not. Is he throwing more fastballs on the first pitch than is expected? Does he follow up fastballs with curveballs? What does he throw after a fastball is fouled off? The assumption that he has a static 67% chance to throw a fastball on any pitch might end up being more of a problem, but I think that can be fixed with some regression toward the average values in each situation.
It seems to me that pitchers would be most effective if they didn’t fall into tendencies regarding pitch sequencing. Beckett, Sabathia and Maddux are all essentially three pitch pitchers who throw fastballs more than average. They all throw slightly different amounts of fastballs, but on the first pitch of an at-bat, Sabathia and Maddux throw proportionally more fastballs than they do overall. Hitters are already probably looking for a fastball from these pitchers, but they can afford to look even more on the first pitch. On the first pitch of an at-bat, Sabathia and Maddux don’t exactly become 1-dimensional pitchers, but they do remove some of the uncertainty regarding pitch selection from a hitter’s mind, although they could be varying the location enough on the first pitch to make up for it. Beckett is much more in line with his overall pitch frequencies on the first pitch. He does throw 67% fastballs, so hitters should still be looking fastball on the first pitch, but no more than at any other time they face him. The next step in this vein of research is to expand from looking at just three pitchers to all pitchers. Ideally, I would know what the average fastball (and other pitches) frequency is in the different sequencing situations I looked at, maybe split by hand or by type of pitcher. In addition to seeing if the pitch frequencies differed from a binomial distribution, I could also see how much they differed from the average frequencies in those situations. Using a static value for the frequency a pitcher throws a pitch is also not totally accurate, and with average values for each situation, I could regress each pitcher’s situational frequency and get a better approximation of his true frequencies.
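One common way to regress a situational frequency toward the average is a simple shrinkage estimate. The sketch below is that generic approach, not necessarily what a finished version of this analysis would use. The function name and all of the numbers in the example are hypothetical, and it assumes the population standard deviation was measured on pitchers with roughly similar sample sizes.

```python
def regress_to_mean(observed, n_pitches, pop_mean, pop_sd):
    """Shrink an observed frequency toward the population mean.

    random_var is the binomial noise expected from n_pitches alone;
    whatever share of the observed spread is left over is treated as
    real differences between pitchers (the "reliability").
    """
    random_var = pop_mean * (1.0 - pop_mean) / n_pitches
    observed_var = pop_sd ** 2
    reliability = max(0.0, 1.0 - random_var / observed_var)
    return pop_mean + reliability * (observed - pop_mean)


# Hypothetical inputs: a pitcher at 53% fastballs on 61 pitches in a
# situation where the average pitcher throws 60% with a spread of 10%.
estimate = regress_to_mean(observed=0.53, n_pitches=61, pop_mean=0.60, pop_sd=0.10)
print(f"regressed FB%: {estimate:.3f}")
```

The fewer pitches a pitcher has thrown in a situation, the closer his estimate sits to the average; with enough pitches it approaches his observed frequency.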
Beckett vs. Sabathia
With the ALCS starting tonight, I wanted to take a quick look at the Game 1 starting pitchers. Here are two charts, showing the difference between each pitch and a non-spinning version of that same pitch, which compare Beckett and Sabathia.
Beckett
Sabathia
There are some basic differences between the two pitchers, such as Beckett's curve having more downward movement than Sabathia's (which is probably closer to a slider in terms of movement), but overall, the way their pitches move is relatively similar. The biggest difference, besides throwing hands, is that Beckett throws his fastball more often and is pretty much a two pitch pitcher, while Sabathia uses three pitches. Another graph I thought was interesting in my analysis of Peavy was the pitch frequency by inning.
One neat thing on Beckett's frequency chart is that he throws his fastball much less as the game goes on, almost following a linear pattern. The 6th inning is the only inning that deviates from this pattern, and rather than saying Beckett must throw a lot of curves in the 6th, I would think that this inning is when he would usually face the best hitters in the lineup for a third time, so he throws fewer fastballs than he otherwise would. For what it's worth, the 6th inning has been one of Beckett's least successful innings this year. Sabathia appears to follow a similar pattern for fastball usage as Beckett does, but he has more off-speed pitches to work with. You can see from his chart how, unlike Peavy, he doesn't show the dramatic increases in certain pitches every couple of innings. Sabathia throws his off-speed pitches more frequently as the game progresses, but it's a gradual increase, as opposed to the sharp transitions of Peavy. Be sure to check back later for a Baseball Analysts staff preview of the series.
ALDS Preview: New York Yankees vs. Cleveland Indians
The Indians won the AL Central this year with a record of 96-66, accomplishing what many had been predicting of them for several years, and today will host the first playoff game in Cleveland since 2001. The Yankees used a furious second half charge to win their first wild card since 1997 and extend their streak of reaching the playoffs to 13 years in a row. The Indians have some great pitching and the Yankees have the best offense in baseball, so it could be an interesting series in terms of conflicting styles. I've gathered some information about the series and each team, and then have two guest writers, Earl from Pinstripe Alley and Ryan from Let's Go Tribe to break down the series, position by position. **************************************** Hi, I'm Ryan Richards of Let's Go Tribe. After going through the late-season collapse of 2005, it was nice to have a boring last week of the season thanks to an early clinch. It's only been six years since the Indians were in the playoffs, but that was long enough for Kenny Lofton to play for eight teams before coming back to Cleveland.
       HOME    ROAD    TOTAL
NYY    52-29   42-39   94-68
CLE    52-29   44-37   96-66

Head-to-head results: The Yankees swept the season series, 6-0.

       RUNS   AVG    OBP    SLG    OPS    OPS+
NYY    968    .290   .366   .463   .829   123
CLE    811    .268   .343   .428   .751   105

       RUNS   AVG    OBP    SLG    OPS    ERA+
NYY    777    .268   .340   .417   .757   96
CLE    704    .268   .322   .407   .729   109
Earl says: I would have said Martinez last season, but Posada has just been unbelievable in 2007. Edge to Yankees. Ryan says: Even. Posada has the better rate stats, while Victor has the counting stats (thanks to some time at 1B) and a better arm.
Earl says: This one isn’t close. Big edge to Tribe. Ryan says: Advantage Indians, assuming Doug Mientkiewicz comes back to earth.
Earl says: Cabrera is a nice player, but this one isn’t close either. Big edge to Yankees. Ryan says: This one's easy: Yankees
Earl says: Tough to pick against the Captain in October. Edge to Yankees. Ryan says: Advantage Yankees.
Earl says: You have to ask? This is A-Rod’s year. Edge to Yankees. Ryan says: Yankees.
Earl says: This one is pretty even. Lofton has had a good year back in Cleveland and Damon is healthy again and playing well. Ryan says: Lofton's provided what the Indians need, but Damon's been better. Point to the Yankees.
Earl says: Melky on his best day doesn’t compare to Sizemore on his worst day. Big edge to Tribe. Ryan Says: Indians in a no-brainer.
Earl says: I thought Abreu was finished in May. He really turned it around and is a major cog in that lineup. Edge to Yankees. Ryan says: Yankees
Earl says: Matsui struggled in September because of his knee barking, but tends to hit well in October. Hafner scares me. Edge to Tribe. Ryan Says: Even in a down year, I'll take Hafner over Matsui. Indians.
Others: Earl says: The bench was a big problem for the Yanks earlier in the season. Not anymore. Edge to Yankees. Ryan says: The Yankees have the more useful bench.
C.C. Sabathia (19-7, 3.21) has increased his strikeouts and dropped his walks for the fourth consecutive year. He’s become very efficient on the mound, and averaged seven innings a start. He hasn’t faced the Yankees since 2004, and a long break between starts like that usually favors the pitcher.
Earl says: The 1-2 punch of Sabathia and Carmona certainly beats Wang and Pettitte. But the rest of the rotation on both sides poses a lot of question marks. Edge to Tribe. Ryan says: Sizable advantage for the Indians.
Others: Phil Hughes (fifth starter)
Jensen Lewis (1-1, 2.15, 5 HLD) was brought up in mid-July, and has worked his way into Others: Earl says: Mo over Borowski is obvious and Joba and Betancourt seems like a wash to me. Nonetheless, the rest of the Tribe pen is deep and much more stable. Edge to Tribe. Ryan says: Even with
Scouting Jake Peavy
With the season winding down, and as a bit of foreshadowing for the playoff preview I'm writing next week, I wanted to use my PITCH f/x database to look at
Here's a chart of Peavy's pitches, showing the differences between his pitches and their non-spinning equivalents. One thing that sticks out in this graph is the large group of sliders and when I looked at Peavy before I was unsure how to handle that group of pitches. If you look close enough, you begin to make out two groups, although overall they appear to be variations on his slider rather than two unique pitches.
Peavy's fastball is thrown hard and hitters will see it as having slightly less "drop", relative to an average RHP fastball. The smaller amount of drop gives the illusion of a rising fastball, which is reflected by a higher than average pfz value. His fastball also has more arm-side movement than an average RHP fastball does. Almost every pitcher throws their fastball the most and will throw their "out" pitch, if they have one, the next most, usually far more than average for that pitch type. Peavy is no exception to this pattern, and 35% of his pitches are sliders, compared to the average for RHP, which is 19%. His slider breaks away from RHH and drops less than an average slider, although in both cases, not by very much. It is thrown at roughly the same speed as his changeup, although they move in opposite horizontal directions. The wide range of possible slider movements makes comparisons against an average slider a little less precise than for fastballs. Sliders are essentially what's left over after fastballs, curveballs and changeups have been identified, the scrapple of pitches, and there is more variation among sliders (and cutters) thrown by different pitchers than any other pitch. Peavy's changeup has similar movement to his fastball, except it travels 10 MPH slower. He doesn't throw his changeup much (6% of pitches) and his curveball even less (2%).
If hitters are curious about which pitches to look for, here's a graph showing the frequency that Peavy throws each of his pitches, by inning. You can really see Peavy's reliance on his fastball and slider from this graph. 70% of Peavy's first inning pitches are fastballs (Avg. RHP throws 60%) and while there are any number of reasons why his first inning fastball percentage is higher than average (he doesn't want hitters to see his slider early in the game, he's trying to make sure he has command of his fastball, he's trying to establish his fastball as a pitch), it could just be because he really only throws two pitches. As the game moves along, he throws fewer fastballs and focuses more on his slider, which is consistent with how most pitchers operate. I only have data for 9 pitches of his in the 8th inning, so there probably isn't anything to the fact that only 3 of them are fastballs. Now look closer at the staggered increases in his changeups and curveballs in the third and fifth innings respectively. These changes mirror the increase in sliders in the second inning. One possibility to explain these changes is that Peavy might not want to show his full arsenal of pitches to hitters early in the game. In the first inning he throws mostly fastballs, then adds his slider to the mix in the second inning. It looks like he begins throwing his changeup in the third inning and adds his curveball in the fifth. These changes are pretty subtle and might just be artifacts, but if they’re real, they give hitters another piece of information about what pitches to look for at various stages of the game. That graph gives an overall pattern for Peavy, but which pitch does he throw when he needs a strikeout? In situations where the win value of a strikeout is the same as the run value of a ball that is put in play, you would expect a pitcher to not worry about getting a strikeout vs. 
a regular out, while when the value of a strikeout is high, you would expect a pitcher to try for a strikeout. One important thing to note is that I used the run value of a strikeout as opposed to the win value when splitting up situations. Using the win values of strikeouts, which is the correct way to do this, would cause my already small samples to shrink even more, and using the run values ignores the possibility of pitching in a blowout, where a pitcher would want to avoid walks and just get outs, even in situations where a strikeout might normally be needed. There's some error built into these values as a result. With that disclaimer out of the way, Peavy gives the batter almost even odds on seeing a fastball (54%) and a 37% chance of seeing a slider when the value of a strikeout is the same as a ball-in-play out. However, in situations when the value of a strikeout is greater, Peavy throws more fastballs (61%) and fewer sliders (33%). Because he relies so heavily on two pitches, Peavy throws both of his slider and fastball more than average in each situation, but increasing the percent of fastballs when he needs a strikeout doesn't make sense, given his great slider (although it has worked for Peavy). Another idea to consider is that perhaps his fastball is actually his best strikeout pitch. Which pitch does he get the most swings and misses from? The table below shows the overall frequency that Peavy gets swings and misses from each of his pitches.
For each pitch, Peavy generates more swings and misses than average, but it seems that his slider is being underutilized in big situations. There's undoubtedly a game theory element to his pitch selection in pressure spots, and perhaps his slider would lose some of its effectiveness if he threw it more often, but it would appear that he's making his fastball less effective than it could be by throwing it so much in important situations.
The final frontier when examining a pitcher is what actually happens once he throws a given pitch. The chart below shows where Peavy throws his fastball. Ideally, I would split this chart by batter type, and show where he throws all his pitches to LHH and RHH, as well as how they hit them. However, with so many splits you start running into sample size problems, and there just isn't enough PITCH f/x data to give this the treatment it deserves yet. One thing you can notice from this chart is that Peavy doesn’t throw his fastball low, but challenges hitters with it in the middle of the strikezone. This is counter to conventional wisdom, but again, Peavy has been effective with it. The next best thing to showing how different hitters fared against different types of pitches is showing how hitters did overall against Peavy.
These charts show Peavy’s pattern of pitching to RHH and LHH. One quick thing you can see from the graph is that he works both groups of hitters outside more than inside, with the outer third of the strikezone and right off the plate being his primary targets. There seems to be a little bit of evidence that Peavy pitches low in the strikezone, but he still throws a lot of pitches in the middle of the strikezone.
These BABIP graphs for LHH and RHH are probably not very accurate because of the small amount of ball-in-play data, but with a larger sample, they could be valuable for showing hot/cold zones.
Peavy’s fastball and slider are his two best pitches and he throws them the majority of the time. His slider has a couple of different types of movement and could actually be two different pitches, although it looks more like the differences are variations of the same pitch. He also has a changeup and curveball that he throws much less frequently, and which aren’t as good. In pressure situations, it appears that he relies a little more on his fastball than normal, even though his slider creates more swings and misses. In the first inning he throws more fastballs than an average RHP, and throws a lot more fastballs than sliders, relative to how he pitches the rest of the game. He doesn't let the other team see all his pitches in the first inning and introduces his slider in the second inning, his changeup in the third inning and his curveball in the fifth inning.
*****************************************************************
"Breaking" Away
When the PITCHf/x system debuted last year, the first thing I wanted to know (besides how hard The first attempt to quantify break using PITCHf/x debuted during the 2006 playoffs and compared the actual pitch to a pitch thrown without spin. The system would capture the flight path of a pitch, then create a hypothetical pitch that was thrown with the same initial velocity and release point, but with only gravity and drag acting on it. The difference between where this pitch would have ended up and where the actual pitch ended up was given as the "pfx" of the pitch. There are a couple of problems with this definition, the biggest being that The next try at quantifying break arrived this season and is more in line with how people imagine break. This version of break is defined as the greatest distance between the path of the pitch and the straight line path from the release point to home. A 12-to-6 curve will have a large value, while a regular fastball will have a small one. It's confusing to think about this definition, so if you're having trouble understanding it, imagine holding a bow by one of its ends with the other end held away (and slightly down) from you. The end you're holding is the release point, the other end is where the ball crossed home, the string is the straight line path, and the ball travels along the bow itself. If you rotate the bow around the string at a given angle, you get the actual path of the pitch and break as given by PITCHf/x. (Thanks to John Walsh for the bow analogy). This break value becomes even more valuable (at least to me) when you break it up into x and z components, and Dr. Alan Nathan's website has some (more) helpful equations that allow you to calculate break-z and break-x values. To visualize break-z, imagine keeping the endpoints constant and rotating the bow around the string until the bow was above the string and perpendicular to the ground.
Break-x is the same thing but the bow is parallel to the ground (don't worry if the bow is to the left or right of the string just yet). The break values are very similar to the pfx values, except they are in reference to an imaginary straight line, something that is easy to visualize. If the break-z value is 17 inches for a Once you understand and are comfortable with the break values, they act pretty much the same as the pfx values, with the benefit of meaning something. Comparing the two
Negative break-x values mean movement away from a RHB, and you can see that Zito's pitches typically move away from a RHB. This type of horizontal movement (toward the arm-side) is what you would expect for a fastball and change-up from any pitcher. Zito's curveball breaks slightly away from LHB, which is how curveballs from LHP are "supposed" to break, but the magnitude of Zito's horizontal break is less than normal. The table below shows other similar curveballs from LHP, sorted by their vertical break.
Zito's curveball actually has the biggest vertical drop of any pitch thrown this year, and comparing it to the other pitches in the chart, you see that the horizontal break is much lower. Zito has historically fared better when throwing to RHB than LHB (669/730 career OPS) so maybe his unique curveball is the reason why. It's reasonable to think that because the curveball doesn't move away from LHB as much as normal, they would have an easier time hitting it. The only pitcher with a similar curveball is DiNardo and he too shows a reverse split (792 career OPS vs. RHB/814 OPS vs. LHB). On The Book's blog this week, there was a discussion about comparing
None of these pitches match Rivera's cutter very well and Meche is the only one of these pitchers to have a reverse split for his career. One idea I had as I was looking at Zito and Rivera is that uniqueness in horizontal movement might cause reverse splits. Rivera throws a fastball that breaks horizontally like nobody else's in baseball. Zito's curve is unique not due to its vertical break (although it is large), but its lack of horizontal break.

I had two topics I wanted to cover this week and while the second one is important to me, it's probably a little less interesting for other people: I'm using a new algorithm to categorize pitches. It works better than applying a set of logical rules to each pitch and takes less time to run too. As far as the nuts and bolts of the system, for each pitcher, the algorithm calculates the distance between each pair of pitches using their break and velocity. Once it has the distances between each pitch, it combines the two pitches that are closest together, recalculates the distances between that new cluster and the remaining pitches, and combines the next two objects that are closest together. It repeats this process until it reaches a certain level of difference between groups. Once the algorithm has run for an individual pitcher, all of their pitches are assigned to a certain group, and using some of the logical statements from my original filter, as well as other patterns regarding the speed and break of different types of pitches, I can label each group (and all its members) as a specific pitch type. Labeling pitches by group membership is better than applying a set of static rules to every individual pitch in the database because it allows me to compare different pitches to the rest of that pitcher's repertoire and not worry about how it compares to a global rule.
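The merge-the-closest-pair procedure described above is a form of agglomerative clustering, and a rough sketch of it might look like the following. The details here are assumptions on my part rather than the actual filter: each pitch is a hypothetical (break_x, break_z, speed) tuple, the distance between two clusters is the straight-line distance between their averages, the features are not rescaled, and `max_distance` stands in for the "certain level of difference between groups" at which merging stops.

```python
import math


def centroid(points):
    """Average of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def cluster_pitches(pitches, max_distance):
    """Group one pitcher's pitches by repeatedly merging the two closest clusters.

    pitches: list of (break_x, break_z, speed) tuples (hypothetical layout).
    max_distance: stop merging once the closest pair is farther apart than this.
    """
    clusters = [[p] for p in pitches]  # every pitch starts as its own cluster
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose centroids are closest together.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break  # remaining groups are different enough to keep apart
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up pitches: two fastball-like and two curveball-like.
sample = [(-5.0, 9.0, 92.0), (-5.5, 8.5, 93.0), (2.0, -8.0, 76.0), (2.5, -7.5, 77.0)]
groups = cluster_pitches(sample, max_distance=5.0)
```

Each resulting group would then still need a label (fastball, curveball and so on), which is where the rules about speed and break come back in. This brute-force version rescans every pair on each pass, so it is only meant to show the idea on a single pitcher's worth of data.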
One problem with my old filter was that I had to find a way to get While some of the kinks are still being worked out of this classification system, I can still generate a list of fastballs (for pitchers who have thrown at least 500 total pitches) and see which ones have the greatest vertical break.
Look familiar? Instead of saying Webb's sinker ends up 3 inches higher than a non-spinning pitch, while a 4-seam fastball ends up 6 inches higher (or whatever the numbers were), now you can say that Webb's sinker has a 7 inch downward break.
The Other Side of the Pitch
The majority of analysis performed on the PITCH f/x data has been from the perspective of the pitcher. This makes sense, as it is really interesting to see how a certain pitch from a specific pitcher moves and how it is put into play. It's much easier to classify pitches from the pitcher's perspective, and there are a host of other "pitcher" things to look at. However, there is another half of the data that hasn't been covered as in depth. Looking at the PITCH f/x data from the hitter's perspective could yield some interesting nuggets of info, so today I'm branching out, spreading my wings, and looking at the hitter's version of the data. The easiest visual to create for a hitter is a chart showing how pitchers have approached him this season. Below on the left is a chart showing where I think these types of charts are fascinating and give you a good idea of a hitter's swing. You can easily pick out where batters feast on pitches and where they struggle. With a bigger sample than what I have right now, you could even have some confidence in your conclusions about those zones. Speaking of bigger samples, here is a chart that shows the BABIP for all RHB this season. Now instead of having 10 balls in play for a box, there are 10,000, which lets you say that low and away pitches appear to give most RHB trouble, not just Guerrero. Below on the right is a BABIP chart for I say that pitchers haven't figured out Kendall's strength yet and avoided throwing him low pitches, but (assuming I'm correct with my assessment of his weaknesses and strengths) do pitchers ever figure out these types of patterns vs. a hitter? How necessary is it to know, and pitch to, a hitter's weaknesses and strengths? Game theory might say that pitching too often to a hitter's weakness would eventually give him an advantage because he would have a good guess on the location where the next pitch was coming.
Whether that advantage would be offset by his inability to hit the pitch is unknown, but you are dealing with a Major League hitter. If you gave most hitters the location of the pitch and let them focus primarily on that spot, even if it were a spot where they otherwise had trouble, I think they would be successful. Pitchers have to vary their locations, both in and around the strike zone, to avoid giving the hitter an advantage (duh). In the case of Kendall, and every other hitter I've looked at, pitchers appear to be somewhat varying their locations, although for Kendall, pitchers have thrown more low pitches than high pitches, which cues Kendall to look for more low pitches, and enhances his only strength. Now with some idea of where pitchers throw to certain hitters and how the hitters respond, let's look at what pitchers throw different hitters. Building on my pitch filter, and some of the earlier work done by Dan Fox, ultxmxpx and Josh Kalk, I went through my database and attempted to label every pitch in it (currently only the ones tracked from 50 feet). Any automated process that attempts to classify pitches is going to have mistakes and mine is no exception, but after comparing the filter's results on individual pitchers to the results I got from manually clustering pitches, I was generally pleased with the results. The filter remains a work in progress (it can't differentiate between a split-fingered fastball and curveball or a 2 and 4 seam fastball and has trouble with certain pitchers' change ups) but the results are pretty good overall. Here are the MLB averages for how frequently different pitches are thrown. This is for all pitchers vs. all batters in all situations, so it isn't the most telling statistic, but it gives a general sense of how often a fastball (or change up) is thrown.
Without further ado, here are the batters who have seen the highest and lowest frequency of each pitch, with frequency being the number of a given pitch divided by the total number of pitches that hitter has seen. (Min. of 80 total pitches tracked by the PITCH f/x system.)
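For anyone who wants to reproduce this kind of list, here is a small sketch of the frequency calculation as defined above: the count of a given pitch type divided by all pitches the hitter has seen, keeping only hitters with at least 80 tracked pitches. The input layout (a list of `(batter, pitch_type)` pairs) and the function name are illustrative assumptions, not the structure of my actual database.

```python
from collections import Counter, defaultdict


def pitch_frequencies(pitches, min_pitches=80):
    """Return {batter: {pitch_type: share of all pitches seen}}.

    pitches: iterable of (batter, pitch_type) pairs (hypothetical layout).
    Batters with fewer than min_pitches tracked pitches are dropped.
    """
    counts = defaultdict(Counter)
    for batter, pitch_type in pitches:
        counts[batter][pitch_type] += 1

    freqs = {}
    for batter, by_type in counts.items():
        total = sum(by_type.values())
        if total >= min_pitches:
            freqs[batter] = {pt: n / total for pt, n in by_type.items()}
    return freqs


# Made-up data: batter "A" has 80 tracked pitches, batter "B" only 10.
data = [("A", "FB")] * 60 + [("A", "SL")] * 20 + [("B", "FB")] * 10
freqs = pitch_frequencies(data)

# Rank qualifying batters by how often they see fastballs.
most_fastballs = sorted(freqs, key=lambda b: freqs[b].get("FB", 0.0), reverse=True)
```

Sorting the same dictionary on a different pitch type gives the curveball and slider lists.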
The players who have seen the most fastballs are hardly surprising. Names like Bloomquist, Ausmus, Podsednik strike such fear into the hearts of pitchers across the league that pitchers are afraid to throw any off speed pitches to these batters. Or not. These hitters are awful, so pitchers don't waste their good pitches on them because they can get them out with fastballs. If I had included pitchers hitting on the list, they would have filled the top-10. I was a little confused by the inclusion of Thomas and Willits on the list, both of whom are having good seasons, but perhaps advance scouts have seen something in their swings that suggests they can't hit fastballs (or that they hit off speed pitches better than fastballs). Here's the same chart as above, but for curve balls.
It isn't earth shattering that bad hitters will see more fastballs than good hitters, or that I'm closing with a chart showing batters who have seen the highest and lowest frequency of sliders. Compared with fastballs and curve balls, there isn't as big a difference between the extreme frequencies and the average frequency for sliders, but it's still fun to look at who sees the most sliders.
That Sinking Feeling: Part Deux
Sinkers have been a popular topic for research with the PITCH f/x data so I'm going to that well once again and try to determine why sinkers are hit on the ground. One explanation given for why sinkers turn into ground balls is that sinkers are ordinary fastballs thrown low in the strike-zone, and pitches low in the strike-zone are more likely to be hit on the ground. This would mean that It is pretty easy to test whether there is something unique about the fastballs thrown by pitchers with low ground ball percentages (the amount of ground balls divided by all balls in play or GB%). In order to do so, I created three different groups of pitchers, based only on their GB% (the groups were pitchers with GB%>=.49, pitchers with GB%<=.35, and all others) and looked for differences in their fastballs. After I had the pitchers grouped, I removed anyone for whom I didn't have at least 450 total pitches worth of data. 450 pitches is a round, arbitrary number, but from eyeballing it, that was about the point where pitchers with only a couple of starts in Enhanced capable parks began to show up. The chart below shows a comparison between each group's average fastball.
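The grouping itself is simple enough to sketch in a few lines. The cutoffs (.49 and .35) and the 450-pitch minimum come from the description above; the group labels, the dictionary keys and the function names are only illustrative assumptions.

```python
def gb_group(ground_balls, balls_in_play):
    """Label a pitcher by GB% (ground balls divided by all balls in play)."""
    gb_pct = ground_balls / balls_in_play
    if gb_pct >= 0.49:
        return "ground ball"
    if gb_pct <= 0.35:
        return "fly ball"  # label assumed here for the low-GB% group
    return "neutral"


def group_pitchers(pitchers, min_pitches=450):
    """Split pitchers into GB% groups, dropping anyone under the pitch minimum.

    pitchers: list of dicts with hypothetical keys
              'name', 'pitches', 'ground_balls', 'balls_in_play'.
    """
    groups = {"ground ball": [], "neutral": [], "fly ball": []}
    for p in pitchers:
        if p["pitches"] < min_pitches:
            continue  # not enough tracked pitches to trust the averages
        groups[gb_group(p["ground_balls"], p["balls_in_play"])].append(p["name"])
    return groups


# Made-up pitchers, only to show the shape of the input.
staff = [
    {"name": "P1", "pitches": 900, "ground_balls": 120, "balls_in_play": 200},
    {"name": "P2", "pitches": 700, "ground_balls": 60, "balls_in_play": 200},
    {"name": "P3", "pitches": 500, "ground_balls": 80, "balls_in_play": 200},
    {"name": "P4", "pitches": 300, "ground_balls": 110, "balls_in_play": 200},
]
grouped = group_pitchers(staff)
```

With the groups in hand, averaging each group's fastball speed and pfx values gives the comparison chart.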
As a reminder, the pfx_x/z values are the horizontal and vertical differences between the actual pitch and a hypothetical pitch without spin. For ground ball pitchers, their fastballs end more than 5 inches higher than a spin-less fastball would, which might seem counterintuitive, except that every fastball ends up higher than a non-spinning pitch would, due to the backspin on a fastball. Fastballs thrown by neutral pitchers end 9 inches higher than a hypothetical pitch, so hitters are conditioned to seeing a pitch drop a certain amount between the mound and home, a distance that corresponds to ending 9 inches above a spin-less pitch. When a sinker is thrown, it drops 4 inches more than a "normal" fastball, so there is definitely something unique about sinkers and it makes sense that hitters would hit the top half of the ball and pound it into the ground. If you followed that explanation, check out the chart again. If ground balls result from hitters expecting a pitch to be higher than it is, and hitting the top of the ball, fly balls seem to come from the opposite case. Rising fastballs, which are the opposite of sinkers, are fastballs that don't drop as much as "normal", due to higher amounts of backspin. A hitter will have an opposite reaction to a rising fastball compared with a sinker, as it will drop an inch less than a normal fastball does. The batter will usually hit the bottom of the ball, resulting in either a line drive or fly ball, but not a grounder. The actual values in the chart need to be taken with a grain of salt, due to tracking differences at different stadiums, but the overall pattern is there.

Now that we know sinkers are a unique pitch, it's time to test some of the other ideas from the first paragraph. Even though it is a unique pitch, the sinker could be thrown low in the strike zone, causing the ground balls. Below on the left is a chart showing what percentage of the 7385 sinkers in my sample were thrown to specific areas.
There seems to be a slightly higher percentage of sinkers that end up low in the strike zone, compared both to all other sinkers and 'normal' fastballs (from the neutral group), but the differences don't seem to be anything too big, and pitchers with high GB% don't appear to throw their fastballs low in the zone any more than other pitchers. However, in order to say that a sinker at the top of the strike zone results in less than an average amount of ground balls, you need to know what the average GB% is for each area. The chart below on the left shows the GB% of normal fastballs, which can serve as an average. This chart follows the same pattern as the sinker chart, where the height of a pitch influences the result and you can see that in every region, sinkers have a higher GB%. Even though the GB% for a sinker varies depending on its location, (and the percents are influenced by the small amount of balls in play), in every region sinkers are 20-30% better at getting ground balls than normal fastballs, as illustrated by the chart on the right. In fact, it looks like if you shifted the sinker chart down one set of boxes, it would line up pretty well with the normal chart. A sinker that ends belt high gets the same GB% as a regular fastball does when it ends at the knees. So far we've looked at the PITCH f/x values of a sinker and what happens to it when it is thrown to certain areas. A sinker is a pitch with unique flight characteristics and is frequently thrown low in the strike zone, both of which contribute to very high ground ball percentages for sinkers. However, ignoring location for a second, the optical illusion that fools a batter into hitting the top of a sinker is only effective if it doesn't become the norm...so how does Lowe only throws three pitches, a sinker, change-up and curve, and looking at the GB% for each pitch in the chart below, it appears that he has the highest GB% with his change-up. 
He gets more total grounders from his sinker, but on a percentage basis, his change-up is better at getting grounders. This is based on a sample of just 33 change-ups in play, so the numbers could be totally wrong, but if this phenomenon is real, it means that Lowe's change-up is really his ground ball pitch.
Assuming for a second that Lowe's change-up is really his ground ball pitch, it might partially explain why hitters are unable to adjust to the sinker and keep pounding that pitch into the ground. Lowe's change-up has a vertical drop of 4.23 feet from release point to home, compared to a drop of 3.71 feet for his fastball. Does this 6-inch change result in hitters again being tricked into thinking a pitch was going to break less than it actually did and hitting the top of the ball? I don't know, and while most pitchers' change-ups have a greater vertical drop than their fastballs, not all pitchers get a higher GB% from their change-up than from their fastball. Unfortunately the sample sizes in all these cases are very small, so the jury is still out. I am still curious though about how the Lowes of the world continue to get such a high percent of ground balls from their sinker. Wouldn't hitters eventually realize what's happening with the movement of a sinker and adjust their swings? MLB hitters are good as a group, so there has to be some reason for them to continue hitting sinkers into the ground.

The location of any pitch when it crosses the plate is related to what happens when it is put in play, and sinkers are no exception. Low sinkers are hit on the ground more frequently than high sinkers. However, regardless of where they are thrown, sinkers are hit on the ground more frequently than an average pitch in that same location. If I were to speculate, I'd say that the movement of a sinker is more important than the location because wherever a sinker is thrown, it gets more grounders than a normal fastball. I think batters have a tough time adjusting to the break of a sinker, and if the pitch is thrown low, it just increases the chances of a ground ball.
*********************************************************************************************
While most pitchers have a similar overall GB% and fastball GB%, I created another table along with the ground ball table that shows the percentage of fastballs that were swung at and not put in play (the batter either missed the pitch or fouled it off).
I made this chart just for fun, but eventually I want to be able to look through all pitch types and find who has the most unhittable (or ground ball inducing) pitch, rather than just fastballs. With that list, you can get more nuanced results and really compare things like whether Saito's fastball or Santana's change-up gets more swings-and-misses.
And Now for Something Completely Different...
Rich wrote an article last week about I try not to complain unless I have a solution (yeah, right), and after Rich's article prompted me to start playing around with the XML files that support MLB.com's hit chart, I made my own hit charts that added my features. I think adding these features will make the hit charts much more informative and valuable, and you can get a more accurate idea of a hitter's hitting pattern and potentially visualize some other cool things. Looking at an individual player is a good place to begin examining the new hit charts and below are two charts for The results of his at-bats, shown in the chart on the right, confirm that Millar doesn't have much success hitting to right field. On this chart, the black circles represent all outs, while the green dots are singles, the yellow dots are doubles, blue are triples, and red are home runs. You can see that when he does go the other way, the ball is usually not very well hit and results in an out. If there were ever a right-handed hitter to use an over-shift against, Millar is the perfect candidate. (I had a problem adding legends to the charts, so any chart using just red, blue and black dots is showing how each ball was hit, regardless of whether it was a hit or not, while any graph with red, blue, yellow and green dots shows the result of a ball in play, such as a single or double.)
The next step in analyzing where balls are hit is to look at what pitches were hit to certain areas. In order to answer this question I needed to merge my hit location database with my pitch database. With this "super-database", I can show hitting charts based on any conceivable split. Want to see how and where balls have been put in play against There are some problems with the MLB.com hit location data, primarily that the balls are marked based on where they are picked up by a fielder, not where they first hit the ground or where they go through the infield. By marking where a ball was picked up, you lose the information about where it should have/could have been fielded. Knowing where an outfielder picked up a ground ball is nice, but knowing exactly where that ground ball went through the infield or where a fly ball actually landed would be better. Another possible problem with the data is the ability of the scorekeeper to really know where the ball landed. There aren't any landmarks in the outfield to gauge where a ball was picked up, which makes it harder to accurately plot the data. These hit charts can help create informative profiles on hitters, pitchers and stadiums and on a large scale they can even help visualize players' defensive ranges. One big advantage with the hit location data as opposed to the pitch data is that the hit chart data is complete for all stadiums for the whole year. Scorekeepers manually enter this information for every ball in play, and it even goes back for several years, allowing for possible comparisons across years.
Park Differences and Reaction Distances
If you have been following the PITCHf/x data this season, you've probably realized that the system has been implemented in more stadiums since the All-Star break, and is in 23 stadiums now. You've also probably noticed that the data provided from each stadium is slightly different. The velocity isn't very consistent between starts by the same pitcher in different stadiums, the movement of pitches seems to change and the release point has been shown to jump around as well. The release point differences are the most important because as I learned last week, there are only nine parameters captured for each pitch. The three dimensional location of the ball, as well as acceleration and initial velocity, are all captured by the camera system, with the rest of the values that are shown, either in Gameday or the xml itself, being calculated from those nine values. Any discussion about how parks affect the speed or movement of pitches has to begin with a look at the data captured at release point. Below is a table that has the average release point height (in feet) for a team's staff, both at home and in all road stadiums. The way to read the table is that the average release height for all pitchers on the Red Sox while at Fenway was 5.30 feet, and was 6.08 feet for Red Sox pitchers on the road. One problem with using this method is that it doesn't use exactly the same group of pitchers for home and road, which is due to a lack of data, but it gives a rough idea of the release point height at each stadium.
Most of the home heights are within .2 feet of their road data, with the exception of Boston, Colorado and San Francisco. However, even among these three stadiums, Fenway stands out, with the release point being .78 feet lower than on the road. Every Red Sox pitcher had at least a .40 foot higher release point on the road and looking at the average starting velocity of a pitch at each stadium, Red Sox pitchers throw 6.5 MPH faster on the road than at home. Clearly something is going on with the PITCHf/x system at Fenway and to a lesser extent at Coors and AT&T, and could be going on at other stadiums as well. Until we have confidence in the release points being tracked at every park, comparing data gathered at different stadiums without adjusting it will give misleading results.
Looking at individual pitchers for the Red Sox, you can see how Fenway's camera system impacts the different pitchers. X0 and z0 are the coordinates for the release point, measured as a distance from the pitcher's body and from the ground respectively, and the release point is lower at home for all the pitchers. Almost all of the pitchers also get a smaller pfx_z value at home, which would seem to indicate that their pitches have more sink at Fenway, but is actually a result of the lower release height combined with the fact that, overall, the average height when a pitch crosses the plate at Fenway is similar to the height at other parks. The initial velocity is vy0, measured in feet/second, and is slower in every case. I didn't break this chart up by pitch, which is fine for examining the release points, but when looking at the velocity it gives an average that doesn't really mean anything. Getting back to making an adjustment, the z coordinates of the release points are all roughly 10% too small at Fenway. If the Fenway z values were increased by 10% they would be a closer match for the release points on the road. However, once you make that adjustment, you need to adjust each of the other 8 parameters so that they are "measuring" at the new, adjusted release point, rather than the low release point. If you say that Fenway lowers the release point for every pitcher by 10%, and apply these adjustments to every pitch thrown at Fenway, here's what happens for Josh Beckett.
Even though the adjusted numbers match the road numbers, I'm not very confident in using this method to make large-scale adjustments. For one thing, the road numbers could be off too. For Beckett I'm looking at one road start, made in Safeco, so I could be making too big of an adjustment. The lack of a large sample of road starts for pitchers is a major weakness of the type of separation I used in the home/road charts, but once there are more starts made in stadiums with the pitch f/x system, that hopefully can change. I think any true park factors are going to need to wait until there is more data captured at all stadiums.
Here are two graphs of a randomly selected Beckett fastball and curveball at Fenway and Safeco, as viewed from the first base line. You can really see the difference that the release height makes from this view. There appear to be some differences in how the curveball moves at the different stadiums, but the fastball follows virtually the same path, just at different heights, in both cases. Each dot represents the ball's position in .05 second intervals, which segues nicely into my last section.

I received a comment yesterday on my article from last week that suggested a better way to quantify the speed of a pitch was to determine how far away the pitch is when the batter has to decide whether to swing. It probably is even more intuitive to think of it like this compared to how many seconds the ball takes to arrive, so I went ahead and calculated some distances. You can test your reaction time here, and after some extensive research (emailing the link to five friends) I think a rough proxy for an MLB reaction time is around .2 seconds. If a pitch takes .513 seconds to reach the plate, as a Wakefield knuckleball does, then the hitter can let the pitch travel for .313 seconds out of Wakefield's hand before making a decision. The pitch is 19.75 feet from home plate at .313 seconds, so the hitter can wait until Wakefield's knuckleball is about 20 feet from him before making a decision. A hitter has to make a decision on a Beckett fastball 27 feet from home, while on a The hard part of finding these numbers is determining the reaction time. The test above only involves clicking a mouse button, which is nearly instantaneous, but swinging a bat takes much longer. Even if the hitter had a reaction time of .2 seconds, once he recognized the pitch and reacted, actually swinging the bat would take some time as well.
If you add on another .1 second to account for the swing, the distances are pushed back to 29 feet for Wakefield's knuckleball, 41 feet for Beckett's fastball, and 31 feet for Hill's curve. I have no idea if the .1 second swing time is accurate, but at 41 feet from the plate most pitches look very similar. Hill's curveball hasn't begun to break yet and it looks very similar to Beckett's fastball. If you had a reaction time of .2 seconds and a swing that lasts .2 seconds after the reaction time, you would need to artificially speed up your reaction time and decide whether to swing at Beckett's fastball before he even released his pitch. If he were throwing his curveball or changeup instead...well, Beckett does have 148 strikeouts this year. I believe there is some overlap between reaction time and when the swing begins, which lowers the overall time used, and I think there is also some element of "Blink" involved here, where good hitters "know" to swing at a pitch before they realize why they are swinging at it. Either way, hitting is hard.
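The decision distances above can be approximated with a short calculation, sketched here under some assumptions: the ball moves toward the plate with a constant acceleration (the same constant-acceleration model PITCHf/x reports), distance is measured along y in feet with the front of the plate at y=1.417, and the function and parameter names are my own labels. The example inputs are round made-up numbers, not any real pitcher's values.

```python
import math


def flight_time(y0, vy0, ay, plate_y=1.417):
    """Seconds for the ball to go from y0 to plate_y.

    vy0 is negative (ball moving toward the plate); ay is assumed to be
    zero or positive (drag slowing the ball down).
    """
    if ay == 0:
        return (plate_y - y0) / vy0
    # Solve y0 + vy0*t + 0.5*ay*t^2 = plate_y for the first positive root.
    disc = vy0 ** 2 - 2.0 * ay * (y0 - plate_y)
    return (-vy0 - math.sqrt(disc)) / ay


def decision_distance(y0, vy0, ay, reaction=0.2, swing=0.0, plate_y=1.417):
    """y position (feet) of the ball at the last moment the hitter can decide.

    Returns None when the decision would have to come before release.
    """
    t_decide = flight_time(y0, vy0, ay, plate_y) - reaction - swing
    if t_decide < 0:
        return None
    return y0 + vy0 * t_decide + 0.5 * ay * t_decide ** 2


# Round-number example: released 50 ft out at a constant 100 ft/s, plate at y=0.
no_swing = decision_distance(50.0, -100.0, 0.0, reaction=0.2, plate_y=0.0)
with_swing = decision_distance(50.0, -100.0, 0.0, reaction=0.2, swing=0.1, plate_y=0.0)
too_late = decision_distance(50.0, -100.0, 0.0, reaction=0.4, swing=0.2, plate_y=0.0)
```

In the round-number case the ball is 20 feet out when a .2 second reaction has to start, and 30 feet out once a .1 second swing is added, which is the same pattern as the Wakefield numbers. The `None` result corresponds to the case of having to decide before the pitch is released.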
May I have Seconds?
Despite playing with the PITCHf/x data since the playoffs last season, I didn't have a very firm understanding of how the values were captured until earlier this week when I was alerted to Alan Nathan's fantastic website on the physics of baseball. The whole site is good, but I was particularly interested in the section on the PITCHf/x system. In addition to Nathan's analysis on pitch data, this section contains a treasure trove of general information about the system as well as specific definitions for each data field. Using several of Nathan's equations, I was able to quantify where a pitch is in space at any time from release until it reaches home, and using these locations, I was able to visualize the entire trajectory of each pitch, similar to what is shown for each pitch in the Gameday window. The equation for finding the x position of a pitch is x(t)=x0+vx0*t+0.5*ax*t^2, where t is time, x0 is the pitch's initial x position, vx0 is its initial velocity in the x direction and ax is its acceleration in the x direction. Vx0 and ax are provided in the xml, so finding the x coordinate of a pitch is as easy as plugging in a value for t. The y and z coordinates of a pitch are found using the same equation, but with the appropriate initial velocity and acceleration values. Here's the path of a
If I were really good I would have a 3 dimensional graph here, but it looks the same as the path they show in Gameday for the pitch. Each coordinate is measured in feet, with 0,0,0 being the back part of home plate and y=1.42 being the front of home plate. X measures left and right from the catcher's perspective, with negative numbers being on his left, y is the distance from the pitcher's mound to home plate and z is the vertical distance from the ground. This curveball ended in the high, inside quadrant of the strike zone for a right-handed hitter. The first thing I noticed in the chart is that the pitch reached the front edge of home plate in .49 seconds. Using radar guns to measure the velocity of a pitch is established practice throughout baseball. However, the speed of a pitch varies based on where the gun is aimed, so saying a pitch is 71 MPH doesn't really mean anything. Was it 71 MPH out of the pitcher's hand? Crossing the plate? "Fast" gun? "Slow" gun? You could get four correct, but different, radar readings for the same pitch. What really matters is the time a batter has to react to a pitch. Saying Hill's curveball takes .486 seconds to travel from release point to home (from y=50 to y=1.417) while his fastball takes .387 seconds shows a clear, tangible difference between the pitches. For a rough comparison, a Here's a list of the 10 pitches that have reached home fastest this season, along with the corresponding release point radar reading. (For simplicity, I only used pitches that were tracked for 50 feet, which is why Zumaya does not appear on the list.) Looking at the list and the rest of the fast pitches in my database, it appears that there might be a little bit of a park factor involved with the results, although the names are who you would expect.
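The constant-acceleration equation and the flight-time idea described above can be sketched in a few lines of Python. This is only an illustration: the function names are mine, and the pitch values are made-up numbers in the right general range for a slow curveball, not figures from any real xml record.

```python
import math

PLATE_FRONT_Y = 1.417  # feet; front edge of home plate, as given in the text


def position(t, p0, v0, a):
    """Position along one axis after t seconds: p0 + v0*t + 0.5*a*t^2."""
    return p0 + v0 * t + 0.5 * a * t * t


def time_to_plate(y0, vy0, ay, y_target=PLATE_FRONT_Y):
    """Smallest positive t solving y_target = y0 + vy0*t + 0.5*ay*t^2."""
    a, b, c = 0.5 * ay, vy0, y0 - y_target
    if a == 0:
        # No acceleration: the equation is linear in t.
        if b == 0:
            raise ValueError("pitch is not moving toward the plate")
        roots = (-c / b,)
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            raise ValueError("pitch never reaches the target distance")
        root = math.sqrt(disc)
        roots = ((-b - root) / (2 * a), (-b + root) / (2 * a))
    positive = [r for r in roots if r > 0]
    if not positive:
        raise ValueError("no positive flight time")
    return min(positive)


# Hypothetical values (feet, feet/second, feet/second^2), for illustration only.
pitch = {
    "x0": 2.0, "y0": 50.0, "z0": 6.0,
    "vx0": -3.0, "vy0": -105.0, "vz0": 2.0,
    "ax": -4.0, "ay": 20.0, "az": -38.0,
}

t_plate = time_to_plate(pitch["y0"], pitch["vy0"], pitch["ay"])
for t in (0.0, t_plate / 2, t_plate):
    x = position(t, pitch["x0"], pitch["vx0"], pitch["ax"])
    y = position(t, pitch["y0"], pitch["vy0"], pitch["ay"])
    z = position(t, pitch["z0"], pitch["vz0"], pitch["az"])
    print(f"t={t:.3f}s  x={x:.2f}  y={y:.2f}  z={z:.2f}")
```

Solving the y equation for t, rather than stepping through time, gives the flight time directly, and the same position function then serves all three axes.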
Getting back to Hill, graphing the trajectory of his fastball and curveball shows the differences in flight paths. This graph is drawn as if you were looking down from above, showing movement in the x-direction, with the release points at the top right of the graph and home plate in the bottom middle. From the graph, you can see the different routes the pitches take. For the first 10 feet, Hill's curve looks very similar to his fastball, although after that the curve begins to break, moving away from left-handed hitters. The dotted line is a rough guess at the sight line for a left-handed hitter and illustrates how difficult it is for a left-handed hitter to hit a good curve from a left-handed pitcher. While both pitches begin at around the same location, the curveball actually goes behind a left-handed hitter's field of vision and for a split-second appears that it will hit him. This graph is a side view of Hill's pitches, viewed from the first base line. Again the differences between the pitches are pretty clear to see, with the curveball taking a longer route to cover the same distance as the fastball. One thing to notice on this graph is that the curveball actually goes up after Hill releases it. It's not a big movement, but the pitch reaches its maximum z-value .05 seconds There has been research done that shows the release points measured by the PITCHf/x system are not very consistent for different stadiums, so any research that uses the release point information needs to take that into account. However, according to Dr. Nathan's website, the only values in the xml files that are observed directly are the accelerations and initial velocities and positions, all of which are based off the release point. Every other value in the xml, including where the pitch crosses the plate and the break values, is calculated from those nine observed values.
This opens the door to all kinds of problems if the release points are still as inconsistent as they were at the beginning of the year. This could also help explain the park factor I mentioned with times, because if the release point is slightly off it will directly impact the time calculations. There are a number of cases where pitches are badly tracked, and another problem with the system is that it occasionally picks up a ball transfer between the umpire and pitcher. I haven't done any digging into this, so this is pure speculation, but knowing more about how the values are calculated, I think perhaps these two problems are related. If the initial values are somehow wrong (they correspond with the ball exchange), the x,y coordinates for where the ball crosses the plate are going to be calculated correctly for the ball exchange, but will not match the reality of the pitch. ******************** I referred to Alan Nathan's website countless times while I was writing this article and his kinematic equations are the basis for this article. I also want to thank him for helping answer some questions I had about the data and his equations. I highly recommend checking out his site, particularly his analysis on the PITCHf/x data.
Makin' a Filter
In each appearance by a pitcher, I found the average speed of his pitches as they crossed the plate, and then divided the velocity of each pitch in that appearance by the average, which gave me a value for each pitch, standardized for that day. I then classified each pitch as a fastball or off-speed, using only that standard value. Obviously this isn't a perfect method for classifying pitches, and there is some level of inaccuracy with the labels, but it's simple, relatively accurate for fastballs vs. off-speed pitches, and I think it's a good start in automating the classification process. Testing the method on individual pitchers, the results generally agreed with a visual inspection of their pitch chart, but the algorithm I used to classify pitches had problems with certain types of off-speed pitches. To fix the problems I used a cut-off point of the standard value to separate fastballs from everything else. Generally speaking, a pitch that was faster than the average speed was usually a fastball and anything slower was off-speed. This was the case for every type of pitcher I examined, which will be important. Some pitches are going to be improperly classified with this method as well, but the problem is smaller compared to using the algorithm and because of the similarity between different types of pitchers, this method worked better than the algorithm when classifying pitches for multiple pitchers. Here's a pitch chart from One thing to keep in mind, and it's shown clearly in Halladay's graph, is that I didn't make any attempt to separate 2-seam and 4-seam fastballs for pitchers that throw both pitches, which will slightly skew the results for those pitchers. Once I was automatically classifying individual pitchers, I went back and classified every pitch in my database as either a fastball or an off-speed pitch. Before I looked at when pitches were thrown though, I needed to establish some baselines. Of all the pitches in my database, 62% have been fastballs. 
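The standardizing step described above can be sketched roughly as follows. The field names and the 1.0 cut-off are assumptions for illustration (the text only says that pitches faster than the appearance average were usually fastballs), and the sample speeds are invented.

```python
from collections import defaultdict


def classify_by_relative_speed(pitches, cutoff=1.0):
    """Label each pitch by its speed relative to the pitcher's average that day.

    pitches: list of dicts with hypothetical keys 'pitcher', 'game', 'speed'.
    cutoff:  assumed ratio separating fastballs from everything else.
    """
    # Collect the speeds for every (pitcher, game) appearance.
    speeds = defaultdict(list)
    for p in pitches:
        speeds[(p["pitcher"], p["game"])].append(p["speed"])
    averages = {key: sum(v) / len(v) for key, v in speeds.items()}

    labeled = []
    for p in pitches:
        ratio = p["speed"] / averages[(p["pitcher"], p["game"])]
        label = "fastball" if ratio > cutoff else "off-speed"
        labeled.append({**p, "ratio": ratio, "label": label})
    return labeled


def fastball_share(labeled):
    """Fraction of labeled pitches that were called fastballs."""
    if not labeled:
        return 0.0
    return sum(1 for p in labeled if p["label"] == "fastball") / len(labeled)


# Invented speeds for one appearance, for illustration only.
sample = [
    {"pitcher": "A", "game": "g1", "speed": s}
    for s in (92.0, 93.0, 91.0, 80.0, 78.0)
]
labeled = classify_by_relative_speed(sample)
print([p["label"] for p in labeled], fastball_share(labeled))
```

Dividing by the appearance average, rather than using raw MPH, is what lets one cut-off work across hard throwers and soft tossers alike.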
Some basic splits are in the table below.
It seems that pitchers throw more fastballs to same-side hitters, but overall 62% looks pretty good as an average. Here's a list of the 10 pitchers who throw the highest and lowest percentage of fastballs (min 100 pitches).
This list is pretty interesting and the full list it came from might be even more interesting. First of all, In a previous article, I examined the pitch selection of
In every case, the percentage of fastballs thrown is lower when the pitcher needs a strikeout, which is what we expected going in (and saw in the case of Peavy and Haren). The differences between situations aren't severe, but in the 'overall' case especially, the sample size is large enough that the differences are real. Below is a table showing the pitchers who have thrown the highest and lowest percentage of fastballs when they need a strikeout (min 20 pitches). It is a little misleading to just compare the percentage of fastballs a pitcher throws when he needs a strikeout to the league average and say anything less than the league average (more breaking balls) is good while anything higher is bad. A pitcher should throw whatever pitch he has that can get the most swings-and-misses in a high K situation, and for some pitchers, their best swing-and-miss pitch happens to be their fastball. Pitchers rely on their fastballs generally, but certain pitchers should and do use it even more in situations where they need a strikeout.
I've covered some of the flaws in the methodology I used to separate pitches, but overall I was quite happy with the results. When I compared the overall fastball percentages for individual pitchers to Inside Edge on ESPN and my own individual pitcher graphs, the percentages were close in all three cases. The next step in this type of analysis is to separate out the different off-speed pitches that I lumped together, which adds another layer of information about pitchers and pitch selection. A changeup and curveball are two very different pitches and could be used for very different purposes by a pitcher. I'm going to close with one last table, this one showing the fastball percentage on extreme pitcher's counts (0&2 and 1&2) and extreme hitter's counts (3&0, 3&1).
I should have separated the 3 ball counts by the cost of a walk, but it seems amazing that pitchers are so afraid of walking a hitter in those counts that they become Zumaya-esque in terms of pitch selection, but without the amazing fastball to back it up. In a count that already favors the hitter, hitters see almost all fastballs, which is one big reason why hitters have a .630 SLG in 3&0 and 3&1 counts this year.
Not an Article about Pitching at Altitude
This entry was supposed to be about how pitches moved and behaved at different altitudes. I briefly wrote about differences in pitch movement for a Weekend Blog in May and I was planning to revisit the topic when there were more stadiums supplying the data. After the All-Star break, several new stadiums went on-line with the pitch f/x system, including Chase Field in Arizona, the stadium with the second highest elevation in baseball, and I thought I was in business. I examined how pitches moved at Chase Field (or Turner Field, the third highest stadium in baseball) compared with how they moved at parks close to sea level, such as Petco, Safeco or McAfee, but I found virtually no changes in how pitches moved at the different altitudes. This didn't seem right intuitively and it wasn't. To make a long story short, I had forgotten to account for the distance traveled by the ball. MLB.com has varied the distance at which they begin tracking the pitch, called y0, and although it appears to have recently stabilized around 50 feet, it began the season at 55 feet and after June 4th varied from 40-55 feet depending on the game. Needless to say, where the pitch is initially picked up is going to make a huge difference in the distance it breaks, and after going back and looking at my results again, I didn't have enough pitchers who had the same y0 value at both a high-altitude and low-altitude park. That pretty much shot the column idea, so this post turned into a catch-all, with some updates and cool graphs that I haven't had a chance to post yet. ********** Despite not writing about differences due to altitude, I wanted to share two conflicting results I got when looking at altitude differences. The first result is about ********** This is a pitch chart for Not many pitchers have a graph this "clean", with no pitches thrown in a 10 MPH range (81-91 MPH). **********
This chart is for **********
These graphs show the Batting Average on Balls in Play (BABIP), broken up by batter/pitcher splits. I ran these in one of my first posts and had been updating them every couple of weeks since then. As a reminder, they are from the catcher's perspective, so the right hand side of the graph is inside for a LHH. For the most part, they've stayed pretty constant for the duration, but there are a couple of changes of note. In the RHH/RHP graph, the middle of the strike zone now has the highest BABIP, which wasn't the case the first time I showed the graphs. Another interesting note is the difference between the BABIP on high-inside and outside pitches. This is particularly noticeable for LHH against LHP, but all hitters have a higher BABIP on high-outside pitches compared with high-inside pitches. This connects with Perry Husband's invention of "Effective Velocity", a theory on hitting and pitching. He writes why certain pitches are tougher to hit than others, and if you click on his name and go to the bottom of that page, there is a graphic explaining it. He found that, everything else being equal, a fastball thrown high and inside looks 4 MPH faster than the same pitch thrown outside. The MPH difference isn't the only thing that goes into hitting a ball solidly, but it is interesting to think about. I'm not sure where he came up with the 4 MPH, but Husband's philosophy makes intuitive sense. In order to hit an inside pitch, the hitter needs to react quicker and meet the ball in front of the plate, leaving less reaction time, which serves the same purpose as an increase in MPH. It's interesting when two people arrive at similar conclusions using different processes. Also, I haven't done this yet, but it would be interesting to see what these breakdowns look like using the strike-zone as it is actually called by umpires. ********** That's it for this entry. I promise that next time I have a good idea for an article, I'll make sure all the data are correct
Under Pressure
Here's a chart showing Peavy's start on May 27th vs. the Brewers. He threw all four of his pitches in this start and you can see the different breaks that they have. His fastball and slider both break toward a right-handed hitter, while his changeup moves away from righties. His curve is a standard curve from a right-handed pitcher and runs away from a right-handed hitter. There really isn't anything particularly special in this chart, and I put it in to get a feel for his pitches. In this article I'm going to examine when during a game Peavy throws his pitches, and specifically, does he pitch differently in high pressure situations or low pressure situations? Before I could look at when Peavy throws his pitches I needed to classify them. As I was classifying the pitches it appeared that he had five pitches. However, when I looked at data from individual games, I could only find evidence for four pitches, the fastball, slider, changeup and curveball. I was pretty confident that he only threw those four pitches, but there was clearly another group in the season graph. This wasn't a case of stadium variation, as all these games were in San Diego, and they were all prior to June 4th, which was when MLB.com began varying the "release point" distance. After another round of looking at the data from individual games, I found two problems. One was pitches that weren't classifiable. These pitches had similar movement to Peavy's fastball, but the velocity was much slower. I'm not sure what caused this, and I removed them from the data set, but it just serves as another reminder to be careful with these data. The second problem I ran into was that Peavy had some serious variation in how his pitches moved from start to start. This is pretty different from what I found in my last article, and I'm not sure what it means. 
There were some patterns where every pitch in certain starts varied the same amount, which would indicate a camera change, so the differences might just be another reminder to be careful. However, for the purposes of this article, variations between starts don't matter as long as each start is consistent with itself, which was the case for the starts I examined. I ended up using Peavy's starts from 4/30, 5/11, 5/16, 5/22, 5/27, 6/7 and 6/19. Here is a table showing the percentage of each pitch that Peavy throws overall.
The chart is very basic, but one thing that stuck out to me was that Peavy throws his fastball 61% of the time, which initially seemed like a lot of fastballs. However, after comparing him to other hard throwing right-handers, such as Once I had all Peavy's pitches classified I matched them to the Leverage Index that they were thrown in. I assigned the Leverage Index (LI) at the beginning of a plate appearance to any pitch thrown during that plate appearance, with steals and other runner advancements during the play being accounted for. I split up Peavy's pitches into those that he threw when the LI was greater than one and when it was less than or equal to one. One is defined as average LI, so I'm splitting Peavy's pitches into above average (high) pressure situations and below average (low) pressure situations. Here are Peavy's LI splits according to his pitches.
You can see from the table that when the pressure is mounting, Peavy relies less on his fastball and much more on his slider, throwing it 32% of the time in high pressure situations, compared with just 19% in low pressure situations. Nearly half of all sliders that Peavy threw in my sample have come in high pressure situations, while just one-third of all his fastballs came in high pressure situations. In every game that I examined, Peavy's ratio of fastballs to sliders was smaller in high pressure situations compared to low pressure situations, as he threw 3.4 fastballs for every slider in low pressure situations, but only 1.8 fastballs per slider in high pressure situations. I was a little surprised that Peavy used his slider so much more in pressure situations. One reason for the difference could be that in low pressure situations, Peavy is more focused on getting quick outs and using more fastballs to do so. The fact that he used his slider more in pressure situations isn't surprising, but I was surprised by the magnitude of the shift. However, without someone similar to compare him to I wouldn't know if he really went to it more or if that was a pattern all pitchers shared. I used the other starting pitcher in the All-Star Game,
Whatever Peavy is doing with his slider in pressure situations, Haren is doing something very similar with his slider and splitter. Haren threw 28% splitters and 26% sliders in high pressure situations, compared with 19% and 21%, respectively, in low pressure situations. The ratio of Fastballs/Sliders and Fastballs/Splitters shows the same inverse relationship with pressure for Haren that it did with Peavy. One thing that really jumps out from these splits is the "out" pitch for each pitcher, not necessarily their best pitch, but the one they rely on to get outs. Looking at the basic chart, Peavy threw 61% fastballs, which makes it seem like that was his out pitch. However, he went hog-wild with his slider in pressure situations because that is his true out pitch. Haren relied on both his slider and splitter in pressure situations and used both of them for outs. Peavy and Haren each follow different patterns when pitching in high and low pressure situations, and both use their off-speed pitches more in high pressure situations than in low pressure situations. This seems like it would be the norm in the Major Leagues, as pitchers would rely more on fastballs in low pressure situations, possibly to avoid walking batters and turning low pressure situations into high pressure ones, and possibly to avoid showing their out pitches to batters. However, I can't know for certain whether Peavy or Haren throw a relatively high percentage of fastballs in low pressure situations because I don't have the Major League average for fastballs thrown in low pressure situations. That would need to be calculated before this type of analysis goes much further. With the MLB averages for the types of pitches thrown at different levels of pressure, game theory could be applied to the analysis, and statements like "Jake Peavy throws too many (or too few) fastballs in high pressure situations" would have real meaning.
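The leverage splits used in this article can be sketched roughly like this, assuming each pitch has already been tagged with a pitch type and the Leverage Index at the start of its plate appearance. The function names and pitch-type codes are mine, and the sample list is invented to show the mechanics, not real pitch counts.

```python
from collections import Counter


def pitch_mix_by_leverage(pitches, threshold=1.0):
    """Split pitches into high (LI > threshold) and low (LI <= threshold) buckets.

    pitches: iterable of (pitch_type, leverage_index) pairs.
    Returns (counts, shares), each a dict keyed by 'high' and 'low'.
    """
    counts = {"high": Counter(), "low": Counter()}
    for pitch_type, li in pitches:
        bucket = "high" if li > threshold else "low"
        counts[bucket][pitch_type] += 1

    shares = {}
    for bucket, counter in counts.items():
        total = sum(counter.values())
        shares[bucket] = (
            {ptype: n / total for ptype, n in counter.items()} if total else {}
        )
    return counts, shares


def pitch_ratio(counter, numerator, denominator):
    """Pitches of one type thrown per pitch of another type."""
    if counter[denominator] == 0:
        return float("inf")
    return counter[numerator] / counter[denominator]


# Invented sample: (pitch type, LI at the start of the plate appearance).
sample = (
    [("FB", 0.6)] * 17 + [("SL", 0.6)] * 5
    + [("FB", 1.7)] * 9 + [("SL", 1.7)] * 5
)
counts, shares = pitch_mix_by_leverage(sample)
for bucket in ("low", "high"):
    print(bucket, shares[bucket], pitch_ratio(counts[bucket], "FB", "SL"))
```

Running the same function over every pitcher in the database would give the league baseline that the article says is still missing.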
Is There Something in the Way it Moves?
Why do pitchers struggle in some starts? Without thinking too hard, I would guess poor starts are based on some combination of bad luck, bad location and bad stuff. Everyone can see when a pitcher is missing his spots and bad luck can be reasonably quantified with DIPs, but what about bad stuff? Frequently an announcer will say that a pitcher "didn't have his best stuff tonight" as a reason for his poor showing. What does that statement really mean, and is there any truth to it?
There are a million possible reasons why Halladay could have dominated in one start and been dominated in the next. Although the White Sox and Devil Rays are both weak offensively, there may have been a subtle difference between them that allowed the Devil Rays to have success. The Blue Jays defense might have played extra hard against the White Sox and taken the night off vs. Tampa. The mound might even have been raked differently or the balls were shinier in one start. The point is it could have been anything. Probably though, it wasn't something as small as the dirt on the mound and perhaps Halladay, fresh off the DL and 20 days of rest when he made his start against the White Sox, wasn't ready to resume pitching on a normal four days rest. He might not have been able to locate his pitches where he wanted and if he did locate them, the pitches themselves might not move like he wanted. Here are two charts showing the movement of his pitches in each start.
There aren't many differences in how his pitches moved between starts. The PITCHf/x system has a margin of error of plus/minus an inch, and only two parameters have differences of more than two inches, so most of the differences could be just noise. (There's also the problem with having a small sample for each pitch) The difference in vertical movement between starts on his curveball and fastball was more than two inches, with both pitches having greater drops on June 5th. It would seem that more movement on a pitch would be preferable, but Halladay's added movement didn't help him on June 5th. One other difference between the starts was that Halladay threw a lower percentage of curveballs on June 5th than on May 31st. I don't know if the difference means anything just by looking at these two starts, but it's interesting to note that the difference in curves was made up by throwing more fastballs. For whatever reason, on June 5th Halladay got no swings-and-misses with his curveball, but on May 31st, he had six swings-and-misses with his curve. Perhaps Halladay realized he couldn't get the results he needed with his curve on June 5th and went to his fastball, or he might have focused on his fastball if he thought he was going to have trouble throwing his curve around the strike zone. The consistency of Halladay's pitches, regardless of the quality of his start, is striking. The table below details three starts he made in Toronto prior to going on the DL. Just from looking at the movement data, can you tell the difference between Halladay's 10 inning performance against the Tigers, a nine inning complete game where he allowed five hits, and the game when he allowed seven earned runs in five innings of work, possibly with appendicitis? There are a couple of differences among pitches between starts, mostly with his fastball and curve again, but nothing earth shattering. 
In fact, in cases like the vertical movement of the fastball, the value for the bad start is in between the values of the good starts. The start on April 13 was the 10 inning complete game and April 30 was the five-hitter. The start on May 10 was the stinker and his last start before going on the DL.
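One way to make the "is it more than noise?" comparison mechanical is to average the movement of each pitch type in two starts and flag only the gaps larger than two inches, the cut-off used above. A rough sketch, assuming movement is stored as (horizontal, vertical) pairs in inches; the choice of the mean and all of the names and numbers are mine:

```python
from statistics import mean


def movement_differences(start_a, start_b, threshold_in=2.0):
    """Flag pitch types whose average movement changed by more than the threshold.

    start_a, start_b: dicts mapping pitch type -> list of
                      (horizontal_inches, vertical_inches) tuples.
    Returns {(pitch_type, axis_name): start_b average minus start_a average}.
    """
    flagged = {}
    # Only compare pitch types that were thrown in both starts.
    for pitch_type in start_a.keys() & start_b.keys():
        for axis, name in enumerate(("horizontal", "vertical")):
            avg_a = mean(m[axis] for m in start_a[pitch_type])
            avg_b = mean(m[axis] for m in start_b[pitch_type])
            diff = avg_b - avg_a
            if abs(diff) > threshold_in:
                flagged[(pitch_type, name)] = diff
    return flagged


# Invented movement values for two starts, for illustration only.
good_start = {"FB": [(-5.0, 9.0), (-6.0, 10.0)], "CU": [(4.0, -6.0), (5.0, -7.0)]}
bad_start = {"FB": [(-5.5, 6.5), (-5.5, 7.0)], "CU": [(4.5, -6.5)]}
print(movement_differences(good_start, bad_start))
```

With samples this small per pitch type, a flagged difference is a prompt to look closer rather than proof that the stuff changed.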
While his movement remains the same in good starts and bad, how effectively does Halladay locate the ball in both types of starts? Here are two charts, from the perspective of the catcher, that show the location of Halladay's pitches in his good starts (left) and bad starts (right).
There doesn't appear to be a lot of difference between the two groups of data. A higher percentage of the pitches are around the strike zone in his good starts and he throws a couple more pitches up in the strike zone and inside to right handed hitters in his bad starts, but the differences don't appear to be major. The location differs slightly depending on the quality of the start and that probably helps make a good start better or a bad start worse (although there is a chicken and egg question about whether location causes a start to be good or if location is good because a start is good). The movement on Halladay's pitches stays very uniform from start to start. He might not have the same success each time, but it doesn't appear to be a result of not having good stuff each time he takes the mound. However, while the small differences that I did see could be explained away because of the limitations of the technology, they might be real and contribute to his success or failure on any given start. More importantly, something I didn't touch on at all is the interplay between the different possible movements for a pitch and how that impacts the rest of a pitcher's repertoire. If his fastball is moving in a particular fashion, does he throw certain percentages of curves and cutters? If his fastball is sinking more, how does that impact the horizontal movement on it? How much can he control the movement of a pitch? Can he tell how his pitches are moving to make those adjustments? Halladay has had success with a range of horizontal and vertical movements on his fastball, so perhaps his pitches all work in harmony to create the effect of having a constant amount of movement on his fastball (or curve or any pitch). I haven't looked at other pitchers besides Halladay to see if the pattern of consistent movement across starts is true in general.
Obviously Halladay is a very good pitcher, so it makes sense that he is able to maintain his skills for many starts in a row, and is more likely to get shelled because of bad luck than because his pitches are suddenly flat. I would guess that a less skilled pitcher would experience more of a change in the movement of their pitches in a good start vs. a poor start.
Ch-ch-ch-ch-changes...
Hamels has had two starts tracked by Gameday this season, both of which were in Atlanta. Additionally, the starts both took place in May, once the system had been operational for several weeks, so any differences in positioning of the camera systems should be minimal. Looking at a chart showing Hamels' pitches, there are two possible reasons why his changeup is so nasty. The first is the speed difference. His median velocity on his fastball is 92 MPH compared with his 82 MPH changeup. That 10 MPH difference means that his changeup takes (very) roughly an extra .05 seconds to reach home. .05 seconds obviously isn't much time, but when the reaction times for hitters are in the range of .4 to .5 seconds, maybe .05 seconds means more. It could be the difference between just fouling off a pitch and hitting it squarely. However, without looking at other changeups, it's impossible to say whether a 10 MPH difference between his fastball and changeup means anything. The other feature of Hamels' changeup that jumps out at me is the movement. His fastball has about four inches of movement in to a left-handed hitter (positive horizontal values on these images represent movement toward a left-handed hitter, while negative means movement in the opposite direction). However, his changeup doesn't move in exactly the same way as his fastball. The changeup breaks in on left-handers more than his fastball does, and looking at the vertical movement relative to the fastball, you can see it has some sink on it as well. One way to pick out changeups when reading these graphs for most pitchers is to look for pitches that break similarly to the fastball, but are slower. Here's an extreme example of a changeup that moves almost the same as a fastball. This chart for Beckett and others have succeeded with changeups that mirror the movement of their fastballs, but that little extra movement that Hamels gets might make his pitch that much harder to hit.
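The "(very) roughly .05 seconds" figure for Hamels is easy to sanity-check. The sketch below assumes the ball travels about 55 feet at a constant speed, which ignores the way a pitch slows down in flight, so treat it as back-of-the-envelope arithmetic only; the function name and the 55-foot default are my assumptions.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def flight_time(speed_mph, distance_ft=55.0):
    """Seconds needed to cover distance_ft at a constant speed_mph (assumed)."""
    speed_fps = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return distance_ft / speed_fps


fastball = flight_time(92)  # Hamels' median fastball velocity from the text
changeup = flight_time(82)  # Hamels' changeup velocity from the text
extra = changeup - fastball
print(f"fastball {fastball:.3f}s, changeup {changeup:.3f}s, extra {extra:.3f}s")
```

Under these assumptions the fastball takes about .41 seconds and the changeup about .46, so the gap does come out to roughly .05 seconds.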
The difference between Beckett's fastball and changeup was seven MPH, which is close to Hamels'. I wanted to compare Hamels to other pitchers with great changeups and the first name I thought of was
Hoffman and Hamels have never won a Cy Young (although in the future Hamels will win one Cy Young…and 11 Cole Hamels'.) but Here's a chart from Santana's starts on April 8 at Comiskey Park and May 22 at Texas. I looked at both starts separately before combining them and the pitch regions were similar in both ballparks. Santana's fastball is thrown around 93 MPH while his changeup is thrown at 83 MPH, giving him a difference of 10 MPH, the same as Hamels achieved in his starts. Santana also gets different movement on his changeup compared to his fastball, although the magnitude is smaller than the four inches that Hamels and Hoffman were able to achieve.
This table shows the differences between the changeups and each pitcher’s fastball. Both types of pitches moved to the arm-side of a pitcher, but for the same pitcher, changeups moved more than the fastballs did and also had less vertical break.
The changeup can be a very effective pitch if used properly. It seems that changeups tend to move more toward the arm-side of a pitcher and have less vertical break than that same pitcher’s fastball. This movement is one way to pick out changeups from the Gameday data. Pitchers with good changeups have different amounts of movement and velocity, so there are obviously multiple ways to be effective with the pitch, and I think the most important factor in determining the success of a pitch is how it relates to the other pitches in a pitcher’s arsenal. However, Cole Hamels doesn't even need his changeup; he once struck a man out looking. Literally. Cole just gazed at him and the batter was retired on strikes.
Dangerous Curves
Watching First off, here's a chart showing Before I move onto Even after accounting for the different versions of technology used, in 2007 Zito still releases the ball close to the middle of the rubber, but he's not as extreme as he was in the playoffs and other pitchers actually have similar release points. Getting back to Hill, here's a chart showing his pitches from his start on May 22nd in San Diego. Despite striking out eight batters, Hill allowed five runs in six innings and took the loss in this particular start. Hill throws three or four pitches, clearly a fastball and curveball, as well as possibly a changeup and slider. I have the same uncertainty with classifying Hill's changeup and slider as I did with Zito and in the end I called one of the groups his changeup, while calling the other unknown. Looking at Hill's fastball, it has similar horizontal movement in toward left handed batters as Zito's had, although Hill's had a wider range of breaks. Even though Hill's fastball was faster, Zito's fell less vertically, possibly indicating greater backspin on the ball for Zito. Hill's curveball is tremendous. The biggest difference between his curveball and Zito's is that his has more horizontal movement. In addition to breaking 12 inches down, Hill's curve moves roughly seven inches away from left handed hitters. Zito has the same drop on his pitch, but only gets three inches of movement away from left handed hitters. With everything else (pitch speed, release point, and vertical break) being just about equal, a curve that breaks laterally as well as vertically is harder to hit than a curve just moving vertically. The horizontal break also helps classify the curveballs from the hitter's point of view, with Zito's being 12-to-6, while Hill's is more of a 12-to-7 or 8.
The chart above shows the median values for several variables that describe Zito and Hill's curveballs. Do other pitchers throw similar types of curveballs? After looking at some pitchers who throw curveballs (and doing a little fishing in my database) I found several pitchers that threw comparable curves, but Zito and Hill were still unique. The chart below shows the median values for the pitchers I looked at.
Obviously Hill and Zito have unique curveballs. Even after looking for pitchers with the greatest vertical drops, I couldn't find other pitchers with similar curveballs. One thing I would like to look closer at is which pitches Zito and Hill actually get their fly balls on. Are the fly balls a direct result of curveballs or are they the result of a general pitching pattern? I don't have enough curveballs in my database from Zito and Hill to really get a good read on it yet, but I would guess the fly balls are more a result of a pitching pattern than of the actual pitches. I had a couple of things I wanted to mention before I finished. I have more data for sinkerballers now, with Webb having a couple of starts and Wang making his debut in an Enhanced stadium. I'm going to look at sinkers again in the future, and hopefully I'll have something new to say. I also noticed that Wakefield had several starts in Toronto, also an Enhanced stadium. Looking at his pitch charts, it's not surprising that nobody can hit him when his knuckleball is working, as its break values look virtually random. Certain hitters also finally have enough Enhanced pitches that I can look at batting average on balls in play from the hitter's perspective and have it mean something.
Location, Location, Location
The location of a pitch is one important factor in determining its fate. If a batter swings at a pitch thrown low in the strike zone, he has a good chance of hitting a ground ball, while if he swings at a higher pitch, there is a greater chance of him hitting the ball in the air. A difference in location of a couple of inches can be the difference between a home run and a shattered bat. Pitchers need to be able to throw to precise locations and hitters need to be able to recognize if a ball is going to be hittable. As you can probably guess by now, this article is going to focus on the location of pitches, in and around the strike zone. Before I continue writing though, I need to mention something. John Beamer wrote a really interesting article earlier this week about the accuracy of the Enhanced Gameday data. Based on his examination of John looked at the spread of pitches and thought they were random enough not to worry too much about a stadium bias, but I can do a little checking too. Enhanced Gameday provides an x,y location, tracked by the camera system, of pitches as they cross the plate, as well as an x,y location entered by a human stringer. The stringer enters the location where he thinks the ball crossed the plate. Here's a plot of the X coordinate for the computer generated values vs. the human entered values. As you can see, it's a pretty good match overall. I'm not looking for a 100% match, and I don't totally trust human entry on this either, as it's pretty tough to actually tell where the pitch was when it crossed the plate, so I'm comfortable using the camera-tracked values in this case. Getting back to the article, let's look at where right handed pitchers throw to right handed hitters. Of the 11,109 pitches I have from these confrontations, here is where they all ended up. The strike zone is the red box in the middle and the graph is from the catcher's perspective. The number in each grid cell is simply the number of pitches thrown in that region.
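As a rough illustration of how a chart like this could be tallied, here is a minimal Python sketch. The strike-zone bounds, the 3x3 split of the zone with one ring of cells around it, and the names `grid_cell` and `count_by_cell` are all assumptions made for illustration; they are not the layout or the code behind the chart in this article.

```python
import math
from collections import Counter

# Assumed strike-zone bounds in feet (illustrative only, not from the article).
ZONE_LEFT, ZONE_RIGHT = -0.75, 0.75
ZONE_BOTTOM, ZONE_TOP = 1.5, 3.5
CELLS_PER_SIDE = 3  # the zone is split 3x3, plus one ring of cells around it


def grid_cell(px, pz):
    """Return (col, row) for a pitch, or None if it is outside the cropped chart.

    px is the horizontal location and pz the height, both in feet, from the
    catcher's perspective. Columns and rows 1..3 are inside the zone; 0 and 4
    are the cells just outside it.
    """
    cell_w = (ZONE_RIGHT - ZONE_LEFT) / CELLS_PER_SIDE
    cell_h = (ZONE_TOP - ZONE_BOTTOM) / CELLS_PER_SIDE
    col = math.floor((px - ZONE_LEFT) / cell_w) + 1
    row = math.floor((pz - ZONE_BOTTOM) / cell_h) + 1
    if 0 <= col <= CELLS_PER_SIDE + 1 and 0 <= row <= CELLS_PER_SIDE + 1:
        return col, row
    return None


def count_by_cell(pitches):
    """Count pitches per grid cell, dropping pitches that fall off the chart."""
    counts = Counter()
    for px, pz in pitches:
        cell = grid_cell(px, pz)
        if cell is not None:
            counts[cell] += 1
    return counts
```

Feeding `count_by_cell` a list of (px, pz) pairs for one pitcher-hitter split gives the raw count per region; dividing by the total would give percentages instead.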
I didn't convert these into percents because the raw numbers give a sense of the number of pitches I have for each split. The chart is cropped on the sides and the bottom to focus on pitches that were near the strike zone. It's nice to see that most of the pitches are located in the strike zone. This seems obvious, but it serves as another quick check on the accuracy of the data. I liked the simplicity of this layout and some basic trends pop out right away. Right handed pitchers work away from right handed hitters, and when they work outside the strike zone, it's typically low and away. They throw below the strike zone more than they throw above it. Digging a little deeper, the three regions just off the plate on both sides (three inside and three outside) are interesting. At each height, there were more pitches outside than inside, but as the height increases the number of inside pitches remains relatively constant and the number of outside pitches decreases. I have no idea if this is an artifact or an actual pattern, so here's the same graph, but for left handed hitters. For left handed hitters, pitchers again threw more pitches outside, and were more inclined to throw pitches below the strike zone than above it. With a left hander at the plate, however, the chance of an outside pitch increases as the height increases. Do these trends exist when left handed pitchers are on the mound? Here are the two charts for left handed pitchers, but there doesn't appear to be much of a continuation of the trend. The other trends about working outside and below the strike zone also don't seem as clear, if they exist at all.
It's nice to know where pitchers threw the ball, but what actually happened to those pitches when they reached home? Focusing on right handed pitchers throwing to right handed hitters, here is a chart showing the percentage of pitches in each region that are swung at. Right handed hitters swing at anything in the strike zone, except pitches down and away. Those pitches are strikes, but hitters swing at them only half the time, similar to how frequently they chase pitches in regions abutting the strike zone. My guess is that right handed hitters as a group are unable to drive the low and away pitch, so they don't swing at it. They can afford to take the pitch if they don't have two strikes. However, right handed pitchers have figured out that right handed hitters don't frequently swing at that pitch and consequently throw to that region more than any other region. Hitters may not swing at pitches in that region because they feel they are balls, although of the 406 pitches not swung at in that region, 69% (282) were called strikes. When hitters put pitches from that region into play, they had a .298 batting average on balls in play, which surprisingly isn't the lowest BABIP for pitches in the strike zone. Perhaps low and away isn't a utopia for pitchers after all. If fewer than half of right handed hitters swing at a strike, the only hitters who do swing at that pitch must be confident they can get a hit out of it, resulting in the average BABIP. One surprising item on this chart is that the BABIP for pitches right down the middle is not the highest. Three corners are all hot zones for right handed hitters as a group. One explanation for the lower than expected BABIP is that if 70% of pitches down the middle are swung at, a lot of those swings will be taken by bad hitters swinging because of the location, whereas on the pitch low and away, the only hitters who swing know they can hit it.
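For anyone who wants to reproduce numbers like these, the swing percentage and BABIP for each region boil down to a few counts per region. The sketch below is a simplified illustration under stated assumptions: the record keys (`region`, `swung`, `in_play`, `hit`) and the function name `summarize_regions` are hypothetical, and home runs are assumed to be flagged as not in play so they drop out of BABIP. It is not the code used to build the charts here.

```python
from collections import defaultdict


def summarize_regions(pitches):
    """Return {region: {"swing_pct": float, "babip": float or None}}.

    Each pitch is a dict with hypothetical keys:
      region  - label of the grid cell the pitch crossed
      swung   - True if the batter swung
      in_play - True if the ball was put in play (home runs assumed excluded)
      hit     - True if the ball in play went for a hit
    """
    totals = defaultdict(lambda: {"pitches": 0, "swings": 0, "in_play": 0, "hits": 0})
    for p in pitches:
        t = totals[p["region"]]
        t["pitches"] += 1
        if p["swung"]:
            t["swings"] += 1
        if p["in_play"]:
            t["in_play"] += 1
            if p["hit"]:
                t["hits"] += 1

    summary = {}
    for region, t in totals.items():
        swing_pct = t["swings"] / t["pitches"]
        # BABIP is undefined for a region with no balls in play.
        babip = t["hits"] / t["in_play"] if t["in_play"] else None
        summary[region] = {"swing_pct": swing_pct, "babip": babip}
    return summary
```

The called-strike figure quoted above is the same kind of ratio: 282 / 406 is about 0.69, which matches the 69% in the text.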
The swing percentage and BABIP charts for left handed hitters facing right handed pitching are below. When left handed hitters face right handed pitchers, they think they can hit the pitch that is low and away, but despite swinging at it 59% of the time their BABIP is only .238. The location must be especially tempting for left handed hitters to get those results and continue swinging at it. Not surprisingly, right handed pitchers threw the second-highest number of pitches to that region. Lefties also appear to be vulnerable up and in, but right handed pitchers haven't targeted that area yet. Another interesting detail on the swing percentage charts is that despite a difference in the distribution of swings, both left handed hitters and right handed hitters swung at 63% of pitches in the strike zone.
Before I wrap up the article, I should mention that I do have the left handed pitching versions of the Swing Percentage and BABIP charts, but I don't have enough pitches in each region to draw any real conclusions from them, so I didn't include them. Even with the graphs I did use, I would feel more comfortable making the statements I made with a full season of data to back me up. I learned a couple of interesting things while writing this article though. I had no idea how frequently batters swing at pitches in different areas of the strike zone. I knew roughly how much batters swung, but to actually see where they swing at pitches is pretty cool. With enough data, I would like to expand those charts, and do them for individual players. I would love to see what
That Sinking Feeling
This week I wanted to look more in-depth at the aerodynamic fingerprints of different pitches, particularly sinkers. A sinker is a two-seam fastball that drops as it approaches the batter and is frequently pounded into the ground by a hitter. Pitchers who throw good sinkers tend to rely heavily on the pitch and don't need to worry as much as a "normal" pitcher about changing speeds. The whole point of changing speeds and throwing different pitches is to induce weak contact (or strikeouts), but when a sinker is thrown properly, a batter generally makes poor contact and hits it on the ground anyway. Armed with detailed information about each pitch, I looked at three sinkerballers and made some interesting observations about each of them.
Another thing to notice on this chart is Lowe's curveball. He throws the pitch infrequently, but both the horizontal and vertical break (compared to a pitch with no spin) are around zero inches. According to the data his curve ends up almost exactly where a pitch with no spin would, and with a speed of 82 MPH, appears to be a meatball. Fortunately for Lowe, this isn't the case. The pitch has some movement, measured by the length of the break, which is 11.5 inches, the greatest of any of his pitches. The length of the break is defined as the greatest distance, at any point between the release point and the front of home plate, between the trajectory of the pitch and the straight line path from the release point to the front of home plate. The hump in Lowe's curveball creates enough deception to allow him to throw it on occasion without getting burned. Colorado's
Comparing the horizontal break on Cook's sinker to Lowe's reinforces how consistent Lowe was. The chart on the right shows Lowe's start on April 13, and even though Lowe had a more consistent break (a tighter bunching of clusters) for all of his pitches compared to Cook, the horizontal break of the sinker was especially consistent. Cook's curve has a break pattern that is typical of a curveball, with the pitch ending up lower than would be expected with a non-spinning pitch. Compared with Lowe's curve, the vertical break is on the left of the horizontal break in the chart, which I believe is a graphic indicator of a curveball. Despite these differences in the way their sinkers moved, Lowe and Cook both had excellent starts in the games I examined, so there are clearly multiple ways to skin a cat here. I wanted to look at another NL West sinkerballer, Excluding Webb, the only other true sinkerballer I had a reasonable amount of data for was The table below shows some interesting information about the three sinkers examined. The numbers measuring the pitches are all median values as opposed to mean values. Silva relies on his sinker more than Cook or Lowe, but his sinker has less of a downward break, measured by both the vertical break compared to a non-spinning ball and the length of the break, which is the number I used to describe the hump in Lowe's curveball. Silva's average sinker ended roughly nine inches higher than a non-spinning pitch would have, while Lowe's and Cook's pitches ended roughly four inches above the imaginary terminus. The backspin on a pitch is what causes it to end up higher than a non-spinning pitch would, so Silva's sinker must have more backspin than Lowe's or Cook's.
When a hitter hits a sinker with too much backspin, he still hits a grounder, but as An average sinker from Silva reached its high point roughly seven inches above an imaginary line from release point to home plate, compared to roughly nine inches for Lowe and Cook, leading to a smaller vertical drop for Silva. These observations seem to jibe pretty well with reality, as Lowe and Cook are both thought to have better sinkers than Silva, and one thing that could lead to a more effective sinker is getting more downward movement on the pitch.
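The length of the break used in these comparisons is, by the definition given earlier, the greatest distance between the pitch's trajectory and the straight line from the release point to the front of home plate. Here is a minimal sketch of that calculation, assuming the trajectory is available as a list of sampled (x, y, z) points whose first point is the release and whose last point is the front of the plate. The function names and the sampled-points input are assumptions for illustration; Gameday reports the value directly, and this is not how it computes it.

```python
import math


def distance_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b (3D tuples)."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    # |ap x ab| / |ab| is the distance from p to the line through a and b.
    cross = (
        ap[1] * ab[2] - ap[2] * ab[1],
        ap[2] * ab[0] - ap[0] * ab[2],
        ap[0] * ab[1] - ap[1] * ab[0],
    )
    return math.sqrt(sum(c * c for c in cross)) / math.sqrt(sum(c * c for c in ab))


def break_length(trajectory):
    """Greatest distance between sampled trajectory points and the straight
    line joining the first point (release) and the last point (front of plate).

    The result is in the same units as the input coordinates.
    """
    start, end = trajectory[0], trajectory[-1]
    return max(distance_to_line(p, start, end) for p in trajectory)
```

With points sampled densely enough, the maximum over the samples approximates the hump in the pitch's path; a perfectly straight trajectory returns zero.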
I was very pleased to discover that pitches could be identified using just the horizontal and vertical break values from Gameday. In the future, I'd like to continue looking at different pitches and see the differences between say,