More Run Values
In the time I've been looking at the pitch f/x data I've occasionally stumbled onto something I thought was so interesting and so cool that I couldn't wait to share it with someone. The run value of different pitches is one of these things and whatever enjoyment you've gained from reading and discussing these articles, you can probably double it for me. The research I did for last week's article was some of the most interesting work I've done with the pitch f/x data, and without any more introduction, here's this week's article.
In the comments on last week's article and elsewhere, there were some questions about the methods I employed for calculating the run value of each pitch. There were some suggestions made and while I'm not here to talk about the past and explain how I made the calculations last week, in the interest of transparency, here's what I did this week and will be doing in the future. Starting with the wOBA for every ball-strike count, I subtracted the league average wOBA (.332) from each count to determine how much above or below average each count was for wOBA.
Using those wOBA values, I then determined how many runs were added in every count if the pitcher threw a ball or strike. This is the same process I used last week, but now instead of averaging the run values of a ball and strike, this time I kept the data separate, so that a strike thrown in an 0&2 count has a different value than a strike thrown in an 0&1 count. I repeated the same process for balls in play as well, which is something I didn't do last week, and kept them separated by count as well. This way, if the batter is up 2&0, but grounds out, the pitch that created the groundout gets more credit than if he had grounded out in an 0&0 count.
When I was done with this process, I had the value of almost anything that could happen to a pitch after it left the pitcher's hand, and if you're interested, a table with the data is presented below.
Count  wOBA   Runs/PA  ValB   ValS    Val1B  Val2B  Val3B  ValHR  ValOut
3&0    0.570   0.207   0.131  -0.070  0.287  0.583  0.861  1.200  -0.496
3&1    0.490   0.137   0.201  -0.076  0.356  0.652  0.930  1.269  -0.426
2&0    0.443   0.097   0.110  -0.062  0.397  0.693  0.971  1.310  -0.385
3&2    0.403   0.062   0.276  -0.351  0.432  0.728  1.006  1.345  -0.350
2&1    0.372   0.035   0.103  -0.071  0.459  0.755  1.033  1.372  -0.323
1&0    0.371   0.034   0.063  -0.050  0.460  0.756  1.034  1.373  -0.323
0&0    0.332   0.000   0.034  -0.043  0.494  0.790  1.068  1.407  -0.289
1&1    0.314  -0.016   0.050  -0.067  0.510  0.805  1.083  1.423  -0.273
2&2    0.290  -0.037   0.098  -0.252  0.530  0.826  1.104  1.443  -0.252
0&1    0.283  -0.043   0.027  -0.062  0.537  0.832  1.110  1.450  -0.246
1&2    0.237  -0.083   0.046  -0.206  0.577  0.872  1.150  1.490  -0.206
0&2    0.212  -0.104   0.022  -0.184  0.598  0.894  1.172  1.511  -0.184
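For anyone who wants to check the table, here is a minimal Python sketch of how its columns appear to fit together. The function names are my own, and two constants are assumptions inferred from the table rather than numbers stated in the article: a wOBA-to-runs divisor of 1.15 and a walk value of 0.338 runs. The sketch also treats every two-strike strike as a strikeout worth the same as any other out, and ignores fouls.

```python
# Count-level wOBA values copied from the table above.
WOBA = {
    "3&0": 0.570, "3&1": 0.490, "2&0": 0.443, "3&2": 0.403,
    "2&1": 0.372, "1&0": 0.371, "0&0": 0.332, "1&1": 0.314,
    "2&2": 0.290, "0&1": 0.283, "1&2": 0.237, "0&2": 0.212,
}
LEAGUE_WOBA = 0.332
WOBA_SCALE = 1.15   # assumption: inferred from the Runs/PA column
WALK_RUNS = 0.338   # assumption: inferred from ValB in three-ball counts

# Values of plate-appearance-ending events in an 0&0 count, from the table.
EVENT_RUNS = {"1B": 0.494, "2B": 0.790, "3B": 1.068, "HR": 1.407, "Out": -0.289}


def runs_per_pa(count):
    """Runs above average for a plate appearance that reaches this count."""
    return (WOBA[count] - LEAGUE_WOBA) / WOBA_SCALE


def ball_value(count):
    """Change in run expectancy when the pitch is a ball."""
    balls, strikes = (int(part) for part in count.split("&"))
    if balls == 3:  # ball four ends the plate appearance with a walk
        return WALK_RUNS - runs_per_pa(count)
    return runs_per_pa(f"{balls + 1}&{strikes}") - runs_per_pa(count)


def strike_value(count):
    """Change in run expectancy when the pitch is a strike (fouls ignored)."""
    balls, strikes = (int(part) for part in count.split("&"))
    if strikes == 2:  # strike three is treated like any other out
        return EVENT_RUNS["Out"] - runs_per_pa(count)
    return runs_per_pa(f"{balls}&{strikes + 1}") - runs_per_pa(count)


def in_play_value(count, event):
    """Value of a ball in play, credited relative to the count it came in."""
    return EVENT_RUNS[event] - runs_per_pa(count)


for count in ("0&0", "2&0", "0&2"):
    print(count, round(ball_value(count), 3), round(strike_value(count), 3),
          round(in_play_value(count, "Out"), 3))
```

Because the inputs here are already rounded to three decimals, the results can differ from the table in the last digit. The groundout example from above falls out directly: an out in a 2&0 count is worth more to the pitcher than an out in an 0&0 count because the batter gave up a favorable count to make it.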
Once I knew the values of events by count, I just counted the number of events that each pitch created and multiplied them by their value to get the overall value of the pitch. One huge benefit to finding the value of pitches using this 'by count' method is that it automatically accounts for the usage of every pitch. Scott Kazmir's fastball (to righties) does very well in this analysis and makes a good example. Last week, when I looked at which pitches had prevented the most runs overall (which is slightly deceptive because certain pitchers had more games in pitch f/x enabled ballparks), Kazmir's fastball prevented 5.47 runs compared to an average pitch. This week, when I factored in the count, the same pitch prevented 9.99 runs over an average pitch. My first guess is that factoring in the count helps Kazmir's fastball because it's a pitch he uses to get swings-and-misses when he needs them. Other pitches, like Brandon Webb's sinker (13.28 RAA last week vs. 13.36 RAA this week) or Kason Gabbard's changeup (7.72 RAA last week vs. 7.67 RAA this week), were essentially unaffected by the calculation change. Overall, the changes were not that big, but using the value by count is the correct way to account for situational pitching.
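The counting-and-multiplying step can be sketched in a few lines of Python. The run values below are copied from the by-count table, but the event tallies are made-up numbers for illustration only, not data for any real pitcher, and the function name is hypothetical.

```python
# Run values for a handful of (count, event) pairs, copied from the table.
VALUES = {
    ("0&0", "ball"): 0.034,
    ("0&0", "strike"): -0.043,
    ("0&2", "strike"): -0.184,
    ("2&0", "out"): -0.385,
    ("1&1", "single"): 0.510,
}


def pitch_run_value(tallies):
    """Sum count-specific run values over every event a pitch produced."""
    return sum(VALUES[key] * n for key, n in tallies.items())


# Hypothetical tallies for one pitch type (made-up numbers).
example_tallies = {
    ("0&0", "ball"): 10,
    ("0&0", "strike"): 20,
    ("0&2", "strike"): 5,
    ("2&0", "out"): 3,
    ("1&1", "single"): 2,
}

total = pitch_run_value(example_tallies)
print(round(total, 3))
```

The table values are from the batter's side, so a negative total means the pitch prevented runs. Flipping the sign gives runs saved, in the sense of the RAA figures quoted above.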
One thing I neglected to include in the article last week was any information about global averages. There's no such thing as an overall 'average' pitch, but I found the averages for all the different subgroups of pitches I had. Now, when comparing pitches, there's a handy reference for what an average pitch thrown by a certain type of pitcher to a certain type of hitter is worth. The table below has identifying information about the pitch, the frequency with which the given group of pitchers threw it to the given group of batters, and the average run value for each type of pitch. The way to read the first line of the table is that of all pitches thrown to LHH by LHP, 14% were curveballs. A LHP to LHH curveball prevents .0117 runs more than an 'average' pitch, and given 100 pitches from a LHP to a LHH, distributed via the frequencies for his pitches, the curveballs would prevent .18 runs more than an average pitch.
Pitcher  Pitch  Batter  Freq.  Avg.     Per 100
L        CB     L       0.14   -0.0117  -0.18
L        CH     L       0.09    0.0000  -0.01
L        CT     L       0.03   -0.0081  -0.02
L        FB     L       0.55    0.0018   0.02
L        SL     L       0.17   -0.0033  -0.08
---------------------------------------------
L        CB     R       0.11   -0.0035  -0.05
L        CH     R       0.21    0.0062   0.11
L        CT     R       0.03    0.0143   0.04
L        FB     R       0.55    0.0072   0.31
L        SL     R       0.10    0.0076   0.07
---------------------------------------------
R        CB     L       0.10   -0.0022  -0.03
R        CH     L       0.16    0.0001  -0.02
R        CT     L       0.06    0.0006   0.00
R        FB     L       0.56    0.0056   0.23
R        SL     L       0.11   -0.0008  -0.02
---------------------------------------------
R        CB     R       0.10   -0.0032  -0.04
R        CH     R       0.07    0.0012   0.00
R        CT     R       0.06   -0.0051  -0.03
R        FB     R       0.56   -0.0017  -0.18
R        SL     R       0.20   -0.0049  -0.12
Not surprisingly, a curveball thrown by a LHP to a LHH saves the most runs compared to an average pitch. However, when examining Barry Zito's curve to LHH, I'm not interested in an 'average' pitch; I'm interested in other curveballs thrown by LHP to LHH. These averages let me make that comparison, and compare pitches to the baseline of an 'average' pitch of that type (RHP CB to RHH, RHP CB to LHH, etc.), rather than to an 'average' pitch. For the most part, the adjustments are small, but, again, it's the right way to make the calculations, and gives a better indication of the actual value of the pitch.
However, without knowing how often Zito actually throws curveballs to left-handed hitters, it's impossible to get a feel for how effective the pitch truly is. It could be a really nasty pitch, but if part of the effectiveness is due to how infrequently it's thrown, it won't be a great deal of help to the pitcher in preventing runs overall. The Per 100 field incorporates the pitcher's usage of every pitch to gauge how good the pitch is at preventing runs. To calculate this value, I multiplied the frequency with which a pitch was thrown by its average value. Multiplying that number by a constant, in this case 100, gives the total number of runs the pitch would have saved compared to an average pitch of that type, for 100 pitches split up by the pitcher's normal pitch selection. I used 100 as the constant to have some internal consistency with Rich's work on strikeouts/100 pitches. 100 is fairly easy to calculate in your head too.
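As a quick illustration of the Per 100 arithmetic, here is a small Python sketch. The function name is my own, and the inputs are the rounded Freq. and Avg. values from the LHP curveball to LHH row of the averages table.

```python
def runs_per_100(frequency, avg_value, pitches=100):
    """Runs relative to average contributed by one pitch type per `pitches` thrown."""
    return frequency * avg_value * pitches


# LHP curveball to LHH, using the rounded table values.
lhp_cb_lhh = runs_per_100(0.14, -0.0117)
print(round(lhp_cb_lhh, 2))
```

With these rounded inputs the result comes out near -0.16, while the table lists -0.18. The published Per 100 column was presumably computed from unrounded inputs, so treat this as a sketch of the arithmetic and not as a way to reproduce the column exactly.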
Last week I mentioned that collectively, Brandon Webb's pitches were 18 runs better than average and wondered if this sum would correspond to his wins above average. In my calculations last week I accidentally compared Webb to a replacement-level starting pitcher as opposed to an average pitcher, and got an answer that didn't make sense. I have 113 innings of pitch f/x data for Webb, and in that time he posted an ERA of 2.55. That works out to 2.8 wins above average, while Webb's pitches collectively were 26.9 runs better than average. Assuming roughly 10 runs/win, that's a pretty close match. I threatened to write a full article on this subject last week and I'm going to follow through on that threat once I get a better handle on the full data-set, but I just wanted to make this correction this week.
The next step with this type of analysis lies in refining the linear weights value of every event. Adjusting for park is probably the next easiest adjustment to make, and after that, the next adjustment would be for individual pitchers so that every pitcher is his own universe. I think some of those adjustments are overkill based on the amount of data in my database right now, but over the course of the 2008 season it's something to look for. Properly regressing the pitch values and finding out how much of the value is based on skill and how much is based on luck is another very important adjustment to make. I've roughly regressed the LWTS/pitch values to account for different sample sizes, but actually determining how many of the runs that Kazmir's fastball prevents are due to qualities of the pitch and how many are due to luck is important.