Touching Bases
December 31, 2009
Pitch Counts and Pitchf/x
By Jeremy Greenhouse

I remember Randy Johnson throwing 99 to finish a complete game. Back in their day, Nolan Ryan and Bob Feller probably did that on a regular basis (if you were to ask them). There's a lengthy list of early 20th century pitchers who pitched complete games in both ends of a doubleheader. So what's the driving force behind the pitch count craze? Are we going soft?

I don't think there's some grand scheme to baby pitchers. I do think that pitchers nowadays exert far more effort on each pitch than pitchers of yesteryear, but our contemporaries could still probably hold up past the hundred-pitch mark. The main reason pitchers get pulled before they reach their limit is that there's little incentive not to pull them. Take a look at Baseball-Reference's splits. Pitchers allow a .726 OPS the first time through the order, then the OPS jumps 40 points (to roughly .766) the next time through and another 40 points (to roughly .806) after that. So managers make the correct decision to insert a reliever, who has the advantage of facing batters for the first time. With eight-man bullpens, there's no reason not to go to a reliever early. So the question becomes not whether, in the current environment, we should continue to adhere to pitch counts, but why. Does the pitcher lose effectiveness, or does the batter adjust to even the fastest of fastballs, having already seen it in his three previous plate appearances?
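The times-through-the-order splits are just OPS (OBP plus SLG) grouped by how many times the pitcher has been through the lineup. A minimal sketch of that grouping, assuming hypothetical per-plate-appearance records with standard counting stats and a made-up `tto` field (this is not Baseball-Reference's schema):

```python
from collections import defaultdict

STAT_KEYS = ("AB", "H", "BB", "HBP", "SF", "TB")


def ops_by_times_through_order(plate_appearances):
    """Group plate appearances by times through the order and return {tto: OPS}.

    Each plate appearance is a dict of counting stats (AB, H, BB, HBP, SF, TB)
    plus a hypothetical "tto" field: 1 for the first time through, 2 for the
    second, and so on. Missing stats count as zero.
    """
    totals = defaultdict(lambda: dict.fromkeys(STAT_KEYS, 0))
    for pa in plate_appearances:
        group = totals[pa["tto"]]
        for key in STAT_KEYS:
            group[key] += pa.get(key, 0)

    ops = {}
    for tto, t in sorted(totals.items()):
        obp_denominator = t["AB"] + t["BB"] + t["HBP"] + t["SF"]
        obp = (t["H"] + t["BB"] + t["HBP"]) / obp_denominator if obp_denominator else 0.0
        slg = t["TB"] / t["AB"] if t["AB"] else 0.0
        ops[tto] = obp + slg
    return ops


# Tiny made-up example: a single and an out the first time through,
# a walk and a double the second time through.
sample = [
    {"tto": 1, "AB": 1, "H": 1, "TB": 1},
    {"tto": 1, "AB": 1},
    {"tto": 2, "BB": 1},
    {"tto": 2, "AB": 1, "H": 1, "TB": 2},
]
print(ops_by_times_through_order(sample))  # {1: 1.0, 2: 3.0}
```

With a full season of plate appearances, the values for `tto` 1, 2 and 3 would be the .726 / ~.766 / ~.806 progression described above.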

With pitchf/x data, you can tease out the pitcher's part in the pitcher/batter matchup. A pitcher really controls five things:

-Where the ball is released
-Where the ball lands
-How hard the ball is thrown
-How much the ball spins
-What direction the ball spins

Here, I will concern myself with the final three components, which I believe define what we call a pitcher's "stuff." For example, the average fastball from a right-handed pitcher (92 MPH, nine inches of rise, seven inches of run) is worth about half a run below average per 100 pitches. I will call that its StuffRV. The following graph shows the average StuffRV (per 100 pitches) and a smoothed actual run value (per 100 pitches) by pitch count.

There's a lot going on here.

-Our main concern is a pitcher's endurance with regard to his stuff. The takeaway from this graph, then, is that from a pitcher's 10th pitch to his 60th pitch, his stuff deteriorates by about a tenth of a run per 100 pitches.

-My methodology grades fastballs as inferior to breaking balls. You can tell by looking at the very first mark on the graph. A pitcher's first pitch of the day is a fastball about 80% of the time, while overall, pitchers throw fastballs 60% of the time. On other 0-0 counts, pitchers throw fastballs just under three quarters of the time, the same rate (70-75%) as on pitches two through ten. For some reason, pitchers like to start their outings with a fastball.

-A pitcher's success is, of course, largely dependent on the batter, and you can see when each lineup spot tends to hit by following the true run value curve. Pitchers face the eighth and ninth batters in the order generally during their 25th to 35th pitches and again their 60th to 70th pitches. The two peaks of the True RV line occur when starting pitchers are generally facing the 4th and 5th batters in the lineup.

-Relievers have better stuff than starters. The section from 1-15 pitches is composed mostly of relievers, and that's the lowest trough in the StuffRV curve.

-The pitchers whom managers leave in past the 100-pitch mark are well above average, and their stuff remains above average. I'll account for this survivor bias another time. For now, I'd rather do brief case studies of one pitcher who maintains his stuff throughout the game and another who does not.
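The curves in the graph are smoothed over pitch count, and the author says in the comments that local regression is his tool. Here is a minimal sketch of one form of it: a degree-1 local fit with tricube weights (the LOESS weighting) and a fixed window measured in pitches. The 10-pitch `bandwidth` is a hypothetical choice, and this is not necessarily the smoother behind the graph:

```python
def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling smoothly to 0 at |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_linear(xs, ys, x0, bandwidth=10.0):
    """Degree-1 local regression estimate of y at x0.

    xs: pitch counts, ys: run values (same length). Points `bandwidth` or more
    pitches away from x0 get zero weight. Returns None if no point falls
    inside the window.
    """
    weights = [tricube((x - x0) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0:
        return None
    x_mean = sum(w * x for w, x in zip(weights, xs)) / total
    y_mean = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # flat fit if the window holds one x value
    return y_mean + slope * (x0 - x_mean)


def smooth_curve(xs, ys, grid, bandwidth=10.0):
    """Evaluate the local fit at every pitch count in `grid`."""
    return [local_linear(xs, ys, g, bandwidth) for g in grid]
```

Calling `smooth_curve(pitch_numbers, run_values, range(1, 121))` on hypothetical per-pitch lists would trace a curve like the True RV line; passing per-pitch StuffRV values instead gives the StuffRV line.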

I correlated every pitcher's pitch count with his StuffRV on that pitch. Brett Anderson seems to pick up steam the deeper he goes into a game. I classified his pitches into four clusters: fastball, slider, changeup, and curveball. The first thing I did was look for trends in his velocity and movement. Well, nothing really stood out. His slider gains almost an inch of movement by the end of the game, but I don't think that's it. Then I remembered that Anderson's slider was the most valuable slider in baseball last year, and by my rankings it edges out Zack Greinke's as the *nastiest* slider among starters.

Pitches FB SL CH CU
1-25 67% 23% 6% 4%
26-50 51% 28% 14% 8%
51-75 43% 31% 12% 14%
76+ 39% 38% 11% 12%

So there you go. He challenges hitters with fastballs the first time through the lineup and then switches to mainly off-speed pitches, which are his bread and butter. Hence, you might say, he improves his stuff as the game goes on.
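The usage table above is a simple grouping of pitches by pitch-count bucket. A sketch of that tabulation, assuming hypothetical (pitch_number, pitch_type) pairs for one pitcher and the same four buckets, with the last bucket taken to start at pitch 76:

```python
from collections import Counter, defaultdict

# (first pitch, last pitch, label); None means "and beyond".
BUCKETS = [(1, 25, "1-25"), (26, 50, "26-50"), (51, 75, "51-75"), (76, None, "76+")]


def bucket_label(pitch_number):
    """Map a pitch number (1 = first pitch of the outing) to its bucket label."""
    for first, last, label in BUCKETS:
        if pitch_number >= first and (last is None or pitch_number <= last):
            return label
    raise ValueError(f"pitch number must be >= 1, got {pitch_number}")


def pitch_mix(pitches):
    """pitches: iterable of (pitch_number, pitch_type).

    Returns {bucket: {pitch_type: share}}, with shares summing to 1 per bucket.
    """
    counts = defaultdict(Counter)
    for number, pitch_type in pitches:
        counts[bucket_label(number)][pitch_type] += 1
    return {
        label: {ptype: n / sum(c.values()) for ptype, n in c.items()}
        for label, c in counts.items()
    }
```

Multiplying each share by 100 and rounding gives rows like the ones in the table.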

Jered Weaver, on the other hand, has stuff that, by my calculation, gets worse as the game goes on. Weaver throws his fastball 68% of the time in his first 25 pitches, compared to 52% from his 51st pitch on, while his changeup usage rises from 10% to 23%. Not only is there a difference in Weaver's pitch selection, but there's also a notable change in his pitch quality. Here are the characteristics of his fastball as the game goes on:

Pitches Velocity (MPH) StuffRV True RV
1-25 90.0 -0.19 -0.29
26-50 89.7 -0.13 0.06
51-75 89.2 -0.10 0.63
76+ 89.0 -0.07 0.16

But pitchers who have a changeup as good as Weaver's don't rely on stuff to get by. Weaver's all about deception. And that stuff I don't know how to measure.

Comments

Hey, this is really cool stuff, Jeremy. How exactly is StuffRV calculated? Are you using the LOESS technique or some kind of bins?

Wow, Jeremy. I really like it.

It's going to be interesting to see where you go with this from here - Things like accounting for the survivorship bias, and some other refinements you hinted at.

One huge thing would be to account for the batters faced, to flatten that curve out a bit! That would give a much clearer picture, but I'm sure it would require some pretty severe and nasty data munging to get right.

Good luck!

And thank you.

Awesome stuff as always, Jeremy.

Nick, all I know how to do is local regression. It seems to work for most things. Chris Moore uses other techniques for this type of stuff though, and he's much smarter than I am.
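The model is described here only as "local regression." As a rough illustration of the idea, and not the author's implementation, the simplest local estimate of StuffRV averages the run values of the k pitches that look most like the one being graded in velocity, rise, and run (a k-nearest-neighbor, or local-constant, regression). The feature scales and `k=50` are hypothetical choices, and the function and field names are made up for the example:

```python
import math


def stuff_rv(target, reference, k=50, scales=(1.0, 1.0, 1.0)):
    """Estimate run value per 100 pitches from a pitch's "stuff".

    target: (velocity_mph, rise_in, run_in) of the pitch being graded.
    reference: list of ((velocity_mph, rise_in, run_in), run_value) pairs.
    k: how many of the most similar reference pitches to average.
    scales: hypothetical per-feature divisors so MPH and inches are comparable.
    """
    if not reference:
        raise ValueError("reference set is empty")

    def distance(a, b):
        return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))

    nearest = sorted(reference, key=lambda pitch: distance(target, pitch[0]))[:k]
    mean_run_value = sum(rv for _, rv in nearest) / len(nearest)
    return 100 * mean_run_value  # per 100 pitches, as in the article
```

Because only velocity and movement enter the distance, location and count never influence the estimate, which is what keeps this kind of StuffRV separate from results.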

Patrick, I'm going to use the delta method next so it will look like an aging curve, and then I'll look into what MGL did in his recent aging study to account for survivorship bias, which I think is an issue here.
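The delta method from aging studies compares each pitcher only with himself in adjacent bins, which reduces (but does not eliminate) the survivor bias from better pitchers being the only ones left in the late bins. A sketch, assuming per-pitcher, per-bucket StuffRV means have already been computed; weighting each pitcher by the harmonic mean of his pitch counts in the two bins is a common aging-curve choice and an assumption here:

```python
def delta_method(stats, buckets):
    """Chain within-pitcher changes between adjacent pitch-count buckets.

    stats: {pitcher: {bucket: (mean_stuff_rv, n_pitches)}}
    buckets: bucket labels in pitch-count order, e.g. ["1-25", "26-50", ...]
    Returns {bucket: change in StuffRV relative to the first bucket}.
    """
    curve = {buckets[0]: 0.0}
    level = 0.0
    for prev, nxt in zip(buckets, buckets[1:]):
        weighted_sum = 0.0
        total_weight = 0.0
        for by_bucket in stats.values():
            # Only pitchers who appear in BOTH buckets contribute to this step.
            if prev in by_bucket and nxt in by_bucket:
                (mean1, n1), (mean2, n2) = by_bucket[prev], by_bucket[nxt]
                weight = 2 * n1 * n2 / (n1 + n2)  # harmonic mean of sample sizes
                weighted_sum += weight * (mean2 - mean1)
                total_weight += weight
        if total_weight == 0:
            raise ValueError(f"no pitcher has pitches in both {prev} and {nxt}")
        level += weighted_sum / total_weight
        curve[nxt] = level
    return curve
```

The output reads like an aging curve: zero at the first bucket, then the cumulative within-pitcher change in StuffRV at each later bucket.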

Sully, thanks.

So how do you equate "stuff" throughout the game with results? For example, you said Anderson "seems" to pick up steam the deeper he goes into the game from a stuff standpoint. By OPS, though, he starts at .722 OPS first time through the order, improves some to .686 the second time, then regresses to .734.

Greinke, on the other hand, starts at .704 first time through, then improves a lot to .606 second time, then an amazing .512 the third time. Does his stuff get better? And if yes, is it safe to assume that it gets better at a faster rate than Anderson's?

ecp, I tried to make stuff independent of results for Anderson and Weaver. My numbers show that there's little correlation between Greinke's pitch count and his stuff. I think he showed last year that he's a masterful pitcher, and those numbers you throw out are astounding. Clearly, he's able to keep hitters off balance better than Anderson regardless of how they maintain their stuff.
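The per-pitcher correlation referred to here can be sketched as a plain Pearson coefficient between pitch number and StuffRV, computed separately for each pitcher. The input format is hypothetical, and whether a negative r means improving or declining stuff depends on the sign convention used for StuffRV:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when either variable never changes
    return sxy / math.sqrt(sxx * syy)


def pitch_count_stuff_correlation(pitches_by_pitcher):
    """{pitcher: [(pitch_number, stuff_rv), ...]} -> {pitcher: r}"""
    return {
        name: pearson([number for number, _ in rows], [rv for _, rv in rows])
        for name, rows in pitches_by_pitcher.items()
    }
```

An r near zero, as described for Greinke, means pitch count explains little of the pitch-to-pitch variation in his StuffRV.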

Jeremy: You know how much I love this stuff. I don't know if it's the pitcher's stuff or your stuff that is the best stuff on Earth.

We talked about some of the problems with run values, as we both know RVs are biased in favor of breaking balls over fastballs because usage depends on the count. As you told me, it would take a better pitch classification system to really nail this one down. Absent that, we should at least all be aware of the bias in StuffRV.

Lastly, you might consider separating starters and relievers to see if there is much difference in StuffRV by pitch count.

Keep up the great work. I'm a big believer in process as well as results but think the former generally is more in our control than the latter. As such, I really like breaking down pitchers in this manner. Thanks.

Hmm. I'm trying to reconcile this in my head. I guess I didn't get that you were trying to keep stuff independent of results, especially since you seem to start out by saying that you were trying to decide whether pitchers lose effectiveness the deeper they get into games. Or is this just the beginning of the process you are using to get to the answers you want?

Thanks, Rich. Separating starters and relievers would have a lot of value, not only to see how well each group maintains its stuff, but to see the difference in each group's stuff while holding pitchers constant.

ECP, this is supposed to be the beginning, but I'm having a whole lot of trouble making progress today.

Jeremy:
This is cool stuff, but I don't know that we've fully established how much information there is in pitch properties about performance (this applies to my work as well, of course). Is the change in stuffRV reflecting changes in pitch selection? Or selection bias (better pitchers throwing more pitches in the 10-25 range, or 100+ range)? Or does it reflect changes in the underlying variables that affect performance?

We both have some work that suggests that pitch properties can help us evaluate pitchers, in aggregate, but I am still a little skeptical that changes in pitch properties *within* pitcher can tell us about changes in his effectiveness.

Along this theme, I think it would be really interesting to see whether changes in pitch properties predict dips in performance or starters getting pulled from the game. That would be worth a sabermetric Nobel imho.

Location is part of a pitcher's stuff. When you watch a pitcher like Verlander or Halladay go deep in a game, you will see his velocity slowly drop. But then in his last inning the velocity will spike. This is the pitcher overthrowing to finish off his start, and it comes with a loss of control. Hence control is part of stuff.

I believe you could try to correct the curve to remove the velocity spike at the end. Or possibly you could check whether the velocity spike correlates with a loss of spin from overthrowing. Maybe spin and/or movement would be a better indicator?


Also, the effectiveness of each pitch can be very deceiving. For example, most pitchers throw the slider out of the strike zone, and mostly on two-strike counts, so it is rarely put in play. If you truly want effectiveness, then some other measure would be needed. Maybe throw out all pitches that don't start in the zone? Or that don't end in the strike zone?


Further, when talking about late in the game it's not just good pitchers. It's any pitcher that happens to be having a good game.


It would also be interesting to know how fast a pitcher's stuff is lost. For someone who starts to lose it early but slowly, it's easy for the manager and pitching coach to know when to pull him. But for someone who loses it exponentially, the pitching coach must be ready to pull him immediately.

Comments aside, that is fascinating work.

Chris,

The changes in stuffRV reflect changes in all three things you mentioned--pitch selection, sample selection, underlying performance. Next time I'll use separate samples of only starters and relievers, and I've been working on my own pitch classifications, so hopefully I can detect more of the underlying performance and less of the sampling issues.

I'm positive that changes in pitch properties within a pitcher change his effectiveness. Again, separating the signal from the noise is the problem.

I agree that if some of this stuff could tell you in real time whether a pitcher was ready to be pulled, that would be an impressive contribution. John makes the same point.

John, I don't follow your logic about control being part of stuff.

There are certainly issues with using run values to measure the effectiveness of pitches, but I myself am still trying to wrap my head around why breaking balls have lower run values than fastballs. The conclusion that I keep coming back to is that pitchers don't throw enough breaking balls.

Jeremy, what I'm getting at is that pitchers typically throw at 90% effort all game. When they reach back for something extra late in the game, their control will suffer. I think that part is straightforward. If you look solely at velocity, you might be fooled by the spike at the end. That is where looking at location or spin might tell you where their stuff really is.


Re: breaking balls. Many pitchers rely on breaking balls in pitcher-friendly counts; that is huge. Hmmmmmm.

Why not compare pitch types based on the count? That could be huge. Obviously a 2-0 fastball will fare worse than an 0-2 slider. But how does a 2-0 change fare versus a 2-0 fastball? IMO that would be a much better indicator.
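Comparing pitch types within a count, as suggested here, amounts to averaging run values in (count, pitch type) cells. A minimal sketch, assuming hypothetical (balls, strikes, pitch_type, run_value) tuples; cell sizes are returned too, since some cells (a 3-0 curveball, say) will be thin:

```python
from collections import defaultdict


def rv_by_count_and_type(pitches):
    """pitches: iterable of (balls, strikes, pitch_type, run_value).

    Returns {(balls, strikes): {pitch_type: (mean_run_value, n_pitches)}}.
    """
    sums = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for balls, strikes, pitch_type, run_value in pitches:
        cell = sums[(balls, strikes)][pitch_type]
        cell[0] += run_value  # running total of run value
        cell[1] += 1          # number of pitches in this cell
    return {
        count: {ptype: (total / n, n) for ptype, (total, n) in by_type.items()}
        for count, by_type in sums.items()
    }
```

Reading across a single count, such as `table[(2, 0)]`, gives the 2-0 changeup versus 2-0 fastball comparison without the count itself tilting the result.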

Jeremy,

After thinking about it for a few minutes, here's my best thought/guess on the different run values.

Start with this fact: pitchers throw more breaking balls in pitchers' counts and more fastballs in hitters' counts. They are trying to avoid walks and make hitters strike out or hit the ball poorly, of course.

I think the disparity comes from here: the average run value of a pitch thrown in a pitchers' count would, I think, always be higher than that of the same pitch thrown in a hitters' count, just as a reflection of the count.

So a 3-0 fastball is a worse pitch, by run values, than an 0-2 fastball, at least by this guess. (You'll notice I'm avoiding saying a pitch has a higher or lower run value, because to say that you have to make clear - RV for the pitcher or the hitter? Etc. Saying a pitch is "better" seems clear... Not appropriate for actually running the numbers, but it clarifies my little non-numerical mumblings!)

So because more breaking pitches are thrown in pitchers' counts, they're worth more than fastballs. You could look at average RV by pitch type and count and see what you find.

Still... Complicated!
And perhaps you're right and pitchers should throw more breaking balls. I'm going to turn my brain off and go back to lunch.

Gentlemen,

Run values are calculated in such a way that the count should not matter. Pitchers are set up to succeed on an 0-2 count, so if they do succeed, the reward is not great. Pitchers are set up to fail on a 3-0 count, so if they fail, the cost is not great.
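One common way to make run values count-neutral, and plausibly what is meant here (the exact weights behind these numbers aren't given), is to credit each pitch with the change in expected runs between the count it was thrown in and the state it produces. Under that scheme every pitch, including a called ball or strike, gets a run value. A sketch with made-up state values, purely for illustration (positive favors the batter):

```python
# Hypothetical run values (batter's perspective, relative to an average
# plate appearance) for a few counts and terminal events. Illustrative only;
# real values would be estimated from play-by-play data.
STATE_VALUE = {
    (0, 0): 0.000,
    (3, 0): 0.190,
    (0, 2): -0.100,
    "walk": 0.310,
    "strikeout": -0.270,
}


def pitch_run_value(count_before, result, values=STATE_VALUE):
    """Run value of a single pitch: value of the state it produces minus the
    value of the count it was thrown in. `result` is a new count such as
    (1, 2) or a terminal event name such as "walk" or "strikeout".
    Positive numbers favor the batter."""
    return values[result] - values[count_before]


# A walk on 3-0 costs the pitcher little, because 3-0 was already bad for him...
print(round(pitch_run_value((3, 0), "walk"), 3))       # 0.12
# ...and a strikeout on 0-2 earns him little, because 0-2 was already good.
print(round(pitch_run_value((0, 2), "strikeout"), 3))  # -0.17
```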

If a pitcher has a fantastic breaking ball and crummy fastball, that shouldn't really be reflected that much by the true run values. He should be throwing his awesome breaking ball so often that hitters are able to time it, but by doing so, his fastball will not be so crummy, and will catch hitters off guard. There's a break-even point that I'm guessing most pitchers haven't reached.

OK, so it takes the count into account.

Does it only account for pitches where there is some sort of result, i.e., a walk, hit, or out? Or do pitches taken for balls and/or strikes also have a run value?

I would think breaking balls would produce many more balls and push more counts into hitters' counts.