F/X Visualizations
May 09, 2011
Does the Ump Care How Long the Game Is?
By Dave Allen

During this weekend's Boston-Minnesota series there was another Joe West kerfuffle, and the play-by-play guys brought up West's history with Boston. They mentioned West's comments last year that he did not like the Boston Red Sox and New York Yankee style of play, particularly those teams' long games. Setting aside one's own opinion of game length and of how appropriate it is for an umpire to criticize particular teams, I am sure that umpires, like everyone else, notice when games drag on. But unlike everyone else they are in a unique position to do something about it. So based on West's comments I wondered whether umpires expand the strike zone during long games to speed things along.

To look at this I used the conveniently time-stamped pitchf/x data. I collected all pitches made in the sixth through eighth innings and then looked at how long into the game each was made. For example, the average pitch in the sixth inning was made 1 hour and 48 minutes after the start of the game. Then I formed two subsets of these pitches: those in the top 5% of elapsed game time for their half inning, and those in the bottom 5% of elapsed game time for their half inning. For example, the 'long' group included pitches from the bottom of the eighth inning that were thrown 3 hours and 14 minutes or more after the game started. Pitches from the bottom of the eighth inning were included in the 'short' group if they were thrown less than 2 hours and 1 minute after the game started. The other half innings were handled the same way. The top and bottom of each inning were done separately so that top-of-the-inning pitches weren't overrepresented in the 'short' group and bottom-of-the-inning pitches in the 'long' group.
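As a rough illustration of that grouping step, here is a minimal Python sketch. It is not the code used for this post: the field names (`inning`, `half`, `elapsed_min`), the helper names, and the linear-interpolation percentile are all assumptions made for the example.

```python
from collections import defaultdict

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    if not sorted_vals:
        raise ValueError("empty list")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def label_long_short(pitches, tail=0.05):
    """Tag each pitch 'long', 'short', or None within its own half inning.

    `pitches` is a list of dicts with hypothetical keys:
      inning (int), half ('top' or 'bottom'),
      elapsed_min (minutes since the start of the game).
    Only innings 6 through 8 are considered, as in the post.
    """
    # Collect elapsed times separately for every (inning, half) pair.
    by_half = defaultdict(list)
    for p in pitches:
        if 6 <= p["inning"] <= 8:
            by_half[(p["inning"], p["half"])].append(p["elapsed_min"])

    # Bottom-5% and top-5% cutoffs for each half inning.
    cutoffs = {}
    for key, times in by_half.items():
        times.sort()
        cutoffs[key] = (quantile(times, tail), quantile(times, 1 - tail))

    labels = []
    for p in pitches:
        key = (p["inning"], p["half"])
        if key not in cutoffs:
            labels.append(None)  # outside innings 6-8
            continue
        short_cut, long_cut = cutoffs[key]
        if p["elapsed_min"] >= long_cut:
            labels.append("long")
        elif p["elapsed_min"] <= short_cut:
            labels.append("short")
        else:
            labels.append(None)
    return labels
```

Computing the cutoffs per half inning, rather than once over all pitches, is what keeps tops of innings from crowding the 'short' group and bottoms from crowding the 'long' one.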

So these pitches come from situations where the game had already gone on for a very long or very short time when they were thrown. Now we are interested in how the strike zone was called on these two groups. Unfortunately there might already be a sampling bias in the data: 'long' games might have umps with smaller strike zones, which would be why the game has gone on so long. So a more clever WOWY (with-or-without-you) approach would be preferable, but I couldn't come up with one.

With that limitation in mind let's see how the strike zones of the two groups compared. To first see how the top and bottom of the zone were called I considered taken pitches that were clearly in the zone horizontally (-0.5 < px < 0.5), and looked at their called strike rate by normalized pitch height.
Effectively no difference. The top and bottom of the zone were called at close to the same spot for both samples of pitches (and very close to sz_bot and sz_top, showing that the stringers do a pretty good job with these values).
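For readers who want to try the vertical comparison themselves, a minimal sketch of the calculation might look like the following. The post does not spell out how pitch height was normalized, so the scaling here (0 at sz_bot, 1 at sz_top) is an assumption, as are the `call` labels, the bin width, and the function name.

```python
import math
from collections import defaultdict

def called_strike_rate_by_height(pitches, bin_width=0.1):
    """Return {bin_left_edge: called-strike rate} for taken pitches over the plate.

    Hypothetical keys per pitch: px, pz, sz_bot, sz_top (pitchf/x fields)
    and call ('called_strike' or 'ball'; anything else counts as a swing).
    Height is normalized so that 0 is sz_bot and 1 is sz_top (an assumption).
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [called strikes, taken pitches]
    for p in pitches:
        if p["call"] not in ("called_strike", "ball"):
            continue  # only taken pitches tell us about the called zone
        if not -0.5 < p["px"] < 0.5:
            continue  # keep pitches clearly in the zone horizontally
        height = (p["pz"] - p["sz_bot"]) / (p["sz_top"] - p["sz_bot"])
        b = math.floor(height / bin_width)
        counts[b][1] += 1
        if p["call"] == "called_strike":
            counts[b][0] += 1
    return {round(b * bin_width, 10): s / n for b, (s, n) in sorted(counts.items())}
```

Running this once on the 'long' pitches and once on the 'short' pitches gives two curves to overlay; the height where each curve crosses 50% is one way to read off the called top and bottom of the zone.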

Turning to the horizontal zone I similarly looked at pitches that were clearly in the zone vertically (pz in the middle half of the interval between sz_top and sz_bot), and in this case separated by batter handedness.
Again there is almost no difference. If anything there is a very slight difference on the right edge of the zone (from the umpire's perspective), with the 'long' zone slightly smaller. That is the opposite of what you would expect if the umpire were trying to speed the game up, although the difference is tiny.
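The horizontal version of the calculation can be sketched the same way. Again this is only an illustration under assumptions: the `stand` field for batter handedness, the `call` labels, and the quarter-foot bins are hypothetical choices, not details taken from the analysis above.

```python
import math
from collections import defaultdict

def in_middle_half(pz, sz_bot, sz_top):
    """True when pz lies in the central 50% of the [sz_bot, sz_top] interval."""
    height = sz_top - sz_bot
    return sz_bot + 0.25 * height <= pz <= sz_bot + 0.75 * height

def called_strike_rate_by_px(pitches, bin_width=0.25):
    """Return {(stand, bin_left_edge_ft): called-strike rate} for taken pitches.

    Hypothetical keys per pitch: px, pz, sz_bot, sz_top, stand ('L' or 'R'),
    and call ('called_strike' or 'ball'; anything else counts as a swing).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [called strikes, taken pitches]
    for p in pitches:
        if p["call"] not in ("called_strike", "ball"):
            continue
        if not in_middle_half(p["pz"], p["sz_bot"], p["sz_top"]):
            continue  # keep pitches clearly in the zone vertically
        key = (p["stand"], math.floor(p["px"] / bin_width) * bin_width)
        counts[key][1] += 1
        if p["call"] == "called_strike":
            counts[key][0] += 1
    return {key: s / n for key, (s, n) in counts.items()}
```

Keeping left- and right-handed batters in separate keys matters because the called zone is known to shift with handedness, so pooling them would blur the edges being compared.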

So overall, at least by this methodology, there is no difference in how the zone is called in long versus short games. If the umpires are annoyed by having to call a game going into its third hour in the seventh inning they don't seem to let it affect their strike zone. Score one for the boys in blue.


Good job, Dave. When I was playing in high school, an umpire who made a bad call that went in favor of the team winning late in the game was greeted derisively with, "The chicken is getting cold!" Your study suggests that umpires are just as happy waiting for their next meal as not.

Thanks, Rich. Yeah, I was surprised. I thought I was going to see an effect.

Can you segment the analysis by umpire? Say, look at Joe West's strike zone in long and short games instead of averaging all the umpires together? I can envision a scenario in which most umpires are consistent but a few stray significantly. This scenario wouldn't show up in the average.


Yeah, that is definitely possible to do. The only issue there is sample size. I am not sure whether there are enough pitches in each scenario (long v short games) for any specific umpire to make a firm conclusion.

Interesting that you find this. I have actually been working with some similar data, and umpires are more likely to call strikes in the 9th inning than in any other inning: by about 1 percentage point compared with the first inning, and by between 1 and 2 percentage points compared with other innings.

However, after the 9th, this isn't the case. I'm guessing that's because the game is close. In the 9th, for the most part, they know who's going to win, and that slight increase in strike calling won't have a real effect as it may in a tie game.

The model includes controls for batter height, pitcher and batter performance, home vs. away, a semi-parametric pitch location smoothing, home and away team quality, count, pitcher and batter age, umpire fixed effects, pitch type, and pitch velocity.

I don't have any interactions between the umpire id dummies and the inning, though, as BB suggests.


Interesting, looking forward to your post on the results of that model.

Certainly Joe West has enough games under his belt for an accurate graph. I'm curious to see. He certainly lets his emotions affect his press conferences. :p

Nice one, Dave!

As a high school ump, I've heard that "late for dinner" stuff before. I usually respond by saying that it's a nice day and we're playing ball, so what's the hurry? Heck, even on a cold day, I'm probably the warmest guy on the field with Under Armour, long johns and all the gear.

Not sure how far into the minutiae you wish to get, but I know how I call a game. So, I'm curious as to how MLB umps call 3-0 and 0-2 pitches. Has little to do with the speed of the game, but I know I widen the zone on 3-0 and a pitcher has to earn it on 0-2.

As an aside, there was only one time when I genuinely rigged up a call. Top of the seventh in a one-sided game (home team winning)...pinch hitter for the visitors with two down. I saw this kid in a scrimmage a couple weeks before and knew he'd be lucky to make any decent contact. First pitch gets fouled off my mask and I saw stars. Both benches were really good and asked if I was OK. Took about a minute to get my wits about me--catcher went out to the mound and all. So, I say to myself that, since this kid doesn't stand a chance against the kid pitching, who's throwing in the high 80s, I'd guess, anything close is going to be called strike two. After all, he's going to get to two strikes soon and very likely strike three; I just thought I'd make it quicker. Next pitch is out of the zone but close enough. "Striiiiike." Next pitch? Swinging strike three. Sayonara.

Anyway, I'd love to see the way MLB umps call marginal pitches in certain situations, 3-0, 0-2, when he's taken a pop off the mask, that sort of thing.


Unfortunately, I can't post the full results at my site, as the underlying part of the work is with colleagues here for an academic paper.

Interestingly, when I plot the 9th inning vs. other innings, the difference isn't really visible to the naked eye (looks just like yours above). Most likely that is because the change is so tiny, if it in fact exists.

Without doing a full transformation, I can say the coefficient in the semiparametric model is (in log-odds form) about 0.08 for the probability of making a strike call. For some context, this is ever-so-slightly smaller than the HFA I find (in the opposite direction) on strike-calling.
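To give a feel for what a log-odds coefficient of about 0.08 means on the probability scale, here is a small sketch of the standard logistic conversion. The baseline called-strike probabilities are made-up illustration values, not estimates from the model described above, and the function name is hypothetical.

```python
import math

def shift_probability(baseline, log_odds_shift):
    """Apply an additive log-odds shift to a baseline probability."""
    logit = math.log(baseline / (1.0 - baseline))
    return 1.0 / (1.0 + math.exp(-(logit + log_odds_shift)))

# Hypothetical baselines, chosen only to show how the effect varies.
for baseline in (0.2, 0.3, 0.5):
    shifted = shift_probability(baseline, 0.08)
    gain = 100 * (shifted - baseline)
    print(f"baseline {baseline:.2f} -> {shifted:.4f} (+{gain:.2f} points)")
```

The shift is largest at a 50% baseline, where it comes to just under 2 percentage points, and shrinks as the baseline moves toward 0 or 1. That is in line with the 1 to 2 point range mentioned earlier in the thread.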

At this point, that is not interacted with whether it is the top or bottom of the 9th, so there could be more going on with this and HFA.

Perhaps more interesting is the fact that the only other statistically significant Inning coefficient is on the 10th inning, in which the umpire is less likely to make a strike call. This isn't the case for any of the other extra innings--though since there are 22 levels of the factor, it could just be statistically significant by chance.
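The "significant by chance" worry can be put in rough numbers. The sketch below assumes a 0.05 significance threshold, independent tests, and 21 inning coefficients (22 levels minus one reference inning); none of those details are stated above, so treat the result as a ballpark only.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent true-null tests
    comes out significant at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

# Assumed: 21 inning coefficients, each tested at the 0.05 level.
print(f"{chance_of_false_positive(21):.2f}")
```

Under those assumptions, the chance of at least one spurious "significant" inning is roughly two in three, so a lone 10th-inning effect is not strong evidence on its own.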

Lastly, temperature seems to have very little bearing on strike-calling from what I see.

After an ump calls about 2 million strikes, he couldn't change even if he wanted to.

I was trying to look up an old friend ty Paul, LB umpire, because of what happened at Lakewood-Edison game today, to see if he was there and what he thought. Lakewood, my school, does the Houdini, hidden ball trick with the tying runs on first and second with two out in the last inning, pitcher spinning around to pick off at second and then a ruckus pretending the ball goes into center field. Except the center fielder is now picking up a ball shaped piece of white paper.
Can we now start throwing fake balls of paper as well?

It was a great game, and I'm proud to be part of the Lakewood program/folklore, but if I were the head ump and saw that (it was subtle), I might want to throw someone out of the game.