Touching Bases
September 09, 2010
Another Quantitative Approach to Studying Release Point Consistency
By Jeremy Greenhouse

Jeff Sullivan in this very space on January 19, 2006:

We know an awful lot about pitchers. We know how hard they throw, how many batters they strike out, what kinds of pitches they have, and whether their deliveries are fluid and easy or violent and rough. This is all objective and indisputable information that has a lot of value when it comes to projecting a pitcher's future health and success.

One thing we don't know much about, though, is the consistency of a pitcher's release point. The fact that we don't have a good way of measuring what's arguably the most important part of being a good pitcher is one of the more ironic twists of modern analysis.

Well, by 2007, PITCHf/x had become all the rage. The data is available now, but I'm not sure how widely release points have been studied.

PITCHf/x estimates the ball's location at a mark 50 feet from home plate. Pitchers often shift their spot on the rubber, resulting in variations of the horizontal component of the release point. This doesn't necessarily mean that the pitcher isn't repeating his delivery, though. Therefore, I decided to look only at the vertical component. Furthermore, some pitchers use different arm slots for different pitch types, and curveballs have a higher initial trajectory than fastballs. My methodology was to find the standard deviation of a pitcher's vertical release point for the fastest 20% of his pitches. Since cameras are calibrated ever so slightly differently in every ballpark, and even in every series to some extent, I looked at pitchers at both the season and game level.
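The calculation described above can be sketched in a few lines. This is only an illustration under stated assumptions: each pitch is reduced to a (speed, release height) pair, the function name is hypothetical, and the sample standard deviation is used because the article does not say which flavor was chosen.

```python
from statistics import stdev


def release_point_spread(pitches, top_fraction=0.20):
    """Spread of vertical release point among a pitcher's fastest pitches.

    pitches: list of (speed_mph, release_height_ft) tuples (assumed layout).
    top_fraction: share of pitches to keep, ranked by speed (0.20 in the article).
    """
    if len(pitches) < 2:
        raise ValueError("need at least two pitches")
    # Rank by speed, fastest first, and keep the top slice.
    ranked = sorted(pitches, key=lambda p: p[0], reverse=True)
    n_keep = max(2, round(len(ranked) * top_fraction))
    fastest = ranked[:n_keep]
    # Sample standard deviation of the vertical release point (an assumption).
    return stdev([height for _, height in fastest])
```

Running it once per season and once per game (by filtering the input list first) would mirror the two levels of analysis mentioned above.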

While intuitive reasoning would suggest release point consistency is automatically a positive, I didn't immediately notice anything that would allow for such a broad claim. Still, I did see how release point consistency correlates with some other things.

Pitchers with lower arm slots have more trouble with release point consistency. This makes sense because pitchers with low arm angles tend to be less skilled and practiced than more traditional over-the-top pitchers. The sidearm motion could be naturally harder to repeat. It could also be a PITCHf/x issue. Higher variance in release points coincides with higher variance in movement and velocity as well.

On to some examples. Javier Lopez, a sidearmer, is the worst at maintaining a consistent release point.

Lopezrelease.gif

Perhaps he's changing his arm slot intentionally. Jose Contreras has been an effective pitcher who deals from multiple release points. Unlike Lopez, though, Contreras has separate, consistent release point clusters, which makes it easy to see that it is part of his approach.

Contrerasrelease2.gif

And now for something different, Alberto Castillo:

castillorelease2.gif

David Huff is a good example of a pitcher who has a very consistent release point.

Huffrelease2.gif

In fact, Frank Viola said in 2009, "Huff has textbook mechanics. Everything is right there. His release point is consistent with all his pitches."

Chris Carpenter and Kevin Slowey have highly consistent release points, too.

Baseball Beat
September 07, 2010
The Top 100 K/100P Leaders
By Rich Lederer

While strikeouts per pitch hasn't caught on as hoped when I introduced the idea in February 2006, there is no disputing the fact that this metric explains runs better than strikeouts per inning or strikeouts per batter faced.

As detailed in Strikeout Proficiency (Part Two), K/P has the highest correlation in each of the five run measures (ERA, R/G, ERC, FIP, and DIPS). K/BF has the second-highest correlation and K/IP has the lowest correlation. In other words, K/P > K/BF > K/IP.

To give K/P more utility, I multiply this decimal by 100. Not only do we now get a real number out of this exercise but the standard of measurement is almost exactly the average number of pitches per start during recent years. In an era of pitch counts, it seems more instructive to me to measure starters by the number of K/100 pitches than K/9 IP.

(For context, among those who are currently qualified for the ERA title, the average pitcher has thrown 100 pitches per start and completed 6 1/3 innings. The average number of K/100P is 4.88.)
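For anyone who wants to reproduce the metric, the arithmetic is just strikeouts divided by pitches, scaled by 100. The short sketch below uses a hypothetical function name; the inputs in the usage note are figures quoted elsewhere in this article.

```python
def k_per_100_pitches(strikeouts, pitches):
    """Strikeouts per 100 pitches (K/100P)."""
    if pitches <= 0:
        raise ValueError("pitches must be positive")
    return 100.0 * strikeouts / pitches


# Season totals quoted in this article: 92 Ks on 1,073 pitches.
strasburg = round(k_per_100_pitches(92, 1073), 2)   # 8.57
# Per-start averages work the same way: 6.85 Ks on 97 pitches.
morrow = round(k_per_100_pitches(6.85, 97), 2)      # 7.06
```

Because the ratio is scale-free, season totals and per-start averages give the same answer for the same pitcher.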

With the foregoing in mind, let's take a look at this year's leaders. Interestingly, there are 100 pitchers who have averaged at least one inning per team game, which is the minimum to qualify for the ERA title. (The stats were compiled yesterday evening in real time and may not include the entire results for late games.)

K:100P 1-50.png

K:100P 51-100.png

As shown, Brandon Morrow is leading the majors with 7.06 K/100P. He is averaging 97 pitches and 6.85 Ks per start. While Morrow leads MLB in K/100P, K/9, and K/BF, the 26-year-old righthander is 13th in strikeouts due to the fact that he is only averaging 5 2/3 innings per start. Aside from Morrow's dominating one-hit, 17-strikeout, complete-game shutout last month vs. Tampa Bay when he was allowed to throw 137 pitches, his starts, innings, and pitch counts have been managed closely by Cito Gaston and the Toronto front office. Along these lines, he was shut down for the season after making his last start on Friday against the New York Yankees. Although the former first round draft pick out of Cal will fall short of the required 162 innings to qualify for the ERA title, it makes little or no difference given that his 4.49 mark currently ranks 36th in the American League (out of 46 pitchers). However, it is worth noting that he has a bigger gap (1.31) between his ERA and FIP (3.18) than any starter in the big leagues.

Francisco Liriano ranks second with 6.87 K/100P. Like Morrow, Liriano's ERA (3.27), while excellent, understates his defense-independent pitching prowess this year as the lefthander tops the majors in FIP at 2.31 due to a strong strikeout rate, a better-than-average walk rate, and a home run rate (0.16 per 9) that is less than half that of the closest challenger (Josh Johnson, 0.34). While Liriano's HR/FB of 2.6% is probably unsustainable longer term, his xFIP (3.01), which normalizes the home run/fly ball rate to league average, still places him first in the AL and second in MLB (behind only Roy Halladay, 2.93).

Jon Lester ranks in the top five in the majors in strikeouts, K/100P, K/9, and K/BF. He is tenth in the AL in ERA and fourth in FIP and xFIP. The 26-year-old southpaw has produced three consecutive superb seasons and must now be regarded as one of the top five pitching properties in baseball.

With Stephen Strasburg sidelined through 2011, is there a better 22-year-old (or younger) pitcher than Mat Latos? The San Diego righthander is two months older than Brett Anderson and three months older than Clayton Kershaw, the other contenders for this mythical title. Latos (6.54) and Kershaw (6.39) rank fourth and sixth, respectively, in K/100P. Both starters play for teams in the NL West so they generally face similar competition. Although Latos' home ballpark is more friendly toward pitchers than Kershaw's, the former (.188/.247/.310, 2.36 ERA) has outperformed the latter (.241/.325/.350, 2.86 ERA) on the road this year. In the department of be careful when analyzing (over analyzing?) the effects of home ballparks, please note that Latos has pitched 99.1 IP on the road and just 56.1 IP at home this year. In other words, he has only thrown 36 percent of his innings at Petco Park, which means he hasn't benefited from the 87 park factor as much as one might believe without examining the facts. Oh, and it just so happens that Latos and Kershaw are the scheduled starting pitchers tonight when the Padres host the Dodgers.
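The 36 percent figure is easy to get wrong because innings are written in thirds: "99.1" means 99 1/3 innings, not 99.1. A small sketch of the conversion, with hypothetical function names, makes the arithmetic explicit.

```python
def innings_to_outs(ip_text):
    """Convert baseball innings notation to outs, e.g. '99.1' -> 298.

    The digit after the point counts outs (0, 1 or 2), not tenths.
    """
    whole, _, partial = ip_text.partition(".")
    extra_outs = int(partial) if partial else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError("partial innings must be .0, .1 or .2")
    return int(whole) * 3 + extra_outs


def home_share(home_ip, road_ip):
    """Fraction of total innings thrown at home."""
    home = innings_to_outs(home_ip)
    road = innings_to_outs(road_ip)
    return home / (home + road)


share = round(home_share("56.1", "99.1"), 2)  # 0.36
```

Working in outs avoids the rounding drift that creeps in when the .1 and .2 suffixes are treated as decimals.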

At 6.41 K/100P, Jered Weaver is sandwiched between Latos and Kershaw. Weaver ranks among the top five pitchers in the majors in Ks, K/100P, K/9, K/BF, and K/BB. He is 8th in ERA, 6th in FIP, and 5th in xFIP among AL pitchers. The 6-foot-7 righthander also ranks 5th in Wins Above Replacement (WAR) and 3rd in Win Probability Added (WPA) in the junior circuit. While the Angels' ace lacks the gaudy win totals and winning percentages of CC Sabathia and David Price (and others), he has clearly been one of the five most effective starting pitchers in the league this season. Weaver can take the next step by pitching deeper into games as he is without a complete game and has only worked more than seven innings three times, primarily due to the fact that he leads the majors in pitches per plate appearance (4.17).

A lot has been written and said about Tim Lincecum's up-and-down 2010 but the fact remains that the two-time Cy Young Award winner is seventh in the majors and third in the NL in K/100P. His fastball velocity and movement have declined this season, yet he is getting more batters to swing at pitches outside the zone than ever before. In the aftermath of a poor August, the 26-year-old righthander beat the Colorado Rockies with a strong performance (8-5-1-1-1-9) on September 1. I would be slow to give up on this extraordinary talent.

Felix Hernandez leads the majors in strikeouts and ranks eighth in K/100P. He deserves to win the AL Cy Young Award as much as anybody, yet may be hurt if voters hold his mediocre win total (11) and W-L % (.524) against him. Both can be easily explained by the fact that Felix has received the lowest run support (3.90) in the AL this season. According to Lee Sinins, Hernandez would be 15-6 if he had received average run support. Sure, Sabathia is 19-5 but he has been supported by an average of 7.59 runs from his Yankees teammates. Similarly, Price (16-6) has received an average of 6.72 runs. Even Clay Buchholz, whose 15-6 record and league-leading 2.25 ERA will draw considerable attention, has been backed by 7.06 runs per nine. The truth of the matter is that Hernandez is 2nd in ERA, 3rd in FIP, 3rd in xFIP, 3rd in WAR, and 1st in WPA. No other pitcher matches those rankings.

Cole Hamels has also pitched much better than his 9-10 W-L record would suggest. He has received the fifth-lowest run support (4.92) in the NL. Teammates Roy Oswalt (3.72) and Roy Halladay (4.68) rank first and fourth, respectively. Meanwhile, the 26-year-old lefthander ranks 4th in the NL in K/100P, 7th in K/9, and 8th in K/BB and xFIP. No team wants to face the Phillies' Big Three in the postseason.

Yovani Gallardo ranks 10th in the majors in K/100P. While the Milwaukee ace can frustrate writers, analysts, and fans at times, it is hard to argue against the following NL rankings: 1st in K/9, 4th in FIP, and 6th in xFIP and HR/9. While Gallardo needs to improve his control to reach his potential, he has been victimized by the fourth-highest BABIP (.337) and the eighth-lowest LOB% (69.2%). I mean, let's give the guy a break — he's only 24 years old.

There are a number of other pitchers having superb seasons, including the next four on the list: Adam Wainwright, Cliff Lee, and the previously mentioned Roy Halladay and Josh Johnson. Along with Ubaldo Jimenez, Wainwright, Halladay, and Johnson are probably the leading favorites to win the NL Cy Young Award in 2010. An argument could be made for all four at this point. Although Lee and Halladay aren't thought of as strikeout types, both have posted strong K/100P marks in part due to their pitch-count efficiency. Lee is 3rd among qualified MLB pitchers in P/PA (3.49) and 2nd in P/IP (14.0), while Halladay ranks 6th (3.58) and 3rd (14.2) in these two measures.

Lastly, I would be remiss if I didn't point out that Strasburg (92 Ks and 1,073 pitches) averaged 8.57 K/100 pitches in 12 starts spread over 68 innings. That, my friends, is 1.51 K/100P more than the leader among all qualified pitchers!

F/X Visualizations
September 03, 2010
PITCHf/x Summit 2010 Recap
By Dave Allen

A week ago today I was on my way to San Francisco for the 3rd annual PITCHf/x summit. The summit is put on by Sportvision, the company that developed the PITCHf/x system. I went last year and had a great time, so I was looking forward to this one -- it did not disappoint.

PITCHf/x summit is a bit of a misnomer because at this point Sportvision is expanding its f/x-family and this summit was largely centered around Sportvision's new FIELDf/x system. This camera-based system aims to track the movement of all players on the field as well as the ball in play and throws between fielders. The system has been running on a test basis at AT&T Park since April and Sportvision hopes to have the system in all MLB parks by next year. Whether this future data will be available to the public is not yet known, as Sportvision works out the business side of the project.

As part of this year's summit Sportvision released 13 games of the FIELDf/x data from AT&T to a limited number of analysts to analyze and present at the summit. Although Sportvision is working on tracking the ball with the FIELDf/x system, that is still a work in progress and they released 'just' the player tracking data. About half of the talks at the summit were based on the FIELDf/x data and the other half on other topics. Here I present a brief recap of these talks. The presentations should be available to download in the future, and it looks like they will be here when they are.

Part 1 non-FIELDf/x

Matt Lentzner and Mike Fast started off. Matt said that he has always been troubled by how movement numbers are reported, citing the often reported fact that according to PITCHf/x's spin deflection numbers (pfx_x and pfx_z) fastballs have a lot of spin deflection, or movement, while sliders have very little. Matt suggested the difference between these data and our expectations is because the spin deflection is defined, as Matt put it, from the perspective of the ball, while we think about movement from the perspective of the batter. Matt suggested that it would be useful to define two new values, the horizontal (x) and vertical (z) velocity of the pitch just as it crosses the plate. These values are affected not only by the pfx_x and pfx_z of the pitch, but also its trajectory, and could better represent the movement of a pitch as it is observed by a batter.
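One way to compute such plate-crossing velocities is from the constant-acceleration fit that PITCHf/x reports for each pitch (initial position, velocity and acceleration). The sketch below is not Matt and Mike's code. It assumes the usual nine-parameter fit in feet and seconds, with y pointing from the plate toward the mound, and it assumes the front of the plate sits at y = 1.417 ft. The function name is hypothetical.

```python
import math

PLATE_Y_FT = 1.417  # assumed front edge of home plate (17 inches)


def plate_crossing_velocity(y0, vx0, vy0, vz0, ax, ay, az, y_plate=PLATE_Y_FT):
    """Return (vx, vz, t): horizontal and vertical velocity at the plate, and flight time.

    Solves y0 + vy0*t + 0.5*ay*t**2 = y_plate for the earliest positive t,
    then applies v = v0 + a*t on the x and z axes.
    """
    dy = y0 - y_plate
    if abs(ay) < 1e-12:
        # No acceleration along y: plain linear motion.
        t = -dy / vy0
    else:
        disc = vy0 ** 2 - 2.0 * ay * dy
        if disc < 0:
            raise ValueError("pitch never reaches the plate")
        # vy0 is negative (toward the plate), so this picks the first crossing.
        t = (-vy0 - math.sqrt(disc)) / ay
    return vx0 + ax * t, vz0 + az * t, t
```

Because the result depends on the initial velocity components as well as the accelerations, two pitches with identical pfx_x and pfx_z can still cross the plate with different x and z velocities, which is the point Matt was making.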

Matt had Mike run the numbers to see how well these metrics correlated with swinging strike rate, and also presented the leader and laggard boards for starters' fastballs' vertical plate-crossing velocity. The results were preliminary but very cool. Hopefully Mike and Matt will continue to develop this idea and share more results with us in the future.

Up next were Glenn (Doc) Schoenhals and Fred Vint of Scientific Baseball. Scientific Baseball is looking to "close the gap between the science and the game." They have leased the PITCHf/x system, installed it in a training facility in Oklahoma, and combined it with a number of cameras that capture the motion of the pitcher at a high number of frames per second. They use this for player evaluation and development with players of all ages. Doc talked about the challenges of dealing with lots of PITCHf/x data, combining it with some of the visual data from the cameras, and finding a way to communicate all of that to young players, their parents and coaches who might not be familiar with measures like horizontal spin deflection. Doc also has a very accurate pitching machine which he can use to fire pitches just on the edge of the strike zone. Using that and the PITCHf/x system he has held little league (?) umpire training and practice sessions.

At that point Matt Lentzner was back up talking about an interesting pitch he has seen from Hideki Okajima. It is referred to as a rainbow curve, but is not held like a curve and does not have the movement of one. In fact, the pitch has pfx_x and pfx_z values close to zero: Matt thinks that it is a gyro ball.

Next up was Alan Nathan, who with Peter Jensen organized the summit. Alan presented the results from a series of experiments he conducted to measure the spin rate of batted balls. The pitchf/x system calculates the spin rate of pitched balls based on the fit trajectory, but not much is known about the spin of the batted ball. This spin plays a large role in making the ball drop faster on line drives (front spin) or stay in the air longer on some fly balls (backspin). It also makes the ball slice towards the foul line (side spin). Alan directly measured the spin on the ball by firing a marked baseball at 100 mph at a cylindrical piece of wood bolted to a wall and taking pictures of the ball as it came off.

Alan found a number of interesting things. The spin direction of the ball off the 'bat' was largely independent of the spin direction of the incoming ball (Alan varied the spin direction of the incoming ball). Also, in the moments when it hit the bat, the ball experienced shear deformation, causing it to 'grip' the bat. As I understood it, this stopped the spin of the ball, which is why the spin of the incoming ball did not play a big role in determining the spin of the ball coming off. This 'gripping' and deformation caused the ball to come off the bat with a huge spin rate: Alan observed balls coming off with over 4000 rpm, much higher than previous estimates. Alan was very surprised by how high these values were. He is hoping to incorporate these results into a model of the bat-ball collision.

Part 2 FIELDf/x

Vidya Elangovan, a Sportvision engineer, introduced us to the FIELDf/x system and some of the technical challenges of capturing the data. As noted, the system is up and running at AT&T and has been since April; the hope is to have the system in all parks by the 2011 season. Vidya said that the full tracked and recorded data is ready within 20-30 minutes after the game, but at this point is not completely 'real-time' like the pitchf/x system.

The system has two to four cameras placed up high above the field and trained on the entire field of play. At AT&T they use two cameras, one between 1st and home, and the other between 3rd and home, both very high, seemingly mounted on stadium lights. The cameras are higher resolution than the pitchf/x cameras and take pictures every 15th of a second. A computer algorithm picks out the players, coaches and umpires, turns each into a blob, finds the center of mass of each blob and attaches a location to that point. The system also records events: pitcher releases the ball, batter hits the ball, fielder gains possession of a ball (fields it, or catches it from a throw) and fielder throws the ball. The time of each of these events is recorded along with the identity of the fielder. In the future the system will also track the location of the ball in play and throw, although those data were not released with the 13 games.
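The center-of-mass step is simple to illustrate. The sketch below is not Sportvision's algorithm. It assumes a blob has already been segmented into a list of pixel coordinates and treats every pixel as having equal weight. The function name is hypothetical.

```python
def blob_center(pixels):
    """Center of mass of one blob, given its (x, y) pixel coordinates."""
    points = list(pixels)
    if not points:
        raise ValueError("blob has no pixels")
    n = len(points)
    # With equal weights the center of mass is the mean of each coordinate.
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y


# A 2x2 square of pixels has its center in the middle of the square.
center = blob_center([(10, 20), (11, 20), (10, 21), (11, 21)])  # (10.5, 20.5)
```

Since every pixel pulls on the mean, any extra pixels wrongly attached to a blob will drag the reported location away from the player.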

Vidya highlighted a number of the technical challenges. Shadows over part of the field during day games are challenging because they push the limits of the dynamic range of the cameras to pick up both shadowed and non-shadowed areas. Shadows of players can also artificially increase the size of player blobs, resulting in incorrect player centers. Green uniforms blend in with the grass, tricking the algorithm that picks out players from background. Similarly if players stand too still for a long time the algorithm can lose them. Finally the system picks up ridiculously large amounts of data. If Sportvision kept all those high-resolution pictures taken every 15th of a second for every game of a MLB season they would end up with petabytes of data. With just the location data for all players every 15th of a second they get one million lines of data a game. Effectively storing, transmitting and analyzing this data will be a huge challenge.

Maybe the bloggers could give us some hope.

Peter Jensen showed how he took this huge quantity of data, moved it into a database and then into an Excel-based simulation which could replay the movement of the players and ball (extrapolated from player events). Peter's simulation was well done and while it ran it also displayed some of the important pieces of information (throw speeds, distance between base runners and the next base, etc.). Whoever gets this data (teams, bloggers, etc.) will need to do something like Peter did to make sense of it.

John Walsh spoke at the beginning of the day, by Skype because he was in Italy, but his talk fits in better here. John analyzed grounders. Since we had just 13 games worth (and only bottom halves of innings) and less than a month to work with the data, it was hard to do more than just descriptive looks at the data. Still the descriptive look was very cool. John calculated how long each fielded grounder took to get to the fielder: the average play to 3B took about 1.5 seconds, while those to SS or 2B took about two seconds. So middle infielders get, on average, about half a second extra to get the ball. John also showed that with the data it is possible to break down the time it takes to make a double play into its constituent parts: time it takes for the ball to be fielded, time the fielder holds the ball, the time it takes for the ball to get to the next fielder, and so on.
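Given timestamped events of the kind FIELDf/x records, that breakdown amounts to differencing consecutive event times. The sketch below is an illustration only: the event labels and every timestamp are invented for the example and do not come from John's talk or the released data.

```python
def segment_durations(events):
    """Split a play into segments between consecutive events.

    events: list of (time_s, label) pairs.
    Returns a list of (from_label, to_label, duration_s) tuples.
    """
    ordered = sorted(events, key=lambda e: e[0])
    return [
        (label_a, label_b, time_b - time_a)
        for (time_a, label_a), (time_b, label_b) in zip(ordered, ordered[1:])
    ]


# Hypothetical 6-4-3 double play; all times are made up for illustration.
play = [
    (0.00, "ball hit"),
    (1.95, "SS fields"),
    (2.60, "SS throws"),
    (3.05, "2B catches"),
    (3.55, "2B throws"),
    (4.30, "1B catches"),
]
parts = segment_durations(play)
```

Each tuple in `parts` corresponds to one of the pieces listed above: time to field the ball, time the fielder holds it, flight time of the throw, and so on.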

At that point I was up. I looked at fielders' routes to balls in the air. With the data you could see how direct, or not, paths to the ball were. I showed some plays where the paths were particularly direct and some where they were not so direct. Ultimately I showed a graph of hang time versus distance the fielder was from the ball for fielded balls in the air. With the trajectory of non-fielded balls as well we could add those to this graph, adding how far a fielder was from the ball and how long he would have had to get there. I noted that this would be a great basis for a fielding metric; Greg will talk more about this in his talk.

Next up was Mike Fast, who analyzed base runners. First he showed the base-running trajectories for a number of plays. When players go between two bases they take roughly the straight line between the two, but when they are going for two bases they take a rounder, almost circular approach. Based on the data Mike looked at he didn't see a lot of variability between the paths taken by different players taking two bases. Mike also looked in depth at two runners, plotting their instantaneous speed at each 1/15 second interval. He showed how the runner sped up or slowed down when the pitcher started his windup, released the ball, the ball was hit, and so on. One of the runners Mike showed got up to a top speed of 18 mph.
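Instantaneous speed from tracking data is just the distance covered between consecutive samples divided by the sampling interval. The sketch below assumes positions are given in feet on the field plane, which is an assumption about the data layout. The function name is hypothetical. For scale, 18 mph is 26.4 ft/s, or about 1.76 feet between samples at 15 samples per second.

```python
import math

FTPS_TO_MPH = 3600 / 5280  # feet per second -> miles per hour


def speeds_mph(positions_ft, rate_hz=15):
    """Speed between each pair of consecutive (x, y) samples, in mph."""
    dt = 1.0 / rate_hz
    return [
        math.hypot(x2 - x1, y2 - y1) / dt * FTPS_TO_MPH
        for (x1, y1), (x2, y2) in zip(positions_ft, positions_ft[1:])
    ]


# A runner covering 1.76 ft per sample is moving at 18 mph.
example = speeds_mph([(0.0, 0.0), (1.76, 0.0), (3.52, 0.0)])
```

Raw differences like these amplify any jitter in the tracked positions, so some smoothing would probably be needed before reading much into a single sample.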

Baseball Analyst Jeremy Greenhouse was up next. He presented two models he had parameterized with the FIELDf/x data. The first was a model to predict stolen base success probability based on a number of parameters: length of lead, amount of time it takes the base runner to get to the next base, pitch type, pitch speed, catcher pop time (time between when the catcher gets the ball to when he throws it), amount of time it takes the catcher's throw to get to second (or third). Jeremy noted that his model would not account for the baserunner's sliding ability or the fielder's tagging ability. The released FIELDf/x data had only four steal attempts so a complete parameterization of his model was not possible, but with a larger set of data it would be very cool to see what this model would show. Jeremy had a similar model for estimating the success of fielding a fly ball.
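To make the idea concrete, here is a toy version of a time-based steal model. It is not Jeremy's model: it swaps in a plain logistic curve on the margin between the defense's total time and the runner's time, and the `scale_s` smoothing value is a hypothetical placeholder that has not been fit to any data. Lead length, pitch type and pitch speed enter only indirectly, through the runner and delivery times passed in.

```python
import math


def steal_success_probability(runner_time_s, delivery_time_s, pop_time_s,
                              throw_time_s, scale_s=0.1):
    """Toy probability that the runner beats the throw.

    margin > 0 means the ball arrives after the runner.
    scale_s controls how sharply probability changes with margin (hypothetical).
    """
    if scale_s <= 0:
        raise ValueError("scale_s must be positive")
    margin = (delivery_time_s + pop_time_s + throw_time_s) - runner_time_s
    x = margin / scale_s
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

With enough tracked steal attempts, the scale (and separate weights for each component) could be estimated from outcomes rather than assumed.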

Matt Thomas uses a DSLR to take pictures of the field of play from the press box at Busch Stadium in St. Louis. From what I understand he captures the initial position of players as each play begins and then the position when the ball is fielded. It is very cool to see the amount and level of data that Matt can collect with a consumer-level camera and his photometry skills. Matt showed distributions for the initial locations of fielders for each position based on batter handedness, batting order, inning and a number of other game states. He also showed the probability that an infielder fields a grounder based on the difference between the angle where the fielder is positioned and the angle of the grounder. It follows a relatively nice Gaussian centered just off of zero.

Max Marchi, all the way from Italy by way of NYC, Cooperstown, Syracuse, Buffalo, South Bend and Chicago, gave us examples of how you could use PITCHf/x, HITf/x and FIELDf/x to scout players. He had a number of examples from the blogosphere (his work, Jeremy's work, my work). It was a very cool talk to see all of the ways these data can be used to measure players' abilities.

Greg Rybarczyk was up next. Like me he looked at fielders playing balls in the air, but he added the next step to the analysis. He went through 13 innings and looked at all balls in the air and found the landing location and hang time of balls that dropped in for hits. With this he could do what I wanted to do and plot both hits and fielded balls in hang time/distance between fielder and ball space. With enough data points one could assign a probability that the average fielder fields a ball based on these two values (another value that Greg noted was important was the angle the player had to go to get the ball). Then each fielder could be assessed based on the probability the average fielder makes plays that he made or didn't. Most agreed this would be more accurate than the current zone-based methods, but it is still a question whether this method would make fielding metrics converge any faster than current methods.
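A bare-bones version of that approach can be sketched by binning plays on hang time and distance, taking the caught fraction in each bin as the average fielder's probability, and then crediting each fielder with the difference between what he did and what was expected. This is an illustration, not Greg's method: the bin widths and function names are hypothetical, and the angle Greg mentioned is left out.

```python
from collections import defaultdict


def _bin_key(hang_s, dist_ft, time_bin, dist_bin):
    return int(hang_s // time_bin), int(dist_ft // dist_bin)


def catch_probabilities(plays, time_bin=0.5, dist_bin=10.0):
    """Fraction of balls caught in each (hang time, distance) bin.

    plays: iterable of (hang_time_s, distance_ft, caught) with caught a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [caught, total]
    for hang_s, dist_ft, caught in plays:
        cell = counts[_bin_key(hang_s, dist_ft, time_bin, dist_bin)]
        cell[0] += int(caught)
        cell[1] += 1
    return {key: caught / total for key, (caught, total) in counts.items()}


def plays_above_average(fielder_plays, probs, time_bin=0.5, dist_bin=10.0):
    """Sum of (outcome - expected) over a fielder's chances in known bins."""
    total = 0.0
    for hang_s, dist_ft, caught in fielder_plays:
        key = _bin_key(hang_s, dist_ft, time_bin, dist_bin)
        if key in probs:  # skip bins with no league data
            total += int(caught) - probs[key]
    return total
```

A fielder who catches a ball that the average fielder gets to 75 percent of the time earns +0.25, and one who misses it is charged -0.75. How finely the bins can be drawn depends on how many plays are available, which ties back to the convergence question raised above.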

All presenters did a tremendous amount of work in their presentations and this is just a small sample of each presentation. If you are interested further I suggest you download the slides and look over them. Also if I mis-stated anything here please note any corrections in the comments.

If you are looking for more recaps or liveblogs you can check out Colin's, Ben's, Rob's or Dan's.

I had a great time at the summit; it was lots of fun to see some of the other members of the PITCHf/x community. Thanks to Sportvision for putting on the conference and Alan and Peter for helping to organize it.