The Baseball Analysts: PITCHf/x Summit 2010 Recap

By Dave Allen

A week ago today I was on my way to San Francisco for the 3rd annual PITCHf/x summit. The summit is put on by Sportvision, the company that developed the PITCHf/x system. I went last year and had a great time, so I was looking forward to this one, and it did not disappoint.

The name "PITCHf/x summit" is a bit of a misnomer, because Sportvision is now expanding its f/x family and this summit was largely centered on its new FIELDf/x system. This camera-based system aims to track the movement of all players on the field, as well as the ball in play and throws between fielders. The system has been running on a test basis at AT&T Park since April, and Sportvision hopes to have it in all MLB parks by next year. Whether these data will be available to the public is not yet known, as Sportvision is still working out the business side of the project.

As part of this year's summit, Sportvision released 13 games of FIELDf/x data from AT&T Park to a limited number of analysts, who analyzed it and presented their work at the summit. Although Sportvision is working on tracking the ball with the FIELDf/x system, that is still a work in progress, so they released 'just' the player tracking data. About half of the talks at the summit were based on the FIELDf/x data and the other half covered other topics. Here I present a brief recap of these talks. The presentations should be available to download in the future, and it looks like they will be here when they are.

Part 1: Non-FIELDf/x

Matt Lentzner and Mike Fast started off. Matt said he has always been troubled by how movement numbers are reported, citing the often-reported fact that, according to PITCHf/x's spin deflection numbers (pfx_x and pfx_z), fastballs have a lot of spin deflection, or movement, while sliders have very little. Matt suggested that the gap between these data and our expectations arises because spin deflection is defined, as Matt put it, from the perspective of the ball, while we think about movement from the perspective of the batter. He proposed two new values: the horizontal (x) and vertical (z) velocity of the pitch just as it crosses the plate. These values are affected not only by the pfx_x and pfx_z of the pitch but also by its trajectory, and they could better represent the movement of a pitch as a batter observes it.
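To make Matt's proposal concrete, here is a minimal Python sketch of how the plate-crossing velocities could be computed from the PITCHf/x trajectory fit (the vx0, vy0, vz0, ax, ay, az fields, with initial values given at y0 = 50 ft). It assumes the constant-acceleration model PITCHf/x uses and evaluates the velocity at the front edge of home plate (y = 17/12 ft). The function name and the choice of the front edge as the reference plane are my assumptions, not necessarily how Matt and Mike computed their numbers.

```python
import math

# PITCHf/x puts y = 0 at the back tip of home plate; the plate is 17 inches
# deep, so its front edge sits at y = 17/12 ft. Using the front edge is an
# assumption; other reference planes are possible.
PLATE_FRONT_Y_FT = 17.0 / 12.0


def plate_crossing_velocity(vx0, vy0, vz0, ax, ay, az,
                            y0=50.0, y_plate=PLATE_FRONT_Y_FT):
    """Return (vx, vz) in ft/s at the moment the pitch reaches y_plate.

    Assumes the constant-acceleration PITCHf/x trajectory
    p(t) = p0 + v0 * t + 0.5 * a * t**2, with t = 0 at y = y0.
    """
    dy = y0 - y_plate  # distance left to travel toward the plate, in ft
    if ay == 0:
        # No drag term: y changes linearly (vy0 is negative, toward the plate).
        t = -dy / vy0
    else:
        # Solve 0.5*ay*t**2 + vy0*t + dy = 0; the minus root is the
        # physically meaningful (first, positive) crossing time.
        disc = vy0 * vy0 - 2.0 * ay * dy
        t = (-vy0 - math.sqrt(disc)) / ay
    return vx0 + ax * t, vz0 + az * t


# Illustrative numbers only (roughly fastball-like), not real data.
vx, vz = plate_crossing_velocity(vx0=5.0, vy0=-130.0, vz0=-5.0,
                                 ax=-10.0, ay=25.0, az=-15.0)
print(f"vx at plate: {vx:.1f} ft/s, vz at plate: {vz:.1f} ft/s")
```

Because vz at the plate folds in the downward angle from the release point as well as the spin-induced rise, it describes what the batter sees rather than the spin deflection alone, which is the distinction Matt was drawing.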

Matt had Mike run the numbers to see how well these metrics correlated with swinging-strike rate, and he also presented leader and laggard boards for the vertical plate-crossing velocity of starters' fastballs. The results were preliminary but very cool. Hopefully Mike and Matt will continue to develop this idea and share more results with us in the future.

Up next were Glenn (Doc) Schoenhals and Fred Vint of Scientific Baseball. Scientific Baseball is looking to "close the gap between the science and the game." They have leased the PITCHf/x system, installed it in a training facility in Oklahoma, and combined it with a number of high-frame-rate cameras that capture the pitcher's motion. They use this setup for player evaluation and development with players of all ages. Doc talked about the challenges of dealing with lots of PITCHf/x data, combining it with visual data from the cameras, and finding a way to communicate all of that to young players, their parents and coaches, who might not be familiar with measures like horizontal spin deflection. Doc also has a very accurate pitching machine that can fire pitches just on the edge of the strike zone; using it together with the PITCHf/x system, he has held training and practice sessions for umpires (Little League umpires, if I understood correctly).

Matt Lentzner then returned to talk about an interesting pitch he has seen from Hideki Okajima. It is referred to as a rainbow curve, but it is not held like a curve and does not move like one. In fact, the pitch has pfx_x and pfx_z values close to zero, and Matt thinks it is a gyro ball.
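The reasoning behind Matt's guess fits in a few lines. Spin deflection comes from the Magnus force, which depends on the part of the spin axis perpendicular to the ball's direction of travel; a pure gyro spin points the axis along the direction of travel, so it produces almost no deflection and pfx_x and pfx_z (reported in inches) land near zero. The 2-inch cutoff below is a hypothetical threshold for illustration, and a pitch with very little spin of any kind would also pass the test, so a near-zero reading flags a gyro candidate rather than proving one.

```python
import math


def spin_deflection_in(pfx_x, pfx_z):
    """Total spin deflection in inches from the two PITCHf/x components."""
    return math.hypot(pfx_x, pfx_z)


def is_gyro_candidate(pfx_x, pfx_z, threshold_in=2.0):
    """Flag pitches with near-zero spin deflection.

    threshold_in is a hypothetical cutoff chosen for illustration. A pitch
    with very low total spin would also be flagged, so this only identifies
    candidates.
    """
    return spin_deflection_in(pfx_x, pfx_z) < threshold_in


print(is_gyro_candidate(0.6, -0.9))   # True: near-zero deflection
print(is_gyro_candidate(-6.0, 10.0))  # False: typical fastball-like deflection
```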

Next up was Alan Nathan, who organized the summit with Peter Jensen. Alan presented the results of a series of experiments he conducted to measure the spin rate of batted balls. The PITCHf/x system calculates the spin rate of pitched balls from the fitted trajectory, but not much is known about the spin of batted balls. This spin plays a large role in making line drives drop faster (topspin) and keeping some fly balls in the air longer (backspin). It also makes the ball slice toward the foul line (sidespin). Alan measured the spin directly by firing a marked baseball at 100 mph at a cylindrical piece of wood bolted to a wall and photographing the ball as it came off.

Alan found a number of interesting things. The spin direction of the ball off the 'bat' was largely independent of the spin direction of the incoming ball (Alan varied the incoming spin direction). During the moment of contact, the ball underwent shear deformation, causing it to 'grip' the bat. As I understood it, this stopped the ball's spin, which is why the spin of the incoming ball did not play a big role in determining the spin of the outgoing ball. This gripping and deformation caused the ball to come off the bat with a huge spin rate: Alan observed balls coming off at over 4000 rpm, much higher than previous estimates. Alan was very surprised by how high these values were. He hopes to incorporate these results into a model of the bat-ball collision.
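Measuring spin from photographs of a marked ball comes down to simple arithmetic: how far the marking rotates between frames, divided by the time between frames. The sketch below assumes you already know the rotation angle and the frame interval; the 1/240 s interval in the example is a hypothetical camera setting, not a detail from Alan's talk. The rotation between frames must stay under half a revolution, or the apparent rotation becomes ambiguous (aliasing), which is one reason very high spin rates call for fast cameras.

```python
def spin_rate_rpm(rotation_deg, frame_interval_s):
    """Spin rate implied by a marked ball turning rotation_deg between frames.

    Only valid if the true rotation between frames is less than half a
    revolution; otherwise the apparent angle is aliased.
    """
    if not 0.0 <= rotation_deg < 180.0:
        raise ValueError("rotation between frames must be in [0, 180) degrees")
    revolutions = rotation_deg / 360.0
    return revolutions / frame_interval_s * 60.0  # rev/s -> rev/min


# Hypothetical example: a quarter turn between frames shot 1/240 s apart.
print(round(spin_rate_rpm(90.0, 1.0 / 240.0)))  # 3600
```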

Part 2: FIELDf/x

Vidya Elangovan, a Sportvision engineer, introduced us to the FIELDf/x system and some of the technical challenges of capturing the data. As noted, the system has been up and running at AT&T Park since April, and the hope is to have it in all parks by the 2011 season. Vidya said that the full tracked and recorded data are ready within 20-30 minutes after the game, so at this point the system is not completely 'real-time' like the PITCHf/x system.

The system uses two to four cameras placed high above the field and trained on the entire field of play. At AT&T Park they use two cameras, one between first and home and the other between third and home, both very high; it seems they are mounted on the stadium lights. The cameras have higher resolution than the PITCHf/x cameras and take pictures every 1/15 of a second. A computer algorithm picks out the players, coaches and umpires, turns each into a blob, finds the center of mass of each blob and attaches a location to that point. The system also records events: the pitcher releases the ball, the batter hits the ball, a fielder gains possession of the ball (fields it, or catches it from a throw) and a fielder throws the ball. The time of each event is recorded along with the identity of the fielder. In the future the system will also track the location of the ball in play and on throws, although those data were not released with the 13 games.
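The "center of mass of a blob" step is easy to picture in code. This hypothetical sketch takes the pixels already classified as belonging to one person (the hard part, and the step that the shadow and green-uniform problems Vidya described interfere with) and averages their coordinates with equal weight. Turning that pixel location into a field position would additionally need a camera calibration, which is not shown here and was not described in the talk.

```python
def blob_center(pixels):
    """Unweighted center of mass of a blob given as (col, row) pixel coordinates."""
    if not pixels:
        raise ValueError("blob has no pixels")
    n = len(pixels)
    cx = sum(col for col, _ in pixels) / n
    cy = sum(row for _, row in pixels) / n
    return cx, cy


# A square 3x3 "player" blob centered on (11, 21).
player = [(col, row) for col in range(10, 13) for row in range(20, 23)]
print(blob_center(player))  # (11.0, 21.0)

# Shadow pixels trailing off to one side get lumped into the same blob
# and drag the computed center away from the player.
shadow = [(col, 22) for col in range(13, 19)]
print(blob_center(player + shadow))
```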

Vidya highlighted a number of technical challenges. Shadows over part of the field during day games are difficult because they push the limits of the cameras' dynamic range, which must pick up both shadowed and unshadowed areas. Players' shadows can also artificially increase the size of player blobs, resulting in incorrect player centers. Green uniforms blend in with the grass, tricking the algorithm that separates players from the background. Similarly, if players stand still for too long, the algorithm can lose them. Finally, the system collects ridiculously large amounts of data. If Sportvision kept all the high-resolution pictures taken every 1/15 of a second for every game of an MLB season, they would end up with petabytes of data. Even just the location data for all players every 1/15 of a second amounts to one million lines of data per game. Storing, transmitting and analyzing this data effectively will be a huge challenge.

Maybe the bloggers could give us some hope.

Peter Jensen showed how he took this huge quantity of data, moved it into a database and then into an Excel-based simulation that could replay the movement of the players and the ball (the ball's path extrapolated from player events). Peter's simulation was well done, and while it ran it also displayed some of the important pieces of information (throw speeds, distance between base runners and the next base, etc.). Whoever gets this data, whether teams, bloggers or others, will need to do something like Peter did to make sense of it.
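Since the released data have player positions and event times but no ball positions, any replay has to infer where the ball is. Here is one minimal way to do that, assuming the ball travels in a straight line at constant speed from the thrower's tracked position at the throw event to the receiver's tracked position at the catch event, with positions in feet. This ignores the arc of the throw and the ball slowing down, and it is only an illustration, not the method Peter used in his Excel simulation.

```python
import math


def throw_speed_mph(thrower_pos, receiver_pos, t_throw, t_catch):
    """Average throw speed, from the straight-line distance between fielders."""
    feet = math.dist(thrower_pos, receiver_pos)
    return feet / (t_catch - t_throw) * 3600.0 / 5280.0  # ft/s -> mph


def ball_position(thrower_pos, receiver_pos, t_throw, t_catch, t):
    """Linearly interpolated ball position at time t (clamped to the throw)."""
    frac = (t - t_throw) / (t_catch - t_throw)
    frac = min(max(frac, 0.0), 1.0)
    return tuple(a + frac * (b - a) for a, b in zip(thrower_pos, receiver_pos))


# Hypothetical throw: 88 ft in exactly one second.
print(throw_speed_mph((0.0, 0.0), (88.0, 0.0), 10.0, 11.0))      # 60.0
print(ball_position((0.0, 0.0), (88.0, 0.0), 10.0, 11.0, 10.5))  # (44.0, 0.0)
```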

John Walsh spoke at the beginning of the day, by Skype because he was in Italy, but his talk fits in better here. John analyzed grounders. Since we had just 13 games' worth of data (and only bottom halves of innings) and less than a month to work with it, it was hard to do more than take descriptive looks at the data. Still, the descriptive look was very cool. John calculated how long each fielded grounder took to reach the fielder: the average play to 3B took about 1.5 seconds, while those to SS or 2B took about two seconds. So middle infielders get, on average, about half a second extra to get to the ball. John also showed that with the data it is possible to break down the time it takes to turn a double play into its constituent parts: the time it takes for the ball to be fielded, the time the fielder holds the ball, the time it takes for the ball to reach the next fielder, and so on.
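Both of John's measurements are differences between event timestamps in the FIELDf/x event log. A minimal sketch, assuming the events for one play are available as (time, description) pairs in chronological order; the double-play timestamps below are made up for illustration. The first interval (hit to fielded) is the grounder travel time John compared across positions, and the rest are the pieces of the double play.

```python
def event_intervals(events):
    """Split a play into the time between consecutive events.

    events: list of (time_s, description) tuples in chronological order.
    Returns a list of (from_event, to_event, seconds).
    """
    return [
        (a_desc, b_desc, round(b_t - a_t, 3))
        for (a_t, a_desc), (b_t, b_desc) in zip(events, events[1:])
    ]


# Hypothetical 6-4-3 double play, times in seconds from the pitch release.
play = [
    (0.50, "batter hits ball"),
    (2.50, "SS gains possession"),
    (2.75, "SS throws"),
    (3.25, "2B gains possession"),
    (3.50, "2B throws"),
    (4.25, "1B gains possession"),
]
for start, end, secs in event_intervals(play):
    print(f"{start} -> {end}: {secs:.2f} s")
```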

At that point I was up. I looked at fielders' routes to balls in the air. With the data you could see how direct, or not, their paths to the ball were. I showed some plays where the paths were particularly direct and some where they were not so direct. Finally, I showed a graph of hang time versus the fielder's distance from the ball for fielded balls in the air. With the trajectories of non-fielded balls as well, we could add those to the graph, showing how far a fielder was from the ball and how long he would have had to get there. I noted that this would be a great basis for a fielding metric; Greg covered this in more depth in his talk.
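One simple way to put a number on how direct a route is: divide the straight-line distance from the fielder's starting point to where he caught the ball by the distance he actually covered, so a perfectly direct route scores 1.0. This is an illustrative metric, not something presented at the summit. With positions sampled every 1/15 of a second, tracking jitter inflates the path length, so in practice you would probably want to smooth the positions first.

```python
import math


def path_length(points):
    """Total distance covered along a sequence of (x, y) positions."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def route_efficiency(points):
    """Straight-line distance / distance covered (1.0 = perfectly direct)."""
    covered = path_length(points)
    if covered == 0:
        return 1.0  # the fielder did not move
    return math.dist(points[0], points[-1]) / covered


print(route_efficiency([(0, 0), (3, 4)]))                    # 1.0
print(round(route_efficiency([(0, 0), (3, 0), (3, 4)]), 3))  # 0.714
```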

Next up was Mike Fast, who analyzed base runners. First he showed the base-running trajectories for a number of plays. When players go between two bases they run roughly the straight line between them, but when they are going for two bases they take a rounder, almost circular approach. Based on the data Mike looked at, he didn't see a lot of variability in the paths taken by different players going for two bases. Mike also looked in depth at two runners, plotting their instantaneous speed at each 1/15-second interval. He showed how each runner sped up or slowed down when the pitcher started his windup, when he released the ball, when the ball was hit, and so on. One of the runners Mike showed reached a top speed of 18 mph.
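Instantaneous speed at each 1/15-second interval is just the distance between consecutive positions divided by the frame interval. A minimal sketch, assuming positions in feet; since differencing amplifies small tracking errors, a real analysis would likely smooth the positions or average over a few frames first.

```python
import math

FRAME_DT_S = 1.0 / 15.0  # FIELDf/x position samples are 1/15 s apart


def speeds_mph(positions, dt=FRAME_DT_S):
    """Speed over each interval between consecutive (x, y) positions in feet."""
    return [
        math.dist(p, q) / dt * 3600.0 / 5280.0  # ft/s -> mph
        for p, q in zip(positions, positions[1:])
    ]


# A runner covering 1.76 ft per frame is moving 26.4 ft/s, which is 18 mph.
runner = [(0.0, 0.0), (1.76, 0.0), (3.52, 0.0)]
print([round(s, 1) for s in speeds_mph(runner)])  # [18.0, 18.0]
```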

Baseball Analyst Jeremy Greenhouse was up next. He presented two models he had parameterized with the FIELDf/x data. The first predicts stolen base success probability from a number of parameters: length of lead, time it takes the base runner to reach the next base, pitch type, pitch speed, catcher pop time (from when the catcher gets the ball to when he throws it), and the time it takes the catcher's throw to reach second (or third). Jeremy noted that his model would not account for the base runner's sliding ability or the fielder's tagging ability. The released FIELDf/x data had only four steal attempts, so a complete parameterization of his model was not possible, but with a larger set of data it would be very cool to see what this model would show. Jeremy had a similar model for estimating the probability of successfully fielding a fly ball.
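A stolen base attempt is essentially a race, and the parameters Jeremy listed map onto its two sides. The sketch below is a simplified, hypothetical version of that race, not Jeremy's model: the runner's time to the bag (which already reflects his lead) is compared with the pitch's flight time plus the catcher's release time plus the throw's flight time, all measured from the pitcher's release. The logistic curve and its 0.1-second scale that turn the margin into a probability are assumptions for illustration; like Jeremy's model, this ignores sliding and tagging.

```python
import math


def steal_margin_s(runner_time_s, pitch_flight_s, catcher_release_s, throw_flight_s):
    """Seconds by which the runner beats the throw (negative: he loses the race).

    All times are measured from the pitcher's release. runner_time_s already
    reflects the length of the lead; pitch_flight_s depends on pitch speed
    and type.
    """
    defense_s = pitch_flight_s + catcher_release_s + throw_flight_s
    return defense_s - runner_time_s


def steal_probability(margin_s, scale_s=0.1):
    """Map a time margin to a success probability with a logistic curve.

    scale_s is a hypothetical spread standing in for everything the race
    ignores (slide, tag, measurement error).
    """
    return 1.0 / (1.0 + math.exp(-margin_s / scale_s))


# Hypothetical attempt: the runner beats the throw by about 0.1 s.
m = steal_margin_s(runner_time_s=2.25, pitch_flight_s=0.45,
                   catcher_release_s=0.75, throw_flight_s=1.15)
print(f"margin {m:.2f} s, P(success) ~ {steal_probability(m):.2f}")
```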

Matt Thomas uses a DSLR to take pictures of the field of play from the press box at Busch Stadium in St. Louis. From what I understand, he captures the initial positions of the players as each play begins and then their positions when the ball is fielded. It is very cool to see the amount and level of data that Matt can collect with a consumer-level camera and his photogrammetry skills. Matt showed distributions of the initial locations of fielders at each position by batter handedness, batting order, inning and a number of other game states. He also showed the probability that an infielder fields a grounder as a function of the difference between the angle at which the fielder is positioned and the angle of the grounder, which follows a fairly clean Gaussian centered just off zero.
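Matt's angle comparison can be sketched with a little trigonometry. The version below assumes field coordinates in feet with home plate at the origin, the y axis pointing to straightaway center field and the x axis toward the first-base side; the angle of any point is then its angle off the center-field line as seen from home plate. The Gaussian-shaped curve for the probability of fielding the grounder uses placeholder parameters (peak, center and width) that would have to be fit to real data; the values here are not Matt's results.

```python
import math


def spray_angle_deg(x, y):
    """Angle of (x, y) off the center-field line as seen from home plate.

    Assumes home plate at the origin, +y toward center field, +x toward
    the first-base side, so positive angles are on the right side.
    """
    return math.degrees(math.atan2(x, y))


def angle_gap_deg(fielder_xy, grounder_xy):
    """How far (in degrees) the grounder's path is from the fielder's angle."""
    return spray_angle_deg(*grounder_xy) - spray_angle_deg(*fielder_xy)


def fielding_probability(gap_deg, peak=0.9, center_deg=0.5, width_deg=6.0):
    """Gaussian-shaped P(fielded) vs. angle gap; all parameters are placeholders."""
    return peak * math.exp(-0.5 * ((gap_deg - center_deg) / width_deg) ** 2)


# Hypothetical shortstop position and a point along a grounder's path.
ss = (-55.0, 135.0)
ball = (-40.0, 140.0)
gap = angle_gap_deg(ss, ball)
print(f"gap {gap:.1f} deg, P(fielded) ~ {fielding_probability(gap):.2f}")
```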

Max Marchi, all the way from Italy by way of NYC, Cooperstown, Syracuse, Buffalo, South Bend and Chicago, gave us examples of how PITCHf/x, HITf/x and FIELDf/x could be used to scout players. He had a number of examples from the blogosphere (his work, Jeremy's work, my work). It was a very cool talk, showing all the ways these data can be used to measure players' abilities.

Greg Rybarczyk was up next. Like me, he looked at fielders playing balls in the air, but he took the next step in the analysis. He went through 13 innings, looked at all balls in the air, and found the landing location and hang time of balls that dropped in for hits. With this he could do what I wanted to do: plot both hits and fielded balls in hang time/distance-between-fielder-and-ball space. With enough data points, one could assign a probability that the average fielder fields a ball based on these two values (Greg noted that another important value was the angle the player had to go to get the ball). Each fielder could then be assessed by how likely the average fielder would have been to make the plays he made or failed to make. Most agreed this would be more accurate than current zone-based methods, but it is still an open question whether this method would make fielding metrics converge any faster than current methods do.
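The approach Greg described can be sketched as a lookup table: bin every ball in the air by hang time and fielder distance, compute the league-wide catch rate in each bin, then credit each fielder with (caught minus expected) on his chances. The bin sizes below are hypothetical, the sketch leaves out the angle Greg mentioned (it could be added as a third bin dimension), and with a small sample most bins would be far too sparse to trust.

```python
from collections import defaultdict


def bin_key(hang_time_s, distance_ft, t_bin_s=0.5, d_bin_ft=10.0):
    """Bucket a ball in the air by hang time and fielder distance (hypothetical bin sizes)."""
    return int(hang_time_s // t_bin_s), int(distance_ft // d_bin_ft)


def catch_rates(all_chances):
    """League-wide catch rate per bin from (hang_time_s, distance_ft, caught) tuples."""
    caught = defaultdict(int)
    total = defaultdict(int)
    for hang, dist, was_caught in all_chances:
        key = bin_key(hang, dist)
        total[key] += 1
        caught[key] += int(was_caught)
    return {key: caught[key] / total[key] for key in total}


def plays_above_average(fielder_chances, rates):
    """Sum of (actual - expected) catches; chances in empty bins are skipped."""
    score = 0.0
    for hang, dist, was_caught in fielder_chances:
        key = bin_key(hang, dist)
        if key in rates:
            score += int(was_caught) - rates[key]
    return score


# Made-up chances: two similar balls, one caught and one not.
league = [(4.0, 30.0, True), (4.1, 32.0, False), (2.0, 80.0, False)]
rates = catch_rates(league)
print(plays_above_average([(4.2, 35.0, True)], rates))  # 0.5
```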

All presenters put a tremendous amount of work into their presentations, and this is just a small sample of each one. If you are interested in more, I suggest you download the slides and look them over. Also, if I misstated anything here, please note any corrections in the comments.

If you are looking for more recaps or liveblogs you can check out Colin's, Ben's, Rob's or Dan's.

I had a great time at the summit; it was lots of fun to see some of the other members of the PITCHf/x community. Thanks to Sportvision for putting on the conference and to Alan and Peter for helping to organize it.