Thoughts on In Depth Baseball
I like baseball heat maps. Really like them. They have captured the heat map that is my heart. I feel I should get that out of the way before I provide my thoughts on In Depth Baseball, TruMedia's baseball analytics platform.
During the 2010 postseason, I became aware of a new baseball analytics blog that specialized in such heat mappery. Behind the blog was one Rafe Anderson. Anderson had been a Boston Red Sox employee for six years before moving to TruMedia Networks, where he holds the titles of President and CEO. Now, Anderson has, along with programmer Jeff Stern, developed an analytics platform being marketed to MLB teams. I've had the opportunity to speak with Anderson on a couple of occasions, and he was generous enough to offer me a demo of In Depth Baseball (IDB).
IDB enters the marketplace in the same year as Bloomberg Sports (BBG). As they are in direct competition, I thought it would be natural to start by comparing IDB to BBG. Admittedly, I have had little experience with BBG.
BBG has a far sleeker layout than IDB. Here, take a look at screenshots of leaderboards from BBG and IDB. But IDB prides itself on not being "flashy," a possible dig at BBG's Flash-based platform. Consequently, IDB runs much more smoothly than BBG, while potentially at the same time making more sophisticated computations.
Now we arrive at the heat maps, a department that sets IDB apart from any platform I've seen before. Let's say you want to see the best contact hitters in the league. You go to the leaderboard and sort by contact rate, just as you would do on FanGraphs or anywhere else. But meanwhile, you can see an adjacent heat map showing the league average contact rate by strike zone location. And then, if you want to break that down into splits, such as LHBs vs. LHPs, both the leaderboards and heat maps update instantaneously. Furthermore, the heat maps are interactive in that you can isolate zones you want to look at by dragging your mouse over a certain area. After that, you can see who the best player in the league is in that zone, click on his name, and be taken to his player page, where the chosen filters remain constant. Other heat maps that I'm aware of are created in R, and it would take, conservatively, over a minute to process that much data. But it's not like the R ones even look any better than IDB's. The explanation I've been given is that Stern custom-developed his own program, borrowing some fancy techniques that are used by chemical engineers. Well, it's great, whatever it is. You can find quality heat mapping using IDB here and here.
Where IDB's heat maps sometimes fail is with smaller samples. For example, check the in play slugging heat maps used here. It's impossible to tell whether the observed trends are anything more than noise. Anderson says that the heat maps consider statistical significance, but from my experience, I've found that determining the right smoothing parameters is often more art than science. I would rather have an over-smoothed heat map than an under-smoothed one, as a heat map that shows no trends will at least tell you the player's mean performance, whereas a heat map with too much noise can lead you to draw false conclusions. It might be a failing of the analyst more so than the system to draw conclusions from such heat maps, because when you're looking at individual players, you probably want to choose metrics that stabilize quickly, like contact rate, called strike rate, or pitch frequency. But for analysts who don't regularly work with this sort of data, it would help if the smoothing parameters were refined for metrics such as in play slugging, which will rarely have a large enough sample to be highly consistent for individual players.
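To make the over-smoothed versus under-smoothed tradeoff concrete, here is a minimal sketch in Python. It is not how IDB builds its heat maps (I don't know their method); it assumes a plain Gaussian kernel average, and the function names, the 5x5 grid, the bandwidth values, and the synthetic 60-pitch sample are all made up for illustration. In the fake sample, location has no real effect on the outcome, so any "hot spot" is pure noise.

```python
import math
import random

def smoothed_rate(pitches, x, z, bandwidth):
    """Gaussian-kernel weighted average of outcomes near location (x, z).

    pitches: list of (px, pz, outcome) tuples, locations in feet.
    bandwidth: the smoothing parameter, also in feet (hypothetical choice).
    """
    num = 0.0
    den = 0.0
    for px, pz, outcome in pitches:
        d2 = (px - x) ** 2 + (pz - z) ** 2
        w = math.exp(-d2 / (2.0 * bandwidth ** 2))
        num += w * outcome
        den += w
    return num / den if den > 0 else float("nan")

def heat_map(pitches, bandwidth, n=5):
    """Evaluate the smoothed rate on an n-by-n grid over a rough strike zone."""
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    zs = [1.5 + 2.0 * j / (n - 1) for j in range(n)]
    return [[smoothed_rate(pitches, x, z, bandwidth) for x in xs] for z in zs]

def spread(grid):
    """Gap between the hottest and coldest cell of a heat map."""
    cells = [c for row in grid for c in row]
    return max(cells) - min(cells)

# Synthetic small sample: total bases on 60 balls in play, assigned at
# random, so pitch location truly does not matter here.
rng = random.Random(0)
sample = [
    (rng.uniform(-1.0, 1.0), rng.uniform(1.5, 3.5), rng.choice([0, 0, 0, 1, 2, 4]))
    for _ in range(60)
]
mean_outcome = sum(o for _, _, o in sample) / len(sample)

noisy = heat_map(sample, bandwidth=0.15)  # under-smoothed
flat = heat_map(sample, bandwidth=50.0)   # heavily over-smoothed

print("overall mean:", round(mean_outcome, 3))
print("under-smoothed spread:", round(spread(noisy), 3))
print("over-smoothed spread:", round(spread(flat), 3))
```

With the tiny bandwidth, each cell is driven by a couple of nearby balls in play, so the map shows hot and cold zones that don't exist. With the huge bandwidth, every cell collapses toward the overall mean, which is boring but not misleading.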
While the heat map is the bread and butter of In Depth Baseball, I feel that the most important part of any database system is how well it integrates video. Just as you can click on a player's splits to view different heat maps or spray charts instantaneously, his pitch-by-pitch log also updates. I don't think I can overstate how strongly I feel that every team should be using something more sophisticated than BATS to view video, and IDB obviously qualifies as a solution. The problem is that the pitches aren't directly linked to video streams, and instead, one must add selected pitches to a queue before watching them. If you want, you can pull up video of all Ryan Howard vs. LHP off-speed pitches in the last two years, but it would take a lot of clicks. I think it would make more sense if every video from the pitch log started on the queue, and then if you wanted to filter from there by using the splits section, videos would subsequently be removed.
I was highly impressed by the video quality, an area where IDB truly is "flashy." The Flash Player allows one to use slow motion, go frame by frame, or even change camera angles if multiple ones are available. I'm sure the playlists can be exported easily to hard drives if scouts don't want to come up with them on their own.
Bloomberg Sports holds an agreement with MLBAM, but IDB is fully independent outside of its team partnerships. Therefore, IDB has no license to video, and must borrow from teams. IDB has been able to work around this, as one thing Anderson stresses is that they use an Open API. You might be able to infer what that means, but from the TruMedia site, "This enables our partners to seamlessly integrate MLB analytics with relevant pitch by pitch video play lists within their own customizable user interface. Most importantly it allows organizations to keep their algorithms and metrics confidential." IDB has tools to incorporate HITf/x data or any other advanced data.
IDB already has an impressive advisory board, which gives them saber cred. I wouldn't be surprised if the fine folks at Complete Game Consulting have already played a hand in developing some of IDB's more advanced metrics. They have incorporated the "paint" set of metrics I believe to have been invented by Dan Brooks. IDB features "expected" values, too, and although I'm not quite sure how these are calculated, any metric with the word expected before it grabs my attention.
Another big thing is their "PVZ" and "PVX" values, which measure angular velocity at the plate. They sound like something Matt Lentzner and Mike Fast discussed at this year's PITCHf/x summit, and if I understand PVX and PVZ correctly, they could be the future way we measure movement (from the batter's point of view as opposed to the ball's). In addition, there are PVX vs. PVZ heat maps, so you can break down players by pitch movement the same way as by pitch location.
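I don't know how IDB actually computes these values, so take the following only as a sketch of the general idea of angular velocity from a viewer's point of view. It uses the textbook formula: if d is the vector from the eye to the ball and v is the ball's velocity, the angular velocity is (d x v) / |d|^2. The function name, the eye position, and the coordinate convention (x across the plate, y toward the pitcher, z up, in feet) are my own assumptions, not anything from TruMedia.

```python
import math

def angular_velocity(eye, ball_pos, ball_vel):
    """Angular velocity (rad/s) of the ball as seen from `eye`.

    eye, ball_pos: (x, y, z) positions in feet.
    ball_vel: (vx, vy, vz) velocity in feet per second.
    Returns the vector (d x v) / |d|^2, where d = ball_pos - eye.
    This is a generic physics formula, not IDB's definition of PVX/PVZ.
    """
    d = tuple(b - e for b, e in zip(ball_pos, eye))
    dist2 = sum(c * c for c in d)
    if dist2 == 0:
        raise ValueError("ball is at the eye position")
    cross = (
        d[1] * ball_vel[2] - d[2] * ball_vel[1],
        d[2] * ball_vel[0] - d[0] * ball_vel[2],
        d[0] * ball_vel[1] - d[1] * ball_vel[0],
    )
    return tuple(c / dist2 for c in cross)

# Hypothetical eye position, with the ball 10 feet straight out in front.
eye = (0.0, 0.0, 3.0)

# A ball coming straight at the eye does not appear to move at all.
straight_on = angular_velocity(eye, (0.0, 10.0, 3.0), (0.0, -120.0, 0.0))

# The same ball sliding sideways at 5 ft/s sweeps across the field of view.
sideways = angular_velocity(eye, (0.0, 10.0, 3.0), (5.0, 0.0, 0.0))
sideways_rate = math.sqrt(sum(c * c for c in sideways))

print(straight_on, sideways, sideways_rate)
```

The point of the example is the batter's-eye framing: the same break in inches looks very different depending on how far away the ball is and where it is headed relative to the viewer.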
Alongside player heat maps are standard spray charts. The spray charts unfortunately use Gameday data, showing where the ball was picked up as opposed to where it was hit. Though you can mouse over a single hit to see the pitch details and video of it, for some reason you can't isolate zones like you can with the heat maps. So I don't have the option of seeing video of all of a player's ground balls to the right side of the infield. It would make sense for IDB to add this feature.
There are other tools besides the league leaderboards and player dashboards, which contain the spray charts, heat maps, and video. One section which I didn't spend much time on is the "graphs" section, where you see a bunch of line graphs: a pitcher's fastball usage over the course of the season; a batter's contact rate by pitch velocity; a frequency distribution of a batter's ground ball angle. Pretty much any stat in line graph form. There's also a "comparisons" section, where you get an assortment of a player's heat maps side by side, such as how he does in different counts or by pitcher/batter handedness.
According to Anderson, umpire reports will be launched for the 2011 season, and they plan to venture into defense eventually as well.
While Bloomberg employs a team of programmers in research and development, Stern mostly by himself has created an incredibly powerful and efficient tool. Now, I've been wildly blown away by every database platform I've come across, but IDB certainly exceeds what is out there at all but a handful of MLB teams. What I could see making IDB so attractive to teams is that it is web based, and therefore available at all times. IDB looks fantastic on the iPad (I don't own one, so I guess everything I've seen on an iPad looks fantastic). Imagine watching a game in real time with an iPad in hand and taking one click to instantly update a set of heat maps based on a change in the count or batter. So far, according to the Sports Business Journal, IDB calls the Padres and one other undisclosed team their clients. I have little doubt that IDB will continue to expand into a number of front offices, and with the news that TruMedia will be collaborating with Sportvision to provide MLB clubs with a minor league analytics platform, I am confident that the product will be that much better come Opening Day. I just hope that by then I'll still have the chance to see what IDB has in store.