2009 Projections with Hit Tracker
Oh, no, not another projection system! Why would someone want to join the logjam of current systems? In no particular order, we have ZiPS, CHONE, Oliver, Marcel, Bill James, PECOTA and no doubt some others I haven’t stumbled across (sorry). All of these systems are designed to tell us how MLB players will perform next season, but none of them can convincingly claim to be more accurate than all the rest. When I look at any particular player’s projections in the various systems, I see a lot of similarity, which makes me suspect there must be some degree of groupthink going on. I believe there is some potential to improve performance forecasting by doing something different.
In the following paragraphs, I will outline a system for forecasting using Hit Tracker, an aerodynamic model for flying baseballs that is well-known for providing accurate home run measurements. I can guarantee that the Hit Tracker system will be different. Better? I won’t be able to say for sure until the 2009 season is over.
Background: How We Forecast Now
Why is it so difficult to forecast a player’s performance accurately? One huge reason is that every one of the current projection systems starts from a set of data, the player’s prior "box score stats," that is positively riddled with statistical noise. Chief among the uncontrolled sources of that noise are the dramatic differences in ballpark configurations and playing conditions across the 2,430 games played in 30 different parks over the course of six months.
Let’s consider another familiar form of forecasting: weather. In the 19th century, after the invention of the telegraph, weathermen began to form their predictions by first learning the weather "upwind," and then adjusting those measurements to come up with a forecast. "How hot will it be tomorrow? Well, it was 85 degrees today in the state where our weather seems to be coming from, so we’ll start with 85 and then adjust it up or down according to our experience. It’s usually a little hotter there than it gets here, so let’s say 82 degrees…" They didn’t call them "city factors" back then, but they could have.
After computers became available in the mid-20th century, weathermen became meteorologists, and the process of forecasting weather has continued to become more involved and mathematical as the years have gone by. Contemporary meteorologists now monitor a much larger array of parameters, and they feed these lower-order parameters into elaborate computer-based models to arrive at predictions for higher-order outcomes like temperature, winds or precipitation. Thanks to more accurate measurements and more detailed models, weather forecasts are dramatically more accurate today than they were even 10 years ago.
In my opinion, baseball forecasting systems resemble the "19th century weatherman" system described above: to forecast something, measure something (well, in baseball we should say "count" something) that has happened already, then adjust this number to predict what hasn’t happened yet. So, to predict a player’s home runs, for example, the starting point is always his prior year’s total for home runs (or perhaps a weighted total from several seasons). From this starting point, various adjustments are applied to arrive at a final projection. Never mind where those home runs were hit, or how far they flew, or how much help or hindrance the weather may have provided them. Just count and adjust.
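To make the "count and adjust" pattern concrete, here is a minimal sketch. It is not taken from any of the systems named above; it assumes 5/4/3 season weights of the kind Marcel uses and a single hypothetical multiplicative park factor, and it leaves out the regression and aging steps that real systems add. Notice that nothing in it knows where a home run landed or how far it flew.

```python
def weighted_hr_rate(hr_by_season, ab_by_season, weights=(5, 4, 3)):
    """Weighted home runs per at-bat; seasons are listed most recent first."""
    hr = sum(w * h for w, h in zip(weights, hr_by_season))
    ab = sum(w * a for w, a in zip(weights, ab_by_season))
    return hr / ab


def count_and_adjust(hr_by_season, ab_by_season, projected_ab, park_factor=1.0):
    """Start from the past HR rate, scale to projected playing time, then adjust.

    park_factor is a hypothetical input (e.g. 1.05 for a park assumed to
    inflate home runs by 5%). Every past homer counts the same, whether it
    traveled 321 feet or 443 feet.
    """
    rate = weighted_hr_rate(hr_by_season, ab_by_season)
    return rate * projected_ab * park_factor


# Illustrative inputs only: 30, 25 and 20 HR in three 500-AB seasons.
print(round(count_and_adjust([30, 25, 20], [500, 500, 500], 500), 1))  # 25.8
```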
Starting from last year’s total assigns an equal value to what may in reality be very different events. For example, Jeremy Hermida hit two radically dissimilar fly balls last year, each of which cleared the home run fence: first, a windblown 321-foot homer in San Francisco on Aug. 20th, and second, a 443-foot rocket in Miami on July 19th. In a game context, they count the same, but when we are trying to measure the likelihood of future home runs, we should acknowledge that the outcome of one of those fly balls (the short one) was entirely dependent on its ballpark and weather context, while for the other fly ball, the ballpark and weather were irrelevant to the outcome. The short fly ball could only have become a home run in a park with a very shallow RF fence like AT&T Park, and only with the help of a tail wind. The long one would have been a homer in every park in which major league baseball has ever been played, in any wind short of a hurricane blowing towards home plate.
Any system that cannot recognize the difference between two events such as these Hermida home runs cannot hope to consistently generate highly accurate predictions. Don’t get me wrong: I don’t mean this as a criticism of anyone who has created a projection system. But I do believe that those systems have reached the limit of their capabilities, with average errors of around 60 to 70 points of OPS, and that any further refinement of these models will probably just chase the statistical noise around in circles.
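Assuming "average error" here means the mean absolute difference between projected and actual OPS, 60 to 70 points corresponds to a mean absolute error of .060 to .070. A minimal sketch of that calculation, using made-up numbers:

```python
def mean_abs_error_points(projected, actual):
    """Mean absolute OPS error, expressed in "points" (thousandths of OPS)."""
    if len(projected) != len(actual) or not projected:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(p - a) for p, a in zip(projected, actual))
    return 1000 * total / len(projected)


# Hypothetical projections vs. outcomes for two players.
print(round(mean_abs_error_points([0.800, 0.900], [0.860, 0.830]), 1))  # 65.0
```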
How can we get away from the practice of predicting future outcomes by using prior outcomes? I believe that the key is to consider the lower-level processes that lead to the final result of any particular batted ball. These include the landing point of the hit, how hard the ball was hit, and the physical environment in which it was hit. For those batted balls where the physical environment is crucial (i.e., long fly balls), we need to measure the trajectory of the ball, the fence dimensions of the park, and the weather. For the rest of the batted balls, where the physical environment has little effect on the final result, we don’t need to.
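The split described above, modeling only the batted balls whose fate depends on the environment, might look like the sketch below. The 300-foot cutoff and the BattedBall fields are hypothetical; the article does not say exactly how "long" is defined.

```python
from dataclasses import dataclass

# Hypothetical cutoff: fly balls at least this long are treated as "long" and
# sent to the park/weather model. The article does not give a real value.
LONG_FLY_MIN_FT = 300.0


@dataclass
class BattedBall:
    kind: str           # "ground", "line" or "fly"
    distance_ft: float  # estimated distance the ball carried
    outcome: str        # what actually happened, e.g. "out", "1B", "HR"


def needs_environment_model(ball: BattedBall) -> bool:
    """True when fence dimensions and weather could plausibly change the result."""
    return ball.kind == "fly" and ball.distance_ft >= LONG_FLY_MIN_FT


def split_batted_balls(balls):
    """Partition balls into (modeled, carried_over) lists."""
    modeled = [b for b in balls if needs_environment_model(b)]
    carried_over = [b for b in balls if not needs_environment_model(b)]
    return modeled, carried_over


balls = [
    BattedBall("ground", 90, "1B"),
    BattedBall("fly", 321, "HR"),
    BattedBall("fly", 250, "out"),
    BattedBall("line", 310, "2B"),
]
modeled, carried_over = split_batted_balls(balls)
print(len(modeled), len(carried_over))  # 1 3
```

Balls in the carried-over group keep their actual outcome; only the modeled group is re-evaluated park by park.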
In Hit Tracker, I have developed a method for analyzing the trajectory of long fly balls and projecting them into each of the 30 MLB ballparks for the purpose of generating a performance forecast. It is my hope that this system will yield more accurate performance forecasts.
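One simple way to picture "projecting a fly ball into each park" is to compare its distance against each park’s fence at the same spray angle. The sketch below does that for two hypothetical parks whose fence profiles are invented for illustration. It is a deliberate simplification, not the Hit Tracker model: it ignores fence height and weather, which a trajectory model is meant to account for, and it assumes a spray-angle convention of -45 degrees at the left-field line, 0 at dead center and +45 at the right-field line.

```python
from bisect import bisect_left

# Hypothetical fence profiles: (spray angle in degrees, fence distance in feet).
PARKS = {
    "ShortRightPark": [(-45, 335), (-20, 370), (0, 400), (20, 365), (45, 310)],
    "DeepPark": [(-45, 330), (-20, 385), (0, 410), (20, 385), (45, 330)],
}


def fence_distance(profile, angle):
    """Linearly interpolate the fence distance at a given spray angle."""
    angles = [a for a, _ in profile]
    angle = max(angles[0], min(angles[-1], angle))  # clamp to the foul lines
    i = bisect_left(angles, angle)
    if angles[i] == angle:
        return profile[i][1]
    (a0, d0), (a1, d1) = profile[i - 1], profile[i]
    return d0 + (d1 - d0) * (angle - a0) / (a1 - a0)


def parks_cleared(distance_ft, angle, parks=PARKS):
    """Names of parks whose fence the ball's distance exceeds (height ignored)."""
    return sorted(name for name, profile in parks.items()
                  if distance_ft > fence_distance(profile, angle))


print(parks_cleared(330, 40))  # ['ShortRightPark']
print(parks_cleared(443, 0))   # ['DeepPark', 'ShortRightPark']
```

A 330-foot drive near the right-field line clears only the short-porch park, while a 443-foot drive to center clears both, which mirrors the contrast between Hermida’s two homers.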
How It Works: Steps in the Hit Tracker Forecasting Method
Case Study: Manny Ramirez
To further illustrate the method, I am going to highlight some of the findings from the Hit Tracker analysis of Manny Ramirez over the years 2006-08, and his forecast for 2009.
First and foremost, I hope Manny Ramirez re-signs with the Los Angeles Dodgers for 2009, because Dodger Stadium is an absolutely perfect place for him to hit. I am not saying it is perfect for everyone; in fact, Dodger Stadium is a difficult home run park for average or below-average hitters, because its fences are deep in the corners, where lesser hitters typically place their home runs. I am saying that Dodger Stadium is perfect for Manny. Manny’s swing, particularly his phenomenal power to center and right-center field, is ideally suited for the dimensions and environmental conditions of Dodger Stadium. I described the unique layout of Dodger Stadium (deep corners, shallow alleys and center field) in detail in my article, "Hit Tracker 2008," which was published in the 2009 Hardball Times Annual earlier this off-season.
At the opposite extreme, Manny’s home from 2001 to the 2008 trade deadline, Fenway Park, has robbed him of a great number of home runs over the years, perhaps as many as 50, as well as many other extra-base hits. Fenway’s very deep right-center and right fields have turned many of Manny’s towering opposite-field drives into outs, and its 37-foot-high Green Monster has turned many of his blistering drives to left and left-center field into doubles (or even singles).
A popular image exists of the Green Monster adding lots of extra-base hits to a hitter’s total by turning shallow fly balls into wall-scraping doubles, but this hasn’t been the case for Manny: in the three seasons 2006-08, Manny hit only 6 doubles at Fenway that would have been outs at Dodger Stadium. Over the same period, Manny hit 23 flyouts, 5 doubles and 1 triple at Fenway that would have been home runs at Dodger Stadium.
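The Fenway-versus-Dodger Stadium comparison is essentially bookkeeping over outcome changes. A small sketch using the 2006-08 counts given above; the (actual result, reference-park result) pairing is an illustrative format, and only the changes listed in that paragraph are included.

```python
from collections import Counter

# (result at Fenway, result on an average day at Dodger Stadium) -> count, 2006-08
transitions = Counter({
    ("out", "HR"): 23,
    ("2B", "HR"): 5,
    ("3B", "HR"): 1,
    ("2B", "out"): 6,
})


def became(counter, result):
    """Balls whose reference-park result is `result` but whose actual result was not."""
    return sum(n for (actual, ref), n in counter.items()
               if ref == result and actual != result)


def stopped_being(counter, result):
    """Balls whose actual result was `result` but whose reference-park result is not."""
    return sum(n for (actual, ref), n in counter.items()
               if actual == result and ref != result)


print(became(transitions, "HR"))         # 29 extra homers at Dodger Stadium
print(stopped_being(transitions, "2B"))  # 11 Fenway doubles that change
```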
In the first 4 months of 2008, Manny had a particularly bad run of luck with his deep fly balls: although he hit 20 home runs during that span, he could have hit a lot more. Here is a list of Manny’s deep fly balls for the Boston Red Sox in 2008 that were not actually home runs, but which would have been home runs on an average day at Dodger Stadium. Where the weather significantly hurt a fly ball, that is noted as well:
Now, to be fair, we have to look at the good luck Manny encountered during that same time frame. Here’s the list of Manny’s deep fly balls for the Boston Red Sox in 2008 that were actually home runs, but which would not have been home runs on an average day at Dodger Stadium (there are 4):
That’s a net of 15 balls hit by Manny in the first 4 months of 2008 that had the power to fly out of Dodger Stadium but didn’t make it out of the parks where he actually hit them. In the video of these hits, Manny’s disbelief and disgust are apparent after several of his blasts came up short because of deep fences, cold or windy weather, or a combination of the two. Once he was traded to LA, those balls started making it out at a much higher rate: Manny connected for 9 home runs in only 80 at-bats at Dodger Stadium in 2008.
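The "net of 15" is gained homers minus lost homers. Since the text gives the net (15) and the number lost (4), the implied number gained is 15 + 4 = 19. A minimal sketch of the tally, where each fly ball is represented by a pair of hypothetical flags (homer where it was hit, homer on an average day at Dodger Stadium):

```python
def net_homer_change(balls):
    """balls: iterable of (hr_where_hit, hr_at_reference_park) boolean pairs.

    Returns (gained, lost, net). Balls that are homers in both places, or in
    neither, do not affect the net.
    """
    gained = sum(1 for actual, ref in balls if ref and not actual)
    lost = sum(1 for actual, ref in balls if actual and not ref)
    return gained, lost, gained - lost


# 19 would-be homers and 4 lucky ones, matching the counts implied above;
# the 3 "homer either way" balls are arbitrary filler.
balls = [(False, True)] * 19 + [(True, False)] * 4 + [(True, True)] * 3
print(net_homer_change(balls))  # (19, 4, 15)
```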
Forecast: Manny Ramirez 2009
Manny’s forecast for 2009 is based on analysis of all 248 long fly balls he hit during the 2006, 2007 and 2008 seasons. In 143 games in 2009, Manny should continue to perform extremely well in a Dodger uniform: the Hit Tracker forecast projects him to post the following numbers:
Los Angeles Dodgers: .430 OBP, .641 SLG, 1.071 OPS and 36 home runs (including 21 at Dodger Stadium).
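The exact procedure for turning long fly balls into park-specific totals like these is not shown here. The sketch below is one plausible approach under stated assumptions, not the Hit Tracker method itself: each long fly ball records whether it would clear the fence in each park, half of a player’s games are at home, road games are spread evenly over the other parks, and totals scale linearly with games played.

```python
def projected_homers(fly_balls, home_park, games_observed, games_projected,
                     home_share=0.5):
    """Expected (home, road) home runs from a sample of long fly balls.

    fly_balls: list of dicts mapping park name -> True if the ball would clear
    that park's fence. Each ball is treated as hit at home with probability
    home_share, otherwise in a uniformly chosen other park (an assumption).
    """
    home_hr = 0.0
    road_hr = 0.0
    for clears in fly_balls:
        road_parks = [p for p in clears if p != home_park]
        home_hr += home_share * clears[home_park]
        road_hr += (1 - home_share) * sum(clears[p] for p in road_parks) / len(road_parks)
    scale = games_projected / games_observed
    return home_hr * scale, road_hr * scale


# Three toy parks and three fly balls, purely illustrative.
sample = [
    {"A": True, "B": False, "C": True},
    {"A": True, "B": True, "C": True},
    {"A": False, "B": False, "C": False},
]
print(projected_homers(sample, "A", 2, 2))  # (1.0, 0.75)
print(projected_homers(sample, "B", 2, 2))  # (0.5, 1.0)
```

The same fly balls produce different totals depending on the home park, which is the effect the team-by-team forecasts are meant to capture.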
As of the posting of this article, Manny is still a free agent, so here are forecasts for some other teams Manny might sign with:
San Francisco: .428 OBP, .618 SLG, 1.047 OPS, 32 home runs.
NY Mets: .417 OBP, .566 SLG, .983 OPS, 26 home runs.
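OPS is simply OBP plus SLG, so each line above can be checked by addition. Because every published figure is rounded to three decimals, a recomputed sum can differ from the listed OPS by up to about .0015. For example, .428 + .618 = 1.046, one point below the listed 1.047 for San Francisco, which is consistent with rounding. A quick check:

```python
forecasts = {
    "Los Angeles Dodgers": (0.430, 0.641, 1.071),
    "San Francisco": (0.428, 0.618, 1.047),
    "NY Mets": (0.417, 0.566, 0.983),
}


def ops_consistent(obp, slg, ops, tol=0.0015):
    """True if OBP + SLG matches the listed OPS within rounding error.

    Each of the three figures is rounded to .001, so each can be off by up
    to .0005; in the worst case the errors add up to .0015.
    """
    return abs((obp + slg) - ops) <= tol


for team, (obp, slg, ops) in forecasts.items():
    print(team, f"{obp + slg:.3f}", ops_consistent(obp, slg, ops))
```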
Here are the Hit Tracker forecasts for several other MLB players. Some of the projections are based on three years of data (2006-08), while some are based only on one year of data (2008). The three-year forecasts are expected to be more accurate.
Forecasts Based on 2006-08 Data
Jason Bay, Boston Red Sox
Adam Dunn, free agent
Forecasts Based on 2008 Data Only
Mark Teixeira, New York Yankees
Matt Holliday, Oakland Athletics
Nate McLouth, Pittsburgh Pirates
In an attempt to validate the Hit Tracker forecasting method, I analyzed the 2007 long fly balls of three players who changed teams during the 2007-08 off-season: Torii Hunter, Aaron Rowand and Jim Edmonds. Using this data, I projected their 2008 results with their new teams and compared those projections with their actual 2008 performances.
HT Projection as Los Angeles Angel: .325 OBP, .485 SLG, .810 OPS, 25 HR
Slightly off on the home runs, but overall a very good projection.
HT Projection as San Francisco Giant: .373 OBP, .507 SLG, .880 OPS, 25 HR
This is terrible, but there is an explanation: on June 6th, Rowand sustained a right quadriceps injury that hindered him the rest of the year. His actual production splits are as follows:
Through June 6th: .396 OBP, .526 SLG, .922 OPS, 23 HR (pro-rated for a full year)
The HT projection matched the pre-injury Rowand reasonably well, considering the small sample size of about 1/3 of a season. Since the forecast was based on a relatively injury-free 2007 season, this is a fair comparison to make, I think. By the way, if anyone ever comes up with a way to predict the performance of a player who plays hurt through the final 96 of his 152 games, do me a favor: a) tell me what the stock market is going to do in the next year, b) wait a couple days, c) tell the world. In a year, I’ll be rich, and you’ll be famous!
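Pro-rating is a linear scaling of a partial-season count to a full season. The text does not say which full-season length was used for Rowand’s 23, so the function below takes it as a parameter with a default of 162 (an assumption). The one figure taken from the text is the healthy span: 152 minus 96 leaves 56 games, about 35% of a 162-game schedule.

```python
def prorate(count, games_played, full_season_games=162):
    """Scale a counting stat linearly from games_played to a full season."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return count * full_season_games / games_played


healthy_games = 152 - 96  # games before Rowand began playing hurt, per the text
print(healthy_games, round(healthy_games / 162, 2))  # 56 0.35
print(prorate(10, 50, 150))  # 30.0, illustrative numbers only
```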
HT Projection as SD/CHC: .346 OBP, .488 SLG, .834 OPS, 18 HR
This is another good projection. Edmonds hit a lot of deep fly balls to left-center field in 2007 that were caught in his home park, Busch Stadium. That tendency carried over to the following season, but it didn’t help him in San Diego, where he started the year. However, after a May trade to the Cubs, Edmonds found a place where that swing worked well. Left-center field is the most favorable spot in Wrigley Field for home runs, and Edmonds took advantage, hitting 6 of his 11 Wrigley home runs into the bleachers in front of Waveland Ave. On the road he also picked his spots well, hitting 7 of his 9 away homers to left and left-center field. A projection that either didn’t factor in Edmonds’ home park, or couldn’t discern his tendency to hit the other way with power, would be at a disadvantage when trying to accurately forecast Jim Edmonds.
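Capturing an opposite-field tendency like Edmonds’ requires tracking where each long fly ball goes, not just whether it left the park. A minimal sketch of binning spray angles into five field zones, assuming a hypothetical convention of -45 degrees at the left-field line, 0 at dead center and +45 at the right-field line, with equal 18-degree zones (real zone boundaries would likely differ):

```python
from collections import Counter

# Hypothetical convention: spray angle in degrees, -45 = left-field line,
# 0 = straightaway center, +45 = right-field line; five equal 18-degree zones.
ZONES = ["LF", "LC", "CF", "RC", "RF"]


def zone(angle):
    """Map a spray angle between the foul lines to a field zone."""
    if not -45 <= angle <= 45:
        raise ValueError("angle must lie between the foul lines")
    index = min(int((angle + 45) // 18), len(ZONES) - 1)  # +45 itself goes to RF
    return ZONES[index]


def zone_counts(angles):
    """Count long fly balls per field zone."""
    return Counter(zone(a) for a in angles)


counts = zone_counts([-40, -20, -30, 10])  # illustrative angles only
left_share = (counts["LF"] + counts["LC"]) / sum(counts.values())
print(left_share)  # 0.75
```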
More Thoughts About Forecasting
Here are some possible adjustments I considered but decided not to include in the Hit Tracker system:
Regressing a player’s batting average on balls in play (BABIP) towards the league average is a common tactic in projection systems. Instead of leaving all the non-long fly balls alone, I considered adjusting these hits according to the hitter’s BABIP, e.g., taking away an appropriate number of hits from the projection if the player showed an unusually favorable BABIP during the prior season(s).
My objection to this method is that I can’t be certain that a player’s unusually high (or low) BABIP was due to luck rather than to some real underlying factor. I don’t want to assume that a player’s BABIP should be a certain value and regress back towards that value, because I’m not confident that I can pinpoint what that value should be for each individual player. I definitely don’t want to regress all hitters towards a common BABIP. In any event, using three years of data to generate projections should minimize the chance that a wildly aberrant BABIP ruins a player’s projection.
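For reference, the adjustment set aside above could be sketched as follows. BABIP is conventionally computed as (H - HR) / (AB - K - HR + SF). The regression step adds a hypothetical number of balls in play at a hypothetical target BABIP; choosing that target for an individual player is exactly the step the paragraph above argues cannot be done with confidence.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)


def regress_babip(player_babip, balls_in_play, target_babip, ballast_bip):
    """Shrink a BABIP toward target_babip by adding ballast_bip balls in play
    at the target rate. Both target_babip and ballast_bip are hypothetical."""
    hits = player_babip * balls_in_play + target_babip * ballast_bip
    return hits / (balls_in_play + ballast_bip)


# Illustrative season line: 150 H, 20 HR, 500 AB, 100 K, 5 SF.
print(round(babip(150, 20, 500, 100, 5), 3))                # 0.338
print(round(regress_babip(0.350, 400, 0.300, 400), 3))      # 0.325
```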
Adjusting a projection for a player’s age is another common tactic, and it has some merit when the objective is to be correct "on average" for a large group of players. However, I am uncomfortable applying an aging correction factor "across the board," without any regard for a player’s particular situation. Perhaps on average hitters lose a small amount of their power each year, but I can’t say which hitters that is true for and which it isn’t, so I have chosen to leave out an aging factor.
I freely admit that an ideal forecasting system of the future will include some method for predicting the effects of aging on future performance, and that I am leaving it out. In the future I hope to be able to incorporate predictive aging into the HT model in terms of lower-level parameters such as speed off bat, or the direction of hits, rather than a crude adjustment of the final results. Such changes in hitters’ spray patterns can readily be detected (a good example is Jim Edmonds, whose long fly balls have decreased in distance and shifted from RF towards LF over the past several seasons).
Modeling aging in this more detailed manner should also allow for situations where a decline in raw hitting performance does not manifest in a decline in results, such as a power hitter who loses a bit of distance on his fly balls, but still clears the fence with room to spare. I don’t want to paint that hitter, or any hitter for that matter, with the broad brush of "aging means the numbers get smaller"…
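The distinction drawn above, between losing distance and losing results, can be illustrated by applying the same hypothetical 10-foot loss of carry to two sets of fly balls that differ only in how much room they had to spare. This is a toy model, not a proposed aging curve.

```python
def homers_after_distance_loss(margins_ft, loss_ft):
    """Count fly balls that still clear the fence after losing loss_ft of carry.

    margins_ft: for each fly ball, how far past (positive) or short of
    (negative) the fence it would have landed. Applying the same loss to every
    ball is a hypothetical simplification of aging.
    """
    return sum(1 for m in margins_ft if m - loss_ft > 0)


slugger = [40, 35, 50, 30]  # illustrative margins, in feet
marginal = [5, 8, 12, 3]
print(homers_after_distance_loss(slugger, 10))   # 4
print(homers_after_distance_loss(marginal, 10))  # 1
```

The slugger keeps all four homers, while the hitter whose drives barely cleared keeps one, even though both lost the same distance.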
Overall "Regression to the Mean"
Some systems regress all of a player’s box score stats towards a selected value, typically a mean value for a subset of the population such as the AL, the NL or all of MLB. The purpose is to account for the possibility that, because of limited sample size, a player has fortuitously outperformed or underperformed his true talent level. League mean values are used because a player’s true talent level is considered impossible to pinpoint accurately.
It is certainly true that in any large sample of players, there will be some players who significantly outperform their true talent, some who significantly underperform, and some who perform roughly at their true talent level. In a system where box score outcomes are the only form of data, it makes sense to regress the outcomes to the mean: even though such a system might make some strange predictions (a career-high 3 homers in 2009 for Juan Pierre, who has hit one ball out of the park in his last 1,097 at-bats?), overall it will perform better than it could without applying such regression.
However, the Hit Tracker system accounts for variation from true talent level in a different way: by including all long fly balls rather than just home runs, it removes the luck factor associated with ballparks and weather. Including multiple years of data makes the sample size even bigger, further decreasing the need to compensate with some form of regression to the mean. With these methods in use, I don’t feel it is appropriate to also add 75 or 80 at-bats from Gabe Gross to the reigning NL MVP’s 2008 numbers before trying to predict how Prince Albert will do next year.
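The "add some at-bats of an average player" idea amounts to a weighted average. In this sketch the 80 ballast at-bats echo the figure above, while the league home run rate of .030 per at-bat and the player’s 40 HR in 600 AB are hypothetical inputs chosen for illustration.

```python
def regress_to_mean(successes, trials, league_rate, ballast_trials):
    """Blend a player's rate with the league rate by adding ballast_trials
    league-average trials (e.g. at-bats) to his actual record."""
    return (successes + league_rate * ballast_trials) / (trials + ballast_trials)


raw = 40 / 600
regressed = regress_to_mean(40, 600, 0.030, 80)
print(round(raw, 4), round(regressed, 4))  # 0.0667 0.0624
```

With no ballast the function returns the raw rate; the more ballast is added, the further an elite hitter is pulled towards the league average.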
Advantages of the Hit Tracker System
Between now and the beginning of the 2009 season, I hope to post some more forecasts for other players, or perhaps expand some of the one-year forecasts listed above to three years. After the 2009 season we’ll have a chance to see how well this method did. I’m hoping that Hit Tracker will be able to bring the process of making projections forward to where weather forecasting was in the 1970s: occasionally way off, more often on the money, but still far short of perfection (which is forever out of reach). Then we’ll figure out what the next step is…
Greg Rybarczyk is the creator of Hit Tracker, an aerodynamic model and method for recreating the trajectory of batted baseballs. With Hit Tracker, Greg has analyzed more than 15,000 MLB home runs over the past 3 seasons; a multitude of data on hitters, pitchers, ballparks and more can be found at hittrackeronline.com. While not tracking hits, Greg works as a reliability engineer, and he lives in the Portland, OR area with his wife and two children. Feel free to contact Greg at firstname.lastname@example.org.