Designated Hitter February 05, 2009
2009 Projections with Hit Tracker

Oh, no, not another projection system! Why would someone want to join the logjam of current systems? In no particular order, we have ZiPS, CHONE, Oliver, Marcel, Bill James, PECOTA and no doubt some others I haven’t stumbled across (sorry). All of these systems are designed to tell us how MLB players will perform next season, but none of them can convincingly claim to be more accurate than all the rest. When I look at any particular player’s projections in the various systems, I see a lot of similarity, which makes me suspect there must be some degree of groupthink going on. I believe there is some potential to improve performance forecasting by doing something different.

In the following paragraphs, I will outline a system for forecasting using Hit Tracker, an aerodynamic model for flying baseballs that is well known for providing accurate home run measurements. I can guarantee that the Hit Tracker system will be different. Better? I won’t be able to say for sure until the 2009 season is over.

Background: How We Forecast Now

Why is it so difficult to forecast a player’s performance accurately? One huge reason is that every one of the current systems for performance projection starts from the same set of data, the player’s prior year’s "box score stats," and that data is positively riddled with statistical noise. Chief among the uncontrolled noise factors are the dramatic differences in ballpark configurations and playing conditions across the 2,430 games played in 30 different parks over the course of six months.

Let’s consider another familiar form of forecasting: weather. In the 19th century, after the invention of the telegraph, weathermen began to form their predictions by first learning the weather "upwind," and then adjusting those measurements to come up with a forecast. "How hot will it be tomorrow? Well, it was 85 degrees today in the state where our weather seems to be coming from, so we’ll start with 85 and then adjust it up or down according to our experience. It’s usually a little hotter there than it gets here, so let’s say 82 degrees…" They didn’t call them "city factors" back then, but they could have.

After computers became available in the mid-20th century, weathermen became meteorologists, and the process of forecasting weather has continued to become more involved and mathematical as the years have gone by. Contemporary meteorologists now monitor a much larger array of parameters, and they feed these lower-order parameters into elaborate computer-based models to arrive at predictions for the higher-order outcomes like temperature, or winds, or precipitation. Thanks to more accurate measurements, and more detailed models, weather forecasts are dramatically more accurate today than those of even 10 years ago.

In my opinion, baseball forecasting systems resemble the "19th century weatherman" system described above: to forecast something, measure something (well, in baseball we should say "count" something) that has happened already, then adjust this number to predict what hasn’t happened yet. So, to predict a player’s home runs, for example, the starting point is always his prior year’s total for home runs (or perhaps a weighted total from several seasons). From this starting point, various adjustments are applied to arrive at a final projection. Never mind where those home runs were hit, or how far they flew, or how much help or hindrance the weather may have provided them. Just count and adjust.

Starting from last year’s total assigns an equal value to what may in reality be very different events. For example, Jeremy Hermida hit two radically dissimilar fly balls last year, each of which cleared the home run fence: first, a windblown 321-foot homer in San Francisco on Aug. 20th, and second, a 443-foot rocket in Miami on July 19th. In a game context, they count the same, but when we are trying to measure the likelihood of future home runs, we should acknowledge that the outcome of one of those fly balls (the short one) was entirely dependent on its ballpark and weather context, while for the other fly ball, the ballpark and weather were irrelevant to the outcome. The short fly ball could only have become a home run in a park with a very shallow RF fence like AT&T Park, and only with the help of a tail wind. The long one would have been a homer in every park major league baseball has ever been played in, in any wind short of a hurricane blowing towards home plate.

Any system that cannot recognize the difference between two events such as these Hermida home runs cannot hope to consistently generate highly accurate predictions. I don’t mean this as a criticism of anyone who has created a projection system, don’t get me wrong. But I do believe that those systems have reached the limit of their capabilities, with average errors of around 60-70 points of OPS, and any further refinement of these models will probably just chase the statistical noise around in circles.

Something Different

How can we get away from the practice of predicting future outcomes by using prior outcomes? I believe that the key is to consider the lower-level processes that lead to the final result of any particular batted ball. Some of these are the landing point of the hit, how hard the ball was hit, and the physical environment that the ball was hit in. For those batted balls where the physical environment is crucial (i.e. long fly balls), we need to measure the trajectory of the ball, the fence dimensions of the park, and the weather. For the rest of the batted balls, where the physical environment isn’t very important to the final result, we don’t need to.

In Hit Tracker, I have developed a method for analyzing the trajectory of long fly balls and projecting them into each of the 30 MLB ballparks for the purpose of generating a performance forecast. It is my hope that this system will yield more accurate performance forecasts.
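To make the idea of "analyzing the trajectory" concrete, here is a deliberately simplified sketch of a fly ball simulation. It is not the Hit Tracker model: it ignores spin and lift, works in two dimensions, and uses an assumed constant drag coefficient (`drag_k`) and contact height. The function name and all parameter values are illustrative assumptions.

```python
import math

def fly_ball_distance(speed_mph, launch_deg, tailwind_mph=0.0,
                      drag_k=0.0018, contact_height_ft=3.0, dt=0.001):
    """Horizontal distance (ft) travelled by a toy fly ball.

    Assumptions (not from Hit Tracker): quadratic air drag with a constant
    coefficient drag_k (1/ft), no spin or lift, wind blowing straight out
    (positive) or straight in (negative), flat field.
    """
    g = 32.174                       # gravity, ft/s^2
    mph_to_fps = 5280.0 / 3600.0
    speed = speed_mph * mph_to_fps
    wind = tailwind_mph * mph_to_fps
    theta = math.radians(launch_deg)
    vx, vy = speed * math.cos(theta), speed * math.sin(theta)
    x, y = 0.0, contact_height_ft
    while y > 0.0:
        # Drag acts on the ball's velocity relative to the moving air.
        rel_x, rel_y = vx - wind, vy
        rel_speed = math.hypot(rel_x, rel_y)
        vx += -drag_k * rel_speed * rel_x * dt
        vy += (-g - drag_k * rel_speed * rel_y) * dt
        x += vx * dt
        y += vy * dt
    return x

# The same swing in three different wind conditions.
for wind in (-10, 0, 10):
    print(wind, round(fly_ball_distance(105, 28, tailwind_mph=wind)))
```

Even in this crude form, the same launch speed and angle produce different distances depending on the wind, which is the effect a box score cannot see.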

How It Works: Steps in the Hit Tracker Forecasting Method

• Observe all long fly balls hit by a player in the past 1-3 years.
• A long fly ball is defined as any ball the player hit that might have approached or cleared the fence, if hit in any of the 30 MLB ballparks in any reasonable weather conditions.
• This very liberal standard is applied to ensure that all the long fly balls are captured. Having a few not-so-deep flies in the data set won’t cause any problems, because if a particular ball turns out to be a flyout in every park, this is equivalent to not including that ball in the analysis.
• Analyze each long fly ball in its actual weather conditions, to determine its launch characteristics (Speed Off Bat, Horizontal and Vertical Launch Angles, Spin).
• Note each long fly ball’s original result (2B, 3B, HR, Flyout, etc.).
• Project each long fly ball into each of the 30 MLB ballparks, in the average weather conditions for that ballpark (calculated over a 5-year period).
• Note the hypothetical result of each projected fly ball in each ballpark.
• Balls that fly far enough to clear the fence are judged to be home runs.
• Balls that hit the fence more than 8 feet above field level are judged to be extra base hits.
• All other balls are considered to be "catchable," and are analyzed further using a range model.
• The range model uses standard assumed initial positions of outfielders, a distance vs. time model for an average outfielder, the actual landing point of the ball and the time of flight of the ball to determine if the ball would have been caught.
• An empirical method was used on approximately 1,000 actual fly balls to determine the 50/50 likelihood boundary between outs and hits, in terms of time and distance from the closest outfielder. This boundary is then used as the evaluation criterion for catchable balls: balls inside the range circle of any outfielder for a given time of flight are flyouts, and balls outside it are extra base hits.
• For each ballpark, count the net hits and bases for the long fly ball data set:
• For each ball that was originally a hit, but projected as an out, give -1 for hits and -X for bases (e.g. for a ball that was originally a short home run to RF in Yankee Stadium, but which projects to be caught in Fenway Park, give -1 hit and -4 bases.)
• For each ball that was originally an out, but projected as a hit, give +1 for hits and +X for bases (e.g. for a ball that was originally a flyout to LF at Yankee Stadium, but which projects to hit the Green Monster in Fenway Park more than 8 feet up, give +1 hit and (usually) +2 bases.)
• For each ball that was originally a hit, but which projects to be another sort of hit, give ± X bases (e.g. for a home run to RCF in Shea Stadium that projects to be an extra base hit in Citi Field, give -2 or -1 bases, depending on the speed of the runner, the location of the hit and the time of flight.)
• Apply the net adjustments to hits and bases for all the long flies to the player’s actual stats for the season in question. Calculate OBP/SLG with the adjustments. This becomes the player’s projection for that ballpark.
• For projections based on multiple years of long fly balls, apply appropriate weighting factors (e.g. 3-2-1) to the projections for each ballpark.
• Using the MLB schedule for the season of the projection, create a projection for the player as a member of each team by multiplying their performance averages in each ballpark by a weighting factor proportional to the number of games each team plays in each park.
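The classification and net-adjustment steps above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the actual Hit Tracker code: the range model is reduced to a straight-line run with an assumed reaction time and running speed, every extra-base hit is scored as a double, and all function names are hypothetical.

```python
import math

BASES = {"OUT": 0, "1B": 1, "2B": 2, "3B": 3, "HR": 4}

def in_range(landing_xy, fielder_xy, hang_time_s,
             reaction_s=0.5, run_speed_fps=25.0):
    """Toy range circle: can the fielder reach the landing point in time?
    reaction_s and run_speed_fps are assumed values, not fitted ones."""
    reach_ft = max(0.0, hang_time_s - reaction_s) * run_speed_fps
    return math.dist(landing_xy, fielder_xy) <= reach_ft

def classify(clears_fence, wall_hit_height_ft, catchable):
    """Apply the three rules from the list to one projected fly ball."""
    if clears_fence:
        return "HR"
    if wall_hit_height_ft is not None and wall_hit_height_ft > 8:
        return "2B"  # assumption: every extra-base hit counts as a double
    return "OUT" if catchable else "2B"

def net_adjustment(original, projected):
    """Return (hit change, base change) for one ball moved to another park."""
    hit_delta = int(BASES[projected] > 0) - int(BASES[original] > 0)
    base_delta = BASES[projected] - BASES[original]
    return hit_delta, base_delta

def adjusted_obp_slg(ab, hits, total_bases, bb, hbp, sf, fly_balls):
    """Apply net adjustments to a season line and recompute OBP and SLG.
    fly_balls is a list of (original_result, projected_result) pairs.
    Simplification: at-bats are left unchanged (sacrifice flies ignored)."""
    for original, projected in fly_balls:
        hit_delta, base_delta = net_adjustment(original, projected)
        hits += hit_delta
        total_bases += base_delta
    obp = (hits + bb + hbp) / (ab + bb + hbp + sf)
    slg = total_bases / ab
    return obp, slg

# The three worked examples from the list: HR -> out, out -> wall double,
# HR -> double.
print(net_adjustment("HR", "OUT"))   # (-1, -4)
print(net_adjustment("OUT", "2B"))   # (1, 2)
print(net_adjustment("HR", "2B"))    # (0, -2)
```

Running `adjusted_obp_slg` once per ballpark, with that park's projected results, gives the per-park OBP/SLG that the later weighting steps combine.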

Case Study: Manny Ramirez

To further illustrate the method, I am going to highlight some of the findings from the Hit Tracker Analysis of Manny Ramirez over the years 2006-08, and his forecast for 2009.

First and foremost, I hope Manny Ramirez re-signs with the Los Angeles Dodgers for 2009, because Dodger Stadium is an absolutely perfect place for him to hit. I am not saying it is perfect for everyone; in fact, Dodger Stadium is a difficult place to hit for average or below average hitters, because its fences are deep in the corners where lesser hitters typically place their home runs. I am saying that Dodger Stadium is perfect for Manny. Manny’s swing, particularly his phenomenal power to center and right-center field, is ideally suited for the dimensions and environmental conditions of Dodger Stadium. I described the unique layout of Dodger Stadium (deep corners, shallow alleys and center field) in detail in my article, "Hit Tracker 2008," which was published in the 2009 Hardball Times Annual earlier this off-season.

At the opposite extreme, Manny’s home from 2001 to the 2008 trade deadline, Fenway Park, has robbed him of a great number of home runs over the years, perhaps as many as 50, as well as many other extra-base hits. Fenway’s very deep right-center and right fields have turned many of Manny’s towering opposite field drives into outs, and its 37-foot high Green Monster has turned many of his blistering drives to left and left-center field into doubles (or even singles).

A popular image exists of the Green Monster adding lots of extra-base hits to a hitter’s total by turning shallow fly balls into wall-scraping doubles, but this hasn’t been the case for Manny: in the three seasons 2006-08, Manny only hit 6 doubles at Fenway that would have been outs at Dodger Stadium. Over the same period, Manny hit 23 flyouts, 5 doubles and 1 triple at Fenway that would have been home runs at Dodger Stadium.

In the first 4 months of 2008, Manny encountered a particularly bad run of luck with his deep fly balls; despite racking up 20 home runs during that time, Manny could have gotten a lot more. Here is a list of Manny’s deep fly balls for the Boston Red Sox in 2008 that were not actually home runs, but which would have been home runs on an average day in Dodger Stadium. Where the weather negatively impacted his fly ball to a significant degree, this is listed as well:

• April 2, 2008 at Oakland, 407 ft. flyout to deep CF, lost 11 ft. of distance from wind and temperature.
• April 5, 2008 at Toronto, 387 ft. double to LCF.
• April 8, 2008 at Boston, 395 ft. triple to RCF, lost 25 ft.
• April 11, 2008 at Boston, 361 ft. flyout to RF, lost 7 ft.
• April 17, 2008 at New York Yankees, 395 ft. flyout to CF, lost 3 ft.
• April 24, 2008 at Boston (7th inning), 383 ft. double to RCF.
• April 24, 2008 at Boston (9th inning), 402 ft. flyout to CF.
• May 5, 2008 at Detroit (2nd inning), 415 ft. double to RCF.
• May 5, 2008 at Detroit (3rd inning), 416 ft. flyout to CF.
• May 6, 2008 at Detroit, 404 ft. flyout to LCF.
• May 7, 2008 at Detroit, 402 ft. flyout to CF.
• May 18, 2008 at Boston, 368 ft. flyout to RF, lost 9 ft.
• May 19, 2008 at Boston, 386 ft. flyout to CF, lost 11 ft.
• May 23, 2008 at Oakland, 356 ft. flyout to RF, lost 6 ft.
• June 4, 2008 at Boston, 364 ft. flyout to RF, lost 18 ft.
• July 9, 2008 at Boston, 429 ft. double to LCF off top of Monster.
• July 19, 2008 at LA Angels, 378 ft. double off RF wall.
• July 27, 2008 at Boston, 410 ft. flyout to RCF triangle.
• July 30, 2008 at Boston, 367 ft. flyout to LCF, lost 11 ft.

Now, to be fair we have to look at the good luck Manny encountered during that same time frame. Here’s the list of Manny’s deep fly balls for the Boston Red Sox in 2008 that were actually home runs, but which would not have been home runs on an average day in Dodger Stadium (there are 4):

• May 12, 2008 at Minnesota, 354 ft. home run to RF.
• May 27, 2008 at Seattle, 361 ft. home run to RF.
• June 1, 2008 at Baltimore, 382 ft. home run to RF, got 23 ft. of help.
• July 8, 2008 at Boston, 384 ft. home run to LF, got 32 ft. of help.

That’s a net of 15 balls hit by Manny in the first 4 months of 2008 that had the power to fly out of Dodger Stadium, but which didn’t make it out where Manny actually hit them. The video of these hits shows the disbelief and disgust on Manny’s face after several of his blasts came up short due to deep fences, cold/windy weather or a combination of the two. Once he was traded to LA, those balls started making it out at a much higher rate: Manny connected for 9 home runs in only 80 at-bats in Dodger Stadium in 2008.
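As a quick check, the two lists above can be tallied directly. The counts by outcome are read off the lists; the base values follow the ±X convention from the method steps. The hits and bases are computed for the gained side only, because the lists do not say what the four lost home runs would have become in Dodger Stadium.

```python
BASES = {"flyout": 0, "double": 2, "triple": 3, "home run": 4}

# Balls that were not home runs but project as home runs in Dodger Stadium.
would_be_hr = {"flyout": 13, "double": 5, "triple": 1}
# Actual home runs that project to stay in the park in Dodger Stadium.
lost_hr = 4

gained_hr = sum(would_be_hr.values())
net_hr = gained_hr - lost_hr

# Each flyout that becomes a home run adds a hit; doubles and triples
# were already hits and only add bases.
gained_hits = would_be_hr["flyout"]
gained_bases = sum(n * (BASES["home run"] - BASES[kind])
                   for kind, n in would_be_hr.items())

print(gained_hr, net_hr)          # 19 15
print(gained_hits, gained_bases)  # 13 63
```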

Forecast: Manny Ramirez 2009

Manny’s forecast for 2009 is based on analysis of all 248 long fly balls he hit during the 2006, 2007 and 2008 seasons. In 143 games in 2009, Manny should continue to perform extremely well in a Dodger uniform: the Hit Tracker forecast projects him to post the following numbers:

Los Angeles Dodgers: .430 OBP, .641 SLG, 1.071 OPS and 36 home runs (including 21 at Dodger Stadium).

As of the posting of this article, Manny is still a free agent, so here are forecasts for some other teams Manny might sign with:

San Francisco: .428 OBP, .618 SLG, 1.047 OPS, 32 home runs.

NY Mets: .417 OBP, .566 SLG, .983 OPS, 26 home runs.

More Forecasts

Here are the Hit Tracker forecasts for several other MLB players. Some of the projections are based on three years of data (2006-08), while some are based only on one year of data (2008). The three-year forecasts are expected to be more accurate.

Forecasts Based on 2006-08 Data

Jason Bay, Boston Red Sox
Boston: .368 OBP, .501 SLG, .869 OPS, 27 HR’s

LA Dodgers: .394 OBP, .587 SLG, .981 OPS, 47 HR’s
Washington: .389 OBP, .555 SLG, .944 OPS, 43 HR’s
NY Mets: .382 OBP, .506 SLG, .888 OPS, 35 HR’s
Atlanta: .387 OBP, .543 SLG, .930 OPS, 41 HR’s
Boston: .392 OBP, .549 SLG, .941 OPS, 39 HR’s

Forecasts Based on 2008 Data Only

Mark Teixeira, New York Yankees
New York Yankees: .420 OBP, .588 SLG, 1.008 OPS, 32 HR’s

Matt Holliday, Oakland Athletics
Oakland: .418 OBP, .563 SLG, .981 OPS, 28 HR’s
San Francisco: .426 OBP, .593 SLG, 1.019 OPS, 32 HR’s
Boston: .420 OBP, .557 SLG, .977 OPS, 25 HR’s
New York Yankees: .422 OBP, .584 SLG, 1.006 OPS, 32 HR’s
New York Mets: .417 OBP, .546 SLG, .963 OPS, 24 HR’s

Nate McLouth, Pittsburgh Pirates
Pittsburgh: .348 OBP, .484 SLG, .833 OPS, 29 HR’s

Validation

In an attempt to validate the Hit Tracker forecasting method, I analyzed the 2007 long fly balls of three players who changed teams during the 2007-08 off-season: Torii Hunter, Aaron Rowand and Jim Edmonds. Using this data, I projected their 2008 results as members of the teams they ended up with, and compared those projections to their actual performances in 2008.

Torii Hunter

HT Projection as Los Angeles Angel: .325 OBP, .485 SLG, .810 OPS, 25 HR’s
Actual as Los Angeles Angel: .344 OBP, .466 SLG, .810 OPS, 21 HR’s

Slightly off on the home runs, but overall a very good projection.

Aaron Rowand

HT Projection as San Francisco Giant: .373 OBP, .507 SLG, .880 OPS, 25 HR’s
Actual as San Francisco Giant: .339 OBP, .410 SLG, .749 OPS, 13 HR’s

This is terrible, but there is an explanation: on June 6th, Rowand sustained a right quadriceps injury that hindered him the rest of the year. His actual production splits are as follows:

Through June 6th: .396 OBP, .526 SLG, .922 OPS, 23 HR’s (pro-rated for a full year)
After June 6th: .303 OBP, .338 SLG, .641 OPS, 9 HR’s (pro-rated for a full year)

The HT projection matched the pre-injury Rowand reasonably well, considering the small sample size of about 1/3 of a season. Since the forecast was based on a relatively injury-free 2007 season, this is a fair comparison to make, I think. By the way, if anyone ever comes up with a way to predict the performance of a player who plays hurt through the final 96 of his 152 games, do me a favor: a) tell me what the stock market is going to do in the next year, b) wait a couple days, c) tell the world. In a year, I’ll be rich, and you’ll be famous!

Jim Edmonds

HT Projection as SD/CHC: .346 OBP, .488 SLG, .834 OPS, 18 HR’s
Actual as SD/CHC: .343 OBP, .479 SLG, .822 OPS, 20 HR’s

This is another good projection. Edmonds hit a lot of deep fly balls to left-center field in 2007 that were caught in his home park, Busch Stadium. That tendency carried over to the following season, but it didn’t help him in San Diego, where he started the year. However, after a May trade to the Cubs, Edmonds found a place where that swing worked well. Left-center field is the most favorable spot in Wrigley Field for home runs, and Edmonds took advantage, hitting 6 of his 11 Wrigley home runs into the bleachers in front of Waveland Ave. On the road he picked his spots well also, hitting 7 of his 9 away homers to left and left-center field. A projection that either didn’t factor in Edmonds’ home park, or which couldn’t discern his tendency to hit the other way with power, would be at a disadvantage when trying to accurately forecast Jim Edmonds.
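For a rough sense of scale, the three validation cases can be scored in points of OPS, the same unit used earlier for the typical error of existing systems. This sketch uses only the projected and actual lines quoted above; three players is far too few to draw conclusions from, and the helper name is hypothetical.

```python
# (OBP, SLG) pairs taken from the validation section above.
validation = {
    "Torii Hunter": {"projected": (0.325, 0.485), "actual": (0.344, 0.466)},
    "Aaron Rowand": {"projected": (0.373, 0.507), "actual": (0.339, 0.410)},
    "Jim Edmonds":  {"projected": (0.346, 0.488), "actual": (0.343, 0.479)},
}

def ops_error_points(projected, actual):
    """Absolute OPS miss in points (1 point = .001 of OPS)."""
    return round(abs(sum(projected) - sum(actual)) * 1000)

errors = {name: ops_error_points(line["projected"], line["actual"])
          for name, line in validation.items()}
mean_error = sum(errors.values()) / len(errors)

for name, err in errors.items():
    print(f"{name}: {err} points")
print(f"mean: {mean_error:.1f} points")
```

Rowand's full-season line dominates the average; scoring the projection against his pre-injury split (.922 OPS) instead would cut that miss to about 42 points.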

Here are some possible adjustments I considered, but decided not to include in the Hit Tracker system:

Regressing a player’s numbers towards the league average BABIP is a common tactic in projection systems. Instead of leaving alone all the non-long fly balls, I considered trying to adjust these hits according to the hitter’s BABIP, e.g. taking away an appropriate number of hits from the projection if the player showed an unusually favorable BABIP during the prior season(s).

My objection to this method is that I don’t feel that I can be certain that a player’s unusually high (or low) BABIP was due to luck instead of due to some underlying real factor. I don’t want to assume that a player’s BABIP should be a certain value, and regress back towards that value, because I don’t feel confident enough that I can pinpoint what that value should be for each individual player. I definitely don’t want to regress all hitters towards a common BABIP. In any event, the use of three years of data to generate projections should minimize any possibility of a player’s wildly aberrant BABIP ruining his projection.
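For readers unfamiliar with the statistic, BABIP (batting average on balls in play) is conventionally computed as hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies. A minimal sketch follows, with an entirely hypothetical stat line:

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play, using the conventional formula
    (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play

# Hypothetical season line, for illustration only.
print(round(babip(hits=150, home_runs=30, at_bats=500,
                  strikeouts=100, sac_flies=5), 3))  # 0.32
```

Note that home runs are excluded from both the numerator and the denominator, so the long flies that clear the fence never enter BABIP at all.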

Adjusting a projection for a player’s age is another common tactic which has some merit when one’s objective is to be correct "on average," for a large group of players. However, I feel uncomfortable applying an aging correction factor "across the board," without any regard for a player’s particular situation. Perhaps on average hitters lose a small amount of their power each year, but I don’t feel like I can say for which hitters that is true, and for which hitters that is not true, so I have chosen to leave out an aging factor.

I freely admit that an ideal forecasting system of the future will include some method for predicting the effects of aging on future performance, and that I am leaving it out. In the future I hope to be able to incorporate predictive aging into the HT model in terms of lower-level parameters such as speed off bat, or the direction of hits, rather than a crude adjustment of the final results. Such changes in hitters’ spray patterns can readily be detected; a good example is Jim Edmonds, whose long fly balls have decreased in distance and shifted from RF towards LF for the past several seasons.

Modeling aging in this more detailed manner should also allow for situations where a decline in raw hitting performance does not manifest in a decline in results, such as a power hitter who loses a bit of distance on his fly balls, but still clears the fence with room to spare. I don’t want to paint that hitter, or any hitter for that matter, with the broad brush of "aging means the numbers get smaller"…

Overall "Regression to the Mean"

Some systems regress all of a player’s box score stats towards a selected value, typically a mean value for a subset of the population such as the AL, NL or all of MLB. The purpose of doing so is to account for the possibility that, due to limited sample size, a player has fortuitously outperformed or underperformed their true talent level. The league mean values are used because it is believed that it is impossible to accurately pinpoint a player’s true talent level.

It is certainly true that in any large sample of players, there will be some players that significantly outperform their true talent, some who significantly underperform, and some who perform roughly at their true talent level. In a system where box score outcomes are the only form of data, it makes sense to regress the outcomes to the mean: even though such a system might make some strange predictions (a career high 3 homers in 2009 for Juan Pierre, who has hit one ball out of the park in his last 1,097 at bats?), overall it will perform better than it could without applying such regression.

However, the Hit Tracker system accounts for variation from true talent level in a different way: by including all long flies instead of just homers, the luck factor for ballparks and weather is removed. By including multiple years of data, the sample size becomes even bigger, further decreasing the need to compensate via some form of regression to the mean. With these methods in use, I don’t feel it is appropriate to also add 75 or 80 at-bats from Gabe Gross to the reigning NL MVP’s numbers from 2008 before trying to predict how Prince Albert will do next year.
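The "add at-bats from a lesser hitter" image describes a common way regression to the mean is implemented: blend the player's observed rate with a fixed number of extra at-bats at a reference rate. The sketch below shows that generic technique, which Hit Tracker does not use; the function name and all numbers are hypothetical.

```python
def regress_rate(successes, trials, reference_rate, ballast_trials):
    """Regress an observed rate toward reference_rate by adding
    ballast_trials imaginary trials at the reference rate."""
    return ((successes + reference_rate * ballast_trials)
            / (trials + ballast_trials))

# Hypothetical hitter: 40 home runs in 500 at-bats, regressed with
# 80 at-bats of a .030 HR-per-AB reference hitter.
raw = 40 / 500
regressed = regress_rate(40, 500, reference_rate=0.030, ballast_trials=80)
print(round(raw, 4), round(regressed, 4))
```

The regressed rate always lands between the observed rate and the reference rate, and the pull shrinks as the real sample grows, which is why a larger sample of long fly balls reduces the need for it.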

Advantages of the Hit Tracker System

• The Hit Tracker system goes a long way towards removing statistical noise from the projection. Most good or bad luck a player may have had because they hit a particular ball in a large or small park, in favorable or unfavorable weather, will be removed.
• Analyzing all long fly balls increases the sample size for evaluating power potential, which is one of the most important variables in performance projection. This method makes it possible to detect unlucky trends (Adam Dunn hit 16 balls more than 400 feet that were not home runs in 2008), or lucky trends (9 of Mark DeRosa’s 21 homers in 2008 were blown over the fence by the wind.)
• Team-specific projections are created, but without the use of the extremely blunt instrument known as Park Factors. Because park-based projections are used, the fit of a player’s spray profile to a park’s dimensions and weather is included, and is crucial. The frequency of visits to other parks is also included, capturing the importance of the unbalanced schedule and the vagaries of the interleague schedule.
• Hit Tracker projections are based entirely on what a player does, rather than what an average player does. Since the HT method is focused on making an accurate projection for a single player (and not an entire league), it does not use across the board regression to the mean. Regression to the mean compensates for variables that are missing from a model: Hit Tracker measures those variables instead.
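The schedule weighting mentioned in the list above, together with the 3-2-1 multi-year weighting from the method steps, amounts to two weighted averages. The sketch below is an illustration under assumptions: park names, rates and game counts are invented, the heaviest weight is assumed to go to the most recent season, and rates are averaged by games rather than by at-bats for simplicity.

```python
def blend_seasons(values_recent_first, weights=(3, 2, 1)):
    """Weighted average of per-season values, most recent season first.
    Works with one, two or three seasons of data."""
    pairs = list(zip(values_recent_first, weights))
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)

def schedule_weighted(park_values, games_by_park):
    """Average a per-park projection over a team's schedule, weighting
    each park by the number of games the team plays there."""
    total_games = sum(games_by_park.values())
    return sum(park_values[park] * games
               for park, games in games_by_park.items()) / total_games

# Hypothetical slugging projections in three parks for one player.
slg_by_park = {"home": 0.650, "road_a": 0.600, "road_b": 0.550}
games = {"home": 81, "road_a": 41, "road_b": 40}

print(round(schedule_weighted(slg_by_park, games), 3))
print(round(blend_seasons([0.600, 0.570, 0.540]), 3))
```

Because the weights come from the actual schedule, moving the same player to a different team changes both his home park and the mix of road parks he visits.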

Disadvantages of the Hit Tracker System

• The HT method is time-consuming. The observation data required for this method is not for sale, and the analysis can only be done by me.
• The HT method requires video of all batted balls for the player in question. If any hits are not available, the accuracy of the forecast may be reduced proportionally to the percentage of missing balls.
• Because the method depends on analysis of long fly balls, there is a limited ability to evaluate rookies.

Between now and the beginning of the 2009 season, I hope to post some more forecasts for other players, or perhaps expand some of the one-year forecasts listed above to three years. After the 2009 season we’ll have a chance to see how well this method did. I’m hoping that Hit Tracker will be able to bring the process of making projections forward to where weather forecasting was in the 1970s: occasionally way off, more often on the money, but still far short of perfection (which is forever out of reach). Then we’ll figure out what the next step is…

Greg Rybarczyk is the creator of Hit Tracker, an aerodynamic model and method for recreating the trajectory of batted baseballs. With Hit Tracker, Greg has analyzed more than 15,000 MLB home runs over the past 3 seasons; a multitude of data on hitters, pitchers, ballparks and more can be found at hittrackeronline.com. While not tracking hits, Greg works as a reliability engineer, and he lives in the Portland, OR area with his wife and two children. Feel free to contact Greg at grybar@hittrackeronline.com.

Holy crap -- did this guy Greg just land on the moon?!? I feel like I'm staring at the Gates of Heaven of player projection. Surely the Dodgers will sign Manny if anyone in that front-office reads this article!

I would love to see Albert Pujols projection...but I think it will probably look like his normally excellent statistical line. As a Cards fan I would be most interested in Ryan Ludwick and Rick Ankiel since neither has a track record to go by...thus making them terrible candidates for the normal projection systems.

I am glad someone is trying a new direction, even if I am way underqualified to have any opinion on it. I hope lots of baseball scientists start examining your thoughtful work and expand upon it.

We've already seen lots of good writing here and there around the pitchfx stuff, so it seems like between the two, projection systems could be improved.

Obviously, personal observation is what drives this projection and as noted, is horribly time-consuming. Any time you can spend that much time over detailed information, hopefully the results would be much better.

This is great stuff but obviously only works for you since it is your data. Perhaps once you got the methodology down, you can hire a staff to help you do this on a broader basis. Or perhaps sell it to one of the baseball stats houses, I recall there was one starting up about 5-10 years ago to compete with STATS (in the Northwest somewhere), don't know if they survived or not, but they hired people to look at videos of every play and record the data into their database. Your methodology would be useful for a company that is looking at all videos anyhow, and if the projections are really that much more accurate, they would make a lot of money selling those projections to teams. Perhaps they would be willing to go 50/50 with you on selling such projections if your methodology and projections are good enough.

Very interesting stuff. I think that the future of projection systems lies in using more granular, unbiased metrics, and the HitTracker system certainly goes a long way in doing this.

As for adjusting for unusually high or low BABIP - I've been working on building a model to essentially strip luck out of the equation and arrive at a "luck-neutral" expected BABIP, or xBABIP. Check out the pieces on THT:

http://www.hardballtimes.com/main/article/batters-and-babip/

http://www.hardballtimes.com/main/fantasy/article/whats-the-best-babip-estimator/

Oh, and thanks for doing Aaron Rowand, I've been trying to figure out what caused his steep dropoff and you found out about his quadricep problem. Where did you find that information? I had been looking all over for information like that. Thanks.

i've wanted to do this for a while, but obviously hit f/x doesn't exist yet. this is pretty friggin' tremendous, especially if you're going to hook us up with a spreadsheet.

it also makes me wonder what the stability of xbh% is like...

Ken at February 5, 2009 8:01 AM -

On the other hand, using the same info, Adam Dunn and a couple decent pitchers for the same amount of money doesn't sound bad either.

Ken - Are you sure about the Dodgers signing Manny? If I were them, I'd see just how close Adam Dunn is to Manny for about 1/3 the total and go out and sign Dunn, Sheets, and Wolf tomorrow for about the same as Manny.

This looks good, but I wonder what can be done about installing a system similar to the one used in tennis for resolving player challenges. I find it amazing that they have a system that tracks the trajectory, angle, and spin of a ball to determine if the judge's call was correct.

A trailblazing team might be interested in installing a system that would be able to do the same thing in their home stadium. This would provide a competitive advantage by accessing your data for players in their stadium.

Sounds promising.

One more thing I like about this is it shows Dodger stadium is not a poor power park. A true power hitter will still succeed in Dodger stadium because the gaps and center are not deep. You just take out all the cheapies that the Juan Pierres of the world usually hit. That's why I'm glad the Dodgers never went out and got a player like Juan...wait. Nevermind.

Phenomenal work. I figured with all of the batted ball data being available, we'd eventually get a projection system that accounted more thoroughly for the types of contact players were making, but I never imagined anything like this. Beautiful work. This makes park factors look downright pedestrian. :-)

Wow. Talk about labor intensive. But definitely cool. Interesting that Fenway might've hurt Manny... it makes sense to me. He doesn't hit many cheapies. Fenway mostly helps the Bucky Dents of the world.

I like what I see in that Teixeira projection :)

I wonder how many front offices are doing exactly this when they evaluate FA's or possible trade targets.

Thanks for the kind words, guys.

Some replies:

Pujols projection: most likely will be pretty similar to his recent production, unless/until he changes teams or gets hurt. He's been pretty consistent, and has no holes in his swing to exploit, no quirks in what he does, etc.

Expanding coverage, working with a company. I have done some checking on this. Nothing happening yet with companies in terms of projections, the interest has been with individual players, but when 2009 is done, if these projections do significantly better than the rest of the existing systems, I will expect my phone to be ringing.

Installing a system in a park: works for tennis because the lines are few in number, easily monitored, and all the questionable calls are at the line. Baseball is harder because to track all balls, you have to monitor everywhere on a vastly larger playing field, and for homers you would have to monitor outside the field as well.

Chris - BABIP - Thanks, I will check that out. I do need to try to do some more research on BABIP broken out by long flies and everything else. I looked at around 900 balls, and the BABIP for long flies (that stayed in the park) was a bit higher than for other types of balls, but maybe not significantly so. To correct non long flies, I'd need to know xBABIP for non-long flies, which might not be equal to xBABIP...

Rowand injury: I didn't start by looking for injuries. I started with the projection, and I was surprised by how much he underperformed it. Then I checked baseball-reference.com for his game logs, saw that his hitting rates peaked around early June, and then checked the game logs for injuries. He left the game on June 6th, so I found some stories on it and that was that. Now you can Google Aaron Rowand injuries 2008 and get a decent list as well, but I came at it from the other direction at first...

Front offices: I can confirm that there are several front offices very interested in this, and I expect that interest to grow over time, especially when some of the projections that diverge from the consensus numbers prove to be true...

Any questions, drop me an email or comment here...

Greg

You are definitely off to a good start; however, to truly have a better projection system, I think you will need to include the BABIP and age adjustments to make the extra work you are doing justifiable.

As it stands, I don't see much difference between your target numbers and other systems (James numbers from the 2009 and 2008 Bill James Handbooks):

Bay for this year:

Jason Bay, Boston Red Sox

HitTracker: .368 OBP, .501 SLG, OPS .869, 27 HR

Bill James: .376 OBP, .505 SLG, OPS .881, 30 HR

Really pretty close; we have to wait and see who wins this little battle.

On to the 2008 projections:

Torii Hunter

HitTracker(LA): .325 OBP, .485 SLG, .810 OPS, 25 HR

Bill James(MN): .329 OBP, .482 SLG, .811 OPS, 27 HR

actual (LAAofA): .344 OBP, .466 SLG, .810 OPS, 21 HR

The James projection was for the wrong team, and the differences between the BJ formula and your mountains of legwork are negligible.

Aaron Rowand

HitTracker(SF): .373 OBP, .507 SLG, .880 OPS, 25 HR

BillJames(Phi): .347 OBP, .473 SLG, .820 OPS, 21 HR

Actual(Giants): .339 OBP, .410 SLG, .749 OPS, 13 HR

So here we have the wrong team again, yet the James system is closer than yours adjusted for one of the best hitting parks, which I suspect would get even closer across the board if you put the Legend in a Giants uni.
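For readers who want to check the comparison, here is a minimal sketch that recomputes the absolute OPS and HR misses for the two 2008 lines quoted above. The numbers are copied from this comment; the function and variable names are illustrative only and are not part of either projection system.

```python
# Projected and actual 2008 lines as quoted in the comment: (OBP, SLG, HR).
LINES = {
    "Torii Hunter": {
        "HitTracker": (0.325, 0.485, 25),
        "Bill James": (0.329, 0.482, 27),
        "Actual": (0.344, 0.466, 21),
    },
    "Aaron Rowand": {
        "HitTracker": (0.373, 0.507, 25),
        "Bill James": (0.347, 0.473, 21),
        "Actual": (0.339, 0.410, 13),
    },
}


def ops(obp, slg):
    """OPS is on-base percentage plus slugging percentage."""
    return obp + slg


def errors(player):
    """Return {system: (absolute OPS miss, absolute HR miss)} for one player."""
    actual_obp, actual_slg, actual_hr = LINES[player]["Actual"]
    actual_ops = ops(actual_obp, actual_slg)
    result = {}
    for system, (obp, slg, hr) in LINES[player].items():
        if system == "Actual":
            continue
        # Round to three places, the precision the stat lines are quoted at.
        ops_miss = round(abs(ops(obp, slg) - actual_ops), 3)
        result[system] = (ops_miss, abs(hr - actual_hr))
    return result


for name in LINES:
    print(name, errors(name))
```

On these figures the two systems are within .001 OPS of each other for Hunter, while for Rowand the Bill James line misses by .071 OPS and 8 HR versus .131 OPS and 12 HR for HitTracker.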

I guess my point is that this system is interesting, but the results are too similar to the established theories to justify the mountains of data you have to collect to get your results. If you could incorporate this new data into say the James projection system, then I think you would have something.

Do you see any ratio you could use for the weather adjustments so you don't have to check every game? Also, the wind doesn't blow the same speed all day long. Did you get the wind readings down to the minute? If so, I think you are NUTS, but I love your passion for your work.

I hope I didn't come across as belittling your ideas. I think there is a place for them, and I'm impressed with what you have compiled. Good luck with your hypothesis.

Eric,

True, the Bay and Hunter numbers didn't diverge much from the BJO numbers, but the other projections I made did, sometimes by quite a bit:

Manny: .955 BJO, 1.071 HT (if he signs with LA, that is)

Teixeira: .956 BJO, 1.008 HT

Dunn: .913 BJO, around .940 HT for several teams

Holliday: .937 BJO, .981 HT

After the season, we ought to be able to tell which system was better for these guys...

Also, on the Rowand numbers, I'd give BJO the advantage IF they were predicting that he'd play 1/2 the year hurt and get the numbers he did. However, if they predicted he'd be healthy and get those numbers, then they are off by a lot. For the record, I project Rowand in 2009 to hit the numbers I gave for 2008, if healthy. I'm not going to try to figure out how he will do if he's partially healthy. I could see someone else trying to do so based on his typical ratio of health to non-health, though.

Eric, did you actually read the article? He explained the difference for Rowand quite nicely, I thought.

Eric, I forgot to talk about weather. It is a combination of pre-game weather and minute-by-minute weather. First I compiled the box score weather numbers, which are typically taken about 150-30 minutes before the game starts (and thus often rather undescriptive of actual game conditions). Next, I looked at the average changes in weather during a three-hour period starting at game time for various months, and then applied a correction to account for this to the average box score weather values (I hope that made sense). This was easier than trying to average all of the weather streams, which often are posted at 1-minute intervals...

However, for actual observations of hits, I do use weather values from the moment the hit was made: typically I observe flags in the video for the wind, and for temperature I use the closest available Weather Underground station. If I can't see the flags, I will use wind from WU as well, or sometimes, if that won't work, I'll go forward or backward in the video stream until I can see some flags moving, and use that. Most of the time the wind readings are reliable; occasionally these methods may be off some, of course. I'd like to see MLB put weather stations in the parks that can be accessed from the Internet...
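The pre-game correction described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample readings are made up, the names `monthly_correction` and `corrected_temp` are hypothetical, and the sketch handles temperature only, whereas the actual procedure may treat wind and other variables differently.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample data (NOT real readings): each entry is
# (month, box-score temperature in deg F, readings during the three hours after first pitch).
SAMPLE_GAMES = [
    (4, 60.0, [58.0, 56.0, 54.0]),
    (4, 64.0, [62.0, 60.0, 58.0]),
    (7, 85.0, [84.0, 82.0, 80.0]),
]


def monthly_correction(games):
    """Average, per month, of (mean in-game temperature - box-score temperature)."""
    shifts = defaultdict(list)
    for month, box_temp, in_game in games:
        shifts[month].append(mean(in_game) - box_temp)
    return {month: mean(values) for month, values in shifts.items()}


def corrected_temp(box_temp, month, correction):
    """Shift a box-score reading by that month's average in-game change."""
    return box_temp + correction[month]


correction = monthly_correction(SAMPLE_GAMES)
print(correction)                            # per-month shift from the sample data
print(corrected_temp(70.0, 4, correction))   # adjusted estimate for an April game
```

The point of the design is the one made in the comment: a single per-month shift replaces averaging every minute-by-minute weather stream for every game.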

I think you are right, by the way, in a manner of speaking I am NUTS :) But, I am having a lot of fun doing what I'm doing...

Greg,
It really is impressive work (and yes, OGC, I did read it; my point was that despite the details, the other projection still hit closer to the mark). Your weather study is the best part, and I hope you are closer than BJO on the sluggers you mentioned. The Holliday one is the most interesting, since BJ is lower despite his being in a hitter's park (if it's not updated from the '09 handbook; I don't have mine handy to check), whereas you have it considerably higher in a pitcher's park.

Your Rowand projection is definitely the highest I've seen as far as HR, with 25. I suspect that is because you exclude age as a deterrent, whereas the general rule is that at 31/32 years old you are typically past your prime. He's only had HR totals in the 20s twice in his 8-year career, and those seasons were the only ones where he hit over .300 as well: 2004 with the White Sox (.310, 24 HR, in his prime at 27 years old) and 2007 with the Phillies (.309, 27 HR). Those are two of the more lenient parks for HR, as I'm sure you are aware. I don't doubt that there is something to the wind and weather that you can quantify to better predict performance; I just worry that the other factors will mess with your totals.
Again, good luck,

Eric

I'd be very interested to see how your projections look for David Wright and Jose Reyes, after your initial analysis on CitiField showed it to likely be pretty tough on HRs.