Projections vs. Reality
I love stats. I’ve been a baseball stathead since I was a kid. In Little League, I kept track of my own batting average as well as my walks, extra-base hits (those were easy – there weren’t many of them), and RBIs. I used to check the box scores in The Record every morning to see what the Mets’ updated stats looked like after the previous night’s game (unless the game was on the West Coast – ahhh, the dark ages), and I’ve continued that habit into the present day.
Over the past 10 years or so, I’ve been familiarizing myself with sabermetrics. I like them. I find them useful. Especially the ones I can understand. Advanced metrics have given me a whole new perspective on the careers of baseball players today, and a renewed appreciation of players from the past.
The one relatively new development I haven’t gotten on board with is the concept of statistical projections.
Projection systems – like PECOTA (which was released today for the 2014 season), ZiPS, or Steamer – are usually based on the average performance of a player over the last 3-4 years, taking into account decline or improvement caused by age and other factors. They’re meant to be an objective measurement of what a ballplayer is supposed to be based on his track record.
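To make that idea concrete, here’s a minimal sketch of a weighted-average projection. To be clear, this is not how PECOTA, ZiPS, or Steamer actually work. The season weights, the peak age of 27, and the size of the age adjustment are all assumptions I picked for illustration, and the function name is made up.

```python
def project_rate(rates, weights=(3, 2, 1), age=30, peak_age=27, age_step=0.005):
    """Project a rate stat (e.g. batting average) from recent seasons.

    rates    -- past season values, most recent first
    weights  -- hypothetical weights, most recent season counts the most
    age      -- player's age in the season being projected
    peak_age -- assumed peak age (illustrative, not taken from any real system)
    age_step -- assumed fractional change per year away from the peak
    """
    if not rates:
        raise ValueError("need at least one past season")

    # Weighted average of the seasons we have (extra weights are ignored).
    used = weights[:len(rates)]
    baseline = sum(r * w for r, w in zip(rates, used)) / sum(used)

    # Crude aging curve: a small penalty past the peak, a small bonus before it.
    adjustment = 1.0 - age_step * (age - peak_age)
    return baseline * adjustment


# Hypothetical player: hit .300, .270, .240 over his last three seasons.
at_peak = project_rate([0.300, 0.270, 0.240], age=27)
past_peak = project_rate([0.300, 0.270, 0.240], age=30)
```

The real systems are far more sophisticated than this, but the basic shape (recent seasons weighted most heavily, then nudged for age) is the part I’m describing above.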
First of all, how accurate are they? Let’s look at projections for a few Mets players for the 2013 season, and how they stacked up against reality. These are based on the ZiPS algorithm:
ZiPS assumed David Wright would play a full season, despite the fact that he missed a good chunk of 2011 with a broken back induced by a collision with Ike Davis. For this, I give ZiPS credit. One of my major pet peeves with projection systems is that they’ll take an injury from 3 years ago, and no matter how freakish or isolated it was, they’ll assume that the player will miss a significant chunk of time in subsequent years.
As it turned out, Wright missed a month and a half with a hamstring pull. Had David put up 610 plate appearances in 2013, his doubles and RBI totals would have been close to the ZiPS projection. He would have hit a few more home runs, however, and ZiPS completely missed his batting average, on-base percentage, and slugging percentage. This is probably due to the fact that he hit .254/.345/.427 in 2011 (while playing with the back injury), which was a statistical outlier when you look at his entire career.
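The “what if he’d had 610 plate appearances” comparison is just prorating: scale each counting stat by the ratio of target PAs to actual PAs. Here’s a quick sketch. The function name and the sample stat line are made up for illustration; they are not Wright’s actual 2013 numbers.

```python
def prorate(counting_stats, actual_pa, target_pa):
    """Scale counting stats (2B, HR, RBI, ...) to a different number of PAs.

    Assumes the player would have kept producing at the same per-PA pace.
    """
    if actual_pa <= 0:
        raise ValueError("actual_pa must be positive")
    factor = target_pa / actual_pa
    return {name: value * factor for name, value in counting_stats.items()}


# Hypothetical stat line over 305 PAs, stretched to a 610-PA season.
sample = {"2B": 20, "HR": 15, "RBI": 50}
full_season = prorate(sample, actual_pa=305, target_pa=610)
```

Note that prorating only helps with counting stats. Rate stats like batting average, on-base percentage, and slugging percentage don’t change with playing time, so a miss there is a miss no matter how many plate appearances the player gets.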
Two of the biggest misses ZiPS came up with last year were Ruben Tejada and Ike Davis. I’m going to throw those out because of how badly both players deviated from their previous trends. Not even the most carefully crafted software program could have seen that coming.
The player ZiPS seems to have the most success with is Daniel Murphy. Here are his 2013 results:
ZiPS predicted fewer plate appearances because of his 2011 injury. I’m not sure why it factored Murphy’s injury into his projected plate appearances, but didn’t factor Wright’s into his.
If ZiPS had gotten the number of PAs right, the rest of his stats would have been very close. ZiPS also came close to accurately projecting Murph’s 2012 numbers.
Anyone with a strong knowledge of baseball stats can look at a player’s career trends and make an educated guess about the kind of season he’s about to have. You don’t necessarily need a supercomputer to do so. But if I’m compiling projections for every player in the league (plus minor leaguers), then you’d better believe I’m going to have a CPU do it for me.
So, while I still believe blind reliance on projections would be ill-advised, I can see how they can be useful. They can help a front office decide what it has, which in turn helps it decide what it needs. I would still treat them as just one of many tools in the box, however, since their accuracy can fluctuate.