Late January means only one thing. Yup, it is projection season again. After the postseason rumpus, all those clever baseball sorts lock themselves away for a few months and tinker with their spreadsheets and databases before proffering their finest estimates of how every major league player will perform in the coming year.
The purpose of this article is to take a bunch of high profile players whom the mainstream media will be keeping their collective beady eye on and see how said players project. Projection systems tend to follow strict mathematical rules, so we’ll see how robust the numbers appear as we throw a pinch of subjective data into the mix. At the end we’ll total up the results and try to discern any trends specific to each projection system.
Part 1 will focus on hitters; a follow-up in a couple of weeks will chew the fat on pitching.
Introducing the Contenders
I have selected three different projection systems. To use a boxing analogy we have a heavyweight, a welterweight and a featherweight.
In the red corner, representing the heavyweight division, we have Baseball Prospectus’ PECOTA. For 2006, Sean Smith found that PECOTA had the best collective prediction performance of all publicly available projections. The correlation (R) between PECOTA and actual performance was 0.74, a shade behind the theoretical maximum of 0.77 calculated by Tangotiger (for 500 at-bats). PECOTA uses every technique in the book to derive its numbers, but what sets it apart from the competition is its use of similarity scores as an added variable with which to project likely player evolution.
In the blue corner say hello to the welterweight champion, CHONE 2.1, produced by the aforementioned Sean Smith. (Chone is his blog name.) Sean has only been in the projection business for a year but has spent an unhealthy amount of time in the off-season perfecting his model by messing around with aging curves, sample weightings and a gaggle of other adjustments that an MIT professor would struggle to understand. Sean found that in 2006 CHONE stacked up well against the competition with an R of 0.68. I expect his 2007 numbers to be even better given the effort he has put in.
In the green corner please welcome the featherweight contender: Marcel the Monkey. Marcel was created by Tangotiger in 2004. The idea was that the projections were so basic that even a monkey could make them (provided someone taught him how to use a database), hence the name. Marcel uses regression to the mean and aging curves to arrive at a forecast. The R between Marcel and actual performance was 0.66 in 2006. Any projection system that does worse than Marcel isn’t worth bothering about. We at THT have just started publishing Marcels here, so there is no excuse not to be familiar with them.
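For the curious, here is roughly what "regression to the mean and aging curves" looks like in practice. This is a minimal Python sketch in the spirit of Marcel, not Tangotiger's actual code: the 5/4/3 season weights, the 1,200 plate appearances of league average and the age factors are assumptions drawn from commonly cited descriptions of the method, and the function names and sample numbers are made up for illustration.

```python
# Marcel-style rate projection: an illustrative sketch, not the real system.
# Every constant below is an assumption based on commonly cited descriptions.
SEASON_WEIGHTS = (5, 4, 3)   # most recent season counts the most
REGRESSION_PA = 1200         # plate appearances of league-average performance mixed in
PEAK_AGE = 29


def age_multiplier(age):
    """Aging curve: a boost below the peak age, a smaller penalty above it."""
    if age > PEAK_AGE:
        return 1 + (PEAK_AGE - age) * 0.003
    return 1 + (PEAK_AGE - age) * 0.006


def project_rate(seasons, league_rate, age):
    """Project a per-PA rate (e.g. HR/PA).

    seasons: list of (events, plate_appearances), most recent season first.
    """
    weighted_events = sum(w * ev for w, (ev, pa) in zip(SEASON_WEIGHTS, seasons))
    weighted_pa = sum(w * pa for w, (ev, pa) in zip(SEASON_WEIGHTS, seasons))
    # Regression to the mean: blend in a fixed dose of league-average performance.
    regressed = (weighted_events + REGRESSION_PA * league_rate) / (weighted_pa + REGRESSION_PA)
    return regressed * age_multiplier(age)


# Hypothetical hitters, both aged 29, in a league with a 3% home run rate.
veteran = project_rate([(30, 600), (25, 600), (20, 600)], league_rate=0.03, age=29)
rookie = project_rate([(10, 100)], league_rate=0.03, age=29)
print(f"veteran: {veteran:.3f} HR/PA, rookie: {rookie:.3f} HR/PA")
```

The point to notice is that the rookie's 10 homers in 100 plate appearances get dragged much further toward the league rate than the veteran's three full seasons do, because the same dose of league average swamps a small sample.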
Let’s have a look at some batters worth watching in 2007.
1. Barry Bonds
System       PA    AVG    OBP    SLG   HR  HR/PA    GPA
PECOTA      229  0.267  0.441  0.535   12   5.1%  0.332
CHONE 2.1   376  0.285  0.451  0.557   20   5.3%  0.342
Marcel      452  0.280  0.458  0.560   24   5.3%  0.345
AVERAGE     352  0.277  0.450  0.550   19   5.3%  0.340
The first question when considering Bonds is whether he’ll slam 23 over the fence to break Hank Aaron’s home run record. Mainstream opinion is divided: Perry says nay; Sports Illustrated says aye. All projections say it will be close. Marcel has him edging over the line, CHONE has him falling agonizingly short and PECOTA, well, PECOTA doesn’t give him a cat’s chance in hell. Why the difference? Well, PECOTA is a little more bearish on Bonds’ power but, more importantly, thinks he’ll have only 230 plate appearances. With so few opportunities Bonds will need to hit home runs at a 10% clip, which is a big ask for anyone.
Bonds mashed 26 home runs last year, but he was clearly struggling for fitness and form in the early half of 2006; it wasn’t uncommon for him to go two to three weeks without a round-tripper. However, Bonds upped his home run rate significantly in August and September last year, which could be the harbinger of a mini-resurgence. On that data point alone I’m more inclined to believe the monkey than anyone else. I think he’ll do it. All that said, at the moment Bonds is still haggling over minor contractual details, and with the drug spectre still hanging over his head any outcome, including sitting out the season, is possible.
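The arithmetic behind that 10% figure is easy to check. The sketch below takes the 23 homers quoted above and each system's projected plate appearances and home runs for Bonds, then works out the home run rate he would need in that much playing time; the variable and function names are my own.

```python
HOMERS_NEEDED = 23  # the figure quoted above for passing Aaron

# (projected plate appearances, projected home runs) for Bonds, from the table
bonds = {
    "PECOTA": (229, 12),
    "CHONE 2.1": (376, 20),
    "Marcel": (452, 24),
}


def required_rate(plate_appearances, homers_needed=HOMERS_NEEDED):
    """Home runs per plate appearance needed to reach the target."""
    return homers_needed / plate_appearances


for system, (pa, hr) in bonds.items():
    verdict = "gets there" if hr >= HOMERS_NEEDED else "falls short"
    print(f"{system:10s} needs {required_rate(pa):.1%} HR/PA, projects {hr} HR: {verdict}")
```

On PECOTA's playing time the required rate is roughly double what any of the three systems projects, which is why the plate appearance estimate matters more here than the small differences in projected power.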
In my column next week I’ll proselytize the doubters on why I think Barry will break the record.
2. Alfonso Soriano
System       PA    AVG    OBP    SLG   HR  HR/PA    GPA
PECOTA      642  0.287  0.349  0.569   39   6.1%  0.299
CHONE 2.1   665  0.278  0.336  0.536   39   5.9%  0.285
Marcel      632  0.275  0.332  0.515   33   5.2%  0.278
AVERAGE     646  0.280  0.339  0.540   37   5.7%  0.287
The Fonz had a great year last year, slugging a career-high 46 homers with 24 of those in power-sapping RFK Stadium, perfect timing given 2006 was his free agency year. For those suffering from baseball narcolepsy, Soriano subsequently signed a mammoth eight-year contract for the princely sum of $136 million with the Cubbies.
So how will he do? Given 2006 was a career year (.277/.351/.560), both Marcel and CHONE suggest slight regression, particularly in power. PECOTA, on the other hand, pegs him to have an even better year, but that presumably includes a hefty park adjustment (which Marcel wouldn’t include but CHONE would). All projections have him marginally above his career average of .280/.325/.510, so at the close of 2007 look for him to be worth 1.5 to two wins above average. Soriano has clubbed an average of 35 homers each year, so Marcel’s projection of 33 looks light. Although the 40 hurdle will be tough to clear, don’t forget the Wrigley factor: if the wind is blowing out, a lazy pop-fly has a shot of clearing the fence. Last year he played three games in front of the hallowed ivy and hit .167/.167/.417. Nothing can ever be concluded from such a microscopic sample size, but for $136 million Cub fans have the right to expect him to reach new heights.
3. Andruw Jones
System       PA    AVG    OBP    SLG   HR  HR/PA    GPA
PECOTA      612  0.279  0.365  0.540   35   5.7%  0.299
CHONE 2.1   633  0.267  0.361  0.518   37   5.8%  0.292
Marcel      602  0.265  0.354  0.526   35   5.8%  0.290
AVERAGE     616  0.270  0.361  0.528   36   5.8%  0.293
From a player who had a great free-agent year we move to one who is coming up on his: Andruw Jones will probably be 2007/8’s hottest free-agent property. Interestingly, looking purely at the stat line, Jones appears less of a talent than Soriano does: he projects to have similar power with a slightly lower average but compensates with a higher on-base percentage.
Home run totals of 51 in 2005 and 41 in 2006 suggest that perhaps the forecasts err on the side of caution when it comes to the long ball, but for 2007 all project him to exceed his career average line of .267/.345/.505. Both PECOTA and CHONE see an OBP above .360, suggesting that Marcel, at .354, is likely to be at the low end. However, the projections all give some weight to his poor 2004 season, when he hit .261/.345/.488 with 29 dingers. If you believe that was an aberration then you’ll probably see some upside in all the numbers.
4. Carlos Lee
System       PA    AVG    OBP    SLG   HR  HR/PA    GPA
PECOTA      613  0.296  0.359  0.530   30   4.9%  0.294
CHONE 2.1   647  0.290  0.350  0.510   32   4.9%  0.285
Marcel      616  0.287  0.346  0.509   29   4.7%  0.282
AVERAGE     625  0.291  0.352  0.516   30   4.9%  0.287
There was unanimity among everyone in baseball, analysts and non-analysts alike, that Carlos Lee’s six-year, $100 million deal with the Houston Astros was the most egregious piece of business this offseason. Certainly an OPS range of .850 to .890 doesn’t seem worthy of such a megadeal. None of the projections expects Lee to perform at the $17 million-a-year level: his .291/.352/.516 average projection pegs him at 1.5 wins above average in 2007, which translates to a generous $12 million a year (and that is ignoring his quite atrocious performance in the field).
The concern lies largely in the fact that Lee’s avoirdupois body form will probably result in a steeper than usual age-related decline. Don’t forget that he’ll play half his games in Minute Maid, where a lazy outfield fly toward the Crawford Boxes can easily become a home run, so don’t be surprised if he exceeds the projected home run totals (low 30s), which would be marginally ahead of his career average. Expect a solid season in 2007, but don’t bank on too much production as he enters the latter half of his deal.
5. Delmon Young
System       PA    AVG    OBP    SLG   HR  HR/PA    GPA
PECOTA      588  0.297  0.334  0.473   18   3.1%  0.269
CHONE 2.1   510  0.302  0.341  0.489   18   3.5%  0.276
Marcel      266  0.305  0.353  0.486    8   3.0%  0.280
AVERAGE     455  0.301  0.342  0.482   15   3.2%  0.275
Switching gears from players who either have been free agents or are on the verge of free agency to those some way off, Delmon Young is one of the brighter prospects in the game. In fact he was widely regarded as the number-one prospect in the 2006 season, but a 50-game suspension for tossing his bat at an ump somewhat sullied his star. All projection systems tend to find forecasting prospects difficult because there is less data. This uncertainty means that regression to the mean is higher, so it is rare to see a prospect with awesome projections. A system that uses similarity scores, as PECOTA does, tends to have a better record, as Jeff Sackmann alluded to on Friday.
His 2006 was characterized by high contact, poor patience at the plate (and away from it too) and a little power as he wound up with a line of .317/.343/.476 in 131 plate appearances. All agree that Young will pretty much replicate that performance in 2007, although Marcel is significantly down on his home runs. This is not because of an expectation that Young will lose power but rather because Marcel is ultraconservative on plate appearances. The reason for this discrepancy: the spartan nature of Young’s 2006 season caused the monkey to toss in a lot of regression to the mean. I’d imagine that the D-Rays will find an everyday spot for Young this year, so look for CHONE/PECOTA to win this round, provided he maintains his form with the timber.
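To see why the monkey is so stingy with playing time, here is a small sketch of a Marcel-style plate appearance estimate. The formula (half of last year's plate appearances, a tenth of the year before, plus a flat 200) is an assumption based on commonly cited descriptions of Marcel rather than anything confirmed in this article, and the function name is my own. Young's 131 plate appearances in 2006 come from the line above; treating the year before as zero assumes he had no earlier big league time.

```python
def marcel_style_pa(pa_last_year, pa_two_years_ago=0):
    """Playing-time estimate: an assumed Marcel-style formula, not official code."""
    return 0.5 * pa_last_year + 0.1 * pa_two_years_ago + 200


# Delmon Young: 131 PA in 2006, assumed none before that.
young_pa = marcel_style_pa(131)
print(f"Young: {young_pa:.1f} PA")

# A hypothetical everyday player with 650 PA in each of the last two years.
regular_pa = marcel_style_pa(650, 650)
print(f"Everyday regular: {regular_pa:.1f} PA")
```

Under those assumptions Young comes out at about 266 plate appearances, in line with the Marcel row in his table, and the home run shortfall follows directly: the same power spread over half a season. A formula like this knows nothing about depth charts, which is why a system that makes a judgment on Young's role should win the playing-time argument.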
6. Frank Thomas
System       PA    AVG    OBP    SLG   HR  HR/PA    GPA
PECOTA      499  0.263  0.375  0.552   34   6.9%  0.307
CHONE 2.1   455  0.256  0.373  0.488   24   5.3%  0.290
Marcel      492  0.258  0.364  0.510   29   5.9%  0.291
AVERAGE     482  0.259  0.371  0.517   29   6.0%  0.296
Based on his 2004 and 2005 campaigns, Thomas had no real right to sign a two-year, $18 million contract with the Blue Jays this offseason. In those years he lived up to his Big Hurt moniker with only 108 plate appearances. As he is entering his age-39 season, we’d expect the projections to heavily discount his performance, as a consensus 2007 forecast of .259/.371/.517 attests; his line last year was .270/.381/.545. CHONE has him slugging less than .500 (something he hasn’t done since 2002) with Marcel close behind, while PECOTA sees a continuation of the form he showed in the Coliseum. Last year Thomas snorted 39 dingers, and as with the batting line all the projections build in a discount, with PECOTA, again, being the most optimistic.
The Big Hurt is a difficult call. If he can remain healthy then there are plenty of reasons to believe that he’ll beat these projections. However, in three of the past six years he has played in less than half the season. One thing is for sure: the potential variance on all projections is high. Pick a line, but if I had to bet, my money would be on PECOTA.
7. Ryan Howard
System       PA    AVG    OBP    SLG   HR  HR/PA    GPA
PECOTA      652  0.299  0.393  0.616   47   7.2%  0.331
CHONE 2.1   641  0.290  0.381  0.599   47   7.3%  0.321
Marcel      587  0.303  0.395  0.600   40   6.8%  0.328
AVERAGE     627  0.297  0.390  0.605   45   7.1%  0.327
Irrespective of whether or not you thought that Howard was worthy of the NL MVP last year, there is little doubt that he had a historically great sophomore season. Until mid-September it looked as though he would become just the sixth player to break the 60 home run mark, but he came off the boil and finished just shy with 58 yard-leavers. After batting .288/.356/.567 with 22 home runs in 2005, Howard established a new level of performance in 2006 with a line of .313/.425/.658. He was a late bloomer (2007 is his age-27 season) and as such he can be treated harshly by projection systems that frown on such players. Regression to the mean is greater as there is less hard data (he has fewer cumulative plate appearances), and a system such as PECOTA that uses similarity scores can have difficulty finding appropriate matches. Marcel bears that out and has him hitting nearly 20 fewer home runs in 2007 than in 2006. However, the projections agree that Howard is a .300/.380/.600 hitter at a minimum, which is still pretty special. Were he to attain those numbers he’d win 5.5 games above a replacement hitter, making his 2007 salary of approximately $400,000 a snip.
8. Derrek Lee
System       PA    AVG    OBP    SLG   HR  HR/PA    GPA
PECOTA      429  0.288  0.369  0.527   21   4.9%  0.298
CHONE 2.1   488  0.289  0.376  0.535   25   5.1%  0.303
Marcel      371  0.301  0.380  0.553   19   5.1%  0.309
AVERAGE     429  0.293  0.375  0.538   22   5.0%  0.303
Derrek Lee spent a good chunk of 2006 on the DL after he broke his wrist clattering into Rafael Furcal at first base. Lee almost had a triple-crown year in 2005, batting .335/.418/.662, so much was expected of him in 2006. The injury restricted him to a line of .286/.368/.474 in just 50 games. Perhaps unsurprisingly, all projections peg his 2007 at somewhere in between those performances and more in line with his 2003-2004 numbers. Marcel has him hitting only 19 four-baggers, which superficially seems low but on further inspection is because Marcel thinks that Lee will play just half the season (it’s like a broken record, isn’t it?). CHONE and PECOTA are more optimistic on the plate appearance front but are arguably still a touch conservative, bearing in mind that, last year aside, the last time Lee had fewer than 500 at-bats was in 2000 (when he had 477). Rate stats are marginally above his career average of .276/.363/.500, and if you believe that 2005 was an outlier then the projections don’t look unreasonable. As with Sori, Lee will need to play at his best if the Cubs are to have a shot at the NL Central.
9. Jeff Francoeur
System       PA    AVG    OBP    SLG   HR  HR/PA    GPA
PECOTA      523  0.288  0.330  0.506   24   4.5%  0.275
CHONE 2.1   647  0.280  0.320  0.481   27   4.2%  0.264
Marcel      570  0.282  0.325  0.490   25   4.4%  0.268
AVERAGE     580  0.282  0.325  0.492   25   4.3%  0.269
From a personal point of view I always take a peek at the Francoeur forecasts because I yearn for evidence that someday, somehow, he’ll learn plate discipline. All the projections agree that he’ll strike 24-27 home runs with a home run rate hovering around 4.5%, although only PECOTA has him clearing the .500 slugging mark. Interestingly, all expect his plate discipline to improve, with an OBP range of .320 to .330 versus .293 last year. That jibes with his walk rate progress in 2006, which went from two walks per month over the first four months of the season to seven per month for the last two! Astonishingly, Francoeur was the only Brave to start every game and racked up 686 plate appearances last year. PECOTA thinks that Cox will use him more sparingly in 2007, which, let me tell you, barring injury is sheer folly. Sometime soon Francoeur may break out and become a star hitter; none of our projections forecast it will be 2007. Braves fans, take solace in his age … at least he is only 22.
10. Albert Pujols
System       PA    AVG    OBP    SLG   HR  HR/PA    GPA
PECOTA      666  0.331  0.428  0.617   39   5.9%  0.347
CHONE 2.1   628  0.323  0.424  0.646   46   7.3%  0.352
Marcel      587  0.331  0.424  0.635   38   6.5%  0.350
AVERAGE     627  0.328  0.426  0.633   41   6.5%  0.350
We can’t go through this exercise without looking at the most prolific hitter in baseball. Since bursting onto the scene in 2001 at age 21, Pujols has had six MVP-caliber seasons, batting .332/.419/.629 and averaging 43 gopher balls a year. In that time his lowest batting average was .314 and his lowest slugging percentage was .561 (both in his sophomore year), so it is no surprise to see that all projections expect another messianic season. Largely. PECOTA is the most bearish overall and has him clubbing just 39 home runs, which would be below his career average. Although Marcel thinks he’ll only notch 38 ripsnorters, that is predicated on, yes, you’ve guessed it, just 500 at-bats. If Pujols has fully recovered from last year’s DL stint, even the optimistic CHONE numbers could be conservative. 2007 is Pujols’ age-27 season; don’t be surprised if it is among the greatest of all time.
Can we discern any trends among the different systems? Is one more optimistic or pessimistic than the others? This table shows how many times each system came in highest and lowest based on GPA.
System     Optimistic  Pessimistic
PECOTA              6            4
CHONE 2.1           1            3
Marcel              3            3
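Anyone who wants to reproduce that tally can do so from the GPA columns of the ten player tables. The sketch below copies those projected GPAs into a dictionary and counts how often each system is highest and lowest; the layout and variable names are my own.

```python
from collections import Counter

# Projected GPA by system, copied from the ten player tables in this article.
gpa = {
    "Bonds":     {"PECOTA": 0.332, "CHONE 2.1": 0.342, "Marcel": 0.345},
    "Soriano":   {"PECOTA": 0.299, "CHONE 2.1": 0.285, "Marcel": 0.278},
    "A. Jones":  {"PECOTA": 0.299, "CHONE 2.1": 0.292, "Marcel": 0.290},
    "C. Lee":    {"PECOTA": 0.294, "CHONE 2.1": 0.285, "Marcel": 0.282},
    "Young":     {"PECOTA": 0.269, "CHONE 2.1": 0.276, "Marcel": 0.280},
    "Thomas":    {"PECOTA": 0.307, "CHONE 2.1": 0.290, "Marcel": 0.291},
    "Howard":    {"PECOTA": 0.331, "CHONE 2.1": 0.321, "Marcel": 0.328},
    "D. Lee":    {"PECOTA": 0.298, "CHONE 2.1": 0.303, "Marcel": 0.309},
    "Francoeur": {"PECOTA": 0.275, "CHONE 2.1": 0.264, "Marcel": 0.268},
    "Pujols":    {"PECOTA": 0.347, "CHONE 2.1": 0.352, "Marcel": 0.350},
}

# For each player, find the system with the highest and the lowest projected GPA.
optimistic = Counter(max(row, key=row.get) for row in gpa.values())
pessimistic = Counter(min(row, key=row.get) for row in gpa.values())

for system in ("PECOTA", "CHONE 2.1", "Marcel"):
    print(f"{system:10s} optimistic: {optimistic[system]}  pessimistic: {pessimistic[system]}")
```

There are no ties in these ten rows, so each player contributes exactly one "highest" and one "lowest" vote.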
Nothing too much to report except that PECOTA seems to take the extreme position more often than not. By using similarity scores, PECOTA adds different information to the mix that allows it to make more ballsy projections. As the year progresses keep an eye on how these hitters perform. As sure as night follows day, at least one of these is likely to be spectacularly wrong. The question is which one. Are there any prospective clairvoyants out there?
Tune in in a couple of weeks when we give hurlers the same treatment.