May 26, 2013




Creative Commons License
All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.

Friday, January 23, 2009

Draft Manifesto (part 2)


In the first part of my Draft Manifesto, I went over a few quick strategies that I try to keep in mind during a draft. I took some flak about my stance on catchers, namely, that the top-shelf ones are often drafted too high in my opinion. But the resulting discussion was enlightening and caused my stance to soften a bit. In any case, here's a continuation of that piece.

7) Don't take a starter until the 8th or 9th round unless you have a very good reason. Wins fluctuate wildly from year to year, but for some reason preseason rankings always seem to assume that 18-, 19-, 20-, and 21-game winners are going to repeat the following year. Odds are overwhelming that they won't. In addition, there are always guys who get called up midseason (Cole Hamels), get turned into starters midseason (Francisco Liriano), or demonstrate a true talent improvement during the season (Cliff Lee) that you can capitalize on. In most 12-person leagues, most players simply pick up starters based on Wins and ERA to that point in the year. By looking at stats like FIP and even simple ones like K/9 and BB/9, you can make better decisions than your opponents on who to pick up, who to trade, and who to trade for. Hitters are far more consistent from year to year, and I think even relatively unsophisticated fantasy owners have a good sense of the true talent of hitters.

8) Avoid hitters who have had wrist injuries. If you ask 10 people what the most important part of the body is when swinging a baseball bat, you'll get 10 different answers. But one of them will be the wrist, and wrist injuries just seem to take the longest for players to recover from.

9) After the 9th or 10th round, give precedence to any certain-closers that remain. When someone needs saves, they'll often overpay for them. Closers make the best trade bait mid-season.

10) Power hitters make the second-best trade bait. For some reason, Juan Pierre is draftable in the 6th round, but almost entirely untradeable mid-season. Likewise with Willy Taveras: always drafted in a reasonable position, but impossible to trade. Players are far more likely to give up on steals and focus on the power stats than they are to make a mid-season trade to solidify steals. In the draft, err on the side of too much power over too much speed.

11) Know the value of the Mark DeRosas. Players most often get six games per week, with days off on either Mondays or Thursdays. So a bench hitter is really only looking at one game per week in which he could fill in for regulars. DeRosa played 149 games last year, so 149/162 or 92% of the time, and was qualified at 1B, 2B, 3B, and OF. On any given Monday or Thursday that he was playing, odds were you had a 1B, 2B, 3B, or OF whose team had the day off. The fantasy season is typically 22 weeks long, which means about 22 fill-in games, so in assessing his value as a bench guy, we want to look at 92% of his production per 22 games. This comes out to a .285 BA, 13 Runs, 3 HR, 11 RBI, and 1 SB. Obviously not a ton of production, but enough to swing a few games during a head-to-head season, or a few spots in a Roto league. Of course, the tradeoff is the benefit of those numbers versus an extra SP or middle reliever who provides some value as well. I think one bench utility player like DeRosa is a must, even in leagues where opponents are maximizing their bench slots for starters. If you're dedicated, you can even boost that production a bit more by subbing him in for regulars when the park or opposing pitcher makes it an even more favorable game for him.
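For anyone who wants to run the same back-of-the-envelope math on their own bench candidate, here is a minimal sketch. It assumes one reading of "92% of his production per 22 games": take the player's full-season totals, scale them to 22 team games, then multiply by the share of games he actually played. The function name and the season totals in the example are hypothetical and are not DeRosa's real line.

```python
def bench_line(season_totals, games_played, team_games=162, fill_in_games=22):
    """Estimate counting stats from a bench player's fill-in games.

    Assumption: production is pro-rated per team game, then discounted
    by availability (games_played / team_games).
    """
    availability = games_played / team_games   # e.g. 149 / 162 = 0.92
    scale = availability * fill_in_games / team_games
    return {stat: total * scale for stat, total in season_totals.items()}


# Hypothetical utility player: these totals are made up for illustration.
totals = {"R": 100, "HR": 20, "RBI": 90, "SB": 8}
line = bench_line(totals, games_played=149)
for stat, value in line.items():
    print(f"{stat}: {value:.1f}")
```

Batting average is a rate stat, so it carries over unchanged rather than being scaled.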

I once again welcome comments that anyone wishes to share. I'm not saying I'm correct about all of these assertions—they are almost entirely anecdotal—but they're my best guess at ways to go towards an optimum draft experience.

Posted by Michael Lerra at 12:06pm (16) Comments

Monday, January 26, 2009

THT Season Preview 2009 goes to printers


After months of hard work, THT's David Gassko is getting a well-deserved rest. He's been hard at work on The Hardball Times Season Preview 2009, which has now been sent to the printers.

I'm not going to try telling you everything that's in the book; David did that over on the main site today. In brief, I've got an article in there on rookies to watch, and THT Fantasy's Chris Neault and Victor Wang also have articles on injuries and risk, respectively. Plus, there's team commentary from some of the top bloggers, projections, fantasy values, depth charts, and all kinds of other stuff.

I haven't had a chance to read it yet, but if all the work that went into it is any indication, it should be a good one. If you're going to buy it, please support THT by purchasing it from the publisher’s website using this link. THT gets much less if you buy it from a site like Amazon.

On an unrelated note, we've got more articles going today than can fit on the main page, so make sure you don't overlook the other two: mine on BABIP estimators and Victor's on late-round picks.

Posted by Derek Carty at 3:10am (0) Comments

What’s the best BABIP estimator?


BABIP is a stat that lots of people like to throw around but many don't fully understand (even some who profess to be statistically inclined).

Background info


BABIP stands for Batting Average on Balls in Play. It measures the rate at which balls in play fall in for hits. Essentially, any ball that the batter makes contact with, puts into fair territory, and does not become a home run falls into the domain of BABIP. It is calculated as (H-HR)/(AB-K-HR).
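As a quick sketch, the formula above translates directly into code. The function name and the sample stat line are hypothetical; the calculation itself is just (H-HR)/(AB-K-HR) as written.

```python
def babip(h, hr, ab, k):
    """Batting Average on Balls in Play: (H - HR) / (AB - K - HR)."""
    balls_in_play = ab - k - hr
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Made-up season: 150 hits, 20 home runs, 500 at-bats, 100 strikeouts.
print(round(babip(h=150, hr=20, ab=500, k=100), 3))  # 130 / 380
```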

We use BABIP to evaluate both pitchers and hitters, but the way in which we use it differs greatly between the two. Most pitchers regress toward the league average BABIP of around .300 or .305. Very few pitchers can repeatedly do better or worse than this, so we say that pitchers have very little control over BABIP.

Hitters, on the other hand, can have a substantial amount of control over BABIP. Ichiro Suzuki, for example, has a .356 career BABIP. Hitters do not regress toward league average; rather, they each regress toward their own unique number.

The big question these days seems to be, what is that number? Today, I'd like to look at several ways of determining it and see which is best.

The test


This is something I've been curious about for a while, so I took as many BABIP estimators as I could think of and decided to put them up against each other to see which does the job of predicting the following year's BABIP the best.

The combatants


  • Previous year BABIP (BABIP): This is simply the player's BABIP from the previous year.
  • Expected BABIP (xBABIP): This is a BABIP model created by Chris Dutton and Peter Bendix, introduced at THT last month in this article. xBABIP is the primary reason for this article as I have been very curious how well our newest model does what it intends. Also, please note that Chris has tweaked the model a little since the original article ran. Please check the bottom of this article for more details.
  • Quick Expected BABIP (qxBABIP): As Dutton and Bendix's xBABIP includes some stats that aren't readily available to the casual fan, they've created a simplified version using stats that are readily available, of course, at the (expected) expense of accuracy.
  • Line drive BABIP (ldBABIP): This is the one that gets the most play. Everyone seems to be using it these days, but for reasons I've explained many times before, I'm not a fan. It's calculated as line drive rate plus .120.
  • Studes BABIP (studesBABIP): Dave Studeman created this one around the same time he put out Line drive BABIP, but it doesn't get nearly the same attention. It's not a whole lot more difficult to calculate, but it uses more than one variable. It's calculated as 0.245 + 0.52 * LD% - 0.16 * FB% + 0.11 * K%.
  • Expected Batting Average BABIP (xBA BABIP): This one is the BABIP portion of Baseball HQ's Expected Batting Average (xBA) statistic. I should note that this uses HQ's SX stat, which I couldn't replicate precisely. I was, however, able to get it very close. Also, because SX and PX are indexes based on league (American/National) average, for players switching teams mid-year, I weighted each based on games spent with each team.
  • Marcels BABIP (mBABIP): This isn't so much an estimator as a projection, but I thought it would be good to include for context. It's simply what Marcels projects for the following year. It's also currently what I'm using in my True Batting Average calculations.
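The two estimators with published formulas in the list above can be sketched in a few lines. This assumes the rates are entered as decimals (a 20 percent line drive rate is 0.20); the function names and the sample hitter are hypothetical, and exactly which denominators LD%, FB% and K% use is not specified here.

```python
def ld_babip(ld_rate):
    """Line drive BABIP: line drive rate plus .120."""
    return ld_rate + 0.120


def studes_babip(ld_rate, fb_rate, k_rate):
    """Studes BABIP: 0.245 + 0.52 * LD% - 0.16 * FB% + 0.11 * K%."""
    return 0.245 + 0.52 * ld_rate - 0.16 * fb_rate + 0.11 * k_rate


# Made-up hitter: 20% line drives, 40% fly balls, 20% strikeouts.
print(round(ld_babip(0.20), 3))
print(round(studes_babip(0.20, 0.40, 0.20), 3))
```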

The process


I used data from 2004 to 2008, matching players from one year to the next. As xBABIP was the reason for doing the study, I had to work around that a little bit. xBABIP wasn't calculated for anyone with fewer than 300 plate appearances, so I made that the cut-off for both year one and year two. There are some biases with using cut-offs, but there's no way around it in this instance.

From there, I adjusted each stat for differences in league average and ran a couple of tests. You can see the results below.
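For readers who want to try this kind of comparison themselves, here is a rough sketch of the three tests reported in the results table: correlation, R-squared (the correlation squared), and average error. It assumes "average error" means the mean absolute difference between the estimate and the following year's BABIP. The function name and sample numbers are made up, and the league-average adjustment is left out.

```python
from math import sqrt


def evaluate(estimates, actuals):
    """Return (correlation, r_squared, average_error) for paired values."""
    n = len(estimates)
    mean_e = sum(estimates) / n
    mean_a = sum(actuals) / n
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(estimates, actuals))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_a = sum((a - mean_a) ** 2 for a in actuals)
    r = cov / sqrt(var_e * var_a)  # Pearson correlation
    avg_error = sum(abs(e - a) for e, a in zip(estimates, actuals)) / n
    return r, r * r, avg_error


# Made-up year-one estimates and year-two BABIPs for four hitters.
est = [0.300, 0.320, 0.280, 0.310]
act = [0.310, 0.330, 0.270, 0.300]
r, r2, err = evaluate(est, act)
print(f"r={r:.2f}  r^2={r2:.2f}  avg error={err:.3f}")
```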

The results


+---------------+-------+--------+---------+---------+-------------+-----------+--------+
| TEST          | BABIP | xBABIP | qxBABIP | ldBABIP | studesBABIP | xBA BABIP | mBABIP |
+---------------+-------+--------+---------+---------+-------------+-----------+--------+
| Correlation   | 0.38  |  0.50  |   0.45  |   0.20  |       0.32  |     0.40  |  0.46  |
| R-Squared     | 0.14  |  0.25  |   0.20  |   0.04  |       0.10  |     0.16  |  0.21  |
| Average Error | 0.028 |  0.021 |   0.022 |   0.029 |       0.024 |     0.022 |  0.022 |
+---------------+-------+--------+---------+---------+-------------+-----------+--------+

As you can see, there's a pretty clear pecking order in these results:
+------+-------------+
| RANK |   ESTIMATOR |
+------+-------------+
|    1 |      xBABIP |
+------+-------------+
|    2 |      mBABIP |
|    3 |     qxBABIP |
+------+-------------+
|    4 |   xBA BABIP |
|    5 |       BABIP |
+------+-------------+
|    6 | studesBABIP |
+------+-------------+
|    7 |     ldBABIP |
+------+-------------+

I've also broken things down by tiers. Dutton and Bendix's xBABIP seems to be the best, and I can only imagine what looking at multiple years of it would do. Just one year of data can explain 25 percent of the change in BABIP, a very big number for a stat with such wide variability. That it beats three years' worth of Marcels data (plus regression to the mean and age adjustments) is excellent as well.

After that come Marcels (which I had been using until now) and the quick version of xBABIP. I should note that the quick version here doesn't include a team adjustment that isn't hard to apply; I left it out for some logistical reasons, but it would likely improve the accuracy a bit. It's very nice to see the quick version grade out so well, since it will be easy to calculate in-season (although thanks to Sal Baxamusa, Marcels isn't very difficult either).

Then comes Baseball HQ (which Average Error thinks belongs in tier two) and actual BABIP, followed by Dave's more complex BABIP estimator (which was derived back at the beginning of 2005 when we were first starting to work with batted ball data).

Finally, line drive BABIP, which is arguably the most popular measure on this list, comes in dead last, well below everything else and significantly worse than simply using actual BABIP. I've long said that I dislike this way of estimating BABIP, and it's very nice to see the tests confirm it.

Going forward


Going forward, I'll be using xBABIP in place of Marcels BABIP in my True Batting Average calculations and when discussing a player's BABIP in general. I'm committed to giving you guys the best there is, and Chris and Peter's model is tops among any BABIP estimator that I know of. If you missed the original article, I'd definitely recommend you go back and read it.

Some notes from Chris Dutton


Chris worked a lot with me on this, and I really appreciate his receptiveness and helpfulness. Here are some things he wanted me to pass along.

First, he has changed the model a bit since the original article. Here are the exact changes and his explanation of them:

Old formula: Hitter eye, Pitches per extra-base hit, LD%, FB/GB, Speed score, Contact rate, Spray, Pitches per AB
New formula: HR/FB, IF/FB, LD%, FB/GB, Speed score, Lefty*(FB/GB%), Contact rate, Spray

The differences are basically that I used hr/fb as a measure of power rather than pitches per extra base hit, added popups/FB to measure poorly hit balls, and included an interaction variable of lefty*(fb/gb%) to adjust for the fact that lefty ground ball hitters tend to often hit balls to the right side of the field (which rarely become hits). I also removed pitches_per_AB, which seemed to be potentially correlated with other variables, and removed hitter_eye since contact rate seemed to be capturing a very similar effect.

Chris also says that he isn't done improving the model. He is constantly looking for ways to improve it even further, and is specifically hoping to incorporate some PITCHf/x data as soon as possible.

Finally, Chris is developing a tool that would allow readers to easily calculate the quick version of xBABIP. This would prove to be useful in-season when we constantly need to be changing our evaluation of hitters. While constantly calculating things like Spray would be time-consuming and difficult, the quick version utilizes stats that are all readily available and — as the tests show — is still effective. The tool also has some other cool features: interactive graphs, projected stat lines, and some other things you might find useful.

References and resources


Expected BABIP and Quick Expected BABIP data were provided to me by Chris Dutton. A big thanks to him for his help and also for helping to create such an excellent stat.

Marcels BABIP was taken from Tango's site. The rest of the stats I calculated myself.

Posted by Derek Carty at 3:17am (17) Comments

Dollar specials


In this article I'll be looking at a few end gamers who can provide some good value for 2009. By end gamers I mean players you can get for $1 near the end of your auction or at the end of your draft. One thing to keep in mind when taking these players is that often times one ends up evaluating these players based on a limited sample size. For example, you may have cut a player like Jason Kubel early last year only to end up seeing him put up a pretty solid season. This is a topic for another article some time but is something to keep in mind.

Nelson Cruz, OF, Texas: Cruz is a very interesting case. He gets really good projections by pretty much every projection system out there and is slated to start on opening day for Texas. However, part of me feels that he may be overvalued come draft day. There's a little bit of uncertainty around his projections and a lot of owners out there will be looking for the next Ryan Ludwick. Cruz figures to be one of the more popular picks to be that guy. Despite this, if you can get him for cheap or towards the end of your draft, I would highly recommend it.

Joey Devine, RP, Oakland: Brad Ziegler is likely to be Oakland's closer come opening day. However, look for him to face some regression to the mean. Meanwhile, Joey Devine figures to get the first crack at a closer's job should Ziegler falter. Devine displayed a very good skill set in his first season in Oakland. While we have a bit of a limited major league sample size on him, Devine's scouting reports tend to back up his performance from 2008.

Boof Bonser, SP/RP, Minnesota: Look past the ERA from his last two years. Bonser was troubled by a lot of bad luck over the past few years, including a ridiculously low 57.9% LOB. He has displayed solid skills over the past few years and should see some better luck with his more dependent pitching statistics. The Twins' rotation could be tough to crack, but you never know what could happen with injuries. You figure this would be a perfect opportunity for a major league team to buy low and trade for Bonser as well, which would improve his fantasy stock tremendously.

Yusmeiro Petit, SP, Arizona: Petit has always put up some pretty good minor league numbers. However, scouts have always questioned how well his minor league stats would translate to the majors. Petit struggled initially in his first few major league appearances but showed glimmers of some upside in 2008. There are concerns with his flyball rate but if Petit keeps his skills growth up, there is some solid upside here.

Gary Sheffield, OF/Util, Detroit: Sheffield had a pretty brutal year, on the surface at least. However, he had some bad luck with a poor hit rate, though, granted, he did have a pretty low line drive rate. Sheffield still has a solid set of secondary skills, with solid walk and strikeout rates and decent power. Look for him to bounce back this year. While he does have some potential to collapse, you can deal with that sort of risk if you commit to him at the end of a draft or for cheap at an auction.

Russell Branyan, 3B, Seattle: Branyan has a decent chance to start on the right side of a first base platoon for Seattle. You know what you're going to get with Branyan: a mediocre batting average but very good power. If he gets consistent playing time, Branyan can really help with your power numbers. However, you'll need to get some additional batting average support from somewhere else.

Cha Seung Baek, SP, San Diego: Baek was a nice pickup from Seattle last year. His lack of a true out pitch limits his upside potential, but he showed a solid skill set last year. Of course, it helps to be pitching in PETCO Park. Baek should be a solid addition to the end of a fantasy rotation, though he might not be able to help much with your win totals.


Posted by Victor Wang at 3:50am (8) Comments

Tuesday, January 27, 2009

Consistency meter: Alfonso Soriano


Five-tool players are extra valuable for one simple reason: They have a positive impact on every category. Only a handful of players are truly five-faceted, and even fewer keep up this standard of production over a period of time. That is why Alfonso Soriano is so impressive; he has been a five-tool player his entire career.
Soriano sporting a uniform that would make any fantasy owner proud (Icon/SMI)

+------+-----+-----------+-----+-------+----+-----+-----+----+
| YEAR | AGE | TEAM      | AB  | BA    | HR | RBI | R   | SB |
+------+-----+-----------+-----+-------+----+-----+-----+----+
| 2004 |  28 | Rangers   | 608 | 0.280 | 28 |  91 |  77 | 18 |
| 2005 |  29 | Rangers   | 637 | 0.268 | 36 | 104 | 102 | 30 |
| 2006 |  30 | Nationals | 647 | 0.277 | 46 |  95 | 119 | 41 |
| 2007 |  31 | Cubs      | 579 | 0.299 | 33 |  70 |  97 | 19 |
| 2008 |  32 | Cubs      | 453 | 0.280 | 29 |  75 |  76 | 19 |
+------+-----+-----------+-----+-------+----+-----+-----+----+

Soriano certainly has had an impressive prime, but looking at the end of the above chart gives the impression that perhaps his golden years are behind him as he enters his ninth MLB season.

Compounding the concern is his recent injury history, something he never experienced in his first six seasons, save a minor hamstring injury in 2004 that has been recurring throughout his career. His broken finger in 2008 did result from a hit by pitch, meaning I would label it a fluke, but nevertheless I would call Soriano a medium injury risk in 2009.

With all this in mind, let's look at his stats to see what we can expect from him in the upcoming season.

Power ability


For a leadoff hitter Soriano has always been notoriously powerful, setting the record for most home runs to lead off a game in 2003. His power totals peaked in 2006, when he hit an astounding 46 home runs, but have since returned to normal, settling in around 30.

If you're new to THT Fantasy Focus and are unfamiliar with True Home Runs (tHR) or any of the other stats I'm using, check out our quick reference guide. These stats provide a much clearer picture of a player's talent, so it's well worth taking a couple of minutes to learn them.
+------+-----+-----------+-----+----+-----+-------+--------+--------+--------+
| YEAR | AGE | TEAM      | AB  | HR | tHR | HR/FB | tHR/FB | nHR/FB | OF/FB% |
+------+-----+-----------+-----+----+-----+-------+--------+--------+--------+
| 2006 |  30 | Nationals | 647 | 46 |  33 |    20 |     14 |     17 |     47 |
| 2007 |  31 | Cubs      | 579 | 33 |  28 |    17 |     14 |     14 |     44 |
| 2008 |  32 | Cubs      | 453 | 29 |  24 |    18 |     15 |     14 |     45 |
+------+-----+-----------+-----+----+-----+-------+--------+--------+--------+

Soriano seems to have the ability to outperform his True Home Run (tHR) totals by about five—a few other players just seem to be able to do that—so I would not consider Soriano's totals the past two years out of line. The other good news is that if we extrapolate Soriano's totals from the past two years to a full season's worth of play (remember he missed time in both years), we get 35 home runs for 2007 and 32 for 2008.

Now for the bad news. Soriano will be 33 years old next season, so I would expect about three fewer home runs from him simply because of his age. And then there is the ominous downward trend in power ability. I cannot say for sure (not that I can say anything for sure) if the downward trend will continue, but I can try to explain part of it. To do that, we will need to look more closely at his batted ball data.

+---------+---------+------+-----+--------+-----+-----+--------+-----+
| LAST    | FIRST   | YEAR | AB  | OF/FB% | FL% | LD% | IF/FB% | GB% |
+---------+---------+------+-----+--------+-----+-----+--------+-----+
| Soriano | Alfonso | 2006 | 647 |     42 |  11 |  14 |      4 |  29 |
| Soriano | Alfonso | 2007 | 579 |     41 |   9 |  14 |      2 |  34 |
| Soriano | Alfonso | 2008 | 453 |     36 |  22 |  11 |      3 |  29 |
+---------+---------+------+-----+--------+-----+-----+--------+-----+

While it appears that Soriano's outfield fly ball percentage (OF/FB%) has been stable, if we include the ever-intriguing fliner into the mix, a decline in OF/FB percentage appears. An admittedly slight decline, but enough to explain part of his recent drop in power.

Just as I cannot say if his decline in power will continue, I also cannot substantiate any claim that his fliner percentage (FL%) will remain at 22 percent in 2009. Twenty-two percent is well above the average 11 percent fliner rate, but also not extreme enough to guarantee a meaningful regression.

To clarify (and I invite you to skip this paragraph if you feel you understand): when looking for home runs, outfield fly balls are optimal because they go over the wall the highest percentage of the time compared to the other types of batted balls. Fliners are better overall in terms of run production, but do not become home runs very often. So when Soriano hits 13 percentage points more fliners and five percentage points fewer outfield fly balls, as he did in 2008, his home run totals are going to suffer, by about three home runs. If he were to hit fewer fliners in 2009, his outfield fly ball percentage almost surely would tick up a few points, resulting in a couple more home runs.

From this information—despite all the uncertainty mentioned above—we still can come up with a reasonably small range for Soriano's expected home run total in 2009. My low end projection is 24 home runs. In making this prediction I expected the same high fliner percentage and some time missed because of injuries. The high end projection is 32 home runs, which is assuming good health for the most part.

Overall I expect Soriano's power totals to remain the same as last year, around 28 home runs. Let's now check on that batting average of his.

Contact ability


+------+-----+-----------+-----+-------+-------+-----+-------+--------+-----+--------+---------+
| YEAR | AGE | TEAM      | AB  | BA    | tBA   | CT% | BABIP | mBABIP | LD% | BIP/HR | BIP/tHR |
+------+-----+-----------+-----+-------+-------+-----+-------+--------+-----+--------+---------+
| 2006 |  30 | Nationals | 647 | 0.277 | 0.255 |  75 | 0.302 |  0.300 |  20 |     11 |      15 |
| 2007 |  31 | Cubs      | 579 | 0.299 | 0.273 |  78 | 0.337 |  0.313 |  20 |     14 |      16 |
| 2008 |  32 | Cubs      | 453 | 0.280 | 0.274 |  77 | 0.305 |  0.312 |  23 |     12 |      15 |
+------+-----+-----------+-----+-------+-------+-----+-------+--------+-----+--------+---------+

First off, if you are wondering why Soriano is also outperforming his True Batting Average (tBA), it is because tBA uses the tHR numbers. If we cancel out the home run noise his tBAs for 2006, 2007 and 2008 become .267, .280 and .283 respectively. That makes things look a bit rosier, but besides that not much else is happening. Let's look at those plate discipline stats.

+------+-----+-----------+-----+-----+------------+------+-------------+----------+
| YEAR | AGE | TEAM      | AB  | CT% | JUDGMENT X | A/P  | BAT CONTROL | BAD BALL |
+------+-----+-----------+-----+-----+------------+------+-------------+----------+
| 2006 |  30 | Nationals | 647 |  75 |         85 | 0.65 |          85 |       52 |
| 2007 |  31 | Cubs      | 579 |  78 |         93 | 1.07 |          84 |       57 |
| 2008 |  32 | Cubs      | 453 |  77 |         90 | 0.94 |          83 |       57 |
+------+-----+-----------+-----+-----+------------+------+-------------+----------+

Soriano has below-average judgment, and makes about as many passive mistakes as aggressive ones (A/P). Interestingly, he was more passive in 2006, his monster home run year. He is about average in terms of hitting balls inside the zone (Bat Control) and is exactly league average at hitting balls outside of it (Bad Ball).

Soriano will never be a great average hitter, but for the time being it appears that he still will bat somewhere in the .270s. If you feel he will hit the high end of his power projection, expect something closer to .280, and if you feel the opposite, think high .260s.

Speed ability

+------+-----+-----------+-----+----+-----+-------+------+-----+-----------+-------------+
| YEAR | AGE | TEAM      | AB  | SB | SBA | SBO%  | SBA% | SB% | FAN SPEED | FAN BALLOTS |
+------+-----+-----------+-----+----+-----+-------+------+-----+-----------+-------------+
| 2004 |  28 | Rangers   | 608 | 18 |  23 | 0.220 |   16 |  78 |        84 |           8 |
| 2005 |  29 | Rangers   | 637 | 30 |  32 | 0.186 |   25 |  94 |         0 |           0 |
| 2006 |  30 | Nationals | 647 | 41 |  58 | 0.206 |   39 |  71 |        84 |          27 |
| 2007 |  31 | Cubs      | 579 | 19 |  25 | 0.201 |   20 |  76 |        82 |          32 |
| 2008 |  32 | Cubs      | 453 | 19 |  22 | 0.211 |   21 |  86 |        73 |          56 |
+------+-----+-----------+-----+----+-----+-------+------+-----+-----------+-------------+

Soriano has stolen at least 18 bases in each of the five seasons shown above, and certainly has the potential to steal a lot more. Recurring leg injuries have prevented him from reaching his potential, and since joining the Cubs he has been attempting to steal a lower percentage of the time (SBA%). If his legs are not bothering him in 2009, I could see Soriano stealing up to 30 bases, but more realistically I see him getting about 15-20 steals.

Final thoughts


If you were nervous about selecting Soriano in your draft, I would not worry because it seems that this will not be the year he burns out. My biggest concern with him is injuries, and he is not even a high risk compared to plenty of other players.

If you want him on your team, you will have to take him early. In the mock drafts I've seen and participated in, he has gone as high as 16th overall and as low as 34th. With my second-round pick, I probably would not pull the trigger on him but in the third I would have no reservations selecting him.

Posted by Paul Singman at 1:01am (5) Comments

Wednesday, January 28, 2009

Introducing: CAPS Road Park Factors


Everyone knows Dan Haren is a great pitcher, but could he be even better than we think?
(Icon/SMI)

As I'm sure all of our regular readers know (and all the ones Rob Neyer sent over here), a couple of weeks ago I unveiled a new stat that I called CAPS (Context Adjusted Pitching Statistics). If you aren't familiar, I'd definitely suggest checking out that article, but to briefly summarize, CAPS adjusts a pitcher's peripheral numbers based on a number of different contexts to give us a better idea about what that pitcher should be expected to do going forward.

Up until now, CAPS adjusted for home ballpark, quality of batters faced, and any league change. Today, I'd like to add one more adjustment to the mix: road ballpark factors.

As we know (and as David Gassko covered thoroughly in this article) ballparks can have a significant effect on just about every stat we fantasy players look at: everything from runs and home runs to strikeouts, walks and ground balls. For the majority of baseball players, we tend to ignore these effects because they remain on the same team from one year to the next. The context remains exactly the same, so it has no bearing on our expectations.

Have you ever considered, though, what the effects might be of all of the games that are played on the road? While a player may play all of his home games in the same environment, his mix of road stadiums will undoubtedly differ from year to year. When most everyone talks about ballpark factors, they talk about the home ballpark impacting the numbers, working under the assumption that the road side is completely neutral. This, however, is simply not true.

A pitcher who happens to throw a disproportionate number of times in PETCO and AT&T Park will be helped in the home run department simply as a matter of context, the same as a pitcher who throws too often in Coors and Chase Field will be hurt. So with the help of Retrosheet, I've calculated an individualized "Road Park Factor" for each pitcher using his exact blend of road ballparks (and the time spent in each) for every year back to 2004 and for every stat we care about, neutralized it, and then applied a 2009 factor based on the exact 2009 road schedule for every team.

Method


Only read this if you're interested in hearing a little more about exactly what I did. If you're not interested, you know all you need to and can skip to the next section.

The method used is pretty intuitive, but to elaborate just a bit further, I calculated each pitcher's road park factor by weighting each park he played in depending on the number of opportunities he had to accumulate each stat. To come up with the strikeout factor, for instance, I looked at every non-HBP batter faced. For all types of hits and batted balls, I looked at all fair, contacted balls. (If it makes it easier, think about it in terms of Pizza Cutter's flow chart.) Once I arrived at the factor, I simply applied it to each pitcher's road stat line.

The one other note I need to make deals with batted-ball types (ground balls, infield flies, etc.). Because Retrosheet classifies these differently than Baseball Info Solutions does, I wasn't able to apply these factors to the pitcher's road stat line. Instead, for batted balls, I had to cut the pitcher's full-year line in half, applying the road factors to one half and the home factors to the other half. It shouldn't make that much difference, but it does need to be noted.

After coming up with these factors and neutralizing the player's line for each year, I then took each team's 2009 road schedule and combined the ballparks appropriately. I then applied this factor to every year we'll look at to put all of the numbers into the context of 2009, which is what we care about.
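For readers who like to see the arithmetic, the weighting described in this section can be sketched roughly as follows. This is a minimal illustration and not the actual CAPS code: the function names, the park labels, the factor values, and the convention that a factor above 1.00 inflates a stat (so neutralizing means dividing by it) are all assumptions made for the example.

```python
def road_park_factor(opportunities, park_factors):
    """Opportunity-weighted average of park factors over a pitcher's road parks.

    opportunities: {park: chances to accumulate the stat in that park}
    park_factors:  {park: factor for that stat, where 1.00 is neutral}
    """
    total = sum(opportunities.values())
    if total == 0:
        raise ValueError("pitcher has no road opportunities")
    weighted = sum(n * park_factors[park] for park, n in opportunities.items())
    return weighted / total


def to_2009_context(road_stat, past_factor, factor_2009):
    """Neutralize a road stat, then restate it for the 2009 road schedule."""
    neutral = road_stat / past_factor  # assumed convention: divide to neutralize
    return neutral * factor_2009


# Hypothetical strikeout opportunities (non-HBP batters faced) by road park.
opportunities = {"Park A": 60, "Park B": 40}
# Hypothetical strikeout park factors, invented for this example.
k_factors = {"Park A": 1.05, "Park B": 0.90}

past_factor = road_park_factor(opportunities, k_factors)
adjusted_k = to_2009_context(100, past_factor, factor_2009=1.02)
print(round(past_factor, 3), round(adjusted_k, 1))
```

The same two functions would be reused for each stat, swapping in the right opportunity count (batters faced for strikeouts, fair contacted balls for hit types).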

CAPS: Where we're at


To summarize, the CAPS numbers you'll be seeing going forward take all of the following into account:
  • Past home ballpark
  • 2009 home ballpark
  • Past road ballparks
  • 2009 road ballparks
  • Past quality of opponents (neutralized)
  • League switch adjustments
  • Ground balls adjusted for league average line drive rate (called xGB)

How large is the road ballpark impact?


As I noted earlier, baseball analysts have long ignored road park factors, assuming these things are neutral. While logically we know this isn't true, could it just be that the effects are so small that this is a fair assumption to make? Let's take a look at the leaders and trailers for 2008 and find out.

Note: The "a" before each stat in the third column stands for "adjusted." This is what the player's stat would look like if it were neutralized for road park. DIFF is the absolute difference between the raw and adjusted numbers. Also, because more strikeouts are good and more walks, homers, and hits are bad, the tables are arranged so that the five unluckiest are always on top and the five luckiest are always on the bottom, regardless of stat.
+------------------+-----+-------+------+      +-----------------+-----+------+------+
| PLAYER           | K   | aK    | DIFF |      | PLAYER          | BB  | aBB  | DIFF |
+------------------+-----+-------+------+      +-----------------+-----+------+------+
| Gil Meche        | 104 | 108.5 |  4.5 |      | Miguel Batista  |  34 | 33.0 |  1.0 |
| Ubaldo Jimenez   | 100 | 104.5 |  4.5 |      | Jeff Suppan     |  33 | 32.1 |  0.9 |
| Jorge de la Rosa |  62 |  65.9 |  3.9 |      | Manny Parra     |  30 | 29.1 |  0.9 |
| Zack Greinke     |  93 |  96.8 |  3.8 |      | Felix Hernandez |  32 | 31.1 |  0.9 |
| Josh Beckett     | 105 | 108.2 |  3.2 |      | Dave Bush       |  24 | 23.1 |  0.9 |
| ..................................... |      | ................................... |
| Chad Billingsley |  89 |  86.3 |  2.7 |      | Zach Duke       |  21 | 21.6 |  0.6 |
| Felix Hernandez  |  78 |  75.3 |  2.7 |      | Phil Dumatrait  |  24 | 24.7 |  0.7 |
| Jake Peavy       |  67 |  64.2 |  2.8 |      | Paul Maholm     |  29 | 29.8 |  0.8 |
| Cole Hamels      | 106 | 102.7 |  3.3 |      | Tom Gorzelanny  |  34 | 34.9 |  0.9 |
| Ricky Nolasco    | 106 | 101.5 |  4.5 |      | Ian Snell       |  44 | 45.3 |  1.3 |
+------------------+-----+-------+------+      +-----------------+-----+------+------+

+----------------+------+--------+------+       +-----------------+-----+------+------+
| PLAYER         | H-HR | aH-aHR | DIFF |       | PLAYER          | HR  | aHR  | DIFF |
+----------------+------+--------+------+       +-----------------+-----+------+------+
| Jon Lester     |   93 |   90.1 |  2.9 |       | Javier Vazquez  |  14 | 12.6 |  1.4 |
| Jon Garland    |   99 |   96.3 |  2.7 |       | Shaun Marcum    |  14 | 12.6 |  1.4 |
| Josh Beckett   |   78 |   75.8 |  2.2 |       | Brett Myers     |  16 | 14.6 |  1.4 |
| Joe Saunders   |   75 |   72.9 |  2.1 |       | Aaron Harang    |  16 | 14.7 |  1.3 |
| Jered Weaver   |   88 |   86.0 |  2.1 |       | Gavin Floyd     |  12 | 10.7 |  1.3 |
| ..................................... |       | ................................... |
| Kyle Kendrick  |   97 |   99.6 |  2.6 |       | Scott Olsen     |  18 | 19.0 |  1.0 |
| Mike Mussina   |   87 |   89.8 |  2.8 |       | Jon Garland     |  12 | 13.0 |  1.0 |
| Andy Pettitte  |  103 |  106.0 |  3.0 |       | Nate Robertson  |  16 | 17.0 |  1.0 |
| Cliff Lee      |  113 |  116.4 |  3.4 |       | Todd Wellemeyer |  13 | 14.0 |  1.0 |
| Carlos Silva   |  121 |  124.8 |  3.8 |       | Brian Bannister |  14 | 15.0 |  1.0 |
+----------------+------+--------+------+       +-----------------+-----+------+------+

Looking at our four leaderboards (the one on the bottom left represents all singles, doubles and triples, if it isn't clear), we can see that the effects aren't huge, but they are there. Obviously the biggest raw differences are seen with strikeouts and hits because they are more numerous to begin with, but these effects are pretty large even in a relative sense.

With 4.5 more strikeouts, Gil Meche's K/9 would have jumped 0.2 points, from 7.8 to 8.0. Two-tenths of a point of previously unaccounted-for K/9 is huge. In terms of walks, the effects are much smaller, with Felix Hernandez's BB/9 falling from 3.59 to just 3.55 and Miguel Batista's from 6.18 to 6.11. Even Ian Snell's would only have risen 0.08 points.
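To see how a handful of strikeouts turns into a rate change, here is a small sketch of the per-nine arithmetic. The innings total below is an assumed, round full-season workload chosen for illustration, not a figure taken from the tables above.

```python
def per_nine(count, innings):
    """Convert a raw count into a per-nine-innings rate."""
    return 9.0 * count / innings


innings = 210.0          # assumed workload, for illustration only
extra_strikeouts = 4.5   # road adjustment from the strikeout table

# Because the rate is linear in the count, the shift in K/9 is just the
# per-nine rate of the extra strikeouts.
shift = per_nine(extra_strikeouts, innings)
print(round(shift, 2))
```

Under that assumption the shift comes out a little under two-tenths of a point, which is in line with the rounded 7.8-to-8.0 move quoted above.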

Looking at home runs, though, we see some big changes. Aaron Harang's HR/FB would have fallen from 15.3 to 14.7, which explains a sizable portion of his unlucky-looking HR/FB this year. It's very nice to be able to write it off to a specific cause instead of simply to "bad luck" (although it wouldn't really be wrong to do so).

Of course, we're dealing with the extremes, but you can see that the assumption that road effects are neutral is simply not true. Also, while these effects won't be very large for many players, the whole point is to add this onto our current CAPS system. When we combine all of the different effects—even if any one is small in isolation—we can see some big differences in value. And that, I believe, is what fantasy leaguers care about. If this can highlight for us just a few undervalued players or help us to avoid a few overvalued ones, this becomes a powerful, powerful tool.

Also of interest (albeit perhaps more to the non-fantasy crowd) are the groupings, which some of you may have picked up on. All five of the luckiest in walks are Pirates. Three of the unluckiest with walks are Brewers. The unluckiest with hits are all Red Sox and Angels. Two of the unluckiest with strikeouts are Royals and two are Rockies. As this started as an exercise to determine "divisional park effects" (the inspiration for which came from commenter Nick on the original CAPS article), it's not surprising to see players of the same team appear on the lists together.

Derek Lowe


I didn't get a chance to post a full article about Lowe when he signed with the Braves, so we'll take a quick look at him now.
+------+------+-------+---------+-------+------+------+------+---------+------+-------+-------+
| YEAR | LAST | FIRST | TEAM    | IP    | QERA | K/9  | BB/9 | K/BB RI | xGB% | BABIP | HR/FB |
+------+------+-------+---------+-------+------+------+------+---------+------+-------+-------+
| 2005 | Lowe | Derek | Dodgers | 222.0 | 3.88 | 5.92 | 2.23 |    0.04 | 59.4 | 0.286 |  21.4 |
| 2005 | Lowe | Derek | Braves* | 222.0 | 3.91 | 6.06 | 2.16 |    0.05 | 59.2 | 0.276 |  20.0 |
+------+------+-------+---------+-------+------+------+------+---------+------+-------+-------+
| 2006 | Lowe | Derek | Dodgers | 218.0 | 4.09 | 5.08 | 2.27 |   -0.19 | 63.8 | 0.293 |  12.3 |
| 2006 | Lowe | Derek | Braves* | 218.0 | 4.15 | 5.05 | 2.17 |   -0.22 | 63.6 | 0.281 |  11.0 |
+------+------+-------+---------+-------+------+------+------+---------+------+-------+-------+
| 2007 | Lowe | Derek | Dodgers | 199.3 | 3.75 | 6.64 | 2.66 |    0.13 | 62.8 | 0.292 |  17.9 |
| 2007 | Lowe | Derek | Braves* | 199.3 | 3.76 | 6.62 | 2.54 |    0.11 | 62.6 | 0.281 |  15.8 |
+------+------+-------+---------+-------+------+------+------+---------+------+-------+-------+
| 2008 | Lowe | Derek | Dodgers | 211.0 | 3.65 | 6.27 | 1.92 |    0.15 | 57.8 | 0.286 |  10.1 |
| 2008 | Lowe | Derek | Braves* | 211.0 | 3.62 | 6.16 | 1.62 |    0.18 | 58.3 | 0.276 |   9.0 |
+------+------+-------+---------+-------+------+------+------+---------+------+-------+-------+

Nothing of much interest here. Lowe's adjustments are minimal, as will be the case with a lot of players. I normally try writing about the players who are more interesting, but as Lowe is a guy I'm sure many of you have been wondering about ... ta-da! As I said earlier, though, the value in the CAPS system won't be the guys that it values the same, but rather the guys who it sees a big difference in. Check out our next player for a case like that.

Lowe is obviously an extreme groundball pitcher, though he does manage to strike out about as many batters as a league-average pitcher. This has value in fantasy leagues, as does his mid-3.00s ERA. Overall, taking Lowe somewhere in rounds 12 to 15 of a traditional, 12-team mixed league should get you a fine player. He seems to struggle a little with home runs, but he keeps his BABIP pretty low, and the move from the Dodgers to the Braves could help. The UZR difference between the two was 4.8 per 150 last year.

Dan Haren


We covered Haren in the original CAPS article, but he seems to have also caught a string of bad luck on the road—particularly with strikeouts—and we didn't look at all of his numbers last time.
+------+-------+----------+-------+------+-----+------+---------+------+-------+-------+
| YEAR | LAST  | TEAM     | IP    | QERA | K/9 | BB/9 | K/BB RI | xGB% | BABIP | HR/FB |
+------+-------+----------+-------+------+-----+------+---------+------+-------+-------+
| 2006 | Haren | A's      | 223.0 | 3.75 | 7.1 |  1.8 |    0.44 |   45 | 0.292 |  14.2 |
| 2006 | Haren | D'Backs* | 223.0 | 3.49 | 7.6 |  1.6 |    0.57 |   45 | 0.299 |  16.0 |
+------+-------+----------+-------+------+-----+------+---------+------+-------+-------+
| 2007 | Haren | A's      | 222.7 | 3.71 | 7.8 |  2.2 |    0.52 |   44 | 0.292 |  10.5 |
| 2007 | Haren | D'Backs* | 222.7 | 3.54 | 8.4 |  2.3 |    0.63 |   43 | 0.301 |  11.9 |
+------+-------+----------+-------+------+-----+------+---------+------+-------+-------+
| 2008 | Haren | A's      | 216.0 | 3.17 | 8.6 |  1.7 |    0.81 |   45 | 0.308 |   9.7 |
| 2008 | Haren | D'Backs* | 216.0 | 2.76 | 9.5 |  1.4 |    1.09 |   46 | 0.310 |   9.3 |
+------+-------+----------+-------+------+-----+------+---------+------+-------+-------+

As you can see, Haren has had some terrible luck for a few years now. In terms of his strikeout rate, he's probably the unluckiest pitcher in baseball over the past three years.

This bad luck wasn't quite as pronounced in past years (and part of those past years' numbers are due to the league change, so he really shouldn't have been expected to post them with the A's), but in 2008 Haren really deserved much better. His strikeout rate was almost a point too low, and his QERA was a ridiculous 2.76. 2008's actual QERA leader was C.C. Sabathia's Brewers stint at 2.89, to put things into perspective.

We shouldn't expect him to post identical numbers in 2009, but he has steadily risen four years in a row, will be 28 years old, and should have a good deal of luck catching up with him. His current Mock Draft Central ADP is 57.39, which would put him near the end of the fifth round in a 12-team league, though I have seen him go in the sixth. I'm not a fan of taking pitchers that early, but if Haren falls into the eighth or ninth round, I don't imagine I'll be passing him up. If the strategy you're employing allows you to take starters earlier than that, Haren seems like a very good choice.
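Converting an ADP into a round is simple arithmetic, and it is easy to slip by a round when doing it in your head. The sketch below uses hypothetical helper names and assumes every round has exactly one pick per team. By this arithmetic, an ADP of 57.39 lands late in the fifth round of a 12-team draft, and the eighth round opens at pick 85.

```python
import math


def draft_round(adp, teams=12):
    """Round in which a given overall pick (or ADP) falls."""
    return math.ceil(adp / teams)


def first_pick_of_round(rnd, teams=12):
    """Overall pick number that opens a given round."""
    return (rnd - 1) * teams + 1


haren_adp = 57.39
rnd = draft_round(haren_adp)
position_in_round = haren_adp - (rnd - 1) * 12  # how deep into that round
print(rnd, round(position_in_round, 2), first_pick_of_round(8))
```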

Concluding thoughts


As I said in the original CAPS article, if you guys have any ideas for further things we could adjust for, feel free to contact me. If you have any questions about CAPS or anything fantasy baseball related, don't hesitate to ask.

Errata


In the original CAPS article, I accidentally applied the home ballpark factors to Javier Vazquez's entire line instead of just the home side. This has been fixed, and the new CAPS numbers (with road ballpark adjustments included) are displayed below. As you can tell, very little changes, and my evaluation remains the same; Vazquez makes a great fantasy pick this year.

Javier Vazquez
+------+-------+------+------+------+---------+------+-------+-------+
| YEAR | IP    | QERA | K/9  | BB/9 | K/BB RI | xGB% | BABIP | HR/FB |
+------+-------+------+------+------+---------+------+-------+-------+
| 2006 | 202.7 | 3.84 |  8.2 |  2.5 |    0.59 |   40 | 0.311 |  10.7 |
| 2006 | 202.7 | 3.37 |  9.4 |  2.3 |    0.87 |   40 | 0.311 |   9.0 |
+------+-------+------+------+------+---------+------+-------+-------+
| 2007 | 216.7 | 3.34 |  8.8 |  2.1 |    0.84 |   38 | 0.294 |  12.1 |
| 2007 | 216.7 | 2.89 | 10.1 |  2.0 |    1.15 |   39 | 0.298 |  10.1 |
+------+-------+------+------+------+---------+------+-------+-------+
| 2008 | 208.3 | 3.76 |  8.6 |  2.6 |    0.62 |   39 | 0.320 |  11.3 |
| 2008 | 208.3 | 3.30 |  9.7 |  2.4 |    0.95 |   39 | 0.319 |   9.6 |
+------+-------+------+------+------+---------+------+-------+-------+

Posted by Derek Carty at 1:05am (6) Comments

Thursday, January 29, 2009

Forecasting


There are lots of fantasy baseball writers offering advice out there. Is it useful to listen to more than one? Some websites charge money for access and all of them use up some of your limited time, so you definitely want to pick your advisors carefully. Advice generally falls into one of two categories: strategy (example: “Don’t pay for saves”) or forecasts (example: “Albert Pujols’ injury concerns are overblown”). In this article, I will write a bit about forecasting and whether it is better to listen to the best, to the stupid or to the many.

Forecasts are predictions of the future, a process which harbors two complications: life is random, so forecasts will never be exact (God does play dice with the universe), and we do not know the exact nature of this randomness (we don’t know what kind of dice God uses). Science is the search for truth: scientists want to know why things are the way they are, so they spend their time trying to figure out which dice God is playing with. Forecasters mostly care about their prediction being as close to the eventual realization as possible; they don’t care about why. (If they are going to make many more predictions, they may want to learn why so that they can do better next time. But for a one-shot prediction, forecasters don’t care about science.)

To see why this distinction matters, consider the following two baseball experts:

Enlightened Stathead has a huge statistical model, years and years’ worth of data, and a supercomputer on loan from NASA which he uses to figure out which variables are statistically significantly correlated to other variables. However, Enlightened Stathead’s model is mis-specified: he’s missing a variable that adjusts for the fact that power has dropped in the league in the post-steroid era. This isn’t a mortal sin; all models are mis-specified in one way or another. If Enlightened Stathead forecasts using his NASA model, he’s wrong on average (this is called bias in statistics), but, depending on the type of model mis-specification, on average he won’t be “very wrong” (he has a low forecast standard error).

Opinionated Idiot has a rather simpler way of forecasting. He watches each player play once in Spring Training and, based on only that data, forecasts. If a pitcher strikes out the side against the Marlins B-squad, he’s projected to be a superstar for the rest of the year. Note that Opinionated Idiot will likely have a very high forecast standard error but very little bias.
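The contrast between the two experts can be made concrete with a small simulation. Everything in it is invented for illustration: the "true" home run total, the size of the Stathead's bias, and the spread of each expert's errors are assumptions, and errors are drawn from a normal distribution purely for convenience.

```python
import random
import statistics


def simulate(true_value, bias, noise_sd, n, rng):
    """Draw n forecasts that miss the truth by a fixed bias plus random noise."""
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]


def summarize(forecasts, true_value):
    """Return (average error, standard deviation of the errors)."""
    errors = [f - true_value for f in forecasts]
    return statistics.mean(errors), statistics.stdev(errors)


rng = random.Random(2009)  # fixed seed so the run is repeatable
truth = 30.0               # hypothetical true home run total

# Stathead: consistently too high on power, but tightly clustered.
stathead = simulate(truth, bias=3.0, noise_sd=1.0, n=10_000, rng=rng)
# Idiot: right on average, but individual forecasts are all over the place.
idiot = simulate(truth, bias=0.0, noise_sd=10.0, n=10_000, rng=rng)

stathead_bias, stathead_se = summarize(stathead, truth)
idiot_bias, idiot_se = summarize(idiot, truth)
print(round(stathead_bias, 2), round(stathead_se, 2))
print(round(idiot_bias, 2), round(idiot_se, 2))
```

With these made-up settings the Stathead's average miss sits near the built-in bias while his spread stays small, and the Idiot shows the reverse pattern.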

What’s the tension? The Stathead uses lots and lots of data; indeed, he needs a lot of data in order to figure out what’s statistically important. This data is by definition historical; maybe he uses a century’s worth of data. The Idiot uses very little data and it is mostly of a recent vintage. With so little data, he can’t figure out much about the world he’s living in. But suppose the world he’s living in changes from time to time (God picks up a different set of dice), making old data potentially much less useful in understanding a new world. Like the Stathead, the Idiot doesn’t know that God has switched dice, but since he doesn’t use old data, he doesn’t care much anyway. In baseball terms, the Idiot might adjust much faster to changes in league-wide steroid use, or at the individual level, his forecasts might adjust much faster to the fact that a pitcher has added a splitter to his arsenal than the Stathead’s would.

So whom should you listen to? Depends on the type of baseball gods we have (and I don’t know). If the baseball gods are the type to pick up new sets of dice very often, then the Idiot may be better. If not, the Stathead will be better.

And how many of them should you listen to? This is an incredibly complex question. Here’s a rule of thumb:

If we live in a Stathead’s world (not many new dice), then it will usually be better to listen to lots of different experts and form some sort of prediction based on the average of their opinions. This is the case even if you know that some of the Statheads are better (lower forecast standard error) than others—it is wise to include all of them in your average.

If we live in an Idiot’s world, then you’re kinda up the river without a paddle.
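One way to see why including the weaker Statheads can still help is to weight each expert by the inverse of his error variance. This is only a sketch of one standard combining scheme, not necessarily the kind of average intended above, and it assumes the experts are unbiased and their errors are independent. The standard errors and forecasts below are invented.

```python
import math


def inverse_variance_weights(sds):
    """Weights proportional to 1 / sd^2, normalized to sum to one."""
    precisions = [1.0 / (sd * sd) for sd in sds]
    total = sum(precisions)
    return [p / total for p in precisions]


def combined_sd(sds):
    """Standard error of the inverse-variance weighted average."""
    return math.sqrt(1.0 / sum(1.0 / (sd * sd) for sd in sds))


def combine(forecasts, sds):
    """Weighted average of the experts' forecasts."""
    weights = inverse_variance_weights(sds)
    return sum(w * f for w, f in zip(weights, forecasts))


# Hypothetical experts: one good, one mediocre, one poor.
sds = [1.0, 2.0, 3.0]
forecasts = [30.0, 34.0, 27.0]

print(round(combine(forecasts, sds), 2))
print(combined_sd(sds) < min(sds))  # blend beats the best single expert
```

Under those assumptions the blended forecast has a smaller standard error than even the best expert alone. A plain equal-weight average does not guarantee that when the experts differ a lot in quality, which is why the weights matter.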



Posted by Jonathan Halket at 2:23am (1) Comments

Managing your money in Rotohog


Rotohog Baseball is a fantasy baseball game with free entry, large prizes, and a unique "stock exchange" trading mechanism. Thousands of players compete in a global contest to see who can accumulate the most points. Like some "salary cap" baseball games, Rotohog gives you the opportunity to turn over your entire roster every day, greatly increasing the importance of taking factors such as opponent and park into account when determining your lineup.

When Rotohog first launched in early 2007, one of its main selling points was its unique ‘stock market’ trading mechanism. A great deal was made of the fact that the pricing algorithms had been designed by Kent Smetters, a University of Pennsylvania professor of economics, who was one of the founders of Rotohog.

During the 2007 season, building up roster value through active trading was one of the keys to success at Rotohog Baseball. From a starting roster value of $300, the top teams all spent the bulk of the season above $700, and most of them consistently had the $900 or more that they needed to acquire whatever players they wanted without any limitations. The greatest key to success was having a lot of free time, and being at the computer to instantly sell players within seconds of games starting. This left most players without any real chance to be competitive, and ultimately led Rotohog to adjust the rules to reduce the importance of roster value the following season.

In 2008, Rotohog used a system of transaction fees tied to roster value to limit how much Rotohog money players would be able to accumulate while actively maintaining their roster. The transaction fees escalated sharply as a team’s roster value increased past various thresholds from $350 (where fees went from $.10 to $.75) up to $450 (where fees went from $2.50 to $4.50).

I knew that my edge over the competition would come from making superior decisions about which daily match-ups to exploit, which meant that I would need to turn over most of my roster each day. That virtually guaranteed that I would be stuck at $350 or below for the entire season. Luckily, I also knew that my opponents would sometimes need to trade players to ensure that they used their entire 162-game allotment at each position. With the extremely high transaction fees, I was confident that no contenders would be able to maintain a roster value above $450. I felt that by actively playing match-ups with a $350 team, I would probably be able to outperform anyone employing a “buy and hold” strategy using a $450 team.

It turned out that I never had to determine if that was true, because none of the top teams managed to stay above $400 for long. In fact, most of us spent almost the entire season between about $330 and $380. The escalating transaction fees had virtually eliminated the stock market trading as a key part of success in the game. I completely ignored roster value for the entire season, planning my transactions entirely to maximize scoring each day, and didn’t suffer at all for it. My ability to do that depended on having a schedule compatible with Rotohog’s trading floor hours, though. I know several skilled players who were not able to trade until shortly before games began each day, and instead of their roster almost magically hovering around $350 for the whole season (as mine did), they often found themselves at the minimum salary of $250, which made fielding a competitive team almost impossible.

Rotohog staff members have indicated that trading floor hours are likely to be one of the few areas of the game format that will be tweaked in 2009. Last year the trading floor didn’t open until noon Eastern. That was a really bad idea, since it meant that on the weekends, people had only a 50-minute window to be at the computer and make roster moves before their lineup would begin to lock in for the day. Rotohog will almost certainly have an earlier start to the trading day this year, or even allow very late night trading as they did in 2007. No matter what time they settle on, players who are able to make transactions shortly after the trading floor opens will have an advantage in maintaining a reasonable roster value. If you can’t generally make trades early in the trading day, you’ll need to start sacrificing some of your daily match-ups in order to avoid always buying your players at their most expensive and selling them at their cheapest. The key to this is not to have players in your lineup the day before their team has an off day (when you’ll get stuck holding them as their price drops), and to try to buy your starting pitchers several days ahead of time to avoid paying peak price.

Note that most of what I’ve discussed here will be affected by whether Rotohog changes the transaction fees to escalate more gradually and whether they still have a five-day ‘holding period’ for starting pitchers. We won’t know for sure what adjustments they’ll make until their official launch date on February 23rd, so for now I’m assuming that there won’t be any changes.

Posted by Alex Zelvin at 4:29am (7) Comments

