March 19, 2010

Fangraphs Player Search:

The "World Champ of Baseball Annuals


Most Recent Comments


For daily results, visit the THT Daily Archive.


And here's the full roster.



Or you can search by:

Sports Tickets

Gear up for baseball season with Chicago White Sox tickets and New York Yankees tickets. LA Angels tickets, Houston Astros tickets, and Atlanta Braves tickets are hot sellers! You can get Boston Red Sox tickets, San Diego Padres tickets or Chicago Cubs tickets for your favorite baseball fan. Coast to Coast Tickets has the best MLB tickets like Minnesota Twins tickets, LA Dodgers tickets, Milwaukee Brewers tickets, New York Met tickets and St. Louis Cardinals tickets.
Find premium Chicago Cubs tickets and other Chicago tickets at JustGreatTickets.com.
Chicago Cubs Tickets
Chicago Tickets
Championship Tickets





Creative Commons License
All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.
Roll mouse over date for entries
THT Live Calendar
November 2009
S M T W T F S
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30




Saturday, November 21, 2009

Building a Retrosheet database, the short form


Today marks the latest release of Retrosheet's event files, including 2009, the 1952 AL and some corrections to other years. This is a guide written for someone who already has an SQL database (MySQL, that is) set up and is comfortable with it. I am in the process of putting together a set of tutorials for those with no SQL experience whatsoever, but I've made a lot of changes to how I set up a database and I know some people would appreciate having those. And mind, this is a tutorial for Windows users. And make sure you have a lot of space free on your hard drive.

You need to install wget and 7-zip. I have made the presumption that you are installing them to C:\Program Files\. If you are not, you will have to edit some batch files.

By the same token, I am assuming that everything else goes in the folder C:\Retrosheet. If you put it somewhere else, you will have to edit some batchfiles. I have created a zip file that contains everything you need. Simply place that file in the C:\Retrosheet folder (you may have to make one first) and use 7-zip to unzip the file there. That file contains EVERYTHING you need except the event files themselves. That includes a copy of Chadwick version 0.5.2, which is used to parse Retrosheet's event files.

WARNING: The database schemas in here will only work with the version of Chadwick provided. You will get errors (or corrupted data) if you try to use an older version of Chadwick to parse these files and then load them into the database.

Go to C:\Retrosheet\data\zipped and run the get_zip_files.bat file. (If you have installed 7-zip or wget somewhere other than the default, you'll need to edit this file.) This will download all the zip files from Retrosheet's website and extract the contents. Then go to the folder C:\Retrosheet\data\zipped, which should contain the files you just extracted. There are three batch files in there:


  • $cwevent.bat
  • $cwgame.bat
  • $cwsub.bat


Run those to parse the event files through Chadwick.

Okay, time to load that database. A warning - a lot of these queries will take a long time. The first thing to do is run retrosheet table schema.sql. That will wipe out anything you have in any database named "retrosheet," so if that's important to you, make a backup before starting. Then, you will find three files in the loaders directory:


  • events.sql
  • games.sql
  • subs.sql


Run those. (In SQLyog, that option is called "Restore From SQL Dump," under the Tools menu.) They will populate the database. (Note - if you are using a different directory structure, you need to edit these files first to reflect that.) Then, run the partition.sql file - that will "partition" the database to make it easier to load into memory. This will dramatically speed up queries. (After you've run partition.sql, you can feel free to delete the tables events_bck and games_bck.) Finally, run lookup_codes.sql to load the lookup tables (the ones that will tell you what the different codes mean).

That's it!

Thanks to Tangotiger and Mat Kovach, whose work this is based on, Ted Turocy for providing the Chadwick tool set, and Dave Smith and all the volunteers over at Retrosheet for their hard work.

I hope this is useful to you. Let me know if you need help or if something doesn't work right. And again, if all of this is a bit more advanced than you're prepared for right now, I am working on a set of more indepth tutorials that should get you started - look for them sometime this week, hopefully.

UPDATE: Last time I posted a Retrosheet tutorial, I included some run expectancy code. Quite frankly, it was pretty poor. This is much better. It includes code for figuring linear weights as well. It is very compute-intensive, though, and requires roughly four times the code of the old method. I think it's worth it, but then again I'm kind of an obsessive on the topics of run expectancy and linear weights.

Posted by Colin Wyers at 12:24pm (4) Comments

HR/FB Park Factors


Just a quick hit to share park factors for HR/FB rate.

I used BIP data and the methodology from Baseball Reference to determine simple HR/FB park factors for 2009 and 4-year weighed factors (weights are 5,3,2,1).

Update: My spreadsheet was thrown off by the Rays' name change. I've corrected the numbers below

Without further adieu, here's the list:

Team        Park                         2009    4 Year
Angels      Angel Stadium                 110      96
Astros      Minute Maid Park              104     108
Athletics   McAfee Colisuem                95      92
Blue Jays   Rogers Centre                 105     108
Braves      Turner Field                   90      95
Brewers     Miller Park                   108     106
Cardinals   Busch Stadium                  86      84
Cubs        Wrigley Field                  97     103
DiamondbacksChase Field                    99     106
Dodgers     Dodger Stadium                 89      95
Giants      Pacific Bell Park             104      95
Indians     Jacobs Field                   75      88
Mariners    Safeco Park                    95      96
Marlins     Dolphins Stadium              109      99
Mets        Citi Field                     98      98
Nationals   Nationals Stadium              91      92
Orioles     Oriole Park at Camden Yar     109     115
Padres      PETCO Park                     73      75
Phillies    Citizens Bank Park            109      94
Pirates     PNC Park                      105      94
Rays        Tropicana Field               110     111
Rangers     The Ballpark at Arlington      98      97
Red Sox     Fenway Park                    98      90
Reds        Great American Ballpark       121     114
Rockies     Coors Field                   103     112
Royals      Kaufman Stadium                73      78
Tigers      Comerica Park                  94     101
Twins       Metrodome                     109      96
White Sox   US Cellular Field             115     118
Yankees     New Yankee Stadium            130     130


The Mets and the Yankees Park Factors are one season only

The Nationals Park Factor is two seasons, weighted at 5 and 3

Posted by Dan Turkenkopf at 7:58am (13) Comments