Tuesday, July 26, 2011
Oliver production notes
Posted by Brian Cartwright
First, I have made a necessary correction to the in-season projection of fielding runs. The previous code was not designed correctly and was producing an incorrectly weighted sum of the past four seasons.
The intention was to apply the same weighting system used for the in-season batting and pitching projections. Fielding is more complicated, though: each season's fielding runs are the sum of several skills that are calculated independently, and a player can play several different positions in a season.
With batters and pitchers, it's easy to look at a projection and tell whether it's out of line with the seasonal major league equivalencies used to generate it, but not so with defense. The weighting formula has been corrected, and I have also calculated regressed "true talent" fielding runs per 162 games for each player at each position. The fielding runs projection is then the mean of those per-position rates, weighted by how much the player has played at each position this season.
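To make the weighting and regression step concrete, here is a minimal sketch of one way to compute a regressed per-162 rate at a single position. It assumes things the post does not specify: the season weights (5/4/3/2), the regression constant, and the skill names are all hypothetical placeholders, and regressing by padding the denominator with league-average (0-run) games is a common technique swapped in for illustration, not necessarily Oliver's method.

```python
from dataclasses import dataclass

# Hypothetical weights, most recent season first. The post says the past four
# seasons are weighted but does not publish the weights.
SEASON_WEIGHTS = (5.0, 4.0, 3.0, 2.0)

# Hypothetical regression constant, in weighted games of league-average
# (0 runs) defense added to every player's sample.
REGRESSION_GAMES = 300.0


@dataclass
class PositionSeason:
    """One player's season at one position (hypothetical record layout)."""
    games: float
    skill_runs: dict[str, float]  # e.g. {"range": 4.0, "arm": -1.0}

    @property
    def fielding_runs(self) -> float:
        # Each skill is computed independently; the season total is their sum.
        return sum(self.skill_runs.values())


def true_talent_per_162(seasons: list[PositionSeason]) -> float:
    """Weighted, regressed fielding runs per 162 games at one position.

    `seasons` is ordered most recent first; anything past four is ignored.
    """
    weighted_runs = 0.0
    weighted_games = 0.0
    for weight, season in zip(SEASON_WEIGHTS, seasons):
        weighted_runs += weight * season.fielding_runs
        weighted_games += weight * season.games
    # Padding the denominator with 0-run games pulls small samples toward average.
    return 162.0 * weighted_runs / (weighted_games + REGRESSION_GAMES)
```

Because runs and games are weighted by the same factors, a player with many games at a position keeps most of his observed rate, while a few games there are pulled strongly toward zero.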
For major league players, this will soon use the THT team depth charts as the source of expected playing time at each position. For example, Steve Pearce of the Pirates is +5 runs per 162 games (+5/162) at first base, -1/162 at third, but -16/162 in right field. If Pearce is expected to play 20 games at first, 10 games at third and five games in right for the rest of the season, his fielding runs projection would be (5*20 - 1*10 - 16*5)/162 = +10/162, or about +0.1 runs. If he is expected to spend all of his time at first, it would be (5*35)/162 = +175/162, or about +1.1 runs.
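The Pearce arithmetic generalizes to a playing-time-weighted sum of per-162 rates. This short sketch reproduces both scenarios from the example; the function name and the position labels are illustrative, and in practice the expected games would come from the depth charts rather than being typed in.

```python
def projected_fielding_runs(rates_per_162: dict[str, float],
                            expected_games: dict[str, float]) -> float:
    """Rest-of-season fielding runs from per-162 rates and expected games."""
    total = 0.0
    for position, games in expected_games.items():
        # Runs per 162 games at this position, scaled by games expected there.
        total += rates_per_162[position] * games
    return total / 162.0


pearce_rates = {"1B": 5.0, "3B": -1.0, "RF": -16.0}
mixed = projected_fielding_runs(pearce_rates, {"1B": 20, "3B": 10, "RF": 5})
first_only = projected_fielding_runs(pearce_rates, {"1B": 35})
print(round(mixed, 1), round(first_only, 1))  # prints: 0.1 1.1
```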
The second major item is the new version of Baseball on a Stick (BBOS), a Python application developed by Kyle Willkomm (available at SourceForge) that I use to collect the Gameday data that makes up a major portion of the Oliver projections. Some of the upgrades are bookkeeping in nature but are important for maintaining the integrity and completeness of the database. I can now check each game's final status to detect suspended games (which will continue to be downloaded until finalized) and games that were downloaded while still in progress (which will be deleted and downloaded again on the next run).
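The status check described above boils down to a three-way decision per game. Here is a minimal sketch of that logic; it is not BBOS code, and the status strings are hypothetical stand-ins for whatever values the Gameday feed actually reports.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep"                # game is final; nothing more to do
    RETRY = "retry"              # suspended; keep downloading until finalized
    DELETE_AND_RETRY = "delete"  # captured mid-game; drop it and re-download


# Hypothetical status labels; the real Gameday values may differ.
FINAL_STATUSES = {"Final", "Game Over"}
SUSPENDED_STATUSES = {"Suspended"}


def action_for(status: str) -> Action:
    """Decide what the next download run should do with a stored game."""
    if status in FINAL_STATUSES:
        return Action.KEEP
    if status in SUSPENDED_STATUSES:
        return Action.RETRY
    # Anything else (e.g. "In Progress") means the stored data is incomplete.
    return Action.DELETE_AND_RETRY
```

Treating every unrecognized status as incomplete is the conservative choice: the worst case is an unnecessary re-download, never a partial game left in the database.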
Another important data addition is explicit tracking of baserunner advancement, which will allow me to rate players on their baserunning and outfielders on their throwing.
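As an illustration of how advancement records could feed an outfield arm rating, here is a minimal sketch. The record layout is hypothetical, the input is assumed to be already filtered to runners on base when a single was fielded by an outfielder, and the two rates (extra bases taken, runners thrown out) are generic measures, not the rating Oliver will use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunnerAdvance:
    """One runner's movement on a single (hypothetical record layout)."""
    fielder: str      # outfielder who fielded the ball
    start_base: int   # 1 = first, 2 = second
    end_base: int     # base reached; 4 = scored
    thrown_out: bool  # runner retired trying to advance


def arm_rates(advances: list[RunnerAdvance]) -> dict[str, tuple[float, float]]:
    """Per outfielder: (share of runners taking an extra base, share thrown out)."""
    chances = defaultdict(int)
    extra = defaultdict(int)
    kills = defaultdict(int)
    for a in advances:
        chances[a.fielder] += 1
        if a.thrown_out:
            kills[a.fielder] += 1
        elif a.end_base - a.start_base >= 2:
            # e.g. first to third, or second to home, on a single
            extra[a.fielder] += 1
    return {f: (extra[f] / n, kills[f] / n) for f, n in chances.items()}
```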
Japanese stats are now "live": in a manner similar to BBOS, new scripts I have written let me load standings, rosters, and batting and pitching stats on demand from the NPB website. Soon I'll also be able to access box scores, which contain additional information not available on the batting and pitching pages.
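Scripts like these typically fetch a stats page and pull the rows out of its HTML tables. Here is a minimal sketch using only Python's standard library; it is not the author's script, the NPB page URL is left to the caller, and it assumes the stats sit in ordinary `<tr>`/`<td>` tables (nested tables are not handled).

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.request import urlopen


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped into <tr> rows."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: Optional[list[str]] = None
        self._cell: Optional[list[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no cells
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html: str) -> list[list[str]]:
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_table_rows(url: str) -> list[list[str]]:
    """Download a page (URL supplied by the caller) and return its table rows."""
    with urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return parse_table(html)
```

The first row is usually the header, so `rows[0]` can be zipped with each later row to build per-player records before loading them into the database.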
I also recently improved my database code, which incrementally updates the tables that Oliver uses as input. The process has three major steps: first, download data from the web; second, rearrange that data into efficient formats and compile situational stats; and finally, run the analysis that produces Oliver.
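The key to an incremental update is running the first two steps only for games not yet loaded and adding their contributions to the existing totals instead of recomputing everything. This minimal sketch shows that idea with an in-memory `Counter` of batting-vs.-hand totals; the event layout and the `download` callable are hypothetical, and the third step (the analysis) would simply read the updated totals.

```python
from collections import Counter
from typing import Callable

# Hypothetical event record: {"batter": str, "pitcher_hand": "L" or "R", "h": 0 or 1}
Event = dict


def incremental_update(
    available: set[str],
    loaded: set[str],
    download: Callable[[str], list[Event]],
    vs_hand: Counter,
) -> list[str]:
    """Run steps 1 and 2 only for games not already loaded; return their ids."""
    new_ids = sorted(available - loaded)
    for game_id in new_ids:
        events = download(game_id)  # step 1: fetch raw data
        for e in events:            # step 2: add to situational totals
            key = (e["batter"], e["pitcher_hand"])
            vs_hand[key + ("pa",)] += 1
            vs_hand[key + ("h",)] += e["h"]
        loaded.add(game_id)
    return new_ids
```

Running it twice in a row is harmless: the second call finds no new game ids and leaves the totals untouched.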
In that second step, with seven seasons and more than 100,000 games, rebuilding many of these tables (games, hits, pitches, events, batting/pitching by park, batting/pitching vs. hand, etc.) from scratch takes too long, but loading new data into them efficiently and accurately can be tricky. I also needed to be able to delete, re-download and accurately update games for which MLB announced scoring changes. The much improved stability of this step will allow more of the process to be automated (I won't have to sit here and watch for errors), and that might eventually let us update the website more often than weekly.
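Handling a scoring change safely means replacing a game's rows and refreshing any summary rows they feed, all in one transaction so a failure never leaves the database half-updated. The sketch below uses Python's built-in `sqlite3` as a stand-in for the MySQL database the post describes; the two-table schema (`events`, `batter_hits`) is hypothetical and far smaller than the real set of tables.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS events (
            game_id TEXT, event_num INTEGER, batter TEXT, result TEXT,
            PRIMARY KEY (game_id, event_num));
        CREATE TABLE IF NOT EXISTS batter_hits (
            batter TEXT PRIMARY KEY, hits INTEGER);
        """
    )


def reload_game(conn: sqlite3.Connection, game_id: str, events: list[tuple]) -> None:
    """Replace one game's events and refresh affected summary rows atomically.

    `events` rows are (game_id, event_num, batter, result).
    """
    with conn:  # commits on success, rolls back if anything raises
        old = {row[0] for row in conn.execute(
            "SELECT DISTINCT batter FROM events WHERE game_id = ?", (game_id,))}
        conn.execute("DELETE FROM events WHERE game_id = ?", (game_id,))
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)
        # Only batters in the old or new version of the game need re-aggregating.
        for batter in old | {e[2] for e in events}:
            conn.execute("DELETE FROM batter_hits WHERE batter = ?", (batter,))
            conn.execute(
                "INSERT INTO batter_hits (batter, hits) "
                "SELECT batter, SUM(result = 'H') FROM events "
                "WHERE batter = ? GROUP BY batter",
                (batter,),
            )
```

For example, if a hit is later ruled an error, calling `reload_game` with the corrected events drops that batter's hit total by one without touching any other game.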
I would like to take this opportunity to thank those who have helped me with software development over the past couple of years. I've already mentioned Kyle Willkomm, the developer of BBOS; he and Russell Goldstein, who assists him, have put up with my many feature requests for more than two years. I now know quite a bit of MySQL database programming, but fellow BBOS user Colin Wyers offered valuable assistance when I was a newbie. I'm fairly good at the basics of the Excel spreadsheet, and Richard Bergstrom has kindly helped me code macros that automate having Excel download data from web pages. And I've slowly been learning to code in Python, having written from scratch the scripts that get the Japanese stats, but I greatly appreciate the assistance and patience of Wells Oliver, now developer of Baseball Systems for the Padres, whose py_mlb code I use to get rosters, transactions and player biographical information from mlb.com.