<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:admin="http://webns.net/mvcb/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:content="http://purl.org/rss/1.0/modules/content/">

    <channel>

    <title>The Hardball Times -- Colin Wyers</title>
    <link>http://www.hardballtimes.com/main</link>
    <description>Baseball. Insight. Daily.</description>
    <dc:language>en</dc:language>
    <dc:creator>studes@hardballtimes.com</dc:creator>
    <dc:rights>Copyright 2013</dc:rights>
    <dc:date>2013-05-24T08:08:15+00:00</dc:date>
    <admin:generatorAgent rdf:resource="http://www.pmachine.com/" />


    <item>
      <title>Where, oh where, has that grounder gone?</title>
       
<link>http://www.hardballtimes.com/main/article/where&#45;oh&#45;where&#45;has&#45;that&#45;grounder&#45;gone/</link>
<guid>http://www.hardballtimes.com/main/article/where-oh-where-has-that-grounder-gone/#When:10:00:15</guid>       
<description><![CDATA[<p>Previously we've discussed some ways that <a href="http://www.hardballtimes.com/main/article/when-is-a-fly-ball-a-line-drive/" target="new">observer location could affect the scoring of air balls.</a> It's a tough topic to study; there are only so many ballparks available to study, which means one has to make do with a smaller sample than one might like.</p>

<p>What we can say for right now is that there appears to be a relationship&mdash;slight, but statistically significant&mdash;between viewing height and line drive rate. Further study&mdash;and a better model for line drive rate&mdash;could perhaps show the relationship is stronger or weaker than this preliminary study has. I do want to emphasize that the relationship we are seeing in the data is what we should expect to see based upon what we know about optics, although depending on how you feel about the matter you could consider that either another data point in support of the idea or something that's encouraging confirmation bias for me.</p>

<p>So let's look at ground balls for a moment. A ground ball is much simpler to study. We don't care too much about the trajectory of the ground ball. We're simply interested in knowing the direction it's headed, typically expressed in the angle it travels from home plate. So how well do we know that?</p>

<h3 class="article_title">What we think we know</h3>

<p>First, just to be sure we're on the same page, let's define a ground ball as (practically) any ball that lands before it reaches the outfield, or a ball that at least we would expect to land before reaching the outfield. (The exception comes when someone drops or misses an infield fly ball; say he loses a popup in the sun.) There may be a few cases where a low line drive and a grounder can be confused, but they're subtle and rare and probably don't affect our analysis much. Suffice it to say that everyone probably has the same conception of when a batted ball is a ground ball.</p>

<p>The next question is, where does it go? Typically, the field is divided into "zones" that each ball is assigned to, like so:</p>

<img src="http://www.hardballtimes.com/images/uploads/field_diagram_closeup.png" border="0" alt="image" name="image" width="315" height="283" />

<p>That chart is based upon the one for <a href="http://www.retrosheet.org/location.htm" target="new">Project Scoresheet</a>, the largest source for hit location data available freely and the source for Retrosheet's play-by-play data from 1989 to 1999. Those are the data we'll concern ourselves with for the time being.</p>

<p>The process of collecting that hit location data is simple: In its most basic form, a human being sits there and makes a judgment call. This is true for almost any source of hit location data we have available. The question isn't whether humans make errors; we know they do. The question is about bias: Do those errors occur in ways that don't wash out of our sample with enough repetitions?</p>

<h3 class="article_title">Looking for park effects</h3>

<p>The most obvious control over where a ground ball goes is the handedness of the batter: A left-handed batter will tend to hit the ball toward the first  base side of the field, while a right-handed batter will tend to hit the ball toward the third base side.</p>

<p>We already have some reason to believe that <a href="http://www.hardballtimes.com/main/article/whats-a-batted-ball-to-do/" target="new">pitchers have little to no control over where their ground balls end up</a>. It doesn't even appear that they have a significant impact on which side of the field a grounder is hit to, once you account for the handedness of the batter. (Of course pitcher handedness is a crude proxy for batter handedness in the aggregate, so if that's all you have it's better than nothing.)</p>

<p>So here's what we'll do. Let's take the average rate that a grounder is hit in each zone, broken down by batter handedness. Using that, we can calculate an "expected" rate of balls in zone for each park from '89 to '99. Then, let's compare that with the actual and see what we get.</p>

<p>Looking at the root mean square error between expected and observed rates in each zone, we get an average of .020, with an average of about 2,000 balls in play at each park. Looking at expected random variance, we see that it should explain an average error of only .008. In other words, over an extended period (sometimes 10 years), we see a difference in the distribution of ground balls between zones that we can't easily explain through random chance.</p>

<h3 class="article_title">What does it mean?</h3>

<p>As far as I can tell, there are these potential explanations for this difference:</p>

<ul>
<li>Differences in the field of play. For instance, a "crowned" field will see more ground balls at the sides than up the middle (relative to the average) because the slope of the field will tend to push grounders toward the foul lines.
<li>Individual scorer bias. For instance, some scorers may have been more likely to score a ball in the 'tweener zones, like 56 and 34, if it passed between the two fielders than others.
<li>Observer position bias. Similar to the problem with line drives, the viewpoint of the scorer could impact what zones he assigns a ball to. So in some parks, a scorer might be more likely to score a ball as being toward the line, or toward the center.
<li>Particular hitting styles. Any one individual hitter shouldn't have a significant impact on our total figure. But if a team has a certain hitting philosophy and is able to impart it to all of its players efficiently, that could move the needle a little.
</ul>

<p>Given the data at hand, it's hard to say for sure how much each effect acts upon what we see in the data (and on the field of play). I think at the very least it's safe to say there is a significant difference in the recorded distribution of ground balls in play from park to park, one that does not seem to be attributable to random error. What we don't know yet is how much of that is attributable to observer error and how much of that is attributable to actual differences in batted ball distribution.</p><br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Colin Wyers</dc:creator>
      <dc:date>2009-12-18T10:00:15+00:00</dc:date>

    </item>

    <item>
      <title>When is a fly ball a line drive?</title>
       
<link>http://www.hardballtimes.com/main/article/when&#45;is&#45;a&#45;fly&#45;ball&#45;a&#45;line&#45;drive/</link>
<guid>http://www.hardballtimes.com/main/article/when-is-a-fly-ball-a-line-drive/#When:10:15:15</guid>       
<description><![CDATA[<p>Let's talk about <a href="http://en.wikipedia.org/wiki/Parallax" target="new">parallax</a>.</p>

<p>Simply put, parallax is a phenomenon in which the apparent position of an object can change based upon the location of the observer. Let's consider a very modest change in location&mdash;moving from the driver's seat to the passenger's seat of a car. (Do not attempt this while the car is in motion, please.) Observe how the gas gauge looks in each position:</p>


<img src="http://www.hardballtimes.com/images/uploads/parallax.jpg" border="0" alt="image" name="image" width="500" height="314" />

<p>In this instance, the passenger would tend to believe that there was less gas in the car than the driver would.</p>


<p>Baseball fans will be most familiar with the effects of this phenomenon when it comes to calling balls and strikes off the television feed; most broadcasts use an offset camera angle that <a href="http://www.slate.com/id/2221384/pagenum/all/" target="new">distorts our view of the strike zone</a>. But what else could it affect?</p>

<p>Most of what we know about batted balls comes from stringers who score games. Typically those stringers are given a spot in the press box. Now, because of the way different stadiums are constructed, the view from the press box shifts around. This, for instance, is the view from the Wrigley Field press box:</p>

<img src="http://www.hardballtimes.com/images/uploads/wrigley_press_box_view_small.jpg" border="0" alt="image" name="image" width="500" height="334" />

<p>Compare to Citi Field:</p>

<img src="http://www.hardballtimes.com/images/uploads/citi_field_press_box_view_small.jpg" border="0" alt="image" name="image" width="500" height="332" />

<p>It's a bit more of a dramatic difference than simply moving over a seat in a car, isn't it?</p>

<p>So let's test a theory&mdash;that the placement of the observer has an effect on how that observer determines the trajectory of a batted ball. Let's focus on air balls&mdash;fly balls, line drives and pop-ups. Based upon what we know, we should expect that the higher the observer, the flatter a batted ball looks and the more likely it is to be scored a line drive.</p>

<h3 class="article_title">Grading the parks</h3>

<p>Figuring out press box heights is not a simple task. I did the best I could given the tools I had. But the heights I collected are at best estimates. This is especially true for stadiums where press boxes have multiple levels. And for some parks I gave up on trying to get a decent estimate at all. I collected data on 27 parks in use from 2005-2009. The entire list, including the estimated heights, is available <a href="http://www.editgrid.com/user/cwyers/Press_Box_Heights" target="new">here.</a> Parks are coded with <a href="http://www.retrosheet.org/parkcode.txt" target="new">Retrosheet park codes</a>. I excluded one park from consideration, Coors Field; its inflated line drive rate caused by the high elevation makes it unsuitable for this study.</p>

<p>Then I calculated line drives per total air balls (flies, liners and pop-ups) as per the batted ball data available from Retrosheet from 2005-2009. Those data are based on the observations of MLB Gameday stringers. To avoid having a league bias, I removed all at-bats from pitchers. And because a team often uses the same hitters over a period of years, I looked only at the visiting team batting. Yes, there may be some persistence of pitcher line drive rates across seasons, but it's a minor effect compared to the persistence of hitter line-drive rates.</p>

<p>We do in fact see a slight correlation between press box height, about .16, after weighting for the number of years a park was in use during the sample:</p>

<img src="http://www.hardballtimes.com/images/uploads/pb_height_1.png" border="0" alt="image" name="image" width="640" height="480" />

<p>(Each park gets its own data point, but the correlation&mdash;and the linear trend line&mdash;are based on the weighted data.)</p>

<p>If that's all there was to it, we could probably table the matter as perhaps real but not especially significant. But let's focus on the extreme parks a minute&mdash;those 40 feet or lower and 70 feet or higher:</p>

<img src="http://www.hardballtimes.com/images/uploads/pb_height_2.png" border="0" alt="image" name="image" width="640" height="480" />

<p>The blue-coded points are parks that are either extremely low (the Oakland Colliseum and Shea Stadium) or extremely high (Fenway Park, Turner Field, PNC Park and Citizens Bank Park). They don't seem to follow the trend line at all; they actually seem pretty centered on the median. My hunch is that in those parks, scorers aren't relying on their view from the press box. Instead, they are looking at the TV feed. If we look at the trend with those parks excluded, the relationship becomes much stronger, with a correlation of .38:</p>

<img src="http://www.hardballtimes.com/images/uploads/pb_height_3.png" border="0" alt="image" name="image" width="640" height="480" />

<p>Running a regression analysis, we see that a change in observer height of one foot is worth nearly .002 points of line drive percentage. That's a significant effect, for my money.</p>

<h3 class="article_title">The implications</h3>

<p>So far we've looked at a theory about how batted balls are observed and provided some evidence to support the claim. What are the implications, if this theory is true?</p>

<p>The professional data providers, BIS and STATS, certainly should provide better data than what the Gameday stringers provide; they take more care with the data, provide cross-checking, etc. But they are still unable to provide a consistent point of observation in every ballpark. (STATS uses a primary scorer in the press box; BIS has no in-park scorers but relies on video feeds.)</p>

<p>The implication of this is that we could see an effect where fielders are over- or underrated by defensive metrics based upon that scoring data, even over a period of years, because of an error introduced by a persistent bias. What I can't tell you&mdash;at least not without a lot more study&mdash;is which players, by how much or even the magnitude of the potential effect.</p>

<p>This isn't a repudiation of current defensive metrics, mind you. But people get the impression that they are based on a cold, calculating computer. But all current means we have of measuring defensive impact are based on human observation. We don't have a perfect means of evaluating anything&mdash; hitting, defense or pitching. It doesn't mean we don't strive for perfection, though.</p><br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Colin Wyers</dc:creator>
      <dc:date>2009-12-04T10:15:15+00:00</dc:date>

    </item>

    <item>
      <title>Top baserunners of 2009</title>
       
<link>http://www.hardballtimes.com/main/blog_article/top&#45;baserunners&#45;of&#45;2009/</link>

<guid>http://www.hardballtimes.com/main/blog_article/top-baserunners-of-2009/#When:19:51:15</guid>
       
<description><![CDATA[<br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Colin Wyers</dc:creator>
      <dc:date>2009-11-28T19:51:15+00:00</dc:date>

    </item>

    <item>
      <title>Databases for sabermetricians, Part One</title>
       
<link>http://www.hardballtimes.com/main/article/databases&#45;for&#45;sabermetricians&#45;part&#45;one/</link>
<guid>http://www.hardballtimes.com/main/article/databases-for-sabermetricians-part-one/#When:10:00:15</guid>       
<description><![CDATA[<p>Stop me if this sounds familiar. You have a question about baseball, the sort of thing that should be answerable with stats, you think. So you go looking for the stat, but you can't find anyone who computes it the way you want. So you find a site that (maybe) has the raw data you need, copy and paste it into Excel, and bang out a few formulas until you get a result. And all along the way, you've thought: There has to be a better way.</p>
<p>I'm here to tell you that in fact there is&mdash;the relational database. And I'm here to help.</p>
<p>First, a warning: I am not a professional database administrator. I am a hobbyist. I may not follow "best practices." I will not be explaining a lot of relational database theory or lecturing you on the importance of the normal forms. Frankly, I don't know too much about that myself. And if you have your own database setup and workflow already, your setup may differ from mine. I hope you can still derive some value from this.</p>
<p>Also, I run Windows, so this tutorial will be written for Windows users. If you use a Mac, the people at Beyond the Boxscore <a href="http://www.beyondtheboxscore.com/tags/saberizing%20a%20mac">have some helpful hints.</a></p>
<p>Most modern databases use something called the Structured Query Language, or SQL. It is, yes, a programming language. But please, don't let that scare you. It's simply a way for you to &quot;talk&quot; to a database and ask it questions. We'll get into that more in a minute.  First, we need to set up a database.</p>
<h3 class="article_title">Installing the software</h3>
<p>There are many databases to choose from. I use MySQL. It's free, which is nice. And there's a large user base, which means there are plenty of tools and tutorials for you to look at.</p>
<p>First, you need to download and run the installer for <a href="http://dev.mysql.com/downloads/mysql/5.1.html#win32">MySQL 5.4 CE Essentials</a>. Choose a "typical" install. You'll typically have to wait a few minutes for files to copy. Then there will be some ads. Skip over them. Make sure that "Configure the MySQL server now" is checked. (I uncheck the box about registering the software with MySQL, but that's up to you.) Chose "Detailed configuration."</p>
<p>Then you'll go through a set of pages asking you questions. Here are the answers:</p>
<ul>
<li>Developer Machine </li>
<li>Non-Transactional Database Only </li>
<li>Decision Support (DSS)/OLAP </li>
</ul>
<p>Continue on, using the defaults. When prompted, either add a password or uncheck the box to alter security settings. (This will leave your root password blank.) Once you're finished, you shouldn't have to start MySQL at all&mdash;it should be ever vigilant, waiting for you to command it to do your sabermetric bidding.</p>
<p>(If you get an error message that says &quot;This installation package  could not be started. Contact the application vendor to verify that this is a  valid Windows Installer package,&quot; you need to download and install <a href="http://www.microsoft.com/downloads/details.aspx?displaylang=en&amp;FamilyID=5a58b56f-60b6-4412-95b9-54d056d6f9f4">Windows Installer</a>.)</p>
<p>Again sticking with the theme of "absolutely free," I use <a href="http://www.webyog.com/en/downloads.php#sqlyog">SQLyog</a>. You want the "Windows-Binary" from the Community Edition. Installation is straightforward&mdash;just download it and click "Next" a few times. Then run SQLyog.</p>
<p>From there, you'll be presented with a screen asking you for the settings to connect to the database. Everything should be correct.  If you selected a password, type it into the password blank. Then hit "Connect."</p>
<p>And... voila! You've been connected to, well, to an empty database. I suspect you want some actual data in your database.</p>
<h3 class="article_title">Putting the "data" in database</h3>
<p>There are obviously a lot of different things you could want to have in your database, depending on your area of interest. Let's start off modestly, if not exactly small. The <a href="http://baseball-databank.org/">Baseball Databank</a> is a record of pretty much every official statistic for every baseball player in the history of the majors. (Right now, it's current through 2008; the latest update is expected soon after the awards voting is
finished.)</p>
<p>First, <a href="http://baseball-databank.org/files/BDB-sql-2008-11-13.sql.zip">download the ZIP file.</a> Extract the .SQL file from the ZIP, and remember when you put it. Now go into SQLyog.</p>
<p>In SQLyog there will be three panes. The left-hand side should list your databases, the top is where you write queries, and the bottom shows the results. Let's add a database. Right-click in the pane, and select "Create Database." Call it "BDB," leaving off the punctuation. Leave the rest as the defaults, and click okay.</p>
<p>Now, let's populate the database. Right-click on the database, and select "Restore from SQL Dump." There should be an button with an elipsis (...) in it. Click that, and select the file we extracted from the ZIP. Select Execute. It'll ask you if you want to execute the queries in database BDB. You do. This should take a little while.</p>
<p>Once that's done, go into your database and click on the folder that says Tables. If nothing is there, try hitting "F5" on the top row of your keyboard to refresh the browser.</p>
<p>Go ahead and click on batting, and then slide on over to the right and click on "Table Data." Scroll around for a bit. If you've ever used a spreadsheet or looked at a table before, the layout should feel familiar. Now it's time to play around with the database a little.</p>
<h3 class="article_title">Asking the database questions</h3>
<p>The fundamental unit of the database is the table, which is further broken down into the row and the column. Each row is a "record" in the database; you can have as many records as you like. Every row has several fields which contain the data you're interested in. A column is a group containing all of the fields of a certain type in the table.</p>
<p>You won't be entering formulas into cells, like you would a spreadsheet. You interact with the data in the spreadsheet by writing queries. Let's start off by looking up the hitting stats of the 1990 Royals.</p>
<p>Slide up into the query editor and type in this:</p>
<code style="font-size: 12px;"><span style="color:black">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">SELECT </span><span style="color:gray">* <br />
&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM </span><span style="color:black">batting<br />
&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">WHERE </span><span style="color:black">yearID </span><span style="color:blue">= </span><span style="color:black">&quot;1990&quot;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">AND </span><span style="color:black">teamID </span><span style="color:blue">= </span><span style="color:black">&quot;KCA&quot;</span></code>
<p>There is a green play button in the upper left-hand corner; press it. (Or just hit F9.) Most of the column headers should be pretty self-explanatory: H is hits, SF sacrifice flies, etc. And playerID is each player's unique identifier, taken from the first five letters of the last name and the first two letters of the first name, plus a number. Some are easier to figure out than others.</p>
<p>To explain:</p>
<ul>
<li>SELECT is your bread-and-butter SQL command; it tells the database to fetch records out of the table. </li>
<li>* is a wildcard; it tells the database to fetch all columns. </li>
<li>FROM tells the database what table to fetch from.</li>
<li>WHERE tells the database to filter the information.</li>
<li>AND means you want the database to match multiple columns.</li>
<li>The semicolon tells the database that our instruction is complete. </li>
</ul>
<p> The equal sign is one way to compare two values in SQL. The quotation marks are important; if you don't use quotation marks, MySQL will assume you're referring to a column of that name.</p>
<p>In some results you'll see a value of (NULL); that means the field is empty. In the batting table, NULLs are generally used to indicate:</p>
<ul>
<li>Players who appeared in a game but had no official plate appearance (generally relief pitchers, pinch-runners and defensive replacements). </li>
<li>Statistics that were not tracked that year (generally sacrifice flies, double plays, etc.). </li>
</ul>
<p>In this case, we really only want to see players who went to the plate. And let's say we also want to know the "sabermetric triple crown" stats of <a href="http://tangotiger.net/wiki/index.php?title=Batting_Average">batting average</a>, <a href="http://tangotiger.net/wiki/index.php?title=On-Base_Percentage_%28OBP%29">on-base percentage</a> and <a href="http://tangotiger.net/wiki/index.php?title=Slugging_percentage">slugging percentage</a>. (Tango's <a href="http://tangotiger.net/wiki/index.php?title=Main_Page">sabermetric wiki</a> is a great source of formulas to try out, incidentally.) We can add to our query a bit:</p>
<code style="font-size: 12px;"><span style="color:black">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">SELECT </span><span style="color:gray">*, </span><span style="color:black">H</span><span style="color:gray">/</span><span style="color:black">AB </span><span style="color:blue">AS </span><span style="color:magenta">AVG<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, (</span><span style="color:black">H</span><span style="color:gray">+</span><span style="color:black">BB</span><span style="color:gray">+</span><span style="color:black">HBP</span><span style="color:gray">)/(</span><span style="color:black">AB</span><span style="color:gray">+</span><span style="color:black">BB</span><span style="color:gray">+</span><span style="color:black">HBP</span><span style="color:gray">+</span><span style="color:black">SF</span><span style="color:gray">) </span><span style="color:blue">AS </span><span style="color:black">OBP<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, (</span><span style="color:black">H</span><span style="color:gray">+</span><span style="color:black">2B</span><span style="color:gray">+</span><span style="color:black">2</span><span style="color:gray">*</span><span style="color:black">3B</span><span style="color:gray">+</span><span style="color:black">3</span><span style="color:gray">*</span><span style="color:black">HR</span><span style="color:gray">)/</span><span style="color:black">AB </span><span style="color:blue">AS </span><span style="color:black">SLG<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM </span><span style="color:black">batting<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">WHERE </span><span style="color:black">yearID </span><span style="color:blue">= </span><span style="color:black">&quot;1990&quot;</span><span style="color:gray">AND </span><span style="color:black">teamID </span><span style="color:blue">= </span><span style="color:black">&quot;KCA&quot;</span><span style="color:gray">AND </span><span style="color:black">AB </span><span style="color:blue">IS </span><span style="color:gray">NOT NULL;</span></code>

<p>You don't have to break up the different lines; I do that for legibility. You do want to separate each thing you want to select by commas. I put the commas at the start of each individual line, but that's a personal style preference. SQL simply ignores carriage returns (that is to say, line breaks), tabs and extra spaces, so feel free to "pretty up" your code in a way that makes it easy for you to read.</p>
<p>SQL will do any simply arithmetic with the records you provide. (You can also do a lot more complex math with it if you really want to.) The AS command is very helpful&mdash;otherwise the column will end up being named "(H+BB+HBP)/(AB+BB+HBP+SF)," which simply isn't very convenient.</p>
<p>Let's say we don't want every column, we just want a few columns:</p>
<code style="font-size: 12px;"><span style="color:black">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">SELECT </span><span style="color:black">playerID</span><span style="color:gray">, </span><span style="color:black">AB</span><span style="color:gray">, </span><span style="color:black">H</span><span style="color:gray">, </span><span style="color:black">2B</span><span style="color:gray">, </span><span style="color:black">3B</span><span style="color:gray">, </span><span style="color:black">HR</span><span style="color:gray">, </span><span style="color:black">BB<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:black">SO</span><span style="color:gray">, </span><span style="color:black">IBB</span><span style="color:gray">, </span><span style="color:black">HBP</span><span style="color:gray">, </span><span style="color:black">SH</span><span style="color:gray">, </span><span style="color:black">SF<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:black">H</span><span style="color:gray">/</span><span style="color:black">AB </span><span style="color:blue">AS </span><span style="color:magenta">AVG<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, (</span><span style="color:black">H</span><span style="color:gray">+</span><span style="color:black">BB</span><span style="color:gray">+</span><span style="color:black">HBP</span><span style="color:gray">)/(</span><span style="color:black">AB</span><span style="color:gray">+</span><span style="color:black">BB</span><span style="color:gray">+</span><span style="color:black">HBP</span><span style="color:gray">+</span><span style="color:black">SF</span><span style="color:gray">) </span><span style="color:blue">AS </span><span style="color:black">OBP<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, (</span><span style="color:black">H</span><span style="color:gray">+</span><span style="color:black">2B</span><span style="color:gray">+</span><span style="color:black">2</span><span style="color:gray">*</span><span style="color:black">3B</span><span style="color:gray">+</span><span style="color:black">3</span><span style="color:gray">*</span><span style="color:black">HR</span><span style="color:gray">)/</span><span style="color:black">AB </span><span style="color:blue">AS </span><span style="color:black">SLG<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM </span><span style="color:black">batting<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">WHERE </span><span style="color:black">yearID </span><span style="color:blue">= </span><span style="color:black">&quot;1990&quot;<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">AND </span><span style="color:black">teamID </span><span style="color:blue">= </span><span style="color:black">&quot;KCA&quot;<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">AND </span><span style="color:black">AB </span><span style="color:blue">IS </span><span style="color:gray">NOT NULL;</span></code>
<p>Bored with the 1990 Royals yet? Let's try something different.</p>
<code style="font-size: 12px;"><span style="color:black">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">SELECT </span><span style="color:black">playerID</span><span style="color:gray">, </span><span style="color:black">yearID</span><span style="color:gray">, </span><span style="color:black">teamID</span><span style="color:gray">, </span><span style="color:black">HR<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM </span><span style="color:black">batting<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">ORDER BY </span><span style="color:black">HR </span><span style="color:blue">DESC<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:black">LIMIT 50</span><span style="color:gray">;</span></code>
<p>ORDER BY tells the database to sort the list; Desc tells it to go from large to small. (Asc is the corresponding command to sort from small to large.) LIMIT tells SQL to return only the first 50 records.</p>
<p>So that's the top 50 home run seasons. (Supply your own asterisks where you feel it necessary.) But what if we want career totals?</p>
<code style="font-size: 12px;"><span style="color:black">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">SELECT </span><span style="color:black">playerID</span><span style="color:gray">, </span><span style="color:magenta">SUM</span><span style="color:gray">(</span><span style="color:black">HR</span><span style="color:gray">) </span><span style="color:blue">AS </span><span style="color:black">HR<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM </span><span style="color:black">batting<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">GROUP BY </span><span style="color:black">playerID<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">ORDER BY </span><span style="color:black">HR </span><span style="color:blue">DESC<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:black">LIMIT 50</span><span style="color:gray">;</span></code>
<p>The SUM command will sum everything in the column unless you stop it. GROUP BY tells the database to combine everything based upon player ID.</p>
<p>Now what if want to look at the top 50 career batting averages? We obviously don't want to see some pitcher who lucked his way into going 2-for-3 and never saw the majors again. So let's say we want a minimum 3,000 career plate appearances.</p>
<code style="font-size: 12px;"><span style="color:black">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">SELECT </span><span style="color:black">playerID<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:magenta">SUM</span><span style="color:gray">(</span><span style="color:black">H</span><span style="color:gray">)/</span><span style="color:magenta">SUM</span><span style="color:gray">(</span><span style="color:black">AB</span><span style="color:gray">) </span><span style="color:blue">AS </span><span style="color:magenta">AVG<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:magenta">SUM</span><span style="color:gray">(</span><span style="color:black">AB</span><span style="color:gray">+</span><span style="color:black">BB</span><span style="color:gray">+</span><span style="color:black">HBP</span><span style="color:gray">+</span><span style="color:magenta">COALESCE</span><span style="color:gray">(</span><span style="color:black">SF</span><span style="color:gray">,</span><span style="color:black">0</span><span style="color:gray">)) </span><span style="color:blue">AS </span><span style="color:black">PA<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM </span><span style="color:black">batting<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">GROUP BY </span><span style="color:black">playerID<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">HAVING </span><span style="color:black">PA </span><span style="color:gray">&gt;= </span><span style="color:black">3000<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">ORDER BY </span><span style="color:magenta">AVG </span><span style="color:blue">DESC<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:black">LIMIT 50</span><span style="color:gray">;</span></code>
<p>Because we're using the SUM function, we have to use the HAVING function instead of the GROUP BY function. Note the &gt;= - that's greater than or equal to.</p>
<p>So, what's that COALESCE doing there? Remember, sacrifice flies weren't officially recorded until 1954. For years prior, those values are NULL in the database. Trying to sum a NULL value chucks everything out. Using the COALESCE function in this way tells the database to substitute 0 in the place of a NULL. Otherwise, players like Ty Cobb would be excluded from the list (precisely the sorts of players we <em>don't</em> want excluded!)</p>
<p>Now let's try writing a query to list the 50 lowest career earned run averages, minimum 1,000 innings pitched. The Databank doesn't list IP for pitchers, simply outs. Remember to divide by three:</p>
<p><code style="font-size: 12px;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue">SELECT </span><span style="color:black">playerID<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:magenta">SUM</span><span style="color:gray">(</span><span style="color:black">ER</span><span style="color:gray">)/</span><span style="color:magenta">SUM</span><span style="color:gray">(</span><span style="color:black">IPOuts</span><span style="color:gray">/</span><span style="color:black">3</span><span style="color:gray">)*</span><span style="color:black">9 </span><span style="color:blue">AS </span><span style="color:black">ERA<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:magenta">SUM</span><span style="color:gray">(</span><span style="color:black">IPOuts</span><span style="color:gray">/</span><span style="color:black">3</span><span style="color:gray">) </span><span style="color:blue">AS </span><span style="color:black">IP<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM </span><span style="color:black">pitching<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">GROUP BY </span><span style="color:black">playerID<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">HAVING </span><span style="color:black">IP </span><span style="color:gray">&gt;= </span><span style="color:black">1000<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">ORDER BY </span><span style="color:black">ERA </span><span style="color:blue">ASC<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:black">LIMIT 50</span><span style="color:gray">;</span></code></p>
<p>If you run this query, you'll have a lot harder time recognizing the top leaders than the top batting average leaders. A lot of these guys were deadball-era pitchers. So let's find these guys' names.</p>
<h3 class="article_title">Working with joins, a first glance</h3>
<p>Often, all the data you want to use won't be stored in a single table&mdash;you'll have some data in one table, and some data in another. That's where joins come into play. A join lets you hook two (or more) tables together for querying.</p>
<p>There are two ways to do a join: explicitly and implicitly. When I say an explicit join, I mean a join where you use the JOIN command. An implicit join is one where MySQL is just supposed to figure out that you want it to join. For now, we'll start off with the implicit join.</p>
<p>In this case, what we want are the first and last names out of the master table. In this case, what we're really after is a player's full name. So let's put that together:</p>
<code style="font-size: 12px;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue">SELECT </span><span style="color:black">CONCAT</span><span style="color:gray">(</span><span style="color:black">nameFirst</span><span style="color:gray">,</span><span style="color:black">&quot; &quot;,nameLast</span><span style="color:gray">) </span><span style="color:blue">AS </span><span style="color:black">Name</span><span style="color:gray">, </span><span style="color:black">playerID <br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM MASTER</span><span style="color:gray">;</span></code>
<p>CONCAT is a fun command for string manipulation. It lets you take multiple strings and combine them. In this case, we took the first name, a space (surrounded by quotes) and a last name, separating them with commas. CONCAT does the rest.</p>
<p>Instead of joining to the entire master table, we're going to join to the results of that query using an alias.</p>
<p>In SQL, you use the period separator to indicate what table you want to take data from, like so:</p>
<p>table.column</p>
<p>where table is the name of the table and column is the name of the column. You can also assign an alias to a query when you use it in the FROM command, like so:</p>
<code style="font-size: 12px;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue">SELECT </span><span style="color:black">p.playerID<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:black">m.Name<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:magenta">SUM</span><span style="color:gray">(</span><span style="color:black">p.ER</span><span style="color:gray">)/</span><span style="color:magenta">SUM</span><span style="color:gray">(</span><span style="color:black">p.IPOuts</span><span style="color:gray">/</span><span style="color:black">3</span><span style="color:gray">)*</span><span style="color:black">9 </span><span style="color:blue">AS </span><span style="color:black">ERA<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:magenta">SUM</span><span style="color:gray">(</span><span style="color:black">p.IPOuts</span><span style="color:gray">/</span><span style="color:black">3</span><span style="color:gray">) </span><span style="color:blue">AS </span><span style="color:black">IP<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM </span><span style="color:black">pitching p<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, (</span><span style="color:blue">SELECT </span><span style="color:black">CONCAT</span><span style="color:gray">(</span><span style="color:black">nameFirst</span><span style="color:gray">,</span><span style="color:black">&quot; &quot;,nameLast</span><span style="color:gray">) </span><span style="color:blue">AS </span><span style="color:black">Name<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:black">playerID <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM MASTER</span><span style="color:gray">) </span><span style="color:black">m<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">WHERE </span><span style="color:black">p.playerID </span><span style="color:blue">= </span><span style="color:black">m.playerID<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">GROUP BY </span><span style="color:black">playerID<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">HAVING </span><span style="color:black">IP </span><span style="color:gray">&gt;= </span><span style="color:black">1000<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">ORDER BY </span><span style="color:black">ERA </span><span style="color:blue">ASC<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:black">LIMIT 50</span><span style="color:gray">;</span></code>
<p>So now, instead of seeing that walshed01 is the career ERA leader, we can see that it's Ed Walsh. It's an improvement, right?</p>
<p>The key is in the WHERE clause, which matches up a column from one table with the other. The equivalent explicit join is:</p>
<p><code style="font-size: 12px;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue">SELECT </span><span style="color:black">p.playerID</span><span style="color:gray">, </span><span style="color:black">Name</span><span style="color:gray">, </span><span style="color:black">ERA</span><span style="color:gray">, </span><span style="color:black">IP<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM </span><span style="color:gray">(</span><span style="color:blue">SELECT </span><span style="color:black">playerID<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:magenta">SUM</span><span style="color:gray">(</span><span style="color:black">ER</span><span style="color:gray">)/</span><span style="color:magenta">SUM</span><span style="color:gray">(</span><span style="color:black">IPOuts</span><span style="color:gray">/</span><span style="color:black">3</span><span style="color:gray">)*</span><span style="color:black">9 </span><span style="color:blue">AS </span><span style="color:black">ERA<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:magenta">SUM</span><span style="color:gray">(</span><span style="color:black">IPOuts</span><span style="color:gray">/</span><span style="color:black">3</span><span style="color:gray">) </span><span style="color:blue">AS </span><span style="color:black">IP<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM </span><span style="color:black">pitching<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">GROUP BY </span><span style="color:black">playerID<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">HAVING </span><span style="color:black">IP </span><span style="color:gray">&gt;= </span><span style="color:black">1000</span><span style="color:gray">) </span><span style="color:black">p</span></code><br /><code style="font-size: 12px;"><span style="color:black">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">JOIN </span><span style="color:gray">(</span><span style="color:blue">SELECT </span><span style="color:black">CONCAT</span><span style="color:gray">(</span><span style="color:black">nameFirst</span><span style="color:gray">,</span><span style="color:black">&quot; &quot;,nameLast</span><span style="color:gray">) </span><span style="color:blue">AS </span><span style="color:black">Name<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:black">playerID <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM MASTER</span><span style="color:gray">) </span><span style="color:black">m<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">ON </span><span style="color:black">p.playerID </span><span style="color:blue">= </span><span style="color:black">m.playerID<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">ORDER BY </span><span style="color:black">ERA </span><span style="color:blue">ASC<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:black">LIMIT 50</span><span style="color:gray">;</span></code>
</p>
<p>You want to be careful when writing the WHERE or ON clause of a join&mdash;if you aren't careful, you'll end up doing what's called a Cartesian join, or essentially joining every row in one table to every row in another. For those of you keeping score at home, that would mean joining 39,016 rows from the pitching table to 17,264 rows from the master table, giving you... 673,572,224 rows as a result.</p>
<p>You do not want that.</p>
<p>We're far from done with joins&mdash;they're quite possibly the most powerful (and most dangerous!) tool in your SQL arsenal. But for right now let's move along to some other things. We'll be back.</p>
<h3 class="article_title">Dealing with data types</h3>
<p>I want you to see this at least once before it creeps up on you by accident. I do not, on the other hand, want to be the sort of tutorial writer who hands you a code sample and then, when it doesn't work, shouts "Gotcha!"; So I'll be clear with you: This doesn't work right. Let's say you want the top 50 player seasons with more runs scored than runs batted in.</p>
<p>Try running this:</p>
<p><code style="font-size: 12px;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue">SELECT </span><span style="color:black">playerID</span></code><br /><code style="font-size: 12px;"><span style="color:black">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span class="style1">, </span><span style="color:black">yearID<br />
      <code style="font-size: 12px;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code></span><span style="color:gray">, </span><span style="color:black">teamID</span><span style="color:gray"><br />
      <code style="font-size: 12px;"><span style="color:black">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></code>, </span><span style="color:black">R</span><span style="color:gray">-</span><span style="color:black">RBI </span><span style="color:blue">AS </span><span style="color:black">R_RBI<br>
  &nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM </span><span style="color:black">batting<br>
  &nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">ORDER BY </span><span style="color:black">R_RBI </span><span style="color:blue">DESC<br>
  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:black">LIMIT 50</span><span style="color:gray">;</span></code></p>
<p>We are being lied to here. Nobody, but nobody, has 18,446,744,073,709,551,615 more runs than RBI in a single season. I don't even know how to say that number.</p>
<p>The problem is with data types. SQL has an awful lot of data types, but we're only going to look at a few of the more common ones.</p>
<p>Text is generally stored as "varchar," or "Variable Character Field." What that means is simply that you can store any length of text in there up to the size of the field. There are some other text formats, but nothing to really be worried about right now.</p>
<p>There's a much greater diversity of number formats, which can be broken down into three types:</p>
<ol>
<li>Integers, or whole numbers. </li>
<li>Decimals, for precise fractional values. </li>
<li>Floats, for approximate fractional values. </li>
</ol>
<p>These can be further subdivided: you'll see values like "tinyint"; or "smallint"; thrown around, for example, as ways of describing the size of the number you can store in that column.</p>
<p>There's also the issue of signs, which becomes important here. For SQL, what we mean by signs is the plus/minus sign&mdash;whether a number is positive or negative.</p>
<p>You can store numbers in MySQL either as "signed"; or "unsigned";. Unsigned values cannot be negative. <em>The Baseball Databank stores all numbers as unsigned.</em></p>
<p>(Why? Because it saves storage space and time.)</p>
<p>So in order for you to deal with unsigned numbers where values may be negative, you may have to change the datatype before doing any calculations. We do that with a CAST function:</p>
<code style="font-size: 12px;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue">SELECT </span><span style="color:black">b.playerID<br>
</span><span style="color:black">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:black">b.yearID<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:black">b.teamID<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:magenta">CAST</span><span style="color:gray">(</span><span style="color:black">R</span><span style="color:gray">-</span><span style="color:black">RBI </span><span style="color:blue">AS </span><span style="color:black">SIGNED</span><span style="color:gray">) </span><span style="color:blue">AS </span><span style="color:black">R_RBI<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM </span><span style="color:black">batting b<br>
</span><span style="color:black">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">ORDER BY </span><span style="color:black">R_RBI </span><span style="color:blue">DESC<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:black">LIMIT 50</span><span style="color:gray">;</span></code>
<h3 class="article_title">The heart of the CASE</h3>
<p>There are times when you want your query to be able to make some decisions on its own. Let's say , for instance, that we want to look at pitchers between the ages of 25 and 29, post WWII. Remember, a player's age is generally listed as his age come July 1. So, a simple little query here:</p>
<code style="font-size: 12px;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue">SELECT </span><span style="color:black">p.playerID<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:black">m.Name<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:black">p.yearID<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, (</span><span style="color:magenta">CASE </span><span style="color:blue">WHEN </span><span style="color:black">m.birthMonth </span><span style="color:gray">&lt; </span><span style="color:black">7<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">THEN </span><span style="color:gray">( </span><span style="color:black">p.yearID </span><span style="color:gray">- </span><span style="color:black">m.BirthYear </span><span style="color:gray">)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">ELSE </span><span style="color:gray">( </span><span style="color:black">p.yearID </span><span style="color:gray">- </span><span style="color:black">m.BirthYear </span><span style="color:gray">- </span><span style="color:black">1 </span><span style="color:gray">) </span><span style="color:blue">END</span><span style="color:gray">) </span><span style="color:blue">AS </span><span style="color:black">Age<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:black">p.ER</span><span style="color:gray">/</span><span style="color:black">p.IPOuts</span><span style="color:gray">/</span><span style="color:black">3</span><span style="color:gray">*</span><span style="color:black">9 </span><span style="color:blue">AS </span><span style="color:black">ERA<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:black">p.IPOuts</span><span style="color:gray">/</span><span style="color:black">3 </span><span style="color:blue">AS </span><span style="color:black">IP<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM </span><span style="color:black">pitching p<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, (</span><span style="color:blue">SELECT </span><span style="color:black">CONCAT</span><span style="color:gray">(</span><span style="color:black">nameFirst</span><span style="color:gray">,</span><span style="color:black">&quot; &quot;,nameLast</span><span style="color:gray">) </span><span style="color:blue">AS </span><span style="color:black">Name<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:black">playerID <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;, </span><span style="color:black">birthMonth <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">, </span><span style="color:black">birthYear <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">FROM MASTER</span><span style="color:gray">) </span><span style="color:black">m<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">WHERE </span><span style="color:black">p.yearID </span><span style="color:gray">&gt; </span><span style="color:black">1943<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">AND </span><span style="color:black">p.playerID </span><span style="color:blue">= </span><span style="color:black">m.playerID<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">GROUP BY </span><span style="color:black">playerID<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">HAVING </span><span style="color:black">Age </span><span style="color:gray">BETWEEN </span><span style="color:black">25 <br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">AND </span><span style="color:black">29<br>&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:blue">ORDER BY </span><span style="color:black">ERA </span><span style="color:blue">ASC<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:black">LIMIT 50</span><span style="color:gray">;</span></code>
<p>The CASE statement allows you to have different outputs based upon the underlying data. This could have easily been written as an IF statement as well:</p>
<code style="font-size: 12px;">&nbsp;&nbsp;&nbsp;&nbsp;, <span style="color:blue">IF</span><span style="color:gray">(</span><span style="color:black">m.birthMonth </span><span style="color:gray">&lt; </span><span style="color:black">7<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:gray">,( </span><span style="color:black">p.yearID </span><span style="color:gray">- </span><span style="color:black">m.BirthYear </span><span style="color:gray">)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;,( </span><span style="color:black">p.yearID </span><span style="color:gray">- </span><span style="color:black">m.BirthYear </span><span style="color:gray">- </span><span style="color:black">1 </span><span style="color:gray">)) </span><span style="color:blue">AS </span><span style="color:black">Age</span></code>
<p>I like CASE because it feels cleaner, personally. You can also use multiple WHEN statements inside a CASE.</p>
<h3 class="article_title">Saving your work</h3>
<p>Let's say you've written an amazing, bad-ass query. You know you're going to want to look at this one later. You have two options:</p>
<code style="font-size: 12px;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue">CREATE VIEW </span><span style="color:black">view_name </span><span style="color:blue">AS<br>&nbsp;&nbsp;&nbsp;&nbsp;CREATE TABLE </span><span style="color:black">table_name </span><span style="color:blue">AS</span></code>
<p>Simply put either one of those at the front of your query.</p>
<p>So what's the difference? A view:</p>
<ol>
<li>Updates whenever the underlying data does.</li>
<li>Preserves the query used to generate it.</li>
</ol>
<p>Creating a table:</p>
<ol>
<li>Does not update to incorporate changes in data.</li>
<li>Does not run the query every time it's called.</li>
</ol>
<p>A view is more flexible; a table is quicker.</p>
<p>Also, I recommend creating a new database to store tables in&mdash;multiple databases if you have multiple projects. So let's say we wanted to start a new database for a project that does <a href="http://www.tangotiger.net/marcel/">Marcels projections</a>. Right-click in the pane on the left and select "Create Database." Call it Marcels and click on it to make it the active database. To run queries on tables in the Baseball Databank, simply prefix the names of the tables with bdb., like so:</p>
<p>bdb.batting </p>
<p>bdb.pitching</p>
<p>Well after that little slip, it would hardly be fair to send you on your way without some Marcels projections, wouldn't it?</p>
<p>First off, I just want to thank Tangotiger, who came up with the Marcels, and Sal Baxamusa, whose <a href="http://www.hardballtimes.com/main/article/is-this-guy-for-real/">spreadsheet versions</a> of the Marcels were invaluable as learning aids for me.</p>
<p>I tried my best to reimplement the Marcels as faithfully as possible, but there are (likely) differences between Tango's work and mine. All credit goes to him and Sal, all blame for mistakes and errors goes to me.</p>
<p><a href="http://basql.wikidot.com/marcels">Here's the code for hitter projections</a>. Have fun with it, play around with it. Try to make it better, try to break it, try to take it apart and see how it works.</p>
<p>That should be enough to get you started. If you have questions, don't be afraid to ask! And most of all, have fun.</p><br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Colin Wyers</dc:creator>
      <dc:date>2009-11-24T10:00:15+00:00</dc:date>

    </item>

    <item>
      <title>Building a Retrosheet database, the short form</title>
       
<link>http://www.hardballtimes.com/main/blog_article/building&#45;a&#45;retrosheet&#45;database&#45;the&#45;short&#45;form/</link>

<guid>http://www.hardballtimes.com/main/blog_article/building-a-retrosheet-database-the-short-form/#When:16:24:15</guid>
       
<description><![CDATA[<br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Colin Wyers</dc:creator>
      <dc:date>2009-11-21T16:24:15+00:00</dc:date>

    </item>

    <item>
      <title>Is peak at age 29?</title>
       
<link>http://www.hardballtimes.com/main/blog_article/is&#45;peak&#45;at&#45;age&#45;29/</link>

<guid>http://www.hardballtimes.com/main/blog_article/is-peak-at-age-29/#When:23:46:15</guid>
       
<description><![CDATA[<br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Colin Wyers</dc:creator>
      <dc:date>2009-11-17T23:46:15+00:00</dc:date>

    </item>

    <item>
      <title>Should Jeter have bunted?</title>
       
<link>http://www.hardballtimes.com/main/blog_article/should&#45;jeter&#45;have&#45;bunted/</link>

<guid>http://www.hardballtimes.com/main/blog_article/should-jeter-have-bunted/#When:17:03:15</guid>
       
<description><![CDATA[<br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Colin Wyers</dc:creator>
      <dc:date>2009-10-31T17:03:15+00:00</dc:date>

    </item>

    <item>
      <title>Let&#8217;s do the time WARP!</title>
       
<link>http://www.hardballtimes.com/main/blog_article/lets&#45;do&#45;the&#45;time&#45;warp/</link>

<guid>http://www.hardballtimes.com/main/blog_article/lets-do-the-time-warp/#When:04:21:15</guid>
       
<description><![CDATA[<br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Colin Wyers</dc:creator>
      <dc:date>2009-10-23T04:21:15+00:00</dc:date>

    </item>

    <item>
      <title>Get rich quick?</title>
       
<link>http://www.hardballtimes.com/main/blog_article/get&#45;rich&#45;quick/</link>

<guid>http://www.hardballtimes.com/main/blog_article/get-rich-quick/#When:19:41:15</guid>
       
<description><![CDATA[<br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Colin Wyers</dc:creator>
      <dc:date>2009-10-15T19:41:15+00:00</dc:date>

    </item>

    <item>
      <title>Game One Odds</title>
       
<link>http://www.hardballtimes.com/main/blog_article/game&#45;one&#45;odds/</link>

<guid>http://www.hardballtimes.com/main/blog_article/game-one-odds/#When:03:01:15</guid>
       
<description><![CDATA[<br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Colin Wyers</dc:creator>
      <dc:date>2009-10-07T03:01:15+00:00</dc:date>

    </item>


    </channel>
</rss>