In 1780, a British army of over ten thousand soldiers laid siege to Charleston, eventually forcing the surrender of some 5,000 American patriots. Soon after, a force of British Loyalists routed a patriot force of almost four hundred near Lancaster, in the same state, and massacred many of the men as they tried to surrender. In all, it was a very bad year for the Continental Army in South Carolina. Taken as a snapshot, that year would suggest the Revolutionists were destined to lose the war: after all, they suffered such utter defeat at the hands of the British in that state that they could not possibly win, right?
Such is the folly of making projections based on small sample sizes.
Like it or not, many (most?) things in life cannot be evaluated in small chunks. Jobs, relationships, vehicular reliability, the movie Ben-Hur...these things all take a long exposure before the experience can give a person an accurate picture of their quality and where their deficiencies lie, if any.
And yet, we live in a society that promotes instant gratification. Football is nice in that way. After four games, an NFL fan generally can accurately say whether or not their team stinks or is a viable playoff contender. Other sports are not like that.
Baseball is particularly frustrating, because it takes one hundred and sixty-two games to sift the weevils from the flour. Baseball also doesn't get its stars flying out of the gate: it often takes five years for prospects to reach the major leagues...if they make it at all. Because of this, understanding sample sizes, and how to evaluate performances within the framework of a small sample, is critical to staying pain-free instead of bouncing haphazardly between unbounded elation and abject despondency during the major league baseball season.
Traditional stats like ERA and Batting Average are great. Their purpose is to show a player's past success in very specific aspects of baseball. ERA shows how many earned runs a pitcher allows per nine innings, discounting runs that scored because of fielding errors. Batting Average shows how often a player gets a hit per official at-bat: walks don't count at all, and reaching base on a fielding error counts the same as an out. ERA and BA have a long tradition in baseball and should continue to be used for that reason: they have a context that all baseball fans understand.
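For readers who want the arithmetic, here is a minimal Python sketch of those two standard definitions. The function and parameter names are my own, and it assumes innings are passed as a true fraction (180 1/3 innings as 180.333...), not the box-score shorthand "180.1".

```python
def era(earned_runs: int, innings_pitched: float) -> float:
    """Earned run average: earned runs allowed per nine innings.

    Runs that scored because of fielding errors (unearned runs) are assumed
    to have been left out of earned_runs already.
    """
    return 9 * earned_runs / innings_pitched


def batting_average(hits: int, at_bats: int) -> float:
    """Hits per official at-bat.

    Walks are not at-bats, so they never enter the formula; reaching base
    on an error is an at-bat without a hit.
    """
    return hits / at_bats


print(f"{era(60, 180.0):.2f}")                # 60 earned runs in 180 innings
print(f"{batting_average(150, 500):.3f}")     # 150 hits in 500 at-bats
```

Note what is missing from both formulas: nothing in them accounts for the defense behind the player, the ballpark, or plain luck.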
However, there are a couple of things about stats like these that the enlightenment-seeking fan needs to understand. First, statistics that measure events dependent on outside influences (the defense behind the player, the ballpark, plain luck) do not truly measure a player's skill, other than in the broadest sense. Second, they are extremely susceptible to wild swings over small sample sizes, as those outside factors exert their influence.
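To see how wild those swings can get, here is a small simulation sketch. It is a deliberate simplification: it treats every at-bat as an independent coin flip that comes up "hit" 30% of the time, and the hitter's true skill never changes. The .300 hitter and the 50 vs. 600 at-bat stretches are illustrative assumptions, not real player data.

```python
import random
import statistics


def simulated_averages(true_avg: float, at_bats: int, seasons: int, seed: int = 0) -> list[float]:
    """Simulate `seasons` stretches of `at_bats` for a hitter whose true
    chance of a hit in any at-bat is `true_avg`; return the observed averages."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    averages = []
    for _ in range(seasons):
        hits = sum(1 for _ in range(at_bats) if rng.random() < true_avg)
        averages.append(hits / at_bats)
    return averages


for n in (50, 600):
    avgs = simulated_averages(true_avg=0.300, at_bats=n, seasons=2000)
    print(f"{n:>3} AB: observed BA ranged {min(avgs):.3f}-{max(avgs):.3f}, "
          f"std dev {statistics.stdev(avgs):.3f}")
```

Same hitter, same skill, yet the 50-at-bat averages spread out several times wider than the full-season ones. Binomial math puts the standard deviation near .065 over 50 at-bats versus about .019 over 600.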
One way to tell whether traditional statistics are being monkeyed with by these uppity outside inputs, which thrive on small-sample mathematics, is to use more context-neutral statistics. Batting Average on Balls in Play (BABIP) is a good one because it's easy to explain. BABIP is like batting average, except that it counts only the balls the batter puts into play. So, it's batting average with home runs and strikeouts taken out, and sacrifice flies added in. What has been shown over the long haul is that players, especially pitchers, have relatively little control over their own BABIP: it is so independent of a batter's or pitcher's skill that it is sometimes used to quantify "luck" (which is an oversimplification, but who cares?). BABIP can go nuts over small samples, and so it can generally* be used to make the following types of statements:
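That definition translates directly into the usual formula, BABIP = (H - HR) / (AB - K - HR + SF). Here is a minimal Python sketch; the function name and the sample stat line are hypothetical, chosen only to show the arithmetic.

```python
def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int, sac_flies: int) -> float:
    """Batting Average on Balls in Play.

    Numerator: hits that stayed in the park (home runs removed).
    Denominator: at-bats that ended with the ball in play (strikeouts and
    home runs removed), plus sacrifice flies, which are also balls in play.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play


# Hypothetical season: 150 H, 20 HR, 550 AB, 100 K, 5 SF -> 130 / 435
print(f"{babip(150, 20, 550, 100, 5):.3f}")
```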
- Robbie Grossman sure has sucked this year. But his BABIP is so low that he'll probably turn it around and be okay moving forward.
- Man, look at how awesome J.D. Martinez has been! But his BABIP is .400, which is ridiculous. He's going to crash back to earth after the All-Star break.
- Jesus Guzman has been awful at the plate this year. *checks BABIP* Sigh...Jesus Guzman is awful at the plate.
With that in mind, here are my second-half predictions for the Astros:
- Robbie Grossman has been awful in the first half. But his BABIP is .211, compared to a career minor league BABIP of .356. Given that BABIP is largely out of a player's control, and given his strong MLB peripherals (a high walk rate and a manageable strikeout rate), I predict that he will have a much stronger 2nd half: .275/.350/.400.
- Jose Altuve has been a godsend, but his .355 first-half BABIP is really high. Given that he doesn't walk much, I think he'll come down a bit, though his high-contact, low-strikeout approach will prevent a collapse. .300/.340/.410 in the 2nd half.
- Dexter Fowler and Matt Dominguez will keep doing what they did in the first half.
- Jonathan Singleton should see a pretty dramatic improvement in the 2nd half, thanks to his walk rate and low first-half BABIP. Those strikeouts gotta come down though, and they probably will: his big-league strikeout rate does NOT match his minor league strikeout rate, and should correct itself.
- Josh Fields should be one of the Astros' best relievers in the 2nd half, and will win the 2015 closer's job outright. His first-half BABIP is .364 and his FIP is 2.10, both signs that bad luck on balls in play has been hiding how well he's actually pitched.
- Tony Sipp will probably do significantly worse in the 2nd half, judging by his crazy-low BABIP. The time to trade him is now.
- Ditto Collin McHugh, only don't trade him. He just won't be as good as he was in the 1st half. My guess? An ERA around 4.00.
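The reasoning behind these predictions boils down to two calculations. First, a common way to temper an extreme BABIP is to regress it toward a baseline: pretend the player also had some number of extra balls in play at the baseline rate. Second, FIP (Fielding Independent Pitching) estimates a pitcher's ERA from only home runs, walks, hit batters, and strikeouts, the outcomes that don't involve the defense. The sketch below shows both. The regression weight of 800 balls in play and the FIP constant of 3.10 are illustrative assumptions (the real FIP constant is recalculated every season so that league FIP matches league ERA), and the sample numbers are hypothetical, not anyone's actual stats.

```python
def regressed_babip(observed: float, balls_in_play: int, baseline: float, weight: int = 800) -> float:
    """Shrink an observed BABIP toward a baseline.

    Equivalent to adding `weight` imaginary balls in play at the baseline
    rate. The default weight of 800 is an assumed, illustrative value.
    """
    hits_in_play = observed * balls_in_play
    return (hits_in_play + baseline * weight) / (balls_in_play + weight)


def fip(home_runs: int, walks: int, hit_by_pitch: int, strikeouts: int,
        innings_pitched: float, constant: float = 3.10) -> float:
    """Fielding Independent Pitching, on the ERA scale.

    `constant` is an assumed placeholder; the real value changes each season.
    """
    return (13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts) / innings_pitched + constant


# Hypothetical hitter: .211 BABIP over 150 balls in play, .300 baseline.
print(f"{regressed_babip(0.211, 150, 0.300):.3f}")
# Hypothetical reliever: 40 IP, 3 HR, 10 BB, 1 HBP, 50 K.
print(f"{fip(3, 10, 1, 50, 40.0):.2f}")
```

The regressed hitter lands much closer to the baseline than to his raw .211, which is the arithmetic behind "he'll probably turn it around." A FIP well below a pitcher's ERA is the usual sign that balls in play, not the pitcher, have been doing the damage.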