Astros Sabermetrics: Free Samples!

CRPerry13 explains the ups and downs of the Astros' season by describing how outside influences can wildly swing performance over short stretches of time.

Jesse Johnson-USA TODAY Sports

In 1780, a British army of over ten thousand soldiers laid siege to Charleston, eventually forcing the surrender of 5,000 American patriots.  Soon after, a force of British Loyalists routed a detachment of almost four hundred patriots near Lancaster, in the same state, massacring many who tried to surrender.  In all, it was a very bad year for the Continental Army in South Carolina.  Taken as a snapshot, one might conclude that the revolutionaries were destined to lose the war -- after all, they suffered such abject and utter defeat at the hands of the British in that state that they could not possibly win, right?

Such is the folly of making projections based on small sample sizes.

Like it or not, many (most?) things in life cannot be evaluated in small chunks.  Jobs, relationships, vehicular reliability, the movie Ben-Hur...these things all require long exposure before a person can get an accurate picture of their quality and of where their deficiencies lie, if any.

And yet, we live in a society that promotes instant gratification.  Football is nice in that way.  After four games, an NFL fan generally can accurately say whether or not their team stinks or is a viable playoff contender.  Other sports are not like that.

Baseball is particularly frustrating, because it takes one hundred and sixty-two games to sift the weevils from the flour.  Also, baseball does not send its stars flying out of the gate -- it often takes five years for prospects to reach the major leagues...if they make it at all.  Because of these things, understanding sample sizes, and how to evaluate performances within the framework of a small sample, is critical to staying pain-free as you bounce haphazardly between unbounded elation and abject despondency during the Major League Baseball season.

Traditional stats like ERA and Batting Average are great.  Their purpose is to show a player's past success in very specific aspects of baseball:  ERA measures the earned runs a pitcher allows per nine innings, excluding unearned runs that score because of fielding errors.  Batting Average measures hits per official at-bat, so walks and times reaching base on an error don't count.  ERA and BA have a long tradition in baseball and should continue to be used for that reason: they have a context that all baseball fans understand.
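For readers who want the arithmetic, here is a minimal Python sketch of the two standard formulas: ERA is 9 × earned runs ÷ innings pitched, and batting average is hits ÷ at-bats. The function names are made up for illustration, and innings are passed in as outs recorded to sidestep box-score notation, where "6.1" means 6⅓ innings.

```python
def era(earned_runs: int, outs_recorded: int) -> float:
    """Earned runs allowed per nine innings."""
    innings = outs_recorded / 3  # e.g. 6.1 IP in box-score notation is 19 outs
    return 9 * earned_runs / innings


def batting_average(hits: int, at_bats: int) -> float:
    """Hits per official at-bat; walks are not at-bats, so they never enter."""
    return hits / at_bats


# Made-up examples: 3 earned runs over 9 innings, 30 hits in 100 at-bats
print(era(earned_runs=3, outs_recorded=27))
print(batting_average(hits=30, at_bats=100))
```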

However, there are a couple of things about baseball stats such as these that the enlightenment-seeking fan needs to understand.  First, statistics that measure events dependent on outside influences do not truly measure a player's skill, other than in the broadest sense.  Second, they are extremely susceptible to wild swings over small sample sizes, as those outside factors assert their influence.

One way to tell whether traditional statistics are being monkeyed with by these uppity inputs taking advantage of sample-size mathematics is to use more context-neutral statistics.  Batting Average on Balls in Play (BABIP) is a good one because it's easy to explain.  BABIP is like batting average, except that it counts only the balls a batter makes contact with that stay in play.  So, it's batting average with home runs and strikeouts taken out and sacrifice flies added in.  What has been shown over the long haul is that players have very little control over their own BABIP; it is so independent of a batter's or pitcher's skill that it is sometimes used to quantify "luck" (which is an oversimplification, but who cares?).  BABIP can go nuts over small samples, and so it can generally* be used to make statements like these:

  • Robbie Grossman sure has sucked this year.  But his BABIP is so low that he'll probably turn it around and be okay moving forward.
  • Man, look at how awesome J.D. Martinez has been!  But his BABIP is .400, which is ridiculous.  He's going to crash back to earth after the All-Star break.
  • Jesus Guzman has been awful at the plate this year. *checks BABIP*  Sigh...Jesus Guzman is awful at the plate.
*I say generally because nothing is black and white.  There are always other factors that can cause a bad performance regardless of "luck", such as an injury, being a mental case, or trying to hit against Clayton Kershaw.
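The definition above boils down to one formula: BABIP = (H - HR) / (AB - K - HR + SF). Here is a minimal Python sketch of it; the function name and the stat line are made up for illustration.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting Average on Balls in Play: (H - HR) / (AB - K - HR + SF)."""
    hits_in_play = h - hr              # home runs never land in the field of play
    balls_in_play = ab - k - hr + sf   # strikeouts and homers out, sac flies in
    return hits_in_play / balls_in_play


# Hypothetical line: 100 AB, 25 H, 5 HR, 20 K, 0 SF -> 20 / 75
print(round(babip(h=25, hr=5, ab=100, k=20, sf=0), 3))  # 0.267
```

Compare that with a league average of around .300.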

Allow me to lay a pictograph on you.  Below are three curves that show how BABIP can skew things.

I've intentionally chosen a small sample (the first 70 games of the Astros' 2014 season) to make this point.  Here we see that during the first 20 games, the Astros scored as few as 2 runs per game, which goes a long way toward explaining their horrible 9-19 start.  But we also see that their BABIP during that time dropped as low as .200, when .300 is roughly league average.  That means only 1 in 5 balls that the Astros put into the field of play fell for hits.  That's not a crappy team - that's rotten luck.  And maybe also a crappy team, but the luck thing was certainly a huge factor.

Through 70 games, BABIP swung around like a wild crazy thing (because it can do that), and the Astros' runs scored rose and fell along with it.  As BABIP peaked at .325 in May, the Astros started scoring almost 5 runs per game.  Then it dropped back down to a more reasonable level.

The purpose of that picture is to show how, over short samples, outside-influenced stats can affect the "traditional stats" of a team whose talent stays constant.  It also shows why one needs to take the "season stats" listed on various web sites with a grain of salt.  At a time when the Astros' season BABIP was still otherworldly low, their small-sample BABIP was actually sky-high.  Eventually, that high BABIP pulled the season numbers up.
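The two kinds of lines in that picture, a short rolling window and the season-to-date figure, can both be computed from per-game counts. This Python sketch assumes a 10-game window (the chart's actual window isn't stated) and uses invented data: 20 games at a .200 clip followed by 20 games at .360.

```python
from collections import deque


def rolling_and_season_babip(games, window=10):
    """For each game, return (BABIP over the last `window` games,
    season-to-date BABIP). Each game is (hits_on_balls_in_play, balls_in_play)."""
    recent = deque()
    recent_hits = recent_bip = 0
    season_hits = season_bip = 0
    out = []
    for hits, bip in games:
        # Add today's game to both the rolling window and the season totals
        recent.append((hits, bip))
        recent_hits += hits
        recent_bip += bip
        season_hits += hits
        season_bip += bip
        # Drop the oldest game once the window is full
        if len(recent) > window:
            old_hits, old_bip = recent.popleft()
            recent_hits -= old_hits
            recent_bip -= old_bip
        rolling = recent_hits / recent_bip if recent_bip else float("nan")
        season = season_hits / season_bip if season_bip else float("nan")
        out.append((rolling, season))
    return out


# Invented data: a cold stretch (5 hits on 25 balls in play per game), then a hot one
games = [(5, 25)] * 20 + [(9, 25)] * 20
results = rolling_and_season_babip(games, window=10)
print(tuple(round(x, 3) for x in results[29]))  # after game 30: (0.36, 0.253)
```

After game 30 in this made-up run, the 10-game BABIP has already climbed to .360 while the season figure is still only about .253, the same kind of lag described above.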

The stats you are used to behave the same way.  Here is the same sample of games, showing how BABIP can cause pitcher ERA to fluctuate over small samples.
Here one can see how a pitcher's traditional stats can be mauled by the effects of "luck".  As short-sample BABIP dodges, dips, dives, ducks, and dodges, short-sample ERA follows as if being dragged along behind it.  When a pitcher has a high BABIP, he is giving up more hits, and therefore more runs.  But remember: a pitcher can't control his BABIP very much, so these fluctuations in ERA are not truly representative of his skill.  So you can look at, say, BABIP around game 35 (.330) and say, "Wow, that's really high.  That sure explains why the staff's ERA has been around 5.50 for the past week.  They didn't suddenly just get terrible."  Again, around game 55: "The Astros' ERA has been 2.75 for the past week, wow, they're awesome!  Wait, no, their BABIP was .220.  Lucky dogs."

This is how you can use BABIP to avoid freaking out (either way) over small-sample statistics.  This graph also shows how fluctuations at the beginning of a season can really screw with a seasonal statistic such as ERA.  It starts high because of the small sample in April, but then drops to a reasonable level with the addition of the ridiculously good luck seen during the hot streak in May.  You can't take every stat at face value without a long enough time period for it to stabilize, which is what the orange ERA line is trying to do toward the right of this chart.
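One way to see why a short stretch can swing so far is a simple coin-flip model: if each ball in play falls for a hit with a fixed probability p, the observed rate over n balls in play has a standard error of sqrt(p(1 - p) / n). This Python sketch assumes a true BABIP of .300 and independent trials, which is a big simplification. It shows only pure chance shrinking with sample size; it ignores how much true talent differs between players, which also determines how quickly a given stat stabilizes.

```python
import math


def rate_standard_error(p: float, n: int) -> float:
    """Standard error of an observed rate when each of n independent trials
    succeeds with the same fixed probability p (a simple binomial model)."""
    return math.sqrt(p * (1 - p) / n)


true_babip = 0.300  # assumed "true" rate, roughly league average
for n in (50, 200, 500, 2000):
    se = rate_standard_error(true_babip, n)
    # +/- 2 standard errors covers about 95% of outcomes (normal approximation)
    low, high = true_babip - 2 * se, true_babip + 2 * se
    print(f"{n:5d} balls in play: roughly {low:.3f} to {high:.3f}")
```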

Some stats stabilize more quickly than others, though.  This next graph breaks down the components related to the BABIP calculation.
In this one, we see that Home Run Rate, Strikeout Rate, and Walk Rate are largely unconcerned with the up-and-down motions of their parent stat, BABIP.  Hits per Plate Appearance, on the other hand, is affected pretty strongly by changes in BABIP.  Or rather, BABIP is affected strongly by fluctuations in hits.  That's because hits are heavily shaped by outside influences:  pitcher quality, weather, defensive shifting, defensive quality, and good old-fashioned luck.  Walks, home runs, and strikeouts are primarily a function of a batter's skill.
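The link between hits and BABIP can be made concrete by inverting the BABIP formula: H = HR + BABIP × (AB - K - HR + SF). This Python sketch holds strikeouts, walks, and home runs fixed for an invented 400-PA line and varies only BABIP. It assumes PA = AB + BB + SF, ignoring hit-by-pitches and sacrifice bunts for simplicity.

```python
def hits_per_pa(pa: int, k: int, bb: int, hr: int, sf: int, babip: float) -> float:
    """Hits per plate appearance implied by fixed K, BB, HR, and SF counts
    and a given BABIP, using H = HR + BABIP * (AB - K - HR + SF)."""
    ab = pa - bb - sf                 # simplification: no HBP or sacrifice bunts
    balls_in_play = ab - k - hr + sf
    hits = hr + babip * balls_in_play
    return hits / pa


# Invented line: 400 PA, 80 K, 40 BB, 12 HR, 0 SF; only BABIP changes
for b in (0.250, 0.300, 0.350):
    print(f"BABIP {b:.3f} -> H/PA {hits_per_pa(400, 80, 40, 12, 0, b):.3f}")
```

K%, BB%, and HR% are identical in all three rows, yet H/PA moves by about .067 between the low and high BABIP.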

K%, BB%, and HR% are a few of what we call a player's peripheral stats, meaning that they are the building blocks used to calculate the stats we always talk about, like BA, OBP, SLG%, ERA, FIP, etc.  Other peripheral stats include batted-ball data such as line-drive rate, contact rate, etc.  If a player's stats are oddly bad or oddly good, and you notice that the BABIP is high or low, it's time to ask why.  And to answer that question, turn to the player's peripherals.  K%, BB%, and HR%, unlike ERA, stabilize comparatively quickly.  Over a smallish sample (but not TOO small!), you can get a pretty good approximation of a player's skill from those peripheral stats.
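FIP is a good example of a stat built only from those quicker-stabilizing peripherals. The usual formula is FIP = (13 × HR + 3 × (BB + HBP) - 2 × K) / IP + a league constant, where the constant is set each season so that league FIP matches league ERA and is usually close to 3.1. This Python sketch treats that constant as an adjustable assumption, and the reliever's stat line is invented.

```python
def fip(hr: int, bb: int, hbp: int, k: int, outs_recorded: int,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching:
    (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.
    The 3.10 default is only an approximate placeholder for the league constant."""
    innings = outs_recorded / 3
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


# Invented reliever: 40 IP (120 outs), 2 HR, 10 BB, 1 HBP, 50 K
print(round(fip(hr=2, bb=10, hbp=1, k=50, outs_recorded=120), 2))
```

Hits on balls in play never enter the formula, which is how a pitcher can post a high BABIP and a low FIP at the same time.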

And so, using the rough methodology described above, I am prepared to make the following projections based on my knowledge of how small samples are misleading and how peripherals may be used to interpret them:
  • Robbie Grossman has been awful in the first half.  But his BABIP is .211, compared to a career minor league BABIP of .356.  Given that BABIP is largely out of a player's control, and given his strong MLB peripherals (a high walk rate and a manageable strikeout rate), I predict that he will have a much stronger 2nd half:  .275/.350/.400.
  • Jose Altuve has been a godsend, but his .355 first-half BABIP is really high.  Given that he doesn't walk much, I think he'll come down a bit, though his high-contact, low-strikeout approach will prevent a collapse.  .300/.340/.410 in the 2nd half.
  • Dexter Fowler and Matt Dominguez will keep doing what they did in the first half.
  • Jonathan Singleton should see a pretty dramatic improvement in the 2nd half, due to his walk rate and low first-half BABIP.  Those strikeouts gotta come down though, and they probably will.  That peripheral does NOT match his minor league strikeout rate, and should correct itself.
  • Josh Fields should be one of the Astros' best relievers in the 2nd half, and will win the 2015 closer's job outright.  His 1st-half BABIP is an inflated .364, while his FIP, which ignores balls in play, is 2.10.
  • Tony Sipp will probably do significantly worse in the 2nd half, judging by his crazy-low BABIP.  The time to trade him is now.
  • Ditto Collin McHugh, only don't trade him.  He just won't be as good as he was in the 1st half.  My guess?  An ERA around 4.00.
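The projections above are judgment calls, not formulas. For comparison only, one common and much more mechanical shortcut (not the method used here) is to regress an observed rate toward a baseline by pretending the player also had some number of extra trials at the baseline rate. In this Python sketch, the baseline and the number of imaginary trials (`prior_trials`) are hypothetical knobs, and the sample numbers are invented.

```python
def regressed_rate(observed_hits: int, observed_trials: int,
                   baseline: float, prior_trials: float) -> float:
    """Shrink an observed rate toward `baseline` by blending in `prior_trials`
    imaginary trials at the baseline rate. A larger prior_trials means more
    trust in the baseline and less in the small sample."""
    return (observed_hits + baseline * prior_trials) / (observed_trials + prior_trials)


# Invented example: 42 hits on 200 balls in play (.210), hypothetical .300 baseline
print(round(regressed_rate(42, 200, baseline=0.300, prior_trials=400), 3))  # 0.27
```

With 400 imaginary trials at .300, a .210 sample gets pulled up to .270: still below average, but much closer to it than the raw number.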