
Talking Sabermetrics: What Does Astros Pitcher BABIP tell us?

Looking at x-BABIP and batted ball data to tell us what we may see from Astros pitching in the future.

Thomas B. Shea

BABIP (Batting Average on Balls in Play). Simple statistic. But much analyzed by sabermetricians.

DIPS theory, an important sabermetric concept, indicates that pitchers have limited control over the outcomes of batted balls in play. A corollary to DIPS is that batters probably have more influence over their BABIP than pitchers do over the BABIP they allow. As a result, analysts and fans frequently compare a pitcher's BABIP to the league average BABIP and assume that the difference between the pitcher's results and the average is due to some combination of luck and defense. This can provide a starting point for predicting the direction of a pitcher's regression from season to season.

However, this isn't the end of the story. We know that pitchers have differing tendencies for fly ball and groundball rates, which affect BABIP (since groundballs generally produce more hits in play than fly balls). It's widely accepted that specialty pitches (e.g., knuckleballs or Mariano Rivera's cutter) can confound BABIP expectations. And it doesn't take much to extend that conjecture to other pitchers who seem to induce weaker-than-average contact. A better way of looking at BABIP is to view it as a 30 or 40 point range around the league average. Pitchers with really high or low BABIPs beyond that range probably have experienced unsustainable results. But it is also possible that pitchers' differing skills and pitch types will slant their BABIP results toward the high or low end of the normal range. Because BABIP is subject to quite a bit of random variation, it is difficult to distinguish pitcher-specific BABIP from luck or defense.

A recent article in the "Community Research" section of Fangraphs provides an interesting effort at illuminating pitcher BABIP. Steve S presents his results in "Projecting BABIP Using Batted Ball Data," and develops a formula for pitcher-specific "expected BABIP," or x-BABIP. (As you may know, x-BABIP formulas have been applied previously to hitters.) Later in this story, I will use that formula to calculate x-BABIP for Astros' pitchers. The most interesting aspect of his article is that it shows a myriad of correlation coefficients between and among BABIP, batted ball types, PITCHf/x data, and pitch types. I wouldn't take these results as conclusive, but I think they show the direction of certain relationships, and provide some leads for future hypotheses which might help explain how or why pitchers achieved a particular performance level.

Not surprisingly, line drive rates and infield fly rates are found to be important determinants of BABIP. However, line drive rates are less predictable, even if multiple years of data are used to predict a future line drive rate. The pitcher has some effect on line drive rates; for example, groundball and strikeout rates seem to be correlated with preventing line drives. But, in general, line drive rates probably reflect more year-to-year random fluctuation. Infield fly ball rates, on the other hand, appear to be more predictable and consistent from year to year. As the Hardball Times glossary states: For some pitchers, inducing infield flies may be a repeatable skill. And infield fly balls are an important outcome, since infield pops are the surest form of out other than a strikeout.

Another thought-provoking observation from the article: different, and perhaps divergent, pitching skills are related to preventing line drives and inducing infield flyballs. Four-seam fastballs in the zone, combined with slower pitches to change speeds, appear to be the best recipe for inducing infield flyballs. Fastball movement has a closer relationship to infield flyball rates than velocity does. Avoiding line drives, on the other hand, appears to depend on sinkers and two-seam fastballs, as well as higher velocity. To some extent, these two elements of run prevention profile as two different types of pitchers.


The Fangraphs article develops the following formula for a pitcher's expected BABIP:

xBABIP = 0.4*LD% – 0.6*FB%*IFFB% + 0.235

Here LD% is the line drive rate, FB% is the fly ball rate, and IFFB% is the infield fly ball rate.
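As a rough illustration, the formula can be written as a small Python function. This is only a sketch: the function name `x_babip` and the sample inputs are made up for illustration, and it assumes the rates are entered as fractions (20% as 0.20) and that IFFB% is measured per fly ball, so that FB%*IFFB% is the share of all batted balls that are infield flies.

```python
def x_babip(ld_pct: float, fb_pct: float, iffb_pct: float) -> float:
    """Expected BABIP from batted ball rates, all given as fractions.

    Implements xBABIP = 0.4*LD% - 0.6*FB%*IFFB% + 0.235 as quoted in the text.
    """
    return 0.4 * ld_pct - 0.6 * fb_pct * iffb_pct + 0.235


# Hypothetical pitcher (not from the article): 20% LD, 35% FB, 10% IFFB.
example = x_babip(ld_pct=0.20, fb_pct=0.35, iffb_pct=0.10)
print(f"{example:.3f}")
```

With these made-up inputs, the line drive term adds 80 points, the infield fly term subtracts 21 points, and the result is .294.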

I calculated the x-BABIP for some of the Astros' pitchers likely to return next year in key roles. Comparing the pitchers' actual BABIP to x-BABIP provides some information regarding the likelihood of a positive or negative regression in the pitcher's performance next year. If the pitcher's actual BABIP exceeds x-BABIP, this may support an argument that the pitcher is likely to revert to a lower BABIP, and vice versa. However, if a higher-than-expected BABIP is primarily due to poor defense (not an unreasonable possibility, since the Astros did not rank well on advanced defensive stats), then a reversion to x-BABIP may depend on an improved defense in 2013. The variance column is actual BABIP minus x-BABIP (i.e., a positive variance means that the pitcher is expected to revert to a lower BABIP).





[Table: actual BABIP, x-BABIP, variance, and FDP Wins for each Astros pitcher, with an NL Avg. row]


The column titled "FDP Wins" is the potential impact of batted balls and the timing of pitching outcomes during 2012, expressed on a WAR-type basis. I described Fielding Dependent Pitching Wins in a previous article here. A negative FDP Wins means that the pitcher was hurt by the number and timing of balls in play, and is consistent with a positive variance between BABIP and x-BABIP.
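The comparison of actual BABIP to x-BABIP described above can be sketched in a few lines of Python. The function names and the five-point tolerance for calling a result "neutral" are assumptions made for illustration, not part of the Fangraphs article, and the sample BABIP values are hypothetical as well.

```python
def babip_variance(actual: float, expected: float) -> float:
    """Variance as defined in the text: actual BABIP minus x-BABIP."""
    return actual - expected


def predicted_direction(variance: float, tolerance: float = 0.005) -> str:
    """Label the likely direction of regression.

    tolerance is a hypothetical cutoff (five points of BABIP) below which
    the gap is treated as noise rather than a signal.
    """
    if variance > tolerance:
        return "positive regression (BABIP likely to fall)"
    if variance < -tolerance:
        return "negative regression (BABIP likely to rise)"
    return "neutral"


# Hypothetical pitchers, not taken from the article's table.
unlucky = predicted_direction(babip_variance(0.320, 0.295))
lucky = predicted_direction(babip_variance(0.277, 0.300))
even = predicted_direction(babip_variance(0.296, 0.295))
print(unlucky, lucky, even, sep="\n")
```

A positive variance is labeled as positive regression because a falling BABIP helps the pitcher, which matches the convention used in the results below.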

Results and Predicted Direction

  • Lucas Harrell. Neutral. Harrell's BABIP is one point higher than his x-BABIP--for our purposes, his performance was consistent with his x-BABIP. However, Harrell's FDP-Wins is negative, primarily because he was below average in leaving runners on base. If you believe that this mostly reflects bad luck, then he may experience improvement in the future. I'm not sure we can assign all of the consequences of negative timing to bad luck, though. The Astros' defense faltered too often with runners on base, and Harrell sometimes had control issues in those situations.
  • Bud Norris. Positive Regression. Norris' actual BABIP was somewhat higher than his x-BABIP, indicating the potential for a reversion to a lower BABIP. Norris allowed a high line drive rate, and was fairly decent at inducing infield flies. Since line drive rates are more erratic than infield fly rates, it is possible that Norris' x-BABIP next year could be lower than 2012. Combined with other indicators like SIERA (3.90), the signs appear to be good for a bounce back by Norris in 2013.
  • Jordan Lyles. Positive Regression. Lyles' actual BABIP was substantially higher than his x-BABIP. This, in combination with a severely negative FDP-Wins, suggests the potential for significant improvement in his performance next year. Like Harrell, Lyles frequently was hurt by defensive misplays with runners on base. Lyles is doing a lot of things right, and he should improve significantly with any degree of better luck and improved defense.
  • Dallas Keuchel. Negative Regression. Keuchel's .277 BABIP is substantially below his x-BABIP. This is a bad sign for potential improvement next year. Perhaps we can see a glimmer of hope in his ability to suppress line drives and induce a reasonable number of infield flyballs.
  • Wesley Wright and Wilton Lopez. Negative Regression. Lopez and Wright allowed a BABIP substantially below their x-BABIP. This could suggest that both may regress toward allowing more hits next season. However, it's possible that the x-BABIP comparison may not work as well for relief pitchers. Both Wright and Lopez allowed high line drive rates, even though they are groundball pitchers.
  • Fernando Rodriguez. Positive Regression. Some of the fans soured on F-Rod because he blew some high leverage appearances. But he really is a better relief pitcher than the 2012 results indicate. F-Rod has the biggest variance between his BABIP and x-BABIP of any pitcher on the chart. F-Rod's low x-BABIP (.255) is driven by a high infield flyball rate (19.1%), combined with a reasonable line drive rate (18%). Maybe the infield fly rate and x-BABIP are a bit extreme, due to the sample size. But F-Rod fits the profile of a pitcher who can induce infield pops and suppress BABIP. F-Rod's SIERA is 3.53, another indicator of potential improved performance in 2013.
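As a quick consistency check, the x-BABIP formula can be inverted to see what fly ball rate is implied by the figures quoted for Rodriguez. The sketch below is illustrative only: it assumes the formula xBABIP = 0.4*LD% - 0.6*FB%*IFFB% + 0.235 was applied with the rounded inputs quoted above (18% line drives, 19.1% infield flies per fly ball, .255 x-BABIP), and the function name is hypothetical. The implied fly ball rate is a by-product of that algebra, not a figure reported in the article.

```python
def implied_fb_pct(xbabip: float, ld_pct: float, iffb_pct: float) -> float:
    """Solve xBABIP = 0.4*LD% - 0.6*FB%*IFFB% + 0.235 for FB%.

    All rates are fractions; iffb_pct must be non-zero.
    """
    return (0.4 * ld_pct + 0.235 - xbabip) / (0.6 * iffb_pct)


# Inputs are the rounded figures quoted in the text for Rodriguez.
fb = implied_fb_pct(xbabip=0.255, ld_pct=0.18, iffb_pct=0.191)
print(f"implied FB%: {fb:.1%}")
```

Under those assumptions the implied fly ball rate comes out at roughly 45%, which would fit the profile of a flyball pitcher who gets a lot of infield pops.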

Home/Road Splits

Three Astros' starters--Norris, Lyles, and Harrell--had some of the largest home/road ERA splits in the majors. For Norris and Lyles, the H/R split in BABIP was a major reason for the ERA differential. The BABIP splits: Norris (H) .263; (R) .326. Lyles (H) .284; (R) .319.

Norris is an extreme case for H/R splits. He has the worst road ERA (6.94) among qualified starters. Norris fell short of the innings threshold to rank as a qualified starter at home. But if he had qualified, Norris would have had the second-best home ERA among MLB starters. That is amazing. Norris was almost the best major league starting pitcher at home and the worst starting pitcher on the road.

Norris has similarly extreme differentials between x-BABIP and actual BABIP for home and road--except in opposite directions. Norris' x-BABIP / actual BABIP: (H) .305 / .263; (R) .294 / .326. This would suggest that regression for both home and road BABIP will reduce the size of Norris' home and road ERA splits.
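The arithmetic behind that point can be laid out with the four Norris figures quoted above. This is a small sketch, with variable names chosen only for illustration; it works out the variance (actual BABIP minus x-BABIP) at home and on the road, and compares the road-minus-home gap in actual BABIP with the same gap in x-BABIP.

```python
# (x-BABIP, actual BABIP) for Norris, as quoted in the text.
norris = {"home": (0.305, 0.263), "road": (0.294, 0.326)}

# Variance by venue: actual BABIP minus x-BABIP, rounded to BABIP points.
variance = {venue: round(actual - expected, 3)
            for venue, (expected, actual) in norris.items()}

# Road-minus-home gap in actual BABIP versus the same gap in x-BABIP.
actual_gap = round(norris["road"][1] - norris["home"][1], 3)
expected_gap = round(norris["road"][0] - norris["home"][0], 3)

print(variance)
print(actual_gap, expected_gap)
```

At home, Norris' actual BABIP was 42 points below his x-BABIP; on the road it was 32 points above. The actual road-minus-home gap was 63 points, while the x-BABIP figures differ by only 11 points, and in the opposite direction. If both splits moved all the way to x-BABIP, which is an assumption rather than a forecast, most of the BABIP-driven part of the home/road difference would disappear.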

But we really don't know why one team has three starting pitchers with such large ERA home/road splits (2 to 5 runs variance). Does Minute Maid Park provide an unspecified ballpark advantage? Does the Astros defense play much better at home? We could probably come up with more lines of speculation. The bottom line is, we don't know.