In our last installment, Stephen detailed why a pitcher’s W-L record is not necessarily indicative of their relative merit as a player. Indeed, rare is the case where a pitcher’s record runs parallel to their on-field performance. ERA is often cited as a secondary indicator of a pitcher’s value, to be looked at after W-L. It is a better barometer of a pitcher’s success than W-L for one big reason: it isolates the performance of the pitcher in question rather than the team as a whole. Whereas entire teams win and lose games, ERA takes the team out of the equation, along with the runs that score because of errors. Its improvements over W-L notwithstanding, ERA has its own share of blemishes, most of which stem from the data it leaves out.
To start with, ERA takes into account only earned runs. This matters because no pitcher, starter or reliever, is held accountable for an unearned run. Unearned runs just fall into a baseball black hole, joining the likes of Derek Bell and Brian L. Hunter, never to be heard from again. The whole point of statistics is to allow baseball teams, fans, and any other inquisitive person to evaluate a player objectively. Yet whether a run counts as earned hinges on errors, and the error is a judgment call, sometimes an arbitrary one, handed down by an official scorer at the home ballpark (paid and employed by the home team).
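For readers who want the arithmetic spelled out, here is a minimal Python sketch of the standard formula, ERA = 9 × ER / IP, set beside RA9 (all runs allowed per nine innings) so the unearned runs that vanish from ERA are visible. The function names and the pitcher's line (80 runs, 12 of them unearned, in 200 innings) are hypothetical, chosen only for illustration. Innings are written in box-score notation, where "6.2" means six and two-thirds.

```python
def innings_to_float(ip: str) -> float:
    """Convert box-score innings ("6.2" = 6 2/3 innings) to a plain number."""
    whole, _, outs = ip.partition(".")
    extra_outs = int(outs) if outs else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"invalid innings value: {ip!r}")
    return int(whole) + extra_outs / 3


def era(earned_runs: int, ip: str) -> float:
    """Earned runs allowed per nine innings; unearned runs are simply ignored."""
    return 9 * earned_runs / innings_to_float(ip)


def ra9(total_runs: int, ip: str) -> float:
    """All runs allowed (earned + unearned) per nine innings."""
    return 9 * total_runs / innings_to_float(ip)


# Hypothetical season: 80 runs allowed, 12 of them ruled unearned.
runs, unearned, innings = 80, 12, "200.0"
print(f"ERA: {era(runs - unearned, innings):.2f}")  # ERA: 3.06
print(f"RA9: {ra9(runs, innings):.2f}")             # RA9: 3.60
```

In this made-up line, those 12 unearned runs are worth more than half a run per nine innings, and the only thing separating them from earned runs is the official scorer’s call.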
Already, a small chink in the armor of this venerable stat can be seen. To paraphrase the great Bill James, just think about where the attention of anyone watching baseball is most of the time: on the batter and the pitcher. The official scorer, like the guy swilling Budweiser in front of you at the park, isn’t focusing on exactly how Miguel Tejada shifted just before the pitch in an attempt to get to a ball in play. In the fraction of a second between bat and glove, the official scorer gets to decide whether the play would ordinarily have resulted in an out. If he says yes and charges an error, the pitcher is no longer accountable for any run that results. A lot of human error is at play in this beloved statistic.
Additionally, ERAs in one ballpark (or year) cannot be compared directly to ERAs in another. Parks and eras differ in pitch movement, the physical characteristics of the ball, and other factors that make balls in play easier or harder to turn into outs. Further, when a starter leaves runners on base and a reliever then allows them to score, those runs are charged to the starter’s ERA, even though he had no say in preventing them from scoring once he left the game. Finally, sample size makes ERA a shaky basis for comparing a 200 IP starter with a 60 IP closer. These quibbles are of great importance in figuring out how to analyze a pitcher’s performance.
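The sample-size point is easy to check with a little arithmetic. The sketch below uses two hypothetical pitchers with identical 2.70 ERAs and adds the same single disastrous outing (5 earned runs in 1 inning) to a 200-inning starter and a 60-inning closer. The function names are illustrative, and innings are treated as plain decimal numbers for simplicity.

```python
def era(earned_runs: float, innings: float) -> float:
    """Earned runs per nine innings (innings as a plain number)."""
    return 9 * earned_runs / innings


def era_after_blowup(er: float, ip: float, bad_er: float = 5, bad_ip: float = 1):
    """Return (ERA before, ERA after) adding one bad outing to a season line."""
    return era(er, ip), era(er + bad_er, ip + bad_ip)


# Hypothetical season lines, both at a 2.70 ERA.
pitchers = {"starter": (60, 200), "closer": (18, 60)}
for role, (er, ip) in pitchers.items():
    before, after = era_after_blowup(er, ip)
    print(f"{role}: {before:.2f} -> {after:.2f} (+{after - before:.2f})")
```

The closer's ERA jumps by about 0.69 while the starter's moves by about 0.21: one outing is a far larger slice of 61 innings than of 201.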
ERA was created in an attempt to separate defense from pitching, which is one reason it’s still a semi-useful statistic. With the influx of sabermetricians and statistically minded fans and executives, new ways of evaluating a pitcher’s performance have been developed. Support-Neutral, Defense-Independent, and Fielding-Independent metrics go beyond ERA to give a more in-depth analysis of a pitcher’s performance. By in-depth, I mean focusing even more narrowly than ERA does on what a pitcher can control. Understanding BABIP is a starting point for this. The basic premise is pretty simple, yet startling for any baseball fan not familiar with sabermetrics. BABIP (batting average on balls in play) measures how often balls in play fall for hits, and much of its variation reflects luck in whether those balls are converted into outs. Consider that league-average BABIP is generally reported to be between .290 and .300, and then take a look at this chart and the BABIPs it reports for pitchers. They’re all over the place, because even halfway through a baseball season, luck hasn’t evened out for everyone.
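The post doesn’t spell out the formula, so for reference: the version of BABIP commonly published by sabermetric sites (FanGraphs, for example) is (H − HR) / (AB − K − HR + SF), that is, hits on balls in play divided by balls in play, with sacrifice flies counted as balls in play. Here is a small Python sketch; the function name and the stat line are hypothetical.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF).

    Home runs and strikeouts are removed because the defense never gets a
    chance to field them; sacrifice flies count as balls in play that became outs.
    """
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical opponents' batting line against one pitcher.
value = babip(h=200, hr=20, ab=760, k=160, sf=5)
print("BABIP allowed:", f"{value:.3f}".lstrip("0"))  # BABIP allowed: .308
```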
To draw a more concrete link between BABIP and why ERA is a weak(er) stat, let's look further at the batter-pitcher match-up. The "action" of a batter-pitcher match-up can be separated into two parts. The first is the pitcher delivering the ball to the hitter. Without a doubt, the pitcher has a great deal of control over this: the pitch he throws, its velocity, spin, location, and deception are all within his ability to alter.
The second part of this interaction begins once the hitter makes contact. Both hitter and pitcher have relatively little influence over this part, other than as defender and baserunner. What the other eight fielders do with the hitter’s ball in play is out of their hands. Before Stephen and I learned about BABIP, we’d often be watching an Astros game where Jack Wilson would hit a little duck-snort over Adam Everett’s head for a single. Half an inning later, Lance Berkman would line out to Adam LaRoche. I’d turn to Stephen and say, "typical Astros luck." Well, I was partially right: it was obviously bad luck, which deep down I knew wasn’t just an Astros phenomenon. What I didn’t know was just how much luck went into the batter-pitcher match-up.
So what does BABIP have to do with ERA? ERA measures the runs a pitcher is responsible for allowing to score. However, if a pitcher has very little control over the outcome of a plate appearance (PA) beyond strikeouts (K), walks (BB), and hit batsmen (HBP), then how valuable can ERA be in assessing his performance? The chart I asked you to click on earlier, with its widely scattered BABIPs, was meant to drive home the point that BABIPs vary for no discernible reason. If balls in play are unluckily falling for hits more often than they should, we would expect a pitcher’s ERA to look worse than his true skill level warrants, and better than it warrants if his BABIP is extremely low.
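To see how much BABIP can wander on luck alone, here is a toy simulation under deliberately simple assumptions (ours, not the post’s): every pitcher has the same true BABIP of .300, and each of his 500 balls in play independently falls for a hit with that probability. Any spread among the 30 simulated "pitchers" is therefore pure chance. The constants, seed, and function names are illustrative.

```python
import math
import random

TRUE_BABIP = 0.300   # assumed identical "true" rate for every pitcher
BALLS_IN_PLAY = 500  # illustrative sample size, not taken from the post


def simulate_babip(rng: random.Random, p: float = TRUE_BABIP, n: int = BALLS_IN_PLAY) -> float:
    """Each ball in play independently falls for a hit with probability p."""
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n


def luck_sd(p: float = TRUE_BABIP, n: int = BALLS_IN_PLAY) -> float:
    """Binomial standard deviation of observed BABIP from chance alone."""
    return math.sqrt(p * (1 - p) / n)


rng = random.Random(2007)  # fixed seed so the run is repeatable
league = sorted(simulate_babip(rng) for _ in range(30))
print(f"lowest {league[0]:.3f}, highest {league[-1]:.3f}")
print(f"chance alone: one SD = {luck_sd():.3f} of BABIP")
```

Under these assumptions one standard deviation is about .020 of BABIP, or roughly 10 hits over 500 balls in play (0.0205 × 500 ≈ 10), with no difference in skill at all; the lowest and highest of the 30 simulated BABIPs will usually sit several of those standard deviations apart.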
Now, there is a heaping amount of gray area in saying BABIP is largely luck, and we’ll discuss it in depth in "DIPS, LIPS, and FIP" next time. Those measures seek to determine how well a pitcher performed in the areas he has the most control over: pitch speed, location, and home runs. For now, we’ll give you a small preview.
Hidden within BABIP are a few characteristics that need to be mentioned. Earlier in this post, I tried to impress upon you that BABIP is an essentially random statistic.
Well, it is, and it isn’t. What is not random is the split second between the ball leaving the pitcher’s hand and its either being hit by the batter or caught by the catcher. Where the ball is pitched relative to the strike zone, how many pitches the pitcher has in his repertoire, and how often he gets ahead of or behind in the count are all under the pitcher’s immediate control, and they all affect the degree of luck associated with BABIP, because in the end these aforementioned factors make a pitched ball easier or harder for the hitter to make solid contact with. The further sabermetricians have probed the batted-ball issue, the more they have come to believe that line-drive percentage (LD%) is almost entirely a matter of luck. However, burgeoning evidence suggests that a pitcher can control his rates of ground balls, outfield flies, and infield flies. This seems reasonable given that ground-ball percentage (GB%) has a year-to-year correlation of .807, which is statistically significant and indicates a repeatable skill. (Source; also click for an explanation of correlation if you’re fuzzy on it.)
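Since the argument leans on year-to-year correlations, here is a minimal sketch of how one is computed: pair each pitcher’s GB% in year one with his GB% in year two and take the Pearson correlation coefficient. The six pitchers’ numbers below are made up purely to show the mechanics; they are not the data behind the .807 figure. (Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient.)

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of the same length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical GB% for the same six pitchers in consecutive seasons.
gb_year1 = [0.55, 0.48, 0.42, 0.38, 0.51, 0.45]
gb_year2 = [0.53, 0.47, 0.44, 0.36, 0.50, 0.43]
print(f"year-to-year r = {pearson_r(gb_year1, gb_year2):.3f}")
```

With these invented numbers r comes out close to 1 (about 0.97). A value near 1 means pitchers who got lots of ground balls one year tended to do so again the next, which is the sense in which GB% looks like a repeatable skill.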
Further influencing the degree to which BABIP plays a part in ERA are the skills measured by K/batter and BB/batter, which carry year-to-year correlations of .790 and .676, respectively (Source). These correlations are also statistically significant and again indicate that the outcomes reflect the pitcher’s skill. To the extent that a pitcher limits balls in play by striking out and walking batters, he reduces the amount of luck that enters into his ERA. This is all the more reason why the aforementioned means of analysis (support-neutral wins and losses, defense-independent pitching statistics, etc.) are important and valuable.
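Here is a rough sketch of that last point. Two hypothetical pitchers face the same 800 batters; one strikes out and walks far more hitters than the other. Balls in play are approximated as batters faced minus K, BB, HBP, and HR (a simplification that ignores sacrifices and other rare outcomes). Under the same assumed .300 true BABIP and the coin-flip independence model used above, the chance-driven swing in hits allowed is the binomial standard deviation, sqrt(n × p × (1 − p)). All names and numbers are illustrative.

```python
import math

ASSUMED_BABIP = 0.300  # same hypothetical "true" rate for both pitchers


def balls_in_play(bf: int, k: int, bb: int, hbp: int, hr: int) -> int:
    """Approximate balls in play: batters faced minus strikeouts, walks, HBP, and home runs."""
    return bf - k - bb - hbp - hr


def luck_in_hits(bip: int, p: float = ASSUMED_BABIP) -> float:
    """Binomial SD of hits on balls in play if each one is an independent coin flip."""
    return math.sqrt(bip * p * (1 - p))


# Hypothetical season lines over 800 batters faced.
pitchers = {
    "strikeout/walk pitcher": dict(bf=800, k=200, bb=70, hbp=5, hr=20),
    "contact pitcher": dict(bf=800, k=90, bb=40, hbp=5, hr=20),
}
for name, line in pitchers.items():
    bip = balls_in_play(**line)
    print(f"{name}: {bip} balls in play, luck SD about {luck_in_hits(bip):.1f} hits")
```

The pitcher who keeps the ball out of play leaves 140 fewer batted balls to chance (505 versus 645), so less of his line, and of his ERA, rides on where those balls happen to land.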
Predictability is ideal for a franchise evaluating a player, because it wants to know ahead of time how a player will perform not only next season but several seasons into the future, and whether his past performance was the result of skill or luck. Understanding, for instance, that ERA overrates ground-ball pitchers, because more unearned runs score with ground-ball pitchers on the mound than with fly-ball pitchers, is very important. Why? Because at the end of the game, nobody subtracts unearned runs (UER) from the total score to determine the winner.
I hope I have demonstrated that much of what goes into an ERA does not accurately reflect a pitcher’s skill. Much of what ERA reports depends on context: the "type" of pitcher, the defense behind him, the skill of the bullpen, and a measurable degree of luck. That makes ERA uninformative and often misleading. Support-neutral and defense-independent statistics do a better job of capturing what is really happening on the baseball diamond, something ERA cannot do.