When we last checked in on pitching, we ran you through the hows and whys of ERA's deficiencies. Evan had the harder of our two tasks in breaking down ERA versus explaining DIPS. It's easy to explain how DIPS works and why its methodology most accurately captures the true skill level of a pitcher's performance, but it's incredibly difficult to explain why ERA is a faulty statistic, because so many of the variables captured by ERA are either irrelevant (scorer bias) or luck dependent (LD%, HR/FB%, etc.). Today, I will try to bring you deeper into what I consider sabermetrics' most valuable contribution to baseball: defense independent metrics (defense independent pitching, luck independent pitching, and fielding independent pitching).
For reasons hopefully explicated clearly below, DIPS does a far better job of capturing the true skill and performance of a pitcher than ERA, W-L, single rate stats (K/9, BB/9, WHIP, etc.), or even the aforementioned Support-Neutral family of statistics. While rate stats (except for WHIP) accurately gauge a singular skill of a pitcher, they don't tell us about his entire skill set. ERA and W-L, as previously discussed, are poor, to downright awful, at gauging a pitcher's skill level. Even the Support-Neutral family of statistics is hampered by many of the same things ERA is: while it gives a better feel for how a single pitching performance helped a team stay in a game and win it, it can't tell us whether that performance was strong or weak because of the pitcher's ability or because of the variety of randomly varying factors that impact a pitching performance.
As alluded to by Evan, the vein of pitching analysis we're venturing down today was inspired by the BABIP/ERA phenomenon, first observed by Voros McCracken. In his original article on DIPS, Voros surmised that "there is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play." This bold statement has since been heavily amended, and can now be restated as something akin to: major-league pitchers have little control over their ability to prevent line drives, which heavily affect their BABIP; they show a significant ability to control their GB, IF, and FB rates, but little ability to control the outcome of any batted ball. I know that just took 3/4 of the kinds of balls in play and reassigned them to the pitcher's responsibility category, which, given our treatment of BABIP last time, seems either counter-intuitive or like we were lying. Hopefully I'll be able to clear it up.
When Voros originally proposed DIPS, he labeled the following pitching categories defense independent:
BB, K, HBP, IBB, HR
Those are all categories which are truly defense independent, as only the pitcher can cause/allow them to occur. The first incarnation of DIPS worked by finding the rate at which these events occurred for a pitcher and then subtracting how many batters faced would have walked, struck out, been hit, been intentionally walked, or homered from the total number of batters faced. From there he broke down singles, doubles, triples, and outs for each pitcher and set the rate at which these occurred to the league average BABIP. He did this because, prior to further investigation by many different analysts, it seemed like BABIP truly was out of the pitcher's control entirely. What he had left was the number of BB, K, HBP, HR, IBB, 1B, 2B, 3B, and outs that would have occurred for a pitcher, all things being equal. With that, he assigned each event a run value (what the values were, I'm not sure, but I imagine it was the standard BaseRuns) and calculated the number of runs allowed by the pitcher with the new numbers. The result is a DIPS ERA, which could really be thought of as a DIPS RAA (runs allowed average). The measurement is still somewhat valuable as a quick way to see whether an extreme ERA is valid or not and is now cited as FIP (fielding independent pitching).
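To make that original recipe concrete, here is a minimal Python sketch of the idea. Every constant in it is an assumption on my part: the league BABIP, the split of hits on balls in play, and the run values (I've used simple linear-weight-style numbers instead of a full BaseRuns calculation just to keep it short), so treat it as a sketch of the structure rather than Voros' actual math.

```python
# Sketch of the original DIPS idea: keep the pitcher's own defense-independent
# events, replace everything on balls in play with league-average outcomes,
# then convert to runs. All constants below are illustrative placeholders.

LEAGUE_BABIP = 0.300                                 # assumed league hit rate on balls in play
HIT_SPLIT = {"1B": 0.77, "2B": 0.19, "3B": 0.04}     # assumed split of hits on balls in play
RUN_VALUES = {"BB": 0.33, "HBP": 0.33, "IBB": 0.25, "HR": 1.40,
              "1B": 0.47, "2B": 0.78, "3B": 1.09, "OUT": -0.10}  # illustrative run values

def dips_era(bfp, bb, k, hbp, ibb, hr):
    """Defense-independent runs allowed per 27 outs (a DIPS 'ERA'), rough sketch."""
    balls_in_play = bfp - (bb + k + hbp + ibb + hr)

    # Replace the pitcher's actual ball-in-play results with league-average ones.
    bip_hits = balls_in_play * LEAGUE_BABIP
    line = {hit: bip_hits * share for hit, share in HIT_SPLIT.items()}
    line.update({"BB": bb, "HBP": hbp, "IBB": ibb, "HR": hr})
    line["OUT"] = k + (balls_in_play - bip_hits)     # strikeouts plus league-average BIP outs

    runs = sum(RUN_VALUES[event] * count for event, count in line.items())
    return runs / line["OUT"] * 27                   # scale to a per-27-out rate

# Hypothetical example: 800 batters faced, 60 BB, 150 K, 5 HBP, 5 IBB, 20 HR
print(round(dips_era(800, 60, 150, 5, 5, 20), 2))
```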
Understandably this was controversial, because conventional baseball wisdom assumed pitchers could control hits allowed (it's why WHIP is such a popular fantasy baseball category). In order to prove or disprove Voros' assertion, the last seven years have seen a tremendous amount of research go into DIPS. The results have yielded a more nuanced understanding of the batter/pitcher match-up and the dividing line between pitching and defense.
Here, as up to date as I have seen it, is how DIPS is calculated to reflect the more nuanced truth that has come to light since Voros' original proposition of BABIP as pure chance.
A pitcher is assigned a league average LD%, reflecting that, statistically, LDs are uncontrollable according to year-to-year correlations (Source). This neutral number of line drives is then subtracted from the pitcher's total batted balls. Next, the rates at which the pitcher surrendered ground balls (GB), infield flies (IF), outfield flies (OF), and bunts are applied to the remaining batted balls to determine how many of each he would have allowed once the neutral amount of LDs is subtracted. Then, to correct for the role that defense plays in each kind of batted ball, league average results are applied to them. So if 45% of GBs fall for singles and the pitcher in question, after the adjustments, was given 100 GBs, then he'd be credited with 45 singles. The remaining 55 would be distributed among 2B, 3B, HR, outs, double plays, and reached on error (ROE) in the same way, and the same is done for his adjusted numbers of LDs, IFs, OFs, and bunts. With the pitcher's new line of corrected Ks, BBs, IBBs, HBP, 1B, 2B, 3B, HR, ROE, and outs, a defense independent Runs Allowed is then calculated by assigning a run value to each event via BaseRuns. This methodology is DIPS 3.0.
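To make those steps easier to follow, here is a rough Python sketch of the batted-ball portion of that adjustment. Bunts, double plays, and ROE are left out for brevity, and every rate in the tables is a placeholder I made up for illustration rather than a real league average; the only true skill inputs are the pitcher's own GB/IF/OF mix.

```python
# Sketch of the DIPS 3.0 batted-ball adjustment: force LD% to league average,
# keep the pitcher's own GB/IF/OF mix, then apply league-average outcomes to
# every batted-ball type. All rates below are made-up placeholders.

LEAGUE_LD_RATE = 0.20   # assumed league-average share of batted balls that are line drives

# Assumed league-average outcome shares by batted-ball type (each row sums to 1).
OUTCOME_RATES = {
    "GB": {"1B": 0.22, "2B": 0.02, "3B": 0.00, "HR": 0.00, "OUT": 0.76},
    "LD": {"1B": 0.52, "2B": 0.17, "3B": 0.02, "HR": 0.03, "OUT": 0.26},
    "IF": {"1B": 0.01, "2B": 0.00, "3B": 0.00, "HR": 0.00, "OUT": 0.99},
    "OF": {"1B": 0.05, "2B": 0.06, "3B": 0.01, "HR": 0.09, "OUT": 0.79},
}

def neutral_batted_ball_line(batted_balls, pitcher_mix):
    """
    batted_balls: total balls in play the pitcher allowed.
    pitcher_mix:  the pitcher's own shares of GB/IF/OF among his non-LD batted balls.
    Returns expected 1B/2B/3B/HR/OUT totals with LD% set to league average and
    league-average outcomes applied to every batted-ball type.
    """
    # Step 1: the pitcher is charged a league-average number of line drives.
    ld = batted_balls * LEAGUE_LD_RATE
    remaining = batted_balls - ld

    # Step 2: the rest are split using the pitcher's own GB/IF/OF tendencies.
    counts = {"LD": ld}
    counts.update({bb_type: remaining * share for bb_type, share in pitcher_mix.items()})

    # Step 3: league-average outcome rates turn each type into hits and outs.
    line = {"1B": 0.0, "2B": 0.0, "3B": 0.0, "HR": 0.0, "OUT": 0.0}
    for bb_type, n in counts.items():
        for outcome, rate in OUTCOME_RATES[bb_type].items():
            line[outcome] += n * rate
    return line

# Hypothetical example: 500 balls in play from a ground-ball-heavy pitcher.
print(neutral_batted_ball_line(500, {"GB": 0.60, "IF": 0.10, "OF": 0.30}))
```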
The question that remains (if you're still even reading) is: why is it valid to count LD% as luck, but GB%, IF%, and OF% as pitching skill? To this end, I will stick with Mitchel Litchman's study of the year-to-year correlation of different types of balls in play to bolster DIPS 3.0's methodology. Litchman studied pitchers who changed teams over a 10-year period (1993-2002) and examined their batted-ball data. Why those who changed teams? In Litchman's words, it is so "we have essentially removed the home park and defensive influences from the correlations." His study involved over 100 pitchers who had a minimum of 300 balls in play in consecutive seasons -- a large sample size to say the least. He then ran the year-to-year correlation on the different kinds of batted balls. His results indicated that pitchers show absolutely no control over LDs, but exhibited a good degree of consistency (i.e. control) in IF, OF, and GB rates (listed from least to strongest control).
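Mechanically, that kind of test is just a correlation between a rate in one season and the same rate the next season across the sample of pitchers. Here's a tiny Python sketch of the calculation with invented numbers; the real study used 100+ pitchers with 300+ balls in play in each season, not a handful of made-up pairs.

```python
# Sketch of a year-to-year correlation test like Litchman's. The data below
# are invented purely to show the mechanics: a rate a pitcher controls should
# correlate strongly with itself across seasons, a luck-driven rate should not.
from statistics import correlation  # Python 3.10+

def year_to_year_r(pairs):
    """Pearson r between a rate in year 1 and the same rate in year 2."""
    year1 = [y1 for y1, _ in pairs]
    year2 = [y2 for _, y2 in pairs]
    return correlation(year1, year2)

# Hypothetical (made-up) consecutive-season LD% and GB% pairs.
ld_pairs = [(0.19, 0.22), (0.23, 0.20), (0.20, 0.18),
            (0.18, 0.21), (0.22, 0.23), (0.21, 0.19)]
gb_pairs = [(0.52, 0.50), (0.38, 0.40), (0.45, 0.46),
            (0.60, 0.58), (0.41, 0.43), (0.48, 0.47)]

print("LD% year-to-year r:", round(year_to_year_r(ld_pairs), 2))  # near zero: no control
print("GB% year-to-year r:", round(year_to_year_r(gb_pairs), 2))  # strongly positive: control
```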
So hopefully that clears up why, when we calculate a DIPS, the LD% is automatically set to league average while the pitcher's own individual rates of surrendering IF, OF, and GB are left alone. However, one issue is probably still lingering in your mind: if GB, IF, and OF are under the control of a pitcher, then why does DIPS 3.0 assign league average rates of results to each batted-ball type? (Why say that if a GB goes for a single a league average 57% of the time and a pitcher surrendered 100 GBs, he gets 57 defense independent GB singles, instead of however many singles he actually surrendered on GBs?) Again, we have to look back to Litchman, who determined through the same study I've already discussed that while pitchers show the ability to control what kinds of batted balls they allow, they show almost no control over the rate at which those balls in play go for outs. If pitchers cannot control the rate at which their batted balls go for outs, then they cannot reasonably be expected to control the outcome of the balls in play that do not go for outs. Thus, DIPS 3.0 corrects for that.
Alright, we've come a long way. What we've covered so far is that from the initial assertion that BABIP is completely a factor of luck (that the only pitcher/batter outcomes a pitcher determines are BB, K, IBB, and HBP), further research revealed that pitchers can control the kinds of balls in play they allow, just not their outcomes. I wouldn't doubt that you're wondering: "What the hell does that even mean? Aren't you saying the same thing with just some qualifiers?" Ok, you're probably not thinking that, but I did. While the outcomes of batted balls are out of a pitcher's hands, he can control the type of ball put in play, with the exception of line drives. Since certain kinds of batted balls go for hits or outs more often than others, pitchers can -- in a sense -- control their destinies. All they can do, however, is shift the probability that a ball in play goes for an out by changing the mix of batted balls they allow, because (as I noted earlier) Litchman's work indicates that controlling the out rate within a given batted-ball type is not a skill pitchers possess. Instead, it is one that the defense backing them possesses. Thus, DIPS corrects the outcomes of batted-ball types to league average, in order to neutralize the role that defense plays in a pitcher's results.
What DIPS leaves a GM, manager, scout, fantasy baseball player, or fan with is a metric that captures the actual skill level of the pitcher. It has been adjusted so that luck and the abilities of others do not obscure the work of the pitcher. How can we be sure of this? Because DIPS 3.0 has a correlation of .8 with the next year's ERA, whereas ERA has a year-to-year correlation of only .374 with itself. A word of caution: DIPS in any shape or form is not an explanatory stat unless you dig deeper into why there is a differential between ERA and DIPS. Although it is a predictive stat, it is only truly useful at predicting a pitcher's ERA given the proper context for his DIPS-ERA differential. Things like injuries, command problems, or poor pitch sequencing can all artificially skew the DIPS-ERA differential while not being the result of chance. Not to mention who the defenders backing him are (Carlos Lee, I'm talking to you). On the whole, though, a positive DIPS-ERA differential portends poor future performance. One $120 million oversight by an organization refusing to employ sabermetric analysis in player evaluation is the infamous Barry Zito. In his contract year, Zito posted a 3.83 ERA, but it was deflated largely by a ridiculous 78.5% LOB%, which was reflected in his 4.65 DIPS. Brian Sabean could and should have easily observed that Zito's continued success depended on strong defense, as his IF% had fallen off the table and his K/9 was in steady decline, and that such a pitcher, no matter his past performance levels, does not warrant a $120 million contract.
Now, before Astros fans start ridiculing the Giants, let's not forget December 2006, when Jason Jennings came to town. Tim Purpura traded for Jennings off the strength of his 2006 3.78 ERA, which for Coors Field is probably like saying Roger Clemens had a 1.87 ERA in 2005. However, it, like Zito's ERA, was deflated by a totally unsustainable HR rate, which is reflected in his 2006 DIPS of 4.61. Would anybody else like to still have Willy T, Jason Hirsch, or Taylor Bucholtz in light of this? (DIPS 3.0 Source; you have to go to the bottom and open the spreadsheet.) **Side Note: you could do the exact same thing with Woody Williams, 2006 ERA: 3.65, 2006 DIPS: 5.03...eeesssshhh** It works the other way too; the quickest example I could find was Freddy Garcia from 2004-2005. In 2004 he posted a dismal 4.64 ERA, whereas DIPS 3.0 had him at 3.54. In 2005, Garcia posted a 3.54 ERA. Not too shabby, DIPS 3.0.
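Boiled down, the way these examples use DIPS is just a comparison of ERA to DIPS with some threshold for "worth a second look." Here's a tiny Python sketch of that screen; the 0.50-run cutoff is an arbitrary number I picked for illustration, and the inputs are the ERA/DIPS figures cited above.

```python
# Quick-and-dirty screen for regression candidates using the DIPS-ERA gap.
# The 0.50-run threshold is an arbitrary illustrative cutoff, not a rule.

GAP_THRESHOLD = 0.50  # assumed cutoff for "worth a closer look"

def regression_flag(era, dips):
    """Flag pitchers whose ERA looks out of line with their DIPS."""
    gap = dips - era
    if gap > GAP_THRESHOLD:
        return "ERA likely to rise (fortunate)"
    if gap < -GAP_THRESHOLD:
        return "ERA likely to fall (unfortunate)"
    return "ERA roughly in line with skill"

# ERA/DIPS pairs cited in this post.
pitchers = {
    "Zito 2006": (3.83, 4.65),
    "Jennings 2006": (3.78, 4.61),
    "Garcia 2004": (4.64, 3.54),
}

for name, (era, dips) in pitchers.items():
    print(f"{name}: ERA {era}, DIPS {dips} -> {regression_flag(era, dips)}")
```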
Ok, that was exhaustive for me, and I'm sure for you. We'll save LIPS for next time and jam it in with pitch/fx, which actually makes a good deal of sense to do. Addendum number one to the syllabus should read: "DIPS > ERA" and "The Next Frontier: LIPS and pitch/fx."