

Sabermetrics: Adjusting Astros' Starting Pitchers' 2012 Performance

Do B-Ref's adjustments for factors like ballparks, opposing offenses, and team defense change our view of the Astros pitchers' performance?

Brett Davis-US PRESSWIRE

Sometimes we think we know how well a player performed, until we find out what we don't know. Some "external" factors affect pitchers' performance but may not be reflected in most pitching stats. Let's see how they affect our view of the Astros' starting pitchers.

"Sample size" is one of the most frequently used terms in sabermetrics, and for good reason. A lot of random stuff happens in a baseball game, and players' results may be affected by situations beyond their control. If the time period for a statistic is too short, aberrant occurrences have too much impact. With a large enough sample, we hope everything comes out in the wash, so to speak.

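To put a rough number on the sample-size point, here is a minimal Python sketch. It assumes each plate appearance is an independent trial with a fixed, hypothetical 20% strikeout rate, which real games don't strictly satisfy. The standard error of the observed rate shrinks only with the square root of the sample, so quadrupling the batters faced merely halves the noise.

```python
import math


def rate_standard_error(p: float, n: int) -> float:
    """Binomial standard error of a rate observed over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


TRUE_K_RATE = 0.20  # hypothetical strikeout rate, for illustration only

for batters_faced in (50, 200, 800):
    se = rate_standard_error(TRUE_K_RATE, batters_faced)
    print(f"{batters_faced:>3} BF: standard error of observed K% = {se:.1%}")
```
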
Baseball fans put a lot of emphasis on a pitcher's ERA in the previous season. Probably too much. We try to compensate for ERA's sample-size deficiencies by looking at advanced pitching metrics like FIP, xFIP, tERA, and SIERA. But even with these metrics, other factors, such as the strength of opposing offenses, park effects, team defense, and the pitcher's usage, can obscure how well the pitcher actually performed. Even neutralized stats like ERA+ and FIP- don't control for all of these factors. We like to think that these distortions, like whether or not a pitcher faced first-division opponents, tend to cancel out over a full season. But, based on the adjustments Baseball-Reference uses to calculate pitching WAR, that isn't always the case. Because some of the factors, like defense, aren't random, even large multi-year samples may not wash out the distortions.
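For readers less familiar with FIP, here is a minimal Python sketch of the standard FanGraphs-style formula. It uses only home runs, walks plus hit batters, and strikeouts. The season constant is recalculated each year so that league FIP equals league ERA, so the 3.10 default below is a placeholder, not the 2012 value. Note that nothing in the formula adjusts for opponents, ballparks, or role.

```python
def innings_from_box_score(ip: str) -> float:
    """Convert box-score innings notation ('180.1' = 180 1/3 innings) to true innings."""
    whole, _, outs = ip.partition(".")
    return int(whole) + int(outs or "0") / 3


def fip(hr: int, bb: int, hbp: int, k: int, innings: float, fip_constant: float = 3.10) -> float:
    """Fielding Independent Pitching from home runs, walks, hit batters and strikeouts.

    fip_constant is a placeholder: the real constant is recalculated every season
    so that league-wide FIP equals league-wide ERA.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + fip_constant


# Hypothetical stat line, for illustration only
print(round(fip(hr=20, bb=50, hbp=5, k=150, innings=innings_from_box_score("180.0")), 2))  # 3.79
```

ERA minus FIP, used later in this piece, is then a rough gauge of how much of a pitcher's run prevention depended on things FIP ignores, such as balls in play and sequencing.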

Baseball-Reference.com's pitcher value calculations perform a series of adjustments for opponent strength, ballparks, team defense, and pitcher role, which culminate in a figure called "RA9Avg." B-Ref defines each pitcher's RA9Avg as:

"our best estimate of what an average pitcher would do against these opponents, with this defense, and in these ballparks."

RA9Avg for the Astros' 2012 pitchers is shown at B-Ref here. (A more extensive explanation is here.) Given the sample size issue, I confined my review to the Astros' starting pitchers rather than relievers.

RA9Avg is used in a recent Beyond the Boxscore article on how much pitchers' run environments vary within the same team. The article concluded that the Astros' and Rockies' pitching staffs had the largest range of pitcher run environments, with the Astros' range running from an RA9Avg of 4.9 for Lucas Harrell to 5.29 for Dallas Keuchel. In other words, among Astros' starters, Harrell pitched in the easiest run environment and Keuchel in the most difficult.

For my review of RA9Avg, I examined the Astros' 2012 starters Harrell, Keuchel, Lyles, and Norris, plus recently acquired pitchers Bedard, White, and Humber. I chose the pitchers who will be in the mix for the 2013 rotation, which excludes the pitchers who were traded in 2012.

RA9Avg Adjustment

To examine how RA9Avg might affect our evaluation of these pitchers, I calculated the ratio of each pitcher's RA9Avg to league-average runs allowed per nine innings (RA/9), and used the resulting adjustment factor to calculate the change implied for each pitcher's RA/9. I show the AL average for comparison, since the Astros will be changing leagues in 2013.


Pitcher   RA/9   Adjust for RA9Avg   Adj. RA/9
Harrell   4.18   -0.443              3.737
Keuchel   5.91   -1.147              4.763
Lyles     6.18   -1.060              5.120
Norris    4.81   -0.554              4.256
White     6.06   -2.312              3.748
Bedard    5.44    0.037              5.477
Humber    6.53   -0.365              6.165
AL Avg.   4.47
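The calculation is described above only in words, so here is a minimal Python sketch of one reading of it. The factor is RA9Avg divided by the league-average RA/9, and the change is the pitcher's RA/9 times (factor - 1), with the sign flipped so that a tougher-than-average environment produces a negative change. The league-average RA/9 the author used isn't stated. The 4.43 below is an assumed value, and the function names are hypothetical. Combined with the RA9Avg figures of 4.9 (Harrell) and 5.29 (Keuchel) cited from Beyond the Boxscore, it reproduces their adjustments in the table to within rounding.

```python
# One possible reading of the RA9Avg adjustment described above (not B-Ref's own code).
ASSUMED_LG_RA9 = 4.43  # assumption: the league-average RA/9 used is not stated


def ra9avg_change(ra9: float, ra9avg: float, lg_ra9: float = ASSUMED_LG_RA9) -> float:
    """Change implied for a pitcher's RA/9 by his run environment.

    factor > 1 means an average pitcher would allow more runs than league average
    in this environment, so the pitcher's RA/9 is adjusted downward.
    """
    factor = ra9avg / lg_ra9
    return -ra9 * (factor - 1.0)


def adjusted_ra9(ra9: float, ra9avg: float, lg_ra9: float = ASSUMED_LG_RA9) -> float:
    """Pitcher's RA/9 after applying the implied change."""
    return ra9 + ra9avg_change(ra9, ra9avg, lg_ra9)


# RA/9 from the table; RA9Avg values as cited from Beyond the Boxscore.
for name, ra9, ra9avg in [("Harrell", 4.18, 4.9), ("Keuchel", 5.91, 5.29)]:
    change = ra9avg_change(ra9, ra9avg)
    print(f"{name}: change {change:+.3f}, adjusted RA/9 {adjusted_ra9(ra9, ra9avg):.3f}")
```

Running it prints changes of -0.443 and -1.147 and adjusted figures of 3.737 and 4.763. Treat that as a consistency check on this reading, not confirmation of the exact formula used.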

All of the pitchers except Bedard benefit from the RA9Avg adjustment. Because the Astros' team defense was below average, according to defensive metrics, the Astros pitchers' unadjusted numbers understate their performance. Alex White, who pitched in front of a bad defense in ballparks with high park factors, receives the largest benefit from the adjustment. Keuchel, Lyles, and Norris also saw large adjustments; however, of those three, only Norris' performance moves to better than league average with the adjustment.
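The comparison against league average can be checked directly. This Python sketch copies the RA/9 and adjustment figures from the table and treats 4.47 as the AL average RA/9, which is a reading of the table's AL Avg. row. It lists the pitchers who move from worse than that average to better than it once the adjustment is applied. The function name is hypothetical.

```python
AL_AVG_RA9 = 4.47  # AL average shown in the table

# (RA/9, RA9Avg adjustment) for each pitcher, copied from the table
PITCHERS = {
    "Harrell": (4.18, -0.443),
    "Keuchel": (5.91, -1.147),
    "Lyles": (6.18, -1.060),
    "Norris": (4.81, -0.554),
    "White": (6.06, -2.312),
    "Bedard": (5.44, 0.037),
    "Humber": (6.53, -0.365),
}


def crossed_to_better_than_average(pitchers, league_avg=AL_AVG_RA9):
    """Pitchers whose RA/9 is worse than league_avg before the adjustment but better after it."""
    crossed = []
    for name, (ra9, adjustment) in pitchers.items():
        if ra9 >= league_avg and ra9 + adjustment < league_avg:
            crossed.append(name)
    return crossed


print(crossed_to_better_than_average(PITCHERS))  # ['Norris', 'White']
```

White crosses the line as well, which fits his getting the biggest boost. Harrell does not appear because his RA/9 was already below 4.47 before the adjustment.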

Most of these pitchers had an ERA higher than their Fielding Independent Pitching (FIP) in 2012. Below is a comparison of the RA9Avg adjustment with the "ERA - FIP" differential shown at FanGraphs. (Bedard, with an upward RA9Avg adjustment, and Keuchel, whose ERA was lower than his FIP, are excluded.) This may give us an idea of whether the RA9Avg adjustment explains the gap between ERA and FIP.



Pitcher   RA9Avg Adj.   ERA - FIP
Harrell   -0.443        0.02
Lyles     -1.060        0.56
Norris    -0.554        0.42
White     -2.312        0.28
Humber    -0.365        0.67

As shown above, the RA9Avg adjustment is at least double the ERA - FIP differential for Harrell and White, nearly double for Lyles, and larger than the differential for everyone except Humber. We could take this one of two ways: either the RA9Avg impact is larger than we expected, or perhaps the Baseball-Reference adjustment methods are inaccurate.
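The size of each adjustment relative to the ERA - FIP gap is a simple ratio. Here is a short Python sketch using the figures from the table; the function name is just illustrative.

```python
# RA9Avg adjustment and ERA - FIP differential, copied from the table
ROWS = {
    "Harrell": (-0.443, 0.02),
    "Lyles": (-1.060, 0.56),
    "Norris": (-0.554, 0.42),
    "White": (-2.312, 0.28),
    "Humber": (-0.365, 0.67),
}


def adjustment_to_gap_ratio(adjustment: float, era_minus_fip: float) -> float:
    """Size of the RA9Avg adjustment relative to the ERA - FIP gap."""
    return abs(adjustment) / era_minus_fip


for name, (adj, gap) in ROWS.items():
    print(f"{name}: {adjustment_to_gap_ratio(adj, gap):.1f}x")
```

The ratios come out at roughly 22 for Harrell and 8 for White, just under 2 for Lyles, about 1.3 for Norris, and about 0.5 for Humber.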

Some interesting observations: on average, Harrell faced the weakest offenses, while Keuchel and White faced the strongest; Norris was hurt the least by the defense, due to his strikeout and fly-ball rates; White and Humber pitched in the most hitter-friendly ballparks, and Norris pitched in the most pitcher-friendly ones.

Is the Adjustment Accurate?

There is always a catch, right?

I would treat the RA9Avg adjustments above as illustrative rather than definitive. The examples demonstrate the potential for these adjustments to significantly affect single-season results, and I think this exercise gives us an idea of which pitchers may have been helped or hurt the most by their run environment. But the B-Ref adjustments are not perfect.

I don't want to come off as criticizing the B-Ref adjustments too severely. A lot of effort went into creating adjustments that have conceptual merit. However, the methods have some potential flaws that may affect their accuracy. In particular, the sample size problem shows up here too. For example, we know that defensive metrics can be volatile in small samples, and the defense adjustment assumes that a team's defense is more or less uniform from game to game. The assumption that an opposing team's offensive ability is uniform throughout the year is another possible shortcoming.

But, setting aside questions about the precision of the B-Ref adjustments, does this change your view of any of the Astros' starting pitcher candidates?