Sabermetric thoughts: relievers and defensive metrics
Time for some sabermetric deep thoughts. Maybe not so deep. And more like a couple of unrelated subjects that came to mind. But hopefully not superficial thoughts either. Just sabermetric thoughts.
Glenn DuPaul at Hardball Times listed his top five relief pitchers who could improve in 2013. The Astros' Fernando Rodriguez was on the list, and DuPaul writes:
Despite his struggles with walks and home runs, Rodriguez's peripherals (FIP, xFIP, SIERA, etc.) indicate that he was much better last season than his ERA would reflect.
Rodriguez was a solid reliever in 2011 with similar peripherals, but his 2012 strand rate (65.2 percent) caused his ERA to inflate. I think his velocity, strikeouts and peripherals are a better indicator for his future.
Houston could see Rodriguez return to being a very good reliever next year for very little cost: Rodriguez will make the league minimum.
Rodriguez had some bad moments in high leverage situations last year, and, as a result, doesn't get much love from Astros fans. As I've written previously, Rodriguez's pitching performance wasn't as bad as we think. As we try to evaluate the 2013 relief corps, it's not unreasonable to project F-Rod as a useful part of the Astros' bullpen.
DuPaul has a simple approach to projecting relief pitcher performance. Reliever performance is notoriously volatile from year to year, in large part because of small sample sizes: relievers typically pitch one-quarter to one-half as many innings as starting pitchers. DuPaul relies mostly on strikeout percentage to evaluate relievers, pointing out that strikeout percentage is one of the few statistics to achieve statistical reliability around the 60-inning mark.
In another Hardball Times study, DuPaul finds that strikeout percentage is a better predictor of future performance for relief pitchers than measures like FIP, SIERA, and xFIP that are typically used to project future run prevention rates. Another surprising finding is that BB% adds relatively little to the predictive power provided by K%. It's unclear why this is the case. Perhaps relief pitchers are unlikely to stay in the big leagues without showing a minimum level of control. Or maybe the conversion of a relief pitcher's walks into runs depends too much on subsequent relief pitchers, which adds volatility to predictions of run prevention.
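To make the comparison concrete, here is a rough sketch of the kind of check a study like DuPaul's performs: correlate year-one K% and year-one ERA with year-two ERA, and see which one tracks the future better. The reliever numbers below are invented for illustration only; they are not DuPaul's data set.

```python
# Sketch: does year-1 K% predict year-2 ERA better than year-1 ERA does?
# The reliever lines below are MADE-UP numbers, purely illustrative.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical relievers: (year-1 K%, year-1 ERA, year-2 ERA)
relievers = [
    (28.0, 3.90, 2.80),
    (25.0, 2.50, 3.40),
    (22.0, 4.60, 4.10),
    (31.0, 3.10, 2.60),
    (18.0, 3.00, 4.80),
    (24.0, 4.20, 3.70),
]

k_pct = [r[0] for r in relievers]
era1  = [r[1] for r in relievers]
era2  = [r[2] for r in relievers]

# K% correlates negatively with future ERA (more strikeouts, fewer runs);
# in this toy set it tracks year-2 ERA far more tightly than year-1 ERA does.
print(f"r(K%,  next ERA) = {pearson_r(k_pct, era2):.2f}")
print(f"r(ERA, next ERA) = {pearson_r(era1, era2):.2f}")
```

The point of the toy comparison is only the shape of the test, not the numbers: a stable skill (K%) can out-predict a noisy outcome (ERA) even for projecting that same noisy outcome.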
Here is a list of the Astros' top five K rate relief pitchers in 2012.
(K % / SIERA)
Storey 26.8% / 2.99
Ambriz 26.5% / 3.59
Cedeno 26.1% / 3.18
Rodriguez 25.2% / 3.53
Wright 24.2% / 2.83
The sample sizes are small (even in the relief pitcher world) for Storey, Ambriz, and Cedeno. Rodriguez is the only one of the five to exceed the 60-inning mark associated with stabilizing K rates. The Astros' selection of Josh Fields in the Rule 5 draft is certainly consistent with relying on K% to evaluate relievers. His K rates of 32% and 39% in AA and AAA are really, really good.
Advanced Defensive Metrics vs. Scouting
The advanced defensive metrics, primarily DRS and UZR, evoke criticism in some quarters. When I read an article suggesting that WAR should encompass both scouting and advanced metrics to measure defensive value, I wondered "how different are scouting evaluations of defense compared to play-by-play metrics?"
We don't have the professional teams' scouting reports available to us--and certainly not in a form that can be compared to runs-saved metrics. But we do have the fan scouting reports (FSR) collected by Tango, which are conveniently converted by Fangraphs into a form that can be compared to UZR and DRS. (Unfortunately the FSR is not available until the next year.) The FSR can probably be used as a reasonable representation of the "eye test."
I used Fangraphs to develop a defensive leaderboard of 120 players for the period 2009-2011. Both UZR and DRS are highly correlated with FSR: for DRS, .73 correlation and .53 R-squared; for UZR, .70 correlation and .49 R-squared. Looking at several different comparisons (including annual runs saved at the team level), DRS usually has a better correlation with fan scouting reports than UZR does. Total Zone results are much less similar to FSR (.38 R-squared). Other metrics like errors, RZR, and out-of-zone plays have little correlation with FSR.
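A side note on those numbers: for a one-variable comparison like FSR vs. DRS, R-squared is simply the correlation coefficient squared (.73 squared is roughly .53, and .70 squared is .49), so the two figures are really the same finding stated two ways. A quick sketch, using invented placeholder runs-saved values rather than the actual leaderboard data:

```python
# For a one-predictor comparison, R-squared = (Pearson correlation)^2.
# The paired "runs saved" values below are INVENTED placeholders, not
# the actual Fangraphs leaderboard data.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

fsr = [12, 5, -3, 8, -10, 0, 15, -6]   # hypothetical FSR runs above average
drs = [10, 7, -5, 4, -8, 2, 12, -2]    # hypothetical DRS runs saved

r = pearson_r(fsr, drs)
print(f"correlation = {r:.2f}, R-squared = {r * r:.2f}")
```

So "roughly half the variation unexplained" and "correlation around .7" describe the same gap between the metrics and the eye test.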
On the one hand, the high correlations tell us that DRS and UZR largely come to the same conclusions as the "eye test." On the other hand, roughly one half of the variation in FSR is not explained by the advanced metrics. I'm not sure whether this is surprising or a cause for concern, but it could be. Clearly, FSR and UZR/DRS reach different conclusions about the relative defensive value of some players. And it's worth noting that UZR and DRS are more closely correlated with each other than either is with FSR.
Furthermore, we don't really know much about the precise reliability of scouting reports like FSR. I have wondered in the past whether FSR respondents are influenced by the UZR/DRS results they have seen previously. Given that scouting reports for each player are dominated by fans of his team, there is a real possibility of bias toward popular or unpopular players. For that matter, even professional scouting reports could be subject to similar biases, though perhaps to a lesser extent.