We are approaching the time of year when "experts" predict the Astros' record. The Las Vegas oddsmakers have weighed in with an over/under of 59.5 wins (102.5 losses). Baseball Prospectus released its initial W/L projection: 68-94 from its playoff odds system, and 99 losses from a less regressed version.
One of the reactions I have read from some Astros fans is "how can they project that the Astros will have a better W/L record this season?" One of David's newspaper colleagues provides an example of this line of thinking ("Anybody who is headed for Vegas please call me. I need to get down on the under...") I can understand the reaction. The Astros have subtracted productive players from a team that racked up 107 losses last year, and are headed to the toughest division in the toughest league. This leads some pessimistic fans to believe that a 110-120 loss season is in the offing. A New York Post article suggested that the Astros would make a run at the 1962 Mets' catastrophic record. While anything is possible, I don't think that kind of extreme outcome is likely. I'll get around to some data to support my view; but, first, we have to talk about statistical concepts.
In reacting to these W/L projections, I think fans tend to forget the concept of regression to the mean. All else equal, changes in teams' records, over time, will tend to move in the direction of a .500 W/L record. A common fallacy is to add or subtract wins from a team's record simply based on additions or subtractions from the roster. Most of the fans at TCB are familiar with the concept of regression when we examine players' individual statistics. The concept is applicable in a more global way to team projections.
The legitimate projection systems regress predicted W/L records, which compresses the range of expected records around .500. That doesn't mean the actual range of records will be that compact; random variation and other unexplained factors may cause some teams to overperform and other teams to underperform. David Cameron has made a semantic distinction between projections and predictions: "Projections are information about what we think we currently know, while predictions are speculation about things that we probably cannot know." In this case, we believe we can reasonably project the talent on a team's roster, but we can't predict which teams will over- or underperform their talent level. In a ballpark sense, projected team records may have an error range of plus or minus 9 games, with one third of the error related to the accuracy of the forecasts of player talent and two thirds of the error related to stuff we can't predict. Because the Astros had the worst record in the league in 2012, there is a pretty high likelihood that they underperformed their talent. That is an automatic problem for the addition/subtraction fallacy mentioned above, which starts from a 2012 baseline that was probably worse than the roster's true level.
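The idea that a last-place team has probably underperformed its talent can be checked with a toy simulation. The sketch below is only an illustration under assumptions of my own: the spread of true talent (`talent_sd`), the 30-team league, and the simplification that every game is an independent coin flip weighted by the team's talent are not taken from any projection system.

```python
import random


def share_of_worst_teams_that_underperformed(
    seasons=500, teams=30, games=162, talent_sd=0.06, seed=1
):
    """Simulate seasons and report how often the team with the fewest
    wins finished below its own expected win total.

    talent_sd is an assumed spread of true winning percentage around .500.
    """
    rng = random.Random(seed)
    underperformed = 0
    for _season in range(seasons):
        worst_wins = None
        worst_expected = None
        for _team in range(teams):
            # True talent: a winning percentage drawn around .500 (assumption).
            talent = min(max(rng.gauss(0.5, talent_sd), 0.25), 0.75)
            # Actual record: each game is an independent weighted coin flip.
            wins = sum(rng.random() < talent for _game in range(games))
            if worst_wins is None or wins < worst_wins:
                worst_wins = wins
                worst_expected = talent * games
        if worst_wins < worst_expected:
            underperformed += 1
    return underperformed / seasons


if __name__ == "__main__":
    share = share_of_worst_teams_that_underperformed()
    print(f"Worst team underperformed its talent in {share:.0%} of seasons")
```

The worst record in a league tends to belong to a team that was both bad and unlucky, so the simulated share should land well above one half. The exact figure depends on the assumed talent spread.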
Looking at the Orioles, a team which is likely to face regression in the opposite direction from the Astros, Jack Moore's FanGraphs article uses bullpen Win Probability Added (WPA) to demonstrate why the Orioles' 93-win record in 2012 might be projected to regress to 76 wins in 2013.
The Astros and The Plexiglass Principle
About 30 years ago, Bill James, the godfather of sabermetrics, developed six leading indicators to determine whether a team is likely to improve or decline. He coined the term "plexiglass principle," which is more colorful than regression. Simply put, this principle contends that teams that improve in one season tend to decline in the next, and vice versa. He also used the term "law of competitive balance" to suggest that teams with losing records tend to improve and teams with winning records tend to decline. The Astros, by the way, qualify for "improvement" on four of James' six leading indicators.
Moore's article shows that WPA, part of FanGraphs' win probability family of statistics, is more strongly subject to regression than measures like WAR. This isn't surprising, since WPA generally cannot be used to predict players' future performance. However, team WPA is highly correlated with wins and losses.
The 2012 Astros had MLB's worst WPA in both the pitching and batting categories, for a total WPA of -26. In addition, the Astros were the worst clutch batting team and the 5th worst clutch pitching team. The 2012 Astros also underperformed their Pythagorean Record (a projection of W/L record based on runs scored and allowed) by four games. WPA, "clutch," and Pythagorean deviations all have large elements of luck or randomness, and should be vulnerable to regression. As Moore states in reference to potential decline by an overperforming team, "teams regress to the mean, but the combination of regression and the ever impermanent nature of clutch performance leads to a doubly hard fall." The same reasoning should apply to "doubly hard" regression in the opposite direction for a team like the Astros.
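For readers who have not seen the Pythagorean Record computed, here is a minimal sketch. It uses Bill James' original formula with an exponent of 2. Published versions often use a slightly different exponent, and I am not asserting which one sits behind the four-game figure above. The run totals and win total in the example are hypothetical, not the Astros' actual numbers.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (Bill James' formula).

    exponent=2.0 is the classic form; other exponents are common variants.
    """
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return games * scored / (scored + allowed)


def pythagorean_deviation(actual_wins, runs_scored, runs_allowed,
                          games=162, exponent=2.0):
    """Actual wins minus Pythagorean wins; negative means underperformance."""
    expected = pythagorean_wins(runs_scored, runs_allowed, games, exponent)
    return actual_wins - expected


if __name__ == "__main__":
    # Hypothetical team: 600 runs scored, 800 allowed, 55 actual wins.
    expected = pythagorean_wins(600, 800)
    deviation = pythagorean_deviation(55, 600, 800)
    print(f"Pythagorean wins: {expected:.1f}, deviation: {deviation:+.1f}")
```

A negative deviation means the team won fewer games than its run totals would suggest, which is the sense in which I say the Astros "underperformed."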
Analysis of WPA Regression
Team WPA reflects the accumulation of players' positive and negative contributions to the probability of winning each game. Certainly WPA is influenced by the talent level of a team, since good players have more opportunity to succeed. But WPA also reflects more transitory effects, like the distribution of hits, HRs, Ks, BBs, and other events across different leverage situations. If players have more success when the game is out of reach, for instance, the contributions produce minimal WPA. Teams can also accumulate high levels of positive or negative WPA if the team is involved in many close games and a lot of close and late situations within the games. "Clutch" is a subset of WPA which measures whether players' performance improved or declined in high leverage situations. In theory, really bad WPA and clutch performance should be subject to substantial regression toward average performance.
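To make the accumulation concrete, here is a small sketch of how WPA adds up. Each play is credited with the change in the team's win expectancy from before the play to after it. The player names and win expectancy values are made up for illustration, and a real calculation would look them up from the inning, score, outs, and base state.

```python
from collections import defaultdict

# (player, win expectancy before the play, win expectancy after the play)
# All values below are hypothetical.
PLAYS = [
    ("Batter A", 0.50, 0.65),  # single in a tie game, late innings
    ("Batter A", 0.02, 0.03),  # the same single with the game out of reach
    ("Batter B", 0.40, 0.31),  # strikeout with runners on
]


def wpa_by_player(plays):
    """Sum each player's change in win expectancy across his plays."""
    totals = defaultdict(float)
    for player, before, after in plays:
        totals[player] += after - before
    return dict(totals)


def team_wpa(plays):
    """Team WPA is the sum of every player's contributions."""
    return sum(wpa_by_player(plays).values())


if __name__ == "__main__":
    for player, wpa in wpa_by_player(PLAYS).items():
        print(f"{player}: {wpa:+.2f}")
    print(f"Team: {team_wpa(PLAYS):+.2f}")
```

The two singles by "Batter A" are the same event in the box score, but the one in the tie game is worth fifteen times as much WPA in this example. That is the leverage effect described above.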
My impression is that batting WPA is most susceptible to regression in a short period of time. For my analysis, I examined the 29 teams with a batting WPA of -10 or worse over the 15 year period, 1997 - 2011. (The 2012 Astros had a -12.57 batting WPA.) I tallied the change in WPA for each team's subsequent season to determine if regression occurred.
| Team | WPA | Yr. 2 WPA | Difference | Clutch | Yr. 2 Clutch | Difference |
| --- | --- | --- | --- | --- | --- | --- |
| Astros 11 | -12.8 | -12.57 | 0.23 | -2.6 | -4.19 | -1.59 |
| Cubs 10 | -11.31 | -0.87 | 10.44 | -4.63 | 1.73 | 6.36 |
| Mariners 10 | -11.09 | -9.01 | 2.08 | 0.14 | 0.72 | 0.58 |
| Pirates 10 | -13.51 | -4.32 | 9.19 | 0.41 | 3.54 | 3.13 |
| Orioles 09 | -10.17 | -8.55 | 1.62 | -2.54 | -0.15 | 2.39 |
| Royals 09 | -10.72 | -7.53 | 3.19 | -2.1 | -2.22 | -0.12 |
| Nationals 08 | -11.88 | -7.77 | 4.11 | -2.82 | -3.71 | -0.89 |
| Royals 07 | -10.43 | -6.66 | 3.77 | -1.4 | 1.15 | 2.55 |
| Rays 06 | -12.2 | -3.04 | 9.16 | -7.42 | -1.94 | 5.48 |
| Pirates 06 | -11.37 | -8.76 | 2.61 | -6.69 | -4.81 | 1.88 |
| D-Backs 04 | -16.78 | -6.06 | 10.72 | -1.3 | -4 | -2.7 |
| Blue Jays 04 | -11.42 | -5.26 | 6.16 | -1.28 | -0.57 | 0.71 |
| Mariners 04 | -10.89 | -5.48 | 5.41 | -3.83 | 1.52 | 5.35 |
| Brewers 04 | -10.56 | -0.28 | 10.28 | -2.83 | -1.29 | 1.54 |
| Tigers 03 | -20.5 | -4.34 | 16.16 | -6.4 | -5.45 | 0.95 |
| Brewers 02 | -15.82 | -3.68 | 12.14 | -6.37 | -0.69 | 5.68 |
| Tigers 02 | -11.56 | -20.5 | -8.94 | 0.95 | -6.4 | -7.35 |
| Royals 02 | -11.17 | -0.25 | 10.92 | 1.66 | 3.51 | 1.85 |
| Rays 02 | -10.43 | -9.78 | 0.65 | 2.43 | -7.05 | -9.48 |
| Royals 01 | -11.88 | -11.17 | 0.71 | -2.24 | -3.51 | -1.27 |
| Phillies 00 | -10.76 | 4.54 | 15.3 | -1.08 | 0.11 | 1.19 |
| Twins 00 | -10.01 | 2.73 | 12.74 | 1.66 | 2.8 | 1.14 |
| Twins 99 | -16.63 | -10.01 | 6.62 | -2.92 | 1.66 | 4.58 |
| Angels 99 | -10.83 | 0.43 | 11.26 | -1.28 | -2.08 | -0.8 |
| Tigers 99 | -10.17 | -3.5 | 6.67 | -5.18 | -2.6 | 2.58 |
| Rays 98 | -13.27 | -9.38 | 3.89 | -1.99 | -3.78 | -1.79 |
| Pirates 98 | -10.89 | -4.88 | 6.01 | -1.5 | -5.14 | -3.64 |
| D-Backs 98 | -10.84 | 11.33 | 22.17 | -4.22 | 2.8 | 7.02 |
| Tigers 98 | -10.45 | -10.17 | 0.28 | -3.66 | -5.18 | -1.52 |
| Cubs 97 | -10.35 | 6.11 | 16.46 | -2.53 | 1.62 | 4.15 |
| Average | -12.02 | -4.96 | 7.07 | -2.39 | -1.45 | 0.93 |
All but one of the 29 teams experienced regression in WPA during the next year. (That team was the 2002 Tigers, who followed up a 106-loss season with a 119-loss 2003 season.) The average improvement in WPA during the succeeding season was 7.07, or approximately 59% regression to the mean. On average, the teams' clutch performance improved from -2.39 to -1.45, which is about 39% regression.
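The regression percentages can be reproduced from the table averages. This sketch assumes the "mean" being regressed toward is a WPA (or clutch score) of zero, which is what an average team would post. The function name is mine, chosen for illustration.

```python
def regression_share(year1, year2, mean=0.0):
    """Fraction of the gap between year1 and the mean closed by year2.

    1.0 means the team regressed all the way to the mean, 0.0 means no
    change, and a negative value means it moved further from the mean.
    """
    gap = mean - year1
    if gap == 0:
        raise ValueError("year1 is already at the mean; share is undefined")
    return (year2 - year1) / gap


if __name__ == "__main__":
    # Averages taken from the table above.
    print(f"Batting WPA: {regression_share(-12.02, -4.96):.0%}")
    print(f"Clutch:      {regression_share(-2.39, -1.45):.0%}")
```

Run on the averages, this gives roughly 59% for batting WPA and 39% for clutch, matching the figures quoted above.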
I also examined the Pythagorean Record for each of the teams. On average, these teams underperformed their Pythagorean projection by 2.07 wins. And, on average, the "next year" Pythag deviation improved to one half win below the Pythagorean Record. This is equivalent to a 26% regression in the Pythag deviation.
WPA has a direct effect on W/L record. This analysis shows that teams with a very bad season of batting WPA tend to experience improvement in batting WPA during the subsequent year. Since the Astros have experienced two consecutive seasons of league-worst batting WPA and clutch batting, hopefully the team will benefit from significant improvement in WPA and clutch batting during 2013, as most similar teams did during the past 15 years.
Are the forces of regression sufficient to overcome the impact of moving to the AL? Who really knows? At some point in the future, I might venture my own projection of the Astros' W/L record. However, history and WPA provide a glimmer of hope for the 2013 season.