UPDATED: Tweaking the Pythagorean Formula
UPDATED: So I ran the numbers for all the teams in 2008. The short of it is this: if you strip out blowouts, you get closer to the real results for some teams, particularly when they outperformed the regular Pythagorean expectations (Astros, LA Angels, Marlins). For other teams, however, the method went in the wrong direction, and in a big way (particularly the Braves, but the Mariners too). So this isn't really an improvement on the system, but now we know what happens when you disregard games decided by 9 runs or more. I learned a lot about the Pythagorean formula along the way and now I don't have this nagging question in the back of my head, so for me it was a win. Sorry if you feel I wasted your time!
For the curious, here are the numbers:
| Team | Actual Record | Pythagorean Record | Pythagastro Record |
| Arizona Diamondbacks | 82-80 | 83-79 | 82-80 |
| Atlanta Braves | 72-90 | 78-84 | 83-79 |
| Baltimore Orioles | 68-93 | 72-90 | 74-88 |
| Boston Red Sox | 95-67 | 97-65 | 95-67 |
| Chicago Cubs | 97-64 | 100-62 | 96-66 |
| Chicago White Sox | 89-74 | 90-72 | 84-78 |
| Cincinnati Reds | 74-88 | 71-91 | 74-88 |
| Cleveland Indians | 81-81 | 86-76 | 84-78 |
| Colorado Rockies | 74-88 | 73-89 | 75-87 |
| Detroit Tigers | 74-88 | 78-84 | 76-86 |
| Florida Marlins | 84-77 | 81-81 | 84-78 |
| Houston Astros | 86-75 | 78-84 | 85-77 |
| Kansas City Royals | 75-87 | 71-91 | 71-91 |
| L.A. Angels | 100-62 | 89-73 | 92-70 |
| L.A. Dodgers | 84-78 | 87-75 | 87-75 |
| Milwaukee Brewers | 90-72 | 88-74 | 85-77 |
| Minnesota Twins | 88-75 | 90-72 | 93-69 |
| New York Mets | 89-73 | 90-72 | 87-75 |
| New York Yankees | 89-73 | 88-74 | 87-75 |
| Oakland Athletics | 75-86 | 76-86 | 79-83 |
| Philadelphia Phillies | 92-70 | 94-68 | 88-74 |
| Pittsburgh Pirates | 67-95 | 66-96 | 70-92 |
| San Diego Padres | 63-99 | 66-96 | 66-96 |
| San Francisco Giants | 72-90 | 67-95 | 69-93 |
| Seattle Mariners | 61-101 | 66-96 | 69-93 |
| St. Louis Cardinals | 86-76 | 87-75 | 86-76 |
| Tampa Bay Rays | 97-65 | 92-70 | 88-74 |
| Texas Rangers | 79-83 | 75-87 | 76-87 |
| Toronto Blue Jays | 86-76 | 94-68 | 86-76 |
| Washington Nationals | 59-102 | 61-101 | 62-100 |
OK, so this is going to be a statistics-oriented FanPost and it's going to discuss the Pythagorean win percentage. So don't say I didn't warn you. I originally buried it in a comment-thread, but Dying Quail suggested it was FanPost material, so here it is.
Everybody knows that in 2008 the Astros' actual record was significantly different from their Pythagorean record. Actual record: 86-75. Pythagorean record: 77-84. That's a huge swing, and it has been the basis of many people predicting a significant drop-off for the Astros in 2009. Some people have tried adjusting the Pythagorean formula to get more accurate results by fiddling with the exponent in the formula. I'm going to try to do it a different way (one that is a bit more conducive to working with pen and paper).
The Pythagorean formula only looks at how many runs a team scored and how many runs a team gave up over the course of its season to determine how many games they were "expected" to win. This methodology, however, is subject to distortion by blowout games where one team scores a lot of runs and the other team scores few. These blowout games, though, are not normally indicative of how a team plays in most of their games.
This got me wondering whether the Pythagorean records would come closer to the real records if we stripped out these blowout wins and losses. In theory, removing the blowout wins and losses could give you a better picture of how the team plays most of the time and gives less emphasis to those fluke-y games. So I did this for the 2008 Astros.
First, I had to figure out which games to strip out. You don't want to remove too many of the blowouts or you could easily distort the record the wrong way. So I decided to remove the top 5% of blowouts. The way to do this is by looking at each game and calculating the run differential, whether it was a win or a loss. If the Astros score 4 and the Dodgers score 2, the run differential is 2. If the scores were flip-flopped, the run differential would still be 2. Then you calculate the average run differential over the course of the season. In this case, the average was 3.39 runs.
The next step is to calculate the standard deviation. If the game run differentials were normally distributed, about 95% of them would fall within 2 standard deviations of the average. The standard deviation in the case of the 2008 Astros was 2.41 runs. So, in theory, 95% of games would have a run differential of up to 3.39 + (2 x 2.41) = 8.21 runs.
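For anyone who would rather script this step than do it by hand, here is a rough Python sketch of the threshold calculation. The function names and the sample scores are made up for illustration, and using the population standard deviation is my assumption, since the post doesn't say which flavor was used.

```python
import math
import statistics

def absolute_margins(games):
    """games is a list of (runs_scored, runs_allowed); wins and losses count alike."""
    return [abs(rs - ra) for rs, ra in games]

def blowout_threshold(mean_margin, sd_margin, num_sd=2.0):
    """Average margin plus num_sd standard deviations."""
    return mean_margin + num_sd * sd_margin

def whole_run_cutoff(threshold):
    """Smallest whole-run margin that lies above the threshold."""
    return math.floor(threshold) + 1

# Hypothetical scores, only to show the mechanics on raw game data.
sample_games = [(4, 2), (2, 4), (1, 10), (5, 4)]
margins = absolute_margins(sample_games)
sample_mean = statistics.mean(margins)
sample_sd = statistics.pstdev(margins)  # population SD (an assumption)

# The summary figures quoted in the post for the 2008 Astros.
astros_threshold = blowout_threshold(3.39, 2.41)
print(round(astros_threshold, 2), whole_run_cutoff(astros_threshold))
```

With the rounded figures from the post, the threshold comes out a little above 8 runs, so the first whole-run margin past it is 9.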
So I went through the Astros games and removed all the games where the run differential was 9 runs or more. In a 161 game season, this ended up being 8 games, or 4.9% of the season (interestingly, the Astros only had a single blowout win and the remaining seven blowouts were losses). In the remaining 153 games, the Astros scored 688 runs and gave up 659 runs. Although in real games, the Astros scored fewer runs than they gave up, if we remove the 5% of blowout games, we find that the numbers are reversed. This suggests that the blowout games had a truly significant distortion effect on the Astros' pythagorean record.
When you plug these numbers into the standard Pythagorean Win Formula, you get a win percentage of 52.4%. Over the course of a 161-game season, you get a win-loss record of 84-77. This is much closer to the Astros' actual record of 86-75. I’m not sure if this method would work for all teams, but it certainly worked here. I'll leave it to others to see if this method can be extended to the full league.
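The whole procedure can be sketched in a few lines of Python. This is only an illustration: the function names are invented, the sample scores are hypothetical, and the exponent of 2 is an assumption (the post does not say which exponent was used). With an exponent of 2 the percentage works out to roughly 52.2%, a touch below the 52.4% quoted above, but it rounds to the same 84-77 record.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Classic Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def strip_blowouts(games, cutoff=9):
    """Keep only games decided by fewer than `cutoff` runs."""
    return [(rs, ra) for rs, ra in games if abs(rs - ra) < cutoff]

def projected_record(runs_scored, runs_allowed, season_games, exponent=2.0):
    """Scale the expected win percentage up to a full season."""
    pct = pythagorean_pct(runs_scored, runs_allowed, exponent)
    wins = round(pct * season_games)
    return wins, season_games - wins

# Hypothetical scores: the two games decided by 9 runs get dropped.
kept = strip_blowouts([(4, 2), (1, 10), (12, 3), (5, 4)])

# Run totals from the post for the 153 non-blowout Astros games.
wins, losses = projected_record(688, 659, 161)
print(wins, losses)  # 84 77
```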
Comments
Perhaps
the better thing to do is to take those 8 outliers as given, and calculate the record based on the remaining 153 games the Astros played last season. You want to throw out the outliers so you can accurately eliminate the “luck” (or whatever it is) from those “standard” games, but you don’t want to take away those games from the record.
so, under that method
the Astros “should” have gone 80-73 in those middle 95% of games.
The outlier blowouts were:
05/25 – L to PHI
06/01 – L to MIL
06/15 – L to NYY
07/11 – L to WAS (this exercise is really bringing back some bad memories)
08/10 – W to CIN
08/12 – W to SFG
08/15 – L to ARI
09/16 – L to FLA
Then the adjusted Pythag would be 82-79. I think that looks about right.
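Here is a small Python sketch of that variation: estimate the record over the 153 ordinary games, then add the blowouts back in at their actual results. The function names are invented, the exponent of 2 is an assumption, and the 2 wins and 6 losses are simply counted off the list of outlier games above.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Classic Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def hybrid_record(rs_normal, ra_normal, normal_games,
                  blowout_wins, blowout_losses, exponent=2.0):
    """Expected record in the ordinary games plus actual blowout results."""
    pct = pythagorean_pct(rs_normal, ra_normal, exponent)
    expected_wins = round(pct * normal_games)
    expected_losses = normal_games - expected_wins
    return expected_wins + blowout_wins, expected_losses + blowout_losses

# 688 runs scored and 659 allowed in the 153 non-blowout games (from the post).
middle = hybrid_record(688, 659, 153, 0, 0)
full = hybrid_record(688, 659, 153, 2, 6)
print(middle, full)  # (80, 73) (82, 79)
```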
It would be interesting to know how this exercise would end up...
if you did it for every team last year. Would it produce closer results to actual records overall…who knows? I don’t intend to take the time to do it…nor am I suggesting that you should spend your time on it. But maybe some enterprising blogger out there might pick up the idea.
An interesting point about the blow outs is that a manager’s action can have a significant effect on the ultimate run differential. Cooper was prone to leave starting pitchers in to take a beating, presumably in order to save the bullpen. That may or may not be good strategy, but it definitely affects the margins. A manager who decides to put a position player in to pitch in a blow out might have the same effect. Another manager (perhaps the manager of a high scoring team) might be less inclined to throw in the towel and will still bring in his better relief pitchers to try to keep the differential within striking distance. The Pythag implicitly assumes that the manager acts in some sort of “average” fashion in those blow outs. In any event, I put less confidence in drawing conclusions from blow outs which are dependent on managers’ predilections.
So I did it for three more teams
- The 2007 Arizona Diamondbacks, who, like the 2008 Astros, really outperformed their Pythagorean formula with a little help from Jose Valverde,
- The 2003 San Francisco Giants, a 100-win team, just to see how this works on a good team, AND
- The 2005 New York Mets, who underperformed their Pythagorean record
For the 2007 D-backs:
Actual Record: 90-72
Pythagoras: 79-83
Blowout definition: again, it’s a run differential of 9 or higher
Removed 9 games: 2 W, 7 L
Adjusted Pythagoras: 85-77
For the 2003 Giants:
Actual Record: 100-61
Pythagoras: 94-67
Blowout definition: 8 runs
Removed 9 games: 5 wins, 3 losses
Adjusted Pythagoras: 93-68
And For the 2005 Mets
Actual Record: 83-79
Pythagoras: 90-72
Blowout definition: 9 runs or more
Removed 9 games: 7W, 2 L
Adjusted Pythagoras: 83-79 (!)
Conclusions from this relatively small sample:
- Stripping out the blowouts before you use the Pythagorean record gets you closer to the real record than if you don’t.
- It seems like the threshold run differential for a blowout win/loss should be around 9 runs, though for better teams, the threshold could be around 8. It remains to be seen what happens if you apply this to a full league, but my guess is that it stays in that neighborhood. This could mean that this sort of Adjusted Pythagoras could be readily applied throughout the season in calculating “power rankings” with a better effect than the regular Pythagorean formula.
by AstroAndy on Apr 6, 2009 4:38 PM CDT
Damn
Great work. I think you’re on to something.
The Crawfishboxes
A good friend of mine used to say, "This is a very simple game. You throw the ball, you catch the ball, you hit the ball. Sometimes you win, sometimes you lose, sometimes it rains." Think about that for a while.
by Stephen Higdon on Apr 6, 2009 4:52 PM CDT
I've really got to study now
But if I were to keep working on this, I’d find a way to automate it. I don’t know of any way to do this off the top of my head and it takes me about 5 minutes to do a team. Do you think someone at BeyondtheBoxscore might be able to figure out a way to automate this?
Another question I’d like to see answered is whether there’s a measurable correlation between Pythagorean record and number of blowout wins/losses.
Anyway, if this catches on, the formula needs a name…I’m suggesting The Pythagorastro Formula.
I posted a fan shot of this there in the hopes of getting some attention for it
I’ll email Sky to see if someone on his team wants to take a crack at it.
by Stephen Higdon on Apr 6, 2009 5:03 PM CDT
The 2003 Giants are interesting
I re-ran the numbers on them using 9 runs as the blowout definition instead of 8, to make it consistent and comparable to the numbers I got on the Astros, Mets, and D-Backs. That lands you at the exact same record as the regular pythagorean formula: 94-67.
So it’s interesting that when the teams are a little more middle-of-the-pack, the Adjusted Pythag seems to get you closer to the actual record. But when it’s a really good team, both Pythagorean systems seem to be unable to predict the additional wins.
Another Idea
It seems a bit unfair to just eliminate the blowout games. The fact is, they still happened and I think the argument for removing them is that they got out of hand and might not have been played like a normal game.
Another possibility would be to reduce the run differential of the blowout games to the 2 SD threshold. This just makes more sense to me than just forgetting they happened.
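A rough Python sketch of what capping might look like, in case anyone wants to try it. The names are invented, and two choices here are my assumptions rather than anything settled in the thread: the cap of 8 runs (the largest whole margin inside the 2 SD threshold quoted in the post) and the decision to trim the winner's score rather than pad the loser's.

```python
def cap_margin(runs_scored, runs_allowed, cap=8):
    """Shrink a blowout to a `cap`-run game by trimming the winner's score."""
    margin = runs_scored - runs_allowed
    if margin > cap:       # blowout win: trim our runs
        return runs_allowed + cap, runs_allowed
    if margin < -cap:      # blowout loss: trim the opponent's runs
        return runs_scored, runs_scored + cap
    return runs_scored, runs_allowed

def capped_totals(games, cap=8):
    """Season runs scored and allowed after capping every game."""
    capped = [cap_margin(rs, ra, cap) for rs, ra in games]
    return sum(rs for rs, _ in capped), sum(ra for _, ra in capped)

# Hypothetical scores: a 3-18 loss, a 12-1 win and an ordinary 4-2 win.
totals = capped_totals([(3, 18), (12, 1), (4, 2)])
print(totals)  # (16, 14)
```

The capped totals would then go into the usual Pythagorean formula in place of the raw season totals.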
yes, that is a good point.
I’m not sure if that is the proper adjustment or not….I would have to think about that. But your point has merit.
Interesting idea
however, I think you’d only end up trimming a small handful of runs off either the runs scored or runs allowed column and would not actually get results much different than the standard Pythagorean formula. My goal was to improve on the Pythagorean Formula by coming up with something that would take Runs Scored and Runs Allowed as inputs, look only at what was usual, and strip out the unusual games…I don’t think there’s really much “unfair” about it.
Capping the run differential could work, but we’d have to establish a lower run differential threshold…6 runs, for instance.
My other thought as I was working on this is that every team is going to have roughly 8-10 “blowout” games a year, and it’s possible that the distribution of wins and losses among those blowouts is random. Most teams in blowouts will have a record like 4W-4L, 5W-3L, and so the blowout wins will generally balance out the blowout losses when it comes to runs scored/allowed, the inputs for pythagoras. But when you have uneven distribution of blowout wins and losses, say 2W-7L, you end up with 5 blowout losses that aren’t balanced out by blowout wins and at 9 runs apiece, you end up having a disproportionate effect on the ratio of runs allowed to runs scored…and this leads to teams baffling commentators by “defying Pythagoras”.
Here’s another way to think about it: If the effect of a normal game on the pythagoras formula is 1X, blowout games have an effect like 5X, even though the real standings only gain 1 win/loss regardless of blowout status. This just sets the effect of blowout games at 0. A perfect world would have it at 1X, but we can get close enough by just setting it at 0. After typing that out, I realize that this example probably doesn’t clarify much.
The goal here is to get a rough but accurate prediction of end-of-season wins and losses from simple inputs (runs scored, runs allowed). Whether or not it’s fair, I think you’ll probably get a good enough result from looking at 95% of the games that are relatively normal and completely disregarding the 5% that are unusual. You could probably find a way to weight those blowout games to get a more accurate prediction of end-of-season standings, but I think this way gets you close enough.
Point taken
I guess what I’m getting hung up on is that this technique excludes data in a somewhat arbitrary manner. I know it’s based on standard deviations, but the fact is if you lose by 8 runs, it really hurts you, but if you lose by 9, it’s a total wash.
I completely understand wanting to account for blowouts, but I think it might take a more advanced method to really get to the place we want to be.
The choice of excluding data beyond two standard deviations was a little bit arbitrary. I could have set it at 2.5 standard deviations (around 10 or 11 runs) and only excluded 1% of the games or I could have set it at 1.645 standard deviations and excluded a full 10%. I did have to just make a decision on that, so it is arbitrary in that regard.
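Those percentages can be checked with Python's standard library. This is a sketch that assumes a normal distribution and counts both tails, which is the textbook reading of the cutoffs; real run differentials may not follow it. The function name is made up.

```python
from statistics import NormalDist

def share_beyond(num_sd):
    """Share of a normal distribution lying more than num_sd SDs from the mean (both tails)."""
    return 2 * (1 - NormalDist().cdf(num_sd))

for z in (1.645, 2.0, 2.5):
    print(f"{z} SD: {share_beyond(z):.1%} excluded")
```

Under that assumption the three cutoffs exclude roughly 10%, a bit under 5%, and a bit over 1% of games.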
But I had a rationale for choosing 2 standard deviations. Only excluding 1% of the population wouldn’t give me results that were noticeably different from the Pythagorean formula and excluding a full 10% seemed to me to be cutting off too big of a portion of our sample. So it wasn’t that arbitrary.
In this case, though, I think it’s fair to say that the proof is in the pudding. The formula that used the scores for all the Astros games predicted a 77-84 record. The record that my method predicted was 84-77. Their real record was 86-75. My method got a lot closer than the basic Pythagorean method.
There are other methods that try to correct for blowouts by assigning different weights to the runs allowed and runs scored in the formula. They may do a better job of predicting records than my method. I think these other methods have two distinct disadvantages.
First, they’re hard to do with elementary math skills and pen and paper. Have you ever tried to take something to the 2.78th power? While I appreciate the power of databases, my computer computational skills don’t allow me to do things like factor in a dynamic run-scoring system (like the Pythagenport system does). When I was a kid, I learned a lot of basic math from playing with the numbers on the backs of my baseball cards…as much as I respect things like VORP and WARP, I have to have others calculate them for me. This is a method that a kid with some advanced high-school math can both do and understand.
Second, though other formulas may be more accurate, it’s hard for more casual baseball stat fans (like me) to draw meaning from things like X = .45 + 1.5 * log10 ((rs+ra)/g). I wanted to take a different tack and use the simple standard pythagorean formula, but make corrections that can be tied to a readily understandable phenomenon. I used to be a physicist, and most physicists tend to find a certain kind of beauty in simple equations. My method relies on a simple premise that often pops up on message boards…we know that blowouts distort things, what happens if we cut those games out?
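For what it's worth, that exponent formula is less scary once it is typed into a few lines of Python. This is just a sketch with invented function names; I plug in the 688 runs scored and 659 allowed over 153 games from the post purely as sample inputs.

```python
import math

def run_environment_exponent(runs_scored, runs_allowed, games):
    """X = .45 + 1.5 * log10((rs + ra) / g), as quoted above."""
    return 0.45 + 1.5 * math.log10((runs_scored + runs_allowed) / games)

def win_pct(runs_scored, runs_allowed, exponent):
    """Pythagorean win percentage with an arbitrary exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

x = run_environment_exponent(688, 659, 153)
print(round(x, 2), round(win_pct(688, 659, x), 3))
```

For these inputs the exponent lands a little below 2, so the dynamic version pulls the expected win percentage slightly closer to .500 than the plain squared formula does.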
Sorry for the long, philosophical posts…this stuff can really light a fire in me sometimes.
Baseball season is here! Rejoice!
Updated!
I’m throwing this comment in here because it’s not obvious from the FanPost sidebar that anything changed
You sold yourself short, IMO
Arguably, your revision is an improvement, based on what you show for 2008—albeit perhaps a minor improvement.
On average, your reconstructed pythag is slightly closer to actual results. The average difference between the standard pythag and the actual is 3.4 wins. The average difference between the reconstructed pythag and the actual is 3.3 wins. OK, it’s slight, but it is an improvement.
Moreover, I am impressed by the number of times that the reconstructed pythag hit the actual record on the nose. The reconstructed pythag hit the actual record 5 times. The standard pythag hit the actual record zero times.
The standard pythag is better than the reconstructed pythag 13 times, and the reconstructed pythag is better than the standard pythag 13 times. (The remaining times, the difference is the same.) That’s a tie. But it certainly doesn’t support the idea that the standard pythag is better than your version.
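These tallies are easy to reproduce from the table in the post. The sketch below is one way to do it in Python; the team abbreviations and function name are my own shorthand, and it compares win totals only, ignoring the loss column.

```python
# (team, actual wins, standard Pythagorean wins, blowout-adjusted wins),
# copied from the 2008 table in the post.
RECORDS = [
    ("ARI", 82, 83, 82), ("ATL", 72, 78, 83), ("BAL", 68, 72, 74),
    ("BOS", 95, 97, 95), ("CHC", 97, 100, 96), ("CWS", 89, 90, 84),
    ("CIN", 74, 71, 74), ("CLE", 81, 86, 84), ("COL", 74, 73, 75),
    ("DET", 74, 78, 76), ("FLA", 84, 81, 84), ("HOU", 86, 78, 85),
    ("KC", 75, 71, 71), ("LAA", 100, 89, 92), ("LAD", 84, 87, 87),
    ("MIL", 90, 88, 85), ("MIN", 88, 90, 93), ("NYM", 89, 90, 87),
    ("NYY", 89, 88, 87), ("OAK", 75, 76, 79), ("PHI", 92, 94, 88),
    ("PIT", 67, 66, 70), ("SD", 63, 66, 66), ("SF", 72, 67, 69),
    ("SEA", 61, 66, 69), ("STL", 86, 87, 86), ("TB", 97, 92, 88),
    ("TEX", 79, 75, 76), ("TOR", 86, 94, 86), ("WSH", 59, 61, 62),
]

def compare(records):
    """Mean absolute miss (in wins) and head-to-head counts for both methods."""
    std_miss = [abs(std - actual) for _, actual, std, _ in records]
    adj_miss = [abs(adj - actual) for _, actual, _, adj in records]
    pairs = list(zip(std_miss, adj_miss))
    return {
        "mean_standard": sum(std_miss) / len(records),
        "mean_adjusted": sum(adj_miss) / len(records),
        "standard_closer": sum(s < a for s, a in pairs),
        "adjusted_closer": sum(a < s for s, a in pairs),
        "same": sum(s == a for s, a in pairs),
    }

summary = compare(RECORDS)
print(summary)
```

Run against the table, this gives average misses of 3.4 and 3.3 wins and a 13-13 split with 4 teams where both methods miss by the same amount.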
My conclusion is that your revision has some merit and is recognizing something which is problematic in the standard pythag. Perhaps the exact way you have treated the issue is not the optimal solution, but maybe it points to an area of the pythag which can be improved. Based on what you have done, I am more convinced that there are some weaknesses to the standard pythag record as “predictor.”
Wow, thanks clack
I guess this is why people publish their results…others might see more in them than the original researcher did. I definitely had a bit of analysis fatigue after spending so much time in excel. I had pretty much seen that I got closer as often as I got farther away and stopped there.
The Braves were a really interesting case to me. Somehow, they managed to severely underperform their Pythagorean expectations despite having 5 big blowout losses (including one game they lost 3-18) and only one blowout win. Usually we would expect overperformance with that kind of distribution. I took a second look at the numbers and it’s still not really clear what’s going on there.
An interesting tidbit I found: a Bill James paper suggests that, historically, teams that outperform their pythag expectations do better on average the following year than teams with similar run production that did not outperform their pythag expectations. Possibly good news for the ’Stros!
Before I comment
let me preface my remarks by noting that I have a graduate degree in statistics, which makes me armed and dangerous. Read on at your own amusement.
From a purely mathematical point of view, the ONLY difference between any Pythagorean analysis and reality lies exclusively in the difference between the distribution of margins of victory assumed in the basic theorem and the actual distribution encountered in any real situation. To reconcile any real result with the base theorem would require analyzing the difference between each of the margins of victory (and their number of occurrences) in the real situation vs that assumed in the basic Pythagorean theorem.
The only problem in doing the reconciliation is that it is likely impossible to learn the real distribution because (a) James’ original work encompasses a span of many years and (b) he refined the correlation between actual results and historical by tweaking the exponentials rather than by examining the real distributions underlying them.
Since the only variables considered are runs scored, there can’t be any other influences which need examination. Your first step of eliminating outliers (blowouts) is a logical one, but, in reality, the base theorem assumes a few outliers (but we will never know how many) in its distribution. To be exactly accurate, we would have to exclude all the outliers beyond those assumed in the base theorem.
Because of all the above factors it is logically correct to assume:
(1) A team winning more (or less) than expected most likely had fewer (or more) blowout losses than were expected by the original theorem.
(2) A team winning less (or more) than expected most likely had more (or less) close wins than expected by the original theorem.
Also, because the distribution of actual scoring changes through the years, it is necessary to modify the exponent in the Pythagorean theorem from time to time to maximize the correlation between expected results and actual. This explains (and I believe is the only possible explanation for) the several changes in the Pythagorean exponent that have occurred in the past.
It takes more than pitching to win a pennant, but not much!
School is in session!
Great points there….I had definitely not considered the fact that the outliers were factored into the base theorem.
Do you know if anyone has done any work to identify what is a “normal” amount of blowout losses or close wins?
Sorry
but I don’t know – tried a brief Google search and couldn’t find anything. Seems like the data should be available somewhere.
Just a brief heads up – if the distributions of runs scored are normal in the long run, and likewise with runs allowed, it does not follow (in fact it is unlikely) that the distributions of margins of victory would be normal. This means of course that the tools of analysis for normal distributions (means, standard deviations, etc.) can’t be employed in sampling or analyzing data from them.
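A quick simulation gives a feel for this. The Python sketch below draws runs scored and runs allowed from normal distributions and looks at the absolute margin of each game. Everything in it is an assumption made for illustration: the 4.5-run average, the 3-run spread, the seed, the function names, and treating runs as continuous numbers rather than whole runs.

```python
import random
import statistics

def simulate_margins(n_games=20000, mean_runs=4.5, sd_runs=3.0, seed=2008):
    """Absolute margins when both teams' run totals are drawn from a normal."""
    rng = random.Random(seed)
    margins = []
    for _ in range(n_games):
        scored = rng.gauss(mean_runs, sd_runs)
        allowed = rng.gauss(mean_runs, sd_runs)
        margins.append(abs(scored - allowed))
    return margins

def skewness(values):
    """Sample skewness: 0 for a symmetric distribution such as the normal."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

margins = simulate_margins()
print(round(statistics.fmean(margins), 2), round(skewness(margins), 2))
```

Because a margin can never be negative, the simulated margins pile up near zero with a long right tail, so their skewness comes out clearly positive rather than near the zero a normal distribution would give.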
Another thought as to what constitutes a blowout. Suppose we say a blowout is any loss by more than 5 runs. There are some games where a team gets ahead by, say, 7 runs and then takes out some of its starters and winds up winning by 3 or 4. That feels like a blowout, just as a game where a team scores 6 runs in the top of the 15th inning and goes on to win by that margin doesn’t really seem like a blowout. Interesting considerations.
Even though I disagreed with your methodology
It is never a waste of time to put forward intelligent, well-thought-out proposals. It’s how advances are made.
Ideally, I would have liked to have weighted the run differentials somehow so that I didn’t just hack off 8 games per team per season. That would have required me to do a lot more math than I wanted to do…this has basically been my final procrastination project of the semester, so I’ll be buckling down and doing work here shortly.