Probabilistic Model of Range

The road towards knowledge tends to be a pretty winding one.  

Take this Probabilistic Model of Range stuff, which I'd never heard of 'til Friday.  

What happened was that the Fire Joe Morgan people, who always piss me off greatly with their snarky, dogmatic approach, had written something that took issue with the idea of Justin Morneau as AL MVP.

Well, what's wrong with that? you may ask.  

Nothing, on the face of it; certainly Morneau is one of your least deserving MVPs in quite some time.  And even FJM's idea that maybe Derek Jeter should have won the thing--although I heartily disagree with it--is a lot more defensible than Morneau's having taken home the trophy.

But where I got all worked up was when this Junior avatar appealed to Jeter's defense to make his argument.

On top of all this, Jeter plays a premium defensive position, one that is difficult to fill, with adequate skill. Baseball Prospectus has him at 39 fielding runs above replacement, which is actually better than adequate.

And that's where I blew up.  Just because Baseball Prospectus cooked up some cockamamie number, this means Derek Jeter is a good defensive shortstop?  

Anyone who's seen Derek Jeter play short knows that he has microscopic range to go with merely average tools.  

And if Baseball Prospectus gives him an excellent FRAR score (the number was actually 30, not the 39 quoted, for what little it's worth), then that most likely means that the algorithm they're using to calculate the number is wrong.

So my point in this long missive I then wrote to FJM was this:  Feel free to use numbers all you want in discussing a player.  Employ all the tools at your disposal.  But please make sure that there is at least some relation between the real world and the numbers you so forcefully put forward as describing it.

People, and baseball fans especially, quote numbers all the time, but very few bother using their noggin to evaluate whether the numbers they're quoting make any kind of empirical sense.  So these purported pundits will tell you with a straight face that you're more likely to score from second with one out than from third, or that, again, Derek Jeter is an asset on defense.

So at this point I wrote fellow SBNation blogger Marc Normandin with my suspicions that FRAR might not be all it was cracked up to be, and he--as one of the most open-minded and least dogmatic stats guys you could ever want to meet--pointed me toward David Pinto's Probabilistic Model of Range.  For those who might be familiar, this is Pinto's extension of the old Ultimate Zone Rating system, and for those who are not, it's a method that, in Pinto's words,

calculate[s] the probability of a ball being turned into an out based on six parameters:
  1. Direction of hit (a vector).
  2. The type of hit (Fly, ground, line drive, bunt).
  3. How hard the ball was hit (slow, medium, hard).
  4. The park.
  5. The handedness of the pitcher.
  6. The handedness of the batter.
For each ball in play, the program sums the probability of that ball being turned into an out, and that gives us the expected outs. Dividing that by balls in play yields expected defensive efficiency rating (DER). That is compared to the team's actual DER. A good defensive team should have a better DER than its expected DER.
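Here's a rough sketch of the bookkeeping Pinto describes, not his actual program. It assumes each ball in play already arrives with an out probability attached (how the model gets that number from the six parameters isn't shown), and the names `BallInPlay` and `der_summary`, along with the four sample balls, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BallInPlay:
    """One batted ball in a PMR-style tally (hypothetical structure).

    out_probability is assumed to be supplied by the model: the chance
    that a ball with this direction, type, speed, park and handedness
    gets turned into an out.
    """
    out_probability: float
    was_out: bool


def der_summary(balls):
    """Return expected outs, expected DER, actual DER and their gap."""
    n = len(balls)
    if n == 0:
        raise ValueError("need at least one ball in play")
    # Expected outs: the sum of the per-ball out probabilities.
    expected_outs = sum(b.out_probability for b in balls)
    actual_outs = sum(1 for b in balls if b.was_out)
    expected_der = expected_outs / n
    actual_der = actual_outs / n
    return {
        "expected_outs": expected_outs,
        "expected_der": expected_der,
        "actual_der": actual_der,
        "difference": actual_der - expected_der,
    }


# Made-up sample: four balls in play, three of them turned into outs.
sample = [
    BallInPlay(0.75, True),
    BallInPlay(0.50, True),
    BallInPlay(0.25, False),
    BallInPlay(0.50, True),
]
summary = der_summary(sample)
print(summary)
```

In this toy sample the model expects 2 outs in 4 chances (expected DER of 0.5) and the defense actually got 3 (DER of 0.75), so the defense beat its expectation, which is what a good defensive team should do.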

A quick look showed that Pinto's system ranked Adam Everett number one in baseball in this differential he calculates between expected and actual outmaking rates.

That not only made me happy as an Astro fan, it jibed with my knowledge of the real world, so I decided to look further at numbers that Pinto has posted at his site over the last two weeks.

After the fold (Lord this is long, already!), you'll see the infield positions with the top three, the bottom three, Gold Glove Winners, and all Astros listed.

Just to emphasize, the key number, and the one the charts are sorted by, is "difference": the rate of plays the player DID make minus the rate of plays the model says he SHOULD have made, both figured per ball in play. A positive number means he made more plays than expected.
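For anyone who wants to check the arithmetic, here is a small sketch of how that "difference" column appears to fall out of the other columns. The function name `pmr_difference` is mine, not Pinto's, and the inputs are the In Play, Actual Outs and Predicted Outs figures for Adam Everett and Derek Jeter from the shortstop chart in this post.

```python
def pmr_difference(in_play, actual_outs, predicted_outs):
    """Actual outs minus predicted outs, per ball in play."""
    if in_play <= 0:
        raise ValueError("in_play must be positive")
    return (actual_outs - predicted_outs) / in_play


# Figures taken from the 2006 shortstop chart.
everett = pmr_difference(3801, 500, 464.88)
jeter = pmr_difference(4009, 450, 464.37)

print(round(everett, 5))  # 0.00924
print(round(jeter, 5))    # -0.00358
```

Both results match the Difference column in the chart, so the number is just the gap between the actual and predicted out rates.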

Probabilistic Model of Range, Third Basemen
Model is Based on 2006 Data Only, Min. 1000 Balls in Play
Uses Distance for Fly Balls
Rk  Player             In Play  Actual Outs  Predicted Outs  Outs/Inplay  Predicted Outs/Inplay  Difference
1.  Joe Crede             3962          436          397.55        0.110                  0.100     0.00971
2.  Freddy Sanchez        2527          285          265.88        0.113                  0.105     0.00757
3.  Pedro Feliz           4278          420          391.93        0.098                  0.092     0.00656
7.  Scott Rolen*          3788          390          371.79        0.103                  0.098     0.00481
9.  Morgan Ensberg        2917          289          276.96        0.099                  0.095     0.00413
15. Eric Chavez*          3607          362          353.27        0.100                  0.098     0.00242
34. Aubrey Huff           2133          193          203.79        0.090                  0.096    -0.00506
35. Aaron Boone           2748          221          235.26        0.080                  0.086    -0.00519
36. Tony Batista          1354          114          124.03        0.084                  0.092    -0.00741
37. Rich Aurilia          1109          101          112.09        0.091                  0.101    -0.01000
*Gold Glove Winner

In a post not too terribly long ago, I wondered whether I'd been giving Morgan too little credit as a much-better-than-average third baseman, just based on having stumbled across his range factor and zone rating from 2006. Interesting to see that this PMR stuff kind of validates that.

It also reinforces the idea that most of us have that Aubrey Huff just ain't much defensively at third base. Considering that if we re-signed Huff, it'd basically mean we were hoping for a comeback year from Aubrey, AND that he's pretty clearly Morgan's defensive inferior, I say if you're gonna hope for a comeback year, you might as well hope it's Morgan who has it.

Probabilistic Model of Range, Shortstops
Model is Based on 2006 Data Only, Min. 1000 Balls in Play
Uses Distance for Fly Balls
Rk  Player             In Play  Actual Outs  Predicted Outs  Outs/Inplay  Predicted Outs/Inplay  Difference
1.  Adam Everett          3801          500          464.88        0.132                  0.122     0.00924
2.  Bill Hall             3311          404          375.73        0.122                  0.113     0.00854
3.  Craig Counsell        2274          310          290.98        0.136                  0.128     0.00836
7.  Ben Zobrist           1395          173          165.55        0.124                  0.119     0.00534
13. Omar Vizquel*         3974          441          430.32        0.111                  0.108     0.00269
15. Jack Wilson           3485          454          447.27        0.130                  0.128     0.00193
33. Derek Jeter*          4009          450          464.37        0.112                  0.116    -0.00358
35. Marco Scutaro         1773          207          218.44        0.117                  0.123    -0.00645
36. Felipe Lopez          4245          438          469.73        0.103                  0.111    -0.00747
37. Aaron Hill            1273          140          152.71        0.110                  0.120    -0.00999
*Gold Glove Winner

Yes, Adam's the man (and check out the early returns on Ben Zobrist!), but Jack Wilson at number fifteen gives me some pause. Wilson is to my mind clearly the second best shortstop in the National League. Shit, on days when Adam's feeling a little under the weather, he's probably the best. So I wonder at PMR's inability to capture that for us, although I'm more than willing to cite the low number for Jeter, because it matches my predispositions so well.

Probabilistic Model of Range, Second Basemen
Model is Based on 2006 Data Only, Min. 1000 Balls in Play
Uses Distance for Fly Balls
Rk  Player             In Play  Actual Outs  Predicted Outs  Outs/Inplay  Predicted Outs/Inplay  Difference
1.  Tony Graffanino       1702          186          161.05        0.109                  0.095     0.01466
2.  Neifi Perez           1374          166          152.40        0.121                  0.111     0.00990
3.  Jamey Carroll         2806          396          372.04        0.141                  0.133     0.00854
4.  Orlando Hudson*       4128          552          520.38        0.134                  0.126     0.00766
6.  Mark Grudzielanek*    3595          367          344.87        0.102                  0.096     0.00616
14. Chris Burke           1012          128          124.23        0.126                  0.123     0.00373
30. Craig Biggio          3162          360          376.73        0.114                  0.119    -0.00529
35. Jorge Cantu           2859          283          311.98        0.099                  0.109    -0.01014
36. Ty Wigginton          1075          105          117.30        0.098                  0.109    -0.01144
37. Todd Walker           1279          128          147.72        0.100                  0.115    -0.01542
*Gold Glove Winner

Burke over Biggio, no surprise, that's good, that computes.

Probabilistic Model of Range, First Basemen
Model is Based on 2006 Data Only, Min. 1000 Balls in Play
Uses Distance for Fly Balls
Rk  Player             In Play  Actual Outs  Predicted Outs  Outs/Inplay  Predicted Outs/Inplay  Difference
1.  Kendry Morales        1338          124          110.53        0.093                  0.083     0.01007
2.  Albert Pujols*        3864          306          267.43        0.079                  0.069     0.00998
3.  Lance Niekro          1313           98           85.07        0.075                  0.065     0.00984
22. Lance Berkman         2722          198          194.25        0.073                  0.071     0.00138
26. Mark Teixeira*        4436          310          305.13        0.070                  0.069     0.00110
30. Mike Lamb             1488           98           98.33        0.066                  0.066    -0.00022
44. Conor Jackson         3295          231          254.95        0.070                  0.077    -0.00727
45. Sean Casey            2806          168          191.82        0.060                  0.068    -0.00849
46. Jason Giambi          1467           71           88.40        0.048                  0.060    -0.01186
*Gold Glove Winner

Isn't Lance a better defensive first baseman than is shown here? And I HOPE that Mark Teixeira, your American League Gold Glove winner, does not play first at a level so close to that of Mike Lamb.

* * *
I'll take a look at the outfield spots later this week, but it is worth noting that I don't present this PMR stuff as any end-all/be-all for evaluating defense. But it does look intriguing, and perhaps you--like me--can find room to incorporate some of what it says into the way you approach the idea of major league defense.