The Crawfish Boxes: An SB Nation Community


Baseball Statistics and You

I talk about John Sickels a lot. He runs a sibling site here on SBNation, Minor League Ball. He has also published a prospect book for the last few years that is a gold mine of data on the minor leagues. Oh, and he used to work for Bill James. That pretty much makes him a guy I'd be lucky to emulate.

Which is why I gave his recent post about sabermetrics more than a passing thought. I read all the responses to it. I read both stories over at The Hardball Times here and here. I read this one at BPro and the one over at THE BOOK. Did I mention these two? It seems I wasn't the only one who esteems John and gives his opinion weight.

I took my time to respond, though, because I wanted to give this more thought. Here is what I came up with. It's not the usual TCB post as it only tangentially touches on the Astros. Bear with me.


One of the reasons my wife hates discussing any esoteric topic with me is that I can usually see both sides and choose whichever fits my mood to argue. So, I see what Will Carroll is trying to say and agree to some extent. After all, I have been slowly trying to work more advanced metrics into my game stories for an actual, ink-on-paper news source. Imagine that. But I can also see where Tango is coming from. Dropping everything down to the lowest common denominator often diminishes the value of the thing in the first place. By 'dumbing down' stat talk, you're not giving your audience enough credit. But it's more than just that.

I’m about to do something that no scientific method allows; I’m going to make broad assumptions. For instance, I assume that you, the readers, enjoy my work and respect my opinions. Because of that, you expect me to be some sort of authority on what I’m saying. Basically, I should know what I’m talking about if I’m going to say anything at all. 

I try to think about that every time I write an article. Sometimes, I may ask rhetorical questions or try to get your input, but many times, I’m giving you my opinion on a story or circumstance and want that to be an informed opinion.

Essentially, David The Baseball Fan and David The Writer become two different people. As a fan, I can read the stuff that Bill James, Tom Tango and the rest of the sabermetric community are doing with interest. I can wander through lists of career Win Shares in the Historical Baseball Abstract and wonder how they got there. I can look at things like marginal run values or regression analyses and know there’s good stuff in there for me to understand the game better. Many times, though, I’m not quite there yet.

Don't get me wrong, I'm a geek. An Isaac Asimov-reading, computer-programming, Star Trek geek. My skills just lie more on the programming side than on the math side. I'd rather set up my Excel table to do all the advanced calculations for me and worry over it just once.

Like John, I was a liberal arts major. I was only required to take two math classes in college and, thanks to a great AP Calculus teacher, tested out of both semesters. That’s right, I haven’t taken a math course since high school. I’ve never taken a statistics course. For a lot of these new metrics, I have to do some serious studying to refresh myself on what it is the math is saying.

At the same time, there is a lot of data out there that is simple and useful. Take the Pitch F/X stuff. I’ve seen some writers out there like JC Bradbury say that the Pitch F/X information hasn’t really yielded anything conclusive yet. For me, though, that couldn’t be further from the truth. I was worried that I wouldn’t be able to wrap my head around what the data meant. However, the stuff you can visualize from the info is much more accessible than I had expected.

With Pitch F/X, you can see where a pitcher throws the ball, what his pitches look like and how much movement they have, how fast he throws the ball and what his average speed is and on and on. Graphing the data turns it into something any baseball fan can understand and appreciate. Much like the GameTracker application, it’s an intuitive, graphic way to follow what’s going on. 

Those are the kinds of things I can write about. I also have found that calculating the stats myself really helps me get behind how the metric is put together. When I’m tracking minor league players, I have a spreadsheet that will calculate Runs Created, RC27, wOBA and even OPS+. That last one seems easy, but I’ve had to correct for park effects too. My minor league spreadsheet evolves a little more each season, as I get more comfortable using certain things, as I reject others as useless. Just like any good scientist.
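For readers who would rather script it than build a spreadsheet, here is a minimal sketch of three of those calculations. It assumes the basic version of Runs Created, counts outs simply as at-bats minus hits, and handles park effects by scaling the league averages with a single park factor. That park adjustment is an assumption made for illustration, not necessarily how my spreadsheet or any reference site does it, and all function and parameter names are hypothetical.

```python
def runs_created(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def rc27(h, bb, tb, ab):
    """Runs Created per 27 outs, counting outs as AB - H (a simplification)."""
    outs = ab - h
    return runs_created(h, bb, tb, ab) / outs * 27


def ops_plus(obp, slg, lg_obp, lg_slg, park_factor=1.0):
    """OPS+ = 100 * (OBP / lgOBP + SLG / lgSLG - 1).

    Assumption: park effects are handled by scaling both league
    averages by one park factor (1.0 means a neutral park).
    """
    adj_obp = lg_obp * park_factor
    adj_slg = lg_slg * park_factor
    return 100 * (obp / adj_obp + slg / adj_slg - 1)


if __name__ == "__main__":
    # Hypothetical season line, not a real player.
    print(round(runs_created(h=150, bb=50, tb=250, ab=500), 1))
    print(round(rc27(h=150, bb=50, tb=250, ab=500), 2))
    print(round(ops_plus(0.360, 0.500, lg_obp=0.330, lg_slg=0.400), 0))
```

A league-average hitter in a neutral park comes out at 100 by construction, which is a handy sanity check when wiring this up.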

But, ultimately, I’m required to speak about something to an audience. That's why it may take me longer to assimilate something like SIERA into my writing. Or to discuss whether I like wOBA or tRA better. They're both just a little bit over my head statistically right now, but give it time. I'll get there.

I know it's a saber-sin, but David the Baseball Fan still gets excited for stuff like batting average and RBIs. A couple years ago, I was doing my minor league stat charting thing and saw this guy I hadn't heard of flirting with a .400 average for the month of April. A couple months later, I was disappointed to see Matt Cusick get sent to the Yankees for LaTroy Hawkins. That's what the stat community misses sometimes. Fans like following batting average, RBIs and home runs. They're easy to understand, and there's a history to them that you miss with other stat races. No one talks about the time Barry Bonds posted an OBP of .609 like they do when Ted Williams hit .406.

At the same time, we, as analysts, should help people understand that those stats don't necessarily mark a player's talent. Batting average is influenced by things like BABIP. Better measures of how important a player is to a team are OBP and SLG. RBIs are tied more to the teammates higher in the batting order and the random chance of scoring opportunities. Making the distinction between a player's true talent and traditional stats shouldn't diminish either.
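To make those three stats concrete, here is a small sketch using their standard definitions: BABIP = (H - HR) / (AB - K - HR + SF), OBP = (H + BB + HBP) / (AB + BB + HBP + SF), and SLG = TB / AB. The function names and the sample season line are hypothetical.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)


def obp(h, bb, hbp, ab, sf):
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slg(tb, ab):
    """Slugging percentage: total bases per at-bat."""
    return tb / ab


if __name__ == "__main__":
    # Hypothetical season line, not a real player.
    print(f"BABIP {babip(h=150, hr=20, ab=500, k=100, sf=5):.3f}")
    print(f"OBP   {obp(h=150, bb=50, hbp=5, ab=500, sf=5):.3f}")
    print(f"SLG   {slg(tb=250, ab=500):.3f}")
```

Notice that hits sit in the numerator of both batting average and BABIP, which is why a few extra bloop singles can move both without any change in a hitter's underlying skill.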

Take another minor leaguer I discovered and took quite a shine to a few years back. This guy didn't have great power or a great batting average. He did have an excellent OBP and didn't strike out much. Later that summer, he was traded to the then-Devil Rays for Aubrey Huff. Though I discovered both Ben Zobrist and Matt Cusick in similar ways, it was Zobrist's on-base skills that caught my eye. I had a hunch that Zobrist could be a good player; he turned into Zorilla. I thought Cusick's batting-average skills were cool, but I'm under no illusions about his pro potential.

Maybe you TCB readers have come to similar conclusions. We're a pretty stat-happy group here. At the same time, it's nice to appreciate baseball's simplicity at times. I would like to hear from you, though. Leave a comment with your own love-hate relationship with statistics. What pulled you in? Where do you draw the line? Do math and baseball go hand-in-hand for you?


Comments


Chicks dig the long ball

I had the same feeling about Zobrist.

I’ve tried to model my analysis on how Tom Verducci and Joe Posnanski write. Both have a very strong understanding of advanced statistics, but still present information in an easy-to-read way. A lot of my understanding of advanced statistics comes from the arguments people make, like “well, this player is due to have a lower batting average this year because his BABIP was higher than normal last year.” For me, reading the description of an advanced stat and then going back and looking at an example allows me to understand it better. Pitch F/X is something new for me. I’ve read all the articles all the way through, but a lot of times I’m sitting there scratching my head afterward trying to understand the information presented. I do know, though, that by looking at it enough times I’ll get a better understanding of it; it’ll just take a little time.

by timmy_ on Mar 4, 2010 7:20 AM CST

This
But my pet peeve is that many people who learn and adopt these stats take them as gospel and don’t understand that the stats can, and should, be taken with a grain of salt (meaning that they should be combined with other information). I’m not pointing to the people who develop the new stats, because they generally understand the statistical limitations. But how many times do we see posters adhere to WAR numbers like they are the final word? Don’t get me wrong, I love FanGraphs and I like the WAR valuations; but it comes with the fact that many readers misuse the statistic, or at least don’t fully understand the limitations of applying the values in isolation.

by vivaelpujols on Mar 7, 2010 12:29 AM CST

Just to provide a contrarian view
  • The word ‘advanced’ is presumptuous and usually inflated and wrong. ‘Complex’, or ‘sabermetric’ (which is a sort of brand name), makes more sense most often.
  • I didn’t read Tango’s take (is that his real name?), but it’s not a matter of dumbing it down. He and Cameron and the rest of the crew routinely use unnecessarily obscure language. It makes them appear haughty and condescending, but it also serves to limit discussion and evaluation. A lot of it wouldn’t stand up to real world common sense and much of it would have no chance in a discussion with professional statisticians.
  • A lot of the writing is godawful. The valuable points and discussions are often lost because the language skill level is abysmal. How about taking a Journalism 101 course, or just buying an AP Stylebook or a copy of Strunk and White at the used book store? And use language carefully and accurately. Terms like regression to the mean are usually used differently in sabermetrics than in other statistical discussions.
  • After the J 100 course take an intro to symbolic logic or poke around wikipedia or something. Conclusions are routinely stated that aren’t supported by the premises. Just asking people to state what premises they used for their conclusions often leads to personal attacks or disdain. Me, you and everybody else can benefit from plainly stating the supporting premises minus shorthand when disagreements arise.
  • The scientific method as a phrase refers to a thing. No doubt there are innumerable scientific methods, but the phrase refers to something specific. Indeed it is something that is routinely condemned by sabermetric writers.
  • BA is the main component of OBP. BABIP affects OBP. What BABIP tells you has gone through a big evolution over a short period of time. At each point in time, whatever the believed meaning, it was assumed to be true. If at any of those points you suggest that a player may not be reflected in that evaluation, it leads to the common dismissal as ignorant and unsophisticated. Retrospect is MIA.
  • The whole thing is incredibly incestuous. Buying a book like an Idiot’s Guide to Statistics or Huff’s How to Lie with Statistics, or poking around non-baseball stat sites, is a better way to understand the discussion IMO. Instead, explanations by one part of the clique are links to the rest of the clique. When someone says, “I won’t explain, go read something at The Book,” I just forget about the discussion. Discussing a lot of it is like debating religion with fundamentalists. If someone wants to argue some belief and tells me to read something at that religious bookstore, nothing ever comes of it. There really is an awful lot that is just a matter of faith, like the data quality of BIS or STATS. It’s laughable when someone claims that they are using velocity data when it’s Ignatius Reilly checking a box that chooses between soft, medium and hard.

by ol Pete on Mar 4, 2010 11:23 AM CST

ol’ Pete, I think some of what you are saying goes to the arrogant, dismissive style of some people who use these stats. That is going to turn people off, and it reflects a level of belief in the statistical methods which exceeds an objective understanding of the accuracy of, and confidence in, those methods. I think there may be less of that approach now than in the past when the scouting vs. stats debate was at its acidic height. (Perhaps you may think I’m using rose colored glasses though.)

The reference to the evolution of BABIP “principles” is a good example. My suggestion is to read and understand sabermetric studies, but don’t develop a closed mind about the results. The BABIP studies have always fascinated me, but I also maintained a skepticism in the back of my mind about the rigid conclusions which were drawn. That was particularly true of DIPS theory, and now we continue to see studies finding more exceptions and nuances to the concept. I fully expect that the theory will be totally displaced at some point by a richer explanation of pitching. Perhaps pitch f/X will play a role in that process.

by clack on Mar 4, 2010 11:43 AM CST

I’m taking a STAT 110 course this semester and what I’ve learned in just half a semester so far has already helped me to better understand some of the statistical discussions, so I can attest to what you’re saying Pete.

by timmy_ on Mar 4, 2010 1:25 PM CST

I'm a sabermetric writer for THT, so I can obviously relate to some of these comments

1) I agree that “advanced” is a poor term and it’s potentially alienating.

2) I disagree that Tango and Cameron use excessively obscure language. Cameron writes for the typical sabermetric fan, who is going to be like Sickels and not interested in all the nuances of the stats. I think that Cameron is excellent at explaining the concepts behind the stats to readers – although at times he can be a bit dogmatic. Tango is a researcher and primarily writes for his own little blog that only hard core sabermetricians visit. He has no obligation to “dumb down” his posts, as he is not writing for a general audience. When he posts elsewhere, or when he wrote The Book, it’s pretty clear that he goes out of his way to try to relate to the typical sabermetric fan.

3) Some of the writing is bad, some is good – it’s no different than in mainstream writing (have you read Bill Plaschke?!?). I personally take pride in my writing, and despite being primarily a researcher (not a commentary guy like Cameron or Neyer), I spend a lot of time trying to construct a flowing argument. Some guys don’t care about that as much, and simply try to get the point across quick and dirty. Saying a lot of the writing is terrible is too vague to mean anything, especially without means for comparison.

4) The point about BABIP is a good one, and I personally hate the meaning that it has taken on as a “luck stat”. All stats measure something different, and all of them have some luck and some skill involved in them. BABIP has more luck in it than K rate, but it’s not the 100% to 0% ratio that has been perpetuated by DIPS stats.

5) I agree that there is way too much faith put in certain stats (like the BIS data, or UZR estimates). When somebody says “Zobrist was better than Pujols last year”, I want to lose my mind. Zobrist was better than Pujols last year by FanGraphs WAR. That WAR includes a lot of assumptions (like Zobrist really played like a +27 run defender) that may or may not be true. To quote that stat at face value without understanding its limitations and error bars is my biggest pet peeve.

by vivaelpujols on Mar 7, 2010 12:45 AM CST

Significance

My main complaint about baseball statistics is the lack of confidence levels for results. How likely is a result to be some random coincidence? How can we know that two figures are statistically different? How many decimal places really matter? There are ways to tell, but you never hear about that.

Before you criticize someone, walk a mile in their shoes. If they get mad, you're a mile away AND you have their shoes.

by Caradoc on Mar 5, 2010 12:44 PM CST
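One rough way to put numbers on the questions in the comment above is a normal-approximation confidence interval for a batting average, plus a two-proportion z-test for comparing two hitters. This is only a sketch: it treats every at-bat as an independent trial with the same chance of a hit, which real at-bats are not, and the function names and sample figures are hypothetical.

```python
import math


def batting_avg_interval(hits, at_bats, z=1.96):
    """Approximate 95% interval for a batting average (normal approximation)."""
    p = hits / at_bats
    se = math.sqrt(p * (1 - p) / at_bats)
    return p - z * se, p + z * se


def two_proportion_z(h1, ab1, h2, ab2):
    """z statistic for the difference between two batting averages."""
    p1, p2 = h1 / ab1, h2 / ab2
    pooled = (h1 + h2) / (ab1 + ab2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / ab1 + 1 / ab2))
    return (p1 - p2) / se


if __name__ == "__main__":
    # Hypothetical hitters: .300 and .270 over 500 at-bats each.
    low, high = batting_avg_interval(150, 500)
    print(f"{low:.3f} to {high:.3f}")
    print(f"z = {two_proportion_z(150, 500, 135, 500):.2f}")
```

Under these assumptions, a .300 average over 500 at-bats carries an interval of roughly .260 to .340, and a .300 hitter versus a .270 hitter over the same number of at-bats gives a z value a little above 1, short of the usual 1.96 cutoff. In other words, the third decimal place tells you very little over a single season.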

I agree completely

I’d love to see some margins of error and more attention paid to significant figures.

The Crawfishboxes
A good friend of mine used to say, "This is a very simple game. You throw the ball, you catch the ball, you hit the ball. Sometimes you win, sometimes you lose, sometimes it rains." Think about that for a while.

by Stephen Higdon on Mar 5, 2010 4:25 PM CST via mobile

Comments For This Post Are Closed

