

Astros Sabermetrics: K% and Lineup Building

Historical data shows that a high team strikeout rate leads to a lower number of runs scored. The Astros seem to recognize this as they slowly rebuild their organization with more contact-oriented hitters.

Domingo Santana's strikeout woes may not make him a bad player. But they may make the Astros' lineup worse anyway.
Troy Taormina-USA TODAY Sports

While many bytes have been spent explaining why, from a sabermetric standpoint, high-strikeout batters are not necessarily bad if they offset their K totals with other elite skills, the most-cited articles all focus on the batter’s individual offensive contribution.  From the Astros’ standpoint, the ideas explained by those noble researchers are being validated before our eyes by the offensive surge from Chris Carter and the emergence of George Springer as a budding star.

Carter boasts a 27% strikeout rate since the All-Star Break, but also an MVP-level .298/.349/.649 slash line and a 176 wRC+ that ranks him second in the Majors over that span, behind only a guitar-slinging Carlos Santana of the Indians.  Before going down with his injury, rookie Springer posted a 126 wRC+ with 20 home runs in only 78 games, despite his 33% strikeout rate.  Both players balance their whiffing ways with elite power and plate discipline.

What goes unsaid is that just because individual players can overcome high strikeout rates to become above-average performers, the same philosophy does not automatically apply to an entire lineup.

On July 1, 2014, the Astros trotted out a lineup consisting of the following:  (K rates are approximate)

Player - (K%)

Jose Altuve 8%
Alex Presley 15%
George Springer 30%
Jon Singleton 30%
Matt Dominguez 20%
Jason Castro 30%
Chris Carter 35%
Domingo Santana 30%
Marwin Gonzalez 20%

One wonders if the high strikeout rates of this lineup would, over the long haul, prevent sustained rallies from taking place because of a decreased number of balls put in play that might land for hits.  Intuitively, it seems like that would be the case, but then intuitively for decades people considered high-strikeout players to be inferior to low-strikeout ones in all cases.  Many still do, regardless of other offensive contribution.

Astros GM Jeff Luhnow himself expressed this preference in an interview a couple of months ago, saying he wanted to move the lineup away from the "feast or famine approach" it currently features.  His recent draft choices and trade targets have, for the most part, pointed toward a contact- and discipline-oriented philosophy.

Nitty Gritty

To explore whether or not team strikeout rate influences a club's ability to score runs, one must first deal with the changing offensive environment in baseball.  In 1962, MLB's average team strikeout rate was 14%.  In 2014, it sits at 20.3%, the highest number in history.  So it is not enough to say, "In 1962, the average MLB club scored 723 runs with a low strikeout rate, but in 2013 with a high strikeout rate, the average runs scored was 675."  It's just not that simple.

To measure the impact, a good starting point is to assume that, within each season, team strikeout rates and team runs scored follow a normal distribution, and then to repeat that comparison across many seasons.  For this post, the period examined runs from 1962 to 2014.  For example, in 2013 the average MLB club scored 675 runs with a standard deviation of 72.5 runs, and struck out at an average rate of 19.9% with a standard deviation of 1.75 percentage points.
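Standardizing each club against its own season is what lets a 1962 club be compared with a 2013 club. A minimal sketch of that step, using the 2013 figures quoted above; the club itself (21.65% K rate, 638.75 runs) is hypothetical and invented only to make the arithmetic easy to follow:

```python
def z_score(value: float, season_mean: float, season_sd: float) -> float:
    """How many standard deviations `value` sits above (+) or below (-) the season mean."""
    return (value - season_mean) / season_sd


# 2013 league figures quoted in the article
RUNS_MEAN_2013, RUNS_SD_2013 = 675.0, 72.5
K_MEAN_2013, K_SD_2013 = 19.9, 1.75  # K% in percentage points

# Hypothetical 2013 club, invented for illustration only
club_k_pct, club_runs = 21.65, 638.75

k_z = z_score(club_k_pct, K_MEAN_2013, K_SD_2013)          # positive = strikes out MORE than average
runs_z = z_score(club_runs, RUNS_MEAN_2013, RUNS_SD_2013)  # negative = scores FEWER runs than average
print(f"K% z = {k_z:+.2f}, runs z = {runs_z:+.2f}")  # K% z = +1.00, runs z = -0.50
```

Note the sign convention: for K% a positive z-score is bad (more strikeouts), while for runs a positive z-score is good.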

From there each club can be evaluated in terms of how many standard deviations above or below average they are in both strikeout rate and runs scored.  The image below explains a normal distribution curve in terms of standard deviations (SD).  A club that is 1 SD worse than average in K% would be said to be in the 16th percentile; that is, 84% of clubs will have a lower (better) strikeout rate than that club.  This tool allows one to compare strikeout rates and runs scored across eras of different offensive environments.
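The 84%/16th-percentile figures come straight from the normal cumulative distribution function. A short sketch using Python's standard-library `statistics.NormalDist`; the helper names are illustrative, not from the article:

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def share_with_lower_k(k_z: float) -> float:
    """Fraction of clubs expected to strike out LESS than a club sitting k_z SDs above the mean K%."""
    return STANDARD_NORMAL.cdf(k_z)


def k_percentile(k_z: float) -> float:
    """Percentile rank where higher is better (i.e. fewer strikeouts)."""
    return 100.0 * (1.0 - share_with_lower_k(k_z))


# A club 1 SD worse than average in K%
print(f"{share_with_lower_k(1.0):.1%} of clubs strike out less")  # 84.1% of clubs strike out less
print(f"percentile: {k_percentile(1.0):.1f}")                    # percentile: 15.9
```

This holds only insofar as the normal-distribution assumption does; with real, slightly skewed data the percentiles would shift a little.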



What was discovered when examining all MLB clubs over 53 years is summarized in the table below:


For clarity, because it's hard to wrap one's head around standard deviations in terms of baseball runs and strikeouts, here are those same numbers, only applied to the 2013 baseball run environment.
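The translation itself is a linear rescaling: a result expressed in standard deviations of runs is multiplied by that season's runs SD and, if an absolute total is wanted, added to the season mean. A sketch assuming the 2013 figures quoted earlier (675 runs, SD 72.5) as defaults; the function names are hypothetical:

```python
def runs_from_z(z: float, season_mean: float = 675.0, season_sd: float = 72.5) -> float:
    """Map a runs z-score back into a runs total for a given season (defaults: 2013 figures)."""
    return season_mean + z * season_sd


def runs_vs_average(z: float, season_sd: float = 72.5) -> float:
    """Runs above (+) or below (-) an average club for a given runs z-score."""
    return z * season_sd


for z in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"{z:+.1f} SD -> {runs_from_z(z):.1f} runs ({runs_vs_average(z):+.1f} vs. average)")
```

Swapping in another season's mean and SD re-expresses the same historical relationship in that season's run environment.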


From this, it can be concluded that there is a clear correlation* between a lineup's collective strikeout rate and its ability to score runs.  It is also worth noting that the counts of clubs falling in each range support the assumption of a normal distribution.

* The author does not claim to be a statistician, and would need help to define the actual R-squared correlation between the two stats.  It's enough for him, being a "big picture" guy, to note that there is one.

Trend over Time

This data sample shows the average relation between team strikeout rates and runs scored over 53 seasons.  Although the K%/RS correlation seems to hold up over time, it can fluctuate a bit over short time periods of a few seasons:


Clubs with strikeout rates 1 to 2 standard deviations worse than average have consistently scored fewer runs than average clubs over time.  Weirdly, clubs with extremely low strikeout rates have fluctuated the most.  Glancing at how some of the extreme-low K-rate clubs were composed from 2005 to 2008 explains some of this.  From FanGraphs, here are the clubs with the lowest strikeout rates, with the lowest-scoring clubs highlighted:


It so happened that, at that time, the highlighted clubs contained an unusually high number of contact-oriented players who seldom struck out but otherwise provided little offensive value (wRC+ lower than 100).  These players included a who's-who of offensive lightweights, all of whom amassed over 300 plate appearances per year for their clubs:  Luis Castillo (twice), Jason Bartlett, Nick Punto, Jason Tyner, Mike Redmond, Juan Pierre (twice), Rafael Furcal, Russell Martin, Old Luis Gonzalez, Andre Ethier, Old Nomar Garciaparra, James Loney, Ichiro Suzuki, Jose Lopez, Yuniesky Betancourt, Kenji Johjima, Jose Vidro, Jeremy Reed, Randy Winn, Pedro Feliz (twice), Omar Vizquel (twice), Ray Durham (twice), Bengie Molina, Dave Roberts, Ryan Klesko, Rich Aurilia, Luis Matos, Javy Lopez, B.J. Surhoff, Old Mike Lowell, Paul Lo Duca, Alex Gonzalez, Jeff Conine, Damion Easley, J.T. Snow, Edgardo Alfonzo, Jason Ellison, and Lance Niekro.

Safe to say that those guys did not appear on most fantasy rosters during those seasons, despite the large number of plate appearances they gathered.

More important than the weird dip during that stretch, when extremely-low-strikeout clubs actually scored fewer runs than just-kinda-low-strikeout clubs, is the overall trend: the less a lineup strikes out, the more runs it scores.

It's a case of diminishing returns, however.  A club may get down to an 8% strikeout rate, but it won't suddenly score 900 runs because of it; too many other factors go into scoring runs for that to happen.  Still, dismissing the impact of strikeouts on lineup construction would mean ignoring historical scoring trends.

And the Astros?

The 2014 Astros, particularly with a lineup such as the one posted above from July 1st, fall into the category of "between 2 and 3 standard deviations worse than average K%," a category that was excluded from the analysis due to the extremely small sample of clubs that actually qualified for it.  A reasonable extrapolation of the data posted above would be to assume that the Astros would be somewhat less successful than average in scoring runs.
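A rough check of that classification can be sketched from the approximate July 1 K rates listed earlier. Two assumptions, both not from the article: every lineup slot is weighted equally (in reality, players near the top of the order get more plate appearances), and the 2013 K% standard deviation of 1.75 points stands in for the 2014 value, which is not given:

```python
# Approximate K% of the July 1, 2014 lineup, as listed above
july_1_lineup = {
    "Jose Altuve": 8, "Alex Presley": 15, "George Springer": 30,
    "Jon Singleton": 30, "Matt Dominguez": 20, "Jason Castro": 30,
    "Chris Carter": 35, "Domingo Santana": 30, "Marwin Gonzalez": 20,
}

LEAGUE_K_2014 = 20.3  # 2014 MLB average team K%, quoted above
ASSUMED_K_SD = 1.75   # ASSUMPTION: borrows the 2013 SD; the 2014 SD is not given

# Simplification: every lineup slot weighted equally (real PA shares differ by batting order)
lineup_k = sum(july_1_lineup.values()) / len(july_1_lineup)
lineup_z = (lineup_k - LEAGUE_K_2014) / ASSUMED_K_SD
print(f"lineup K% ~ {lineup_k:.1f}%, {lineup_z:+.2f} SD vs. league average")
```

Under those assumptions the lineup averages about 24.2%, roughly 2.2 SD worse than the 2014 league average, which is consistent with the "between 2 and 3 standard deviations" category.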

If one uses this season's league average so far of 509 runs scored per team and the standard deviation of 40.4 runs, the math shows that the Astros are doing a little better than their expected scoring.  At 501 runs, they are beating their projection of 490 runs (based only on the K%/RS standard-deviation analysis above, not other factors) by 11 runs, and are in fact only 0.2 standard deviations below the MLB average right now.  Possible reasons include Carter's and Springer's ability to overcome their strikeouts to create runs, recent improvements by Robbie Grossman, the replacement of Jonathan Villar by the Marwin Gonzalez / Gregorio Petit combo (both around a 20% K rate or better), and the addition of Jake Marisnick (13%).  In other words, changes in lineup construction have helped make the Astros a better-scoring team than they were at the start of the season (3.8 runs per game before the All-Star Break, 4.7 runs per game since).
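The figures in that paragraph can be reproduced with a few lines of arithmetic. The article does not spell out how the 490-run projection was derived, so the sketch below simply takes it as given and shows the z-score it implies; the variable names are illustrative:

```python
LEAGUE_MEAN_RUNS, LEAGUE_SD_RUNS = 509.0, 40.4  # 2014 season to date, as quoted
ASTROS_RUNS, ASTROS_PROJECTION = 501.0, 490.0

actual_z = (ASTROS_RUNS - LEAGUE_MEAN_RUNS) / LEAGUE_SD_RUNS            # about -0.20 SD
projection_z = (ASTROS_PROJECTION - LEAGUE_MEAN_RUNS) / LEAGUE_SD_RUNS  # about -0.47 SD
runs_over_projection = ASTROS_RUNS - ASTROS_PROJECTION                  # 11 runs

pre_break_rpg, post_break_rpg = 3.8, 4.7
print(f"actual: {actual_z:+.2f} SD, projected: {projection_z:+.2f} SD, "
      f"beating projection by {runs_over_projection:.0f} runs")
print(f"scoring up {post_break_rpg - pre_break_rpg:.1f} runs per game since the break")
```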

Just for fun, here are some lineup projections using only players currently in the Astros' system, with their projected strikeout rates and how those rates would affect runs scored, in terms of standard deviations.


This exercise is barely worth the ink you would use to print this post and remember it forever, because so much will change between today and 2015, much less 2017, and because it doesn't account for differences in the overall talent of the players added or replaced.  Still, it gives those familiar with the Astros system one way to see how changes in lineup construction might affect the club's ability to score runs, based on comparison with historical data.


For the Astros, it seems that Luhnow may be on to something.  The lineup, as currently constructed, places the Astros at a disadvantage compared to other teams because of the rate at which it collectively K's.  The Astros are already making moves to improve in this area, particularly among their highest-profile minor leaguers. The implied change in philosophy makes one wonder whether prospect wunderkind Domingo Santana truly figures into the club's long-range plans, or whether he will be moved in a trade so as not to compound the Astros' already epic-level K rate. Carlos Correa (15%), Rio Ruiz (20%), Jake Marisnick (15%), Preston Tucker (20%), Colin Moran (15%), Conrad Gregor (15%), Andrew Aplin (13%), Tony Kemp (12%), Delino DeShields (20%), and Tyler Heineman (12%) all figure to get significant playing time in the major leagues at some point, and represent a significant decrease in club strikeouts compared to most of the players who make up the 2014 major league roster.

The key takeaway from this post is that while an individual player may be able to overcome strikeout woes to be an above-average offensive producer, historical data shows that the same does not hold true for entire lineups.  A club in the worst 3 percent of strikeout rates can historically expect to score as many as 35 fewer runs per season than an average MLB club in today's scoring environment, while a club in the best 3 percent can expect to score as many as 35 more.
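To put those extremes on the standard-deviation scale used throughout, the sketch below finds where a 3-percent tail begins under the normal assumption and expresses a 35-run gap in runs SDs. It assumes "today's scoring environment" means the 2013 runs SD of 72.5 quoted earlier:

```python
from statistics import NormalDist

# z-score cutting off the most strikeout-prone 3% of clubs (upper tail of K%)
tail_z = NormalDist().inv_cdf(0.97)

# ASSUMPTION: "today's scoring environment" taken as the 2013 runs SD quoted earlier
RUNS_SD_2013 = 72.5
gap_in_sd = 35 / RUNS_SD_2013  # a 35-run gap expressed in runs standard deviations

print(f"3% tail starts about {tail_z:.2f} SD from the mean K%")  # about 1.88 SD
print(f"35 runs = {gap_in_sd:.2f} SD of 2013 team runs")        # about 0.48 SD
```

In other words, a K% nearly 1.9 SD from average corresponds to a runs swing of only about half an SD, in line with the earlier point that many factors besides strikeouts drive run scoring.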

Is a high strikeout rate bad for an individual batter?  It depends.  Is a high strikeout rate bad for an entire lineup?  Most assuredly, yes.