Bill James created similarity scores, which Baseball-Reference.com adopted as a feature of its player pages. For example, Brett Myers' similarity score at B-Ref indicates that his most similar pitcher is Aaron Harang. Brooks Baseball's player cards, a must-see resource for fans interested in researching pitch F/x results, now have an experimental feature that provides a similarity comparison based on pitch F/x results from last season.
The Baseball-Reference similarity scores for pitchers are based on performance stats, like wins, losses, strikeouts, walks, shutouts, hits allowed, etc. Similarity scores based on pitch F/x are quite different because (in theory) they are based on physical measures of "stuff" rather than performance results. The Brooks Baseball similarity scores are described as a beta feature, and are likely to change in the future as various components are refined and weighted. So, don't place too much confidence in the similarity results.
However, continued improvement in the use of pitch F/x results could have many future uses, particularly in the art of projecting player performance. For example, in theory, a player's actual performance could be regressed toward the performance of pitchers with similar pitch F/x parameters, giving us more accurate projections. Also, "comparable players" are used in some projection systems to develop aging variables. Some studies have shown that pitchers with good velocity are more likely to exceed their projections than pitchers with slow stuff. This suggests that more information on pitchers' "stuff" could enhance the accuracy of current projection systems.
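To make the regression idea concrete, here is a minimal sketch of how a pitcher's observed stat might be pulled toward the average of his comparable pitchers. Nothing here comes from an actual projection system: the function names, the inverse-score weighting, and the `stabilizer` constant are all assumptions chosen for illustration, and the sketch assumes a lower similarity score means a closer match.

```python
def comp_weighted_mean(comps):
    """Average a stat across comparable pitchers, weighting closer comps more.

    comps: list of (similarity_score, stat) pairs, scores must be positive.
    Assumption: lower score = more similar, so weight = 1 / score.
    """
    weights = [1.0 / score for score, _ in comps]
    total = sum(w * stat for w, (_, stat) in zip(weights, comps))
    return total / sum(weights)


def regress_toward_comps(observed, innings, comps, stabilizer=150.0):
    """Blend the observed stat with the comps' weighted mean.

    stabilizer is a hypothetical constant: the innings at which the
    pitcher's own record and his comps get equal weight.
    """
    reliability = innings / (innings + stabilizer)
    return reliability * observed + (1.0 - reliability) * comp_weighted_mean(comps)


# Illustrative numbers only, not real pitchers.
comps = [(100.0, 4.00), (200.0, 3.00)]
projected_era = regress_toward_comps(observed=5.00, innings=150.0, comps=comps)
```

With few innings the estimate leans on the comps; with many innings it leans on the pitcher's own record.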
The new feature is described by Dan Brooks in a Hardball Times note:
The scores are generated by comparing a vector of pitch speed, frequency, release, spin angle, and spin rate using MATLAB’s knnsearch algorithm to identify neighbors. Currently, we’re presenting the top five neighbors for each pitcher.
The note goes on to invite readers to punch in pitchers' names and tell them how it is working. We hear about the new Astros' front office attempting to meld scouting reports and statistical analysis in their evaluations. Incorporating Pitch F/x into sabermetric statistical projections is another form of combining scouting-type analysis with statistical analysis.
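For readers curious about what a nearest-neighbor search like the one Brooks describes might look like, here is a rough sketch in Python rather than MATLAB. It is not Brooks Baseball's actual code. The feature values are invented, the three features are a simplified stand-in for the full vector, and standardizing each feature before measuring distance is my own assumption, since the note doesn't say how the components are scaled or weighted.

```python
import math

# Hypothetical feature vectors: (fastball mph, fastball usage, spin rate rpm).
# These numbers are invented for illustration, not real pitch F/x data.
PITCHERS = {
    "Pitcher A": (93.0, 0.60, 2200.0),
    "Pitcher B": (92.5, 0.58, 2250.0),
    "Pitcher C": (88.0, 0.45, 1900.0),
    "Pitcher D": (96.0, 0.70, 2400.0),
}


def standardize(table):
    """Convert each feature to z-scores so no single unit dominates.

    Assumption: equal weighting after scaling; the real feature may differ.
    """
    names = list(table)
    columns = list(zip(*(table[n] for n in names)))
    means = [sum(c) / len(c) for c in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(columns, means)
    ]
    return {
        n: tuple((x - m) / s for x, m, s in zip(table[n], means, stds))
        for n in names
    }


def nearest_neighbors(table, target, k=5):
    """Return up to k (name, distance) pairs, closest first."""
    scaled = standardize(table)
    origin = scaled[target]
    distances = [
        (math.dist(origin, vec), name)
        for name, vec in scaled.items()
        if name != target
    ]
    return [(name, d) for d, name in sorted(distances)[:k]]


neighbors = nearest_neighbors(PITCHERS, "Pitcher A", k=5)
```

The smaller the distance, the more alike the two pitchers' "stuff" is on the chosen features.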
How about looking at the most similar pitchers for Astros' pitchers? Given the early stage of this feature, think of the comparisons below as a conversation piece rather than a firm conclusion. Also, the lower the similarity score, the greater the similarity. Similarity scores above 300 appear to be relatively distant, and I ignore similarity scores a lot higher than that level. An additional caveat is that many of the Astros' pitchers have limited experience in the majors, and the small sample size may be reason for skepticism of the similarity scores.
The similarity comparisons are shown after the jump.
Alexander Torres, Martin Perez, Gio Gonzalez
Torres, with the Rays, is relatively close to Wandy (186), and the Rangers' Perez and the Nationals' Gonzalez are a little more distant (227 and 257); all are LHP. Torres is a pretty nice starting pitcher prospect who was called up by the Rays last year. Perez has been a highly regarded Rangers' prospect during his minor league career (at one time, a Top 20 Baseball America prospect). Gio Gonzalez is a young lefty who had good success with the A's and was traded to the Nationals.
Livan Hernandez, Josh Tomlin, Kevin Correia
New Astros' teammate Livan Hernandez is the closest to Lyles (223), with the other two pitchers at 227. Isn't it ironic that the Astros' youngest pitcher is most similar to its oldest pitcher? All three of these RHP, like Lyles, are strike throwers who rely on control more than velocity.
Mitchell Boggs, Kyle Gibson, Alexi Ogando
Kyle Gibson, a pitching prospect for the Twins who injured his arm and underwent TJ surgery, probably should be thrown out because he has a tiny sample size (8 pitches). Boggs, a reliever-starter for the Cardinals, is most similar (136) to Norris, and he throws lots of sliders. Alexi Ogando (192), the Rangers' hard throwing starter-reliever, seems like an apt comparison to Norris; he is primarily a fastball / slider pitcher.
Gustavo Chacin, Matt Harrison
Chacin, the former Blue Jays and Astros LHP, and Harrison, the Rangers' LH starter, are actually quite distant from Happ (324 and 327), and one could equally conclude that Happ is not really very similar to other pitchers.
Jesse Crain, Scott Atchison, Jarrod Parker, Lucas Harrell, Kyle Lohse
Weiland's sample size is fairly small, since he only pitched a few games for the Red Sox; so, I listed all five most similar pitchers, even though Parker, Harrell, and Lohse have similarity scores above 300. Crain is a setup relief pitcher for the White Sox. Atchison is a Red Sox relief pitcher, and Parker is a highly regarded starting pitcher prospect who was recently traded from the D-Backs to the A's. Harrell is an Astros RHP, and Lohse, with the Cardinals, has a history as a journeyman mid- to low-rotation starter.
Dave Bush, Josh Tomlin, Tim Stauffer, Jordan Lyles, Charlie Morton
The similarity scores range from 150 to 280. All of these pitchers are righthanders. Bush has been an inconsistent, but serviceable, starter-long reliever in his journeyman career. He can be very good at times, as evidenced by three near misses of a no-hitter. Tomlin, Stauffer, Lyles, and Morton are control-oriented starting pitchers.
Madison Bumgarner, James Russell
Like J.A. Happ, Duke doesn't have much in the way of similar pitchers. Lefties Bumgarner and Russell have scores of 380 and 388--in other words, distant similarity. Bumgarner is a former top prospect for the Giants who has been in the San Francisco rotation for a couple of years. Russell is a former University of Texas pitcher who is a lefty reliever in the Cubs' bullpen.
Eric Hacker, Felix Hernandez
Myers also doesn't have very close comparable players. Hacker probably should be ignored, since he only pitched 5 innings for the Twins last year. Felix Hernandez is an impressive similar player, but his score is fairly distant (384).
Trevor Bell, Luke Gregerson
Brandon McCarthy, Chris Schwinden, Clay Buchholz
McCarthy, a highly regarded prospect in his younger days, is an above average starter in the A's rotation. Schwinden is a young starting pitcher who was a Mets' rookie last year. Clay Buchholz is a young Red Sox starting pitcher with top of the rotation talent. Does it seem interesting that the similar pitchers are all starting pitchers, unlike Rodriguez? The scores are, respectively, 238, 285, 326.
Zach Putnam, Jared Hughes, Carl Pavano
I will ignore Putnam and Hughes because they have pitched fewer than 10 innings. That leaves Pavano, the much-injured and well-traveled starting pitcher, who has been quite serviceable as a starter for the Twins over the last two years.
Zachary Phillips, Sergio Escalona, Ross Detwiler
These lefties have relatively close scores with Abad (91 to 206). Phillips is an Orioles' LOOGY with a small sample. Escalona is the Astros' lefty reliever who had a good year in 2011 but will undergo TJ surgery. Detwiler is a 25-year-old former 1st round draft pick who has been used with reasonable success as a starting pitcher by the Nationals.
Alfredo Aceves, Jason Isringhausen
These similarity scores are so distant from Lyon (390, 418) that we probably shouldn't pay much attention to them. Aceves is a young starter-reliever for the Red Sox. Isringhausen had a long career as a closer, and made a comeback attempt last year from serious injuries.
No Similar Players
Wesley is the most dissimilar Astros pitcher. The closest player is Sean Marshall--a good lefty reliever--but his score is 1,115, which basically means that he isn't similar.
So, do you see any surprises here?