The Golden Age of Baseball Analysis
Contributed by Rob McQuown   
Thursday, 25 January 2007
We are living in an amazing era in the history of baseball. And this is nowhere more evident than in the advances made in the field of baseball analysis. For reference, let me take you back to a time not so long ago, to when I read my first Bill James book in 1986. Mr. James was in top form, writing about all things baseball, one article per team, in his annual Abstract. The team for which he’d always rooted, the Kansas City Royals, had just won the World Series, and he was writing cutting-edge baseball research and people were paying to read it! He was writing about exciting stuff such as lefty-righty splits and ballpark effects, and how Darryl Strawberry could still be helping a team while hitting .220! He was playing with his “Favorite Toy”, computing “Runs Created”, and even arguing that (gasp!) minor league stats could have meaningful implications for major-league performance!

I have always loved math. And baseball. And yet it had never occurred to me to analyze baseball as if it were a “science” before reading Bill James. Sure, Bill “stood on the shoulders” of some of the analysts who came before him, such as Pete Palmer and John Thorn. But Bill James is a great writer, and his love for the topic was infectious, even through the written word. I ate it up. I couldn’t wait for his books to come out each year. I started playing simulated baseball games (primarily Strat-O-Matic), where I could see how, even in a very imperfect model of the game, some of the basic truths that Bill had presented worked themselves out.

Fast-forwarding to December of 1991, I graduated from college with my engineering degree. Not really feeling called to work as an engineer right away, I answered an ad in the STATS scoring network’s newsletter that read, “The Only Programs I Know are the Ones on TV.” Figuring that was an odd caption for a programming job, I applied nonetheless (or maybe because of it). Arriving in Lincolnwood, IL, in time for the 1992 baseball season, I became a proud member of the STATS “family”. And, while the scoring network had been built up to hundreds of reporters, there were only 6 full-time employees before I joined, so this wasn’t exactly a huge corporation. But for a math and baseball lover like me, getting full access to the STATS data was like heaven!

See, from the early days of his writing, Bill James had a vision that better data would be collected.  With John Dewan, he started STATS, Inc., and made it happen.  By 1987, every single pitch was being tracked, along with the batted ball direction, distance, type, and whether it was hit hard, soft, or “medium” (all as estimated by the reporter), and some other information above and beyond the old-time box scores.  There were three versions of every single game tracked for quality assurance.  That was the “scoring network” that James had envisioned.  And this system was to change the entire expectation of what data was available for baseball research! 
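The article doesn’t say how STATS combined the three versions of each game. As a purely hypothetical sketch, assume each reporter’s account of a batted ball is a small record, with numeric estimates settled by the median and categorical calls by a 2-of-3 vote. Under that assumption, the quality-assurance step might look like this. The field names, the angle convention, and the voting rule are all illustrative assumptions, not STATS’ actual procedure:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class ScorerReport:
    """One reporter's account of a single batted ball (hypothetical fields)."""
    direction_deg: float  # assumed: angle from the third-base line, 0-90 in fair territory
    distance_ft: int      # estimated distance traveled
    ball_type: str        # e.g. "ground", "line", "fly"
    velocity: str         # "soft", "medium", or "hard", as judged by the reporter


def majority(values: Sequence[str]) -> Optional[str]:
    """Return the value at least two of three reporters agree on, else None."""
    value, count = Counter(values).most_common(1)[0]
    return value if count >= 2 else None


def reconcile(reports: Sequence[ScorerReport]) -> Dict[str, object]:
    """Merge three independent reports of the same batted ball.

    Numeric estimates take the median of the three; categorical calls take
    a 2-of-3 vote, and a field with no majority comes back as None so it
    can be flagged for a human to review.
    """
    if len(reports) != 3:
        raise ValueError("expected exactly three reports")
    return {
        "direction_deg": sorted(r.direction_deg for r in reports)[1],
        "distance_ft": sorted(r.distance_ft for r in reports)[1],
        "ball_type": majority([r.ball_type for r in reports]),
        "velocity": majority([r.velocity for r in reports]),
    }
```

Using the median rather than the mean keeps a single badly misjudged distance from dragging the merged value far off.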

I stayed at STATS for 4 baseball seasons, before going my merry way as part of the dot-com explosion. During that time, in between my “on the clock” work, I would “explore” some of the cutting-edge research that was going on. There wasn’t much free time, but I would work up run-expectancy and win-expectancy matrices for various environments, figure out baserunning break-even points for various game situations and environments, and compute my own set of MLEs (major league equivalencies), which Steve Moyer would ask for each season to help in his Rotisserie draft. But I was always wondering what would be possible if teams actually started paying attention to the broader premises of performance-based analysis. Sure, some teams bought information from STATS. And every week, I’d whip up a report for Roland Hemond, then of the Orioles, with a section for the manager, a section for the pitching coach, and a section for the hitting coach. But those were mostly “team trends” for their opponents that week, such as the counts on which they preferred to attempt steals. Anyway, I left this fun environment after the 1995 baseball season.
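The author doesn’t say exactly how he built his run-expectancy matrices. One common approach works from play-by-play data. For every base-out state, it averages the runs scored from that point to the end of the half-inning. Here is a minimal sketch of that approach, assuming a hypothetical, stripped-down play record (the `Play` fields are illustrative, not any real data feed’s format):

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Play(NamedTuple):
    """Hypothetical minimal play-by-play record."""
    half_inning: str   # unique id for one team's half-inning
    outs: int          # outs before the play (0, 1, or 2)
    bases: str         # runners before the play, e.g. "---", "1--", "-23"
    runs_on_play: int  # runs that scored on this play


State = Tuple[str, int]  # (bases, outs)


def run_expectancy(plays: List[Play]) -> Dict[State, float]:
    """Average runs scored from each base-out state to the end of the half-inning.

    Plays must be listed in game order. Assumes every half-inning is complete;
    walk-off or shortened innings would bias the averages.
    """
    innings: Dict[str, List[Play]] = defaultdict(list)
    for play in plays:
        innings[play.half_inning].append(play)

    total_runs: Dict[State, float] = defaultdict(float)
    visits: Dict[State, int] = defaultdict(int)
    for inning in innings.values():
        runs_left = sum(p.runs_on_play for p in inning)
        for play in inning:
            state = (play.bases, play.outs)
            total_runs[state] += runs_left  # runs from this state to the inning's end
            visits[state] += 1
            runs_left -= play.runs_on_play
    return {state: total_runs[state] / visits[state] for state in total_runs}
```

With a full season of plays for a given environment, the result is the 24-cell matrix (8 base states times 3 out counts). Baserunning break-even points can then be read off it.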

I really had “shelved” my baseball fandom for about 8 years, until getting back into it in 2003. Frantically playing catch-up since then has been eye-opening, since so much has been published. So many exceedingly smart people are dedicating much of their time and energy to baseball research these days. And it’s clear that baseball analysis has followed the pattern of most other information on the Web. A decade ago, it was very difficult to find research on something because so little research had been done. Now, it’s sometimes very difficult to find research on something because so much research has been done! But in all of this, it dawned on me that we’ve entered a “Golden Age” of baseball analysis, where everyone’s getting into the picture, and the dross is being removed as the ore is purified. Consider some areas:

“A Walk is as Good as a Hit”

Well, most times, this isn’t really true, of course. And many “throwback” managers, such as Dusty Baker and Mike Scioscia, seem to place very little importance on a hitter’s ability to control the strike zone and work a walk. [As an aside, I find a certain irony in the fact that both these men, as players, provided a significant portion of their offensive contributions via the base on balls.] But this was one area that Bill James was passionate about, from his earliest writings. And mainstream baseball publications have started to transform. Baseball America now includes “on base percentage” in their stats reports. And “shows good patience” is a common scouting report these days, whereas back in “the day”, “aggressive” usually meant “good”. Free agents are able to make millions of dollars now without having high batting averages. Brad Wilkerson can get traded for Alfonso Soriano, and many fans appreciate that it’s not a clear “win” for the side getting Soriano, even though Wilkerson has a career batting average barely over .250. The two “camps” haven’t fully merged on this topic quite yet, as the popular book Moneyball pointed out. In that book, Michael Lewis described how the Oakland A’s have exploited this difference in perception as to the value of walks.
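There is one place where the slogan is literally true: the standard on-base percentage formula, which counts a walk exactly like a hit. The sketch below uses made-up stat lines (not Wilkerson’s or Soriano’s) to show two .250 hitters who differ only in how often they walk:

```python
def batting_average(h: int, ab: int) -> float:
    """Hits per official at-bat; walks don't count at all."""
    return h / ab


def on_base_percentage(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF); a walk counts the same as a hit."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


# Two hypothetical .250 hitters (500 AB, 125 H) who differ only in walks.
free_swinger = on_base_percentage(h=125, bb=30, hbp=2, ab=500, sf=5)  # 157 / 537, about .292
patient = on_base_percentage(h=125, bb=90, hbp=5, ab=500, sf=5)       # 220 / 600, about .367
```

Batting average can’t tell these two apart, while OBP separates them by about 75 points.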

Pitcher Abuse

I remember when I first started thinking about the topic of pitcher abuse. It was 1992, and I’d just begun to unpack the wealth of information available in the STATS, Inc. databases (much of the same information is now available to the world thanks to the amazing retrosheet.org site). I found a stretch of 6 games from August and September 1987 in which Fernando Valenzuela had been allowed to throw 150 or more pitches in 5. People frequently questioned Fernando’s true age, but I think it’s clear his left arm was much “older” than the rest of his body, regardless of what year he was born. This area of the game has seen a dramatic shift since pitch data started being commonly available less than 20 years ago. Now, when Aaron Harang goes over 130 pitches in his last start before the All-Star break (which gives him extra rest), bloggers and talk radio callers around the country are up in arms (no pun intended)! And since so much of a team’s ability to contend rests on the arms of its best starting pitchers, this is an area that will keep getting attention, with high-tech considerations like biomechanics and pitch type being added to raw pitch counts as data to be processed and analyzed. Smart management will keep trying to optimize the way it uses its pitchers, and this results in on-field changes, even by some of the old guard. Look at the pitch count totals from Jim Leyland’s Tigers compared to some of the Pirates teams he managed, for example.

Defensive Stats

When I was at STATS, I considered this the Holy Grail of baseball statistical analysis. I would look at the information the reporters sent in. I would consider the biases of the individual reporters. I would manipulate the data. I would read reports my co-workers wrote from the same data, insisting that Player X was an outstanding defender. I would write some such reports myself, and be somewhat convinced of their veracity. But it was tough, because there was almost no correlation between the players whom “scouts” knew to be good defenders and those who showed up as good defenders on the “Zone Ratings”, or whatever other metrics one might have composed based on the direction, distance, and velocity data. So, who was right?

Well, whenever I explained Zone Rating to someone, I was re-convinced that it was a reasonable metric. How could a player who gets to 85% of the balls hit near him be a better defender than a guy who gets to 90%? Well, first off, it helps to know how Zone Rating works, and Chris Dial at The Hardball Times wrote a very lucid and succinct explanation. In short, the field is broken up into 3.75-degree “pie slices”, each batted ball is assigned by the scorer to one of those slices, and the distance it travels is also estimated by the scorer. Having scored a lot of games myself, I know that assigning the slice is far from a science, to say nothing of deciding whether to call something a “line drive” or a “fly ball”, or estimating the distance (to within 10’). Then, consider that half the impact on a player’s zone rating is likely to come from a very small number of hometown scorers (often one person would score all or most of the games for a team). And then there’s the question of whether a ball has been hit “hard”, “medium”, or “soft”. It’s all very imprecise, and subject to much bias, as with umpires and strike zones. So, in my intellectual curiosity with defensive stats, I finally arrived at a place of considering them as an input, but not expecting to be completely satisfied until the physics of the ball could be tracked, meaning the batted ball’s velocity, direction, and rotation, along with the initial placement of the fielders.
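The binning described above can be sketched in a few lines. Fair territory spans 90 degrees, so 3.75-degree slices give 24 of them, and distance is bucketed in 10-foot bands. This is a simplified illustration, not STATS’ implementation. It assumes angles are measured from the third-base line, and the fielder’s zone (the set of cells he is responsible for) is a hypothetical input, since the real per-position zone definitions aren’t reproduced here:

```python
from typing import Iterable, Set, Tuple

SLICE_DEG = 3.75                  # width of each "pie slice"
NUM_SLICES = int(90 / SLICE_DEG)  # 24 slices span fair territory
DIST_BUCKET_FT = 10               # distance estimated to within 10 feet

Cell = Tuple[int, int]            # (slice index, distance bucket)


def slice_index(direction_deg: float) -> int:
    """Map a fair-territory angle (assumed: 0-90 degrees from the third-base line) to a slice."""
    if not 0.0 <= direction_deg <= 90.0:
        raise ValueError("direction must be between 0 and 90 degrees")
    return min(int(direction_deg // SLICE_DEG), NUM_SLICES - 1)


def distance_bucket(distance_ft: float) -> int:
    """Index of the 10-foot band the estimated distance falls in."""
    return int(distance_ft // DIST_BUCKET_FT)


def zone_rating(balls: Iterable[Tuple[float, float, bool]], zone: Set[Cell]) -> float:
    """Share of balls hit into the fielder's zone that he turned into outs.

    balls: (direction_deg, distance_ft, converted_to_out) per batted ball.
    zone: the cells assigned to this fielder (a hypothetical definition).
    """
    chances = outs = 0
    for direction, distance, converted in balls:
        if (slice_index(direction), distance_bucket(distance)) in zone:
            chances += 1
            outs += int(converted)
    return outs / chances if chances else float("nan")
```

The sketch makes the scorer-bias problem concrete. If a scorer nudges a ball one slice or one 10-foot band over, it can move into or out of a fielder’s zone, which changes his rating even though nothing about his play changed.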

After the 2005 season, a big step forward was taken. John Dewan, who had started a company called Baseball Information Solutions after STATS was sold to Fox Sports, published the results of a fielding data project that BIS had been working on, called The Fielding Bible. In order to compile the data used in the book, BIS created a “video analysis” department. The BIS video analysts reviewed each play on video, removing some of the bias, and (this year) adding “fliners” as another batted ball type. It is a step up over the older Zone Rating, to be sure, but still not quite as precise as I’d like to see before I’m fully convinced.

Then, I read in The Hardball Times Baseball Annual 2007 that a baseball fan with an engineering background started something called “hit tracker” (www.hittracker.com) to get the speed of the batted ball, as well as the horizontal and vertical trajectories when it’s hit.  Still, this involves “spotters” providing data, so there’s a lack of precision.  But the day cannot be too far away when at least one team will pay to have the appropriate video equipment and software installed in their ballpark to give them the best information possible on batted balls, and hence the best information possible on defensive range.

A Good Bullpen Makes a Good Manager

One of Whitey Herzog's most well-known sayings was, "Give me a good bullpen, and I'll be a good manager. But give me a great bullpen, and I'll be a great manager." Well, the original “Pythagorean Theorem” (so named because its form resembles the one you learned in basic math) provided a relationship between runs scored, runs allowed, and winning percentage for a team. Yet relatively recently, it’s been shown that, unsurprisingly, runs prevented in crucial situations have a much bigger impact on team winning percentage. For relievers, this can be quantified by a “leverage” rating, which (in extremely simplified form) reflects that not all runs are created equal: the ones scored while closers are in the game are worth significantly more than “normal” runs allowed. Some of the Baseball Prospectus researchers have done good research into relievers, and you can find “leverage” by reliever at their site, along with listings of how good each reliever makes his manager look (i.e., how many wins he adds).
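Both ideas in this paragraph fit in a short sketch. Bill James’s Pythagorean expectation, in its original form with an exponent of 2, is RS² / (RS² + RA²). The leverage function below is only a toy version of the “not all runs are created equal” idea, in which each run prevented is multiplied by a weight for its situation. It is not Baseball Prospectus’s actual method, and the leverage values it takes are hypothetical inputs:

```python
from typing import Iterable, Tuple


def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Bill James's Pythagorean expectation: RS^x / (RS^x + RA^x), with x = 2 originally."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def leverage_weighted_runs(appearances: Iterable[Tuple[float, float]]) -> float:
    """Toy illustration: runs prevented, each weighted by the leverage of its situation.

    appearances: (runs_prevented, leverage) pairs; a leverage of 2.0 means the
    situation counts double. This is not Baseball Prospectus's actual method.
    """
    return sum(runs * leverage for runs, leverage in appearances)


# A team scoring 800 runs and allowing 600 projects to a .640 winning percentage.
expected = pythagorean_win_pct(800, 600)
```

Under this weighting, a reliever who prevents one run in a situation with leverage 2.0 is credited the same as one who prevents two runs in a situation with leverage 1.0.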

Batting Average on Balls in Play

It’s long been known, and oft-repeated, that approximately one extra ball per week getting through the infield can turn a .250 hitter into a .300 hitter (25 extra hits over a season of 500 AB adds .050, or 50 points, to a batting average). But it took until 1999 for Voros McCracken to point out that pitchers really have a very limited range of influence on this statistic: Batting Average on Balls in Play, or BABIP for short. Initially, he postulated that they have none, but subsequent research has shown them to have some, just much less than previously believed. And yet pitchers are still signing huge contracts or losing jobs based on luck, good or bad. And as for those hitters who are gaining the 50 points of batting average, those guys are cleaning up in the free agent market. With hitters, it’s trickier to test for “fluke” BABIP results. But both Gary Matthews, Jr. and Mark DeRosa had huge gains in their BABIP in 2006, and both signed lucrative free agent deals with teams that tend to value batting average very highly: the Angels and Cubs, respectively. In the sense of a “Golden Age”, it’s interesting to note that the more statistically oriented teams weren’t involved in the bidding on these two players. The irony to me is that this is one of those cases where the players “in the trenches” know that luck has a lot to do with their stat line, as now do the “analysts”, and yet there’s something appealing about signing “a .300 hitter”, even when all the data indicates that he won’t do it again.
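The standard BABIP formula removes home runs and strikeouts (which never put the ball in play for a fielder) and adds back sacrifice flies. The “one ball a week” arithmetic is a single division. A minimal sketch:

```python
def babip(h: int, hr: int, ab: int, so: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - SO - HR + SF)."""
    return (h - hr) / (ab - so - hr + sf)


# The "one ball a week" arithmetic: ~25 extra hits over a 500-AB season.
extra_hits, at_bats = 25, 500
gain = extra_hits / at_bats  # 0.050, i.e. 50 points of batting average
```

For a hypothetical line of 150 H, 20 HR, 500 AB, 100 K, and 5 SF, `babip` gives 130/385, or about .338.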

A Good, Line-drive Swing

In my initial days of crunching baseball numbers, I could never figure it out when I would read a scouting report that glowingly raved about a player’s “line drive swing”, and then see a .240 batting average, a .300 on-base percentage, and a .370 slugging percentage. Yet sometimes, these players would get raved about by coaches and managers. And sometimes they would turn things around and hit .290/.350/.470 the next season, which is good in any era. So, what have people found recently? Well, the folks at The Hardball Times have found that BABIP tends to track a player’s line-drive percentage plus a constant. So now sabermetricians and traditionalists can both look at a young ballplayer like Brian Anderson and agree that while he hit just .225 in his rookie year, there’s some hope that he’ll do better with the bat. Scouting reports can reference his “good, line-drive swing”, with which he can “drive the ball to all fields”, and expect him to pick up that average in the near future. Meanwhile, statheads can reference his MLEs and suggest that 2006 was an aberration. And they (okay, “we”) can look at his line-drive percentage (shown as 21% on thehardballtimes.com) and his BABIP (shown as .277), and realize also that his average “should” have been about 30 points higher. (In fact, in the second half of the season, his BABIP came more into line, and he hit .262 in the second half after struggling to .174 in the first half.) So, again, this is a place where cutting-edge analysis and scouting reports can agree.
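The article doesn’t give The Hardball Times’ constant. This sketch assumes the commonly quoted rule of thumb of about .120, and that number should be treated as an assumption. The second function shows the kind of arithmetic that turns a BABIP shortfall into missing batting average points. It holds at-bats, home runs, strikeouts, and sacrifice flies fixed. The inputs in the usage note are hypothetical, not Anderson’s actual counting stats:

```python
LD_CONSTANT = 0.12  # assumption: the rule-of-thumb constant commonly quoted for this relationship


def expected_babip(line_drive_pct: float, constant: float = LD_CONSTANT) -> float:
    """Rough expected BABIP: line-drive rate plus a constant."""
    return line_drive_pct + constant


def average_at_babip(ab: int, hr: int, so: int, sf: int, target_babip: float) -> float:
    """Batting average a hitter would have had with a different BABIP.

    Holds at-bats, home runs, strikeouts, and sacrifice flies fixed and
    re-computes only the hits on balls in play.
    """
    balls_in_play = ab - so - hr + sf
    return (target_babip * balls_in_play + hr) / ab
```

For a hypothetical hitter with 400 AB, 10 HR, 100 K, and no sacrifice flies, a .300 BABIP corresponds to a .2425 batting average. Each extra point of BABIP adds a bit less than a point of average, because strikeouts and home runs don’t move.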

Stolen Bases

Finally, in the area of stealing bases, scouting and stats have helped to improve each other. First off, teams now steal fewer bases. The fact is that attempting steals when the chance of success is under 67% is very frequently a bad “percentage” play. The break-even point varies a lot by game situation, and part of the issue is figuring out what the chance of success is in the first place. In that area, “baseball men” have always claimed, “You steal the base off the pitcher.” And now pitchers’ times to home are carefully tracked, as well as catchers’ release times. Sure enough, the times for catchers vary by only a few tenths of a second, and a catcher with a release time 0.2 seconds better than another catcher could be considered “good”, while his counterpart could be considered “slow”. But the point is, the times to first and the catcher release times are all tracked, and are now included in any complete scouting report. If basestealing is the part of baseball where managers are “gambling” the most, the old “intuition” has now been replaced (or at least supplemented) with some very accurate percentages (based on the timing data) and better knowledge of the “pot odds” (based on the win expectancy of various game situations, one of which was discussed here).
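A break-even rate like the one quoted above comes from comparing three outcomes: stay put, steal successfully, or get caught. The attempt breaks even at the success rate p where p × RE(success) + (1 − p) × RE(caught) = RE(stay). A minimal sketch follows. The run-expectancy values are placeholders chosen only to land near the 67% figure, not measured numbers. Real inputs would come from a run-expectancy (or, late in games, win-expectancy) table for the specific situation:

```python
def break_even_rate(re_stay: float, re_success: float, re_caught: float) -> float:
    """Steal success rate at which attempting leaves run expectancy unchanged.

    Solves p * re_success + (1 - p) * re_caught = re_stay for p. The inputs
    come from a run-expectancy (or win-expectancy) table for the situation.
    """
    return (re_stay - re_caught) / (re_success - re_caught)


def should_attempt(success_prob: float, re_stay: float, re_success: float, re_caught: float) -> bool:
    """Attempt only when the estimated success rate beats the break-even rate."""
    return success_prob > break_even_rate(re_stay, re_success, re_caught)


# Placeholder values, not measured ones: runner on first (stay),
# runner on second (success), bases empty with one more out (caught).
threshold = break_even_rate(re_stay=0.8, re_success=1.1, re_caught=0.2)  # 0.6 / 0.9, about 0.667
```

The timing data enter through `success_prob`. A slow-to-the-plate pitcher paired with a catcher who has a slow release raises the runner’s estimated chance of success above the threshold.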

Will everything come together in the end, as more and more data become available? I recently re-read The Diamond Appraised, and while Tom House and Craig Wright were being great sports about it, it seemed almost as if creatures from two different planets were trying to communicate, or at least people from completely different cultures with contradictory sets of mores. But that book was published a long time ago, in terms of “Internet Years”, and many smart people have done lots of research on baseball-related subject matter since then. And teams have been more open about incorporating the new findings. Someday soon, perhaps scouting and statistics will become almost homogeneous, with the most observant people scouting for data that then informs the best algorithms designed by mathematical geniuses. And perhaps computer video analysis will replace some of the human factors, so that “quick wrists” at the plate will be precisely quantified, rather than being a subjective reaction by a trained scout. Will the rotational velocity of pitches as they leave pitchers’ hands be measured and tracked? Already, there has been talk of how Barry Zito’s ability to “paint the black” with his pitches (data publicly available through MLB’s pitch tracker) may lead him to outperform “normal” indicators of pitcher goodness, as Tom Glavine has done for years.

With all the news these days about what athletes are doing to their bodies to help them work harder, having baseball teams apply some of the lessons learned from performance analysis has to be considered “working smarter, not harder”. The age-old difficulty of getting people to set aside their egos still looks like the biggest obstacle. The “traditionalists”, as a group, are still resistant to change. And the “stats” people can often be heard boldly decrying anyone who doesn’t agree with them as an “idiot”, or worse. Yet, in the final analysis, the synergy of the two is very likely going to be what helps teams optimize their winning while staying within their budgets. And, when that happens, it’s golden!

Questions and comments for this article may be submitted to Rob McQuown at .  Past articles for this author can be found under “Guest Writers” at the Baseball Digest Daily site.

Last Updated ( Tuesday, 20 February 2007 )