Hall of Stats

About the Hall of Stats

When the Hall of Fame ballot comes out each year, the debates start. Columnists compare today’s players to those of a “better time” and lament the weakening of the Hall of Fame’s purity. The thing is, that “purity” doesn’t actually exist.

For every Babe Ruth, Ty Cobb, Cy Young, and Walter Johnson in the Hall of Fame there’s a Tommy McCarthy, Lloyd Waner, Jesse Haines, and Rube Marquard.

There’s a huge gap between Babe Ruth and Tommy McCarthy. How do you determine what “Hall-worthy” really is then? If you enshrine every player better than McCarthy, your Hall of Fame will have a population of 1,478. If you hold everyone to the same standards as Babe Ruth—or even Tony Gwynn—you won’t be adding Hall of Famers very often.

And what fun is a Hall of Fame if you never let anyone in?

Let’s make an example out of Alan Trammell. He’s currently struggling to get into the Hall of Fame and will likely be passed over by the BBWAA. Does he compare to Hall of Fame shortstops Honus Wagner and Cal Ripken? No, he doesn’t. But does he have to? Is he clearly better than Rabbit Maranville, Phil Rizzuto, and Travis Jackson? Yes. How about Joe Tinker, Joe Sewell, and Dave Bancroft? Yes, yes, and yes. All six are in the Hall of Fame. Alan Trammell was better than all of them.

To determine whether or not Trammell should be a Hall of Famer, we now have to consider both how much worse he is than the elite players at his position and how much better he is than some players already inducted.

It’s an absolute mess. The only way to fix it is to kick everyone out and start from scratch. So that’s what I did.

They say, “It’s the Hall of Fame, not the Hall of Stats.”

But what if it was?

The Hall of Stats removes everyone from the Hall of Fame and re-populates it based on a mathematical formula.

You decide which Hall is better…

The Formula

The Hall of Stats is populated by a mathematical formula based on the Baseball-Reference versions of Wins Above Average (WAA) and Wins Above Replacement (WAR). WAA combines all aspects of a player’s game—hitting, pitching, baserunning, fielding, positional value, and more—and estimates how many more wins that player was worth than an average player. WAR takes that a step further and estimates how many more wins the player is worth than a replacement player. (I wrote an article with more detail about Wins Above Average vs. Wins Above Replacement.)

The precursor to the Hall of Stats was called the Hall of wWAR. wWAR stands for “weighted Wins Above Replacement”, which basically means the formula starts with WAR and applies a series of weights. wWAR is still a big part of the Hall of Stats, but it now has a completely different formula.

wWAR = adjWAR + (1.69*adjWAA)
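The formula maps directly to a small function. This is a minimal sketch. It takes adjWAR and adjWAA as already-computed inputs and does not model the adjustment steps. The names `wwar` and `WAA_WEIGHT` are illustrative and are not taken from the Hall of Stats code.

```python
# Multiplier that puts adjWAA on the same scale as adjWAR (from the formula above).
WAA_WEIGHT = 1.69


def wwar(adj_war: float, adj_waa: float) -> float:
    """Weighted Wins Above Replacement: career value plus scaled peak value.

    adj_war and adj_waa are assumed to be already-adjusted career totals;
    the adjustment steps themselves are not modeled here.
    """
    return adj_war + WAA_WEIGHT * adj_waa
```

For example, a player with 60 adjWAR and 30 adjWAA would have a wWAR of about 110.7 (60 + 1.69*30).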

Before I go into what adjWAR and adjWAA are (and where the 1.69 comes from), I want to explain what Hall Rating is.

Hall Rating

Hall Rating is simply wWAR expressed in a more intuitive way (you’ll see Hall Rating displayed on the Hall of Stats, but not wWAR). The Hall of Stats borderline for induction is represented by a Hall Rating of 100. This is similar to how 100 represents league average in OPS+ or wRC+.

With a Hall Rating of 402, you could say that Babe Ruth’s career was worth about four Hall of Fame careers. Meanwhile, Billy Pierce essentially sits on the Hall of Stats borderline with a Hall Rating of 101. Hall of Famer Lou Brock is not included in the Hall of Stats because his Hall Rating is just 71.
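The exact conversion from wWAR to Hall Rating isn’t spelled out here. Reading a rating of 402 as “about four Hall of Fame careers” suggests a simple proportional scale, so the sketch below assumes Hall Rating = 100 × wWAR / borderline wWAR. The `borderline_wwar` argument is a hypothetical input, not a published constant. The sketch also assumes that a rating of 100 or more clears the borderline.

```python
def hall_rating(wwar_value: float, borderline_wwar: float) -> float:
    """Express wWAR relative to the induction borderline, scaled so borderline == 100.

    ASSUMPTION: a simple proportional scale; borderline_wwar is a
    hypothetical input, not a published constant.
    """
    if borderline_wwar <= 0:
        raise ValueError("borderline_wwar must be positive")
    return 100.0 * wwar_value / borderline_wwar


def careers_worth(rating: float) -> float:
    """How many borderline Hall of Fame careers a rating represents."""
    return rating / 100.0


def clears_borderline(rating: float) -> bool:
    """ASSUMPTION: a rating of 100 or more counts as clearing the borderline."""
    return rating >= 100.0
```

Under these assumptions, `careers_worth(402)` gives 4.02 for Ruth, Pierce’s 101 clears the line, and Brock’s 71 does not.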

adjWAR (Adjusted Wins Above Replacement)

adjWAR attempts to capture the value of the player above a replacement player. It starts with a player’s WAR and undergoes a series of adjustments:

adjWAA (Adjusted Wins Above Average)

While adjWAR measures total career value, adjWAA aims to measure peak value. It begins with Wins Above Average and also undergoes some adjustments:

The 1.69

The Hall of Stats weighs a player’s career value (adjWAR) and peak value (adjWAA) equally. These numbers, however, are on different scales, so adjWAA is multiplied by 1.69 to put it on the same scale as adjWAR.

To get 1.69 (actually 1.6904555774852), I collected all Hall of Fame inductees (as of 2012) and divided their total adjWAR by their total adjWAA.
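The 1.69 calculation is a pooled ratio of totals. Here is a minimal sketch in which the inductee data is passed in as hypothetical (adjWAR, adjWAA) pairs, and the function name is illustrative.

```python
from typing import Iterable, Tuple


def waa_scale_factor(inductees: Iterable[Tuple[float, float]]) -> float:
    """Ratio of total adjWAR to total adjWAA across a group of players.

    Each item is an (adj_war, adj_waa) pair for one inductee.
    """
    total_war = 0.0
    total_waa = 0.0
    for adj_war, adj_waa in inductees:
        total_war += adj_war
        total_waa += adj_waa
    if total_waa == 0:
        raise ValueError("total adjWAA is zero; ratio undefined")
    return total_war / total_waa
```

The sketch divides the totals rather than averaging each player’s own ratio, which keeps players with very small adjWAA from pulling the factor around. By construction, the group’s total adjWAA multiplied by the factor equals its total adjWAR. That is what puts career and peak on equal footing across the inductees.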

Similarity Scores

Baseball-Reference uses Bill James’ similarity scores on their player pages. While Baseball-Reference and Bill James are both wonderful, I don’t think their similarity scores are all that useful.

What James’ scores show is that two players’ raw numbers were similar. Here’s an excerpt from the point system used to identify a pair of "similar" batters:

  • One point for each difference of 2 home runs.
  • One point for each difference of .001 in batting average.

The issue here is that these numbers are not adjusted for era, park, or anything else. A .300 batting average with 8 home runs in the deadball era made you a star. A player with those same numbers in the steroid era may actually have been a below-average player, depending on his position.
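To see the problem concretely, here is a sketch that scores only the two batting rules quoted above. It is not the full James system, which has more categories. It also keeps fractional points rather than assuming how partial steps are handled. The `BattingLine` type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    home_runs: int
    batting_average: float  # e.g. 0.300


def james_excerpt_points(a: BattingLine, b: BattingLine) -> float:
    """Dissimilarity points from the two quoted rules only.

    One point per 2 home runs of difference, one point per .001 of
    batting-average difference. No era or park context is involved.
    """
    hr_points = abs(a.home_runs - b.home_runs) / 2
    avg_points = abs(a.batting_average - b.batting_average) / 0.001
    return hr_points + avg_points
```

A deadball-era .300 hitter with 8 home runs and a steroid-era .300 hitter with 8 home runs score 0 points apart, even though their value to their teams could differ enormously.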

Speaking of position, here is part of James’ positional adjustment:

  • 240 - Catcher
  • 168 - Shortstop
  • 132 - Second Base

The 240-point adjustment is applied to all players who primarily caught, regardless of the player’s time spent behind the plate or at other positions.
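As James’ method is usually described, the positional points are the absolute difference between the two players’ position values. This sketch uses only the three values listed above and omits the other positions. The dictionary and function names are hypothetical.

```python
# Positional values from the excerpt above (other positions omitted here).
POSITION_VALUES = {
    "C": 240,
    "SS": 168,
    "2B": 132,
}


def positional_points(primary_a: str, primary_b: str) -> int:
    """Points added for differing primary positions.

    Only each player's single primary position is used, so time spent at
    other positions has no effect.
    """
    return abs(POSITION_VALUES[primary_a] - POSITION_VALUES[primary_b])
```

Because only the primary position is looked up, a player who caught in half his games and a full-time catcher get 0 positional points apart.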

How We Do It

The Hall of Stats similarity scores are calculated with one thing in mind: value. We don’t care how many home runs a player hit or what his batting average was. We care how many runs above average his total offensive game was. Similarly, we don’t care what his primary position was. We care about the run value of the time he spent at each of his positions.

Our similarity scores are calculated using:

The closer a pair’s score gets to zero, the more similar the players are. Because most of the inputs are centered around league average, the better a player gets, the harder it is for him to have closely similar players. For example:

(Note: Similarity scores are currently available for all players with 1500+ plate appearances or 500+ innings pitched.)
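The actual list of similarity inputs isn’t reproduced here, so this is only a sketch of the value-based idea under stated assumptions. Each player is reduced to offensive runs above average plus the run value of his time at each position (hypothetical fields). The score is assumed to be a plain sum of absolute differences, so identical profiles score 0. The eligibility check follows the 1500 PA / 500 IP note.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ValueProfile:
    # Hypothetical inputs: the actual list of components is not given here.
    offense_runs: float  # total offensive runs above average
    position_runs: Dict[str, float] = field(default_factory=dict)  # run value at each position
    plate_appearances: int = 0
    innings_pitched: float = 0.0


def is_eligible(p: ValueProfile) -> bool:
    """Threshold from the note: 1500+ plate appearances or 500+ innings pitched."""
    return p.plate_appearances >= 1500 or p.innings_pitched >= 500


def similarity(a: ValueProfile, b: ValueProfile) -> float:
    """Distance between two value profiles; 0.0 means identical.

    ASSUMPTION: a plain sum of absolute differences in runs.
    """
    score = abs(a.offense_runs - b.offense_runs)
    # A position one player never played counts as 0 runs for him.
    for pos in set(a.position_runs) | set(b.position_runs):
        score += abs(a.position_runs.get(pos, 0.0) - b.position_runs.get(pos, 0.0))
    return score
```

Unlike the primary-position lookup in James’ system, every position a player played contributes according to its run value. Because the run inputs are measured relative to league average, a player with very large values sits far from almost everyone else, which is why great players rarely have close matches.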

Special thanks to Tim Vaughan (@MechanicalTim) for giving us a crash course in how to calculate similarity scores.

More About the Project

The Team

Note that each of these bios was written by Adam so he could gush about some of his favorite people.

Adam Darowski

Ever since introducing the Hall of wWAR in March of 2011, Adam Darowski has been obsessed with the idea of the Hall of Stats. A web designer and developer by day for PatientsLikeMe, he researches, writes, designs, and codes about baseball in his spare time (including for sites such as High Heat Stats and Beyond the Box Score). Adam tweets about baseball at @baseballtwit and everything else at @adarowski.

Jeffrey Chupp

A former co-worker of Adam’s at PatientsLikeMe, Jeffrey is currently a Rails developer for Terrible Labs. Adam and Jeffrey previously collaborated on the Red Sox Hall of wWAR, a Baseball Hack Day project. Jeffrey is simply the Babe Ruth of Rails developers. There is nobody better. You can follow Jeffrey on Twitter at @semanticart.

Michael Berkowitz

If Jeffrey is the Babe Ruth of Rails developers, then Michael is the Mike Trout. A current co-worker of Adam’s at PatientsLikeMe, Michael is also a singer/songwriter. You can follow Michael on Twitter at @hal678.

The Tech

Adam developed the concept of the Hall of Stats, researched, crunched numbers, designed, and styled. Jeffrey took Adam’s designs and numbers and actually built the site (while helping Adam learn to be a little more self-sufficient along the way). Jeffrey also handled most of the JavaScript duties. Michael took on special projects like player search, similarity scores, season stats, positional pages, and player rankings.

The site is built with Ruby on Rails, Haml, Sass, jQuery, and CoffeeScript.

Open Source

The Hall of Stats is open sourced and available on GitHub.

Data Downloads

I’ve received multiple requests to make my data available. The following files are available as CSV files:

Thank You

Thank you to my three favorite Seans: Sean Forman (@sean_forman) of Baseball Reference, Sean Smith for originally creating this WAR framework, and Sean Lahman (@seanlahman) for his work on the Lahman Baseball Database. Also, thank you to Dan McCloskey (@_LeftField) and Sky Kalkman (@Sky_Kalkman) for letting me bounce ideas off them along the way. Thank you to the brilliant readers of High Heat Stats and Beyond the Box Score for providing wonderful feedback ever since I introduced wWAR. Finally, a huge thank you to Jeffrey and Michael (and Tim!) for helping me build the site of my dreams.