Similarity scores were created by Bill James and first introduced in his 1986 Abstract. This is how James describes the method:
"One of the most common arguments for any Hall of Fame candidate is the argument that Joe is comparable to Jim and Jim is in the Hall of Fame, so Joe should be, too. Similarity scores are a way of assessing the objective elements of an If-A-then-B argument."
* Normally, similarity scores use career stats. For the Negro Leagues Database, career stats per 162 games are used.
This description is taken from Bill James' book "Whatever Happened to the Hall of Fame?"....
We start with this position: that if two players had identical career totals, and if both played the same defensive position, their similarity would be scored at 1,000. For each difference between them, we subtract something.
For each difference of 20 games, subtract one point.
For each difference of 75 at bats, subtract a point.
For each difference of 10 runs scored, subtract one.
For each difference of 15 hits, subtract a point.
For each difference of 5 doubles, subtract one point.
For each difference of 4 triples, subtract one point.
For each difference of 2 home runs, subtract one point.
For each difference of 10 RBI, subtract one point.
For each difference of 25 walks, subtract one point.
For each difference of 150 strikeouts, subtract one.
For each difference of 20 stolen bases, subtract one point.
For each 1-point (.001) difference in batting average, subtract one point.
For each 2-point (.002) difference in slugging percentage, subtract one point.
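The deductions listed above can be sketched in a few lines of Python. This is only an illustration, not the database's actual code: the name `stat_penalty` and the stat keys are hypothetical, and it assumes that partial differences count proportionally (a 10-game difference costs half a point), which the description does not specify.

```python
# One point is subtracted for each difference of this size (from the list above).
STAT_DIVISORS = {
    "G": 20, "AB": 75, "R": 10, "H": 15, "2B": 5, "3B": 4, "HR": 2,
    "RBI": 10, "BB": 25, "SO": 150, "SB": 20, "AVG": 0.001, "SLG": 0.002,
}

def stat_penalty(a, b):
    """Total points to subtract for the statistical differences between a and b.

    Assumption: fractional differences are counted proportionally, not rounded.
    """
    return sum(abs(a[stat] - b[stat]) / step for stat, step in STAT_DIVISORS.items())

# Made-up stat lines for illustration only.
player_a = {"G": 2000, "AB": 7500, "R": 1000, "H": 2100, "2B": 400, "3B": 40,
            "HR": 200, "RBI": 1000, "BB": 700, "SO": 900, "SB": 100,
            "AVG": 0.280, "SLG": 0.450}
# Same player, but with 40 more games and 10 more home runs.
player_b = dict(player_a, G=2040, HR=210)

# 40 games / 20 = 2 points, 10 home runs / 2 = 5 points, so 7 points in total.
print(stat_penalty(player_a, player_b))
```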
We adjust for the difference in defensive position in two stages. First, we assign a "position value" for the player's primary defensive position:
Catcher 20
Shortstop 14
Second Base 11
Third Base 7
Center Field 5
Right Field 4
Left Field 3
First Base 1
DH 0
Then we figure the difference between the position value of the two players, multiply that by 12, and subtract that from the total.
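As a rough sketch of the position adjustment James describes, using each player's primary position only (the function name `position_penalty` and the position abbreviations are hypothetical):

```python
# Position values from the table above.
POSITION_VALUE = {
    "C": 20, "SS": 14, "2B": 11, "3B": 7, "CF": 5,
    "RF": 4, "LF": 3, "1B": 1, "DH": 0,
}

def position_penalty(pos_a, pos_b):
    """Points to subtract for a difference in primary defensive position."""
    return abs(POSITION_VALUE[pos_a] - POSITION_VALUE[pos_b]) * 12

# A catcher compared with a first baseman: (20 - 1) * 12 = 228 points.
print(position_penalty("C", "1B"))
# A shortstop compared with a second baseman: (14 - 11) * 12 = 36 points.
print(position_penalty("SS", "2B"))
```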
Our calculation differs in one respect: instead of looking only at each player's primary position, we find each player's average "position value," weighted by games played at each position.
For example, for a player with these games played by position:
1B: 200
3B: 500
CF: 100
RF: 800
Position value = (200 * 1 + 500 * 7 + 100 * 5 + 800 * 4) / 1600 = 4.625
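The weighted average above could be computed along these lines. The names `weighted_position_value` and `weighted_position_penalty` are hypothetical, and applying the factor of 12 to the weighted values is an assumption carried over from James's description of the primary-position version.

```python
# Position values from the table above.
POSITION_VALUE = {
    "C": 20, "SS": 14, "2B": 11, "3B": 7, "CF": 5,
    "RF": 4, "LF": 3, "1B": 1, "DH": 0,
}

def weighted_position_value(games_by_position):
    """Average position value, weighted by games played at each position."""
    total_games = sum(games_by_position.values())
    if total_games == 0:
        raise ValueError("player has no games at any position")
    weighted_sum = sum(POSITION_VALUE[pos] * games
                       for pos, games in games_by_position.items())
    return weighted_sum / total_games

def weighted_position_penalty(games_a, games_b):
    """Assumed penalty: difference in weighted position values, times 12."""
    return abs(weighted_position_value(games_a)
               - weighted_position_value(games_b)) * 12

# The example player above: 7400 / 1600 = 4.625.
example = {"1B": 200, "3B": 500, "CF": 100, "RF": 800}
print(weighted_position_value(example))
```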

