Glossary - Seamheads Negro Leagues Database

Glossary


Similarity Scores (Batters)

Intro

Similarity scores were created by Bill James and first introduced in his 1986 Baseball Abstract. James describes the method as follows:

"One of the most common arguments for any Hall of Fame candidate is the argument that Joe is comparable to Jim and Jim is in the Hall of Fame, so Joe should be, too. Similarity scores are a way of assessing the objective elements of an If-A-then-B argument."


Formula

* Normally, similarity scores use career stats. For the Negro Leagues Database, career stats per 162 games are used.
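As a rough illustration of the per-162-game scaling mentioned above, the sketch below multiplies each counting stat by 162 divided by games played. The function name `per_162` and the stat keys are hypothetical, and linear scaling of counting stats is an assumption; the page does not spell out the exact procedure. Rate stats such as batting average and slugging percentage would not change under this kind of scaling and are left out.

```python
def per_162(career_totals, games_key="G"):
    """Scale career counting stats to a 162-game basis.

    Assumption: each counting stat is scaled linearly by 162 / games played.
    `career_totals` is a dict of counting stats that includes games played.
    """
    games = career_totals[games_key]
    if games <= 0:
        raise ValueError("games played must be positive")
    # Every counting stat, including games itself, is multiplied by 162 / games.
    return {stat: value * 162 / games for stat, value in career_totals.items()}


# Hypothetical player: 324 games, 40 home runs, 300 hits.
scaled = per_162({"G": 324, "HR": 40, "H": 300})
print(scaled)
```

Under this scaling every player's games total becomes 162, so the games term in the formula below would no longer separate players.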

This description is taken from Bill James' book "Whatever Happened to the Hall of Fame?":

We start with this position: that if two players had identical career totals, and if both played the same defensive position, their similarity would be scored at 1,000. For each difference between them, we subtract something.

For each difference of 20 games, subtract one point.
For each difference of 75 at bats, subtract a point.
For each difference of 10 runs scored, subtract one.
For each difference of 15 hits, subtract a point.
For each difference of 5 doubles, subtract one point.
For each difference of 4 triples, subtract one point.
For each difference of 2 home runs, subtract one point.
For each difference of 10 RBI, subtract one point.
For each difference of 25 walks, subtract one point.
For each difference of 150 strikeouts, subtract one.
For each difference of 20 stolen bases, subtract one point.
For each 1-point (.001) difference in batting average, subtract one point.
For each 2-point (.002) difference in slugging percentage, subtract one point.

We adjust for the difference in defensive position in two stages. First, we assign a "position value" for the player's primary defensive position:
Catcher 20
Shortstop 14
Second Base 11
Third Base 7
Center Field 5
Right Field 4
Left Field 3
First Base 1
DH 0

Then we figure the difference between the position value of the two players, multiply that by 12, and subtract that from the total.
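The deductions and the positional adjustment described above can be sketched in Python. This is only an illustration: the dictionary keys, the function name `similarity_score`, and the sample players are hypothetical. It also assumes that deductions are proportional (a difference of 1 home run costs half a point), which the description does not state explicitly.

```python
# Size of the difference in each stat that costs one point (from the list above).
DIFF_PER_POINT = {
    "games": 20, "at_bats": 75, "runs": 10, "hits": 15,
    "doubles": 5, "triples": 4, "home_runs": 2, "rbi": 10,
    "walks": 25, "strikeouts": 150, "stolen_bases": 20,
    "batting_avg": 0.001, "slugging_pct": 0.002,
}

# Position values from the table above.
POSITION_VALUE = {
    "C": 20, "SS": 14, "2B": 11, "3B": 7, "CF": 5,
    "RF": 4, "LF": 3, "1B": 1, "DH": 0,
}


def similarity_score(stats_a, stats_b, pos_a, pos_b):
    """Start at 1,000 and subtract for each statistical and positional difference."""
    score = 1000.0
    for stat, step in DIFF_PER_POINT.items():
        # Assumption: fractional deductions rather than whole points only.
        score -= abs(stats_a[stat] - stats_b[stat]) / step
    # Positional adjustment: difference in position value times 12.
    score -= 12 * abs(POSITION_VALUE[pos_a] - POSITION_VALUE[pos_b])
    return score


# Two hypothetical players who differ only by 2 home runs and by position.
player_a = {stat: 100 for stat in DIFF_PER_POINT}
player_b = dict(player_a, home_runs=102)
print(similarity_score(player_a, player_b, "RF", "LF"))
```

In this sketch the position term dominates quickly: a catcher compared with a first baseman loses 228 points before any batting stats are considered.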

Baseball Gauge Calculation

This calculation differs in one respect: instead of using only each player's primary position, we compute each player's average "position value," weighted by games played at each position.

Example

Player's games played by position:
1B: 200
3B: 500
CF: 100
RF: 800

Position value = (200 * 1 + 500 * 7 + 100 * 5 + 800 * 4) / 1600 = 4.625
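The weighted average in this example can be reproduced with a short Python sketch. The function name `weighted_position_value` and the dictionary layout are assumptions made for illustration, not the site's actual implementation.

```python
# Position values from the table in the Formula section.
POSITION_VALUE = {
    "C": 20, "SS": 14, "2B": 11, "3B": 7, "CF": 5,
    "RF": 4, "LF": 3, "1B": 1, "DH": 0,
}


def weighted_position_value(games_by_position):
    """Average position value, weighted by games played at each position."""
    total_games = sum(games_by_position.values())
    if total_games == 0:
        raise ValueError("player has no games at any position")
    weighted_sum = sum(
        POSITION_VALUE[pos] * games for pos, games in games_by_position.items()
    )
    return weighted_sum / total_games


# The example player above: 1,600 games across four positions.
games = {"1B": 200, "3B": 500, "CF": 100, "RF": 800}
print(weighted_position_value(games))  # 4.625
```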


If you have any questions regarding Negro Leagues statistical or biographical data, please contact gary@seamheads.com. For any other questions/comments/suggestions, please contact the web developer at BaseballGauge@gmail.com.

All biographical data, copyright 2011-2018 Gary Ashwill.

Playing statistics for 1887-1922 and 1926-1938, as well as all Cuban League games (1902-1928) and Negro League vs. Major League games (1887-1944), copyright 2011-2018 Gary Ashwill.

Playing statistics for 1923 (except Negro League vs. Major League games), copyright 2011-2018 Patrick Rock.

Playing statistics for 1933 and 1943, copyright 2013-2018 Scott Simkus.

Playing statistics for 1924-1925, 1939-1942, and 1944-1946 Negro Leagues (not including Cuban League and Negro League vs. Major League games), copyright 2011-2018 Larry Lester, Wayne Stivers, Gary Ashwill.


Defensive Regression Analysis data used here was obtained with permission from Michael Humphreys, author of Wizardry.

Win Shares are calculated using the formula in the book Win Shares by Bill James.