Fun with Similarity Scores
by Warren Menzer
And now for something completely different.
One fun stat that Bill James created was the Similarity Score, which compares the statistics of two players' seasons
or careers, with a score of 1000 being a perfect match. You start with 1000 points and subtract for differences as follows
(thanks to www.baseballreference.com for the info):
One point for each difference of 20 games played.
One point for each difference of 75 at bats.
One point for each difference of 10 runs scored.
One point for each difference of 15 hits.
One point for each difference of 5 doubles.
One point for each difference of 4 triples.
One point for each difference of 2 home runs.
One point for each difference of 10 RBI.
One point for each difference of 25 walks.
One point for each difference of 150 strikeouts.
One point for each difference of 20 stolen bases.
One point for each difference of .001 in batting average.
One point for each difference of .002 in slugging percentage.
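The rules above translate directly into a small function. This is a minimal sketch in Python, not Bill James's or Baseball-Reference's own code. It assumes that a partial unit of difference subtracts a partial point (a 10-game gap costs half a point), and the dictionary keys, which simply reuse the column headers from the tables in this article, are hypothetical names.

```python
# Units of difference that cost one point, taken from the rules above.
# Keys reuse this article's column headers (hypothetical naming).
UNITS_PER_POINT = {
    "G": 20, "AB": 75, "R": 10, "H": 15, "2B": 5, "3B": 4, "HR": 2,
    "RBI": 10, "BB": 25, "K": 150, "SB": 20, "AVG": 0.001, "SLG": 0.002,
}


def similarity_score(a: dict, b: dict) -> float:
    """Return 1000 minus one point per unit of difference in each category.

    Assumption: partial units subtract partial points, so the result is a
    float; round it to compare with the whole-number scores quoted here.
    """
    penalty = sum(
        abs(a[stat] - b[stat]) / units for stat, units in UNITS_PER_POINT.items()
    )
    return 1000 - penalty
```

Two identical stat lines score exactly 1000, and every difference pulls the score down from there. Under this fractional reading, the Larry Brown and Freddie Patek lines below work out to 995.05, which rounds to the 995 quoted for them.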
It's a simplistic comparison: it doesn't adjust at all for the overall offense of the league, or for the park the player plays
in, but it's fun. You can find some pretty interesting seasons this way. In 1969, two shortstops, Larry Brown and
Freddie Patek, had nearly identical seasons:
Player             G   AB   R    H  2B  3B  HR  RBI  BB    K  SB   AVG   SLG
Larry Brown      132  469  48  112  10   2   4   24  44   43   4  .239  .294
Freddie Patek    147  460  48  110   9   1   5   32  53   86  15  .239  .296
That's a similarity score of 995.
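Here is where the 995 comes from, working through the rules in the table's column order (G, AB, R, H, 2B, 3B, HR, RBI, BB, K, SB, AVG, SLG) and assuming that a partial unit of difference subtracts a partial point, so a 15-game gap costs 15/20 of a point:

$$
\begin{aligned}
\text{penalty} &= \tfrac{15}{20} + \tfrac{9}{75} + \tfrac{0}{10} + \tfrac{2}{15} + \tfrac{1}{5} + \tfrac{1}{4} + \tfrac{1}{2} + \tfrac{8}{10} + \tfrac{9}{25} + \tfrac{43}{150} + \tfrac{11}{20} + \tfrac{.000}{.001} + \tfrac{.002}{.002} \\
&= 0.75 + 0.12 + 0 + 0.133 + 0.2 + 0.25 + 0.5 + 0.8 + 0.36 + 0.287 + 0.55 + 0 + 1 = 4.95 \\
\text{score} &= 1000 - 4.95 = 995.05 \approx 995
\end{aligned}
$$

More than half of the penalty comes from just two categories: the two-thousandths of slugging percentage (1 point) and the 8 RBI (0.8 points).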
I decided to look at the starting infields of every team this century and try to find the two infields with the most
comparable statistics. This has absolutely no point, but it's fun.
Not surprisingly, the best match came from a team that had the same starting infield in consecutive years, the 1985 and 1986
Braves:
Player             G   AB   R    H  2B  3B  HR  RBI  BB    K  SB   AVG   SLG
Bob Horner       130  483  61  129  25   3  27   89  50   57   1  .267  .499
Bob Horner       141  517  70  141  22   0  27   87  52   72   1  .273  .472

Glenn Hubbard    142  439  51  102  21   0   5   39  56   54   1  .232  .314
Glenn Hubbard    143  408  42   94  16   1   4   36  66   74   3  .230  .304

Ken Oberkfell    134  412  30  112  19   4   3   35  51   38   1  .272  .359
Ken Oberkfell    151  503  62  136  24   3   5   48  83   40   7  .270  .360

Rafael Ramirez   138  568  54  141  25   4   5   58  20   63   2  .248  .333
Rafael Ramirez   134  496  57  119  21   1   8   33  21   60  19  .240  .335
The four similarity scores (one per position) average 983. That's pretty amazing. The Braves couldn't have been too surprised by the production
of those guys in 1986.
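The "average of four" step can be sketched the same way. This is a hypothetical helper, not the author's code: it assumes the two infields are compared position by position in the order the table lists them (Horner against Horner, Hubbard against Hubbard, and so on), that a partial unit of difference subtracts a partial point, and that each stat line is stored as a tuple in the table's column order. Plugging in the Braves rows reproduces the 983 quoted above.

```python
from statistics import mean

# Units of difference that cost one point, in the tables' column order:
# G, AB, R, H, 2B, 3B, HR, RBI, BB, K, SB, AVG, SLG.
UNITS_PER_POINT = (20, 75, 10, 15, 5, 4, 2, 10, 25, 150, 20, 0.001, 0.002)


def similarity_score(a, b):
    """1000 minus the scaled differences between two stat-line tuples."""
    return 1000 - sum(abs(x - y) / u for x, y, u in zip(a, b, UNITS_PER_POINT))


def infield_similarity(infield_a, infield_b):
    """Average the position-by-position scores of two infields (hypothetical helper)."""
    return mean(similarity_score(a, b) for a, b in zip(infield_a, infield_b))


# Rows copied from the Braves table above: Horner, Hubbard, Oberkfell, Ramirez.
BRAVES_1985 = [
    (130, 483, 61, 129, 25, 3, 27, 89, 50, 57, 1, .267, .499),
    (142, 439, 51, 102, 21, 0, 5, 39, 56, 54, 1, .232, .314),
    (134, 412, 30, 112, 19, 4, 3, 35, 51, 38, 1, .272, .359),
    (138, 568, 54, 141, 25, 4, 5, 58, 20, 63, 2, .248, .333),
]
BRAVES_1986 = [
    (141, 517, 70, 141, 22, 0, 27, 87, 52, 72, 1, .273, .472),
    (143, 408, 42, 94, 16, 1, 4, 36, 66, 74, 3, .230, .304),
    (151, 503, 62, 136, 24, 3, 5, 48, 83, 40, 7, .270, .360),
    (134, 496, 57, 119, 21, 1, 8, 33, 21, 60, 19, .240, .335),
]

print(round(infield_similarity(BRAVES_1985, BRAVES_1986)))  # prints 983
```

The same call on the 1998 A's and 1979 Orioles rows, paired first base to first base and so on, rounds to 976, the figure quoted for that pair.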
If you'd like an example with a more recent team, here are the 1998 A's and the 1979 Orioles (average similarity score: 976):
Player             G   AB   R    H  2B  3B  HR  RBI  BB    K  SB   AVG   SLG
Jason Giambi     153  562  92  166  28   0  27  110  81  102   2  .295  .489
Eddie Murray     159  606  90  179  30   2  25   99  72   78  10  .295  .475

Scott Spiezio    114  406  54  105  19   1   9   50  44   56   1  .259  .377
Rich Dauer       142  479  63  123  20   0   9   61  36   36   0  .257  .355

Mike Blowers     129  409  56   97  24   2  11   71  39  116   1  .237  .386
Doug DeCinces    120  422  67   97  27   1  16   61  54   68   5  .230  .412

Miguel Tejada    105  365  53   85  20   1  11   45  28   86   5  .233  .384
Kiko Garcia      126  417  54  103  15   9   5   24  32   87  11  .247  .362
The infield numbers may be similar, but that's where the similarity ends: the A's went 74-88 that season, while the
Orioles went 102-57.
So that's today's lesson: Similarity Scores are cool. Tell all your friends! Back to your regularly scheduled programming...
