Fun with Similarity Scores

by Warren Menzer

And now for something completely different.

One fun stat that Bill James created was the Similarity Score, which compares the statistics of two players' seasons or careers, with a score of 1000 being a perfect match. You start with 1000 points and subtract for differences as follows (thanks to www.baseball-reference.com for the info):

One point for each difference of 20 games played.
One point for each difference of 75 at bats.
One point for each difference of 10 runs scored.
One point for each difference of 15 hits.
One point for each difference of 5 doubles.
One point for each difference of 4 triples.
One point for each difference of 2 home runs.
One point for each difference of 10 RBI.
One point for each difference of 25 walks.
One point for each difference of 150 strikeouts.
One point for each difference of 20 stolen bases.
One point for each difference of .001 in batting average.
One point for each difference of .002 in slugging percentage.
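
For anyone who wants to try this at home, here is a minimal Python sketch of the rules above. It assumes the penalties are proportional (a 15-game gap costs three quarters of a point) rather than rounded to whole points; that reading is an assumption, not something the rules spell out. The names `DIVISORS`, `stat_line` and `similarity_score` are made up for illustration, and the sample numbers are the 1969 Larry Brown and Freddie Patek lines discussed next.

```python
# "One point for each difference of N": N for every stat, in table column order.
DIVISORS = {
    "G": 20, "AB": 75, "R": 10, "H": 15, "2B": 5, "3B": 4, "HR": 2,
    "RBI": 10, "BB": 25, "K": 150, "SB": 20, "AVG": 0.001, "SLG": 0.002,
}


def stat_line(*values):
    """Build a stat line from values given in DIVISORS key order."""
    return dict(zip(DIVISORS, values))


def similarity_score(a, b):
    """Return 1000 minus the summed penalties between two stat lines.

    Assumption: penalties are proportional, not truncated to whole points.
    """
    penalty = sum(abs(a[stat] - b[stat]) / div for stat, div in DIVISORS.items())
    return 1000 - penalty


brown = stat_line(132, 469, 48, 112, 10, 2, 4, 24, 44, 43, 4, 0.239, 0.294)
patek = stat_line(147, 460, 48, 110, 9, 1, 5, 32, 53, 86, 15, 0.239, 0.296)

print(round(similarity_score(brown, patek)))  # 995
```

Under the proportional reading this pair comes out at 995. A version that truncates each penalty to whole points would give a different number.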

It's a simplistic comparison - it doesn't adjust at all for the overall offense of the league, or the park the player plays in, but it's fun. You can find some pretty interesting seasons this way. In 1969, two shortstops, Larry Brown and Freddie Patek, had nearly identical seasons:

Player             G   AB    R    H  2B  3B  HR  RBI   BB    K  SB   AVG   SLG
Larry Brown      132  469   48  112  10   2   4   24   44   43   4  .239  .294
Freddie Patek    147  460   48  110   9   1   5   32   53   86  15  .239  .296

That's a similarity score of 995. I decided to look at the starting infields of every team this century, and try to find the two infields with the most comparable statistics. This has absolutely no point, but it's fun.

Not surprisingly, the best match came from a team that had the same starting infield in consecutive years - the 1985 and 1986 Braves:

Player             G   AB    R    H  2B  3B  HR  RBI   BB    K  SB   AVG   SLG
Bob Horner       130  483   61  129  25   3  27   89   50   57   1  .267  .499
Bob Horner       141  517   70  141  22   0  27   87   52   72   1  .273  .472
Glenn Hubbard    142  439   51  102  21   0   5   39   56   54   1  .232  .314
Glenn Hubbard    143  408   42   94  16   1   4   36   66   74   3  .230  .304
Ken Oberkfell    134  412   30  112  19   4   3   35   51   38   1  .272  .359
Ken Oberkfell    151  503   62  136  24   3   5   48   83   40   7  .270  .360
Rafael Ramirez   138  568   54  141  25   4   5   58   20   63   2  .248  .333
Rafael Ramirez   134  496   57  119  21   1   8   33   21   60  19  .240  .335

The four similarity scores average to 983. That's pretty amazing. The Braves couldn't have been too surprised with the production of those guys in 1986.
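
Here is a rough Python sketch of how that infield average can be checked. It repeats the per-stat divisors so it runs on its own, and it assumes proportional penalties (no rounding to whole points), which is an assumption rather than something the rules state. The names `DIVISORS`, `similarity` and `INFIELD` are made up for illustration; the numbers are the two lines listed for each Braves infielder.

```python
# Divisors in table column order: G, AB, R, H, 2B, 3B, HR, RBI, BB, K, SB, AVG, SLG
DIVISORS = (20, 75, 10, 15, 5, 4, 2, 10, 25, 150, 20, 0.001, 0.002)


def similarity(a, b):
    """1000 minus the sum of |difference| / divisor (assumed proportional)."""
    return 1000 - sum(abs(x - y) / d for x, y, d in zip(a, b, DIVISORS))


# For each player: (first listed line, second listed line)
INFIELD = {
    "Bob Horner": ((130, 483, 61, 129, 25, 3, 27, 89, 50, 57, 1, .267, .499),
                   (141, 517, 70, 141, 22, 0, 27, 87, 52, 72, 1, .273, .472)),
    "Glenn Hubbard": ((142, 439, 51, 102, 21, 0, 5, 39, 56, 54, 1, .232, .314),
                      (143, 408, 42, 94, 16, 1, 4, 36, 66, 74, 3, .230, .304)),
    "Ken Oberkfell": ((134, 412, 30, 112, 19, 4, 3, 35, 51, 38, 1, .272, .359),
                      (151, 503, 62, 136, 24, 3, 5, 48, 83, 40, 7, .270, .360)),
    "Rafael Ramirez": ((138, 568, 54, 141, 25, 4, 5, 58, 20, 63, 2, .248, .333),
                       (134, 496, 57, 119, 21, 1, 8, 33, 21, 60, 19, .240, .335)),
}

# Score each player against himself across the two seasons, then average.
scores = {name: similarity(a, b) for name, (a, b) in INFIELD.items()}
average = sum(scores.values()) / len(scores)

for name, score in scores.items():
    print(f"{name:15s} {score:6.1f}")
print(f"{'Average':15s} {average:6.1f}")
```

With these inputs the average rounds to 983. Hubbard is the closest individual match and Horner the loosest, mostly because of Horner's gap in slugging percentage.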

If you'd like a more recent example, here are the 1998 A's and the 1979 Orioles (similarity score: 976):

Player             G   AB    R    H  2B  3B  HR  RBI   BB    K  SB   AVG   SLG
Jason Giambi     153  562   92  166  28   0  27  110   81  102   2  .295  .489
Eddie Murray     159  606   90  179  30   2  25   99   72   78  10  .295  .475
Scott Spiezio    114  406   54  105  19   1   9   50   44   56   1  .259  .377
Rich Dauer       142  479   63  123  20   0   9   61   36   36   0  .257  .355
Mike Blowers     129  409   56   97  24   2  11   71   39  116   1  .237  .386
Doug DeCinces    120  422   67   97  27   1  16   61   54   68   5  .230  .412
Miguel Tejada    105  365   53   85  20   1  11   45   28   86   5  .233  .384
Kiko Garcia      126  417   54  103  15   9   5   24   32   87  11  .247  .362

The infield numbers may be similar, but that's where the similarity ends - the A's went 74-88 in 1998, while the Orioles went 102-57 in 1979.

So that's today's lesson: Similarity Scores are cool - tell all your friends! Back to your regularly scheduled programming...

Send Warren your opinions, comments or verbal abuse at warren@JBaseball.

© 2000-2023 JBaseball