
Fun with Similarity Scores
by Warren Menzer
And now for something completely different.
One fun stat that Bill James created was the Similarity Score, which compares the statistics of two players' seasons
or careers, with a score of 1000 being a perfect match. You start with 1000 points and subtract for differences as follows
(thanks to www.baseball-reference.com for the info):
One point for each difference of 20 games played.
One point for each difference of 75 at bats.
One point for each difference of 10 runs scored.
One point for each difference of 15 hits.
One point for each difference of 5 doubles.
One point for each difference of 4 triples.
One point for each difference of 2 home runs.
One point for each difference of 10 RBI.
One point for each difference of 25 walks.
One point for each difference of 150 strikeouts.
One point for each difference of 20 stolen bases.
One point for each difference of .001 in batting average.
One point for each difference of .002 in slugging percentage.
It's a simplistic comparison - it makes no adjustment for the league's overall level of offense or for the player's
home park - but it's fun. You can find some pretty interesting seasons this way. In 1969, two shortstops, Larry Brown and
Freddie Patek, had nearly identical seasons:
Player | G | AB | R | H | 2B | 3B | HR | RBI | BB | K | SB | AVG | SLG |
Larry Brown | 132 | 469 | 48 | 112 | 10 | 2 | 4 | 24 | 44 | 43 | 4 | .239 | .294 |
Freddie Patek | 147 | 460 | 48 | 110 | 9 | 1 | 5 | 32 | 53 | 86 | 15 | .239 | .296 |
That's a similarity score of 995.
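If you'd like to check the arithmetic, here is a minimal Python sketch of the point deductions listed above, applied to the Brown and Patek lines. The names `DIVISORS` and `similarity_score` are made up for illustration, and the sketch assumes that a difference smaller than a full step costs a fraction of a point (a 15-game gap costs 0.75 points), which is the reading that reproduces the 995.

```python
# Sketch of the deductions listed above. Assumption: points are subtracted
# in proportion to each difference, so partial steps cost partial points.
DIVISORS = {
    "G": 20, "AB": 75, "R": 10, "H": 15, "2B": 5, "3B": 4, "HR": 2,
    "RBI": 10, "BB": 25, "K": 150, "SB": 20, "AVG": 0.001, "SLG": 0.002,
}


def similarity_score(a, b):
    """Start at 1000 and subtract one point per listed unit of difference."""
    penalty = sum(abs(a[stat] - b[stat]) / unit for stat, unit in DIVISORS.items())
    return 1000 - penalty


# 1969 stat lines copied from the table above.
brown = {"G": 132, "AB": 469, "R": 48, "H": 112, "2B": 10, "3B": 2, "HR": 4,
         "RBI": 24, "BB": 44, "K": 43, "SB": 4, "AVG": 0.239, "SLG": 0.294}
patek = {"G": 147, "AB": 460, "R": 48, "H": 110, "2B": 9, "3B": 1, "HR": 5,
         "RBI": 32, "BB": 53, "K": 86, "SB": 15, "AVG": 0.239, "SLG": 0.296}

print(round(similarity_score(brown, patek)))  # 995
```

The total deduction works out to about 4.95 points, and the biggest single item is the full point for the .002 gap in slugging percentage.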
I decided to look at the starting infields of every team this century, and try to find the two infields with the most
comparable statistics. This has absolutely no point, but it's fun.
Not surprisingly, the best match came from a team that had the same starting infield in consecutive years - the 1985 and 1986
Braves:
Player | G | AB | R | H | 2B | 3B | HR | RBI | BB | K | SB | AVG | SLG |
Bob Horner | 130 | 483 | 61 | 129 | 25 | 3 | 27 | 89 | 50 | 57 | 1 | .267 | .499 |
Bob Horner | 141 | 517 | 70 | 141 | 22 | 0 | 27 | 87 | 52 | 72 | 1 | .273 | .472 |
 |
Glenn Hubbard | 142 | 439 | 51 | 102 | 21 | 0 | 5 | 39 | 56 | 54 | 1 | .232 | .314 |
Glenn Hubbard | 143 | 408 | 42 | 94 | 16 | 1 | 4 | 36 | 66 | 74 | 3 | .230 | .304 |
 |
Ken Oberkfell | 134 | 412 | 30 | 112 | 19 | 4 | 3 | 35 | 51 | 38 | 1 | .272 | .359 |
Ken Oberkfell | 151 | 503 | 62 | 136 | 24 | 3 | 5 | 48 | 83 | 40 | 7 | .270 | .360 |
 |
Rafael Ramirez | 138 | 568 | 54 | 141 | 25 | 4 | 5 | 58 | 20 | 63 | 2 | .248 | .333 |
Rafael Ramirez | 134 | 496 | 57 | 119 | 21 | 1 | 8 | 33 | 21 | 60 | 19 | .240 | .335 |
The four similarity scores average to 983. That's pretty amazing. The Braves couldn't have been too surprised by the production
of those guys in 1986.
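The infield figure can be reproduced with a short Python sketch. It assumes each player's two lines are scored with the point deductions listed earlier, fractional points included, and that the four scores are simply averaged. The names `UNITS`, `score`, and `infield` are illustrative only.

```python
# Stat order: G, AB, R, H, 2B, 3B, HR, RBI, BB, K, SB, AVG, SLG
# Each entry is the size of a difference that costs one point.
UNITS = [20, 75, 10, 15, 5, 4, 2, 10, 25, 150, 20, 0.001, 0.002]


def score(a, b):
    """1000 minus the proportional deductions (assumed to be fractional)."""
    return 1000 - sum(abs(x - y) / unit for x, y, unit in zip(a, b, UNITS))


# The two stat lines for each player, copied from the table above.
infield = {
    "Horner": ([130, 483, 61, 129, 25, 3, 27, 89, 50, 57, 1, 0.267, 0.499],
               [141, 517, 70, 141, 22, 0, 27, 87, 52, 72, 1, 0.273, 0.472]),
    "Hubbard": ([142, 439, 51, 102, 21, 0, 5, 39, 56, 54, 1, 0.232, 0.314],
                [143, 408, 42, 94, 16, 1, 4, 36, 66, 74, 3, 0.230, 0.304]),
    "Oberkfell": ([134, 412, 30, 112, 19, 4, 3, 35, 51, 38, 1, 0.272, 0.359],
                  [151, 503, 62, 136, 24, 3, 5, 48, 83, 40, 7, 0.270, 0.360]),
    "Ramirez": ([138, 568, 54, 141, 25, 4, 5, 58, 20, 63, 2, 0.248, 0.333],
                [134, 496, 57, 119, 21, 1, 8, 33, 21, 60, 19, 0.240, 0.335]),
}

scores = {name: score(a, b) for name, (a, b) in infield.items()}
infield_average = sum(scores.values()) / len(scores)

for name, value in scores.items():
    print(f"{name}: {value:.1f}")
print(round(infield_average))  # 983
```

Under these assumptions Hubbard's two lines are the closest match and Horner's the loosest, mostly because of the 27-point gap in slugging percentage.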
If you'd like a more recent example, here are the 1998 A's and the 1979 Orioles (similarity score: 976):
Player | G | AB | R | H | 2B | 3B | HR | RBI | BB | K | SB | AVG | SLG |
Jason Giambi | 153 | 562 | 92 | 166 | 28 | 0 | 27 | 110 | 81 | 102 | 2 | .295 | .489 |
Eddie Murray | 159 | 606 | 90 | 179 | 30 | 2 | 25 | 99 | 72 | 78 | 10 | .295 | .475 |
 |
Scott Spiezio | 114 | 406 | 54 | 105 | 19 | 1 | 9 | 50 | 44 | 56 | 1 | .259 | .377 |
Rich Dauer | 142 | 479 | 63 | 123 | 20 | 0 | 9 | 61 | 36 | 36 | 0 | .257 | .355 |
 |
Mike Blowers | 129 | 409 | 56 | 97 | 24 | 2 | 11 | 71 | 39 | 116 | 1 | .237 | .386 |
Doug DeCinces | 120 | 422 | 67 | 97 | 27 | 1 | 16 | 61 | 54 | 68 | 5 | .230 | .412 |
 |
Miguel Tejada | 105 | 365 | 53 | 85 | 20 | 1 | 11 | 45 | 28 | 86 | 5 | .233 | .384 |
Kiko Garcia | 126 | 417 | 54 | 103 | 15 | 9 | 5 | 24 | 32 | 87 | 11 | .247 | .362 |
The infield numbers may be similar, but that's where the similarity ends - the A's went 74-88 in 1998, while the
Orioles went 102-57 in 1979.
So that's today's lesson: Similarity Scores are cool - tell all your friends! Back to your regularly scheduled programming...