Thursday, August 24, 2006

Why We Need A Quality of Opposition Adjustment To Rank All Time Goal Scoring Seasons

One problem in hockey sabermetrics is ranking the best goal-scoring seasons of all time. One very good attempt at solving it was made by the hockey outsider (Peter Albert). His method rates Brett Hull's 86-goal season in 1990/91 as the best goal-scoring year ever, which is a very plausible result.

When one looks further at the top 10 goal-scoring seasons of all time, systematic problems become clear. Nine of the ten best seasons occurred after 1970, and none occurred in the original six era. The remaining one is Babe Dye's 38-goal year back in 1924/25. Seven of the ten seasons on the list came within three years of an expansion. It's clear that players tended to score more adjusted goals in expansion-weakened years. During the original six era there were no expansions, so its seasons didn't make this list.
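The expansion check above can be sketched in a few lines. This is a minimal illustration, not the method behind the list: it assumes each season is identified by its starting year (1990/91 becomes 1990) and reads "within three years of an expansion" as the expansion season itself or any of the three seasons that follow, since that is when the argument says the league is diluted. The function names and sample seasons are hypothetical; 1967 is used because that is when the league doubled from six to twelve teams.

```python
def is_near_expansion(season_start, expansion_starts, window=3):
    """True if the season begins in an expansion year or up to `window` years after one.

    Assumption: only seasons at or after an expansion count as
    "expansion weakened"; seasons just before an expansion are excluded.
    """
    return any(0 <= season_start - e <= window for e in expansion_starts)


def count_near_expansion(season_starts, expansion_starts, window=3):
    """Count how many of the given seasons fall inside an expansion window."""
    return sum(is_near_expansion(s, expansion_starts, window) for s in season_starts)


# Hypothetical sample: four season start years checked against the 1967 expansion.
print(count_near_expansion([1965, 1967, 1970, 1971], [1967]))  # prints 2 (1967 and 1970)
```

Feeding in the full list of expansion years and the starting years of a top-ten list would give a "seven of ten" style count; changing `window`, or counting seasons before an expansion too, shows how sensitive that count is to the definition.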

There are three specific seasons from the original six era that I expected to see on or near the top ten: Maurice Richard's 50 goals in 50 games in 1945, Bernie Geoffrion's 50-goal season in 1961 that tied Richard's record, and Bobby Hull's 54-goal season in 1966, the first to break the 50-goal mark. Surprisingly, none of these three seasons is near the top of Peter Albert's top 50 list (only Hull made it, in 40th spot). A couple of other original six seasons are represented: Gordie Howe's 49-goal year in 1953 and Jean Beliveau's 47-goal year in 1956 find spots on the list in 12th and 30th position respectively, alongside the Bobby Hull season in 40th. How could an entire era be so underrepresented? Was there really no great goal scorer in that era, or does the analysis systematically overlook them? I think the era is overlooked. With only six teams in the NHL, there were fewer bottom-feeding teams against which a goal scorer could pad his totals. In more recent times, particularly right after expansions, some bottom-feeding teams existed. This is not to say that the average team or average player is any worse in either era; it reflects the fact that a larger league has room for more bad players and teams.

How would one attempt to adjust for the quality of opposition, specifically for the presence or absence of bad players and teams against whom one could pad one's statistics? It's not an easy question. In baseball (where sabermetrics is a much more exact science) a few indicators of league quality come naturally out of the statistics. For example, one can use the ratio of double plays to errors as a measure: in a good league (i.e. the majors) there are more double plays than errors, but in a beer league errors are common and double plays are rare. Hockey has no clear equivalent, no statistic kept accurately since the beginning of pro hockey that captures the quality of the league.
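The double-play-to-error heuristic is simple arithmetic; a minimal sketch follows. The function names and the sample totals are hypothetical, and the cutoff of 1.0 only encodes the "more double plays than errors" rule of thumb from the paragraph above, not a calibrated threshold.

```python
def dp_error_ratio(double_plays, errors):
    """Double plays per error, a rough league-quality indicator for baseball."""
    if errors <= 0:
        raise ValueError("errors must be positive")
    return double_plays / errors


def looks_like_strong_league(double_plays, errors):
    """Rule of thumb: a strong league turns more double plays than it commits errors."""
    return dp_error_ratio(double_plays, errors) > 1.0


# Hypothetical season totals for two leagues.
print(looks_like_strong_league(150, 100))  # True: ratio 1.5
print(looks_like_strong_league(30, 60))    # False: ratio 0.5
```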

The best proxy I came up with after thinking about this problem for a while is the standard deviation of players' ages about a defined mean age for a pro hockey player. In weak years of the NHL, more very young players make the league, and more aging players are able to hang on and continue their careers after their value is depleted. In strong years, there are few young players, and aging players tend to be forced into retirement instead of getting the chance to stick around for another year or two. Gordie Howe's career illustrates this. He was 18 when he first made the NHL in 1946, in a league that was beginning to regain strength after the Second World War had depleted its ranks for a few years. Gordie played until age 43, when he retired from the Detroit Red Wings in 1971. In 1973, he was lured out of retirement to play in the weaker WHA, and he continued to have successful years there until the league folded in 1979. He played one more year in the NHL after it expanded to include the surviving WHA teams, and retired in 1980 at age 52. These final years would likely not have happened if the NHL (and WHA) had not been weakened by the rapid expansion of the 1970s.
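The proxy described above, the spread of player ages about a defined reference age, can be sketched as a root-mean-square deviation. This is an assumption-laden illustration: the post does not say what the "defined mean age" is, so `reference_age` is a hypothetical parameter (27 below is only a placeholder), and when it is omitted the function falls back to the season's own mean, which makes it an ordinary population standard deviation.

```python
import math


def age_spread(ages, reference_age=None):
    """Root-mean-square deviation of player ages (in years) from a reference age.

    If reference_age is None, the season's own mean age is used, so the
    result equals the population standard deviation of the ages.
    """
    if not ages:
        raise ValueError("need at least one player age")
    ref = sum(ages) / len(ages) if reference_age is None else reference_age
    return math.sqrt(sum((a - ref) ** 2 for a in ages) / len(ages))


# Hypothetical rosters: one compact around the late twenties, one with
# an 18-year-old rookie and a 43-year-old veteran hanging on.
compact = [24, 26, 27, 28, 30]
stretched = [18, 24, 27, 30, 43]
print(age_spread(compact, reference_age=27))    # 2.0
print(age_spread(stretched, reference_age=27))  # about 8.43
```

Fixing the reference age across seasons, rather than using each season's own mean, lets a league full of both teenagers and 40-year-olds score high even when its average age is unremarkable; which choice the age table in the comments uses is one of the open questions raised there.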

I think this technique of using the standard deviation of age as a proxy for league quality would fail in the early days of pro hockey. Back then, pro players often played seasons of only around 20 games, their hockey income was not enough to sustain them for the whole year, and they also worked a second job. It was not uncommon for a still-talented player to retire because of pressure from that other job: moving to a new city for hockey would interfere with it, or the player had reached a point in that career where taking time off for a 20-game season was too much. So many players retired to pursue their second, often more lucrative, career. This kept the standard deviation of ages low at the same time as the quality of the league was low, so in that era a small age spread would wrongly suggest a strong league.

Although I think the standard deviation of players' ages is probably the best available proxy for correcting data for quality of opposition, I don't think it would fully solve the problem. It is a lot of work for little gain, and I think the bias would still exist after this attempted correction.

The list of the best goal-scoring seasons that the hockey outsider produced is a good one, and it is good work. Its most glaring problem is the lack of a correction for quality of opposition. That correction is not an easy one to make, and it may not be possible to make it in an unbiased manner from existing statistics. I would love for some smart person to prove me wrong, but I am not sure it's likely. As it stands, the method produces a list of players who tend to come from expansion seasons and overlooks those from the original six era. Nevertheless, it is a pretty good attempt at solving a complex problem.

Comments:
interesting....

Season -- Age Stand. Dev. (years)
==========================
1969-70 -- 5,22
1966-67 -- 5,17
1943-44 -- 5,17
1968-69 -- 5,1
1927-28 -- 5,1
1970-71 -- 5,1
1972-73 -- 4,9
1965-66 -- 4,87
1967-68 -- 4,87
1926-27 -- 4,8
1971-72 -- 4,79
1964-65 -- 4,67
1922-23 -- 4,66
1973-74 -- 4,63
2005-06 -- 4,61
1928-29 -- 4,58
1951-52 -- 4,58
1944-45 -- 4,56
2003-04 -- 4,51
2001-02 -- 4,48
1923-24 -- 4,47
1924-25 -- 4,45
2002-03 -- 4,45
1963-64 -- 4,44
1936-37 -- 4,43
2000-01 -- 4,42
1977-78 -- 4,41
1925-26 -- 4,39
1998-99 -- 4,39
1917-18 -- 4,39
1997-98 -- 4,38
1999-00 -- 4,37
1975-76 -- 4,35
1976-77 -- 4,34
1942-43 -- 4,34
1974-75 -- 4,32
1929-30 -- 4,32
1935-36 -- 4,27
1996-97 -- 4,27
1930-31 -- 4,27
1979-80 -- 4,25
1945-46 -- 4,21
1921-22 -- 4,21
1953-54 -- 4,2
1962-63 -- 4,17
1995-96 -- 4,14
1961-62 -- 4,11
1980-81 -- 4,08
1981-82 -- 4,03
1978-79 -- 4,03
1994-95 -- 4,02
1937-38 -- 4,02
1947-48 -- 4
1946-47 -- 3,99
1932-33 -- 3,99
1931-32 -- 3,97
1949-50 -- 3,94
1934-35 -- 3,94
1950-51 -- 3,92
1939-40 -- 3,9
1938-39 -- 3,89
1948-49 -- 3,88
1960-61 -- 3,87
1982-83 -- 3,86
1959-60 -- 3,84
1952-53 -- 3,84
1984-85 -- 3,83
1933-34 -- 3,83
1918-19 -- 3,81
1983-84 -- 3,78
1954-55 -- 3,77
1991-92 -- 3,77
1993-94 -- 3,76
1986-87 -- 3,74
1957-58 -- 3,69
1992-93 -- 3,69
1955-56 -- 3,67
1989-90 -- 3,67
1988-89 -- 3,67
1990-91 -- 3,63
1985-86 -- 3,63
1919-20 -- 3,62
1987-88 -- 3,62
1958-59 -- 3,61
1940-41 -- 3,58
1956-57 -- 3,51
1941-42 -- 3,43
1920-21 -- 3,31
 
Standard Dev. by "Adjusted Team Wins"

Season -- "Adj.Wins" Stand. Dev.
===============================
1917-18 -- 16,483
1918-19 -- 16,425
1919-20 -- 21,062
1920-21 -- 13,948
1921-22 -- 10,623
1922-23 -- 12,631
1923-24 -- 10,804
1924-25 -- 15,243
1925-26 -- 11,225
1926-27 -- 11,001
1927-28 -- 10,397
1928-29 -- 11,149
1929-30 -- 15,751
1930-31 -- 13,611
1931-32 -- 6,189
1932-33 -- 8,645
1933-34 -- 7,032
1934-35 -- 10,972
1935-36 -- 7,404
1936-37 -- 7,014
1937-38 -- 11,765
1938-39 -- 13,838
1939-40 -- 12,822
1940-41 -- 11,871
1941-42 -- 8,352
1942-43 -- 8,532
1943-44 -- 17,001
1944-45 -- 17,646
1945-46 -- 8,389
1946-47 -- 7,953
1947-48 -- 7,266
1948-49 -- 8,185
1949-50 -- 6,685
1950-51 -- 14,419
1951-52 -- 11,055
1952-53 -- 7,084
1953-54 -- 10,549
1954-55 -- 14,264
1955-56 -- 10,823
1956-57 -- 10,23
1957-58 -- 9,033
1958-59 -- 6,143
1959-60 -- 9,254
1960-61 -- 11,755
1961-62 -- 11,454
1962-63 -- 9,206
1963-64 -- 8,842
1964-65 -- 9,557
1965-66 -- 10,625
1966-67 -- 9,177
1967-68 -- 7,797
1968-69 -- 10,137
1969-70 -- 11,395
1970-71 -- 12,695
1971-72 -- 12,741
1972-73 -- 12,072
1973-74 -- 11,465
1974-75 -- 12,004
1975-76 -- 12,826
1976-77 -- 12,324
1977-78 -- 13,079
1978-79 -- 10,866
1979-80 -- 8,858
1980-81 -- 10,086
1981-82 -- 9,988
1982-83 -- 10,63
1983-84 -- 11,358
1984-85 -- 9,58
1985-86 -- 10,234
1986-87 -- 6,757
1987-88 -- 7,701
1988-89 -- 8,158
1989-90 -- 7,642
1990-91 -- 9,072
1991-92 -- 7,783
1992-93 -- 11,24
1993-94 -- 8,284
1994-95 -- 9,509
1995-96 -- 9,523
1996-97 -- 6,14
1997-98 -- 7,939
1998-99 -- 7,58
1999-00 -- 8,863
2000-01 -- 9,093
2001-02 -- 7,828
2002-03 -- 7,964
2003-04 -- 8,193
2005-06 -- 8,616
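Both tables above use a comma as the decimal separator (5,22 means 5.22), as the thread below clarifies. A minimal sketch for reading rows in this `season -- value` layout and re-ranking them; the helper names are hypothetical, and header and separator lines are skipped by checking that the row starts with a digit.

```python
def parse_row(line):
    """Parse a row such as '1969-70 -- 5,22' into ('1969-70', 5.22).

    The tables use a decimal comma, so it is swapped for a point before
    conversion to float.
    """
    season, value = (part.strip() for part in line.split("--", 1))
    return season, float(value.replace(",", "."))


def rank_rows(lines):
    """Return (season, value) pairs sorted from highest to lowest value.

    Only lines that begin with a digit (a season) are parsed, so headers
    and '=====' separators are ignored.
    """
    rows = [parse_row(line) for line in lines if line.strip()[:1].isdigit()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


sample = [
    "Season -- Stand. Dev.",
    "==========================",
    "1920-21 -- 3,31",
    "1969-70 -- 5,22",
    "1947-48 -- 4",
]
print(rank_rows(sample))
# [('1969-70', 5.22), ('1947-48', 4.0), ('1920-21', 3.31)]
```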
 
As usual, I think pnep provided a wealth of information, but without enough explanation of it.

When you list the standard deviation in age, do you include every player who played even one shift in the NHL, or is there a cutoff for number of games played? In the earliest years of the NHL, does information exist to give us accurate ages for all players? If not, how is this handled? Is the mean age used in these figures the same for each season, or does it fluctuate year by year as the population fluctuates?
I assume the units are years - right?

Now, the standard deviation of adjusted team wins. What are you adjusting for? Length of schedule? Rule changes that added shootouts and points for losing in overtime? Anything else? These numbers are in the thousands, yet teams don't win thousands of games a year. Why?

I think the standard deviation in adjusted wins is a measure of competitive balance in the league. Is there parity? That does not necessarily show that the league quality is good. You can have parity in a bad league. You can have one or two really great teams in a high quality league.

I want to better understand these numbers. I think they might be very useful, but I need to know exactly what they represent before I misuse them.
 
"do you include every player who played even one shift in the NHL? " = Yes

"In the earliest years of the NHL does information exist to give us accurate ages of all players?" - I use ages from www.hockeydb.com, "Total Hockey" Book

"You are adjusting for what? Length of schedule? " - Yes
 
What are the units for your adjusted win calculation?
 
Units?

Season 1944-45:

MTL - 38 Wins ~ 62 Adj Wins
DET - 31 Wins ~ 51 Adj Wins
TOR - 24 Wins ~ 39 Adj Wins
BOS - 16 Wins ~ 26 Adj Wins
CHI - 13 Wins ~ 21 Adj Wins
NYR - 11 Wins ~ 18 Adj Wins

Standard Dev. by "Adjusted Team Wins" = 17,65
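The 1944-45 figures above can be reproduced under two assumptions not stated in the thread: that the season was a 50-game schedule (which it was for that NHL season) and that wins are rescaled to an 82-game schedule, a target inferred only because 38 × 82 / 50 ≈ 62 matches the Montreal figure. Taking the sample standard deviation (dividing by n - 1) of the unrounded adjusted wins gives about 17.646, the value listed for 1944-45 in the adjusted-wins table. The function and constant names are hypothetical.

```python
from statistics import stdev

# Assumption: wins are rescaled to an 82-game schedule. This target is
# inferred from the posted numbers (38 * 82 / 50 is about 62), not stated.
TARGET_GAMES = 82


def adjusted_wins(wins, schedule_games, target_games=TARGET_GAMES):
    """Scale a win total to what it would be over `target_games` games."""
    return wins * target_games / schedule_games


# 1944-45 win totals from the reply above; that season had a 50-game schedule.
wins_1944_45 = {"MTL": 38, "DET": 31, "TOR": 24, "BOS": 16, "CHI": 13, "NYR": 11}
adj = {team: adjusted_wins(w, 50) for team, w in wins_1944_45.items()}

for team, value in adj.items():
    print(f"{team} - {wins_1944_45[team]} Wins ~ {round(value)} Adj Wins")

# Sample standard deviation (n - 1 denominator) of the unrounded values.
spread = stdev(list(adj.values()))
print(f"Adjusted-wins standard deviation: {spread:.3f}")  # 17.646
```

Because every team's wins are multiplied by the same factor, this adjustment only rescales the spread; it does not account for ties, overtime points, or shootouts, which were raised as open questions earlier in the thread.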
 
The comma is a decimal point. I thought you had teams with thousands of adjusted wins. ... DOH
 