Thursday, May 26, 2005

Sabermetrics and hockey

In the comments on yesterday's post, Jes Golbez points to a study that a poster named pnep put up at hfboards.com, presenting his metric for who should make the Hall of Fame. It contradicts my intuition by picking Glenn Anderson ahead of Dale Hawerchuk. I believe this is due to a flaw in the study's methodology, but at this point I am unclear on exactly what the methodology was. I hope to address this in detail later, once I understand the study better.

The idea behind this study is one I would like to support. I am a big fan of sabermetrics in baseball. Sabermetrics is the statistical analysis of baseball. Jim Albert describes it here. Bill James is one of my favorite authors. He is the most accomplished sabermetrics author in the world today. I strongly recommend anyone who is interested read his works.

I have wondered many times whether it is possible to apply sabermetrics to hockey. The best attempt that I have seen on the net is Daryl Shilling's hockey project. I think it is imperfect, but if we keep in mind the errors that go with numbers produced by such attempts, it is a nice effort. I think that it is not possible to have nearly as precise a theory in hockey as there is in baseball (though I would love to be proven incorrect). To explain why, I will quote from Bill James in his Historical Baseball Abstract:

The difference between a good statistical analyst and a poor statistical analyst is that a good statistical analyst ... understands this, and a bad one implicitly denies it.

A good statistical analyst, in studying the statistical record of a baseball season, asks three or four essential questions:
1) What is missing from the picture?
2) What is distorted here, and what is accurately portrayed?
3) How can we include what has been left out?
4) How can we correct what has been distorted?

We all know many things and many different types of things that are not reflected in the statistical record. Acknowledging this, a good statistical analyst is sometimes able to reach out and draw areas of the game which were previously undocumented inside the tent, inside the focus of the statistical record. Sabermetrics is sometimes able to invent a way to correct for one or another distortion of the statistical picture.

The bad statistical analyst, on the other hand, will assume that what the statistical [record] tells him must be true and complete, and by making that assumption, will forfeit his ability to add anything significant to the record.


We must look at hockey statistics. Playing winning hockey at its simplest requires being good at scoring goals and being good at preventing your opponent from scoring goals. If your team scores the most goals and allows the fewest, your team will win. We must look at individual statistics and see how good they are as a proxy for a given player's ability to score and prevent goals.

Preventing goals is almost impossible to quantify. Right off the bat, we hit a huge roadblock. There is little to no reliable way to quantify how many goals a player has prevented.

Most hockey stats attempt to quantify goal production. And they do a good (although sometimes misleading) job of this. An example of a misleading conclusion relating to goal production would be two average players. One player plays with all-star linemates. The other plays with crappy linemates. The player with all-star linemates will be credited with assists and goals when his linemates did most of the work and he was just along for the ride. The player with crappy linemates will not get the same chances to score or assist. Even if he sets up a sure goal, there is no guarantee his crappy linemate will score. On the surface, these two identical average players will not appear identical because the one with the better linemates scored more often. This is misleading. In principle, I think this can be corrected for, although I am not certain I have ever seen it done successfully.
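To put rough numbers on the linemate problem, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the `Line` class, the `expected_points` function, and every rate are hypothetical and are not taken from any real player, season, or published study. It only shows how the same player, with the same setup rate and the same finishing rate, ends up with very different point totals depending on who he skates with.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical description of a player's linemates (made-up numbers)."""
    finish_rate: float       # share of the player's setups that linemates convert
    setups_per_game: float   # scoring chances linemates create for the player


def expected_points(own_setups_per_game, own_finish_rate, line, games=82):
    """Expected goals + assists for one player over a season (toy model)."""
    # Assists depend on how often the linemates finish the player's setups.
    assists = own_setups_per_game * line.finish_rate * games
    # Goals depend on how many chances the linemates hand to the player.
    goals = line.setups_per_game * own_finish_rate * games
    return goals + assists


# Two identical average players: same setup rate, same finishing rate.
same_player = dict(own_setups_per_game=2.0, own_finish_rate=0.10)

star_line = Line(finish_rate=0.20, setups_per_game=3.0)   # all-star linemates
weak_line = Line(finish_rate=0.10, setups_per_game=1.5)   # crappy linemates

with_stars = expected_points(line=star_line, **same_player)
with_weak = expected_points(line=weak_line, **same_player)

print(f"with all-star linemates: {with_stars:.1f} points")
print(f"with crappy linemates:   {with_weak:.1f} points")
```

With these invented rates the player on the all-star line shows about twice the points of his identical twin on the weak line. Correcting real statistics would mean estimating the linemate terms from data, which is exactly the hard part.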

Hockey is different from baseball. It is much harder to quantify. As a result, I don't think as precise a sabermetric theory is possible. Because some hockey stats are misleading and many important aspects of the game are unquantifiable, I think many attempted sabermetric hockey studies produce garbage. It's not easy to shuck off the garbage from the input statistics and make something meaningful come out.

Comments:
This is also what makes it difficult to come up with a good simulator. Just how do you quantify chemistry between players, or leadership potential? The intangibles are too many to count.
 
The difference between hockey and baseball from a statistical perspective is that the latter consists of a series of completely individual actions (i.e. at-bats), whereas in hockey each action has several components and several contributors. Hockey statistical analyses must find ways to incorporate peripheral statistics such as giveaways, takeaways, scoring chances, etc. into the metric.
 