# Statistics Are Dumb

Yes, I wrote that title. Not only did I write it, but I mean it.

“Dumb” basically means lacking intelligence. Most NHL statistics, frankly, require no intelligence. Let’s look at the complicated math involved in the basic statistics on NHL.com’s summary page.

• Goals: Watching and counting
• Assists: Watching and counting
• Points: Adding goals and assists (or, alternately, watching and counting)
• Plus/Minus: Watching and counting
• Penalty minutes: Watching and counting
• Power play goals: Watching and counting
• Shorthanded goals: Watching and counting
• Game-winning goals: Basic addition, watching and counting
• Overtime goals: Watching and counting
• Shots: Watching and counting
• Shooting percentage: Basic division, watching and counting
• Time on ice: Watching and counting
• Shifts per game: Watching and counting
• Face-off percentage: Basic division, watching and counting

Basically, if you’re capable of turning on your TV and counting things, you can create almost any NHL statistic from scratch. If you’re capable of doing that and then later using the division key on a calculator or computer, you can create any NHL statistic from scratch. I’ve listed a bunch above, but they’re all basically the same – goalie stats involve counting shots, goals and minutes played, real-time statistics all consist entirely of counting, and so on.
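To make the "counting plus division" point concrete, here is a minimal Python sketch of the two percentage stats in the list above. The function name `percentage` and all of the counts are made up for illustration; they are not real NHL data or an official formula from NHL.com.

```python
def percentage(successes: int, attempts: int) -> float:
    """Return successes / attempts as a percentage (0.0 when there are no attempts)."""
    if attempts == 0:
        return 0.0
    return 100.0 * successes / attempts


# Hypothetical counts that somebody tallied while watching games.
goals, shots = 12, 96
faceoffs_won, faceoffs_taken = 7, 10

shooting_pct = percentage(goals, shots)
faceoff_pct = percentage(faceoffs_won, faceoffs_taken)

print(f"Shooting percentage: {shooting_pct:.1f}%")  # 12.5%
print(f"Face-off percentage: {faceoff_pct:.1f}%")   # 70.0%
```

The only real decision here is what to do with zero attempts; this sketch returns 0.0, though leaving the stat blank would be just as defensible.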

What about all those fancy advanced statistics that get thrown around? Scoring chance percentage, Fenwick, Corsi, EVPTS/60 – those are more complicated, right?

No.

Scoring chances involve somebody watching the game and counting. Scoring chance percentage simply involves taking the number of good scoring chances, and dividing them by the total number of scoring chances. In other words, if you know how to count and can press a division key on a calculator, you can have a firm grasp of this “advanced” statistic.

What about Fenwick? Well, you take those shots and missed shots that somebody counted up, and then you add them together – just like plus/minus. Corsi is the same thing, except that it includes blocked shots as well.
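As a rough sketch of that arithmetic, assuming both stats are expressed as a for-minus-against differential (the way plus/minus is), the whole thing fits in a few lines of Python. The `ShotCounts` class and the numbers are hypothetical, chosen only to show how blocked shots can move Corsi away from Fenwick.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    """Shot attempts counted for one side while a player is on the ice."""
    on_goal: int   # shots that reached the goalie
    missed: int    # attempts that missed the net
    blocked: int   # attempts blocked before reaching the net


def fenwick(team: ShotCounts, opponent: ShotCounts) -> int:
    """Shots plus missed shots, for minus against."""
    return (team.on_goal + team.missed) - (opponent.on_goal + opponent.missed)


def corsi(team: ShotCounts, opponent: ShotCounts) -> int:
    """Same as Fenwick, but blocked shots are included as well."""
    team_total = team.on_goal + team.missed + team.blocked
    opponent_total = opponent.on_goal + opponent.missed + opponent.blocked
    return team_total - opponent_total


# Hypothetical tallies, not real game data.
ours = ShotCounts(on_goal=30, missed=12, blocked=9)
theirs = ShotCounts(on_goal=27, missed=10, blocked=14)

print("Fenwick:", fenwick(ours, theirs))  # 5
print("Corsi:", corsi(ours, theirs))      # 0
```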

Points per 60 minutes of even-strength ice-time (or EVPTS/60) is almost as simple – one takes all the points a player scored at even-strength, and divides them by ice-time at even-strength to create a scoring rate. It is, once again, counting and pressing the divide key on a calculator. Pretty much as simple as can be.
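For anyone who wants to see the divide key in action, here is a small Python sketch of that rate. It assumes ice-time is recorded in minutes; the function name `points_per_60` and the sample totals are illustrative only.

```python
def points_per_60(points: int, toi_minutes: float) -> float:
    """Scoring rate: points per 60 minutes of ice-time."""
    if toi_minutes <= 0:
        raise ValueError("ice-time must be positive")
    return points / toi_minutes * 60.0


# Hypothetical player: 20 even-strength points in 800 even-strength minutes.
ev_rate = points_per_60(20, 800.0)
print(f"EVPTS/60: {ev_rate:.2f}")  # 1.50
```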

But let’s go back to scoring chances. In an article yesterday, I did something audacious – I added up scoring chances for and against for Oilers’ defensemen. In the comments section, Robin Brownlee jokingly advised one commenter (i.e. not me) to do the following:

Your only option is to watch the games and draw your own conclusions.

Personally, I think that’s a great idea for everyone. It’s a little obvious, perhaps, but still a great idea.

It is, after all, what I do. I look for specific things – which players play the best opponents, what part of the ice players start their shifts in, how often players help their team create a scoring chance, and how often players make mistakes that lead to chances against. As a rule, I try to get a gut feel for the game based on those things (others too, of course – which players take bad penalties, who wins faceoffs, etc.). Rather than watch the game multiple times and count those things up, I rely on others to do it – the NHL keeps track of a lot of these things (as mentioned above, by watching the game and counting) and people like Dennis King and Gabriel Desjardins catch the rest.

I find that a firm number (i.e. Eric Belanger won 7 of 10 faceoffs) is better than my gut feeling (Eric Belanger wins a lot of faceoffs), so usually I’ll use the firm number instead of simply repeating my gut feeling. It’s the same thing with scoring chances – I know that Cam Barker’s getting heavily out-chanced by his opposition, but rather than say something like “man, that Cam Barker looks really bad” I’ll look up Dennis’ work and say “Cam Barker has been on the ice for 35 chances for and 49 against, which is one of the worst totals on the Oilers!” Afterward, rather than add “and he looks bad even though he’s got an easier job than other defensemen” I might use a number – like how many times he’s started shifts in the offensive zone, or how often he’s played the other team’s top line.
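Turning the Barker totals quoted above into a differential and a percentage takes one subtraction and one division. In this Python sketch the function name `chance_share` is made up, and treating scoring chance percentage as "chances for divided by all chances while he was on the ice" is my reading of the stat rather than an official definition.

```python
def chance_share(chances_for: int, chances_against: int) -> float:
    """Percentage of all on-ice scoring chances that went the player's way."""
    total = chances_for + chances_against
    if total == 0:
        return 0.0
    return 100.0 * chances_for / total


# Totals quoted in the article for Cam Barker.
barker_for, barker_against = 35, 49

barker_diff = barker_for - barker_against
barker_share = chance_share(barker_for, barker_against)

print(f"Chance differential: {barker_diff:+d}")  # -14
print(f"Chance share: {barker_share:.1f}%")      # 41.7%
```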

Of course, when I say “Barker has been on the ice for 35 chances for and 49 against” rather than “man, Cam Barker looks really bad,” someone comes along to tell me I should “watch the games.” I laugh, because it’s funny.

• @Oilanderp

Ok I think I see your point Mr. Willis. I have just a few minor questions about the process however.

Do you generally use a staff or a wand when conjuring this mystical data, or do you prefer an open hand?

What effect does the colour of one’s wizard robes have on the collection and analysis of these statistics? I myself tend toward a simple nondescript hooded black robe, but I have heard examples of those who even went so far as powder blue with large emblazoned gold stars.

Do you generally use Tolkien elvish runes to divulge the Hockey Gods’ secrets, or do you prefer the more mainstream ancient Sumerian scripts (before translating into modern math)?

When creating a protection circle before summoning any minor underworld quasi-deities (so as to acquire more advanced data), have you noticed any effect on said data as pertains to the choice of regular chalk and sea-salts versus sugared virgin blood and bat guano?

Any help in these matters is greatly appreciated.

• Michael

As the saying goes, ‘There are three types of lies: lies, damned lies and statistics.’

• Jonathan Willis

I suspect most people who quote that phrase don’t know its history. Wikipedia has a fairly strong write-up on the subject.

It was originally in reference to three types of witnesses at a trial: liars, damned liars, and experts, with the meaning being that an expert witness can use the persuasive power of statistics to bolster an otherwise weak case.

It’s certainly true that statistics – particularly when carefully selected by those with knowledge and given to those without – can be used in that way.

For instance – I might say “Cam Barker (or Theo Peckham, or Jeff Petry) leads the Oilers with a plus-3 rating; he’s obviously their best defenseman!” and if you knew nothing about hockey, or hadn’t seen the Oilers play, you might accept me at my word because of my use of a simple statistic. Of course, I’d be lying – by eye, or by comprehensive statistics, Barker’s clearly been one of the worst Oilers defensemen, if not the worst.

What I’m here discussing are comprehensive statistics – in other words, getting as much of the picture as we can by using stats that directly relate to things we see when we watch the game. In other words, the exact opposite of the quote you just used.

I’m glad you mentioned it, though – too many people use that quote as an excuse to ignore statistics altogether, when really the point being made is the need to examine them thoroughly to check for validity.

Unless, of course, you think the quote originally meant one should ignore all scientific experts, which would obviously be crazy.

In short: Skepticism is healthy, ignorance is not.

• stevezie

Exactly right, Clyde.

I won’t bother looking up the YouTube for that quote since apparently we have someone for that now.

• Romulus' Apotheosis

umm… Willis… I’m going to just go ahead and ask… Is there something objectionable about my question from the other day?

It seemed to me to be a perfectly reasonable novice question concerning the relation between two sets of statistical data that on their face appear at odds.

It was an attempt to suss out

1) the operational assumptions subtending the statistical data and their relations; and

2) the missing variable (from a novice perspective) that would account for what I acknowledged must be a superficial anomaly.

You gave me a coherent answer, that I repeated back to you, giving you the opportunity to correct my assumptions further. My recapitulation:

I get it… the reliability of the two stats operates on different scales of time and you expect the +/- to come down over time to meet the poor chances for/against.

Now, however, my comment is not only a driving factor in this article but you appear to offer either a bastardized quote or paraphrase of my initial quote:

For instance – I might say “Cam Barker (or Theo Peckham, or Jeff Petry) leads the Oilers with a plus-3 rating; he’s obviously their best defenseman!” and if you knew nothing about hockey, or hadn’t seen the Oilers play, you might accept me at my word because of my use of a simple statistic. Of course, I’d be lying – by eye, or by comprehensive statistics, Barker’s clearly been one of the worst Oilers defensemen, if not the worst.

so, what’s the deal?? What does it all mean? I never said anything like that, and I certainly hope you didn’t get that impression. If it’s just a coincidence that this entire thread is circling around my question so be it. But, if you are trying to imply something… I’d like to hear the substance of the implication and the argument behind it.

• Jonathan Willis

Terribly sorry if I gave you that impression! Your question was entirely legitimate – and your recap of my answer entirely as I intended it.

Plus/minus is just a good example of a statistic that gets misinterpreted, and often, by people who are new to looking at hockey through a statistical lens. I certainly meant nothing personal by using that same example in the new article.

• Romulus' Apotheosis

Thanks for the reply… good to know I was on the right track and didn’t put anyone off.

I ask a lot of questions on here, often in a silly manner. But it’s a genuine effort to solicit information. Like a lot of people who discover themselves on here I have never really thought much about hockey beyond “It’s awesome!” and having knowledgeable people inform you is a real asset.

• Jonathan Willis

Here’s a smart stat.

100% of the 20 players on the Nov 17 game day Ottawa Senators roster will hear the chant of Go Oilers Go!! from the fans at Rexall Place.

Unfortunately, I don’t know how to Corsi or QualComp this 🙁

Go Oilers Go!

• Derzie

As anyone who studied stats in HS or university knows, stats are anything you want them to be. You can use them to prove any theory, whether merited or not. But that ‘feature’ of statistics is only exploited by those with ill intentions or empty heads. The stuff on this site is not in that category.

• Jonathan Willis

@ Clyde Frog:

Gotcha – my bad for misunderstanding the question. I didn’t take it as an attack.

Gabe Desjardins (behindthenet.ca, arcticicehockey.com) and Vic Ferrari (vhockey.blogspot.com), among others, do that sort of thing fairly regularly. My own statistical background is shallow enough that I need a lot of hand-holding to do complex statistical modeling, so I typically don’t do much of it myself.

• Clyde Frog

@ Willis,

I may have been one of the guys giving you crap yesterday about nerding up the stats column…let’s be frank…everyone who posts on this site is at least a bit of a hockey nerd, and loves their hockey stats…I apologize for my douchie-ness…I guess I only like stats when they make my favorite players look good, I am far too biased! I guess my point yesterday is that there is no reason to jump on some guys because we had a 3 game losing streak, and 80% of Edmonton knows that 95% of stats can make the other 5% look like 50%…if you know what I mean.

• Jonathan Willis

@ Shredder:

No problem.

Just so you know, I don’t do much based on a team’s short-term record. Good teams have bad streaks and bad teams have good streaks all the time. So if there’s a three-game losing streak, I (as a rule) will intentionally ignore it, because it’s not long enough to tell us anything with certainty.
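To put a rough number on how ordinary a short losing streak is, here is a Python sketch that computes the chance of at least one run of consecutive losses over a season. It assumes every game is an independent coin flip with a fixed win probability and ignores overtime losses and schedule effects, which real teams obviously do not obey. The function name and the 60% win rate are illustrative.

```python
def prob_losing_streak(games: int, win_prob: float, streak: int) -> float:
    """Probability of at least one run of `streak` straight losses in `games` games.

    Assumes independent games with a constant win probability.
    """
    lose_prob = 1.0 - win_prob
    # state[k] = probability that no streak has happened yet and the
    # team is currently on a run of exactly k straight losses.
    state = [1.0] + [0.0] * (streak - 1)
    for _ in range(games):
        nxt = [0.0] * streak
        for run, prob in enumerate(state):
            nxt[0] += prob * win_prob            # a win resets the run
            if run + 1 < streak:
                nxt[run + 1] += prob * lose_prob  # a loss extends the run
            # otherwise the run reaches `streak` and drops out of the
            # "no streak yet" states
        state = nxt
    return 1.0 - sum(state)


# Hypothetical good team: wins 60% of its games over an 82-game season.
p = prob_losing_streak(games=82, win_prob=0.6, streak=3)
print(f"Chance of at least one three-game losing streak: {p:.1%}")
```

Under those assumptions the result comes out above 90%, which is the point: even a team that wins most nights should be expected to drop three in a row at some stage.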

When I’m writing, I write whatever strikes me as being of interest that day. Sometimes, it’s negative when the team’s playing poorly, but that’s usually just a result of chance rather than intent.

• Clyde Frog

So are all the advanced stats kiddies keeping their numbers close to their chest? The more I look, the less I can find on the underlying math they use to verify the numbers.

I can find all the calculations to recreate their statistics, but have little interest in trying to verify the numbers and model them (a very time-consuming task). Just hoping you know if they post them, since if you’re already doing the regression testing, Excel or whatever statistics tool you are using normally provides that information.

There seems to be little information on how to determine outliers, significant observations, confidence and more. I get that the aforementioned information is kind of overkill, but it is kind of necessary for understanding how all these theories work together and impact each other.

From my cursory knowledge and searches all I can seem to find is articles where the stats have been applied and conclusions drawn then argued about endlessly in the discussion threads.

• SmellOfVictory

The numbers repository is at behindthenet.ca, if that’s of interest to you. Don’t know what the exact method of collection is (shot totals could be easily enough grabbed from the NHL, but I’m not sure about blocked/missed shots).

In terms of confidence interval, the sample size is pretty huge when it comes to shots over the course of a couple of seasons. I don’t have a strong background in statistics, but I don’t think a confidence interval is particularly relevant when you’re looking at such large samples; especially when you have precise (albeit somewhat fallible in terms of possible human error in shot recording, etc) data on the entire population.

• Clyde Frog

Depends on your definition of sample size; that’s what I am asking. If you’re taking a week, a month or 3 months’ worth of data, comparing it to a career and drawing statistical inference from it, there is a big issue between those sample sizes.

So for those issues it really does matter, because what advanced stats wants to do is break things down and state how that performance is working versus the “norm”: will the future maintain this pattern or is it an outlier?

I hate the internet for taking out tone, so I hope that didn’t come off dickish.

Just really interested in what amount of ice time or other stats you need to be even 80% confident you have the unbiased point estimate of the mean captured in your sample. Also what kind of qualifiers the advanced stats guys are using to ensure context isn’t lost when we dive at players.
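One hedged way to approach the 80% question, for the simplest case only: treat a percentage stat (face-offs, say) as a series of independent trials and use the textbook normal approximation for a proportion. That is an assumption, not how any of the advanced stats sites necessarily do it, and it says nothing about context such as quality of opposition. The function names in this Python sketch are made up.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_score(confidence: float) -> float:
    """Two-sided z value for a confidence level such as 0.80."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def margin_of_error(p_hat: float, n: int, confidence: float) -> float:
    """Half-width of the normal-approximation interval around an observed rate."""
    return z_score(confidence) * sqrt(p_hat * (1.0 - p_hat) / n)


def required_sample_size(margin: float, confidence: float, p_guess: float = 0.5) -> int:
    """Trials needed so the interval half-width is at most `margin`.

    p_guess = 0.5 is the most conservative choice.
    """
    z = z_score(confidence)
    return ceil(z * z * p_guess * (1.0 - p_guess) / margin ** 2)


# A 7-for-10 night on face-offs, at 80% confidence.
moe = margin_of_error(0.7, 10, 0.80)
print(f"70% on 10 draws: plus or minus {moe:.1%}")

# Draws needed to pin a face-off rate down to within 5 points, 80% confidence.
needed = required_sample_size(0.05, 0.80)
print("Draws needed:", needed)  # 165
```

The contrast between the two numbers is the sample-size issue in miniature: ten draws leave a very wide interval, while a couple of hundred narrow it considerably.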

• Jonathan Willis

I think it bears noting that your post title has a double meaning: stats are also dumb in that they do require proper interpretation. If someone throws out “this guy has a 3.5 corsi/60 rating”, that’s all fine and good, but there is no context to it whatsoever. Context (which can be provided by other stats to a great degree) and understanding of the implications of the statistics are the things that make advanced stats a little more complicated than simply counting.

• Jonathan Willis

*to clarify, the requirement for context means they’re “dumb” because they can’t do all the work for you simply by reading them off behindthenet