Lies, Damned Lies, And Statistics

Jonathan Willis
November 17 2011 12:04PM

It’s a famous saying, one popularized by the great American writer Mark Twain, who (perhaps incorrectly) attributes it to former British Prime Minister Benjamin Disraeli: “There are three kinds of lies: lies, damned lies, and statistics.”

What does it mean?

Let’s look at the context, which is “Chapters from My Autobiography,” published in 1906. Via the marvellous Project Gutenberg:

I was very young in those days, exceedingly young, marvellously young, younger than I am now, younger than I shall ever be again, by hundreds of years. I worked every night from eleven or twelve until broad day in the morning, and as I did two hundred thousand words in the sixty days, the average was more than three thousand words a day--nothing for Sir Walter Scott, nothing for Louis Stevenson, nothing for plenty of other people, but quite handsome for me. In 1897, when we were living in Tedworth Square, London, and I was writing the book called "Following the Equator" my average was eighteen hundred words a day; here in Florence (1904), my average seems to be fourteen hundred words per sitting of four or five hours. I was deducing from the above that I have been slowing down steadily in these thirty-six years, but I perceive that my statistics have a defect: three thousand words in the spring of 1868 when I was working seven or eight or nine hours at a sitting has little or no advantage over the sitting of to-day, covering half the time and producing half the output. Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: "There are three kinds of lies: lies, damned lies, and statistics."

Let’s boil down that full quote to its elements:

  • Twain found in his early years that he was writing 3,000 words per day on average. That figure dropped over time to 1,800, and then again down to 1,400 words per day.
  • Therefore, using words per day, Twain was slowing down as he aged.
  • However, that statistic was flawed – Twain was actually spending fewer hours writing each day, and when he used words per hour, rather than words per day, he found that his writing had not slowed at all.
  • Reflecting on the misleading nature of words per day, Twain referenced a saying he liked: “There are three kinds of lies: lies, damned lies, and statistics.”
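
The arithmetic behind those bullets can be sketched roughly as follows. The word counts and sitting lengths are the ones Twain gives in the quoted passage (3,000 words over seven to nine hours in 1868; 1,400 words over four or five hours in 1904). The function name is made up for illustration, and treating each day as a single sitting is an assumption.

```python
def words_per_hour_range(words, min_hours, max_hours):
    """Return (low, high) words per hour for a sitting lasting min_hours..max_hours."""
    return words / max_hours, words / min_hours


# Figures from the quoted passage; one sitting per day is assumed.
early = words_per_hour_range(3000, 7, 9)  # spring 1868
late = words_per_hour_range(1400, 4, 5)   # Florence, 1904

# Words per day suggests a steep decline.
per_day_drop = 1 - 1400 / 3000

# Words per hour: the two ranges overlap, so the same figures
# cannot establish that he slowed down at all.
ranges_overlap = early[0] <= late[1] and late[0] <= early[1]

print(f"Per-day output fell by {per_day_drop:.0%}")
print(f"1868: {early[0]:.0f} to {early[1]:.0f} words per hour")
print(f"1904: {late[0]:.0f} to {late[1]:.0f} words per hour")
print(f"Hourly ranges overlap: {ranges_overlap}")
```

Measured per day, output falls by more than half; measured per hour, the plausible ranges overlap, which is the defect Twain spotted in his own statistics.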

It is an example we can compare directly to hockey. A player might go from scoring one point per game one year to 0.8 points per game the next year, leading some to conclude that his scoring rate has dropped. If, however, his ice-time has gone down by the same amount, we could point to that and reflect, as Twain did, on “lies, damned lies, and statistics.”
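
A minimal sketch of that comparison, with made-up ice times: the 20 and 16 minutes per game below are hypothetical numbers chosen so that ice time falls by the same 20 per cent as points per game, and normalizing to 60 minutes is one common convention rather than anything specified above.

```python
import math


def points_per_60(points_per_game, minutes_per_game):
    """Scale a per-game scoring rate to points per 60 minutes of ice time."""
    return points_per_game * 60 / minutes_per_game


# Hypothetical player: scoring and ice time both drop by 20 per cent.
before = points_per_60(1.0, 20)  # 1.0 points per game in 20 minutes a night
after = points_per_60(0.8, 16)   # 0.8 points per game in 16 minutes a night

same_rate = math.isclose(before, after)

print(f"Year one: {before:.2f} points per 60 minutes")
print(f"Year two: {after:.2f} points per 60 minutes")
print(f"Scoring rate unchanged: {same_rate}")
```

Per game, the player looks like he has fallen off; per minute played, nothing has changed.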

Put another way, it was never intended as an argument against using statistics, just an argument that statistics can mislead if not checked carefully.

An interesting side note: I mentioned above that Twain may have been incorrect in attributing the quote to Benjamin Disraeli. The website “The Phrase Finder” notes that there is no record of that quote in any of Disraeli’s published documents or letters. It credits Leonard Courtney with the first published use of the phrase.

Why is that interesting? Courtney (pictured above with the glorious facial hair), a politician and academic, used the phrase in 1895. In 1897, he became President of the Royal Statistical Society, a professional body for statisticians.

In other words, there’s good reason to laugh if someone uses that quote as a refutation of all statistics: it was instead intended to show the need to identify the right statistic, and it was primarily used by men who did so.

Jonathan Willis is a freelance writer. He currently works for Oilers Nation, the Edmonton Journal and Bleacher Report. He's co-written three books and worked for myriad websites, including Grantland, ESPN, The Score, and Hockey Prospectus. He was previously the founder and managing editor of Copper & Blue.
#1 SrCain
November 17 2011, 12:06PM

FIRST !! impressed?

#2 jdubbs
November 17 2011, 12:11PM

" And tell 'em Lanny sent ya "

#3 me
November 17 2011, 12:15PM

I have used that quote but never to refute ALL statistics and in reality I can't imagine anyone educated doing so.

Anyone studying anything, whether it be hockey stats, statements to the press, etc should always be careful that the whole picture is being represented.

#6 The Brahma Bull - Team Bring It !
November 17 2011, 12:25PM

I wonder what Brownlee thinks about this.

#8 kittensandcookies
November 17 2011, 12:43PM

The problem with statistical analysis in hockey is that it's basically all post-hoc.

Yeah, I know the Flames suck. I know the players suck. I don't need "Corsi" to know this. I can look at their game record.

The players have bad "stats" because the team is bad. The team has bad stats because the players are bad. QED.

#10 TS
November 17 2011, 01:16PM

The real problem is that so few people are really able to distinguish valid statistical work from invalid. For example, sport staties try to predict future behaviour of dynamic systems with techniques that are only valid for static data. The data on, say, player performance, is almost certainly heterogenous, meaning it can't tell you what is going to happen next with any degree of reliability. Any degree. We're not just talking matters of confidence intervals here, but completely meaningless. Any apparent patterns are just that: apparent. That's why it can all seem so reliable for a while, then go to hell in a handbasket. Basically, you can't get there from here. Or not with that map, anyway. Anyway, good luck. (I know you'll just keep on doing it anyway. - http://www.nytimes.com/2011/10/23/magazine/dont-blink-the-hazards-of-confidence.html?_r=4&ref=general&src=me&pagewanted=all)

#11 duckyfan
November 17 2011, 01:23PM

Reminds me of the comment of Marc Crawford on TSN, last night, in regards to Boom Boom Geoffrion..."There are three things I want you to do...skate and shoot..."

#12 kittensandcookies
November 17 2011, 01:27PM

@Jonathan Willis

That's not a statistical prediction.

First off, you don't even know the p-value of the numbers you are using, so how can you even begin to start to calculate the odds of his play dropping?

Secondly, you've just done another post-hoc analysis. You're saying "Aha! See, he's having a bad season! I can ascertain this by his past play!"

#13 Archaeologuy
November 17 2011, 01:36PM

Here's one (or two) of my biggest problems with SOME of the stats crowd.

1) Elitism. Some members of the group seem to radiate a sense of elitism about stats that is off-putting. Stats are nice, but like JW says, it's mostly basic division. It's not rocket surgery or brain science. There always seems to be a new stat du jour that if you aren't quoting then you must be some kind of infantile. Not a fan of that.

2) There is an extreme overvaluation of fringe stats that mean very little to even pro-active members of the Hockey world. Goals win games. Zone starts, PDO, Corsi, and Fenwick don't win games. The game is decided by whoever has the most goals at the end of regulation and OT if needed. Most "Advanced" stats are secondary, maybe even tertiary stats that offer a numerical value for context. They are contextual in nature. Who was on the ice against whom, where did they start their shift, etc. These stats supplement "Standard" stats. They better inform them when possible, but they are not at all more important than the standards.

3) The Stats are still subjective. Dividing goals by ice time is objective. Deciding what that number actually means towards the analysis of a hockey player is subjective. Hits? Subjective. Take-Aways? Subjective. Scoring chance? Subjective. Heck. Not long ago Bruce McCurdy wrote an intriguing article with the conclusion that being outshot had very little bearing on the outcome of a game. What does that do to a stat like Corsi that tracks shots for and against? Whatever it does, it will be subjective. The stat itself is pretty straightforward. How it's applied, understood, and collected can be incredibly subjective.

4) Too many Stats-happy opinioneers are lacking in their ability to use the written word. Thankfully Jonathan Willis writes for the Nations, because he is one of the few (it seems) that is both a Writer and a proponent of Advanced Stats. It is not MY problem as a reader that advanced stats are often presented in the most reader-unfriendly manner. It is the problem of the author that a perfectly good idea or argument is not reaching potential readers due to a lack of style or ability. Ryan Batty, I believe, in a recent piece suggested that story telling in sports is less required today than ever. I am here to call bullsh*t.

The hockey world needs eloquent writers who can appeal to larger audiences more than ever, and it needs those writers to have a solid grasp on the Advanced Stats. A stat is just a number until it is infused with meaning, and story tellers are desperately required to give these things meaning.

#15 Shredder
November 17 2011, 02:14PM

@Archaeologuy

Well written, this is exactly right. Tell stories.

#16 godot10
November 17 2011, 02:37PM

A statistic without an estimate of the error bar basically is a useless number.

A lot of stats gurus throw around statistics without providing any estimates of the error, making the assumption that the error bar is small enough to make their statistic valid.

But that is a dangerous assumption.

#17 godot10
November 17 2011, 02:41PM

Determining the error bar on a statistic is usually the hard work that never gets done.

But that is the real work, the real proof of the statistic.

#18 hockey project
November 17 2011, 02:44PM

First post after a few years of lurking...

I don't think there's any denying that certain elements of the pro-stats crowd CAN be smarmy. "Come on you moron, don't you know that his Diefenbaker Index is over 6.5?!" We've all met them, but unreasonable people from all camps are a pain to deal with. To the same extent, certain members of the "Lies, damned lies" group are just as bad.

Anyway... Traditional hockey boxcars long ago took on qualities of language. By saying that a guy is a 50 goal scorer or 100 point man, or had 300 penalty minutes, we get an immediate idea of what that player is: the sniper, the scorer and the goon. That language is firmly established, fills in key details immediately, has had meaning to most of us since we were little boys, and we haven't been called upon to expand our vocabulary very much over those years. Given that, I imagine that Zone Starts and Corsi and PDO can sound a bit like a foreign language to some people.

This isn't to say that people who disagree with the use of advanced stats simply don't understand them; I don't think that at all. For my part, I'm uncomfortable with pretty much any extreme viewpoints in life, and it's the same for me in this area, too. Being 100% in the All Stats group or all Saw-Him-Good club doesn't help anybody.

Archaeologuy: I couldn't agree with you more about finding stats-based writers with the ability to actually write. I've liked stats for many years in baseball, worked on my own in hockey for many years as well, but to just read a blizzard of numbers with android-like text in between? /shudder

#19 RexLibris
November 17 2011, 02:56PM

For a team like the Oilers had in 2009-2010 stats were very useful in determining whom they ought to flush and who would be best kept around. The statistics in that situation coincided with what most fans to that point had realized: that players like Patrick O'Sullivan, J. F. Jacques, Ethan Moreau, and Robert Nilsson were better to take out of the equation now than to continue with.

The Flames will likely have to do some similar roster algorithms to determine the best assets to keep and the best ones to polish up and put in the storefront.

#20 @Oilanderp
November 17 2011, 03:07PM

I have no comment at this time.

#21 Beavis
November 17 2011, 03:12PM

The use of Twain here made my day, especially because the article doesn't just throw the Twain quotation out there, but actually takes time to explain it in its proper context. In the English teaching world we call this close reading, which really just means looking at every word carefully and figuring out exactly what is being written. I think the same technique should be applied to the use of Advanced stats. It's one thing to have them, but quite another to look at them closely and figure out exactly what they mean beyond the raw numbers. Willis and Lowetide are especially good at this.

#22 Romulus' Apotheosis
November 17 2011, 03:18PM

Somewhat on topic:

For anyone interested in the most intense and thorough investigation into the social capital of "expertise" and the various games of dissimulation (both conscious and unconscious) at play in the marshalling of data, facts and value, I can recommend nothing more than Orson Welles' 1974 filmic essay F For Fake:

Bogdanovich talking about it:

http://www.youtube.com/watch?v=Rur4wPupBCg

The movie in full:

http://www.youtube.com/watch?v=z2EZ9rFBRlI

Highly recommended!

#23 the-wolf
November 17 2011, 03:46PM

Good article and good post by Archaeologuy.

To get to the meat and potatoes of the argument: stats are too often misused and abused by people trying to support their argument.

Stats very much have their place, but must be employed with very clear objectives beforehand as to what you want them to accomplish, used in a concise, consistent manner and carried out without any bias or hopes of reaching a foregone conclusion.

Otherwise, stats are worse than useless, they are misleading and give false information.

#24 Clyde Frog
November 17 2011, 03:53PM

@Archaeologuy

I'll agree with you that how they are presented is incredibly subjective, and most amateur statistics fans seem to make sweeping statements based on some ridiculously simple calculations, then draw deep conclusions because of some perceived correlation.

IE The genius who was screaming that the Nuge couldn't play hockey because his assists to goals ratio was too high. (Although in the end he quantified that this deep analysis only works for top 5 picks and you have to ignore anyone else.)

At the level we are fed there is no effort to provide the math or explanation behind it, identifying outliers, modeling it and any sort of quantitative qualifying statements to give some sort of perspective.

The biggest problem is that dividing things by other things and claiming the outcome is significant is not Statistics, nor is it Advanced statistics. It's, well, looking for a way to quantify a feeling they have about a particular player.

Number one rule of stats for anyone who isn't sure what all this means. Correlation IS NOT equal to Causation, just because someone notices a trend that higher goal scoring teams tend to win does not at that point mean anything until a lot more work is done to quantify and verify.

Luckily for you there are a lot of people who devote several years to post secondary educations to understand this stuff and I am pretty sure the Oilers pay those people bags of money to help them.

So stat haters rest well! You don't have to do the math, nor do you have to pay attention! Someone else is getting paid to have those sleepless nights for you :)

#25 SmellOfVictory
November 17 2011, 03:56PM
kittensandcookies wrote:

The problem with statistical analysis in hockey is that it's basically all post-hoc.

Yeah, I know the Flames suck. I know the players suck. I don't need "Corsi" to know this. I can look at their game record.

The players have bad "stats" because the team is bad. The team has bad stats because the players are bad. QED.

Corsi tells you which players are sucking and which might not be. Like all other stats, they lend context to your understanding of a player. PDO, for example, is hugely useful (especially because you can't "watch the game" in order to get a good feel for it); look at Phil Kessel, starting the year off as the leading scorer in the NHL. Does this mean he's literally the best offensive player in the NHL? Nope. His shooting percentage and his line's on-ice shooting percentage were THREE TIMES higher than the mean (Kessel's career average and the NHL mean, respectively). That's flat-out ludicrous.

Advanced stats can help understand that a guy is snake-bitten, product of his circumstances, etc. in ways that simply watching the game can't do alone. Nobody can watch enough hockey or pay close enough attention to get the kind of meaty understanding that one gets from a combination of watching the game and looking at the underlying numbers.

#26 Don Jackson
November 17 2011, 05:05PM

I seem to remember your advanced statistics pointing to Linus Omark as the best defensive player on the Oilers roster. Regardless of what statistics were used to prove something like that, Ray Charles can see that it just doesn't make it so.

#27 MattL
November 17 2011, 06:07PM

Stats guy kept saying, "although it looks like Dustin Penner doesn't try very hard, have a look at his underlying stats. He's a valuable player."

14GP 0G 2A -2

Q.E.D.

#28 Bleak Winter
November 17 2011, 09:29PM

@Archaeologuy

That post alone should have a button for props in increments of 5 and 10.

100% agree.

#29 stevezie
November 17 2011, 10:06PM

I do not consider myself a stats guy, but I do think that the virulently anti-stats crowd is confused. The only thing that matters about a hockey game is a stat: who won, who lost. If team A looks better by eye but loses the game, no one cares. If Linus Omark looks like crap on defence but his line doesn't get scored on, then his defence works. Unless, of course, advanced stats show that he's getting lucky with weak opponents and great goaltending.

#30 MattL
November 18 2011, 08:44AM

@stevezie

Yes and no. The only thing that matters about a hockey game is entertainment, no? Otherwise it would be a bunch of people sitting around rolling dice.

#34 KenMcC
November 19 2011, 03:34PM

@SmellOfVictory

A great example of a stats guy throwing out unsubstantiated claims in a language all his own.

#35 FastOil
November 20 2011, 12:37PM

Sorry for the novel-

I have to agree with Archaeologuy in regard to 'elitism' or whatever attitude comes up at times, especially in some writers' responses to comments, which I have often thought rude and angry. Reminds me of university and all the self-righteous, narrow-focused angry crap I heard from people studying the sciences.

Many people take the attitude that the 'net is a free for all, including some writers. In reality, rough insulting language and personal attacks are a demonstration of having no class, and perhaps being a bully. The course of human history has shown that neither of these traits have value, and nothing has changed since the advent of the net.

I have often wondered how some responses to posts I've made would have been put around a table in a pub. My guess is with a lot less vitriol and attitude. Internet tough guys are cowards of the highest order. I can't recall ever seeing that in your work JW, which is a sign of your class, and why I enjoy reading your work more than the other stats based Oiler writers now.

I also agree that some of the stats writers could use language better. However I don't see the issue as one of quality or style, as much as the misuse of words, especially to prove a point in a rebuttal to a commenter. Given the seemingly high level of education for most of the writers, it calls honesty into question, and is part of the backlash against stats writers to me. It's not that the numbers are necessarily wrong, more that people naturally dislike being manipulated and have a repulsion to BS.

An example of this is a time I stated I prefer bigger players and took all kinds of abuse from writers and commenters. I was thinking closer to the NHL average, as opposed to players far under it, which the Oilers have had so many of. Most who got angry at me, it turns out, assumed I meant guys as big as Lucic. It is important to clearly define words and parameters if you want a clear analysis and reasonable debate.

To me there is a difference in a writer misusing language and stats to promote a point of view, and a commenter doing so. Everyone should show respect, and commenters that don't should simply be moderated, and banned if disruptive and noncontributing. The writers are in control of the forum, so should be held to a higher standard.

I am interested in understanding hockey, and that has led me to an interest in stats. I don't have the time to become expert in them. I have at times made mistakes commenting, or gone against the current flavour du jour of the stats gurus, and had really mean-spirited, angry retorts from guys getting paid for me to read their work. I think that is completely unacceptable, immature, and in the end the writer's anger biased them so much they were often wrong.

To get my stats chops up I read the abandoned blog Irreverent Oiler Fans, where much of this whole thing was born. Vic Ferrari is the standard all who write about hockey stats should hold themselves to. For those of you who don't trust or care for hockey stats, the key to their validity lies mainly with one thing - context. And context is what takes the time, and the smarts.

Many of the stats writers we read now commented on Vic's posts, and cut their stats teeth there. What continually came up was the context in which numbers were looked at. When Corsi or Fenwick should be used. How those numbers are better for looking at long periods of time, and aren't deeply meaningful for a single game, unless looked at in the context of who a player played, when and how much. I think it is fair to say a lot of what we read is superficial analysis and not given the rigour the best and very likely brightest Oiler stats blogger Vic Ferrari says it needs.

The type of player analysis JW often uses is valid and is actually just basic logic. Players will not play above their normal demonstrated level for more than certain periods of time. The only time this doesn't hold up is when there was a mistake made in assessing what that normal level of play is, and that is usually an issue of context not being accounted for. It is very hard to argue this, especially because there are so many examples to support it.

Same for LT's NHL equivalency pieces. Not set in stone, but a pretty good look at the future of a player, with a lot of research by Desjardins behind it and hundreds of examples to verify it.
