Let’s Talk About Regression


The Calgary Flames play their 20th game of the 2014-15 season tonight – preview coming later on today – and the hockey newswires are abuzz with discussion of the Flames’ red-hot start…and the inevitability of their collapse down the stretch due to one thing.

Statistical regression.

At the risk of sounding like an apologist for the team, let’s talk about regression.

First off, let’s get this out of the way: the Flames crashing down to Earth this season is not in-and-of-itself an inevitability, though I personally think there’s no damn way they can keep up this pace.

The thought process behind “Flames are gonna regress!” is partially based upon Calgary being among the NHL’s worst teams in terms of Corsi Close at 46.0% – 26th in the NHL and well below Vancouver’s 50.9% league median score. Bad possession teams don’t win a lot of games, so the fact that the Flames are winning a lot of games despite being a bad possession team suggests something funky is going on. And the data seems to bear that out: the Flames have a PDO of 102.8 right now based on a 92.62 save percentage and a 10.12 shooting percentage. In short: a lot of the Flames’ success right now is based on proverbial smoke and mirrors.
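For anyone who wants to check the math: PDO is just save percentage plus shooting percentage, so it only takes a couple of lines. This is a quick sketch in Python; the function name `pdo` is made up for illustration, and the inputs are the rounded percentages quoted above, so the sum can land slightly off the quoted 102.8.

```python
def pdo(save_pct: float, shooting_pct: float) -> float:
    """PDO is the sum of a team's save percentage and shooting percentage."""
    return save_pct + shooting_pct


# Rounded Flames figures quoted in the article.
flames_pdo = pdo(save_pct=92.62, shooting_pct=10.12)
print(round(flames_pdo, 2))  # 102.74
```

Since every shot is either a goal or a save, the league-wide total has to come out to 100, which is why anything well above that number raises eyebrows.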

*Need help with these stats terms? Click here to learn about Corsi and PDO

Let’s break that down a bit.

The save percentage is more or less league average; the median team in the NHL right now (Boston) has a 92.07 save percentage (and last year’s median was 92.26%), so the Flames aren’t that far above average. And based on last year’s horrid 90.89 number, they were probably bound for a course correction.

As for the shooting percentage? Let’s not mince words, 10.12% isn’t sustainable over an 82-game schedule. Last season’s best team (Anaheim) shot 9.83%. Two seasons ago, Toronto was able to maintain a 10.57% over a 48-game schedule and we all know how that turned out. Looking back at several years of team performances, it’s exceptionally rare that a team can maintain a shooting percentage north of 9% over a full season. Most likely the Flames crash down towards the mid-8% range, if not a tad lower even.

So why is it so high now?

Two main factors: luck and shot location. The first is easy: pucks are bouncing Calgary’s way when they could’ve bounced otherwise. The second?

Via our pals at Sporting Charts, here’s a comparison of shot locations from this season and last season.

[Image: Flames shot-location charts, this season vs. last season, via Sporting Charts]

Calgary is a bit more effective at driving chances towards the front of the net, something that’s generally been borne out by our own scoring chance data. Last year was a lot of perimeter play; this year involves more chip-in plays that utilize team speed to get past defenders. As a result, they’re getting more shots in high-percentage areas and more pucks are going in during these chances.

Can it work over an 82-game season? We’ll see. The California teams of the Pacific Division offer a nice test, and the change in strategy in the summer may have been a result of the Flames getting knocked around a bit by big, mean Kings, Sharks and Ducks defenders. If you can’t beat them with size, beat them with speed.

So far, the Flames have been.

So when IS regression coming, if it’s inevitable? Well, that’s the thing – nobody knows.

Here’s a quick table to illustrate the point.

Season(s)   #1 PDO        #30 PDO
2013-14     102.5 (BOS)   98.0 (FLA)
2012-14     101.9 (ANA)   97.3 (FLA)
2011-14     101.6 (BOS)   98.1 (FLA)
2010-14     101.9 (BOS)   98.4 (FLA)
2009-14     101.3 (BOS)   98.6 (NYI)
2008-14     101.6 (BOS)   98.6 (NYI)
2007-14     101.4 (BOS)   98.6 (NYI)

Regression doesn’t operate on a set schedule. Florida didn’t get appreciably “luckier” until FOUR YEARS of data was accumulated – or roughly 294 games, while it took two seasons for Boston to transition from “Calgary Lucky” to just fairly above average. Look at how Boston’s numbers stuck around as the sample grew from three years to seven.

For the curious, the distribution of PDO got more concentrated around the theoretical “100.0” mean value as the sample got bigger, but as you can see from this example, it’s a distribution with some long tails, even though the theory behind the data suggests everyone should crash down to 100.0 after infinite games.
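To get a feel for how much of the spread is plain noise, here is a rough simulation sketch. Everything in it is an assumption for illustration, not real data: a 30-team league where every team has identical true talent (8% shooting, 92% save percentage), 30 shots each way per game, and made-up function names `simulate_pdo` and `pdo_spread`. The game counts are 20 (about where the Flames are now), one full season, the roughly 294 games mentioned above, and 540 for seven seasons assuming one of them was the 48-game year.

```python
import random


def simulate_pdo(games, shots_per_game=30, true_sh=0.08, true_sv=0.92, rng=random):
    """Simulate one team's PDO over a number of games, given fixed true talent."""
    shots = games * shots_per_game
    # Each shot for is a goal with probability true_sh.
    goals_for = sum(rng.random() < true_sh for _ in range(shots))
    # Each shot against is saved with probability true_sv.
    saves = sum(rng.random() < true_sv for _ in range(shots))
    return 100.0 * (goals_for / shots + saves / shots)


def pdo_spread(games, teams=30, seed=0):
    """Gap between the best and worst PDO in a league of identical teams."""
    rng = random.Random(seed)
    values = [simulate_pdo(games, rng=rng) for _ in range(teams)]
    return max(values) - min(values)


for games in (20, 82, 294, 540):
    print(games, "games:", round(pdo_spread(games), 2), "PDO points between #1 and #30")
```

In this toy league nobody is actually good or bad, so any gap between first and last is pure luck, and it should shrink as the games pile up. It will not reproduce the stubborn tails in the real table, since real teams differ in goaltending and shooting talent.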

Oh, and what quality does Boston have over this sample size that could make their PDO numbers a bit more resilient? They’re a great possession team, with a seven-year Corsi Close of 52.7% (fourth in the NHL in that span). With the Islanders and Panthers in the bottom-third of the NHL in that statistic, that could explain their lousy numbers.

Can Calgary have a sky-high PDO all season? The numbers say that it’s not impossible but it’s pretty unlikely, particularly given that the difference-maker for them has been insane shooting team-wide – high save percentages have tended to be a bit more resilient historically.

It’s not that the statistics community is trying to rain on anybody’s parade; it’s just that when you see some rain coming, it’s merely common courtesy to suggest everyone grab an umbrella. We’re not sure when the rain is gonna get here, but sooner or later, the rain – and Calgary’s PDO – is gonna fall.

  • Lordmork

    Reading comments over the last few weeks leads me to believe that a lot of people are going to be upset when the Flames regress because they think the team is better than it really is. Just how far they regress is a matter of debate; the Flames have some very talented players and some very promising prospects, but I fear there’s a long way to go.

    In my opinion this is a developmental season and the Flames are doing well in that goal and playing hockey that’s been great to watch as well. I’m enjoying seeing them defy gravity, but I fully expect them to come back down to earth sooner or later.

  • Rock

    There is one number that really matters, and that is the big W. Once it is registered there is no regression, and so far the Flames have been seeing a lot of W’s, and that is what really matters. I believe the numbers didn’t want the Flames on their Stanley Cup run against Tampa either, but the Flames were. How about the Stanley Cup runs of Edmonton and Carolina? Nope, the numbers weren’t good for those runs either. I don’t believe L.A. ranked very high in the numbers on their first Cup run either, being that they just snuck into the playoffs. What really counts is wins, and so far so good. GFG

  • jdthor

    Considering the injuries we have had, the improvement in goal, and the emergence of Gio/ Brodie as fantastic, I don’t think it is a foregone conclusion that we crash back to earth.

    Either way I am enjoying the season.

  • RKD

    There are several reasons why I don’t think the Flames will regress as far as suggested. Firstly, shot location. You even said it yourself. Calgary has spent a lot of time working on where to shoot and how to get there (see interviews with players). And it shows. The Flames aren’t going to just forget where they should shoot from.

    There are other reasons why regression won’t be as bad, but they’re much less quantifiable. They play as a team. It’s a team game and must be played as one to win. There are many groups out there that have superior talent, but fail to come together and play as one cohesive unit. It doesn’t matter if you have a superstar, it takes a team effort, not one good player.

    The next reason, which I’ve heard many people disregard, is heart, hard work and chemistry. I’m a full believer in heart and hard work. It’s hard to quantify, but the impact is profound. The Flames have guys like this in spades. They want it more than everyone else and it shows. They don’t want to be a bottom feeder, putting in the bare minimum. They can’t, it’s not in their blood. They will give everything they have when they step out on that ice.

    Finally, the Flames are a young team that’s constantly improving. They are learning the game, finding out how to win and improving their own abilities. If anything, they will get better!

    It’s for those reasons that I don’t think the Flames will regress as badly as many people are suggesting. Analytics are a great tool, but they can’t measure some of the most important factors in the game.

    Go Flames!!

    • Avalain

      “Firstly, Shot location. You even said it your self. Calgary has spent a lot of time working on where to shoot and how to get there (see interviews with players). And it shows. The Flames aren’t going to just forget where they should shoot from.”

      It has been demonstrated, over and over again, that it is highly unlikely that a team can take shots from a “better location” on a repeatable basis at a rate demonstrably higher than their underlying shot differential. Go read everything Gabe Desjardins wrote 3-6 years ago on Minnesota and Colorado.

    • jdthor

      I agree with your last point here. This team is still seeing improvements from guys like Jooris and Sven plus eventually Backlund, Raymond and Stajan will be back which will only help them. I’d have to think there’s a real chance that the possession numbers improve over the season and in that case the PDO doesn’t have to come crashing down like everyone seems to think. In any case I’d rather my young guys get a taste of what it takes to win rather than have them flounder all season long.

  • Kevin R

    I maintain that stats are a phenomenal analytic tool for what’s happened in a particular sample size. So to say that the team is due for regression is a no-brainer in a league where parity is about as close as you can get. For example, how do stats explain Buffalo trouncing Anaheim? So can the stats community step out & predict the amount of regression a team like the Flames is about to endure? I don’t think so, because there are too many human/other circumstances playing a factor.

    The shot locations can have a two-sided interpretation. I read into your view that the shot location is the reason for the high shooting percentage & that will change as we play bigger teams. Why can’t it be that the Flames players are just that much better hockey players & have been coached to shoot from these optimal locations? In that case it is not clear how much regression will necessarily occur.

    The moral of this is that there is too much tendency to use stats to predict how a team is going to do. To me, it is a tool that analyses performance & weaknesses of a player’s game, & stats should be used by organizations to evaluate & scout prospects, to assist in development of an organization’s young players & to assist in enhancing performance of a team’s players.

    The human factor on the ice cannot be ignored. If Gio or Brodie were to get injured & the Flames suddenly stopped scoring, is that regression? You can look at stats & say wow, that really does explain what happened, but it really doesn’t say what will happen. Or you can take the knowledge as an organization, use it to create profiles for the scouts, to create development programs for its young players & to help coaches assist players that are in slumps or struggling in some facet of their game. Stats do not predict luck; if they did we all would be winners in the casino every time we went. JMO

  • That shot heat chart is awesome. It would be better if they had one for shots against them as well. It would be a great way to offer insight into shot quality factoring into the raw numbers as well.

    • Avalain

      The whole point of this is that if people get too hyped up about the team and then the team falls back to earth, a lot of people can get upset. What we don’t want is for the team to go on a cold streak and suddenly people are calling for the coach to be fired or some key player traded.

      Personally, I’m cautiously optimistic. I’m seeing the team improve as the season goes on, and if that continues we can have a regression and still keep winning.

    • The Last Big Bear

      Well, I don’t consider “This won’t last” to be bad news.

      What I wanted to see this season was a continuation of the hard working on-ice culture, some progression out of Backlund, Monahan, and Sven, one or two prospects from the farm to look like legit NHLers, and for Brodie to not regress.

      The jury is still out on Backlund and Sven, so as far as I’m concerned, the Flames are 5-0-2 in terms of what I wanted them to accomplish this season.

      Talking about whether they’re going to keep winning games this season is of purely academic interest to me. This season is a success already.

  • The Last Big Bear

    PDO doesn’t regress.

    I have no idea where this notion comes from.

    Teams like Pittsburgh and Boston maintain league-leading and decidedly outlying PDO’s for years and years on end.

    Teams like Florida maintain poor PDOs for years.

    That’s not just random noise, teams have PDO’s that are reflective of their level of goaltending and skill of their shooters. There is some noise within that, and there are perturbations based purely on chance, but there is nothing inherently unsustainable about outlying PDO numbers.

    Calgary is due to regress because their roster sucks. PDO’s got nothing to do with it.

    Good teams with low PDO will regress to being good. Good teams with high PDO will continue to be good. So what is PDO adding to the discussion?

    In both cases we’re just saying that good teams will be good. PDO has contributed nothing.

    Bad teams with high PDO will regress to being bad, and bad teams with low PDO will continue to be bad. Again, you can throw PDO out the window, and get the same result. Bad teams will be bad.

    PDO is descriptive, but it is in no way predictive of anything. Calgary is going to ‘regress’ because they are not particularly good. Not because of their PDO.

    (I know you went into some more detail Ryan, particularly the shooting %, which I think is the real underlying anomaly in the Flames’ stats, I’m more just ranting against a lot of the discussion I’ve seen on the topic lately).

    • Operationally, you’re saying the same thing. PDO regresses in the league because the talent is too spread out for anyone to maintain percentages a couple of standard deviations outside the mean. Sure, if the Flames had Crosby, Gio, Rask, Brodie, Neal, Malkin, Perry, Doughty and Hall they could dominate with percentages. But that’s why PDO has been shown to regress to the mean 87% over the long-term for teams in the NHL – no one puts together Oiler style dynasties anymore. It’s impossible.

      If the Flames are bound to regress to their true talent, how do you say “this isn’t predictive”? That’s the definition of predictive! It’s fine to use hindsight after a crash to say “well, that team was bad, that’s why they crashed”, but that’s not how prediction works.

      “Good teams with low PDO will regress to being good. Good teams with high PDO will continue to be good. So what is PDO adding to the discussion?” – Because when good teams suffer through lousy PDO’s people start to question if they’re good and vice versa. You’re projecting a non-reality where we all know for certain which teams are “good” and “bad” and that all results are obvious. Of course, in this reality we don’t need any predictive or descriptive stats – we already know the answer.
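As a back-of-envelope illustration of the 87% figure mentioned in the reply above: one way to read "regresses 87% to the mean" is that a team keeps only 13% of its current gap from 100. That reading is an assumption, and the function name `regressed_pdo` is hypothetical. Under it, the arithmetic looks like this:

```python
def regressed_pdo(observed, regression=0.87, mean=100.0):
    """Shrink an observed PDO toward the league mean by the given fraction."""
    return mean + (1.0 - regression) * (observed - mean)


# The Flames' current PDO as quoted in the article.
print(round(regressed_pdo(102.8), 2))  # 100.36
```

This says nothing about when the drop happens, only where the long-run number would be expected to settle under that assumption.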