
The Casual Critic’s Complete Guide to Hockey Analytics and Advanced Stats

When is a 20-goal scorer not a 20-goal scorer? When is a playoff team not a good team? Who has value that teams take for granted? Can we know future success? These were the questions analytics helped answer when they warned the Toronto Maple Leafs not to sign David Clarkson, because performance and the capacity for performance are not one and the same. They predicted the Avalanche under Patrick Roy would inevitably collapse (as well as Toronto under David Nonis and Randy Carlyle), because strong results without strong possession are not sustainable. They were at the heart of assessing the quiet value of then-journeyman defender Anton Stralman, and were the common denominator in figuring which teams could score more, and thus win more.

But what makes analytics more special than video analysis? Simple: our eyes aren’t enough. Understanding something and gaining knowledge of something requires more than one sensation. That’s why it’s so worrying to hear NHL coaches describe their eyes as video, or how they use numbers to give them more accurate info of what they already see, because analytics are the complete opposite of this. They’re about recognizing our limitations — hundreds of events occur in a given hockey game, so analytics help chunk that info into relevant patterns rather than try to catalog every single piece. (There is a sport for Memory Athletes, and no, not even champion level players work this way). They’re about challenging our biases — am I more likely to notice a defenseman making a turnover because I already believe they’re prone to turnovers, or because they’re actually turning the puck over at a disproportionate rate? They’re about developing a better language for assessment, and getting away from the standardized testing of Coach’s Trust and teleprompter analysis.

For you, the casual critic (or the layfan), they’re also out of our jurisdiction. All the best info costs money to access, which inherently excludes a segment of people, creating a sort of knowledge inequality. That’s why this primer will have an added twist: let’s define the criticism too. Analytics are a big part of my writing, but my background is philosophy. I know what it’s like to smell the confusion around a fact (the state of something) versus its truth (the essence of something). Philosophy has a name for this kind of contradiction: Gettier Problems, which refer to cases when justified true belief becomes false (e.g. Pluto is a planet). Thus, each section will have subsections on when a stat or model is more or less useful.

I’ve divided this primer into four main sections.

  • Impact Metrics (the what — what happens when Player X or Team Y are on ice?)
  • Situation Metrics (the who/when — who is chosen for Situation X and Circumstance Y and when?)
  • Performance Metrics (the how — how does Player X or Team Y influence on-ice results?)
  • Value Metrics (the bottom line — what is the value of Player X or Team Y?)

A Digression About Rates and Score/Venue Adjustments

If you’re new to stats, you might be wondering why analytics are mentioned in combination with even strength and per 60. This is because rates clarify traditional counting stats. If I earn $21 per three hours, and my brother earns $10 per hour, who makes more? My brother! Not only is he more talented, but I only make seven dollars per hour. That’s how per 60 (or per 60 minutes of even strength play) helps. No game is played entirely at even strength, but most of it is; over 75 percent of the game is played at even strength on average, and roughly 75 percent of the shots taken over the course of a season are taken there too. By analyzing even strength events in depth, we can ignore the noise of traditional counting stats. In fact, let’s demonstrate with an exercise in why per 60 production makes more sense.

  • A true top-line forward will give you, on average, 2.5 points per hour or more of even-strength play.
  • A top three forward delivers between 2.0 and 2.5 points per hour.
  • A top six forward will score between 1.5 and 2.0 points per hour.
  • A bottom six forward will score between 1.0 and 1.5 points per hour.
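For anyone who wants to try the per 60 math at home, here is a minimal sketch. The function names and the sample forward are my own inventions for illustration; only the tier cutoffs and the wage example come from the text above.

```python
def per_60(total: float, minutes: float) -> float:
    """Convert a raw total into a rate per 60 minutes."""
    return total * 60 / minutes


def forward_tier(points_per_60: float) -> str:
    """Map an even-strength points-per-60 rate onto the rule-of-thumb tiers."""
    if points_per_60 >= 2.5:
        return "top-line forward"
    if points_per_60 >= 2.0:
        return "top three forward"
    if points_per_60 >= 1.5:
        return "top six forward"
    if points_per_60 >= 1.0:
        return "bottom six forward"
    return "below the bottom six range"


# The wage example: $21 over three hours versus $10 over one hour.
my_hourly = per_60(21, 180)       # 7.0
brother_hourly = per_60(10, 60)   # 10.0

# Hypothetical forward: 30 even-strength points in 1,000 even-strength minutes.
rate = per_60(30, 1000)           # 1.8
print(round(rate, 2), forward_tier(rate))
```

The same conversion works for any counting stat, which is why shot attempts, expected goals, and points all tend to show up with a "/60" attached.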

Here are the 2021-2022 Dallas Stars forwards, arranged by rank in points per 60, with their raw point totals listed in the last column.

If you’re a Stars fan, this all checks out. Michael Raffl and Radek Faksa may have scored more overall than other forwards, but their production was boosted by playing more minutes than most others, which is why they rank so low once we look only at what their production would be if the entire game were played at even strength.

As for score adjustments, most modern stats already do this, but what it means is that numbers should account for the score of the game since leading teams will shoot less, and trailing teams will shoot more. This concept was introduced in 2011 when it was found that you could reasonably predict a team’s future wins by looking at their shot differential in a tied game state rather than using the team’s points in the standings. Venue adjustment is a little more complicated, but the long and short of it is this: people counting stats in some arenas are kind of suspect. Between inaccurate shot locations in Minnesota and New York, blocked shots in New Jersey, giveaways for the Islanders, and hits in Vegas — you can count on some sort of discrepancy in how stats are recorded at home versus how they rank on the road. If there’s a lot to clarify, it’s because there’s a lot to measure as accurately as possible. Now, onto the primer!

Impact Metrics

What they include: Corsi, Fenwick, Scoring Chances, Expected Goals (xG), etc.

Corsi refers to shot attempts; nothing more, nothing less. It’s a fixed version of ‘shots on net.’ After all, shots that hit the goalpost should count as a chance, just as shots that miss the net should count to give us a better idea of a goalie’s workload. Corsi clarifies what possession means, ignoring the archaic dichotomy between offensive vs defensive teams, and focusing on the degree to which they control or don’t control the puck. It helps us identify teams that are sustainably creating offense and controlling the puck versus those that are artificially scoring goals, and simply shooting hot. For a demonstration, here are the top 10 teams last year in Corsi For Percentage.

  1. Florida Panthers (CF 4300 / CA 3330) 56.36%
  2. Carolina Hurricanes (CF 4116 / CA 3205) 56.22%
  3. Calgary Flames (CF 4194 / CA 3351) 55.59%
  4. Boston Bruins (CF 3864 / CA 3227) 54.49%
  5. Los Angeles Kings (CF 4013 / CA 3398) 54.15%
  6. Toronto Maple Leafs (CF 4074 / CA 3533) 53.56%
  7. Colorado Avalanche (CF 4008 / CA 3574) 52.86%
  8. Vegas Golden Knights (CF 4147 / CA 3742) 52.57%
  9. Edmonton Oilers (CF 3948 / CA 3577) 52.47%
  10. Pittsburgh Penguins (CF 3905 / CA 3597) 52.05%

Only the Knights missed the playoffs, which says a lot about how valuable Corsi is; especially when considering what an anomalous season they had. I’ve included the raw Corsi For or shot attempts for (CF) and Corsi Against or shot attempts against (CA) because it’s always useful to see how percentages are distributed. In this case, Florida is driven by high-event hockey while Boston is driven by low-event hockey, giving us a language to talk about the pace at which teams control the game. We can then use these same principles for players.
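The percentage itself is just shot attempts for divided by all shot attempts. A quick sketch using the Florida and Boston numbers from the list above (the function name is mine):

```python
def corsi_for_pct(cf: int, ca: int) -> float:
    """Share of all shot attempts that belong to the team, as a percentage."""
    return 100 * cf / (cf + ca)


# (CF, CA) pairs taken from the top 10 list above.
teams = {
    "Florida Panthers": (4300, 3330),
    "Boston Bruins": (3864, 3227),
}

for name, (cf, ca) in teams.items():
    # Total attempts is a rough stand-in for pace: more events, higher-event hockey.
    print(f"{name}: CF% {corsi_for_pct(cf, ca):.2f}, total attempts {cf + ca}")
# Florida Panthers: CF% 56.36, total attempts 7630
# Boston Bruins: CF% 54.49, total attempts 7091
```

Similar percentages, very different totals: that gap in total attempts is the high-event versus low-event distinction in number form.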

You’ve probably heard about Fenwick, but it’s like Lipton Alligator Soup or the Rainforest Cafe at this point. Who remembers? The reason Fenwick’s worth talking about is that before all the fancy models and ten-dollar charts, it was hard to separate shot quantity from shot quality, even if we always knew the inherent difference. Enter Fenwick. While a blocked shot might be a worthwhile stat to tally, we wouldn’t call it a scoring chance, would we? Thus, Fenwick refers to all shots except for blocked shots. The benefit of excluding them is that for defenders, it can help identify skilled shot blockers while for forwards, it can help identify skilled shooters. Here are Dallas’ top five defenders in shots on goal + missed shots allowed (Fenwick Against or FA).

  1. Joel Hanley FA/60: 39.11
  2. Miro Heiskanen FA/60: 39.18
  3. Jani Hakanpaa FA/60: 39.29
  4. Thomas Harley FA/60: 39.83
  5. Esa Lindell FA/60: 41.62

And here are Dallas’ top five forwards in Fenwick For:

  1. Jason Robertson FF/60: 49.93
  2. Roope Hintz FF/60: 47.83
  3. Joe Pavelski FF/60: 47.45
  4. Marian Studenic FF/60: 42.91
  5. Alex Radulov FF/60: 41.12

These check out, even if Hanley and Studenic’s sample sizes inflate their placement a little.
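If the difference between Corsi and Fenwick still feels slippery, here is a small sketch of how the two are assembled from the same raw tallies. The class, field names, and the sample counts are hypothetical; the definitions are the ones given above.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    on_goal: int   # shots on net
    missed: int    # attempts that missed the net
    blocked: int   # attempts blocked before reaching the net

    @property
    def corsi(self) -> int:
        """All shot attempts."""
        return self.on_goal + self.missed + self.blocked

    @property
    def fenwick(self) -> int:
        """Unblocked shot attempts: Corsi minus blocked shots."""
        return self.on_goal + self.missed


def rate_per_60(count: int, minutes: float) -> float:
    return count * 60 / minutes


# Hypothetical defender: attempts allowed while on the ice over 900 minutes.
against = ShotCounts(on_goal=400, missed=180, blocked=220)
print(round(rate_per_60(against.corsi, 900), 2))    # CA/60, about 53.33
print(round(rate_per_60(against.fenwick, 900), 2))  # FA/60, about 38.67
```

The gap between the two rates is the blocked shots, which is why comparing them can hint at who is getting in lanes.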

Moving on, what distinguishes Fenwick from a scoring chance? This is where things get more granular. Scoring chances refer to shots within the home plate area of the ice, as seen below.

“Yea but what if the shot is coming from above the left dot on a Patrik Laine slapper versus Jay Beagle trying to battering ram a puck past the goalie’s pads in the crease?”

That’s a good question, Casual Critic. Some databases don’t tally them for this reason. Not only are we creating superficial borders, but some scoring chances only become high danger opportunities because of a low danger scenario, e.g. a point shot that turns into a net-front deflection. Then there’s the whole rink bias issue mentioned earlier. Natural Stat Trick still has stats for scoring chances, along with different levels of low versus high danger chances (just click the Show/Hide Column bar on the top left). I still think these are valuable as raw information, and for noticing strange trends, e.g. a forward taking a disproportionate amount of low danger shots or a defenseman taking a disproportionate amount of high danger shots. Adding to that, proprietary info was given to Elliotte Friedman last year suggesting that high danger goals against per 60 were a strong predictor of playoff success.

That brings us to expected goals (xG), which is an attempt to weave all of these variables into a model rather than raw tallies. Let’s step away from hockey for a second. If you want to know how much space is inside a gray box, you’re not going to measure every inch; you’re gonna break it down so that it won’t matter how many boxes you have, or how big or small they are. With the right formula, you’ll always have a good idea of their volume. In our gray box case, we can find out the volume by multiplying length, width, and height. Voilà!

Think of xG the same way you do shot quality rather than a strict expectation of which shots should become goals. It’s also important to think of models in terms of usefulness rather than whether they’re right or wrong. Just like the gray box analogy, you may have to adjust your formula if the material on the gray box itself is too thick, but you don’t have to abandon it completely. You just have to adjust. To give you a sense of scope, here are the variables in MoneyPuck’s xG model.

1.) Shot Distance From Net

2.) Time Since Last Game Event

3.) Shot Type (Slap, Wrist, Backhand, etc)

4.) Speed From Previous Event

5.) Shot Angle

6.) East-West Location on Ice of Last Event Before the Shot

7.) If Rebound, difference in shot angle divided by time since last shot

8.) Last Event That Happened Before the Shot (Faceoff, Hit, etc)

9.) Other team’s # of skaters on ice

10.) East-West Location on Ice of Shot

11.) Man Advantage Situation

12.) Time since current Powerplay started

13.) Distance From Previous Event

14.) North-South Location on Ice of Shot

15.) Shooting on Empty Net
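To show the general shape of the idea, here is a deliberately tiny toy. It is not MoneyPuck’s model: it uses only two of the fifteen variables (distance and angle), the logistic form is my assumption, and the coefficients are made up so that closer, more central shots grade out higher.

```python
import math

# Hypothetical coefficients, not fitted to any data.
INTERCEPT = -1.0
PER_FOOT = -0.05     # each extra foot from the net lowers the chance
PER_DEGREE = -0.02   # each extra degree off-center lowers the chance


def shot_xg(distance_ft: float, angle_deg: float) -> float:
    """Toy goal probability for one shot, squeezed between 0 and 1."""
    z = INTERCEPT + PER_FOOT * distance_ft + PER_DEGREE * abs(angle_deg)
    return 1 / (1 + math.exp(-z))


# Hypothetical shots as (distance in feet, angle in degrees).
shots = [(10, 5), (35, 30), (60, 10)]

# A team's (or player's) xG is just the sum of its individual shot values.
team_xg = sum(shot_xg(d, a) for d, a in shots)

for d, a in shots:
    print(d, a, round(shot_xg(d, a), 3))
print("total xG:", round(team_xg, 3))
```

A real model swaps the made-up numbers for weights learned from many seasons of shots and adds the rest of the variables, but the output is read the same way: every shot gets a probability, and the probabilities get added up.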

Micah Blake McCurdy of HockeyViz has what is, for my money, the only definition of xG you’ll ever need. “Expected goals are just a measure of how loud the crowd gets when a shot is taken.” Put another way — also quoting McCurdy — “crowds are xG models that talk.”

When they’re more useful: I think impact metrics are good for assessing team value. In the end, if you’re outshooting your opponent, night in and night out, you’re creating the climate for good offense rather than lazily benefiting from the weather of offensive happenstance. Here’s another thing. As more and more stats, models, and ideas are introduced, it’s easy to dismiss the elegance of Corsi. As CJ Turturo found, quantity is easier to control than quality, and even now, some analysts argue that Corsi is still king, and better at predicting future goals than some xG models.

When they’re less useful: Impact metrics don’t tell us much at the player level, especially at the level of small sample sizes; same principle as why we don’t use plus/minus. The other issue is that a team’s quality will weigh heavily on these counting stats. For example, the worst CF team last year was Arizona. Clayton Keller was 12th on the team in CF percentage. Safe to say, Keller was not the 12th-worst player on that team. He was just 12th worst in terms of shot attempt control when he was on the ice on a team already depressing shot shares for everyone else.

Let’s put this into perspective another way. There’s a strong correlation between per capita cheese consumption and the number of people who died by becoming tangled in their bedsheets. So yea, even the strongest correlations can be obviously bogus. A player with a high CF/xG percentage doesn’t tell us how they were used, when, whether or not that usage had an effect, or whether or not they’re helping drive those impact metrics. That’s what the next two sections are about.

Situation Metrics

What they include: TOI (time on ice), QoC (quality of competition), QoT (quality of teammate), OZS (offensive zone starts), With or Without You (WOWY)

It’s not enough to assess the impact a player is having on their team. It’s also worth clarifying the perception that teams themselves have on their players. How often does a player start in the defensive zone versus the offensive zone? Does their coach trust them more in offensive situations (needing to score), or defensive situations (hold the lead)? Thankfully, we have some very convenient, easy-to-read visuals for these exact scenarios per HockeyViz.

These are normally pretty indicative of how coaches perceive their roster. Last year, Dallas started their best goal scorer in the offensive zone as much as possible, which makes sense. Last year, Dallas chose not to deploy their best defensemen for late ties or holding leads, which makes less sense. Situation metrics give us a better view of how coaches view the roles they’ve assigned to their players.

Quality of competition is tricky, but PuckIQ has what I consider the most thorough database since the opposition is divided into tiers of elite, middle, and “gritensity.”

Let’s focus on the most important parts. Last season, Heiskanen played 6.76 minutes per game against elite competition. We also get the unblocked shot differential within those minutes (DFF%), and unblocked shot differential within those minutes compared to the rest of the team (DFF% RC), where he was a net positive compared to his teammates versus elite competition.

JFreshHockey has an easy-to-read tableau for these situation metrics on his Patreon page. For example, here’s QoC vs. QoT.

Moving on, WOWYs are a stat that bright hockey minds hate and for good reason. WOWYs refer to “with or without you” which is to say, how does a player perform away from someone versus with them? Obviously, how one player performs away from another skater says nothing about the other four. But I think they can be valuable in allowing you to catch broad patterns more easily.
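Mechanically, a WOWY is nothing more than splitting a team’s shifts into buckets and comparing shot shares. Here is a rough sketch with made-up shift data; the record layout and function names are assumptions for illustration only.

```python
# Each record: (skaters on the ice, shot attempts for, shot attempts against).
# "A" and "B" are hypothetical teammates.
shifts = [
    ({"A", "B"}, 6, 3),
    ({"A", "B"}, 5, 4),
    ({"A"}, 4, 5),
    ({"B"}, 2, 6),
    ({"A"}, 5, 5),
]


def shot_share(records):
    """Shot attempt percentage across a set of shifts, or None if empty."""
    cf = sum(f for _, f, _ in records)
    ca = sum(a for _, _, a in records)
    return 100 * cf / (cf + ca) if cf + ca else None


def wowy(records, player, partner):
    """Shot share together, and for each skater away from the other."""
    return {
        "together": shot_share(
            [r for r in records if player in r[0] and partner in r[0]]
        ),
        f"{player} without {partner}": shot_share(
            [r for r in records if player in r[0] and partner not in r[0]]
        ),
        f"{partner} without {player}": shot_share(
            [r for r in records if partner in r[0] and player not in r[0]]
        ),
    }


for label, share in wowy(shifts, "A", "B").items():
    print(label, round(share, 2))
# together 61.11
# A without B 47.37
# B without A 25.0
```

Notice what the sketch cannot see: the other skaters on the ice, the opponents, or the score. That blind spot is the whole criticism.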

Jamie Benn didn’t have a great season, but as you can see, when he was centering his own line — which I would argue Dallas should have done more — his group either generated more offense than usual (Alex Radulov and Jacob Peterson) or defended better (Michael Raffl).

When they’re more useful: Situation metrics are good at showing us how coaches value players, and who is trusted to run their system. The more they trust a player, the more minutes that player gets. How they view that player’s role determines whether they start more in the offensive zone, or the defensive zone. They’re also a good window into coaches who (I would argue) overthink things by failing to trust talented players in situations they’re otherwise willing to trust others who reliably fail in said situations. Player usage reflects a coach’s psychology and their coaching habits in a lot of ways, and they should be no more immune to criticism than players. Thankfully, we have stats on how coaches react to different game states too.

When they’re less useful: More context doesn’t mean more perspective. Too often we see role players get lots of defensive zone starts, and when they get hammered in those starts, the excuse is that “well they play tough minutes.” If they can’t succeed in those minutes, then perhaps they shouldn’t be playing them. Even more importantly, most players start close to 60 percent of their shifts on the fly, and another 15 percent in the neutral zone. In other words, just because a player is getting “buried in tough minutes” doesn’t mean they’re always starting in the defensive zone. It just means they’re always starting in the defensive zone when the coach can control who starts in the defensive zone. Moreover, you can’t talk about quality of competition without talking about quality of teammates. As Ryan Stimson discovered, a defenseman’s on-ice numbers are most influenced by their defensive partner, followed by their forward opponents, whereas a forward’s on-ice numbers are most influenced by their forward partners, followed by their defensive opponents.
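The zone start point is easier to feel with arithmetic. In this sketch the 60 and 15 percent figures come from the paragraph above; treating the remainder as offensive or defensive zone faceoff starts, and the 70 percent "buried" player, are my assumptions.

```python
ON_THE_FLY = 0.60     # share of shifts that start on the fly
NEUTRAL_ZONE = 0.15   # share of shifts that start in the neutral zone

# Assumption: whatever is left starts with an offensive or defensive zone faceoff.
zone_faceoff_share = 1 - ON_THE_FLY - NEUTRAL_ZONE


def defensive_share_of_all_shifts(dz_share_of_zone_starts: float) -> float:
    """How much of a player's total shifts actually begin in the defensive zone."""
    return zone_faceoff_share * dz_share_of_zone_starts


# Hypothetical "buried" player: 70 percent of his zone starts are defensive.
buried = defensive_share_of_all_shifts(0.70)
print(f"{zone_faceoff_share:.0%} of shifts are zone starts")    # 25%
print(f"{buried:.1%} of all shifts begin in the defensive zone")  # 17.5%
```

So even a heavily sheltered or heavily buried player spends the large majority of shifts starting somewhere the coach did not hand-pick.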

Performance Metrics

What they include: individual corsi for (iCF), individual expected goals (ixG), Shot Assists, Zone Entries, Zone Exits, Isolated Impact (HockeyViz), Regularized Adjusted Plus Minus (RAPM)

So far we don’t know how or to what degree a player drives impact. Individual measurements — individual Corsi for, individual expected goals — give us the broad strokes for how a player performs. In Dallas, Jason Robertson was the leader in individual shot attempts per 60, and second only to Roope Hintz in expected goals per 60. As two of Dallas’ three best offensive players, this checks out. Thanks to Corey Sznajder, we now have access to how they pass, whether they excel at deflections, if they’re a threat off the rush versus the cycle, how consistently they break into the opponent’s zone with control, how consistently they break out of their own zone with control, and much more.

“We really need all that?”

Slow your roll, Pierre. Some of this might look like analytics minutiae, but zone entries are critical in a game where most goals are scored within seven seconds of entering the zone. Why passing? Preliminary data shows that the best predictor for keeping goals down is not clearing the crease, but preventing the pass. These are also stats Corey tracks at the team level too. How does a team drive offense? With quick-strike rush entries, or attrition-style forechecking?

Again, this might look like analytics minutiae, but a team’s method of attack can be predictive of their ability to generate offense depending on the game state. Meghan Chayka famously found that a cycle attack creates higher shot quality for trailing teams while rush attacks create higher shot quality for leading teams. The trend reverses when the game states reverse (cycle attacks are bad for leading teams while rush attacks are bad for trailing teams). This makes intuitive sense since leading teams are typically conceding the zone anyway, allowing a more patient attack to further work its magic while trailing teams are allowing quick odd-man rushes back the other way in their desperation to manufacture quick odd-man rushes of their own. The cliche goes “they don’t ask how, they ask how many” and the cliche is wrong. If they didn’t ask how, we wouldn’t have a draft, and the first AHL callup would be its leading scorer (which is rarely a talented young prospect, but traditionally a veteran AHLer). I think Corey’s performance metrics can also give us a window into understanding why a team might be getting more chances, but aren’t scoring. No matter how many shots they take, if there’s no plan of attack, then that might explain why they’re not scoring the way you’d expect them to.

Now let’s go above my pay-grade and talk about trying to assess player-performance by isolating their on-ice impact from everything else.

“How the hell do you do that?”

I’m not a mathematician so I don’t know why you’re asking me but I’ll do my best. Micah Blake McCurdy’s isolated impacts are based on a model that measures shot differential, and the likelihood of those shots becoming goals (or not) based on location, assuming league-average goaltending. From there, variables are included to cut down on any statistical noise. These variables include:

  • Even Strength Offense
  • Even Strength Defense
  • Power Play Offense
  • Short Handed Defense
  • Penalties Taken
  • Penalties Drawn
  • Teammates
  • Competition
  • Score Effects
  • Zone Usage
  • Coaching
  • Home-ice deployment
  • Player tendencies (shooters versus passers)

His model accounts for the prior probabilities of the player as well, which is a way for the model to “recognize” that the following data respects that this is, in fact, Miro Heiskanen that we’re looking at and not goalie prospect Blade Mann-Dixon (yes, that’s an amazing real name of an actual player.) We end up with the following, with red indicating more shots in a given area than league average, and blue indicating fewer shots in a given area than league average. You want to see more red than blue at the top (offense), and more blue than red on the bottom (defense).

The percentages in each box refer to the increase or decrease in the likelihood of Dallas scoring (or stopping) a goal based on Heiskanen’s impact. His +3 EV offense means the Stars are three percentage points more likely to score with Heiskanen in the offensive zone, and his -11 for EV defense means the Stars are 11 percentage points less likely to be scored on in the defensive zone, which makes him a stud at both ends.

The other cool thing about McCurdy’s model is that the visuals highlight the geography of performance. On the power play, his work increases Dallas’ man advantage offense by two percentage points thanks to his movement to get shots up close rather than bombing from the point. Setting is a new addition to HockeyViz that links with Corey’s tracking data to give playmaking skills a nice metric. In this case, Heiskanen’s passing increases his team’s ability to score by four percentage points. I think what makes McCurdy’s data especially crucial is that it makes it easier to analyze special teams and systems. Is the player in the bumper position creating offense where we should expect them to create offense? Is a defenseman getting caved in on his side of the ice, or does it look like the other skaters are letting him down?

That brings us to a similar(ish) stat: Regularized Adjusted Plus Minus, or RAPM. You’ve probably seen these.

I won’t explain this one in depth because a) I’ll screw it up and b) the additional reading section will have multiple primers on this. The other reason is that I don’t want to get lost in the weeds of analytics. If you have a good grasp of McCurdy’s isolated metric, then you have a good grasp of RAPM. The difference is that RAPM is displayed in a plus/minus form looking at isolated player impacts through goals for, expected goals for, shot attempts for, expected goals against, and shot attempts against. There are differences beyond how they’re presented though and that’s important to understand too. RAPM incorporates an xG model into its calculations, whereas McCurdy’s does not. However, where McCurdy calculates coaching impacts and player priors, RAPM does not. This is another important takeaway from the advanced stats you see: different people have different interpretations of how best to measure. As a result, different models will disagree on certain players. But while the adverbs and adjectives might change for each model, the language is the same. These models give us a better image of Heiskanen’s true plus/minus; as in, the plus/minus of events rather than the plus/minus of goals.

When they’re more useful: I always find tracking data especially useful for getting a more complete picture of a defenseman’s performance. Most defenders are defined by their broad efficiencies rather than their narrow talents. They have more ice to cover, which makes the scope of their duties wider in scale. It’s just as important for them to activate plays from the defensive zone as it is for them to control the neutral zone for effective, controlled setups for forwards to break into the offensive zone. Good defensive defensemen tend to look good in these stats too. Just look at a truly elite three-zone defender, in this case Jaccob Slavin, versus a defender who is decent at this point in his career in terms of counting stats, but also clearly a byproduct of their minutes and their linemates, in this case Ryan Suter (who spent his last several years next to multi-dimensional defenders like Heiskanen and Jared Spurgeon).

That’s data stretching all the way from 2016 to 2021. Checks out.

When they’re less useful: Being good at something doesn’t guarantee making others better. A good rush attacker on a team that wants to cycle may be less effective. Performance doesn’t happen in a vacuum, especially in a sport where coaches don’t last, and prime years are short. Mix that into the fact that players are constantly learning new systems, reacting to new systems, as well as new players, and it’s clear just how dynamic the spectrum is.

A similar criticism is lobbed at RAPM. How can you truly isolate a player’s impact in such a team-centered sport? Dallas fans have argued over this point with regards to Esa Lindell. He only played four games without Klingberg as soon as he entered the league; they’ve been joined at the hip ever since. Finding out how closely connected Lindell’s performance was to Klingberg is finally happening, six years later, which creates a huge discrepancy in how you isolate a sample size for said player.

In addition, how do we know the difference between what a player is capable of versus what a player is told to do? That answer should be obvious for players we have large sample sizes of, but we don’t always have those, so respecting when to apply them is key.        

Value Metrics

What they include: Game Score (GS), Game Score Value Added (GSVA), Goals Above Replacement (GAR), Wins Above Replacement (WAR)

The concept behind value metrics is to bridge the gap between performance (causation) and impact (correlation). While the layperson may consider these stats overly complex, I don’t think we give enough credit to how intuitive they can be. How many times have you watched a game and been surprised that a player didn’t have any points? “He could have scored at least four goals that night!” How many times have you watched a player do all the legwork carrying the puck into the zone to set up his teammates, only to get no credit for an assist? “Yeah, that was a nice goal by the Edmonton winger, but that play was all McDavid.”

That’s what value metrics are for: a comprehensive macro-level view of a player’s impact + performance. Starting with Game Score, the idea is not only to capture the direct value of goals and shots, but the ripple effect a player has on goals and shots, such as assists, penalty differential, faceoff differential, and shot attempt differential. You’ve probably seen these on Twitter after a game per HockeyStatCards.
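In spirit, a Game Score is a weighted sum of box score events. The sketch below uses the ingredients named above, but the weights are placeholders I picked for illustration, not the published formula, and the two stat lines are hypothetical.

```python
# Placeholder weights for illustration only; not the published Game Score weights.
WEIGHTS = {
    "goals": 0.75,
    "assists": 0.60,
    "shots": 0.075,
    "penalty_diff": 0.15,       # penalties drawn minus penalties taken
    "faceoff_diff": 0.01,       # faceoffs won minus faceoffs lost
    "shot_attempt_diff": 0.05,  # on-ice attempts for minus against
}


def game_score(box: dict) -> float:
    """Weighted sum of a single-game stat line; missing stats count as zero."""
    return sum(weight * box.get(stat, 0) for stat, weight in WEIGHTS.items())


# Hypothetical night with points on the board.
scorer = {"goals": 1, "assists": 1, "shots": 4, "penalty_diff": 1,
          "faceoff_diff": -2, "shot_attempt_diff": 6}

# Hypothetical night with no points, but the ice tilted while he was out there.
pointless = {"shots": 5, "penalty_diff": 1, "shot_attempt_diff": 10}

print(round(game_score(scorer), 3))     # 2.08
print(round(game_score(pointless), 3))  # 1.025
```

The second line is the interesting one: zero points, but still a clearly positive night once the ripple effects are counted.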

The concept of GS (“Who had the best game?”) is to capture single-game productivity. It’s like when you see a player’s name in the box score; now just add more events. Imagine plus/minus, but now reward them for drawing a penalty, or winning a faceoff, and punish them for taking a penalty, or losing a faceoff, etc. This overall concept forms the backbone of Dom Luszczyszyn’s GSVA, which takes Game Score, and then adds even more ingredients, such as expected goals, a player’s last three seasons, and ultimately converts them into wins. You’ve probably seen the player cards. Here’s the explainer:

Pay attention to the bottom numbers: 1.5 (projected) and 2.2 (on pace for). Those figures are in wins, and a win is worth two points, so this player is projected to be worth about three points in the standings, and they’re on pace to be worth a little more than four. So if a team is on pace for 95 points without this player — which is around the cutoff for making the playoffs — then the team should comfortably make the playoffs with them.
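The conversion from wins to standings points is a one-liner, assuming the card’s numbers are in wins and each win is worth two points. The 95-point team is the hypothetical from the paragraph above.

```python
POINTS_PER_WIN = 2  # an NHL win is worth two points in the standings


def wins_to_points(wins: float) -> float:
    return wins * POINTS_PER_WIN


projected = wins_to_points(1.5)  # 3.0 points
on_pace = wins_to_points(2.2)    # 4.4 points

team_without_player = 95  # hypothetical pace without the player
print(projected, on_pace, round(team_without_player + on_pace, 1))
# 3.0 4.4 99.4
```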

A similar concept defines Goals Above Replacement. But first, what is a replacement player? If you’re wondering why we don’t just use averages, let’s consider what average would actually mean on the 2021-2022 Stars roster. The average number of goals scored by a Stars player was 8.03. If a forward scored eight goals last season, they would have been ninth in goals among the forwards. For a defenseman, eight goals last season would have led all defenders in scoring on a roster with Heiskanen and Klingberg. This “average” goal scorer would have been reasonably valuable. A replacement player on the other hand represents a lower baseline. In this case, a replacement level player is defined by the average performance of a typical healthy scratch forward/defenseman i.e. players with league-minimum salaries, and negligible ice time. This gives us a better sense of population rather than a generic distribution. (It’s worth noting that despite this, Evolving-Hockey lets you switch between replacement and average on their baseline tab in their skater tables.)

GAR models then weigh the following variables against how much higher (or lower) a player will perform over that replacement baseline.

  • Even Strength Offense
  • Even Strength Defense
  • Power Play Offense
  • Short Handed Defense
  • Penalties Taken
  • Penalties Drawn

The end result is a single number that can be represented via goals.

Wins:

Or points in the standings, which is probably easier for most fans to understand:
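To make the chain from components to goals, wins, and points concrete, here is a minimal sketch. The six component names mirror the list above, but every number is hypothetical, and the goals-per-win figure is an assumption I picked for round numbers rather than any model’s actual conversion rate.

```python
GOALS_PER_WIN = 6.0   # assumed conversion rate, for illustration only
POINTS_PER_WIN = 2    # an NHL win is worth two points in the standings

# Hypothetical goals above replacement for one skater, by component.
components = {
    "even_strength_offense": 4.0,
    "even_strength_defense": 1.5,
    "power_play_offense": 2.0,
    "short_handed_defense": 0.5,
    "penalties_taken": -0.5,
    "penalties_drawn": 1.5,
}

gar = sum(components.values())            # goals above replacement
war = gar / GOALS_PER_WIN                 # wins above replacement
standings_points = war * POINTS_PER_WIN   # standings points above replacement

print(gar, war, standings_points)  # 9.0 1.5 3.0
```

Same player, same value, three different units. Which one you quote mostly depends on who you are trying to convince.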

“Ok but what makes GAR/WAR different from Game Score/GSVA?”

While they measure similar things, replacement models use what’s called regression analysis.

“What is regression analysis?”

You’re a jerk but let’s go. Let’s use regression analysis on something stupid, like whether my writing benefits from eating more (or less) Entenmann’s chocolate donuts, which are great. My dependent variable is what I’m trying to predict — more words, more paragraphs, and thus more writing production. My independent variables will be all the things that could be closely connected with my writing production. In this case, more donuts to stuff my face. [Editor’s note: there were a lot of donuts consumed in the making of this story.]

As we can see, I have a serious problem. The more of those delicious, borderline addictive Entenmann’s donuts I eat, the more words I write. The regression line doesn’t represent the truth of this analogy. It just represents the estimate of my relationship between writing and eating. After all, there are more variables that may be at play. But we’re not trying to figure out if my Entenmann’s addiction is making me a more productive writer. We’re just trying to estimate what that relationship is so I can reasonably predict the variables expected to impact my future production.

This is incredibly simplified of course, not only because I’m only talking about one variable, but because the regression analysis I just used is just one type of regression whereas many of these models use more complicated ones.
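For the curious, here is roughly what that one-variable version looks like in code: an ordinary least squares line through some donut and word counts. The data is invented, and this simple line is a stand-in, not the kind of regression the replacement models actually use.

```python
# Invented data: donuts eaten per writing session, and words written.
donuts = [0, 1, 2, 3, 4, 5]
words = [410, 600, 790, 1010, 1190, 1400]


def fit_line(x, y):
    """Ordinary least squares for one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(donuts, words)

# The line is an estimate of the relationship, used here to predict a new session.
predicted_words = intercept + slope * 6

print(round(slope, 1), round(intercept, 1), round(predicted_words))
# 198.3 404.3 1594
```

Each extra donut is worth about 198 words in this made-up sample. The hockey models do the same thing with many more variables on the right-hand side.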

When they’re more useful: Evaluating depth forwards. Raw production is prone to so much random nonsense: injuries forcing a player into an increased role, shooting percentage, ice time, and deployment for example. Value metrics allow us to reframe production into what a player is able to manufacture from shift to shift rather than point to point. Let’s say I have four forwards. Only one of them can make the team and the only information I have are their counting stats from the previous year. Who do I choose?

  • Player A scored 17 goals and 16 assists for 33 points.
  • Player B scored 11 goals and 21 assists for 32 points.
  • Player C scored 14 goals and 13 assists for 27 points.
  • Player D scored 9 goals and 18 assists for 26 points.

Player A might seem obvious. He scored the most points with a nice mixture of goals and assists. The rest are a lesser combination of Player A, with Player D being the outlier as the forward who favors the pass. These players aren’t setting the world on fire with their production, so production can’t be my focal point here. Maybe the difference in value can be found in a more comprehensive metric, like their goals or wins above replacement? Now let’s look at these 2017-2018 players again, this time including their Wins Above Replacement from the prior three seasons.

  • Player A scored 17 goals and 16 assists for 33 points. Prior three-season WAR of 3.3.
  • Player B scored 11 goals and 21 assists for 32 points. Prior three-season WAR of 1.
  • Player C scored 14 goals and 13 assists for 27 points. Prior three-season WAR of 1.2.
  • Player D scored 9 goals and 18 assists for 26 points. Prior three-season WAR of 3.1.
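To make the reshuffle concrete, here is a small sketch that ranks the same four forwards twice: once by last season’s points and once by prior three-season WAR. The numbers come straight from the list above. The `Forward` class and `ranked_by` helper are names I made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Forward:
    label: str
    points: int        # last season's points, from the list above
    prior_war: float   # prior three-season WAR, from the list above


forwards = [
    Forward("A", 33, 3.3),
    Forward("B", 32, 1.0),
    Forward("C", 27, 1.2),
    Forward("D", 26, 3.1),
]


def ranked_by(players, key):
    """Return player labels ordered from best to worst by the given key."""
    return [p.label for p in sorted(players, key=key, reverse=True)]


by_points = ranked_by(forwards, key=lambda p: p.points)
by_war = ranked_by(forwards, key=lambda p: p.prior_war)

print("Ranked by points:", by_points)
print("Ranked by WAR:   ", by_war)

# Show how far each player moves when the yardstick changes.
for label in by_points:
    shift = by_points.index(label) - by_war.index(label)
    print(f"Player {label}: moves {shift:+d} spots when ranked by WAR")
```

Sorting on one number is obviously not a roster decision. The point is only that the order changes depending on which number you sort by.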

It’s still a difficult choice, but it’s not difficult to see who should be the first cut, given that Player B’s production seems to depend on good weather rather than good climate. (Players A through D, in order, were Radek Faksa, Devin Shore, Tyler Pitlick, and Jason Spezza.) These metrics can also be effective for knowing which forwards will benefit from playing up the lineup rather than being stuck in an artificial “role.” Artturi Lehkonen’s play-driving ability was fantastic before he got to Colorado, but he was buried in Montreal. The fact that he only broke the 30-point barrier once was clearly misleading. Using per 60 production, he graded out as a top three forward in Colorado; inflated perhaps, but also evidence that he was always capable of something better with better usage, and certainly worth more than the draft picks he was eventually traded for. (Using the per 60 production rule of thumb from above, in Lehkonen’s first five seasons with Montreal, he averaged 1.42 points per hour of even strength play. That would make him just shy of a proper top six forward, which he probably could have been for them with better respect for the subtleties of his game.)
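The per 60 math itself is nothing fancy: points divided by minutes played, scaled up to an hour. A quick sketch follows. The point and minute totals are hypothetical, picked only so the result lands near the 1.42 rate cited above. They are not Lehkonen’s actual totals, and `points_per_60` is my own helper name.

```python
def points_per_60(points, minutes_played):
    """Scale a raw point total to a rate per 60 minutes of ice time."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return points / minutes_played * 60


# ASSUMPTION: hypothetical even-strength totals for one season,
# chosen only to illustrate the arithmetic.
even_strength_points = 20
even_strength_minutes = 845

rate = points_per_60(even_strength_points, even_strength_minutes)
print(f"{rate:.2f} points per 60 minutes at even strength")
```

Rates like this are what let a buried third-liner and a first-liner sit on the same scale, since the raw totals are mostly telling you who got the ice time.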

When they’re less useful: Before delving into criticisms, I want to be clear about what this section is not: this section is not about why “hockey is too complicated to be broken down into just numbers.” This is information contextualized using mathematical methods, and that’s kind of a big deal. Math predicted antimatter in 1928 with Paul Dirac’s equation, and universe expansion in 1916 with Einstein’s theory of relativity. That math will somehow never understand how a puck can move past a 72-inch wide net is an unserious position I’m not interested in giving weight to.

Part of the issue with value metrics is that they suffer from the so-called drunkard’s search principle: a drunkard who lost his keys on the other side of the street will still look under the streetlight, because that’s where the light is. What other choice does he have? The point being, value metrics rely on what’s happening on the ice with the puck as opposed to what’s happening on the ice away from the puck (personal opinion: this is what really sets Corey Sznajder’s All Three Zones Project apart). This seems like an obvious enough statement, but Game Score’s focus on single-game productivity will naturally favor forwards over defensemen. Dom Luszczyszyn’s GSVA model underwent some remodeling before the 2019-2020 season to better account for defensive performance after strong defensive teams like the Stars, Islanders, and Blues made so much noise in the 2019 playoffs.

There’s another issue with value metrics, and that’s how they account for development. I’m not talking about assessing prospects either. That part goes without saying. Prospects are black boxes, so assessing their potential upward development is an impossible task, no matter how much promise they show. Here I’m talking about development among experienced players too.

Example: Matthew Tkachuk averaged a Wins Above Replacement of 2.04 through his first five seasons in the NHL. Last year? He was worth 4.2 wins. Even if Tkachuk is simply hitting his stride, that’s a massive spike in value, likely reflecting what he owed his teammates in Darryl Sutter’s system rather than reflecting his true value. In the 2017-2018 season, depth defenseman Colin Miller was worth almost two wins, the highest of his career, and a number that would indicate a borderline elite defenseman. Why? I have no doubt that playing next to Brayden McNabb (which he only got to do once in his career) on a Cinderella team had everything to do with it. John Klingberg went from averaging around two wins between 2014 and 2019 to less than one over the last three years. Any deeper analysis makes very clear just how much a passive defensive system under Rick Bowness affected his three-zone play, and thus his bottom line. There’s also a real connection between Goals Above Replacement and the change that will occur as a direct result of going to a better team, or a worse one. If the quality of a player’s GAR score is contingent upon the quality of a team, then a single number won’t always approximate the link between talent and performance. Systems are a big deal, especially since they change rapidly given how short the average coach’s stay is. (This is especially true in hockey as head coaches in the NHL have one of the highest turnover rates in all of major sports, with an average shelf life of 2.4 years per tenure.)

Getting back to the drunkard’s search principle, this is something Cam Charron intimated to The Athletic after reflecting on his eight years with the Leafs, and what publicly available data can’t quantify: the importance of passing. Shots preceded by a pass are more than twice as likely to go in as those that aren’t. For any single-value stat relying on goal impacts, that’s a major blind spot (to reiterate, I am not saying this renders the metric inert). Going back to my writing and donut problem, that’s like focusing on my increase in writing as I eat more donuts, and ignoring all the times I take my car to buy the donuts to begin with.

Closing Thoughts

I consider advanced stats information an essential part of analysis. But I know it’s tough for readers to slog through, and in a landscape with increasing negligence about how to apply information, the onus is even greater on those who want to share. In doing so, it’s important for analysts to communicate the divide between practical and theoretical knowledge. What a player is theoretically worth in wins tells us little about their practical worth to a new team, or the on-ice dynamic we can expect in a potentially different role, or under a different system. Analytics should be about making complex data accessible, and converting that accessible data into a dialogue about what it means for the stories that unfold for players and teams. They’re not supposed to supply easy answers. They’re supposed to supply clearer language. Good insights are important, but not as important as good judgment.

References and Additional Resources

Evolving-Hockey by Josh and Luke Younggren | Glossary | Goals Above Replacement | Wins Above Replacement | RAPM

HockeyViz by Micah Blake McCurdy

NaturalStatTrick by Brad T | Glossary | Patreon Page

All Three Zones Project by Corey Sznajder | Glossary | Patreon Page

MoneyPuck by Peter Tanner

Game Score Added Value | Game Score Added Value FAQ | Measuring Single Game Productivity: An Introduction to Game Score by Dom Luszczyszyn

JFreshHockey | Analytics via Patrick Bacon

ARHockeyStats by Andy & Rono

HockeyStatCards by Cole Palmer

Hockey-Statistics by Lars Skytte | Introduction to Advanced Hockey Statistics

MoreHockeyStats by Roman Parparov

Puckluckanalytics by Greg Amundsen

PuckIQ by WG

CHL Microstats Database by Mitch Brown | Patreon Page

Tape to Space: Redefining Modern Hockey Tactics (2018) by Ryan Stimson

Writers to Follow

Alison Lukan

Travis Yost

Dimitri Filipovic

Charlie O’Connor

Jack Han

Dom Luszczyszyn

Hockey-Graphs | A Treasure Trove of Insightful Articles

Points per hour benchmark inspired by Jonathan Willis

Iztok Franco (Although he covers basketball, few do storytelling better through numbers than him. You can learn a lot from other sports where analytics are more entrenched — almost all of hockey’s models and stats come from existing work in other sports to begin with.)
