Advanced Stats Explained

LGR4GM · July 5, 2018

From The Athletic: https://theathletic.com/415611/2018/07/05/an-advanced-stats-primer-with-naturalstattricks-brad-timmins/

It covers more than just Corsi.

Quote

Corsi

Corsi simply means all shots directed at the net, including blocks, misses, goals and shots on goal. You’ll likely see me refer to “shots” and mean the same thing as I think it’s easier for fans getting into advanced stats for the first time to understand without the foreign names.

The community began looking at all shot attempts because it was found to be more predictive of future scoring than prior scoring. Analytics is really about using available information to make the best decisions for the future of your team. What information do we have today that can help us make better decisions for tomorrow?
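To make the counting concrete, here is a minimal Python sketch. It assumes goals are already included in the shots-on-goal count (the usual convention), so they aren't added a second time. The class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ShotAttempts:
    """Hypothetical container for 5v5 shot attempts (a team, or a skater's on-ice time)."""
    shots_on_goal: int  # saved shots plus goals
    missed: int         # attempts that missed the net
    blocked: int        # attempts blocked before reaching the net

def corsi(a: ShotAttempts) -> int:
    """Corsi: every attempt directed at the net, counting each goal once."""
    return a.shots_on_goal + a.missed + a.blocked

print(corsi(ShotAttempts(shots_on_goal=30, missed=12, blocked=14)))  # 56
```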

Quote

...Evan Rodrigues played 527 minutes during 5v5 play in 2017-2018. In that time he scored 18 points. Jack Eichel played 970 minutes during 5v5 play and scored 33 points. When we normalize their ice time (Points/TOI*60) we arrive at nearly identical scoring rates of: 2.05 for Rodrigues and 2.04 for Eichel. Very efficient play from Rodrigues, especially more so given his Quality of Teammates, which we’ll get to shortly.
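The normalization in the quote is just points divided by minutes, times 60. A minimal Python sketch using the 5v5 numbers quoted above (the function name is hypothetical):

```python
def points_per_60(points: int, toi_minutes: float) -> float:
    """Normalize scoring to a rate per 60 minutes of ice time (Points/TOI*60)."""
    if toi_minutes <= 0:
        raise ValueError("time on ice must be positive")
    return points / toi_minutes * 60

# 5v5 numbers from the quote (2017-2018)
rodrigues = points_per_60(18, 527)
eichel = points_per_60(33, 970)
print(f"Rodrigues: {rodrigues:.2f}, Eichel: {eichel:.2f}")  # Rodrigues: 2.05, Eichel: 2.04
```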

Edited July 5, 2018 by LGR4GM

LGR4GM · July 5, 2018

CF%, or Corsi For Percentage, is shots for / (shots for + shots against). You want this to be over 50% because it means your side controls more of the shot attempts while you're on the ice. For example, Eichel's CF% is 49.7, so when he is on the ice there are slightly more shot attempts against the Sabres than for the Sabres.
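A minimal sketch of the CF% formula. The on-ice counts below are hypothetical numbers chosen to land at 49.7, not Eichel's actual totals.

```python
def cf_percent(cf: int, ca: int) -> float:
    """Corsi For % = CF / (CF + CA), expressed as a percentage."""
    total = cf + ca
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100 * cf / total

# Hypothetical on-ice attempt counts, for illustration only
print(round(cf_percent(497, 503), 1))  # 49.7
```

Anything above 50 means the player's team takes more of the attempts while he is out there.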

IKnowPhysics · July 5, 2018

Ugh, they're simply determined to make me want to subscribe.

LGR4GM · July 5, 2018

He's going to have a part 2 story, probably tomorrow or so. The Rodrigues stuff was very interesting. We need Rodrigues to produce if we are ever going to leave the basement.

Erod - Mittelstadt - Okposo might be a solid line next year, because Erod should flourish with better teammates and an uptick in ice time. Now, the numbers above are just from 5v5 play, because the PP and PK are something different. Erod plays about 11 minutes of 5v5 time per game. If we just bump that up to 13 minutes of 5v5 time (2 more shifts), he would play 1,066 minutes of 5v5 (13 × 82) in an 82-game season. If he maintained his scoring rate of 2.05 points per 60, that equals 36.42 5v5 points in a year. That doesn't even take into account him playing with better teammates. That is just straight up what he should have if he plays 82 games a year at an average of 13 minutes of 5v5. Those are some encouraging stats.

P60 is useful because it normalizes the points a player scores to a rate per 60 minutes of (5v5) ice time.
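The projection above is simple arithmetic: total minutes times the per-60 rate, divided by 60. This sketch assumes the rounded 2.05 rate holds over all 82 games (the function name is hypothetical); using the unrounded 18/527 × 60 rate instead gives about 36.41.

```python
def projected_points(minutes_per_game: float, games: int, p60: float) -> float:
    """Project a season total from a 5v5 scoring rate, assuming the rate holds."""
    total_minutes = minutes_per_game * games
    return total_minutes * p60 / 60

minutes = 13 * 82                          # 1,066 minutes of 5v5 play
points = projected_points(13, 82, 2.05)   # rounded P60 from the quote
print(minutes, round(points, 2))          # 1066 36.42
```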

Samson's Flow · July 5, 2018

I just feel like there is another 10 paragraph megapost from Randall coming in this thread...

Derrico · July 5, 2018

1 hour ago, IKnowPhysics said:

Ugh, they're simply determined to make me want to subscribe.

It really is worth it. Very good content.

LTS · July 5, 2018

In the chart I posted in the ROR thread (or wherever), Rodrigues' name popped into the top 5 Sabres in scoring. Apparently not coincidentally, that was when the team played its best hockey last year.

Randall Flagg · July 5, 2018

9 minutes ago, Samson's Flow said:

I just feel like there is another 10 paragraph megapost from Randall coming in this thread...

I'm totally grappling with a few topics (mostly questions) right now but I don't have what I want to post quite nailed down yet.

24 minutes ago, Skurk Liger said:

He's going to have a part 2 story, probably tomorrow or so. The Rodrigues stuff was very interesting. We need Rodrigues to produce if we are ever going to leave the basement.

Erod - Mittelstadt - Okposo might be a solid line next year, because Erod should flourish with better teammates and an uptick in ice time. Now, the numbers above are just from 5v5 play, because the PP and PK are something different. Erod plays about 11 minutes of 5v5 time per game. If we just bump that up to 13 minutes of 5v5 time (2 more shifts), he would play 1,066 minutes of 5v5 (13 × 82) in an 82-game season. If he maintained his scoring rate of 2.05 points per 60, that equals 36.42 5v5 points in a year. That doesn't even take into account him playing with better teammates. That is just straight up what he should have if he plays 82 games a year at an average of 13 minutes of 5v5. Those are some encouraging stats.

P60 is useful because it normalizes the points a player scores to a rate per 60 minutes of (5v5) ice time.

We saw that line for a game and a half and it's the only one I would really push to see first thing this season - the rest of the lines I'm pretty unsure of/open to persuasion.

pi2000 · July 5, 2018

Haven't read it yet, but fwiw... WSH had the league's 8th WORST SAT% last season.

The year before that, PIT was in the bottom half of the league.

Draw your own conclusions.

dudacek · July 5, 2018

1 hour ago, Randall Flagg said:

I'm totally grappling with a few topics (mostly questions) right now but I don't have what I want to post quite nailed down yet.

We saw that line for a game and a half and it's the only one I would really push to see first thing this season - the rest of the lines I'm pretty unsure of/open to persuasion.

In my most perfect of worlds, Okposo bounces back and Mittelstadt is an immediate hit, making that a legitimate second scoring line.

A lot of things would have to go right.

LGR4GM · July 6, 2018

17 hours ago, pi2000 said:

Haven't read it yet, but fwiw... WSH had the league's 8th WORST SAT% last season.

The year before that, PIT was in the bottom half of the league.

Draw your own conclusions.

What's SAT%, and how does it work?

Derrico · July 6, 2018

Athletic has Part 2 to this series out now.

LGR4GM · July 6, 2018

23 minutes ago, Derrico said:

Athletic has Part 2 to this series out now.

Great article. Also Kane with Eichel was a major mistake and Sheary should by all metrics be a better fit.

Drunkard · July 6, 2018

21 minutes ago, Skurk Liger said:

Great article. Also Kane with Eichel was a major mistake and Sheary should by all metrics be a better fit.

I always hoped we'd see Kane with Reinhart as his center. They seemed like they would have been a much better fit, since Kane likes to carry the puck and shoot, and Reinhart seems most at home camping out near the net. It's a shame they didn't get more time as a pairing.

Edited July 6, 2018 by Alkoholist

Happy Days · July 6, 2018

https://www.diebytheblade.com/2018/7/5/17536778/the-sabres-in-a-bubble-part-1-team-statistics

LGR4GM · July 6, 2018

30 minutes ago, Superhero said:

https://www.diebytheblade.com/2018/7/5/17536778/the-sabres-in-a-bubble-part-1-team-statistics

Awesome, love this. I am still learning about stats and hockey stats, so this stuff is really interesting.

Randall Flagg · July 6, 2018

Some of the most egregious straw men I've read in hockey are directed towards advanced stats and their users. An example is hinted at in pi's comment above. I've seen other people say similar things here.

The straw man is that the advanced stats community claims to have predictive power over every single outcome in hockey. Point me to a single serious user of advanced statistics who believes that every good Corsi team will win and there's no chance a bad Corsi team can win. Better yet, point me to a single serious creator and publisher of advanced stats who doesn't outline the limited scope and applicability of every stat, doesn't use statistical techniques (ones the straw-man makers don't understand) to put a value on exactly how reliable the stat is relative to every other one in existence, and doesn't publish the results ruthlessly, coldly, and without emotion, the way a high-energy physicist would publish his jet reconstruction algorithm results.

It's a foolish claim to make with this in mind.

What we all know about hockey is that it's the most fluid and unpredictable sport that exists at this level. This is clearly why stats are always going to have limits on their predictive power. That doesn't change the fact that stats do have measurable predictive power, that this power is testable like any other stat in the world, and that some metrics outperform others for well-established reasons. It is utterly conclusive, for example, that shot events predict team success better and more often than goals for and goals against. This makes complete sense, and I'll outline why in a moment, but I've seen this fact get confused with the idea that goals for/against are better at describing the current standings, which is also true. That's because goals for/against tell you how good a team has been up to this point, and a better number inherently implies more wins, because goals are what differentiate teams in a hockey game.

But when you use the current standings to try to predict the standings 41 games from now, the correlation is half of what it is when you use various shot-based metrics. This is sensible because goals for and against are low-frequency events that can be bumped up or dragged down by unsustainable stretches. Shots can be too, but since so many more of them happen, that noise gets smoothed out far quicker. What that means is that it's statistically significant to say you're going to have better standings predictions on average if you only had access to shot-based metrics* than if you only had access to goal-based metrics, but the R^2 and t-values are also not high enough to believe you have the entire league figured out for the next decade. Sports would be trivial and boring if that were the case. If you find me one serious stat user who has ever claimed anything like what the most common straw man rails against, I will literally give you a million dollars.
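One rough way to see why low-frequency events are noisier is the binomial standard error of a share, sqrt(p(1 - p) / n). This sketch assumes every event is an independent coin flip that goes your team's way with probability p, and the event counts (4,000 attempts vs. 200 goals) are hypothetical round numbers, not real season totals.

```python
import math

def share_std_error(p: float, n: int) -> float:
    """Standard error of an observed share when each of n independent events
    goes your team's way with probability p (binomial assumption)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical round numbers of events in a season, for illustration only
attempts_se = share_std_error(0.5, 4000)  # shot attempts (Corsi events)
goals_se = share_std_error(0.5, 200)      # goals
print(f"attempts: {attempts_se:.4f}, goals: {goals_se:.4f}, ratio: {goals_se / attempts_se:.2f}")
```

With 20 times as many events, the random noise in the observed share shrinks by a factor of sqrt(20), roughly 4.5, under these assumptions.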

Further, we HAVE more context than the idealized scenario above, and people who understand stats USE it. The great irony here is that certain members of the goal-event religion are well known for regularly posting opinions that cite nothing, stats or otherwise, besides the goal-based metric and the massive sweeping generalizations that come with it. They actively choose an objectively worse metric and do worse than the stats community by avoiding context at all costs.

By contrast, coupling stats together, noting usage, noting stats relative to teammates, supplying physical descriptions and reasons for why things look the way they do, emphasizing luck and immeasurable things, and bundling all of this into a package in an attempt to understand, explain and suggest why we're bad and what might make us better is the ultimate goal of anybody who seriously uses a stat.

Of course, if we find amateurs such as myself who treat the stats as a religion, they should be ostracized for it. Cold, unemotional reason drives productivity in this area forward.

*Some other interesting results I've found are that the best predictor of future scoring is expected goals for, while the best predictor of future goals allowed is actually Corsi against. And again, their regression analyses give results that would make my adviser kick me out of school, but anyone in the stats community will tell you that, according to the analysis, stats can only explain a small fraction of the standings. It's just a bigger chunk than other stats that get love and adoration (and which should get that love and adoration in the context of standings and happenings RIGHT NOW, but that should stop when carrying discussions forward into the future). This is why I can happily say "oh man, the Sabres' -50 goal differential is simply dreadful" and still sleep at night.

A small and interesting real-world example of these differences in predictive power: between 2009/2010 and 2014/2015, all six Cups are accounted for by the top four NHL teams in Corsi over that stretch, whereas the top 7 teams in raw goal differential (and this isn't even using PREDICTIVE power as much as DESCRIPTIVE - Corsi does better in other situations than it does here, and it's still winning) contain only 4 of the six Cup winners. I don't know how far back you have to go to get the other two, but I think it's pretty far. The top Corsi teams also have more playoff series wins in that stretch. This is an unsurprising result, and it should also be noted that it is not the result of a claim that EVERY CORSI LEADER WILL WIN EVERY CUP AND IN A SEVEN GAME SAMPLE SIZE NO BAD CORSI TEAM CAN BEAT A GOOD CORSI TEAM HER DER, unlike the accusations that seem to get thrown around every single year for some reason. I think users of advanced stats have a complete and stable grip on what their stats do and don't say, and I can't say that for a lot of people who hate advanced stat usage.

Stats are a tool that can be incorporated to give you more information and slightly better analysis of hockey. They claim no more and no less than this ability. Anybody who claims more, or asserts that such a claim is made when it isn't, should be dismissed. However, because of their limited scope, I always try to provide as many stats as possible, both advanced and non-advanced, along with video and photos if I have time, and I try to tie things back to the hockey we can see, because there IS an undeserved contempt for the eye test among some of the stats community as well. They aren't saints either: even if they understand their stats, they sometimes miss out on other things.

Edited July 6, 2018 by Randall Flagg

SDS · July 6, 2018

6 minutes ago, Randall Flagg said:

The straw man is that advanced stats communities claim to have predictive power over every single outcome to happen in hockey.

People just don't understand statistics and randomness in general. A 60% chance of an event occurring does not mean it happens 100% of the time just because it is more likely. It should be obvious, but it isn't, because most people are bad at math. Really bad.
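To put a number on it, here is a small sketch of how often a team that wins each game with probability 0.6 takes a best-of-seven series. It assumes games are independent with a constant per-game probability, which is a simplification; the 0.6 is just the hypothetical 60% from the post, and the function name is illustrative.

```python
from math import comb

def series_win_probability(p: float, wins_needed: int = 4) -> float:
    """Chance of winning a best-of-(2 * wins_needed - 1) series when each game
    is an independent win with probability p (a simplifying assumption)."""
    q = 1 - p
    # Clinch after exactly k losses: the k losses can fall anywhere among the
    # first (wins_needed - 1 + k) games, and the final game must be a win.
    return sum(comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
               for k in range(wins_needed))

print(round(series_win_probability(0.60), 3))  # about 0.71
```

So under this model, even a clear 60/40 favorite in every game loses roughly 29% of best-of-seven series.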

Samson's Flow · July 6, 2018

21 hours ago, Samson's Flow said:

I just feel like there is another 10 paragraph megapost from Randall coming in this thread...

I knew it was coming! Randall at least gave us a Cliff Notes version this time.

Randall Flagg · July 6, 2018

It's also unfortunate that using obnoxious stats jargon makes you sound fervently religious about it by default. It's hard to talk about the stupid things without coming off as elitist and eye-rollingly self-righteous.

Eye-rollingly. Going to show that one to my significant other who writes for a living

Samson's Flow · July 6, 2018

9 minutes ago, Randall Flagg said:

It's also unfortunate that using obnoxious stats jargon makes you sound fervently religious about it by default. It's hard to talk about the stupid things without coming off as elitist and eye-rollingly self-righteous.

Eye-rollingly. Going to show that one to my significant other who writes for a living

Don't worry, I get that. I know and use a lot of advanced stats in baseball (WAR, xFIP, wOBA, DRS, etc.), and they tend to come up in my conversation when I am trying to justify my preference for one player over another. Just using those terms when discussing baseball with the 'average' fan makes me sound elitist, and I'm well aware of that negative perception. But I sure as $hit am not going to use traditional stats like wins or RBIs when there are better predictive stats available.

pi2000 · July 6, 2018

Good post Flagg, thanks for that. I have a better understanding of advanced stats and the role they play.

My only argument is that for every season, going back forever, the league standings will almost always 100% align with goal differential. We don't see this same correlation with CorsiFor%.

That said, wouldn't it be true that goal differential is a better predictor of future success than CF%?

Samson's Flow · July 6, 2018

6 minutes ago, pi2000 said:

Good post Flagg, thanks for that. I have a better understanding of advanced stats and the role they play.

My only argument is that for every season, going back forever, the league standings will almost always 100% align with goal differential. We don't see this same correlation with CorsiFor%.

That said, wouldn't it be true that goal differential is a better predictor of future success than CF%?

As Flagg mentioned (I think), goal differential is aligned with past performance but is less successful in predicting future performance due to the streaky/random nature of goal scoring. CorsiFor% on the other hand is a stronger predictive stat since over a longer sample the teams generating the most shot events have proven to be most successful.

EDIT: I should clarify and say that goal differential is a good shorthand for overall team performance; it's just that CF% is the better predictive stat.

Edited July 6, 2018 by Samson's Flow

Randall Flagg · July 6, 2018

9 minutes ago, pi2000 said:

Good post Flagg, thanks for that. I have a better understanding of advanced stats and the role they play.

My only argument is that for every season, going back forever, the league standings will almost always 100% align with goal differential. We don't see this same correlation with CorsiFor%.

That said, wouldn't it be true that goal differential is a better predictor of future success than CF%?

That's what I tried to point out: if you freeze time at any given moment, goal differential will closely align with the standings, generally more so than Corsi up to that point. But the tests that show advanced stats improving on regular ones take each of those numbers right now and use them to check the standings 40 games later, or at any other point in the future. That is where advanced metrics do better, even though goal differential matches the current standings more closely. Namely, if you just use today's standings as your prediction for 40 games from now, there is statistically significant analysis showing you will get a worse prediction than someone using Corsi from right now (and that person will do worse than someone using an optimized combination of individual stats).

So no, that isn't true. Here's an example of a test done with the stats:

GF% is plus-minus converted into a percentage: goals for / (goals for + goals against). It is the goal differential stat, and it shows comprehensively worse predictive power at both the team and individual player level, which means it's simply less useful. Note that the claim isn't that shot-based metrics are the key to the universe, either. Those R^2 values are low. They aren't so low as to be useless, and we should note that if they approached 1, you'd never have to watch a hockey game again to know exactly what will happen and in what order, and we know that sports aren't like that and shouldn't be like that.

So in general, call your team's goal differential X. Right now, it is a good indicator of where you are in the standings. But if you use X now to figure out X later (and X later will be a good descriptor of where you are later), you will in general do worse than if you use shot-based metrics to predict X later, which will tell you a lot about the standings later. This has been shown to be comprehensively true through rigorous analysis designed to measure exactly that ability for any stat on the planet.
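The "use X now to predict X later" test can be sketched as a split-half comparison. This is a minimal illustration, not the specific analysis Randall cites: it assumes per-game team data in a hypothetical {team: [(cf, ca, gf, ga), ...]} format and uses the squared Pearson correlation as R².

```python
from statistics import fmean

def r_squared(xs, ys):
    """Squared Pearson correlation between a predictor and an outcome."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)  # undefined if either series is constant

def share(for_, against):
    """For / (for + against), e.g. CF% or GF% as a fraction."""
    total = for_ + against
    if total == 0:
        raise ValueError("no events recorded")
    return for_ / total

def split_half_r2(teams, split):
    """teams maps a team name to its games in order, each a hypothetical
    (cf, ca, gf, ga) tuple. Measures how well CF% and GF% over the first
    `split` games each explain GF% over the remaining games."""
    cf_early, gf_early, gf_late = [], [], []
    for games in teams.values():
        early, late = games[:split], games[split:]
        cf_early.append(share(sum(g[0] for g in early), sum(g[1] for g in early)))
        gf_early.append(share(sum(g[2] for g in early), sum(g[3] for g in early)))
        gf_late.append(share(sum(g[2] for g in late), sum(g[3] for g in late)))
    return {
        "early CF% -> later GF%": r_squared(cf_early, gf_late),
        "early GF% -> later GF%": r_squared(gf_early, gf_late),
    }
```

Fed a full league's per-game numbers with the split near the midpoint (e.g. split=41), this mirrors the "now vs. 41 games later" comparison described above; a handful of teams over one season will give very noisy R² values.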

Randall Flagg · July 6, 2018

Also note that all stats are weaker when analyzing specific players than when analyzing overall teams. That's why the example I highlighted in my first post is so telling (6 Cup winners in 6 years in the top 4 in Corsi), whereas there are some damn good players with damn bad Corsi (and plus-minus, and everything but production stats). That's why, when I talk about individuals, I try to incorporate as much context as humanly possible, because you can arrive at some dreadful conclusions if you don't take lots of things into account. Bad (or good) teams can do wild things to an individual player, which is why Adam Larsson gets traded for Taylor Hall, I guess.
