Jun 4, 2015
Data Mining the Play-By-Play
At long last, my thesis is complete:
Overall summary:
- Linear regression models demonstrating:
  - Full-strength tied Corsi% as a useful predictor of win percentage and goal percentage (“Full-strength tied Corsi” refers to a team’s shot attempts while the game is tied and both teams have five skaters on the ice; Corsi% is the team’s share of all such attempts)
  - The surprisingly distinct contributions of the powerplay and the penalty kill
- Pretty graphs! I mean, “data visualization”.
- A demonstration of how association rule learning and interest measures can be used to evaluate players and player combinations
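
For the curious, here is a minimal sketch of how the full-strength tied Corsi% from the first bullet might be computed from play-by-play data. The `ShotAttempt` record and its field names are hypothetical, not the format used in the thesis, and the sketch assumes every attempt in the list comes from a game involving the team in question.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ShotAttempt:
    """One shot attempt from the play-by-play (hypothetical schema)."""
    team: str           # team that took the attempt
    score_diff: int     # score difference at the time of the attempt (0 = tied)
    home_skaters: int   # skaters on the ice for the home team
    away_skaters: int   # skaters on the ice for the away team


def tied_full_strength_corsi_pct(
    attempts: Iterable[ShotAttempt], team: str
) -> Optional[float]:
    """Return the share (0..1) of tied, five-on-five shot attempts taken by `team`.

    Returns None when no attempt qualifies, so callers can tell
    "no data" apart from "0%".
    """
    qualifying = [
        a for a in attempts
        if a.score_diff == 0 and a.home_skaters == 5 and a.away_skaters == 5
    ]
    if not qualifying:
        return None
    taken = sum(1 for a in qualifying if a.team == team)
    return taken / len(qualifying)


sample = [
    ShotAttempt("VAN", 0, 5, 5),
    ShotAttempt("VAN", 0, 5, 5),
    ShotAttempt("CGY", 0, 5, 5),
    ShotAttempt("CGY", 1, 5, 5),   # not tied: ignored
    ShotAttempt("VAN", 0, 5, 4),   # powerplay: ignored
]
vancouver_pct = tied_full_strength_corsi_pct(sample, "VAN")
```

In the made-up sample, only the first three attempts count, so the two special-case rows show how score state and manpower filter the data before the percentage is taken.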
Less technical summary:
- If your team regularly has more shot attempts than its opponents while the game is tied and at full strength, that’s good
- The powerplay and penalty kill don’t seem to overlap much and have roughly equal value in explaining win percentage
- Pretty graphs! I mean, “data visualization”.
- Ever wondered who really carries the second line? A machine learning technique called “association rule learning” can help tease out the details to see which players tend to contribute more (and which ones should be bumped from the second line to the first).
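
To give a flavour of what “association rule learning” measures, here is a minimal sketch of three standard interest measures (support, confidence and lift) for a rule such as “these players are on the ice → goal for”. The shift data, the player labels and the `GOAL_FOR` marker are all made up for illustration, and this is not necessarily how the thesis encodes its play-by-play data.

```python
from typing import Iterable, List, Set, Tuple


def rule_measures(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Each transaction is a set of items, e.g. the players on the ice
    during a shift plus an outcome marker (hypothetical encoding).
    """
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    if n == 0:
        return 0.0, 0.0, 0.0
    n_a = sum(1 for t in transactions if a <= t)          # antecedent present
    n_c = sum(1 for t in transactions if c <= t)          # consequent present
    n_ac = sum(1 for t in transactions if (a | c) <= t)   # both present

    support = n_ac / n                          # how often the whole rule occurs
    confidence = n_ac / n_a if n_a else 0.0     # P(consequent | antecedent)
    lift = confidence / (n_c / n) if n_c else 0.0  # confidence vs. baseline rate
    return support, confidence, lift


# Made-up shifts: players on the ice, plus "GOAL_FOR" if the team scored.
shifts = [
    {"A", "B", "GOAL_FOR"},
    {"A", "B"},
    {"A", "C", "GOAL_FOR"},
    {"B", "C"},
]
measures_a = rule_measures(shifts, {"A"}, {"GOAL_FOR"})
measures_b = rule_measures(shifts, {"B"}, {"GOAL_FOR"})
```

A lift above 1 means goals show up more often than the baseline rate when that player (or combination) is on the ice; a lift below 1 means less often. In this toy data, player A ends up above 1 and player B below it.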