Thursday, February 5, 2015

Preliminary Analysis of Error in NHL's RTSS Data

[Edit: Error in RTSS data was first documented in 2007 by Alan Ryder in "Product Recall Notice for 'Shot Quality'" (see References). Thanks to an anonymous commenter for pointing this out.]


Introduction


It's no secret that the NHL's Real Time Scoring System (RTSS) is flawed. In 2009, hockey analytics pioneer JLikens wrote about the prevalence of bias in the NHL's shot data. Michael Schuckers and Brian MacDonald have since developed models to correct for the level of statistically evident bias in each NHL rink. These types of adjustments are helpful, but the true level of error cannot be observed without a comparative data set. Few have taken on the challenge of collecting the data themselves. Chris Boyle is one of these brave souls. In 2013, he conducted a comparative analysis in which he found that 10 of 32 shots on goal "were off by more than 10 feet" and that approximately half of the shots "were accurate to within 5 feet."

The analysis herein is similar to Boyle's, but thanks to the incredible work done by WAR On Ice, we can take it a few analytical steps further. Using WAR On Ice's shot plots, we assess the RTSS location error for goals, shots on goal, missed shots, and blocked shots. Accurate shot locations are plotted as an overlay on the RTSS plots, allowing us to clearly see the discrepancies.


Methods


Game selection.
To select which games to analyze, three criteria had to be met: the games had to be (1) recently played, (2) in different rinks, and (3) chosen with no selection bias on my part. The easiest solution was to pick a single day with a decent number of games: January 21st, 2015.

Location capture.
To capture the location from which each shot was taken, NHL Gamecenter was used. Slow-motion replay captured the puck's location at the instant before the shooting motion sent it toward the goal or, in the case of deflections, at the instant it was tipped. While I don't have video software that returns precise x,y coordinates, the offensive zone provides concrete reference points like the faceoff circles and hash marks. Using these, I was able to locate shots with a high level of precision.

RTSS shot locations are plotted by WAR On Ice's coloured letters: the red G for goal; the blue S for shot on goal; the black M for missed shot; and the green B for blocked shot. The Actual shot locations are plotted with bolded rings. Black lines have been added to connect RTSS and Actual plots where it may not be obvious that they represent the same shot.


Goal Data

We begin our analysis with goal data. This is the shot-type that affords the official NHL "scorers" the greatest possible benefit of the doubt. This is the only shot-type that comes guaranteed with replays and a 45-second break. In other words, we can assume (but we won't) that goal location data is the most accurate of all the shot-types. Goals are also the rarest of the shot-types, making them easier to distinguish on the plots.

For this comparison the Actual plot rings are colour-coded to reflect the degree of error in the RTSS plots. I'm using the following criteria to measure error: the distance between the Actual and the RTSS plot, the shooting angle difference between the Actual and the RTSS plot, and whether they differ in being in/out of the scoring chance area. Because I want to publish this before the Analytics Conference and don't have time to measure the actual deviations in distance and shooting angle, I'm using the following three-point scale:
  1. Accurate (green). RTSS and Actual plots have some degree of overlap, and all of the error criteria listed above are met.
  2. Acceptable (orange). RTSS and Actual are not a significant distance apart, and at least 2 of the 3 error criteria are met. 
  3. Unacceptable (red). One or fewer criteria are met.
(Feel free to be your own judge, too.)
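The three-point scale is applied by eye in this post. As a rough illustration of how it could be made repeatable, the sketch below computes the three error criteria for one Actual/RTSS pair and maps them onto the scale. Every number in it is an assumption, not a value the post uses: the coordinate system (feet, net centred at (89, 0)), the tolerances, and the rectangular stand-in for the scoring chance area are hypothetical placeholders.

```python
import math

# Assumed coordinate system (not stated in the post): feet, origin at centre ice,
# attacking net centred at (89, 0) on the goal line.
GOAL_X, GOAL_Y = 89.0, 0.0

# Hypothetical tolerances; the post judges these by eye.
OVERLAP_FT = 3.0        # plots "have some degree of overlap"
DIST_TOL_FT = 10.0      # distance criterion
ANGLE_TOL_DEG = 10.0    # shooting-angle criterion
SIGNIFICANT_FT = 20.0   # "a significant distance apart"


def shooting_angle(x, y):
    """Degrees between the shot line and the line straight out from the net."""
    return math.degrees(math.atan2(abs(y - GOAL_Y), GOAL_X - x))


def in_scoring_area(x, y):
    """Hypothetical placeholder for the scoring chance area: a box between the faceoff dots."""
    return x >= 69.0 and abs(y) <= 22.0


def classify(actual, rtss):
    """Map one Actual/RTSS location pair onto the Accurate/Acceptable/Unacceptable scale."""
    (ax, ay), (rx, ry) = actual, rtss
    dist = math.hypot(ax - rx, ay - ry)
    met = sum([
        dist <= DIST_TOL_FT,
        abs(shooting_angle(ax, ay) - shooting_angle(rx, ry)) <= ANGLE_TOL_DEG,
        in_scoring_area(ax, ay) == in_scoring_area(rx, ry),
    ])
    if met == 3 and dist <= OVERLAP_FT:
        return "Accurate"
    if met >= 2 and dist <= SIGNIFICANT_FT:
        return "Acceptable"
    return "Unacceptable"  # one or fewer criteria met, or plots far apart


print(classify((80, 5), (81, 6)))  # Accurate
```

With these placeholder values, an Actual shot at (80, 0) that RTSS plots at (72, 0) meets all three criteria but no longer overlaps, so it drops to Acceptable. Tightening or loosening the constants is the code equivalent of "being your own judge".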


Results


Note: WAR On Ice adjusts RTSS location data, so these results reflect the RTSS error that persists after WAR On Ice's adjustment.


Game 1 - Toronto @ Ottawa at the Canadian Tire Centre.
  • 4/7 Accurate
  • 2/7 Acceptable
  • 1/7 Unacceptable


Game 2 - Chicago @ Pittsburgh at Consol Energy Center.

  • 3/4 Accurate
  • 1/4 Acceptable


Game 3 - Columbus @ Winnipeg at the MTS Centre

  • 3/4 Accurate
  • 1/4 Unacceptable


Game 4 - Boston @ Colorado at the Pepsi Center.

  • 3/4 Accurate
  • 1/4 Acceptable


Game 5 - Calgary @ Anaheim at the Honda Center

  • 3/8 Accurate
  • 2/8 Acceptable
  • 3/8 Unacceptable



Game 6 - Los Angeles @ San Jose at the SAP Center

  • 1/5 Accurate
  • 2/5 Acceptable
  • 2/5 Unacceptable
  • (The barred goal represents Couture's empty-netter and is therefore not counted)


Totals: 32 non-empty net goals were scored in 6 different buildings.
  • 53% of the goals were accurately plotted (17/32)
  • 25% were acceptably plotted (8/32)
  • 22% were unacceptably plotted (7/32)
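The totals can be checked directly from the per-game tallies. This minimal Python sketch sums the three categories across the six games listed above and rounds each share to a whole percentage.

```python
# (accurate, acceptable, unacceptable) goal tallies per game, from the results above.
GAMES = {
    "Toronto @ Ottawa": (4, 2, 1),
    "Chicago @ Pittsburgh": (3, 1, 0),
    "Columbus @ Winnipeg": (3, 0, 1),
    "Boston @ Colorado": (3, 1, 0),
    "Calgary @ Anaheim": (3, 2, 3),
    "Los Angeles @ San Jose": (1, 2, 2),  # Couture's empty-netter excluded
}


def summarize(games):
    """Return (total goals, per-category totals, per-category whole percentages)."""
    totals = [sum(column) for column in zip(*games.values())]
    n = sum(totals)
    shares = [round(100 * t / n) for t in totals]
    return n, totals, shares


n, totals, shares = summarize(GAMES)
print(n, totals, shares)  # 32 [17, 8, 7] [53, 25, 22]
```

The same helper works for the MSG games below by swapping in their tallies.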


Goals at MSG


WAR On Ice co-founder @acthomas noted that their models account for a relatively high level of error and bias in data from Madison Square Garden. Using the same methods as above, we look at the 5 most recent games played at MSG before January 21st, 2015. This provides a recent sample of games without any selection bias on my part.


Results


Game 1 - January 20th 2015 vs. Ottawa
  • 2/5 Accurate
  • 2/5 Acceptable
  • 1/5 Unacceptable


Game 2 - January 13th 2015 vs. NY Islanders
  • 1/3 Accurate
  • 1/3 Acceptable
  • 1/3 Unacceptable


Game 3 - January 3rd 2015 vs. Buffalo
  • 1/7 Accurate
  • 3/7 Acceptable
  • 3/7 Unacceptable - 1 of these represents a first-period goal that is missing from WAR On Ice's shot plot. This could be due to two goals having the exact same coordinates and therefore overlapping each other perfectly.


Game 4 - December 27th 2014 vs. New Jersey
  • 1/3 Accurate
  • 1/3 Acceptable - significant change in shooting angle
  • 1/3 Unacceptable - significant change in shooting angle, and difference in scoring chance area.


Game 5 - December 23rd 2014 vs. Washington
  • 1/6 Accurate
  • 2/6 Acceptable
  • 3/6 Unacceptable - the most egregious of these is a goal that was initially credited to St. Louis in front of the net, but was later credited to Rick Nash from the right circle. This is not necessarily the fault of the scorers, but rather a fault in the nature of real-time data.

Totals at MSG: 24 non-empty net goals were scored
  • 25% were plotted accurately (6/24)
  • 37.5% were plotted acceptably (9/24)
  • 37.5% were plotted unacceptably (9/24)
The data suggests that MSG does indeed have a problem with collecting location data, above and beyond the error already exhibited in every single arena for which data was collected.
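How big is the MSG gap? One rough way to check whether 6 accurate goals out of 24 at MSG could plausibly share the accuracy rate of 17 out of 32 in the other six buildings is a pooled two-proportion z-test. This is a swapped-in technique, not part of the original analysis; with samples this small and a subjective accuracy scale, the result is a sanity check rather than proof.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal, via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Accurate goals: MSG (6 of 24) vs. the six non-MSG buildings (17 of 32).
z, p = two_proportion_z(6, 24, 17, 32)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these counts, z comes out at roughly -2.1 and the two-sided p-value at roughly 0.03, which is consistent with (but does not by itself establish) a genuine MSG effect.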


Shots on Goal, Missed Shots, Blocked Shots


Below are the shot plots from the first, second, and third periods of the Calgary @ San Jose game which took place on January 17th, 2015.

In the modified plots below, the Actual plot rings are given the same shot-type colour code that WAR On Ice uses to plot the RTSS data. The Actual rings are given diagonal bars in cases where a shot-type discrepancy occurs between the Actual plots and the RTSS plots. The ring colour represents the Actual shot-type, and the bar colour represents the shot-type recorded by RTSS. A red bar indicates RTSS didn't track the shot at all.
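The colour-and-bar convention amounts to a small lookup. The sketch below encodes it for a single Actual shot; the type labels ("goal", "shot", "miss", "block") are hypothetical names chosen for illustration, while the colours follow the WAR On Ice scheme described under Methods.

```python
from typing import Optional, Tuple

# WAR On Ice shot-type colours, as described in the Methods section.
COLOURS = {"goal": "red", "shot": "blue", "miss": "black", "block": "green"}


def ring_and_bar(actual_type: str, rtss_type: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (ring colour, bar colour) for one Actual shot; bar is None when types agree.

    rtss_type is None when RTSS did not record the shot at all.
    """
    ring = COLOURS[actual_type]          # ring colour = Actual shot type
    if rtss_type is None:
        return ring, "red"               # red bar: RTSS did not track the shot
    if rtss_type != actual_type:
        return ring, COLOURS[rtss_type]  # bar colour = type RTSS recorded
    return ring, None                    # no discrepancy: no bar


print(ring_and_bar("miss", "shot"))  # ('black', 'blue')
```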


[Shot plots for the 1st, 2nd, and 3rd periods]


The same three-point accuracy test used for goals is applied to each shot. Note that due to the volume of shots, it's more difficult to relate each Actual plot to its RTSS counterpart. I've done this as best as I can, offering the benefit of the doubt to RTSS where there is discretion.

In the first three periods of this game, a total of 33 shots on goal and 20 missed shots were taken.


Shots on Goal

  • 30.3% were accurately plotted (10/33)
  • 48.5% were acceptably plotted (16/33)
  • 21.2% were unacceptably plotted (7/33)


Missed Shots

  • 35% were accurately plotted (7/20)
  • 45% were acceptably plotted (9/20)
  • 20% were unacceptably plotted (4/20)

Shots on goal and missed shots scored similarly. This is to be expected because there is no inherent difference between tracking a shot on goal and a missed shot in terms of location.
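The percentages in these two lists can be recomputed from the raw counts. This short helper, a sketch rather than anything used in the original analysis, divides each count by the category total and rounds to one decimal place.

```python
def breakdown(counts):
    """Share of each accuracy category, as a percentage rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


sog = breakdown({"accurate": 10, "acceptable": 16, "unacceptable": 7})
missed = breakdown({"accurate": 7, "acceptable": 9, "unacceptable": 4})
print(sog)     # {'accurate': 30.3, 'acceptable': 48.5, 'unacceptable': 21.2}
print(missed)  # {'accurate': 35.0, 'acceptable': 45.0, 'unacceptable': 20.0}
```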


Blocked Shots


The Actual and RTSS plots for blocked shots have different operational definitions, making a location comparison impossible. The Actual plots show the location from where each shot is taken. Conversely, RTSS data plots the location of where the shot gets blocked. This is a critical consideration when developing scoring chance metrics using RTSS blocked shot location data. If the NHL is getting serious about stats like Corsi, the league must understand that blocked shots are important because of the shot itself, not because the shot gets blocked. As such, the NHL should ensure that location data for blocked shots captures where on the ice the shot is taken from. (You can track both locations if your heart so desires.) 
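Tracking both locations only requires storing two points per blocked shot. The sketch below is a hypothetical record (the field names and the net position at (89, 0) in feet are assumptions) that keeps the release point, which scoring chance metrics need, alongside the block point that RTSS records.

```python
import math
from dataclasses import dataclass
from typing import Tuple

# Assumed rink coordinates in feet, with the attacking net centred at (89, 0).
NET: Tuple[float, float] = (89.0, 0.0)


@dataclass
class BlockedShot:
    shot_xy: Tuple[float, float]   # where the shooter released the puck
    block_xy: Tuple[float, float]  # where the puck was blocked (what RTSS plots)

    def shot_distance(self) -> float:
        """Distance from the release point to the net: what a scoring chance metric needs."""
        return math.dist(self.shot_xy, NET)

    def block_distance(self) -> float:
        """Distance from the block point to the net: what RTSS location data implies."""
        return math.dist(self.block_xy, NET)


# Hypothetical example: a shot from near the top of the circles, blocked in the slot.
b = BlockedShot(shot_xy=(58.0, 10.0), block_xy=(75.0, 4.0))
print(round(b.shot_distance(), 1), round(b.block_distance(), 1))  # 32.6 14.6
```

In this made-up example, a shot released about 33 feet from the net would look, from the block location alone, like a roughly 15-foot attempt from the slot.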


Insertion, Deletion, & Substitution Errors


So far, I have compared the shots recorded by RTSS against my own tracking for two complete games. Moving forward, I will compare all of my tracked shot data against RTSS data. This provides a level of quality control, however minimal, and keeps tabs on the number of shot errors in RTSS data.

I compared my data to RTSS by going through both sets shot by shot, tagging every discrepancy and reviewing it using NHL Gamecenter video. In this analysis, only non-discretionary discrepancies are included. Three types of errors are tracked:
  • Insertion errors - RTSS records a shot where a shot should not be recorded (i.e. false positive)
  • Deletion errors - RTSS does not record a shot where a shot should be recorded (i.e. false negative)
  • Substitution errors - RTSS records the shooting player or shot type incorrectly.
Each shooting play can only be credited with one error, so that the total number of errors represents the total number of faulty shooting plays.
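Once each shooting play has been matched between the two data sets (the video-review step, which the code does not attempt), the three error types follow from a simple comparison. In this sketch, `ShotEvent` and `classify_play` are hypothetical names, a missing side is represented as `None`, and each play yields at most one error, mirroring the one-error-per-play rule.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShotEvent:
    shooter: str
    shot_type: str  # hypothetical labels, e.g. "shot", "miss", "block"


def classify_play(actual: Optional[ShotEvent], rtss: Optional[ShotEvent]) -> Optional[str]:
    """Return the single error for one matched shooting play, or None if RTSS got it right.

    actual is None when no shot should have been recorded;
    rtss is None when RTSS recorded nothing for the play.
    """
    if actual is None and rtss is None:
        return None            # nothing happened and nothing was recorded
    if actual is None:
        return "insertion"     # false positive
    if rtss is None:
        return "deletion"      # false negative
    if actual != rtss:
        return "substitution"  # wrong shooter or wrong shot type
    return None


# Hypothetical matched plays: one per error type plus one correct record.
plays = [
    (None, ShotEvent("Player A", "shot")),
    (ShotEvent("Player B", "miss"), None),
    (ShotEvent("Player C", "shot"), ShotEvent("Player D", "shot")),
    (ShotEvent("Player E", "block"), ShotEvent("Player E", "block")),
]
errors = Counter(e for e in (classify_play(a, r) for a, r in plays) if e)
print(errors)
```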


Results


Edmonton @ Calgary - January 31st 2015
  • 3 insertion errors
  • 12 deletion errors
  • 8 substitution errors
Calgary @ San Jose - January 17th 2015
  • 6 insertion errors
  • 11 deletion errors
  • 1 substitution error
As a result, RTSS recorded (for both games combined):
  • a total of 82 shots on goal, when only 75 actually took place (+9% error).
  • a total of 119 missed + blocked shots, when 132 actually took place (-10% error)
Note that WAR On Ice uses Schuckers & MacDonald's rink effect model to adjust shot counts - this may correct some of the error presented here.
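The percentage errors above are relative differences, (recorded - actual) / actual. The sketch below reproduces them and adds the combined figure for all shot attempts, which bears on the Discussion's point about total counts.

```python
def pct_error(recorded, actual):
    """Signed percentage error of a recorded count relative to the true count."""
    return 100 * (recorded - actual) / actual


sog_err = pct_error(82, 75)                       # shots on goal
missed_blocked_err = pct_error(119, 132)          # missed + blocked shots
all_attempts_err = pct_error(82 + 119, 75 + 132)  # all shot attempts
print(f"{sog_err:+.1f}% {missed_blocked_err:+.1f}% {all_attempts_err:+.1f}%")
# +9.3% -9.8% -2.9%
```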


Discussion


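There is very little error in RTSS counts for total shot attempts: across both games, RTSS recorded 201 attempts (82 + 119) against 207 that actually took place (75 + 132), a difference of about 3%. Insertions partially offset deletions, and substitution errors do not affect the totals. However, these errors could affect analyses of individual players (e.g. iCF) over small sample sizes.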

Not surprisingly, shots on goal are over-reported. Scorers in general are trigger-happy, often recording dump-ins and broken plays that trickle onto the goal as shots. At least RTSS records the zone from which shots are taken, so for stats like Corsi, shots from the neutral and defensive zones can be filtered out of the counts. For goalies, the result is inflated save percentages, but in relative terms (comparing goalies to one another) this should have little to no effect, especially over large sample sizes.

The RTSS location data is where error abounds. This has a major impact on location-based metrics such as scoring chances. Consistently across all shot types, at least 20% of shots are plotted with a high degree of error. As expected, (non-MSG) goals fare the best, being plotted accurately 53% of the time. Shots on goal and missed shots are plotted accurately only about 30-35% of the time. Blocked shots present an entirely different problem, as RTSS tracks the location of where the shot gets blocked and not where the shot is taken. For scoring chance metrics, this means that RTSS blocked shot plots are virtually inadmissible.

The RTSS location data isn't useless - WAR On Ice has developed predictive scoring chance metrics despite the error. But the potential for improvement is huge, and rests on collecting accurate data. So where do we go from here?

With Sportvision slated to insert itself into every puck and jersey by the start of next season, many of the problems discussed here could be solved. For one, the technology has the ability to track precise shot locations. As of now though, we don't know if and when this data will be made public. Sportvision also has the potential to capture novel data sets such as passing data, but it could be years until they develop the code to parse the raw data, and the question surrounding public availability remains.

The idea that Sportvision is going to solve the community's analytical needs is misguided. Technology has always worked best in conjunction with human ability, and that could not be more true for hockey analysis. Chips in pucks and jerseys capture the location and speed of on-ice events, but only humans can judge intent. Intent-based judgments are essential for differentiating between skilled plays and lucky plays, a critical factor for predictive models. This is something technology simply cannot assess.

Better data is here, right now, displaying itself openly every time we watch a game. It's time for the community to step up and collect the data - any data we want! - using technology as an aid and adhering to scientific principles. The potential is massive, it's present, and it doesn't rest in the hands of the NHL or Sportvision. It's in our hands.


***

References


"Product Recall Notice for 'Shot Quality'". Alan Ryder. http://hockeyanalytics.com/2007/06/product-recall-notice-for-shot-quality/

"Fancystats community shocked, saddened to learn of passing of hockey analytics pioneer "JLikens", a.k.a. Edmonton lawyer Tore Purdy". Bruce McCurdy. http://blogs.edmontonjournal.com/2014/05/25/fancystats-community-shocked-saddened-to-learn-of-passing-of-hockey-analytics-pioneer-jlikens/

"Home Recording Bias: Shots on Goal". JLikens. http://objectivenhl.blogspot.ca/2009/03/in-previous-posts-it-was-shown-how-some.html

"Accounting for Rink Effects in the National Hockey League's Real Time Scoring System". Michael Schuckers and Brian MacDonald. http://arxiv.org/pdf/1412.1035.pdf

"How Reliable is the NHL.com Shot Tracker?". Chris Boyle. http://www.habseyesontheprize.com/2013/2/20/4005122/how-reliable-is-the-nhl-com-shot-tracker

"NHL.com adding Corsi, Fenwick, enhanced stats next month". Greg Wyshynski. http://sports.yahoo.com/blogs/nhl-puck-daddy/nhl-com-adding-corsi--fenwick--enhanced-stats-next-month-233506566.html

"NHL, Sportvision test program to track players, puck". Corey Masisak. http://www.nhl.com/ice/news.htm?id=750201

2 comments:

  1. The problem was first documented in 2007:
    http://hockeyanalytics.com/2007/06/product-recall-notice-for-shot-quality/

    1. Thanks for linking this my way. I've put it in my notes and will edit accordingly.
