About DATARENA
Data professionals, reimagining public sports analytics
We are Data People
We are a group of data scientists, engineers and analysts passionate about our craft. We believe in learning by doing, and in the power of community.
DATARENA is the manifestation of our group's spirit pointed towards public sports data. We've embarked on a journey to simplify and enhance public sports analytics, starting with the beloved game of hockey.
Mission
Our mission is clear: to deliver advanced sports analytics that are accessible to everyone. We believe that sports analytics should be community-driven, openly accessible, and constantly evolving. Hockey is just the beginning of our journey.
Approach
We approach our work with open-mindedness, a commitment to excellence, and fun. We value community feedback and collaboration, and we're constantly seeking innovative ways to improve our analytics tools and services.
Data Quality
In the world of data, quality is paramount. It's often said that the output is only as good as the input, and this holds especially true in the realm of public hockey data.
At DATARENA, we've dedicated over a year to meticulously crafting what we proudly consider a new gold standard in public hockey databases, which we were fortunate enough to share at the 2022 SEAHAC conference.
Our ecosystem stands as a testament to our commitment to cleanliness, comprehensiveness, and cutting-edge quality. To achieve this, we've harnessed the power of tools like dlt, DuckDB, dbt, and a calibrated machine-learning expected-goals (xG) model, alongside an array of data science and engineering resources to unearth high-quality insights from top-tier data sources.
Data Source
Our primary data source is the NHL API, which serves as the backbone of our data ecosystem. We follow a rigorous process of data extraction, transformation, and modeling to create structured relational tables that fuel all our downstream operations.
While working with this API can be quite intricate, we owe our success to the dedicated efforts of the community and its members. Their work has enabled us to navigate the complexities of this data source, despite its often poorly documented and occasionally buggy nature.
Predictive Models
We explore, test, and build our own predictive models for the purposes of quantifying impact and understanding sports as objectively as we can.
Predictive models like Expected Goals (xG) analyze historical data on factors like shot location, angle, and player positions to estimate the likelihood of a shot resulting in a goal. By using xG, we can try to assess a player's (or team's) offensive and defensive performance, help inform lineup decisions, and develop strategic game plans based on data-driven insights.
However, these models are not perfect. Elements like "time" and "space" are crucial to the "eye-test" when judging how dangerous a shot is, but the data needed to measure them is not available to the public. Passing and live player-tracking data do exist, but they are not (yet) public.
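To make the idea concrete, here is a minimal sketch of how shot location and angle can be turned into a goal probability with a logistic function. It is an illustration only, not DATARENA's model: the function names, the goal-line coordinate, and every coefficient are assumptions chosen so that closer, more central shots score higher.

```python
import math

# Assumption: coordinates are in feet from centre ice, with the goal line at x = 89.
NET_X = 89.0

def shot_distance_and_angle(x, y):
    """Return (distance in feet, angle in degrees) from a shot location to the net."""
    dx = NET_X - abs(x)                       # distance along the length of the rink
    distance = math.hypot(dx, y)              # straight-line distance to the net
    angle = math.degrees(math.atan2(abs(y), dx))  # 0 degrees = straight on
    return distance, angle

def toy_xg(x, y, intercept=-1.0, b_dist=-0.05, b_angle=-0.01):
    """Toy expected-goals estimate; the coefficients are made up, not fitted."""
    distance, angle = shot_distance_and_angle(x, y)
    z = intercept + b_dist * distance + b_angle * angle
    return 1.0 / (1.0 + math.exp(-z))         # logistic function maps z to (0, 1)

# A shot from the slot versus a shot from the point
slot = toy_xg(80, 0)
point = toy_xg(40, 0)
```

A real model would be fitted on historical shots and would use more features, but the shape is the same: features go in, a probability between 0 and 1 comes out. Note that this sketch uses only public location data, so it says nothing about the "time" and "space" a shooter had.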
Assumptions & Limitations
It's important to clarify that our current focus is descriptive rather than predictive in nature. While we employ predictive models like xG, their primary purpose is to describe expected outcomes versus actual events. For example, we compare expected vs. actual goals on all Fenwick shots (shots on goal plus missed shots, excluding blocked shots).
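As a rough sketch of that descriptive comparison, the snippet below sums actual goals and expected goals over unblocked shots only. The `Shot` class, the event labels, and the xG values are hypothetical and are not taken from our database schema; goals are given their own label here but still count as shots on goal.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    event_type: str  # hypothetical labels: "goal", "shot_on_goal", "missed_shot", "blocked_shot"
    xg: float        # expected-goals value assigned to the shot by some model

# Fenwick shots are unblocked attempts: goals, saved shots on goal, and misses.
FENWICK_EVENTS = {"goal", "shot_on_goal", "missed_shot"}

def goals_vs_expected(shots):
    """Return (actual goals, expected goals, difference) over Fenwick shots."""
    fenwick = [s for s in shots if s.event_type in FENWICK_EVENTS]
    actual = sum(1 for s in fenwick if s.event_type == "goal")
    expected = sum(s.xg for s in fenwick)
    return actual, expected, actual - expected

# Illustrative data only
shots = [
    Shot("goal", 0.25),
    Shot("shot_on_goal", 0.5),
    Shot("missed_shot", 0.25),
    Shot("blocked_shot", 0.5),  # excluded from the Fenwick sample
]
actual, expected, diff = goals_vs_expected(shots)
```

A positive difference means more goals were scored than the model expected on those shots, and a negative one means fewer. On its own the number describes what happened; it does not isolate why.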
True player evaluation requires isolating individual effects and making predictions from the available data, using methods such as WAR (wins above replacement), SPAR (standings points above replacement), or RAPM (regularized adjusted plus-minus). We are committed to exploring these advanced methods and maintaining transparent communication about our assumptions, limitations, and methodologies.
Acknowledgments
We want to express our sincere gratitude to the existing work in the public hockey analytics space that has greatly influenced our own endeavors. Our work wouldn't have been feasible without the invaluable contributions of those who preceded us.
- Drew Hynes' nhlapi repository
- Evolving Hockey
- HockeyViz
- Moneypuck
- Natural Stat Trick
- CapFriendly (shut down 2024)
- Hockey Prospecting
- All Three Zones
- EP Rinkside
A special mention to JFresh and Dom Luszczyszyn, who consistently share exceptional work on their respective X accounts. Your contributions have paved the way for progress in hockey analytics, and we are excited to join this journey of innovation and discovery.
