About DATARENA

Data professionals, reimagining public sports analytics

We are Data People

We are a group of data scientists, engineers and analysts passionate about our craft. We believe in learning by doing, and in the power of community.

DATARENA is the manifestation of our group's spirit pointed towards public sports data. We've embarked on a journey to simplify and enhance public sports analytics, starting with the beloved game of hockey.

Mission

Our mission is clear: to deliver advanced sports analytics that are accessible to everyone. We believe that sports analytics should be community-driven, openly accessible, and constantly evolving. Hockey is just the beginning of our journey.

Approach

We approach our work with open-mindedness, a commitment to excellence, and fun. We value community feedback and collaboration, and we're constantly seeking innovative ways to improve our analytics tools and services.

Data Quality

In the world of data, quality is paramount. It's often said that the output is only as good as the input, and this holds especially true in the realm of public hockey data.

At DATARENA, we've dedicated over a year to meticulously crafting what we proudly consider a new gold standard in public hockey databases, which we were fortunate enough to share at the 2022 SeaHack conference.

Our ecosystem stands as a testament to our commitment to cleanliness, comprehensiveness, and cutting-edge quality. To achieve this, we've harnessed the power of tools like dlt, DuckDB, dbt, and a calibrated machine-learning expected-goals (xG) model, alongside an array of data science and engineering resources to unearth high-quality insights from top-tier data sources.

Data Source

Our primary data source is the NHL API, which serves as the backbone of our data ecosystem. We follow a rigorous process of data extraction, transformation, and modeling to create structured relational tables that fuel all our downstream operations.

While working with this API can be quite intricate, we owe our success to the dedicated efforts of the community and its members. Their work has enabled us to navigate the complexities of this data source, despite its often poorly documented and occasionally buggy nature.

Predictive Models

We explore, test, and build our own predictive models for the purposes of quantifying impact and understanding sports as objectively as we can.

Predictive models like Expected Goals (xG) use historical data on factors such as shot location, angle, and player positions to estimate the likelihood of a shot resulting in a goal. By using xG, we can try to assess a player's (or team's) offensive and defensive performance, help inform lineup decisions, and develop strategic game plans based on data-driven insights.
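To make the idea concrete, here is a minimal sketch of how shot location and angle can be turned into a goal probability. It is not our production model: the function names, the logistic form, and the coefficients are all hypothetical and chosen only for illustration, and the net position (89 feet from center ice along the x-axis) assumes the usual NHL rink coordinate convention.

```python
import math

# Assumption: coordinates are in feet with center ice at (0, 0) and the
# attacking net centered on the goal line at x = 89.
GOAL_X, GOAL_Y = 89.0, 0.0


def shot_features(x, y):
    """Return (distance in feet, angle in degrees) from a shot location to the net."""
    dx = GOAL_X - x
    dy = GOAL_Y - y
    distance = math.hypot(dx, dy)
    # 0 degrees means straight in front of the net; larger means a sharper angle.
    angle = math.degrees(math.atan2(abs(dy), dx))
    return distance, angle


def toy_xg(x, y, intercept=-1.0, w_distance=-0.05, w_angle=-0.02):
    """Toy logistic xG. The default coefficients are made up, not fitted to data."""
    distance, angle = shot_features(x, y)
    z = intercept + w_distance * distance + w_angle * angle
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    for location in [(80.0, 0.0), (60.0, 20.0), (30.0, -35.0)]:
        print(location, round(toy_xg(*location), 3))
```

A real xG model would be fitted to historical shots and would typically use many more features, but the shape is the same: features in, probability out.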

However, these models are not perfect. Elements like "time" and "space" are crucial to the "eye-test" when evaluating a shot's xG, but the key data points required to measure them are not available to the public. Passing events and live player-tracking data do exist, but they are not (yet) public.

Assumptions & Limitations

It's important to clarify that our current focus is descriptive rather than predictive in nature. While we employ predictive models like xG, their primary purpose is to describe expected outcomes versus actual events: for example, comparing expected vs. actual goals on all Fenwick shots (unblocked shot attempts, i.e. shots on goal plus missed shots).
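As a rough sketch of that descriptive comparison, the snippet below sums xG over Fenwick shots and sets it against the goals actually scored. The `Shot` type, the event-type labels, and the xG values are hypothetical placeholders for illustration, not our actual schema or data.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    event_type: str  # hypothetical labels: "goal", "shot-on-goal", "missed-shot", "blocked-shot"
    xg: float        # model-estimated probability that this shot becomes a goal


# Fenwick = unblocked shot attempts (goals count as shots on goal).
FENWICK_EVENTS = {"goal", "shot-on-goal", "missed-shot"}


def goals_vs_expected(shots):
    """Return (actual goals, expected goals, difference) over Fenwick shots only."""
    fenwick = [s for s in shots if s.event_type in FENWICK_EVENTS]
    actual = sum(1 for s in fenwick if s.event_type == "goal")
    expected = sum(s.xg for s in fenwick)
    return actual, expected, actual - expected


if __name__ == "__main__":
    # Made-up sample; the blocked shot is excluded from the comparison.
    sample = [
        Shot("goal", 0.5),
        Shot("shot-on-goal", 0.25),
        Shot("missed-shot", 0.25),
        Shot("blocked-shot", 0.125),
    ]
    print(goals_vs_expected(sample))
```

A positive difference means more goals were scored than the model expected from those shots, and a negative one means fewer. On its own this describes what happened; it does not isolate why.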

A true player evaluation requires isolating individual effects and making predictions based on available data, using methods such as WAR (Wins Above Replacement), SPAR (Standings Points Above Replacement), or RAPM (Regularized Adjusted Plus-Minus). We are committed to exploring these advanced methods and maintaining transparent communication about our assumptions, limitations, and methodologies.

Acknowledgments

We want to express our sincere gratitude to the existing work in the public hockey analytics space that has greatly influenced our own endeavors. Our work wouldn't have been feasible without the invaluable contributions of those who preceded us.

A special mention goes to JFresh and Dom Luszczyszyn, who consistently share exceptional work on their respective X accounts. Their contributions have paved the way for progress in hockey analytics, and we are excited to join this journey of innovation and discovery.