December 10, 2024 in Data Science Education
The SCORE Network
https://doi.org/10.1287/orms.2024.04.07
The consensus among researchers and educators strongly supports the use of real-world data in teaching data science and statistical concepts; however, two main obstacles limit educators: time and concept alignment. Educators struggle to find enough time to identify and curate real-world datasets that align with lesson objectives or reinforce key statistical or data science concepts. Consequently, they often resort to prepackaged textbook datasets, well-established historical and pedagogical datasets that are used repeatedly, or simplified or artificial “toy” datasets. Even when educators do find a dataset that has a clear use case, they often don’t fully develop the lesson for a broad curriculum or provide their content online for others to use.
To address these issues, and to motivate student interest in statistics, data science and sports analytics, we created the SCORE Network (https://scorenetwork.org/). SCORE stands for Sports Content for Outreach, Research, and Education and is an NSF-funded project that has the following goals:
- Create a sustainable national network of academic, industry, media and government partners to build a platform for the purpose of elevating data science education, particularly in underrepresented populations and minorities.
- Develop, implement, evaluate and disseminate an agile, educational framework based on case-based learning [1], an authentic learning method involving real-world problems and applications [2].
- Support and generate educational research on different data science education delivery modalities (in-person, virtual, hybrid) for student and instructor subpopulations.
Although the project grew out of long-term relationships among academics in statistics, data science and operations research who have incorporated sports analytics applications in their classrooms, the network has matured to include industry partners in creating peer-reviewed content that leverages unique sports-related datasets.
The SCORE Network is a novel academic/industry/government public-private partnership with hubs at academic institutions across the United States and a broad set of industry and academic collaborators. We have three regional hubs in Pittsburgh (Pennsylvania), Texas and New York. The “Pittsburgh Hub” includes faculty from Carnegie Mellon University and the University of Pittsburgh; the “New York Hub” is a team from St. Lawrence University, United States Military Academy West Point, University of North Carolina at Charlotte, University of St. Thomas (Minnesota) and Yale University; and the “Texas Hub” is composed of faculty at Baylor University and Azusa Pacific University. For each university, we have designated a lead faculty member who is responsible for the oversight and management of their hub’s activities.

In addition to the hubs, the SCORE Network includes dozens of partners working in organizations in academia and the sports industry, including research institutions; Hispanic Serving Institutions (HSI); liberal arts colleges; a women’s college; professional sports teams from the National Football League (NFL), National Basketball Association (NBA), National Hockey League (NHL) and Major League Baseball (MLB); sports data providers; sports media organizations (ESPN, FiveThirtyEight); and the U.S. Olympic & Paralympic Committee (USOPC). Partner roles include two NHL general managers, the director of data and analytics at the NFL, on-air talent at ESPN, writers from ESPN, an NHL scout, and organizers of Women in Data Science and R-Ladies events. Among members of the network, we have coverage and expertise in football, basketball, baseball, hockey, tennis, lacrosse, swimming, diving, figure skating, gymnastics and volleyball, among others. Although fewer analysts focus on women’s professional sports, some publicly available resources exist (e.g., https://atevenstrength.com/ for hockey and https://herhoopstats.com/ for basketball). The SCORE Network is especially dedicated to advancing projects and data science applications in women’s sports.
Implementing SCORE
The building blocks of SCORE are the case-based learning modules. Modules are stand-alone materials that instructors can incorporate into their courses. They are submitted by academics, students and industry partners and focus on introducing foundational topics in undergraduate data science and analytics, using real data from sports as a vehicle for application. The content levels span from novice (wrangling data using software) to advanced (regularized generalized linear models), and the modules draw on everything from traditional sports such as football and baseball to nontraditional sports such as esports, swimming and marathon running.
Modules that are published as part of SCORE are peer-vetted to ensure quality, appropriateness and completeness. Each module has at least two reviewers, one industry reviewer who is familiar with the sport and one pedagogical reviewer who has experience teaching the concepts in the module. When possible, student reviewers are also found to get a user’s perspective. Pedagogical reviewers consider the motivation, content, learning objectives, data and its documentation, and language of the module, as well as whether it meets best instructional practices. Industry reviewers address the module’s connection, motivation, application and clarity for the relevant sport. Student reviewers provide feedback on the module’s clarity of task, tone, engagement, length and required background. After these reviews are completed, they are compiled by the associate editor and a recommendation is made to the editor. The editor makes a final decision and communicates that to the author(s). If revisions are needed, the editor works with the author(s) to ensure those happen before publication.
The SCORE Network provides both a repository of modules (see SCORE Module Repository at https://modules.scorenetwork.org/) and a repository of data sources (see SCORE Sports Data Repository at https://data.scorenetwork.org/) that educators can use either in the classroom or to create their own modules. A module typically consists of learning goals, a motivational introductory video (optional), a set of statistical or data science methods, exercises and activities, a conclusion, dataset(s), a data glossary and a README file describing the contents of the module. Meta tags are also gathered as part of the submission process. These tags allow users to efficiently search for modules in the SCORE repository and can include the academic level of the module, the topic(s) covered and the sport(s) involved. We aim to have a motivational introductory video for each module that is made by someone involved in the sport. A video is not required for submission, and members of SCORE are happy to use our connections and expertise to find help for developing a video. Additionally, a module might include handouts, presentation slides, written discussion of results, solutions to handout(s) and code.
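To make the role of the meta tags concrete, here is a minimal Python sketch of tag-based filtering. It is an illustration only: the `ModuleRecord` class, its field names, the `search_modules` function and the three placeholder entries are all hypothetical and do not reflect how the SCORE repository is actually implemented or what it contains.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModuleRecord:
    """Hypothetical catalog entry; field names are illustrative, not SCORE's schema."""
    title: str
    level: str  # academic level tag, e.g. "novice" or "advanced"
    topics: frozenset = field(default_factory=frozenset)  # topic(s) covered
    sports: frozenset = field(default_factory=frozenset)  # sport(s) involved


def search_modules(catalog, level=None, topic=None, sport=None):
    """Return the records that match every tag that was supplied."""
    matches = []
    for record in catalog:
        if level is not None and record.level != level:
            continue
        if topic is not None and topic not in record.topics:
            continue
        if sport is not None and sport not in record.sports:
            continue
        matches.append(record)
    return matches


# Placeholder entries invented for this sketch; they are not real SCORE modules.
CATALOG = [
    ModuleRecord("Example module A", "novice",
                 frozenset({"data wrangling"}), frozenset({"baseball"})),
    ModuleRecord("Example module B", "advanced",
                 frozenset({"regularized GLMs"}), frozenset({"football"})),
    ModuleRecord("Example module C", "novice",
                 frozenset({"data wrangling"}), frozenset({"swimming"})),
]

novice_wrangling = search_modules(CATALOG, level="novice", topic="data wrangling")
print([m.title for m in novice_wrangling])
```

Any tag left as `None` is ignored, so an instructor could filter by sport alone, by level alone, or by any combination of the three tag types named above.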
A module need not be fully complete for submission; educators and others are encouraged to submit ideas for modules as well. In these cases, the SCORE team will assign the submitter a “coach” who will help the submitter flesh out the required components.
Educational Projects
We were fortunate to receive funding for the SCORE Network from the National Science Foundation through their Division of Undergraduate Education, DUE Award 2142705. The NSF is one source of funding for educational projects such as this. The U.S. Department of Education, Simons Foundation and state departments of education are other sources for support of pedagogical initiatives in the quantitative sciences.
A full competitive proposal to any of these organizations is a major undertaking. Funders want to have a full understanding of the goals of the project, how the project will be implemented, how funds will be spent, how the initiative will be evaluated and the anticipated outcomes/benefits of the project. In our case, our initial proposal was rejected by NSF, but with detailed comments. After discussions with an NSF program manager, we modified our proposal to address the feedback. Conversations with program managers at the funding agency as well as experienced research staff at your institution are advisable before preparing and submitting a proposal.
In our situation, there were several factors that helped strengthen our grant proposal. Our topic of sports has broad appeal, as nearly half of all U.S. adults watch sports once per month [3]. Sports data is also readily available because there are numerous publicly available data sources that can be accessed for free via APIs or R/Python libraries. We enhanced our position by leveraging our network, which brings together collaborators from various sectors – academia, industry, media and government – offering unique expertise and exposing students to diverse perspectives. In many cases, these collaborators have already used the aforementioned publicly available data to create teaching resources.
Over the past two years, we have been spreading the word about the SCORE Network and building the infrastructure necessary for the initiative. In that time, we have conducted an all-day workshop at the U.S. Conference on Teaching Statistics, published in the Proceedings of the International Conference on Teaching Statistics, and given 24 presentations on the SCORE Network. As of this writing, we have received 56 module submissions; 12 have been published, and another 33 are available as preprints linked from the SCORE website. The SCORE sports data repository has more than 75 publicly available sports datasets.
The SCORE Network is funded for two more years, and we have plans to continue to grow beyond this timeframe. Along with offering a repository for peer-vetted resources, we will be analyzing how academia can better partner with industry to enhance the classroom experience as well as ensure that the needs of potential employers are being met through our educational pathways. We will also analyze how students engage with the content to assess whether stand-alone modules, inspired by real-world sports examples, help students from diverse educational and socioeconomic backgrounds more effectively learn data science concepts.
For someone who wants to join our efforts, there are several roles that they can assume. We need creators of modules as well as reviewers, both pedagogical and industry. Anyone is welcome to join the SCORE Network and receive updates on the work that we are doing. For a sign-up form, or for questions or more information about the SCORE Network, please visit scorenetwork.org.
References
[1] D. Leake, 1996, “Case-based reasoning: Experiences, lessons and future directions,” Cambridge, MA: MIT Press.
[2] J. Herrington, T. C. Reeves and R. Oliver, 2014, “Authentic learning environments,” Handbook of Research on Educational Communications and Technology, pp. 401-412, New York: Springer.
[3] https://www.statista.com/statistics/1310558/live-sports-viewers-us/
Brian Macdonald is a senior lecturer and research scientist in the Department of Statistics and Data Science at Yale University, where he focuses on statistics and data science education, sports analytics and environmental data science. He was previously the director of Sports Analytics at ESPN and director of Hockey Analytics with the Florida Panthers Hockey Club, and held faculty positions at West Point, Carnegie Mellon University, University of Miami and Florida Atlantic University. He received a Bachelor of Science in Electrical Engineering from Lafayette College, and a Master of Arts and Ph.D. in mathematics from Johns Hopkins University.
Col. Nicholas Clark is an associate professor in the Department of Mathematical Sciences at West Point. Nick received a B.S. in mathematics from West Point in 2002, an M.S. in statistics from George Mason University in 2010, and a Ph.D. in statistics from Iowa State University in 2018. His dissertation was on “Self-Exciting Spatio-Temporal Statistical Models,” and he has published in a variety of disciplines, including spatio-temporal statistics, best practices in statistical methodologies, epidemiology and sports statistics. Nick is the former director of the Center for Data Analysis and Statistics and former program director for West Point’s Applied Statistics and Data Science Program.
Col. Andrew Lee is an associate professor in the Department of Mathematical Sciences at West Point. Andrew received his master’s (2012) and Ph.D. (2016) degrees from the Massachusetts Institute of Technology, specializing in transportation, policy, probability and optimization. His expertise encompasses a wide array of operations research and transportation models with a focus on military applications, unmanned aerial systems and disaster relief. At the United States Military Academy, he is the program director for the Operations Research program and former director of the Math Sciences program and Center for Leadership and Diversity in STEM.
Michael Schuckers is an editor of the SCORE Network and a professor of sports analytics and data science at UNC Charlotte. He has extensively published in sports analytics, particularly for ice hockey; serves as an associate editor for the Journal of Quantitative Analysis in Sports; and was the 2013 winner of the Significant Contributor Award from the Section on Statistics and Sports of the American Statistical Association.
