An Agglomerative Clustering Algorithm for Simulation Output Distributions Using Regularized Wasserstein Distance
Abstract
Using statistical learning methods to analyze stochastic simulation outputs can significantly enhance decision making by uncovering relationships among different simulated systems and between a system’s inputs and outputs. We present a novel agglomerative clustering algorithm that uses the regularized Wasserstein distance to cluster multivariate empirical distributions of simulation outputs, thereby identifying patterns and trade-offs among performance measures. This framework has several important use cases, including anomaly detection, preoptimization, and online monitoring. In numerical experiments involving a call center model, we demonstrate how this methodology can identify staffing plans that yield similar performance outcomes and can inform policies for intervening when queue lengths signal potentially worsening system performance.
History: Eunshin Byon served as the senior editor for this article.
Funding: This work is supported by the National Science Foundation [Grant CMMI-2206972].
Data Ethics & Reproducibility Note: The code is available at https://github.com/mohammadmgh78/Agglomerative_Clustering_Distribution, as a Python package at https://pypi.org/project/distclust, and in the e-Companion to this article (available at https://doi.org/10.1287/ijds.2024.0056).
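
For orientation only, the clustering idea summarized in the abstract can be sketched in pure Python. This is not the authors' implementation and does not use the distclust API. The function names (`sinkhorn_cost`, `agglomerate`), the squared Euclidean ground cost, the regularization value, and the average-linkage merge rule are all assumptions made for illustration; the algorithm in the article may define and merge clusters differently.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two points (assumed ground cost)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn_cost(xs, ys, reg=1.0, n_iter=200):
    """Transport cost of the entropy-regularized plan between two samples.

    xs and ys are lists of points, each treated as a uniform empirical
    distribution. Sinkhorn iterations rescale the kernel K = exp(-C / reg)
    until the plan's marginals match the uniform weights.
    Note: a very small reg relative to the costs can underflow K to zero.
    """
    n, m = len(xs), len(ys)
    C = [[sq_dist(x, y) for y in ys] for x in xs]
    K = [[math.exp(-c / reg) for c in row] for row in C]
    a, b = 1.0 / n, 1.0 / m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [a / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Plan entries are u[i] * K[i][j] * v[j]; return the cost under that plan.
    return sum(u[i] * K[i][j] * v[j] * C[i][j]
               for i in range(n) for j in range(m))


def agglomerate(samples, n_clusters, reg=1.0):
    """Average-linkage agglomerative clustering of empirical distributions.

    Average linkage is an illustrative stand-in, not the article's merge rule.
    """
    k = len(samples)
    D = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            D[i][j] = D[j][i] = sinkhorn_cost(samples[i], samples[j], reg)
    clusters = [[i] for i in range(k)]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                total = sum(D[i][j] for i in clusters[p] for j in clusters[q])
                avg = total / (len(clusters[p]) * len(clusters[q]))
                if best is None or avg < best[0]:
                    best = (avg, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]
    return clusters


# Hypothetical toy data: four bivariate output samples, two per regime.
samples = [
    [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],
    [(0.05, 0.05), (0.1, 0.1), (0.0, 0.05)],
    [(3.0, 3.0), (3.1, 3.0), (3.0, 3.1)],
    [(3.05, 3.05), (3.1, 3.1), (3.0, 3.05)],
]
clusters = agglomerate(samples, n_clusters=2)
```

In this toy setting each sample stands in for the simulated outputs of one system configuration, and `clusters` holds lists of sample indices grouped by distributional similarity.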

