An Agglomerative Clustering Algorithm for Simulation Output Distributions Using Regularized Wasserstein Distance

Mohammadmahdi Ghasemloo (Corresponding Author)
[email protected]
https://orcid.org/0009-0005-2444-1956
Department of Industrial and Systems Engineering, Texas A&M University, College Station, Texas 77843

David J. Eckman
[email protected]
https://orcid.org/0000-0002-6473-6434
Department of Industrial and Systems Engineering, Texas A&M University, College Station, Texas 77843

Published Online: 17 Sep 2025
https://doi.org/10.1287/ijds.2024.0056

Abstract

Using statistical learning methods to analyze stochastic simulation outputs can significantly enhance decision making by uncovering relationships among different simulated systems and between a system’s inputs and outputs. We present a novel agglomerative clustering algorithm that utilizes the regularized Wasserstein distance to cluster multivariate empirical distributions of simulation outputs to identify patterns and trade-offs among performance measures. This framework has several important use cases, including anomaly detection, preoptimization, and online monitoring. In numerical experiments involving a call center model, we demonstrate how this methodology can identify staffing plans that yield similar performance outcomes and inform policies for intervening when queue lengths signal potentially worsening system performance.
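As a rough illustration of the idea summarized in the abstract, the sketch below clusters empirical output distributions by an entropy-regularized Wasserstein distance. It is not the authors' algorithm and does not use their package: it assumes equally weighted samples, approximates the distance with plain Sinkhorn iterations, and substitutes SciPy's average-linkage hierarchical clustering for whatever merging rule the paper uses. The function names `regularized_wasserstein` and `cluster_outputs`, the regularization value, and the synthetic data are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist


def regularized_wasserstein(x, y, reg=1.0, n_iter=200):
    """Sinkhorn approximation of the 2-Wasserstein distance between two
    equally weighted samples x (n, d) and y (m, d). Illustrative only."""
    cost = cdist(x, y, metric="sqeuclidean")
    a = np.full(len(x), 1.0 / len(x))  # uniform weights on x
    b = np.full(len(y), 1.0 / len(y))  # uniform weights on y
    # Gibbs kernel; may underflow if reg is tiny relative to the costs.
    kernel = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]  # approximate transport plan
    return float(np.sqrt(np.sum(plan * cost)))


def cluster_outputs(samples, n_clusters, reg=1.0):
    """Average-linkage agglomerative clustering on pairwise distances.
    Assumption: this stands in for the paper's merging procedure."""
    n = len(samples)
    # Condensed distance vector in the (i < j) order SciPy expects.
    condensed = [
        regularized_wasserstein(samples[i], samples[j], reg)
        for i in range(n)
        for j in range(i + 1, n)
    ]
    tree = linkage(np.asarray(condensed), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic stand-in for simulation outputs: six "systems", each with
# 40 replications of two performance measures, drawn around two centers.
rng = np.random.default_rng(0)
centers = [0.0, 0.0, 0.0, 3.0, 3.0, 3.0]
samples = [rng.normal(loc=c, scale=0.5, size=(40, 2)) for c in centers]
labels = cluster_outputs(samples, n_clusters=2)
print(labels)
```

In this toy setup the first three systems and the last three should fall into separate clusters. A log-domain Sinkhorn implementation would be a safer choice when the regularization parameter is small relative to the transport costs.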

History: Eunshin Byon served as the senior editor for this article.

Funding: This work is supported by the National Science Foundation [Grant CMMI-2206972].

Data Ethics & Reproducibility Note: The code is available at https://github.com/mohammadmgh78/Agglomerative_Clustering_Distribution, as a Python package at https://pypi.org/project/distclust, and in the e-Companion to this article (available at https://doi.org/10.1287/ijds.2024.0056).

INFORMS Journal on Data Science

Volume 5, Issue 1

January-March 2026

Pages iii-iv, 1-80, ii

Article Information

Received: November 01, 2024
Accepted: July 31, 2025
Published Online: September 17, 2025

Cite as

Mohammadmahdi Ghasemloo, David J. Eckman (2025) An Agglomerative Clustering Algorithm for Simulation Output Distributions Using Regularized Wasserstein Distance. INFORMS Journal on Data Science 5(1):65-80.

https://doi.org/10.1287/ijds.2024.0056

Acknowledgments

The authors thank Morteza Davari for helpful discussions about the online monitoring application and thank the associate editor and reviewers for helpful comments that improved the paper. No data ethics considerations are foreseen related to this paper.