An Agglomerative Clustering Algorithm for Simulation Output Distributions Using Regularized Wasserstein Distance
Published Online: 17 Sep 2025
https://doi.org/10.1287/ijds.2024.0056

