An Agglomerative Clustering Algorithm for Simulation Output Distributions Using Regularized Wasserstein Distance

Published Online: https://doi.org/10.1287/ijds.2024.0056

References

  • Abdallah I, Tatsis K, Chatzi E (2020) Unsupervised local cluster-weighted bootstrap aggregating the output from multiple stochastic simulators. Reliability Engrg. System Safety 199:106876.
  • Altschuler J, Niles-Weed J, Rigollet P (2017) Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 30 (Curran Associates, Inc., Red Hook, NY), 1964–1974.
  • Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. Precup D, Teh YW, eds. Internat. Conf. Machine Learn., vol. 70 (PMLR, New York), 214–223.
  • Benamou J-D, Carlier G, Cuturi M, Nenna L, Peyré G (2015) Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37(2):A1111–A1138.
  • Chakraborty S, Paul D, Das S (2020) Hierarchical clustering with optimal transport. Statist. Probab. Lett. 163:108781.
  • Cont R (2001) Empirical properties of asset returns: Stylized facts and statistical issues. Quant. Finance 1(2):223–236.
  • Cuturi M (2013) Sinkhorn distances: Lightspeed computation of optimal transport. Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 26 (Curran Associates, Inc., Red Hook, NY), 2292–2300.
  • Cuturi M, Doucet A (2014) Fast computation of Wasserstein barycenters. Xing EP, Jebara T, eds. Proc. 31st Internat. Conf. Machine Learn., vol. 32 (PMLR, New York), 685–693.
  • Del Barrio E, Cuesta-Albertos JA, Matrán C, Mayo-Íscar A (2019) Robust clustering tools based on optimal transportation. Statist. Comput. 29:139–160.
  • Donat MG, Alexander LV, Yang H, Durre I, Vose R, Caesar J (2013) Updated analyses of temperature and precipitation extreme indices since the beginning of the twentieth century: The HadEX2 dataset. J. Geophysical Res. Atmospheres 118(5):2098–2118.
  • Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95(25):14863–14868.
  • Genevay A, Peyré G, Cuturi M (2018) Learning generative models with Sinkhorn divergences. Storkey A, Perez-Cruz F, eds. Internat. Conf. Artificial Intelligence Statist., vol. 84 (PMLR, New York), 1608–1617.
  • Haneuse S, Wakefield J (2009) Adjusting for bias due to missing data in clinical studies using auxiliary data and empirical distributions. Biostatistics 10(2):245–257.
  • Henderson K, Gallagher B, Eliassi-Rad T (2015) EP-MEANS: An efficient nonparametric clustering of empirical probability distributions. Proc. 30th Annual ACM Sympos. Appl. Comput. (ACM, New York), 893–900.
  • Horvath B, Issa Z, Muguruza A (2021) Clustering market regimes using the Wasserstein distance. Preprint, submitted October 22, https://arxiv.org/abs/2110.11848.
  • Hubert L, Arabie P (1985) Comparing partitions. J. Classification 2:193–218.
  • Irpino A, Verde R, de AT Carvalho F (2014) Dynamic clustering of histogram data based on adaptive squared Wasserstein distances. Expert Systems Appl. 41(7):3351–3366.
  • Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recognition Lett. 31(8):651–666.
  • Jain AK, Murty MN, Flynn PJ (1999) Data clustering: A review. ACM Comput. Surveys 31(3):264–323.
  • James G, Witten D, Hastie T, Tibshirani R (2013) An Introduction to Statistical Learning: With Applications in R, vol. 112 (Springer, New York).
  • Karthikeyan B, George DJ, Manikandan G, Thomas T (2020) A comparative study on k-means clustering and agglomerative hierarchical clustering. Internat. J. Emerging Trends Engrg. Res. 8(5):1600–1604.
  • Kelton WD (2006) Implementing representations of uncertainty. Henderson SG, Nelson BL, eds. Simulation, Handbooks in Operations Research and Management Science, vol. 13 (Elsevier, Amsterdam), 181–191.
  • Li H, Lam H, Peng Y (2024) Efficient learning for clustering and optimizing context-dependent designs. Oper. Res. 72(2):617–638.
  • Liu Y, Zheng Y, Peng Y, Yuan W (2021) A framework of digital twin generation for structural dynamic monitoring of offshore platform. Ocean Engrg. 237:109599.
  • Loureiro A, Torgo L, Soares C (2004) Outlier detection using clustering methods: A data cleaning application. Proc. KDNet Sympos. Knowledge-Based Systems Public Sector (Springer, New York).
  • Montgomery DC (2009) Empirical distributions of process data. Introduction to Statistical Quality Control, 6th ed. (John Wiley & Sons, Hoboken, NJ).
  • Mur A, Dormido R, Duro N, Dormido-Canto S, Vega J (2016) Determination of the optimal number of clusters using a spectral clustering optimization. Expert Systems Appl. 65:304–314.
  • Murtagh F, Contreras P (2012) Algorithms for hierarchical clustering: An overview. Data Mining Knowledge Discovery 2(1):86–97.
  • Nelson BL (2016) Some tactical problems in digital simulation for the next 10 years. J. Simulation 10(1):2–11.
  • Pappas TN, Jayant NS (1989) An adaptive clustering algorithm for image segmentation. Internat. Conf. Acoustics Speech Signal Processing, vol. 3 (IEEE, Piscataway, NJ), 1667–1670.
  • Peng Y, Xu J, Lee LH, Hu J, Chen C-H (2018) Efficient simulation sampling allocation using multifidelity models. IEEE Trans. Automatic Control 64(8):3156–3169.
  • Peyré G, Cuturi M (2019) Computational optimal transport: With applications to data science. Foundations Trends Machine Learn. 11(5–6):355–607.
  • Riess L, Beiglböck M, Temme J, Wolf A, Backhoff J (2023) The geometry of financial institutions—Wasserstein clustering of financial data. Preprint, submitted May 5, https://arxiv.org/abs/2305.03565.
  • Rubner Y, Tomasi C, Guibas LJ (2000) The earth mover’s distance as a metric for image retrieval. Internat. J. Comput. Vision 40:99–121.
  • Santambrogio F, Wang X-J (2016) Convexity of the support of the displacement interpolation: Counterexamples. Appl. Math. Lett. 58:152–158.
  • Shahapure KR, Nicholas C (2020) Cluster quality analysis using silhouette score. IEEE Seventh Internat. Conf. Data Sci. Adv. Anal. (IEEE, Piscataway, NJ), 747–748.
  • Villani C (2009) Optimal Transport: Old and New, vol. 338 (Springer, Berlin).
  • Wedel M, Kamakura WA (2000) Market Segmentation: Conceptual and Methodological Foundations (Kluwer Academic Publishers, Boston).
  • Zhang Z, Peng Y (2024) Sample-efficient clustering and conquer procedures for parallel large-scale ranking and selection. Preprint, submitted February 3, https://arxiv.org/abs/2402.02196.
  • Zhuang Y, Chen X, Yang Y (2022) Wasserstein K-means for clustering probability distributions. Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 11382–11395.