An Exact Solution Approach for Hierarchical Clustering

Published Online:https://doi.org/10.1287/ijoc.2024.0903

References

  • Ágoston KC, E-Nagy M (2024) Mixed integer linear programming formulation for k-means clustering problem. Central Eur. J. Oper. Res. 32(1):11–27.CrossrefGoogle Scholar
  • Aloise D, Hansen P (2009) A branch-and-cut SDP-based algorithm for minimum sum-of-squares clustering. Pesquisa Operacional 29(3):503–516.CrossrefGoogle Scholar
  • Aloise D, Hansen P, Liberti L (2012a) An improved column generation algorithm for minimum sum-of-squares clustering. Math. Programming 131:195–220.CrossrefGoogle Scholar
  • Aloise D, Hansen P, Rocha C (2012b) A column generation algorithm for semi-supervised minimum sum-of-squares clustering. Aloise D, Hansen P, Rocha C, eds. Proc. Global Optimi. Workshop 2012, 19–22.Google Scholar
  • Aloise D, Deshpande A, Hansen P, Popat P (2009) NP-hardness of Euclidean sum-of-squares clustering. Machine Learn. 75:245–248.CrossrefGoogle Scholar
  • Awasthi P, Bandeira AS, Charikar M, Krishnaswamy R, Villar S, Ward R (2015) Relax, no need to round: Integrality of clustering formulations. Proc. 2015 Conf. Innovations Theoretical Comput. Sci. (Association for Computing Machinery, New York), 191–200.Google Scholar
  • Baldacci R, Christofides N, Mingozzi A (2008) An exact algorithm for the vehicle routing problem based on the set partitioning formulation with additional cuts. Math. Programming 115:351–385.CrossrefGoogle Scholar
  • Brusco MJ (2006) A repetitive branch-and-bound procedure for minimum within-cluster sums of squares partitioning. Psychometrika 71(2):347–363.CrossrefGoogle Scholar
  • Burgard JP, Costa CM, Hojny C, Thomas K, Schmidt M (2023) Mixed-integer programming techniques for the minimum sum-of-squares clustering problem. J. Global Optim. 87(1):133–189.CrossrefGoogle Scholar
  • Chami I, Gu A, Chatziafratis V, Ré C (2020) From trees to continuous embeddings and back: Hyperbolic hierarchical clustering. Adv. Neural Inform. Processing Systems, vol. 33 (MIT Press, Cambridge, MA), 15065–15076.Google Scholar
  • Charikar M, Chatziafratis V (2017) Approximate hierarchical clustering via sparsest cut and spreading metrics. Klein PN, ed. Proc. 2017 Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 841–854.Google Scholar
  • Cohen-Addad V, Kanade V, Mallmann-Trenn F, Mathieu C (2019) Hierarchical clustering: Objective functions and algorithms. J. ACM 66(4):1–42.CrossrefGoogle Scholar
  • Dao TBH, Kuo CT, Ravi SS, Vrain C, Davidson I (2018) Descriptive clustering: ILP and CP formulations with applications. Lang J, ed. Proc. 27th Internat. Joint Conf. Artificial Intelligence (AAAI Press, Washington, DC), 1263–1269.Google Scholar
  • Dasgupta S (2016) A cost function for similarity-based hierarchical clustering. Proc. 48th Annual ACM Sympos. Theory Comput. (Association for Computing Machinery, New York), 118–127.Google Scholar
  • Diehr G (1985) Evaluation of a branch and bound algorithm for clustering. SIAM J. Sci. Statist. Comput. 6(2):268–284.CrossrefGoogle Scholar
  • Dinkelbach W (1967) On nonlinear fractional programming. Management Sci. 13(7):492–498.LinkGoogle Scholar
  • Du Merle O, Hansen P, Jaumard B, Mladenovic N (1999) An interior point algorithm for minimum sum-of-squares clustering. SIAM J. Sci. Comput. 21(4):1485–1505.CrossrefGoogle Scholar
  • Edwards AWF, Cavalli-Sforza LL (1965) A method for cluster analysis. Biometrics 21(2):362–375.CrossrefGoogle Scholar
  • Gilpin S, Davidson I (2017) A flexible ILP formulation for hierarchical clustering. Artificial Intelligence 244:95–109.CrossrefGoogle Scholar
  • Gilpin S, Nijssen S, Davidson I (2013) Formalizing hierarchical clustering as integer linear programming. 27th AAAI Conf. Artificial Intelligence, vol. 27 (AAAI Press, Washington, DC), 372–378.Google Scholar
  • Gondzio J, González-Brevis P, Munari P (2016) Large-scale optimization with the primal-dual column generation method. Math. Programming Comput. 8:47–82.CrossrefGoogle Scholar
  • Gordon AD (1987) Parsimonious trees. J. Classification 4:85–101.CrossrefGoogle Scholar
  • Greenberg CS, Macaluso S, Monath N, Dubey A, Flaherty P, Zaheer M, Ahmed A, Cranmer K, McCallum A (2021) Exact and approximate hierarchical clustering using A*. Proc. 37th Conf. Uncertainty Artificial Intelligence, vol. 161 (PMLR, New York), 2061–2071.Google Scholar
  • Hansen P, Mladenović N (2001) J-means: A new local search heuristic for minimum sum of squares clustering. Pattern Recognition 34(2):405–413.CrossrefGoogle Scholar
  • Hansen P, Jaumard B, Meyer C (2000) A simple enumerative algorithm for unconstrained 0 - 1 quadratic programming. Cahiers du GERAD G-2000-59. https://numerique.banq.qc.ca/patrimoine/details/52327/3081864.Google Scholar
  • Hennig C (2022) An empirical comparison and characterisation of nine popular clustering methods. Adv. Data Anal. Classification 16(1):201–229.CrossrefGoogle Scholar
  • Hennig C, Meila M, Murtagh F, Rocci R (2015) Handbook of Cluster Analysis (Chapman & Hall, New York).CrossrefGoogle Scholar
  • Hoffman KL, Padberg M (1993) Solving airline crew scheduling problems by branch-and-cut. Management Sci. 39(6):657–682.LinkGoogle Scholar
  • Hubert L, Arabie P (1985) Comparing partitions. J. Classification 2:193–218.CrossrefGoogle Scholar
  • Jensen RE (1969) A dynamic programming algorithm for cluster analysis. Oper. Res. 17(6):927–1092.LinkGoogle Scholar
  • Kaufman L, Rousseeuw PJ (1990) Finding Groups in Data: An Introduction to Cluster Analysis (Wiley, New York).CrossrefGoogle Scholar
  • Koontz W, Narendra PM, Fukunaga K (1975) A branch and bound clustering algorithm. IEEE Trans. Comput. 24(9):908–915.CrossrefGoogle Scholar
  • MacQueen J (1967) Some methods for classification and analysis of multivariate observations. Proc. Fifth Berkeley Sympos. Math. Statist. Probab. Vol. 1 Statist. (University of California Press, Berkeley, CA), 281–297.Google Scholar
  • Moseley B, Wang JR (2023) Approximation bounds for hierarchical clustering: Average linkage, bisecting k-means, and local search. J. Machine Learn. Res. 24(1):1–36.Google Scholar
  • Mueller M, Kramer S (2010) Integer linear programming models for constrained clustering. Internat. Conf. Discovery Sci. (Springer, New York), 159–173.Google Scholar
  • Naumov S, Yaroslavtsev G, Avdiukhin D (2021) Objective-based hierarchical clustering of deep embedding vectors. Proc. AAAI Conf. Artificial Intelligence, vol. 35 (AAAI Press, Washington, DC), 9055–9063.CrossrefGoogle Scholar
  • Pecin D, Pessoa A, Poggi M, Uchoa E (2017) Improved branch-cut-and-price for capacitated vehicle routing. Math. Programming Comput. 9:61–100.CrossrefGoogle Scholar
  • Peng J, Xia Y (2005) A new theoretical framework for k-means-type clustering. Foundations Adv. Data Mining 180:79–96.Google Scholar
  • Piccialli V, Russo AR, Sudoso AM (2022a) An exact algorithm for semi-supervised minimum sum-of-squares clustering. Comput. Oper. Res. 147:105958.CrossrefGoogle Scholar
  • Piccialli V, Sudoso AM, Wiegele A (2022b) SOS-SDP: An exact solver for minimum sum-of-squares clustering. INFORMS J. Comput. 34(4):2144–2162.LinkGoogle Scholar
  • Rajagopalan A, Vitale F, Vainstein D, Citovsky G, Procopiuc CM, Gentile C (2021) Hierarchical clustering of data streams: Scalable algorithms and approximation guarantees. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn., vol. 139 (PMLR, New York), 8799–8809.Google Scholar
  • Rao MR (1971) Cluster analysis and mathematical programming. J. Amer. Statist. Assoc. 66(335):622–626.CrossrefGoogle Scholar
  • Roy A, Pokutta S (2017) Hierarchical clustering via spreading metrics. J. Machine Learn. Res. 18(88):1–35.Google Scholar
  • Ryan DM, Foster BA (1981) An integer programming approach to scheduling. Wren A, ed. Computer Scheduling of Public Transport: Urban Passenger Vehicle and Crew Scheduling (North-Holland, Amsterdam), 269–280.Google Scholar
  • Saxena A, Prasad M, Gupta A, Bharill N, Patel OP, Tiwari A, Er MJ, Ding W, Lin CT (2017) A review of clustering techniques and developments. Neurocomputing 267:664–681.CrossrefGoogle Scholar
  • Schrijver A (2003) Combinatorial Optimization (Springer, Berlin, Heidelberg).Google Scholar
  • Senne EL, Lorena LA, Pereira MA (2005) A branch-and-price approach to p-median location problems. Comput. Oper. Res. 32(6):1655–1664.CrossrefGoogle Scholar
  • Sudoso AM, Aloise D (2024) A column generation algorithm with dynamic constraint aggregation for minimum sum-of-squares clustering. Preprint, submitted October 8, https://arxiv.org/abs/2410.06187.Google Scholar
  • Thrun MC, Ultsch A (2020) Clustering benchmark datasets exploiting the fundamental clustering problems. Data Brief 30:105501.CrossrefGoogle Scholar
  • Vichi M, Cavicchia C, Groenen PJF (2022) Hierarchical means clustering. J. Classification 39(3):553–577.CrossrefGoogle Scholar
  • Vinod HD (1969) Integer programming and the theory of grouping. J. Amer. Statist. Assoc. 65(326):506–519.CrossrefGoogle Scholar
  • Wang Y, Moseley B (2020) An objective for hierarchical clustering in Euclidean space and its connection to bisecting k-means. Proc. AAAI Conf. Artificial Intelligence, vol. 34 (AAAI Press, Washington, DC), 6307–6314.CrossrefGoogle Scholar
  • Willemsen R, Cavicchia C, Van den Heuvel W, Van de Velden M (2025) An exact solution approach for hierarchical clustering. http://dx.doi.org/10.1287/ijoc.2024.0903.cd, https://github.com/INFORMSJoC/2024.0903.Google Scholar
  • Xia Y, Peng J (2005) A cutting algorithm for the minimum sum-of-squared error clustering. Proc. SIAM Internat. Data Mining Conf. (Society for Industrial and Applied Mathematics, Philadelphia), 150–160.Google Scholar
  • Yarkony J, Fowlkes C (2015) Planar ultrametrics for image segmentation. Adv. Neural Inform. Processing Systems, vol. 28 (MIT Press, Cambridge, MA), 64–72.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.