An Algorithm for Clustering with Confidence-Based Must-Link and Cannot-Link Constraints

Published Online:https://doi.org/10.1287/ijoc.2023.0419

References

  • Arthur D, Vassilvitskii S (2007) k-means++: The advantages of careful seeding. Proc. Eighteenth Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 1027–1035.Google Scholar
  • Babaki B, Guns T, Nijssen S (2014) Constrained clustering using column generation. Simonis H, ed. Integration AI OR Techniques Constraint Programming: 11th Internat. Conf. CPAIOR 2014 (Springer, Cham, Switzerland), 438–454.Google Scholar
  • Basu S, Banerjee A, Mooney RJ (2004) Active semi-supervision for pairwise constrained clustering. Skillicorn D, Berry MW, Dayal U, Kamath C, eds. Proc. 2004 SIAM Internat. Conf. Data Mining (SDM) (Society for Industrial and Applied Mathematics, Philadelphia), 333–344.Google Scholar
  • Baumann P (2020) A binary linear programming-based k-means algorithm for clustering with must-link and cannot-link constraints. 2020 IEEE Internat. Conf. Indust. Engrg. Engrg. Management (IEEE, Piscataway, NJ), 324–328.Google Scholar
  • Baumann P, Hochbaum D (2022) A k-means algorithm for clustering with soft must-link and cannot-link constraints. De Marsico M, Sanniti di Baja G, Fred A, eds. Proc. 11th Internat. Conf. Pattern Recognition Appl. Methods (Springer, Cham, Switzerland), 195–202.Google Scholar
  • Baumann P, Hochbaum D (2024) An algorithm for clustering with confidence-based must-link and cannot-link constraints. http://dx.doi.org/10.1287/ijoc.2023.0419.cd, https://github.com/INFORMSJoC/2023.0419.Google Scholar
  • Bentley JL (1975) Multidimensional binary search trees used for associative searching. Ashenhurst RL, ed. Comm. ACM, vol. 18(9) (Association for Computing Machinery, New York), 509–517.CrossrefGoogle Scholar
  • Bertsimas D, Dunn J (2017) Optimal classification trees. Machine Learn. 106(7):1039–1082.CrossrefGoogle Scholar
  • Brusco MJ (2006) A repetitive branch-and-bound procedure for minimum within-cluster sums of squares partitioning. Psychometrika 71(2):347–363.CrossrefGoogle Scholar
  • Chen Y, Jalali A, Sanghavi S, Xu H (2014) Clustering partially observed graphs via convex optimization. J. Machine Learn. Res. 15(1):2213–2238.Google Scholar
  • Dao TBH, Duong KC, Vrain C (2013) A declarative framework for constrained clustering. Blockeel H, Kersting K, Nijssen S, Železný F, eds. Machine Learn. Knowledge Discovery Databases: Euro. Conf., ECML PKDD 2013 (Springer, Berlin, Heidelberg), 419–434.Google Scholar
  • Davidson I, Ravi S (2005) Clustering with constraints: Feasibility issues and the k-means algorithm. Kargupta H, Kamath C, Srivastava J, Goodman A, eds. Proc. 2005 SIAM Internat. Conf. Data Mining (Society for Industrial and Applied Mathematics, Philadelphia), 138–149.Google Scholar
  • Fischetti M, Glover F, Lodi A (2005) The feasibility pump. Math. Programming 104(1):91–104.CrossrefGoogle Scholar
  • Gambella C, Ghaddar B, Naoum-Sawaya J (2021) Optimization problems for machine learning: A survey. Eur. J. Oper. Res. 290(3):807–828.CrossrefGoogle Scholar
  • Ganji M, Bailey J, Stuckey PJ (2016) Lagrangian constrained clustering. Chawla Venkatasubramanian S, Meira W, eds. Proc. 2016 SIAM Internat. Conf. Data Mining (Society for Industrial and Applied Mathematics, Philadelphia), 288–296.Google Scholar
  • González-Almagro G, Luengo J, Cano JR, García S (2020) DILS: Constrained clustering through dual iterative local search. Comput. Oper. Res. 121:104979.CrossrefGoogle Scholar
  • González-Almagro G, Luengo J, Cano JR, García S (2021) Enhancing instance-level constrained clustering through differential evolution. Appl. Soft Comput. 108:107435.CrossrefGoogle Scholar
  • Guns T, Dao TBH, Vrain C, Duong KC (2016) Repetitive branch-and-bound using constraint programming for constrained minimum sum-of-squares clustering. Van Harmelen F, Dignum V, Dignum F, Bouquet P, Fox M, Kaminka GA, Hüllermeier E, eds. 22nd Eur. Conf. Artificial Intelligence (IOS Press, The Hague, Netherlands), 462–470.Google Scholar
  • Hubert L, Arabie P (1985) Comparing partitions. J. Classification 2(1):193–218.CrossrefGoogle Scholar
  • Koontz WLG, Narendra PM, Fukunaga K (1975) A branch and bound clustering algorithm. IEEE Trans. Comput. 100(9):908–915.CrossrefGoogle Scholar
  • Le HM, Eriksson A, Do TT, Milford M (2019) A binary optimization approach for constrained k-means clustering. Jawahar CV, Li H, Mori G, Schindler K, eds. Computer Vision – ACCV 2018: 14th Asian Conf. Computer Vision (Springer, Berlin, Heidelberg), 383–398.Google Scholar
  • Pelegrín M (2023) New variants of the simple plant location problem and applications. Eur. J. Oper. Res. 306(3):1094–1108.CrossrefGoogle Scholar
  • Pelleg D, Baras D (2007) K-means with large and noisy constraint sets. Kok JN, Koronacki J, de Mantaras RL, Matwin S, Mladenič D, Skowron A, eds. Machine Learn. ECML 2007: 18th Eur. Conf. Machine Learn. (Springer, Berlin, Heidelberg), 674–682.Google Scholar
  • Perron L, Didier F (2024) CP-SAT Solver. Accessed May 5, 2024, https://developers.google.com/optimization/cp/cp_solver/.Google Scholar
  • Piccialli V, Russo AR, Sudoso AM (2022a) An exact algorithm for semi-supervised minimum sum-of-squares clustering. Comput. Oper. Res. 147:105958.CrossrefGoogle Scholar
  • Piccialli V, Sudoso AM, Wiegele A (2022b) SOS-SDP: An exact solver for minimum sum-of-squares clustering. INFORMS J. Comput. 34(4):2144–2162.LinkGoogle Scholar
  • Rousseeuw PJ (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20:53–65.CrossrefGoogle Scholar
  • Seref O, Fan YJ, Chaovalitwongse WA (2014) Mathematical programming formulations and algorithms for discrete k-median clustering of time-series data. INFORMS J. Comput. 26(1):160–172.LinkGoogle Scholar
  • Tian T, Zhang J, Lin X, Wei Z, Hakonarson H (2021) Model-based deep embedding for constrained clustering analysis of single cell RNA-seq data. Nature Comm. 12(1):1873.CrossrefGoogle Scholar
  • Valls A, Batet M, Lopez EM (2009) Using expert’s rules as background knowledge in the ClusDM methodology. Eur. J. Oper. Res. 195(3):864–875.CrossrefGoogle Scholar
  • Wagstaff K, Cardie C, Rogers S, Schrödl S (2001) Constrained k-means clustering with background knowledge. Brodley CE, Pohoreckyj Danyluk A, eds. Proc. Eighteenth Internat. Conf. Machine Learn. (Morgan Kaufmann Publishers Inc., San Francisco, CA), 577–584.Google Scholar
  • Wang S, Gittens A, Mahoney MW (2019) Scalable kernel k-means clustering with Nyström approximation: Relative-error bounds. J. Machine Learn. Res. 20(1):431–479.Google Scholar
  • Wang X, Qian B, Davidson I (2014) On constrained spectral clustering and its applications. Data Mining Knowledge Discovery 28(1):1–30.CrossrefGoogle Scholar
  • Welsh DJ, Powell MB (1967) An upper bound for the chromatic number of a graph and its application to timetabling problems. Comput. J. 10(1):85–86.CrossrefGoogle Scholar
  • Yang Y, Zhang K, Fan Y (2022) Analyzing firm reports for volatility prediction: A knowledge-driven text-embedding approach. INFORMS J. Comput. 34(1):522–540.LinkGoogle Scholar
  • Zhang Y, Li X, Jia M (2022) Semi-supervised nonnegative matrix factorization with pairwise constraints for image clustering. Internat. J. Machine Learn. Cybernetics 13(11):3577–3587.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.