A Global Optimization Algorithm for K-Center Clustering of One Billion Samples

Published Online:https://doi.org/10.1287/mnsc.2023.00218

This paper presents a practical global optimization algorithm for the K-center clustering problem, which aims to select K samples as the cluster centers to minimize the maximum within-cluster distance. Specifically, we propose a reduced-space branch and bound scheme that guarantees convergence to the global optimum in a finite number of steps by only branching on the regions of centers. To improve efficiency, we design a two-stage decomposable lower bound, the solution of which can be derived in a closed form. In addition, we also propose several structure-exploiting acceleration techniques to narrow down the region of centers, including bounds tightening, sample reduction, and parallelization. Extensive studies on synthetic and real-world data sets have demonstrated that our algorithm can solve the K-center problems to global optimal within four hours for 10 million samples in the serial mode and 1 billion samples in the parallel mode, whereas existing studies can only address small-scale problems (e.g., thousands of samples). Moreover, compared with the state-of-the-art heuristic methods, the global optimum obtained by our algorithm reduces the objective function by an average of 25.8% on these synthetic and real-world data sets.

This paper was accepted by Chung Piaw Teo, optimization.

Funding: This work was supported by the Natural Sciences and Engineering Research Council of Canada [Grant RGPIN-2019-05499].

Supplemental Material: The data files are available at https://doi.org/10.1287/mnsc.2023.00218.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.