Truncated Fusion Learning on Supervised Clustering and Its Fast Stagewise Algorithm
Abstract
Supervised clustering has emerged as a popular topic for studying heterogeneous effects in diverse areas such as risk management and medical science. In this paper, we introduce a general heterogeneity tracking model that accommodates various common paradigms for supervised clustering. Building on this new model, we propose a novel clustering method called truncated fusion learning (TRUE), which involves a nonconvex fusion penalized optimization. For implementation, we develop a thresholded fusion stagewise algorithm that quickly traces out the entire solution paths of TRUE. Under mild regularity conditions, we provide comprehensive theoretical guarantees, including the convergence of the proposed algorithm as well as consistency in parameter estimation, cluster recovery, and model selection. The superior performance of our method is demonstrated by several simulation examples and applications to two real data sets.
History: Accepted by Ram Ramesh, Area Editor for Data Science and Machine Learning.
Funding: This work was supported by the National Key Research and Development Program of China [Grant 2022YFA1008000], the Natural Science Foundation of China [Grants 72571258, 72301258, 11671374, 12101584, 71731010, 71921001, and 72071187], and the Fundamental Research Funds for the Central Universities [Grants WK2040000114, WK2040250125, WK3470000017, and WK2040000047].
Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2024.0840) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2024.0840). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.

