Fast Filter Pruning Method for Neural Network Compression
Abstract
The rapid adoption of artificial neural networks has led to the development of increasingly large and complex models. Although these models achieve outstanding performance across a range of applications, deploying them and running inference on devices with limited computing resources remains a significant challenge. Model compression techniques address this issue by reducing model size while preserving performance. In this paper, we propose a two-stage filter pruning approach for model compression that is designed to maximize filter diversity for a given compression ratio. We interpret filter diversity from an information-theoretic perspective and formulate a combinatorial optimization model that selects a subset of filters so as to retain as much of the original network's information as possible. We frame the maximization of filter diversity as a p-dispersion problem and develop a polynomial-time algorithm whose solution is guaranteed to be within a factor of two of the optimum. Compared with state-of-the-art automated pruning benchmarks, the proposed algorithm offers significant advantages in computational complexity and solution speed.
History: Accepted by Ram Ramesh, Area Editor for Data Science & Machine Learning.
Funding: This work was supported by the National Natural Science Foundation of China [Grants 72192823, 72301172, 72394370:72394375, and 72495130:72495132], the Shanghai Jiao Tong University Office of Liberal Arts [Grant ZHWK2502], and the Zhejiang Key Laboratory of Decision Intelligence [Grant 2025E10006].
Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2024.0991) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2024.0991). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.
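
Illustrative sketch (not the authors' implementation): the abstract frames filter selection as a p-dispersion problem, that is, choosing p items out of n so that the chosen items are as spread out as possible. The Python snippet below shows one textbook heuristic for the max-min variant of that problem, greedy farthest-point insertion, which is known to come within a factor of two of the optimum when the distances satisfy the triangle inequality. Several things here are assumptions made for illustration only: that each filter is represented as a flat numeric vector, that Euclidean distance stands in for the diversity measure, and that the max-min objective is the relevant one. The function name `greedy_max_min_dispersion` is hypothetical, and the abstract does not state whether the paper's algorithm follows this procedure.

```python
import math
from itertools import combinations


def greedy_max_min_dispersion(points, p):
    """Select p indices from `points` by greedy farthest-point insertion.

    Assumption: `points` is a list of equal-length numeric tuples (for
    example, flattened filter weights) and Euclidean distance is used as
    a stand-in for the diversity measure.
    """
    n = len(points)
    if not 2 <= p <= n:
        raise ValueError("p must satisfy 2 <= p <= len(points)")

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Seed the selection with the farthest-apart pair of points.
    first, second = max(combinations(range(n), 2), key=lambda ij: dist(*ij))
    selected = [first, second]

    # min_d[k] is the distance from point k to its nearest selected point.
    min_d = [min(dist(k, first), dist(k, second)) for k in range(n)]

    while len(selected) < p:
        # Add the unselected point that is farthest from the current selection.
        candidates = [k for k in range(n) if k not in selected]
        nxt = max(candidates, key=lambda k: min_d[k])
        selected.append(nxt)
        for k in range(n):
            min_d[k] = min(min_d[k], dist(k, nxt))

    return selected


# Toy usage: five 2-D "filters"; index 1 nearly duplicates index 0.
toy_filters = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
kept = greedy_max_min_dispersion(toy_filters, 4)
print(sorted(kept))
```

On this toy input the near-duplicate point is the one left out, which is the behaviour a diversity-maximizing pruning criterion is meant to produce. The all-pairs seeding step costs O(n^2) distance evaluations and each insertion costs O(n), so the sketch runs in polynomial time, though it makes no claim to match the complexity reported in the paper.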

