Fast Filter Pruning Method for Neural Network Compression

Dongqi Wang
https://orcid.org/0009-0008-0016-9160
School of Management, Zhejiang University, Hangzhou, Zhejiang 310058, China; and School of Management, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China; and Zhejiang Key Laboratory of Decision Intelligence, Zhejiang University, Hangzhou, Zhejiang 310058, China

Weiwei Chen
https://orcid.org/0000-0002-7736-3411
Department of Supply Chain Management, Rutgers University, Piscataway, New Jersey 08854

Cheng Hua
https://orcid.org/0000-0002-1662-2424
Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai 200030, China; and Data-Driven Management Decision Making Lab, Shanghai Jiao Tong University, Shanghai 200030, China

Weihua Zhou (Corresponding Author)
https://orcid.org/0000-0002-1056-7612
School of Management, Zhejiang University, Hangzhou, Zhejiang 310058, China; and Zhejiang Key Laboratory of Decision Intelligence, Zhejiang University, Hangzhou, Zhejiang 310058, China

Xin Lin
School of Management, Zhejiang University, Hangzhou, Zhejiang 310058, China

Published Online: 24 Jun 2026
https://doi.org/10.1287/ijoc.2024.0991

Abstract

The rapid adoption of artificial neural networks has led to increasingly large and complex models. Although these models achieve outstanding performance across various applications, deploying them and running inference on devices with limited computing resources remains a significant challenge. Model compression techniques address this by reducing model size while preserving performance. In this paper, we propose a two-stage filter pruning approach for model compression, designed to maximize filter diversity for a desired compression ratio. We interpret filter diversity from an information theory perspective and formulate a combinatorial optimization model that selects a subset of filters so as to retain the maximum information of the original network. We frame the maximization of filter diversity as a p-dispersion problem and give a polynomial-time algorithm whose solution is within a factor of two of the optimum. Compared with state-of-the-art benchmarks in automated pruning methods, the proposed algorithm offers significant advantages in computational complexity and solution speed.
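
To make the p-dispersion view of filter selection concrete, the following is a minimal sketch of a classic greedy heuristic for the max-min variant of the problem: it starts from the two most distant filters and repeatedly adds the filter farthest from those already chosen. This is not the authors' algorithm, and the abstract does not state which dispersion objective or distance the paper uses. The function name `greedy_max_min_dispersion`, the Euclidean distance between flattened filter weights, and the max-min objective are all assumptions made for illustration. For this heuristic, a factor-of-two guarantee is known when the distances satisfy the triangle inequality.

```python
import numpy as np


def greedy_max_min_dispersion(filters, p):
    """Greedily pick p filters that are far apart from each other.

    `filters` is an array whose first axis indexes filters, for example
    shape (n, channels, k, k). Returns a list of p selected indices.
    Hypothetical helper for illustration, not taken from the paper.
    """
    n = filters.shape[0]
    if not 1 <= p <= n:
        raise ValueError("p must be between 1 and the number of filters")

    # Flatten each filter to a vector; Euclidean distance is an assumed
    # stand-in for whatever diversity measure the paper defines.
    flat = filters.reshape(n, -1).astype(float)
    diff = flat[:, None, :] - flat[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    if p == 1:
        return [0]

    # Seed the selection with the most distant pair of filters.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]

    # min_dist[c] = distance from candidate c to its nearest selected filter.
    min_dist = np.minimum(dist[i], dist[j])
    while len(selected) < p:
        min_dist[selected] = -np.inf  # never re-pick a selected filter
        k = int(np.argmax(min_dist))  # farthest candidate from the selection
        selected.append(k)
        min_dist = np.minimum(min_dist, dist[k])
    return selected
```

In a pruning setting, one might call this once per convolutional layer, with `p` derived from the target compression ratio, and keep only the returned filter indices. Building the full pairwise distance matrix costs memory quadratic in the number of filters, which is acceptable for a sketch but may need a more economical formulation for very wide layers.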

History: Accepted by Ram Ramesh, Area Editor for Data Science & Machine Learning.

Funding: This work was supported by the National Natural Science Foundation of China [Grants 72192823, 72301172, 72394370:72394375, and 72495130:72495132], the Shanghai Jiao Tong University Office of Liberal Arts [Grant ZHWK2502], and the Zhejiang Key Laboratory of Decision Intelligence [Grant 2025E10006].

Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2024.0991) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2024.0991). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.

Articles In Advance

Received: October 10, 2024
Accepted: May 09, 2026
Published Online: June 24, 2026

Cite as

Dongqi Wang, Weiwei Chen, Cheng Hua, Weihua Zhou, Xin Lin (2026) Fast Filter Pruning Method for Neural Network Compression. INFORMS Journal on Computing 0(0).

https://doi.org/10.1287/ijoc.2024.0991
