Anderson R, Huchette J, Ma W, Tjandraatmadja C, Vielma JP (2020) Strong mixed-integer programming formulations for trained neural networks. Math. Programming 183(1):3–39.
Banner R, Hubara I, Hoffer E, Soudry D (2018) Scalable methods for 8-bit training of neural networks. Adv. Neural Inform. Processing Systems 31:5145–5153.
Bengio Y, Lodi A, Prouvost A (2021) Machine learning for combinatorial optimization: A methodological tour d’horizon. Eur. J. Oper. Res. 290(2):405–421.
Bergman D, Huang T, Brooks P, Lodi A, Raghunathan AU (2022) JANOS: An integrated predictive and prescriptive modeling framework. INFORMS J. Comput. 34(2):807–816.
Bernardelli AM, Gualandi S, Lau HC, Milanesi S (2023) The BeMi stardust: A structured ensemble of binarized neural networks. Internat. Conf. Learn. Intelligent Optimization (LION) (Springer, Cham, Switzerland), 443–458.
Bernardelli AM, Gualandi S, Milanesi S, Lau HC, Yorke-Smith N (2024) Multi-objective linear ensembles for robust and sparse training of few-bit neural networks. http://dx.doi.org/10.1287/ijoc.2023.0281.cd, https://github.com/INFORMSJoC/2023.0281.
Bishop CM, Nasrabadi NM (2006) Pattern Recognition and Machine Learning (Springer, New York).
Blalock DW, Gonzalez Ortiz JJ, Frankle J, Guttag JV (2020) What is the state of neural network pruning? Proc. Machine Learn. Systems, 129–146.
Blott M, Halder L, Leeser M, Doyle L (2019) QuTiBench: Benchmarking neural networks on heterogeneous hardware. ACM J. Emerging Tech. Comput. Systems 15(4):1–38.
Botoeva E, Kouvaros P, Kronqvist J, Lomuscio A, Misener R (2020) Efficient verification of ReLU-based neural networks via dependency analysis. Proc. Conf. AAAI Artificial Intelligence 34(4):3291–3299.
Brigato L, Barz B, Iocchi L, Denzler J (2022) Image classification with small datasets: Overview and benchmark. IEEE Access 10(2022):49233–49250.
Cai J, Nguyen KN, Shrestha N, Good A, Tu R, Yu X, Zhe S, Serra T (2023) Getting away with more network pruning: From sparsity to geometry and linear regions. Internat. Conf. Integration Constraint Programming Artificial Intelligence Oper. Res. (CPAIOR) (Springer, Cham, Switzerland), 200–218.
Cappart Q, Chételat D, Khalil EB, Lodi A, Morris C, Veličković P (2023) Combinatorial optimization and reasoning with graph neural networks. J. Machine Learn. Res. 24(130):1–61.
ElAraby M, Wolf G, Carvalho M (2023) OAMIP: Optimizing ANN architectures using mixed-integer programming. Internat. Conf. Integration Constraint Programming Artificial Intelligence Oper. Res. (CPAIOR) (Springer, Berlin, Heidelberg), 219–237.
Fischetti M, Jo J (2018) Deep neural networks and mixed integer linear optimization. Constraints 23(3):296–309.
Gholami A, Kim S, Dong Z, Yao Z, Mahoney MW, Keutzer K (2022) A survey of quantization methods for efficient neural network inference. Thiruvathukal GK, Lu Y-H, Kim J, Chen Y, Chen B, eds. Low-Power Computer Vision (Chapman and Hall/CRC, New York), 291–326.
Good A, Lin J, Yu X, Sieg H, Fergurson M, Zhe S, Wieczorek J, Serra T (2022) Recall distortion in neural network pruning and the undecayed pruning algorithm. Adv. Neural Inform. Processing Systems 35:32762–32776.
Gurobi Optimization LLC (2023) Gurobi Optimizer reference manual. Accessed August 9, 2023, https://www.gurobi.com.
Han S, Mao H, Dally WJ (2015) Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. Preprint, submitted October 1, https://arxiv.org/abs/1510.00149.
Huang T, Ferber AM, Tian Y, Dilkina B, Steiner B (2023) Searching large neighborhoods for integer linear programs with contrastive learning. Proc. 40th Internat. Conf. Machine Learn., vol. 202 (PMLR, New York), 13869–13890.
Hubara I, Courbariaux M, Soudry D, El-Yaniv R, Bengio Y (2016) Binarized neural networks. Adv. Neural Inform. Processing Systems 29:4107–4115.
Huchette J, Muñoz G, Serra T, Tsay C (2023) When deep learning meets polyhedral theory: A survey. Preprint, submitted April 29, https://arxiv.org/abs/2305.00241.
Janosi A, Steinbrunn W, Pfisterer M, Detrano RU (1988) Heart disease data set. Accessed August 9, 2023, http://archive.ics.uci.edu/ml/datasets/Heart+Disease.
Jiang Y, Krishnan D, Mobahi H, Bengio S (2019) Predicting the generalization gap in deep networks with margin distributions. Internat. Conf. Learn. Representations (ICLR) (OpenReview.net).
Kawaguchi K, Kaelbling LP, Bengio Y (2017) Generalization in deep learning. Preprint, submitted October 16, https://arxiv.org/abs/1710.05468.
Keskar NS, Mudigere D, Nocedal J, Smelyanskiy M, Tang PTP (2017) On large-batch training for deep learning: Generalization gap and sharp minima. Internat. Conf. Learn. Representations (ICLR), vol. 5 (OpenReview.net).
Khalil EB, Gupta A, Dilkina B (2019) Combinatorial attacks on binarized neural networks. Internat. Conf. Learn. Representations (ICLR) (OpenReview.net).
Kurtz J, Bah B (2021) Efficient and robust mixed-integer optimization methods for training binarized deep neural networks. Preprint, submitted October 21, https://arxiv.org/abs/2110.11382.
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444.
LeCun Y, Cortes C, Burges CJ (1998) The MNIST database of handwritten digits. Accessed August 9, 2023, http://yann.lecun.com/exdb/mnist.
Li X, Sun C, Ye Y (2020) Simple and fast algorithm for binary integer and online linear programming. Adv. Neural Inform. Processing Systems 33:9412–9421.
Lin X, Zhao C, Pan W (2017) Toward accurate binary convolutional neural network. Adv. Neural Inform. Processing Systems 30:345–353.
Lombardi M, Milano M (2018) Boosting combinatorial problem modeling with machine learning. Preprint, submitted July 15, https://arxiv.org/abs/1807.05517.
Mistry M, Letsios D, Krennrich G, Lee RM, Misener R (2021) Mixed-integer convex nonlinear optimization with gradient-boosted trees embedded. INFORMS J. Comput. 33(3):1103–1119.
Moody J (1991) The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. Adv. Neural Inform. Processing Systems 4:847–854.
Morcos A, Yu H, Paganini M, Tian Y (2019) One ticket to win them all: Generalizing lottery ticket initializations across datasets and optimizers. Adv. Neural Inform. Processing Systems 32:4932–4942.
Neyshabur B, Bhojanapalli S, McAllester D, Srebro N (2017) Exploring generalization in deep learning. Adv. Neural Inform. Processing Systems 30:5947–5956.
Patil V, Mintz Y (2022) A mixed-integer programming approach to training dense neural networks. Preprint, submitted January 3, https://arxiv.org/abs/2201.00723.
Roger C, Molina R, Rey C, Serra T, Puertas E, Pujol O (2022) Training thinner and deeper neural networks: Jumpstart regularization. Schaus P, ed. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2022, Lecture Notes in Computer Science, vol. 13292 (Springer, Cham, Switzerland), 345–357.
Sakr C, Choi J, Wang Z, Gopalakrishnan K, Shanbhag N (2018) True gradient-based training of deep binary activated neural networks via continuous binarization. 2018 IEEE Internat. Conf. Acoustics Speech Signal Processing (ICASSP) (IEEE, Piscataway, NJ), 2346–2350.
Serra T, Kumar A, Ramalingam S (2020) Lossless compression of deep neural networks. Hebrard E, Musliu N, eds. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2020, Lecture Notes in Computer Science, vol. 12296 (Springer, Cham, Switzerland), 417–430.
Serra T, Yu X, Kumar A, Ramalingam S (2021) Scaling up exact neural network compression by ReLU stability. Adv. Neural Inform. Processing Systems 34:27081–27093.
Tang W, Hua G, Wang L (2017) How to train a compact binary neural network with high accuracy? Thirty-First AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 2625–2631.
Thorbjarnarson T, Yorke-Smith N (2023) Optimal training of integer-valued neural networks with mixed integer programming. PLoS One 18(2):e0261029.
Tjandraatmadja C, Anderson R, Huchette J, Ma W, Patel KK, Vielma JP (2020) The convex relaxation barrier, revisited: Tightened single-neuron relaxations for neural network verification. Adv. Neural Inform. Processing Systems 33:21675–21686.
Tjeng V, Xiao KY, Tedrake R (2018) Evaluating robustness of neural networks with mixed integer programming. Internat. Conf. Learn. Representations (OpenReview.net).
Toro Icarte R, Illanes L, Castro MP, Cire AA, McIlraith SA, Beck JC (2019) Training binarized neural networks using MIP and CP. Internat. Conf. Principles Practice Constraint Programming, vol. 11802 (Springer, Cham, Switzerland), 401–417.
Tsay C, Kronqvist J, Thebelt A, Misener R (2021) Partition-based formulations for mixed-integer optimization of trained ReLU neural networks. Adv. Neural Inform. Processing Systems 34:3068–3080.
Vanschoren J (2019) Meta-learning. Hutter F, Kotthoff L, Vanschoren J, eds. Automated Machine Learning, Springer Series on Challenges in Machine Learning (Springer, Cham, Switzerland), 35–61.
Wang K, Lozano L, Cardonha C, Bergman D (2023) Optimizing over an ensemble of trained neural networks. INFORMS J. Comput. 35(3):652–674.
Williams HP (2013) Model Building in Mathematical Programming (John Wiley & Sons, Chichester, UK).
Xiao H, Rasul K, Vollgraf R (2017) Fashion-MNIST: A novel image data set for benchmarking machine learning algorithms. Preprint, submitted August 25, https://arxiv.org/abs/1708.07747.
Ye H, Xu H, Wang H, Wang C, Jiang Y (2023) GNN&GBDT-guided fast optimizing framework for large-scale integer programming. Proc. 40th Internat. Conf. Machine Learn., vol. 202 (PMLR, New York), 39864–39878.
Young HP (1988) Condorcet’s theory of voting. Amer. Political Sci. Rev. 82(4):1231–1244.
Yu X, Serra T, Ramalingam S, Zhe S (2022) The combinatorial brain surgeon: Pruning weights that cancel one another in neural networks. Internat. Conf. Machine Learn. (ICML) (PMLR, New York), 25668–25683.

INFORMS Journal on Computing

Volume 37, Issue 3

May-June 2025

Pages iii, 503-783, ii

Received: August 09, 2023
Accepted: July 02, 2024
Published Online: September 13, 2024

Cite as

Ambrogio Maria Bernardelli, Stefano Gualandi, Simone Milanesi, Hoong Chuin Lau, Neil Yorke-Smith (2024) Multiobjective Linear Ensembles for Robust and Sparse Training of Few-Bit Neural Networks. INFORMS Journal on Computing 37(3):623-643.

https://doi.org/10.1287/ijoc.2023.0281

Acknowledgments

The authors thank the anonymous reviewers for their comments. The authors also thank the participants of the Dagstuhl Seminar 22431, “Data-Driven Combinatorial Optimisation,” and Tómas Þorbjarnarson.
