Multiobjective Linear Ensembles for Robust and Sparse Training of Few-Bit Neural Networks

Published online: https://doi.org/10.1287/ijoc.2023.0281

References

  • Anderson R, Huchette J, Ma W, Tjandraatmadja C, Vielma JP (2020) Strong mixed-integer programming formulations for trained neural networks. Math. Programming 183(1):3–39.
  • Banner R, Hubara I, Hoffer E, Soudry D (2018) Scalable methods for 8-bit training of neural networks. Adv. Neural Inform. Processing Systems 31:5145–5153.
  • Bengio Y, Lodi A, Prouvost A (2021) Machine learning for combinatorial optimization: A methodological tour d’horizon. Eur. J. Oper. Res. 290(2):405–421.
  • Bergman D, Huang T, Brooks P, Lodi A, Raghunathan AU (2022) JANOS: An integrated predictive and prescriptive modeling framework. INFORMS J. Comput. 34(2):807–816.
  • Bernardelli AM, Gualandi S, Lau HC, Milanesi S (2023) The BeMi stardust: A structured ensemble of binarized neural networks. Internat. Conf. Learn. Intelligent Optimization (LION) (Springer, Cham, Switzerland), 443–458.
  • Bernardelli AM, Gualandi S, Milanesi S, Lau HC, Yorke-Smith N (2024) Multi-objective linear ensembles for robust and sparse training of few-bit neural networks. http://dx.doi.org/10.1287/ijoc.2023.0281.cd, https://github.com/INFORMSJoC/2023.0281.
  • Bishop CM, Nasrabadi NM (2006) Pattern Recognition and Machine Learning (Springer, New York).
  • Blalock DW, Gonzalez Ortiz JJ, Frankle J, Guttag JV (2020) What is the state of neural network pruning? Proc. Machine Learn. Systems, 129–146.
  • Blott M, Halder L, Leeser M, Doyle L (2019) QuTiBench: Benchmarking neural networks on heterogeneous hardware. ACM J. Emerging Tech. Comput. Systems 15(4):1–38.
  • Botoeva E, Kouvaros P, Kronqvist J, Lomuscio A, Misener R (2020) Efficient verification of ReLU-based neural networks via dependency analysis. Proc. Conf. AAAI Artificial Intelligence 34(4):3291–3299.
  • Brigato L, Barz B, Iocchi L, Denzler J (2022) Image classification with small datasets: Overview and benchmark. IEEE Access 10(2022):49233–49250.
  • Cai J, Nguyen KN, Shrestha N, Good A, Tu R, Yu X, Zhe S, Serra T (2023) Getting away with more network pruning: From sparsity to geometry and linear regions. Internat. Conf. Integration Constraint Programming Artificial Intelligence Oper. Res. (CPAIOR) (Springer, Cham, Switzerland), 200–218.
  • Cappart Q, Chételat D, Khalil EB, Lodi A, Morris C, Veličković P (2023) Combinatorial optimization and reasoning with graph neural networks. J. Machine Learn. Res. 24(130):1–61.
  • ElAraby M, Wolf G, Carvalho M (2023) OAMIP: Optimizing ANN architectures using mixed-integer programming. Internat. Conf. Integration Constraint Programming Artificial Intelligence Oper. Res. (CPAIOR) (Springer, Berlin, Heidelberg), 219–237.
  • Fischetti M, Jo J (2018) Deep neural networks and mixed integer linear optimization. Constraints 23(3):296–309.
  • Gholami A, Kim S, Dong Z, Yao Z, Mahoney MW, Keutzer K (2022) A survey of quantization methods for efficient neural network inference. Thiruvathukal GK, Lu Y-H, Kim J, Chen Y, Chen B, eds. Low-Power Computer Vision (Chapman and Hall/CRC, New York), 291–326.
  • Good A, Lin J, Yu X, Sieg H, Ferguson M, Zhe S, Wieczorek J, Serra T (2022) Recall distortion in neural network pruning and the undecayed pruning algorithm. Adv. Neural Inform. Processing Systems 35:32762–32776.
  • Gurobi Optimization LLC (2023) Gurobi Optimizer reference manual. Accessed August 9, 2023, https://www.gurobi.com.
  • Han S, Mao H, Dally WJ (2015) Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. Preprint, submitted October 1, https://arxiv.org/abs/1510.00149.
  • Huang T, Ferber AM, Tian Y, Dilkina B, Steiner B (2023) Searching large neighborhoods for integer linear programs with contrastive learning. Proc. 40th Internat. Conf. Machine Learn., vol. 202 (PMLR, New York), 13869–13890.
  • Hubara I, Courbariaux M, Soudry D, El-Yaniv R, Bengio Y (2016) Binarized neural networks. Adv. Neural Inform. Processing Systems 29:4107–4115.
  • Huchette J, Muñoz G, Serra T, Tsay C (2023) When deep learning meets polyhedral theory: A survey. Preprint, submitted April 29, https://arxiv.org/abs/2305.00241.
  • Janosi A, Steinbrunn W, Pfisterer M, Detrano RU (1988) Heart disease data set. Accessed August 9, 2023, http://archive.ics.uci.edu/ml/datasets/Heart+Disease.
  • Jiang Y, Krishnan D, Mobahi H, Bengio S (2019) Predicting the generalization gap in deep networks with margin distributions. Internat. Conf. Learn. Representations (ICLR) (OpenReview.net).
  • Kawaguchi K, Kaelbling LP, Bengio Y (2017) Generalization in deep learning. Preprint, submitted October 16, https://arxiv.org/abs/1710.05468.
  • Keskar NS, Mudigere D, Nocedal J, Smelyanskiy M, Tang PTP (2017) On large-batch training for deep learning: Generalization gap and sharp minima. Internat. Conf. Learn. Representations (ICLR), vol. 5 (OpenReview.net).
  • Khalil EB, Gupta A, Dilkina B (2019) Combinatorial attacks on binarized neural networks. Internat. Conf. Learn. Representations (ICLR) (OpenReview.net).
  • Kurtz J, Bah B (2021) Efficient and robust mixed-integer optimization methods for training binarized deep neural networks. Preprint, submitted October 21, https://arxiv.org/abs/2110.11382.
  • LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444.
  • LeCun Y, Cortes C, Burges CJ (1998) The MNIST database of handwritten digits. Accessed August 9, 2023, http://yann.lecun.com/exdb/mnist.
  • Li X, Sun C, Ye Y (2020) Simple and fast algorithm for binary integer and online linear programming. Adv. Neural Inform. Processing Systems 33:9412–9421.
  • Lin X, Zhao C, Pan W (2017) Toward accurate binary convolutional neural network. Adv. Neural Inform. Processing Systems 30:345–353.
  • Lombardi M, Milano M (2018) Boosting combinatorial problem modeling with machine learning. Preprint, submitted July 15, https://arxiv.org/abs/1807.05517.
  • Mistry M, Letsios D, Krennrich G, Lee RM, Misener R (2021) Mixed-integer convex nonlinear optimization with gradient-boosted trees embedded. INFORMS J. Comput. 33(3):1103–1119.
  • Moody J (1991) The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. Adv. Neural Inform. Processing Systems 4:847–854.
  • Morcos A, Yu H, Paganini M, Tian Y (2019) One ticket to win them all: Generalizing lottery ticket initializations across datasets and optimizers. Adv. Neural Inform. Processing Systems 32:4932–4942.
  • Neyshabur B, Bhojanapalli S, McAllester D, Srebro N (2017) Exploring generalization in deep learning. Adv. Neural Inform. Processing Systems 30:5947–5956.
  • Patil V, Mintz Y (2022) A mixed-integer programming approach to training dense neural networks. Preprint, submitted January 3, https://arxiv.org/abs/2201.00723.
  • Roger C, Molina R, Rey C, Serra T, Puertas E, Pujol O (2022) Training thinner and deeper neural networks: Jumpstart regularization. Schaus P, ed. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2022, Lecture Notes in Computer Science, vol. 13292 (Springer, Cham, Switzerland), 345–357.
  • Sakr C, Choi J, Wang Z, Gopalakrishnan K, Shanbhag N (2018) True gradient-based training of deep binary activated neural networks via continuous binarization. 2018 IEEE Internat. Conf. Acoustics Speech Signal Processing (ICASSP) (IEEE, Piscataway, NJ), 2346–2350.
  • Serra T, Kumar A, Ramalingam S (2020) Lossless compression of deep neural networks. Hebrard E, Musliu N, eds. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2020, Lecture Notes in Computer Science, vol. 12296 (Springer, Cham, Switzerland), 417–430.
  • Serra T, Yu X, Kumar A, Ramalingam S (2021) Scaling up exact neural network compression by ReLU stability. Adv. Neural Inform. Processing Systems 34:27081–27093.
  • Tang W, Hua G, Wang L (2017) How to train a compact binary neural network with high accuracy? Thirty-First AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 2625–2631.
  • Thorbjarnarson T, Yorke-Smith N (2023) Optimal training of integer-valued neural networks with mixed integer programming. PLoS One 18(2):e0261029.
  • Tjandraatmadja C, Anderson R, Huchette J, Ma W, Patel KK, Vielma JP (2020) The convex relaxation barrier, revisited: Tightened single-neuron relaxations for neural network verification. Adv. Neural Inform. Processing Systems 33:21675–21686.
  • Tjeng V, Xiao KY, Tedrake R (2018) Evaluating robustness of neural networks with mixed integer programming. Internat. Conf. Learn. Representations (OpenReview.net).
  • Toro Icarte R, Illanes L, Castro MP, Cire AA, McIlraith SA, Beck JC (2019) Training binarized neural networks using MIP and CP. Internat. Conf. Principles Practice Constraint Programming, vol. 11802 (Springer, Cham, Switzerland), 401–417.
  • Tsay C, Kronqvist J, Thebelt A, Misener R (2021) Partition-based formulations for mixed-integer optimization of trained ReLU neural networks. Adv. Neural Inform. Processing Systems 34:3068–3080.
  • Vanschoren J (2019) Meta-learning. Hutter F, Kotthoff L, Vanschoren J, eds. Automated Machine Learning, Springer Series on Challenges in Machine Learning (Springer, Cham, Switzerland), 35–61.
  • Wang K, Lozano L, Cardonha C, Bergman D (2023) Optimizing over an ensemble of trained neural networks. INFORMS J. Comput. 35(3):652–674.
  • Williams HP (2013) Model Building in Mathematical Programming (John Wiley & Sons, Chichester, UK).
  • Xiao H, Rasul K, Vollgraf R (2017) Fashion-MNIST: A novel image data set for benchmarking machine learning algorithms. Preprint, submitted August 25, https://arxiv.org/abs/1708.07747.
  • Ye H, Xu H, Wang H, Wang C, Jiang Y (2023) GNN&GBDT-guided fast optimizing framework for large-scale integer programming. Proc. 40th Internat. Conf. Machine Learn., vol. 202 (PMLR, New York), 39864–39878.
  • Young HP (1988) Condorcet’s theory of voting. Amer. Political Sci. Rev. 82(4):1231–1244.
  • Yu X, Serra T, Ramalingam S, Zhe S (2022) The combinatorial brain surgeon: Pruning weights that cancel one another in neural networks. Internat. Conf. Machine Learn. (ICML) (PMLR, New York), 25668–25683.