When Deep Learning Meets Polyhedral Theory: A Survey

Published Online: https://doi.org/10.1287/ijoc.2024.0902

References

  • Abrahamsen M, Kleist L, Miltzow T (2021) Training neural networks is ∃ℝ-complete. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 18293–18306.Google Scholar
  • Agostinelli F, Hoffman M, Sadowski P, Baldi P (2015) Learning activation functions to improve deep neural networks. Preprint, submitted April 21, https://arxiv.org/abs/1412.6830.Google Scholar
  • Alcántara A, Ruiz C (2023) A neural network-based distributional constraint learning methodology for mixed-integer stochastic optimization. Expert Systems Appl. 232:120895.CrossrefGoogle Scholar
  • Alcántara A, Ruiz C, Tsay C (2025) A quantile neural network framework for two-stage stochastic optimization. Expert Systems Appl. 284:127876.CrossrefGoogle Scholar
  • Amrami A, Goldberg Y (2021) A simple geometric proof for the benefit of depth in ReLU networks. Preprint, submitted January 18, https://arxiv.org/abs/2101.07126.Google Scholar
  • Anderson R, Huchette J, Tjandraatmadja C, Vielma J (2019) Strong mixed-integer programming formulations for trained neural networks. Lodi A, Nagarajan V, eds. Integer Programming Combin. Optim. IPCO 2019, Lecture Notes in Computer Science, vol. 11480 (Springer, Cham, Switzerland), 27–42.Google Scholar
  • Anderson R, Huchette J, Ma W, Tjandraatmadja C, Vielma JP (2020) Strong mixed-integer programming formulations for trained neural networks. Math. Programming 183(1–2):3–39.CrossrefGoogle Scholar
  • Anil C, Lucas J, Grosse R (2019) Sorting out Lipschitz function approximation. Chaudhuri K, Salakhutdinov R, eds. Internat. Conf. Machine Learn. (ICML), vol. 97 (PMLR, New York), 291–301.Google Scholar
  • Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. Precup D, Teh YW, eds. Internat. Conf. Machine Learn. (ICML), vol. 70 (JMLR), 214–223.Google Scholar
  • Arora R, Basu A, Mianjy P, Mukherjee A (2018) Understanding deep neural networks with rectified linear units. Internat. Conf. Learn. Representations (ICLR).Google Scholar
  • Averkov G, Hojny C, Merkert M (2025) On the expressiveness of rational ReLU neural networks with bounded depth. Internat. Conf. Learn. Representations (ICLR).Google Scholar
  • Aziznejad S, Gupta H, Campos J, Unser M (2020) Deep neural networks with trainable activations and controlled Lipschitz constant. IEEE Trans. Signal Processing 68:4688–4699.CrossrefGoogle Scholar
  • Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. Internat. Conf. Learn. Representations (ICLR).Google Scholar
  • Bakaev E, Brunck F, Hertrich C, Reichman D, Yehudayoff A (2025a) On the depth of monotone ReLU neural networks and ICNNs. Preprint, submitted May 9, https://arxiv.org/abs/2505.06169.Google Scholar
  • Bakaev E, Brunck F, Hertrich C, Stade J, Yehudayoff A (2025b) Better neural network expressivity: Subdividing the simplex. Preprint, submitted May 20, https://arxiv.org/abs/2505.14338.Google Scholar
  • Balas E (1998) Disjunctive programming: Properties of the convex hull of feasible points. Discrete Appl. Math. 89(1–3):3–44.CrossrefGoogle Scholar
  • Balas E (2018) Disjunctive Programming (Springer, Cham, Switzerland).CrossrefGoogle Scholar
  • Balas E, Ceria S, Cornuéjols G (1993) A lift-and-project cutting plane algorithm for mixed 0–1 programs. Math. Programming 58(1–3):295–324.CrossrefGoogle Scholar
  • Balas E, Ceria S, Cornuéjols G (1996) Mixed 0-1 programming by lift-and-project in a branch-and-cut framework. Management Sci. 42(9):1229–1246.LinkGoogle Scholar
  • Balestriero R, Baraniuk RG (2018) A spline theory of deep networks. Internat. Conf. Machine Learn. (ICML) (PMLR, New York).Google Scholar
  • Balunović M, Vechev M (2020) Adversarial training and provable defenses: Bridging the gap. Internat. Conf. Learn. Representations (ICLR).Google Scholar
  • Batten B, Kouvaros P, Lomuscio A, Zheng Y (2021) Efficient neural network verification via layer-based semidefinite relaxations and linear cuts. Zhou Z, ed. Proc. 30th Internat. Joint Conf. Artificial Intelligence (IJCAI), 2184–2190.Google Scholar
  • Bengio Y (2009) Learning deep architectures for AI. Foundations Trends Machine Learn. 2(1):1–127.CrossrefGoogle Scholar
  • Bengio Y, Lodi A, Prouvost A (2021) Machine learning for combinatorial optimization: A methodological tour d’horizon. Eur. J. Oper. Res. 290(2):405–421.CrossrefGoogle Scholar
  • Bennett KP (1992) Decision tree construction via linear programming. Technical report, University of Wisconsin-Madison Department of Computer Sciences, Madison.Google Scholar
  • Bennett KP, Mangasarian OL (1990) Neural network training via linear programming. Computer Sciences Technical Report 1067, University of Wisconsin–Madison Department of Computer Sciences, Madison.Google Scholar
  • Bennett KP, Mangasarian OL (1992) Robust linear programming discrimination of two linearly inseparable sets. Optim. Methods Software 1(1):23–34.CrossrefGoogle Scholar
  • Benussi E, Patane A, Wicker M, Laurenti L, Kwiatkowska M (2022) Individual fairness guarantees for neural networks. De Raedt L, ed. Proc. 31st Internat. Joint Conf. Artificial Intelligence (IJCAI), 651–658.Google Scholar
  • Bergman D, Huang T, Brooks P, Lodi A, Raghunathan AU (2022) JANOS: An integrated predictive and prescriptive modeling framework. INFORMS J. Comput. 34(2):807–816.LinkGoogle Scholar
  • Bernardelli AM, Gualandi S, Lau HC, Milanesi S (2023) The bemi stardust: A structured ensemble of binarized neural networks. Sellmann M, Tierney K, eds. Learn. Intelligent Optim. LION 2023, Lecture Notes in Computer Science, vol. 14286 (Springer, Cham, Switzerland), 443–458.Google Scholar
  • Berner C, Brockman G, Chan B, Cheung V, Dȩbiak P, Dennison C, Farhi D, et al. (2019) Dota 2 with large scale deep reinforcement learning. Preprint, submitted December 13, https://arxiv.org/abs/1912.06680.Google Scholar
  • Berrada L, Zisserman A, Mudigonda P (2019) Deep Frank-Wolfe for neural network optimization. 7th Internat. Conf. Learn. Representations (ICLR 2019).Google Scholar
  • Bertschinger D, Hertrich C, Jungeblut P, Miltzow T, Weber S (2024) Training fully connected neural networks is ∃R-complete. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. NIPS’23: Proc. 37th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 36222–36237.Google Scholar
  • Bhosekar A, Ierapetritou M (2018) Advances in surrogate based modeling, feasibility analysis, and optimization: A review. Comput. Chemical Engrg. 108:250–267.CrossrefGoogle Scholar
  • Bianchini M, Scarselli F (2014) On the complexity of neural network classifiers: A comparison between shallow and deep architectures. IEEE Trans. Neural Networks Learn. Systems 25(8):1553–1565.CrossrefGoogle Scholar
  • Biau G, Sangnier M, Tanielian U (2021) Some theoretical insights into Wasserstein GANs. J. Machine Learn. Res. 22(1):5287–5331.Google Scholar
  • Bienstock D, Muñoz G (2018) LP formulations for polynomial optimization problems. SIAM J. Optim. 28(2):1121–1150.CrossrefGoogle Scholar
  • Bienstock D, Muñoz G, Pokutta S (2023) Principled deep neural network training through linear programming. Discrete Optim. 49:100795.CrossrefGoogle Scholar
  • Blum AL, Rivest RL (1992) Training a 3-node neural network is NP-complete. Neural Networks 5(1):117–127.CrossrefGoogle Scholar
  • Bohra P, Campos J, Gupta H, Aziznejad S, Unser M (2020) Learning activation functions in deep (spline) neural networks. IEEE Open J. Signal Processing 1:295–309.CrossrefGoogle Scholar
  • Bonami P, Lodi A, Tramontani A, Wiese S (2015) On mathematical programming with indicator constraints. Math. Programming 151:191–223.CrossrefGoogle Scholar
  • Boob D, Dey SS, Lan G (2022) Complexity of training ReLU neural network. Discrete Optim. 44:100620.CrossrefGoogle Scholar
  • Botoeva E, Kouvaros P, Kronqvist J, Lomuscio A, Misener R (2020) Efficient verification of ReLU-based neural networks via dependency analysis. Proc. AAAI Conf. Artificial Intelligence 34(4):3291–3299.CrossrefGoogle Scholar
  • Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Rev. 60(2):223–311.CrossrefGoogle Scholar
  • Brandenburg MC, Grillo ML, Hertrich C (2025) Decomposition polyhedra of piecewise linear functions. Internat. Conf. Learn. Representations (ICLR).Google Scholar
  • Bridle JS (1990) Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. Soulié FF, Hérault J, eds. Neurocomput., NATO ASI Series, vol. 68 (Springer, Berlin, Heidelberg), 227–236.Google Scholar
  • Bubeck S (2015) Convex optimization: Algorithms and complexity. Foundations Trends Machine Learn. 8(3–4):231–357.CrossrefGoogle Scholar
  • Bunel RR, Hinder O, Bhojanapalli S, Dvijotham K (2020c) An efficient nonconvex reformulation of stagewise convex optimization problems. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 8247–8258.Google Scholar
  • Bunel RR, Turkaslan I, Torr P, Kohli P, Mudigonda PK (2018) A unified view of piecewise linear neural network verification. Bengio S, Beygelzimer A, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 4795–4804.Google Scholar
  • Bunel R, Mudigonda P, Turkaslan I, Torr P, Lu J, Kohli P (2020b) Branch and bound for piecewise linear neural network verification. J. Machine Learn. Res. 21(1):1574–1612.Google Scholar
  • Bunel R, De Palma A, Desmaison A, Dvijotham K, Kohli P, Torr P, Pawan Kumar M (2020a) Lagrangian decomposition for neural network verification. Peters J, Sontag D, eds. Proc. 36th Conf. Uncertainty Artificial Intelligence (UAI), vol. 124 (PMLR, New York), 370–379.Google Scholar
  • Burer S, Monteiro RD (2003) A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Programming 95(2):329–357.CrossrefGoogle Scholar
  • Burtea R, Tsay C (2024) Constrained continuous-action reinforcement learning for supply chain inventory management. Comput. Chemical Engrg. 181:108518.CrossrefGoogle Scholar
  • Cai J, Nguyen KN, Shrestha N, Good A, Tu R, Yu X, Zhe S, Serra T (2023) Getting away with more network pruning: From sparsity to geometry and linear regions. Cire AA, ed. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2023, Lecture Notes in Computer Science, vol. 13884 (Springer, Cham, Switzerland), 200–218.Google Scholar
  • Carrasco P, Muñoz G (2024) Tightening convex relaxations of trained neural networks: A unified approach for convex and s-shaped activations. Preprint, submitted October 30, https://arxiv.org/abs/2410.23362.Google Scholar
  • Ceccon F, Jalving J, Haddad J, Thebelt A, Tsay C, Laird CD, Misener R (2022) OMLT: Optimization & machine learning toolkit. J. Machine Learn. Res. 23(1):15829–15836.Google Scholar
  • Charisopoulos V, Maragos P (2018) A tropical approach to neural networks with piecewise linear activations. Preprint, submitted May 22, https://arxiv.org/abs/1805.08749.Google Scholar
  • Chaudhry A, Khan N, Dokania P, Torr P (2020) Continual learning in low-rank orthogonal subspaces. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 9900–9911.Google Scholar
  • Chen KL, Garudadri H, Rao BD (2022a) Improved bounds on neural complexity for representing piecewise linear functions. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. NIPS’22: Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 7167–7180.Google Scholar
  • Chen W, Gong X, Wang Z (2021) Neural architecture search on ImageNet in four GPU hours: A theoretically inspired perspective. Internat. Conf. Learn. Representations (ICLR).Google Scholar
  • Chen S, Klivans AR, Meka R (2022b) Learning deep ReLU networks is fixed-parameter tractable. 2021 IEEE 62nd Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 696–707.Google Scholar
  • Chen H, Wang YG, Xiong H (2023a) Lower and upper bounds for numbers of linear regions of graph convolutional networks. Neural Networks 168:394–404.CrossrefGoogle Scholar
  • Chen T, Lasserre JB, Magron V, Pauwels E (2020) Semialgebraic optimization for Lipschitz constants of ReLU networks. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 19189–19200.Google Scholar
  • Chen W, Gong X, Wu J, Wei Y, Shi H, Yan Z, Yang Y, Wang Z (2023b) Understanding and accelerating neural architecture search with training-free and theory-grounded metrics. IEEE Trans. Pattern Anal. Machine Intelligence 46(2):749–763.CrossrefGoogle Scholar
  • Cheng C, Nührenberg G, Ruess H (2017) Maximum resilience of artificial neural networks. D’Souza D, Narayan Kumar K, eds. Automated Tech. Verification Anal. ATVA 2017, Lecture Notes in Computer Science, vol. 10482 (Springer, Cham, Switzerland), 251–268.Google Scholar
  • Cheng CH, Nührenberg G, Huang CH, Ruess H (2018) Verification of binarized neural networks via inter-neuron factoring: (short paper). Piskac R, Rümmer P, eds. Verified Software. Theories Tools Experiments. VSTTE 2018, Lecture Notes in Computer Science, vol. 11294 (Springer, Cham, Switzerland), 279–290.Google Scholar
  • Cheon MS (2022) An outer-approximation guided optimization approach for constrained neural network inverse problems. Math. Programming 196(1–2):173–202.CrossrefGoogle Scholar
  • Chu L, Hu X, Hu J, Wang L, Pei J (2018) Exact and consistent interpretation for piecewise linear neural networks: A closed form solution. KDD’18: Proc. ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1244–1253.Google Scholar
  • Ciresan D, Meier U, Masci J, Schmidhuber J (2012) Multi column deep neural network for traffic sign classification. Neural Networks 32:333–338.CrossrefGoogle Scholar
  • Cisse M, Bojanowski P, Grave E, Dauphin Y, Usunier N (2017) Parseval networks: Improving robustness to adversarial examples. Precup D, Teh YW, eds. ICML’17: Proc. 34th Internat. Conf. Machine Learn., vol. 70 (JMLR), 854–863.Google Scholar
  • Cohan S, Kim NH, Rolnick D, van de Panne M (2022) Understanding the evolution of linear regions in deep reinforcement learning. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. NIPS’22: Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 10891–10903.Google Scholar
  • Collobert R (2004) Large scale machine learning. PhD thesis, University of Paris, Paris.Google Scholar
  • Combettes PL, Pesquet JC (2020) Lipschitz certificates for layered network structures driven by averaged activation operators. SIAM J. Math. Data Sci. 2(2):529–557.CrossrefGoogle Scholar
  • Courbariaux M, Bengio Y, David JP (2015) BinaryConnect: Training deep neural networks with binary weights during propagations. Cortes C, Lee DD, Sugiyama M, Garnett R, eds. NIPS’15: Proc. 29th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 3123–3131.Google Scholar
  • Craighero F, Angaroni F, Graudenzi A, Stella F, Antoniotti M (2020a) Investigating the compositional structure of deep neural networks. Nicosia G, ed. Machine Learn. Optim. Data Sci. LOD 2020, Lecture Notes in Computer Science, vol. 12565 (Springer, Cham, Switzerland), 322–334.Google Scholar
  • Craighero F, Angaroni F, Graudenzi A, Stella F, Antoniotti M (2020b) Understanding deep learning with activation pattern diagrams. CEUR Workshop Proc. 2742:119–126Google Scholar
  • Croce F, Hein M (2018) A randomized gradient-free attack on ReLU networks. Brox T, Bruhn A, Fritz M, eds. Pattern Recognition. GCPR 2018, Lecture Notes in Computer Science, vol. 11269 (Springer, Cham, Switzerland), 215–227.Google Scholar
  • Croce F, Andriushchenko M, Hein M (2019) Provable robustness of ReLU networks via maximization of linear regions. Chaudhuri K, Sugiyama M, eds. Proc. 22nd Internat. Conf. Artificial Intelligence Statist., vol. 89 (PMLR, New York), 2057–2066.Google Scholar
  • Croce F, Rauber J, Hein M (2020) Scaling up the randomized gradient-free adversarial attack reveals overestimation of robustness using established attacks. Internat. J. Comput. Vision 128:1028–1046.CrossrefGoogle Scholar
  • Croxton KL, Gendron B, Magnanti TL (2003) A comparison of mixed-integer programming models for nonconvex piecewise linear cost minimization problems. Management Sci. 49(9):1268–1273.LinkGoogle Scholar
  • Curtis FE, Scheinberg K (2017) Optimization methods for supervised machine learning: From linear models to deep learning. INFORMS TutORials in Operations Research (INFORMS, Catonsville, MD), 89–114.LinkGoogle Scholar
  • Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math. Control Signals Systems 2:303–314.CrossrefGoogle Scholar
  • Danna E, Fenelon M, Gu Z, Wunderling R (2007) Generating multiple solutions for mixed integer programming problems. Fischetti M, Williamson DP, eds. Integer Programming Combin. Optim. IPCO 2019, Lecture Notes in Computer Science, vol. 4513 (Springer, Cham, Switzerland), 280–294.Google Scholar
  • Dantzig GB (1960) On the significance of solving linear programming problems with some integer variables. Econometrica 28(1):30–44.CrossrefGoogle Scholar
  • Dantzig GB, Eaves BC (1973) Fourier-Motzkin elimination and its dual. J. Combin. Theory Series A 14(3):288–297.CrossrefGoogle Scholar
  • Dathathri S, Dvijotham K, Kurakin A, Raghunathan A, Uesato J, Bunel RR, Shankar S, et al. (2020) Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 5318–5331.Google Scholar
  • Daubechies I, DeVore R, Foucart S, Hanin B, Petrova G (2022) Nonlinear approximation and (deep) ReLU networks. Constructive Approximation 55:127–172.CrossrefGoogle Scholar
  • De Palma A, Behl H, Bunel RR, Torr P, Kumar MP (2021) Scaling the convex barrier with active sets. Internat. Conf. Learn. Representations (ICLR).Google Scholar
  • Delarue A, Anderson R, Tjandraatmadja C (2020) Reinforcement learning with combinatorial actions: An application to vehicle routing. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 609–620.Google Scholar
  • Deng Y, Zheng X, Zhang T, Chen C, Lou G, Kim M (2020) An analysis of adversarial attacks and defenses on autonomous driving models. 2020 IEEE Internat. Conf. Pervasive Comput. Comm. (PerCom) (IEEE, Piscataway, NJ), 1–10.Google Scholar
  • Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Conf. North Amer. Chapter Assoc. Comput. Linguistics (NAACL) (Association for Computational Linguistics, Stroudsburg, PA), 4171–4186.Google Scholar
  • Dey SS, Wang G, Xie Y (2020) Approximation algorithms for training one-node ReLU neural networks. IEEE Trans. Signal Processing 68:6696–6706.CrossrefGoogle Scholar
  • Dowson O, Parker RB, Bent R (2025) MathOptAI.jl: Embed trained machine learning predictors into JuMP models. Preprint, submitted July 3, https://arxiv.org/abs/2507.03159.Google Scholar
  • Dubey SR, Singh SK, Chaudhuri BB (2022) Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomput. 503:92–108.CrossrefGoogle Scholar
  • Dutta S, Jha S, Sankaranarayanan S, Tiwari A (2018) Output range analysis for deep feedforward networks. Dutle A, Muñoz C, Narkawicz A, eds. NASA Formal Methods. NFM 2018, Lecture Notes in Computer Science, vol. 10811 (Springer, Cham, Switzerland), 121–138.Google Scholar
  • Dvijotham K, Stanforth R, Gowal S, Mann TA, Kohli P (2018a) A dual approach to scalable verification of deep networks. Globerson A, Silva R, eds. Conf. Uncertainty Artificial Intelligence (UAI) (Monterey), 550–559.Google Scholar
  • Dvijotham K, Gowal S, Stanforth R, Arandjelovic R, O’Donoghue B, Uesato J, Kohli P (2018b) Training verified learners with learned verifiers. Preprint, submitted May 25, https://arxiv.org/abs/1805.10265.Google Scholar
  • Dym N, Sober B, Daubechies I (2020) Expression of fractals through neural network functions. IEEE J. Selected Areas Inform. Theory 1(1):57–66.CrossrefGoogle Scholar
  • Ehlers R (2017) Formal verification of piece-wise linear feed-forward neural networks. D’Souza D, Narayan Kumar K, eds. Automated Tech. Verification Anal. ATVA 2017, Lecture Notes in Computer Science, vol. 10482 (Springer, Cham, Switzerland), 269–286.Google Scholar
  • ElAraby M, Wolf G, Carvalho M (2020) Identifying efficient sub-networks using mixed integer programming. OPT2020: 12th Annual Workshop Optim. Machine Learn.Google Scholar
  • Elsken T, Metzen JH, Hutter F (2019) Neural architecture search: A survey. J. Machine Learn. Res. 20(1):1997–2017.Google Scholar
  • Ergen E, Grillo M (2024) Topological expressivity of ReLU neural networks. Agrawal S, Roth A, eds. Proc. 37th Conf. Learn. Theory (COLT), vol. 247 (PMLR, New York), 1599–1642.Google Scholar
  • Ergen T, Pilanci M (2020) Convex geometry of two-layer ReLU networks: Implicit autoencoding and interpretable models. Chiappa S, Calandra R, eds. Proc. 22nd Internat. Conf. Artificial Intelligence Statist., vol. 108 (PMLR, New York), 4024–4033.Google Scholar
  • Ergen T, Pilanci M (2021a) Convex geometry and duality of over-parameterized neural networks. J. Machine Learn. Res. 22(1):9646–9708.Google Scholar
  • Ergen T, Pilanci M (2021b) Global optimality beyond two layers: Training deep ReLU networks via convex programs. Proc. 38th Internat. Conf. Machine Learn. (ICLR) (PMLR, New York), 2993–3003.Google Scholar
  • Ergen T, Pilanci M (2021c) Implicit convex regularizers of CNN architectures: Convex optimization of two-and three-layer networks in polynomial time. Internat. Conf. Learn. Representations (ICLR).Google Scholar
  • Ergen T, Pilanci M (2021d) Revealing the structure of deep neural networks via convex duality. Proc. 38th Internat. Conf. Machine Learn., vol. 139 (PMLR, New York), 3004–3014.Google Scholar
  • Ergen T, Pilanci M (2024) Path regularization: A convexity and sparsity inducing regularization for parallel ReLU networks. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. NIPS’23: Proc. 37th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 59761–59786.Google Scholar
  • Ergen T, Pilanci M (2025) The convex landscape of neural networks: Characterizing global optima and stationary points via Lasso models. IEEE Trans. Inform. Theory 71(5):3854–3870.CrossrefGoogle Scholar
  • Ergen T, Gulluk HI, Lacotte J, Pilanci M (2023) Globally optimal training of neural networks with threshold activation functions. Internat. Conf. Learn. Representations (ICLR).Google Scholar
  • Ergen T, Sahiner A, Ozturkler B, Pauly JM, Mardani M, Pilanci M (2022) Demystifying batch normalization in ReLU networks: Equivalent convex optimization models and implicit regularization. Internat. Conf. Learn. Representations (ICLR).Google Scholar
  • Eykholt K, Evtimov I, Fernandes E, Li B, Rahmati A, Xiao C, Prakash A, Kohno T, Song D (2018) Robust physical-world attacks on deep learning visual classification. 2018 IEEE/CVF Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE, Piscataway, NJ), 1625–1634.Google Scholar
  • Fan F, Lai R, Wang G (2023a) Quasi-equivalence between width and depth of neural networks. J. Machine Learn. Res. 24(183):1–22.Google Scholar
  • Fan FL, Huang W, Zhong X, Ruan L, Zeng T, Xiong H, Wang F (2023b) Deep ReLU networks have surprisingly simple polytopes. Preprint, submitted May 16, https://arxiv.org/abs/2305.09145.Google Scholar
  • Fazlyab M, Morari M, Pappas GJ (2020) Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming. IEEE Trans. Automatic Control 67(1):1–15.CrossrefGoogle Scholar
  • Fazlyab M, Robey A, Hassani H, Morari M, Pappas GJ (2019) Efficient and accurate estimation of Lipschitz constants for deep neural networks. Wallach HM, Larochelle HM, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. NIPS’19: Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 11427–11438.Google Scholar
  • Ferlez J, Shoukry Y (2020) AReN: Assured ReLU NN architecture for model predictive control of LTI systems. HSCC’20: Proc. 23rd Internat. Conf. Hybrid Systems: Comput. Control (Association for Computing Machinery, New York), 1–11.Google Scholar
  • Ferrari C, Mueller MN, Jovanović N, Vechev M (2022) Complete verification via multi-neuron relaxation guided branch-and-bound. Internat. Conf. Learn. Representations (ICLR).Google Scholar
  • Finlayson SG, Bowers JD, Ito J, Zittrain JL, Beam AL, Kohane IS (2019) Adversarial attacks on medical machine learning. Science 363(6433):1287–1289.CrossrefGoogle Scholar
  • Fischetti M, Jo J (2018) Deep neural networks and mixed integer linear optimization. Constraints 23:296–309.CrossrefGoogle Scholar
  • Fourier J (1826) Solution d’une question particuliére du calcul des inégalités. Nouveau Bull. Des Sci. Par la Société Philomatique de Paris.Google Scholar
  • Francobaldi M, Lombardi M (2025) SMLE: Safe machine learning via embedded overapproximation. AAAI Conf. Artificial Intelligence 39(26):27286–27294.Google Scholar
  • Frank M, Wolfe P (1956) An algorithm for quadratic programming. Naval Res. Logist. Quart. 3(1–2):95–110.CrossrefGoogle Scholar
  • Froese V, Hertrich C (2024) Training neural networks is NP-hard in fixed dimension. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. NIPS’23: Proc. 37th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 44039–44049.Google Scholar
  • Froese V, Grillo M, Skutella M (2024) Complexity of injectivity and verification of ReLU neural networks. Preprint, submitted May 30, https://arxiv.org/abs/2405.19805.Google Scholar
  • Froese V, Hertrich C, Niedermeier R (2022) The computational complexity of ReLU network training parameterized by data dimensionality. J. Artificial Intelligence Res. 74:1775–1790.CrossrefGoogle Scholar
  • Fukushima K (1980) Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybernetics 36:193–202.CrossrefGoogle Scholar
  • Funahashi KI (1989) On the approximate realization of continuous mappings by neural networks. Neural Networks 2(3):183–192.CrossrefGoogle Scholar
  • Gamba M, Carlsson S, Azizpour H, Björkman M (2020) Hyperplane arrangements of trained ConvNets are biased. Preprint, submitted March 17, https://arxiv.org/abs/2003.07797.Google Scholar
  • Gamba M, Chmielewski-Anders A, Sullivan J, Azizpour H, Björkman M (2022) Are all linear regions created equal? Camps-Valls G, Ruiz FJR, Valera I, eds. Proc. 22nd Internat. Conf. Artificial Intelligence Statist., vol. 151 (PMLR, New York), 6573–6590.Google Scholar
  • Gambella C, Ghaddar B, Naoum-Sawaya J (2021) Optimization problems for machine learning: A survey. Eur. J. Oper. Res. 290(3):807–828.CrossrefGoogle Scholar
  • Gao J, Sun C, Zhao H, Shen Y, Anguelov D, Li C, Schmid C (2020) VectorNet: Encoding HD maps and agent dynamics from vectorized representation. 2018 IEEE/CVF Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE, Piscataway, NJ), 1625–1634.Google Scholar
  • Geißler B, Martin A, Morsi A, Schewe L (2012) Using piecewise linear functions for solving MINLPs. Lee J, Leyffer S, eds. Mixed Integer Nonlinear Programming, The IMA Volumes in Mathematics and its Applications, vol. 154 (Springer, New York), 287–314.Google Scholar
  • Glass L, Hilali W, Nelles O (2021) Compressing interpretable representations of piecewise linear neural networks using neuro-fuzzy models. IEEE Sympos. Series Comput. Intelligence (SSCI) (IEEE, Piscataway, NJ), 2057–2066.Google Scholar
  • Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. Gordon G, Dunson D, Dudík M, eds. Proc. 14th Internat. Conf. Artificial Intelligence Statist., vol. 15 (PMLR, New York), 315–323.Google Scholar
  • Goebbels S (2021) Training of ReLU activated multilayerd neural networks with mixed integer linear programs. Technical Report No. 2021-01, Hochschule Niederrhein, Fachbereich Elektrotechnik & Informatik, Krefeld, Germany.Google Scholar
  • Goel S, Klivans A, Manurangsi P, Reichman D (2021) Tight hardness results for training depth-2 ReLU networks. 12th Innovations Theoret. Comput. Sci. Conf. (ITCS 2021), Leibniz International Proceedings in Informatics (LIPIcs), vol. 185 (Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Wadern, Germany), 22:1–22:14.Google Scholar
  • Goerigk M, Kurtz J (2023) Data-driven robust optimization using deep neural networks. Comput. Oper. Res. 151:106087.CrossrefGoogle Scholar
  • Goodfellow I, Bengio Y, Courville A (2016) Deep Learning (MIT Press, Cambridge, MA).Google Scholar
  • Goodfellow I, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. Internat. Conf. Learn. Representations (ICLR).Google Scholar
  • Goodfellow I, Warde-Farley D, Mirza M, Courville A, Bengio Y (2013) Maxout networks. Dasgupta S, McAllester D, eds. ICML’13: Proc. 30th Internat. Conf. Machine Learn. (ICML), vol. 28 (JMLR), 1319–1327.Google Scholar
  • Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. NIPS’14: Proc. 28th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 2672–2680.Google Scholar
  • Gopinath D, Converse H, Pasareanu CS, Taly A (2019) Property inference for deep neural networks. ASE’19: Proc. 34th IEEE/ACM Internat. Conf. Automated Software Engrg. (IEEE, Piscataway, NJ), 797–809.Google Scholar
  • Goujon A, Etemadi A, Unser M (2024) On the number of regions of piecewise linear neural networks. J. Comput. Appl. Math. 441:115667.CrossrefGoogle Scholar
  • Gowal S, Dvijotham K, Stanforth R, Bunel R, Qin C, Uesato J, Arandjelovic R, Mann T, Kohli P (2018) On the effectiveness of interval bound propagation for training verifiably robust models. Preprint, submitted October 30, https://arxiv.org/abs/1810.12715.Google Scholar
  • Graves A, Jaitly N (2014) Towards end-to-end speech recognition with recurrent neural networks. Xing EP, Jebara T, eds. ICML’14: Proc. 31st Internat. Conf. Machine Learn. (ICML), vol. 32 (JMLR), II-1764–II-1772.Google Scholar
  • Grigsby JE, Lindsey K (2022) On transversality of bent hyperplane arrangements and the topological expressiveness of ReLU neural networks. SIAM J. Appl. Algebra Geometry 6(2):216–242.CrossrefGoogle Scholar
  • Grigsby JE, Lindsey K, Rolnick D (2023) Hidden symmetries of ReLU networks. Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlet J, eds. ICML’23: Proc. 40th Internat. Conf. Machine Learn. (ICML) (JMLR), 11734–11760.Google Scholar
  • Grillo M, Hertrich C, Loho G (2025) Depth-bounds for neural networks via the braid arrangement. Thirty-ninth Annual Conf. Neural Inform. Processing Systems (OpenReview).
  • Grimstad B, Andersson H (2019) ReLU networks as surrogate models in mixed-integer linear programs. Comput. Chemical Engrg. 131:106580.
  • Grossmann IE, Ruiz JP (2012) Generalized disjunctive programming: A framework for formulation and alternative algorithms for MINLP optimization. Lee J, Leyffer S, eds. Mixed Integer Nonlinear Programming, The IMA Volumes in Mathematics and its Applications, vol. 154 (Springer, New York), 93–115.
  • Haase CA, Hertrich C, Loho G (2023) Lower bounds on the depth of integral ReLU neural networks via lattice polytopes. Internat. Conf. Learn. Representations (ICLR).
  • Hahnloser R, Sarpeshkar R, Mahowald M, Douglas R, Seung S (2000) Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405:947–951.
  • Han S, Gómez A (2021) Single-neuron convexification for binarized neural networks. Preprint, submitted May 27, https://optimization-online.org/?p=17148.
  • Hanin B, Rolnick D (2019a) Complexity of linear regions in deep networks. Internat. Conf. Machine Learn. (ICML) (PMLR, New York).
  • Hanin B, Rolnick D (2019b) Deep ReLU networks have surprisingly few activation patterns. Wallach HM, Larochelle HM, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. NIPS’19: Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 361–370.
  • Hanin B, Sellke M (2017) Approximating continuous functions by ReLU nets of minimal width. Preprint, submitted October 31, https://arxiv.org/abs/1710.11278.
  • Hashemi V, Kouvaros P, Lomuscio A (2021) OSIP: Tightened bound propagation for the verification of ReLU neural networks. Calinescu R, Păsăreanu CS, eds. Software Engrg. Formal Methods. SEFM 2021, Lecture Notes in Computer Science, vol. 13085 (Springer, Cham, Switzerland), 463–480.
  • He F, Lei S, Ji J, Tao D (2021) Neural networks behave as hash encoders: An empirical study. Preprint, submitted January 14, https://arxiv.org/abs/2101.05490.
  • He J, Li L, Xu J, Zheng C (2020) ReLU deep neural networks and linear finite elements. J. Comput. Math. 38(3):502–527.
  • He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. IEEE Internat. Conf. Comput. Vision (ICCV) (IEEE, Piscataway, NJ), 1026–1034.
  • He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. 2016 IEEE Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE, Piscataway, NJ), 770–778.
  • Henriksen P, Lomuscio A (2021) DEEPSPLIT: An efficient splitting method for neural network verification via indirect effect analysis. Zhou Z, ed. Proc. 30th Internat. Joint Conf. Artificial Intelligence (IJCAI), 2549–2555.
  • Henriksen P, Leofante F, Lomuscio A (2022) Repairing misclassifications in neural networks using limited data. SAC’22: Proc. ACM/SIGAPP Sympos. Appl. Comput. (Association for Computing Machinery, New York), 1625–1634.
  • Hertrich C, Loho G (2024) Neural networks and (virtual) extended formulations. Preprint, submitted November 5, https://arxiv.org/abs/2411.03006.
  • Hertrich C, Basu A, Di Summa M, Skutella M (2023) Towards lower bounds on the depth of ReLU neural networks. SIAM J. Discrete Math. 37(2):997–1029.
  • Hinton G, Deng L, Dahl G, Mohamed A, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath T, Kingsbury B (2012) Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine 29(6):82–97.
  • Hinz P (2021) Using activation histograms to bound the number of affine regions in ReLU feed-forward neural networks. Preprint, submitted March 31, https://arxiv.org/abs/2103.17174.
  • Hinz P, van de Geer S (2019) A framework for the construction of upper bounds on the number of affine linear regions of ReLU feed-forward neural networks. IEEE Trans. Inform. Theory 65(11):7304–7324.
  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput. 9(8):1735–1780.
  • Hopfield J (1982) Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79(8):2554–2558.
  • Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Networks 2(5):359–366.
  • Hu T, Shang Z, Cheng G (2020a) Sharp rate of convergence for deep neural network classifiers under the teacher-student setting. Preprint, submitted January 19, https://arxiv.org/abs/2001.06892.
  • Hu X, Liu W, Bian J, Pei J (2020b) Measuring model complexity of neural networks with curve activation functions. KDD’20: Proc. 26th ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1521–1531.
  • Hu X, Chu L, Pei J, Liu W, Bian J (2021) Model complexity of deep learning: A survey. Knowledge Inform. Systems 63:2585–2619.
  • Huang Y, Zhang H, Shi Y, Kolter JZ, Anandkumar A (2021) Training certifiably robust neural networks with efficient local Lipschitz bounds. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 22745–22757.
  • Huang X, Kroening D, Ruan W, Sharp J, Sun Y, Thamo E, Wu M, Yi X (2020) A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability. Comput. Sci. Rev. 37:100270.
  • Huchette J, Vielma JP (2022) Nonconvex piecewise linear functions: Advanced formulations and simple modeling tools. Oper. Res. 71(5):1835–1856.
  • Huster T, Chiang CYJ, Chadha R (2018) Limitations of the Lipschitz constant as a defense against adversarial examples. Alzate C, ed. ECML PKDD 2018 Workshops. ECML PKDD 2018, Lecture Notes in Computer Science, vol. 11329 (Springer, Cham, Switzerland), 16–29.
  • Hwang WL, Heinecke A (2020) Un-rectifying non-linear networks for signal representation. IEEE Trans. Signal Processing 68:196–210.
  • Icarte RT, Illanes L, Castro MP, Cire AA, McIlraith SA, Beck JC (2019) Training binarized neural networks using MIP and CP. Schiex T, de Givry S, eds. Principles Practice Constraint Programming. CP 2019, Lecture Notes in Computer Science, vol. 11802 (Springer, Cham, Switzerland), 401–417.
  • Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. Bach F, Blei D, eds. ICML’15: Proc. 32nd Internat. Conf. Machine Learn. (ICML), vol. 37 (JMLR), 448–456.
  • Jeroslow RG, Lowe JK (1984) Modelling with integer variables. Korte B, Ritter K, eds. Mathematical Programming at Oberwolfach II, Mathematical Programming Studies, vol. 22 (Springer, Berlin, Heidelberg), 167–184.
  • Jia K, Rinard M (2020) Efficient exact verification of binarized neural networks. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1782–1795.
  • Johnson TT, Lopez DM, Musau P, Tran HD, Botoeva E, Leofante F, Maleki A, Sidrane C, Fan J, Huang C (2020) ARCH-COMP20 category report: Artificial intelligence and neural network control systems (AINNCS) for continuous and hybrid systems plants. EPiC Series Comput. 74:107–139.
  • Jordan M, Dimakis AG (2020) Exactly computing the local Lipschitz constant of ReLU networks. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 7344–7353.
  • Jordan M, Lewis J, Dimakis AG (2019) Provable certificates for adversarial examples: Fitting a ball in the union of polytopes. Wallach HM, Larochelle HM, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. NIPS’19: Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 14082–14092.
  • Kaleem W, Subramanyam A (2024) Neural embedded mixed-integer optimization for location-routing problems. Preprint, submitted December 7, https://arxiv.org/abs/2412.05665.
  • Kanamori K, Takagi T, Kobayashi K, Ike Y, Uemura K, Arimura H (2021) Ordered counterfactual explanation by mixed-integer linear optimization. Proc. AAAI Conf. Artificial Intelligence 35(13):11564–11574.
  • Karg B, Lucia S (2020) Efficient representation and approximation of model predictive control laws via deep learning. IEEE Trans. Cybernetics 50(9):3866–3878.
  • Karia T, Lastrucci G, Schweidtmann AM (2025) Deterministic global optimization over trained Kolmogorov Arnold networks. Preprint, submitted March 4, https://arxiv.org/abs/2503.02807.
  • Katz J, Pappas I, Avraamidou S, Pistikopoulos EN (2020) Integrating deep learning models and multiparametric programming. Comput. Chemical Engrg. 136:106801.
  • Katz G, Barrett C, Dill DL, Julian K, Kochenderfer MJ (2017) Reluplex: An efficient SMT solver for verifying deep neural networks. Majumdar R, Kunčak V, eds. Comput. Aided Verification. CAV 2017, Lecture Notes in Computer Science, vol. 10426 (Springer, Cham, Switzerland), 97–117.
  • Katz G, Huang DA, Ibeling D, Julian K, Lazarus C, Lim R, Shah P, et al. (2019) The Marabou framework for verification and analysis of deep neural networks. Dillig I, Tasiran S, eds. Comput. Aided Verification. CAV 2019, Lecture Notes in Computer Science, vol. 11561 (Springer, Cham, Switzerland), 443–452.
  • Keup C, Helias M (2022) Origami in N dimensions: How feed-forward networks manufacture linear separability. Preprint, submitted March 21, https://arxiv.org/abs/2203.11355.
  • Khalife S, Basu A (2022) Neural networks with linear threshold activations: Structure and algorithms. Aardal K, Sanità L, eds. Integer Programming Combin. Optim. IPCO 2022, Lecture Notes in Computer Science, vol. 13265 (Springer, Cham, Switzerland), 347–360.
  • Khalife S, Cheng H, Basu A (2023) Neural networks with linear threshold activations: Structure and algorithms. Math. Programming 206:333–356.
  • Khedr H, Ferlez J, Shoukry Y (2021) PEREGRiNN: Penalized-relaxation greedy neural network verifier. Internat. Conf. Comput. Aided Verification (Springer, Cham, Switzerland), 287–300.
  • Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. Preprint, submitted December 22, https://arxiv.org/abs/1412.6980.
  • Kody A, Chevalier S, Chatzivasileiadis S, Molzahn D (2022) Modeling the ac power flow equations with optimally compact neural networks: Application to unit commitment. Electric Power Systems Res. 213:108282.
  • Kouvaros P, Kyono T, Leofante F, Lomuscio A, Margineantu D, Osipychev D, Zheng Y (2021) Formal analysis of neural network-based systems in the aircraft domain. Huisman M, Păsăreanu C, Zhan N, eds. Formal Methods. FM 2021, Lecture Notes in Computer Science, vol. 13047 (Springer, Cham, Switzerland), 730–740.
  • Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet classification with deep convolutional neural networks. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. NIPS’12: Proc. 26th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1097–1105.
  • Kronqvist J, Misener R, Tsay C (2021) Between steps: Intermediate relaxations between big-M and convex hull formulations. Stuckey PJ, ed. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2021, Lecture Notes in Computer Science, vol. 12735 (Springer, Cham, Switzerland), 200–218.
  • Kronqvist J, Misener R, Tsay C (2025) P-split formulations: A class of intermediate formulations between big-M and convex hull for disjunctive constraints. Math. Programming, ePub ahead of print June 9, https://doi.org/10.1007/s10107-025-02232-1.
  • Kumar A, Serra T, Ramalingam S (2019) Equivalent and approximate transformations of deep neural networks. Preprint, submitted April 11, https://arxiv.org/abs/1905.1142.
  • Lacoste-Julien S, Jaggi M, Schmidt M, Pletscher P (2013) Block-coordinate Frank-Wolfe optimization for structural SVMs. Dasgupta S, McAllester D, eds. ICML’13: Proc. 30th Internat. Conf. Machine Learn. (ICML), vol. 28 (PMLR), 53–61.
  • Lan J, Zheng Y, Lomuscio A (2022) Tight neural network verification via semidefinite relaxations and linear reformulations. Proc. AAAI Conf. Artificial Intelligence 36(7):7272–7280.
  • Latorre F, Rolland P, Cevher V (2020) Lipschitz constant estimation of neural networks via sparse polynomial optimization. Internat. Conf. Learn. Representations (ICLR).
  • LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444.
  • LeCun Y, Bottou L, Orr GB, Müller KR (1998) Efficient BackProp. Montavon G, Orr G, Müller K, eds. Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, vol. 1524 (Springer, Berlin, Heidelberg), 9–50.
  • LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4):541–551.
  • Lee J, Wilson D (2001) Polyhedral methods for piecewise-linear functions I: The lambda method. Discrete Appl. Math. 108(3):269–285.
  • Lee GH, Alvarez-Melis D, Jaakkola TS (2019) Towards robust, locally linear deep networks. Internat. Conf. Learn. Representations (ICLR).
  • Leino K, Wang Z, Fredrikson M (2021) Globally-robust neural networks. Meila M, Zhang T, eds. Internat. Conf. Machine Learn. (ICML), vol. 139 (PMLR, New York), 6212–6222.
  • Leofante F, Narodytska N, Pulina L, Tacchella A (2018) Automated verification of neural networks: Advances, challenges and perspectives. Preprint, submitted May 25, https://arxiv.org/abs/1805.09938.
  • Li L, Xie T, Li B (2022) SoK: Certified robustness for deep neural networks. 2023 IEEE Sympos. Security and Privacy (SP) (IEEE Computer Society, Washington, DC), 1289–1310.
  • Liang X, Xu J (2021) Biased ReLU neural networks. Neurocomput. 423:71–79.
  • Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. Preprint, submitted September 9, https://arxiv.org/abs/1509.02971.
  • Linnainmaa S (1970) The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master’s thesis, University of Helsinki, Helsinki, Finland. [In Finnish.]
  • Little W (1974) The existence of persistent states in the brain. Math. Biosci. 19(1–2):101–120.
  • Liu X, Dvorkin V (2025) Optimization over trained neural networks: Difference-of-convex algorithm and application to data center scheduling. IEEE Control Systems Lett. 9:835–840.
  • Liu B, Liang Y (2021) Optimal function approximation with ReLU neural networks. Neurocomput. 435:216–227.
  • Liu X, Han X, Zhang N, Liu Q (2020) Certified monotonic neural networks. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 15427–15438.
  • Liu C, Arnon T, Lazarus C, Strong C, Barrett C, Kochenderfer MJ (2021) Algorithms for verifying deep neural networks. Foundations Trends Optim. 4(3–4):244–404.
  • Liu Z, Wang Y, Vaidya S, Ruehle F, Halverson J, Soljačić M, Hou TY, Tegmark M (2025) KAN: Kolmogorov-Arnold Networks. Internat. Conf. Learn. Representations (ICLR).
  • Lombardi M, Milano M, Bartolini A (2017) Empirical decision model learning. Artificial Intelligence 244:343–367.
  • Lomuscio A, Maganti L (2017) An approach to reachability analysis for feed-forward ReLU neural networks. Preprint, submitted June 22, https://arxiv.org/abs/1706.07351.
  • Loukas A, Poiitis M, Jegelka S (2021) What training reveals about neural network complexity. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 494–508.
  • Lu Z, Pu H, Wang F, Hu Z, Wang L (2017) The expressive power of neural networks: A view from the width. von Luxburg U, Guyon I, Bengio S, Wallach H, Fergus R, eds. NIPS’17: Proc. 31st Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6232–6240.
  • Lueg L, Grimstad B, Mitsos A, Schweidtmann AM (2021) reluMIP: Open source tool for MILP optimization of ReLU neural networks. Accessed February 25, 2026, https://zenodo.org/records/5601907.
  • Lyu Z, Ko CY, Kong Z, Wong N, Lin D, Daniel L (2020) Fastened CROWN: Tightened neural network robustness certificates. Proc. AAAI Conf. Artificial Intelligence 34(4):5037–5044.
  • Maas A, Hannun A, Ng A (2013) Rectifier nonlinearities improve neural network acoustic models. ICML Workshop Deep Learn. Audio, Speech Language Processing.
  • Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A (2018) Towards deep learning models resistant to adversarial attacks. Internat. Conf. Learn. Representations (ICLR).
  • Makhoul J, Schwartz R, El-Jaroudi A (1989) Classification capabilities of two-layer neural nets. Internat. Conf. Acoustics Speech Signal Processing (ICASSP) (IEEE, Piscataway, NJ), 635–638.
  • Malach E, Shalev-Shwartz S (2019) Is deeper better only when shallow is good? Wallach HM, Larochelle HM, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. NIPS’19: Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6429–6438.
  • Mangasarian OL (1993) Mathematical programming in neural networks. ORSA J. Comput. 5(4):349–360.
  • Maragos P, Charisopoulos V, Theodosis E (2021) Tropical geometry and machine learning. Proc. IEEE 109(5):728–755.
  • Maragno D, Wiberg H, Bertsimas D, Birbil ŞI, den Hertog D, Fajemisin AO (2023) Mixed-integer optimization with constraint learning. Oper. Res. 73(2):1011–1028.
  • Maragno D, Kurtz J, Röber TE, Goedhart R, Birbil ŞI, den Hertog D (2024) Finding regions of counterfactual explanations via robust optimization. INFORMS J. Comput. 36(5):1316–1334.
  • Masden M (2025) Algorithmic determination of the combinatorial structure of the linear regions of ReLU neural networks. SIAM J. Appl. Algebra Geometry 9(2):374–404.
  • Matoba K, Dimitriadis N, Fleuret F (2022) The theoretical expressiveness of maxpooling. Preprint, submitted March 2, https://arxiv.org/abs/2203.01016.
  • Matoušek J (2002) Lectures on Discrete Geometry, Graduate Texts in Mathematics, vol. 212 (Springer, New York).
  • McBride K, Sundmacher K (2019) Overview of surrogate modeling in chemical process engineering. Chemie Ingenieur Technik 91(3):228–239.
  • McCulloch W, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5:115–133.
  • McDonald T, Tsay C, Schweidtmann AM, Yorke-Smith N (2024) Mixed-integer optimisation of graph neural networks for computer-aided molecular design. Comput. Chemical Engrg. 185:108660.
  • Mhaskar HN, Poggio T (2020) Function approximation by deep networks. Comm. Pure Appl. Anal. 19(8):4085–4095.
  • Minsky M, Papert S (1969) Perceptrons: An Introduction to Computational Geometry (MIT Press, Cambridge, MA).
  • Mirman M, Gehr T, Vechev M (2018) Differentiable abstract interpretation for provably robust neural networks. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn. (ICML), vol. 80 (PMLR, New York), 3578–3586.
  • Misener R, Floudas CA (2012) Global optimization of mixed-integer quadratically-constrained quadratic programs (MIQCQP) through piecewise-linear and edge-concave relaxations. Math. Programming 136(1):155–182.
  • Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, et al. (2015) Human-level control through deep reinforcement learning. Nature 518:529–533.
  • Montúfar G (2017) Notes on the number of linear regions of deep neural networks. Internat. Conf. Sampling Theory Appl. (SampTA) (IEEE, Piscataway, NJ).
  • Montúfar G, Ren Y, Zhang L (2022) Sharp bounds for the number of regions of maxout networks and vertices of Minkowski sums. SIAM J. Appl. Algebra Geometry 6(4).
  • Montúfar G, Pascanu R, Cho K, Bengio Y (2014) On the number of linear regions of deep neural networks. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. NIPS’14: Proc. 28th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 2924–2932.
  • Motzkin T (1936) Beiträge zur Theorie der linearen Ungleichungen. PhD thesis, University of Basel, Basel, Switzerland.
  • Mukhopadhyay S, Roy A, Kim LS, Govil S (1993) A polynomial time algorithm for generating neural networks for pattern classification: Its stability properties and some test results. Neural Comput. 5(2):317–330.
  • Nair V, Hinton G (2010) Rectified linear units improve restricted Boltzmann machines. Fürnkranz J, Joachims T, eds. ICML’10: Proc. 27th Internat. Conf. Machine Learn. (ICML) (Omnipress, Madison, WI), 807–814.
  • Narodytska N, Kasiviswanathan S, Ryzhyk L, Sagiv M, Walsh T (2018) Verifying properties of binarized deep neural networks. AAAI Conf. Artificial Intelligence 32(1).
  • Nelles O, Fink A, Isermann R (2000) Local linear model trees (LOLIMOT) toolbox for nonlinear system identification. IFAC Proc. Vol. 33(15):845–850.
  • Nesterov YE (1983) A method of solving a convex programming problem with convergence rate O(1/k^2). Proc. USSR Acad. Sci. 269:543–547.
  • Newton M, Papachristodoulou A (2021) Exploiting sparsity for neural network verification. Jadbabaie A, Lygeros J, Pappas GJ, Parrilo PA, Recht B, Tomlin CJ, Zeilinger MN, eds. Proc. 3rd Conf. Learn. Dynamics Control (L4DC), vol. 144 (PMLR, New York), 715–727.
  • Nguyen T, Huchette J (2022) Neural network verification as piecewise linear optimization: Formulations for the composition of staircase functions. Preprint, submitted November 17, https://arxiv.org/abs/2211.14706.
  • Nguyen Q, Mukkamala MC, Hein M (2018) Neural networks should be wide enough to learn disconnected decision regions. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn. (ICML), vol. 80 (PMLR, New York), 3740–3749.
  • Novak R, Bahri Y, Abolafia DA, Pennington J, Sohl-Dickstein J (2018) Sensitivity and generalization in neural networks: An empirical study. Internat. Conf. Learn. Representations (ICLR).
  • OpenAI (2022) Introducing ChatGPT. Accessed February 25, 2026, https://openai.com/blog/chatgpt.
  • Padberg M (2000) Approximating separable nonlinear functions via mixed zero-one programs. Oper. Res. Lett. 27(1):1–5.
  • Papalexopoulos TP, Tjandraatmadja C, Anderson R, Vielma JP, Belanger D (2022) Constrained discrete black-box optimization using mixed-integer programming. Chaudhuri K, Jegelka S, Song L, Szepesvari C, Niu G, Sabato S, eds. Internat. Conf. Machine Learn. (ICML), vol. 162 (PMLR, New York), 17295–17322.
  • Park Y, Lee S, Kim G, Blei DM (2021a) Unsupervised representation learning via neural activation coding. Meila M, Zhang T, eds. Internat. Conf. Machine Learn. (ICML), vol. 139 (PMLR, New York), 8391–8400.
  • Park S, Yun C, Lee J, Shin J (2021b) Minimum width for universal approximation. Internat. Conf. Learn. Representations (ICLR).
  • Park DS, Chan W, Zhang Y, Chiu CC, Zoph B, Cubuk ED, Le QV (2019) SpecAugment: A simple data augmentation method for automatic speech recognition. Proc. Interspeech 2019 (International Speech Communication Association), 2613–2617.
  • Pascanu R, Montúfar G, Bengio Y (2014) On the number of response regions of deep feedforward networks with piecewise linear activations. Internat. Conf. Learn. Representations (ICLR).
  • Patel RM, Dumouchelle J, Khalil E, Bodur M (2022) Neur2SP: Neural two-stage stochastic programming. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. NIPS’22: Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 23992–24005.
  • Perakis G, Tsiourvas A (2022) Optimizing objective functions from trained ReLU neural networks via sampling. Preprint, submitted May 27, https://arxiv.org/abs/2205.14189.
  • Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. Proc. 2018 Conf. North Amer. Chapter Assoc. Comput. Linguistics (NAACL) (Association for Computational Linguistics, Stroudsburg, PA), 2227–2237.
  • Pham H, Ren A, Tahir I, Tong J, Serra T (2025) Optimization over trained (and sparse) neural networks: A surrogate within a surrogate. Preprint, submitted May 4, https://arxiv.org/abs/2505.01985.
  • Phuong M, Lampert CH (2020) Functional vs. parametric equivalence of ReLU networks. Internat. Conf. Learn. Representations (ICLR) (PMLR, New York).
  • Pilanci M, Ergen T (2020) Neural networks are convex regularizers: Exact polynomial-time convex optimization formulations for two-layer networks. Daumé H, Singh A, eds. ICML’20: Proc. 37th Internat. Conf. Machine Learn. (ICML) (JMLR), 7695–7705.
  • Plate C, Hahn M, Klimek A, Ganzer C, Sundmacher K, Sager S (2025) An analysis of optimization problems involving ReLU neural networks. Preprint, submitted February 5, https://arxiv.org/abs/2502.03016.
  • Pokutta S, Spiegel C, Zimmer M (2020) Deep neural network training with Frank-Wolfe. Preprint, submitted October 14, https://arxiv.org/abs/2010.07243.
  • Polyak BT (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5):1–17.
  • Pulina L, Tacchella A (2010) An abstraction-refinement approach to verification of artificial neural networks. Touili T, Cook B, Jackson P, eds. Comput. Aided Verification. CAV 2010, Lecture Notes in Computer Science, vol. 6174 (Springer, Berlin, Heidelberg), 243–257.
  • Puthawala M, Kothari K, Lassas M, Dokmanić I, de Hoop M (2022) Globally injective ReLU networks. J. Machine Learn. Res. 23(1):4544–4598.
  • Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding by generative pre-training. Technical report, OpenAI, https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
  • Raghu M, Poole B, Kleinberg J, Ganguli S, Sohl-Dickstein J (2017) On the expressive power of deep neural networks. Precup D, Teh YW, eds. ICML’17: Proc. 34th Internat. Conf. Machine Learn. (ICML), vol. 70 (JMLR), 2847–2854.
  • Raghunathan A, Steinhardt J, Liang PS (2018) Semidefinite relaxations for certifying robustness to adversarial examples. Bengio S, Beygelzimer A, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 10900–10910.
  • Ramachandran P, Zoph B, Le QV (2018) Searching for activation functions. ICLR Workshop Track.
  • Raman R, Grossmann I (1994) Modelling and computational techniques for logic based integer programming. Comput. Chemical Engrg. 18(7):563–578.
  • Ramesh A, Dhariwal P, Nichol A, Chu C, Chen M (2022) Hierarchical text-conditional image generation with CLIP latents. Preprint, submitted April 13, https://arxiv.org/abs/2204.06125.
  • Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.
  • Robinson H, Rasheed A, San O (2019) Dissecting deep neural networks. Preprint, submitted October 9, https://arxiv.org/abs/1910.03879.
  • Rolnick D, Kording K (2020) Reverse-engineering deep ReLU networks. Daumé H, Singh A, eds. ICML’20: Proc. 37th Internat. Conf. Machine Learn. (ICML), vol. 119 (JMLR), 8178–8187.
  • Rosenblatt F (1957) The perceptron—A perceiving and recognizing automaton. Technical Report 85-460-1, Cornell Aeronautical Laboratory, Buffalo, NY.
  • Rössig A, Petkovic M (2021) Advances in verification of ReLU neural networks. J. Global Optim. 81:109–152.
  • Roth K (2021) A primer on multi-neuron relaxation-based adversarial robustness certification. ICML 2021 Workshop Adversarial Machine Learn.
  • Roy A, Kim LS, Mukhopadhyay S (1993) A polynomial time algorithm for the construction and training of a class of multilayer perceptrons. Neural Networks 6(4):535–545.
  • Rubies-Royo V, Calandra R, Stipanovic DM, Tomlin C (2019) Fast neural network verification via shadow prices. Preprint, submitted June 21, https://arxiv.org/abs/1902.07247.
  • Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536.
  • Ryu M, Chow Y, Anderson R, Tjandraatmadja C, Boutilier C (2020) CaQL: Continuous action Q-learning. Internat. Conf. Learn. Representations (ICLR).
  • Safran I, Reichman D, Valiant P (2024) How many neurons does it take to approximate the maximum? Proc. 2024 Annual Sympos. Discrete Algorithms (SODA) (SIAM, Philadelphia), 3156–3183.
  • Sahiner A, Ergen T, Pauly JM, Pilanci M (2021) Vector-output ReLU neural network problems are copositive programs: Convex analysis of two layer networks and polynomial-time algorithms. Internat. Conf. Learn. Representations (ICLR).
  • Sahiner A, Ergen T, Ozturkler B, Pauly JM, Mardani M, Pilanci M (2024) Scaling convex neural networks with Burer-Monteiro factorization. 12th Internat. Conf. Learn. Representations (ICLR).
  • Salman H, Yang G, Zhang H, Hsieh CJ, Zhang P (2019) A convex relaxation barrier to tight robustness verification of neural networks. Wallach HM, Larochelle HM, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. NIPS’19: Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 9835–9846.
  • Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) MobileNetV2: Inverted residuals and linear bottlenecks. 2018 IEEE/CVF Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE, Piscataway, NJ), 4510–4520.
  • Sattelberg B, Cavalieri R, Kirby M, Peterson C, Beveridge R (2023) Locally linear attributes of ReLU neural networks. Frontiers Artificial Intelligence 6:1255192.CrossrefGoogle Scholar
  • Say B, Wu G, Zhou YQ, Sanner S (2017) Nonlinear hybrid planning with deep net learned transition models and mixed-integer linear programming. Proc. 30th Internat. Joint Conf. Artificial Intelligence (IJCAI), 750–756.Google Scholar
  • Scaman K, Virmaux A (2018) Lipschitz regularity of deep neural networks: Analysis and efficient estimation. Bengio S, Beygelzimer A, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 3839–3848.
  • Schmidhuber J (2015) Deep learning in neural networks: An overview. Neural Networks 61:85–117.
  • Schumann J, Gupta P, Nelson S (2003) On verification & validation of neural network based controllers. Engrg. Appl. Neural Networks (EANN).
  • Schwan R, Jones CN, Kuhn D (2023) Stability verification of neural network controllers using mixed-integer programming. IEEE Trans. Automatic Control 68(12):7514–7529.
  • Schweidtmann AM, Mitsos A (2019) Deterministic global optimization with artificial neural networks embedded. J. Optim. Theory Appl. 180(3):925–948.
  • Schweidtmann AM, Weber JM, Wende C, Netze L, Mitsos A (2022) Obey validity limits of data-driven models through topological data analysis and one-class classification. Optim. Engrg. 23(2):855–876.
  • Seck I, Loosli G, Canu S (2021) Linear program powered attack. 2021 Internat. Joint Conf. Neural Networks (IJCNN) (IEEE, Piscataway, NJ), 1–8.
  • Serra T (2020) Enumerative branching with less repetition. Hebrard E, Musliu N, eds. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2020, Lecture Notes in Computer Science, vol. 12296 (Springer, Cham, Switzerland), 399–416.
  • Serra T, Hooker J (2020) Compact representation of near-optimal integer programming solutions. Math. Programming 182:199–232.
  • Serra T, Ramalingam S (2020) Empirical bounds on linear regions of deep rectifier networks. Proc. AAAI Conf. Artificial Intelligence 34(4):5628–5635.
  • Serra T, Kumar A, Ramalingam S (2020) Lossless compression of deep neural networks. Hebrard E, Musliu N, eds. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2020, Lecture Notes in Computer Science, vol. 12296 (Springer, Cham, Switzerland), 417–430.
  • Serra T, Tjandraatmadja C, Ramalingam S (2018) Bounding and counting linear regions of deep neural networks. Dy J, Krause A, eds. Internat. Conf. Machine Learn. (ICML), vol. 80 (PMLR, New York), 4558–4566.
  • Serra T, Yu X, Kumar A, Ramalingam S (2021) Scaling up exact neural network compression by ReLU stability. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 27081–27093.
  • Shi C, Emadikhiav M, Lozano L, Bergman D (2022) Careful! Training relevance is real. Preprint, submitted January 12, https://arxiv.org/abs/2201.04429.
  • Shi Z, Jin Q, Kolter Z, Jana S, Hsieh CJ, Zhang H (2025) Neural network verification with branch-and-bound for general nonlinearities. Gurfinkel A, Heule M, eds. Tools Algorithms Construction Anal. Systems. TACAS 2025, Lecture Notes in Computer Science, vol. 15696 (Springer, Cham, Switzerland), 315–335.
  • Sidrane C, Maleki A, Irfan A, Kochenderfer MJ (2022) OVERT: An algorithm for safety verification of neural network control policies for nonlinear systems. J. Machine Learn. Res. 23(117):1–45.
  • Sildir H, Aydin E (2022) A mixed-integer linear programming based training and feature selection method for artificial neural networks using piece-wise linear approximations. Chem. Engrg. Sci. 249:117273.
  • Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, et al. (2017) Mastering the game of Go without human knowledge. Nature 550:354–359.
  • Singh G, Ganvir R, Püschel M, Vechev M (2019a) Beyond the single neuron convex barrier for neural network certification. Wallach HM, Larochelle HM, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. NIPS’19: Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 15098–15109.
  • Singh G, Gehr T, Püschel M, Vechev M (2019b) An abstract domain for certifying neural networks. Proc. ACM Programming Languages (POPL) 3:1–30.
  • Singh H, Kumar MP, Torr P, Dvijotham KD (2021) Overcoming the convex barrier for simplex inputs. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 4871–4882.
  • Singh G, Gehr T, Mirman M, Püschel M, Vechev M (2018) Fast and effective robustness certification. Bengio S, Beygelzimer A, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 10825–10836.
  • Smith JE, Winkler RL (2006) The optimizer’s curse: Skepticism and postdecision surprise in decision analysis. Management Sci. 52(3):311–322.
  • Sosnin P, Tsay C (2024) Scaling mixed-integer programming for certification of neural network controllers using bounds tightening. 2024 IEEE 63rd Conf. Decision Control (CDC) (IEEE, Piscataway, NJ), 1645–1650.
  • Sosnin P, Müller MN, Baader M, Tsay C, Wicker M (2025) Certified robustness to data poisoning in gradient-based training. Trans. Machine Learn. Res. (TMLR).
  • Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: A simple way to prevent neural networks from overfitting. J. Machine Learn. Res. 15(56):1929–1958.
  • Stargalla M, Hertrich C, Reichman D (2025) The computational complexity of counting linear regions in ReLU neural networks. Preprint, submitted May 22, https://arxiv.org/abs/2505.16716.
  • Strong CA, Katz SM, Corso AL, Kochenderfer MJ (2022) ZoPE: A fast optimizer for ReLU networks with low-dimensional inputs. Deshmukh JV, Havelund K, Perez I, eds. NASA Formal Methods. NFM 2022, Lecture Notes in Computer Science, vol. 13260 (Springer, Cham, Switzerland), 299–317.
  • Strong CA, Wu H, Zeljić A, Julian KD, Katz G, Barrett C, Kochenderfer MJ (2021) Global optimization of objective functions represented by ReLU networks. Machine Learn. 112:3685–3712.
  • Sudjianto A, Knauth W, Singh R, Yang Z, Zhang A (2020) Unwrapping the black box of deep ReLU networks: Interpretability, diagnostics, and simplification. Preprint, submitted November 8, https://arxiv.org/abs/2011.04041.
  • Sutskever I, Vinyals O, Le Q (2014) Sequence to sequence learning with neural networks. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. NIPS’14: Proc. 28th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 3104–3112.
  • Sutskever I, Martens J, Dahl G, Hinton G (2013) On the importance of initialization and momentum in deep learning. Dasgupta S, McAllester D, eds. ICML’13: Proc. 30th Internat. Conf. Machine Learn. (ICML), vol. 28 (JMLR), 1139–1147.
  • Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R (2014) Intriguing properties of neural networks. Internat. Conf. Learn. Representations (ICLR).
  • Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. 2015 IEEE/CVF Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE, Piscataway, NJ), 1–9.
  • Takai Y, Sannai A, Cordonnier M (2021) On the number of linear functions composing deep neural network: Towards a refined definition of neural networks complexity. Banerjee A, Fukumizu K, eds. Proc. 24th Internat. Conf. Artificial Intelligence Statist., vol. 130 (PMLR, New York), 3799–3809.
  • Tao Q, Li L, Huang X, Xi X, Wang S, Suykens JA (2022) Piecewise linear neural networks and deep learning. Nature Rev. Methods Primers 2:42.
  • Telgarsky M (2015) Representation benefits of deep feedforward networks. Preprint, submitted September 27, https://arxiv.org/abs/1509.08101.
  • Thorbjarnarson T, Yorke-Smith N (2021) On training neural networks with mixed integer programming. IJCAI-PRICAI’20 Workshop Data Sci. Meets Optim., Yokohama, Japan.
  • Thorbjarnarson T, Yorke-Smith N (2023) Optimal training of integer-valued neural networks with mixed integer programming. PLoS One 18(2):e0261029.
  • Tiwari S, Konidaris G (2022) Effects of data geometry in early deep learning. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. NIPS’22: Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 30099–30113.
  • Tjandraatmadja C, Anderson R, Huchette J, Ma W, Patel KK, Vielma JP (2020) The convex relaxation barrier, revisited: Tightened single-neuron relaxations for neural network verification. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 21675–21686.
  • Tjeng V, Xiao K, Tedrake R (2019) Evaluating robustness of neural networks with mixed integer programming. Internat. Conf. Learn. Representations (ICLR).
  • Tong J, Cai J, Serra T (2024) Optimization over trained neural networks: Taking a relaxing walk. Dilkina B, ed. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2024, Lecture Notes in Computer Science, vol. 14743 (Springer, Cham, Switzerland), 221–233.
  • Trimmel M, Petzka H, Sminchisescu C (2021) TropEx: An algorithm for extracting linear terms in deep neural networks. Internat. Conf. Learn. Representations (ICLR).
  • Tsay C, Baldea M (2019) 110th anniversary: Using data to bridge the time and length scales of process systems. Indust. Engrg. Chemistry Res. 58(36):16696–16708.
  • Tsay C, Kronqvist J, Thebelt A, Misener R (2021) Partition-based formulations for mixed-integer optimization of trained ReLU neural networks. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 3068–3080.
  • Tseran H, Montúfar G (2021) On the expected complexity of maxout networks. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 28995–29008.
  • Turner M, Chmiela A, Koch T, Winkler M (2025) PySCIPOpt-ML: Embedding trained machine learning models into mixed-integer programs. Tack G, ed. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2025, Lecture Notes in Computer Science, vol. 15763 (Springer, Cham, Switzerland), 218–234.
  • Unser M (2019) A representer theorem for deep neural networks. J. Machine Learn. Res. 20(110):1–30.
  • Valerdi JL (2024) On minimal depth in neural networks. Preprint, submitted February 23, https://arxiv.org/abs/2402.15315.
  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. von Luxburg U, Guyon I, Bengio S, Wallach H, Fergus R, eds. NIPS’17: Proc. 31st Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6000–6010.
  • Vielma JP (2015) Mixed integer linear programming formulation techniques. SIAM Rev. 57(1):3–57.
  • Vielma JP (2019) Small and strong formulations for unions of convex sets from the Cayley embedding. Math. Programming 177(1–2):21–53.
  • Vielma JP, Ahmed S, Nemhauser G (2010) Mixed-integer models for nonseparable piecewise-linear optimization: Unifying framework and extensions. Oper. Res. 58(2):303–315.
  • Villani MJ, Schoots N (2023) Any deep ReLU network is shallow. Preprint, submitted June 20, https://arxiv.org/abs/2306.11827.
  • Vincent JA, Schwager M (2021) Reachable polyhedral marching (RPM): A safety verification algorithm for robotic systems with deep neural network components. IEEE Internat. Conf. Robotics Automation (ICRA) (IEEE, Piscataway, NJ), 9029–9035.
  • Vinyals O, Ewalds T, Bartunov S, Georgiev P, Vezhnevets AS, Yeo M, Makhzani A, et al. (2017) StarCraft II: A new challenge for reinforcement learning. Preprint, submitted August 16, https://arxiv.org/abs/1708.04782.
  • Volpp M, Fröhlich LP, Fischer K, Doerr A, Falkner S, Hutter F, Daniel C (2020) Meta-learning acquisition functions for transfer learning in Bayesian optimization. Internat. Conf. Learn. Representations (ICLR).
  • Wang Y (2022) Estimation and comparison of linear regions for ReLU networks. De Raedt L, ed. Proc. 31st Internat. Joint Conf. Artificial Intelligence (IJCAI), 3544–3550.
  • Wang S, Sun X (2005) Generalization of hinging hyperplanes. IEEE Trans. Inform. Theory 51(12):4425–4431.
  • Wang K, Lozano L, Bergman D, Cardonha C (2021) A two-stage exact algorithm for optimization of neural network ensemble. Stuckey PJ, ed. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2021, Lecture Notes in Computer Science, vol. 12735 (Springer, Cham, Switzerland), 106–114.
  • Wang K, Lozano L, Cardonha C, Bergman D (2023) Optimizing over an ensemble of trained neural networks. INFORMS J. Comput. 35(3):652–674.
  • Wang S, Pei K, Whitehouse J, Yang J, Jana S (2018a) Efficient formal safety analysis of neural networks. Bengio S, Beygelzimer A, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6369–6379.
  • Wang S, Pei K, Whitehouse J, Yang J, Jana S (2018b) Formal security analysis of neural networks using symbolic intervals. SEC’18: Proc. 27th USENIX Conf. Security Sympos. (USENIX Association, San Francisco), 1599–1614.
  • Weng J, Ahuja N, Huang T (1992) Cresceptron: A self-organizing neural network which grows adaptively. 1992 Internat. Joint Conf. Neural Networks (IJCNN), vol. 1 (IEEE, Piscataway, NJ), 576–581.
  • Weng L, Zhang H, Chen H, Song Z, Hsieh CJ, Daniel L, Boning D, Dhillon I (2018) Towards fast computation of certified robustness for ReLU networks. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn. (ICML), vol. 80 (PMLR, New York), 5276–5285.
  • Werbos P (1974) Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University, Cambridge, MA.
  • Wicker MR, Heo J, Costabello L, Weller A (2023) Robust explanation constraints for neural networks. Internat. Conf. Learn. Representations (ICLR).
  • Wicker M, Laurenti L, Patane A, Kwiatkowska M (2020) Probabilistic safety for Bayesian neural networks. Peters J, Sontag D, eds. Proc. 36th Conf. Uncertainty Artificial Intelligence (UAI), vol. 124 (PMLR, New York), 1198–1207.
  • Wicker MR, Sosnin P, Shilov I, Janik A, Mueller MN, de Montjoye YA, Weller A, Tsay C (2025) Certification for differentially private prediction in gradient-based training. Singh A, Fazel M, Hsu D, Lacoste-Julien S, Berkenkamp F, Maharaj T, Wagstaff K, Zhu J, eds. Proc. 42nd Internat. Conf. Machine Learn. (ICML), vol. 267 (PMLR, New York), 66726–66745.
  • Wilhelm ME, Wang C, Stuber MD (2022) Convex and concave envelopes of artificial neural network activation functions for deterministic global optimization. J. Global Optim. 85:569–594.
  • Witte C, Lüthje JT, Schulte V, Mitsos A, Bongartz D (2025) Deterministic global optimization with trained neural networks: Is the envelope of single neurons worth it? Preprint, submitted April 28, https://optimization-online.org/2025/04/deterministic-global-optimization-with-trained-neural-networks-is-the-envelope-of-single-neurons-worth-it/.
  • Wong E, Kolter Z (2018) Provable defenses against adversarial examples via the convex outer adversarial polytope. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn. (ICML), vol. 80 (PMLR, New York), 5286–5295.
  • Wong E, Schmidt F, Metzen JH, Kolter JZ (2018) Scaling provable adversarial defenses. Bengio S, Beygelzimer A, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 8410–8419.
  • Wright SJ (2018) Optimization algorithms for data analysis. Mahoney MW, Duchi JC, Gilbert AC, eds. The Mathematics of Data, IAS/Park City Mathematics Series, vol. 25 (American Mathematical Society, Providence, RI), 49–98.
  • Wu G, Say B, Sanner S (2020) Scalable planning with deep neural network learned transition models. J. Artificial Intelligence Res. 68:571–606.
  • Wu H, Zeljić A, Katz G, Barrett C (2022) Efficient neural network analysis with sum-of-infeasibilities. Fisman D, Rosu G, eds. Tools Algorithms Construction Anal. Systems. TACAS 2022, Lecture Notes in Computer Science, vol. 13243 (Springer, Cham, Switzerland), 143–163.
  • Xiang W, Tran HD, Johnson TT (2017) Reachable set computation and safety verification for neural networks with ReLU activations. Preprint, submitted December 21, https://arxiv.org/abs/1712.08163.
  • Xiao K, Tjeng V, Shafiullah N, Madry A (2019) Training for faster adversarial robustness verification via inducing ReLU stability. Internat. Conf. Learn. Representations (ICLR).
  • Xie Y, Chen G, Li Q (2020a) A general computational framework to measure the expressiveness of complex networks using a tighter upper bound of linear regions. Preprint, submitted December 8, https://arxiv.org/abs/2012.04428.
  • Xie Q, Luong MT, Hovy E, Le QV (2020b) Self-training with noisy student improves ImageNet classification. 2020 IEEE/CVF Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE, Piscataway, NJ), 10684–10695.
  • Xie J, Shen Z, Zhang C, Wang B, Qian H (2020c) Efficient projection-free online methods with stochastic recursive gradient. Proc. AAAI Conf. Artificial Intelligence 34(4):6446–6453.
  • Xiong H, Huang L, Yu M, Liu L, Zhu F, Shao L (2020) On the number of linear regions of convolutional neural networks. Daumé III H, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn. (ICML), vol. 119 (PMLR, New York), 10514–10523.
  • Xu S, Vaughan J, Chen J, Zhang A, Sudjianto A (2022) Traversing the local polytopes of ReLU neural networks. AAAI Workshop AdvML (AAAI Press, Washington, DC).
  • Yang D, Balaprakash P, Leyffer S (2022) Modeling design and control problems involving neural network surrogates. Comput. Optim. Appl. 83:759–800.
  • Yang X, Tran HD, Xiang W, Johnson T (2020) Reachability analysis for feed-forward neural networks using face lattices. Preprint, submitted March 2, https://arxiv.org/abs/2003.01226.
  • Yang X, Yamaguchi T, Tran HD, Hoxha B, Johnson TT, Prokhorov D (2021) Reachability analysis of convolutional neural networks. Preprint, submitted June 22, https://arxiv.org/abs/2106.12074.
  • Yarotsky D (2017) Error bounds for approximations with deep ReLU networks. Neural Networks 94:103–114.
  • Zakrzewski RR (2001) Verification of a trained neural network accuracy. Internat. Joint Conf. Neural Networks (IJCNN), vol. 3 (IEEE, Piscataway, NJ), 1657–1662.
  • Zanotti L (2025) Linear-size neural network representation of piecewise affine functions in R2. Preprint, submitted March 17, https://arxiv.org/abs/2503.13001.
  • Zaslavsky T (1975) Facing Up to Arrangements: Face-Count Formulas for Partitions of Space by Hyperplanes (American Mathematical Society, Providence, RI).
  • Zhang R (2020) On the tightness of semidefinite relaxations for certifying robustness to adversarial examples. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 3808–3820.
  • Zhang X, Wu D (2020) Empirical studies on the properties of linear regions in deep neural networks. Internat. Conf. Learn. Representations (ICLR).
  • Zhang L, Naitzat G, Lim LH (2018a) Tropical geometry of deep neural networks. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn. (ICML), vol. 80 (PMLR, New York), 5824–5832.
  • Zhang A, Lipton ZC, Li M, Smola AJ (2023a) Dive into Deep Learning (Cambridge University Press, Cambridge, UK).
  • Zhang H, Weng TW, Chen PY, Hsieh CJ, Daniel L (2018b) Efficient neural network robustness certification with general activation functions. Bengio S, Beygelzimer A, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 4944–4953.
  • Zhang S, Campos J, Feldmann C, Walz D, Sandfort F, Mathea M, Tsay C, Misener R (2023b) Optimizing over trained GNNs via symmetry breaking. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. NIPS’23: Proc. 37th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 44898–44924.
  • Zhang H, Chen H, Xiao C, Gowal S, Stanforth R, Li B, Boning D, Hsieh CJ (2020) Towards stable and efficient training of verifiably robust neural networks. Internat. Conf. Learn. Representations (ICLR).
  • Zhang H, Wang S, Xu K, Li L, Li B, Jana S, Hsieh CJ, Kolter JZ (2022) General cutting planes for bound-propagation-based neural network verification. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. NIPS’22: Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1656–1670.
  • Zhao S, Tsay C, Kronqvist J (2023) Model-based feature selection for neural networks: A mixed-integer programming approach. Sellmann M, Tierney K, eds. Learn. Intelligent Optim. LION 2023, Lecture Notes in Computer Science, vol. 14286 (Springer, Cham, Switzerland), 223–238.
  • Zhou S, Schoellig AP (2019) An analysis of the expressiveness of deep neural network architectures based on their Lipschitz constants. Preprint, submitted January 18, https://arxiv.org/abs/1912.11511.
  • Zhou D, Brix C, Hanasusanto GA, Zhang H (2024) Scalable neural network verification with branch-and-bound inferred cutting planes. Globerson A, Mackey L, Belgrave D, Fan A, Paquet U, Tomczak J, Zhang C, eds. NIPS’24: Proc. 38th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 29324–29353.
  • Zhu R, Lin B, Tang H (2020) Bounding the number of linear regions in local area for neural networks with ReLU activations. Preprint, submitted July 14, https://arxiv.org/abs/2007.06803.
  • Zou D, Balan R, Singh M (2019) On Lipschitz bounds of general convolutional neural networks. IEEE Trans. Inform. Theory 66(3):1738–1759.