When Deep Learning Meets Polyhedral Theory: A Survey
References
- (2021) Training neural networks is ∃ℝ-complete. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 18293–18306.
- (2015) Learning activation functions to improve deep neural networks. Preprint, submitted April 21, https://arxiv.org/abs/1412.6830.
- (2023) A neural network-based distributional constraint learning methodology for mixed-integer stochastic optimization. Expert Systems Appl. 232:120895.
- (2025) A quantile neural network framework for two-stage stochastic optimization. Expert Systems Appl. 284:127876.
- (2021) A simple geometric proof for the benefit of depth in ReLU networks. Preprint, submitted January 18, https://arxiv.org/abs/2101.07126.
- (2019) Strong mixed-integer programming formulations for trained neural networks. Lodi A, Nagarajan V, eds. Integer Programming Combin. Optim. IPCO 2019, Lecture Notes in Computer Science, vol. 11480 (Springer, Cham, Switzerland), 27–42.
- (2020) Strong mixed-integer programming formulations for trained neural networks. Math. Programming 183(1–2):3–39.
- (2019) Sorting out Lipschitz function approximation. Chaudhuri K, Salakhutdinov R, eds. Internat. Conf. Machine Learn. (ICML), vol. 97 (PMLR, New York), 291–301.
- (2017) Wasserstein generative adversarial networks. Precup D, Teh YW, eds. Internat. Conf. Machine Learn. (ICML), vol. 70 (JMLR), 214–223.
- (2018) Understanding deep neural networks with rectified linear units. Internat. Conf. Learn. Representations (ICLR).
- (2025) On the expressiveness of rational ReLU neural networks with bounded depth. Internat. Conf. Learn. Representations (ICLR).
- (2020) Deep neural networks with trainable activations and controlled Lipschitz constant. IEEE Trans. Signal Processing 68:4688–4699.
- (2015) Neural machine translation by jointly learning to align and translate. Internat. Conf. Learn. Representations (ICLR).
- (2025a) On the depth of monotone ReLU neural networks and ICNNs. Preprint, submitted May 9, https://arxiv.org/abs/2505.06169.
- (2025b) Better neural network expressivity: Subdividing the simplex. Preprint, submitted May 20, https://arxiv.org/abs/2505.14338.
- (1998) Disjunctive programming: Properties of the convex hull of feasible points. Discrete Appl. Math. 89(1–3):3–44.
- (2018) Disjunctive Programming (Springer, Cham, Switzerland).
- (1993) A lift-and-project cutting plane algorithm for mixed 0–1 programs. Math. Programming 58(1–3):295–324.
- (1996) Mixed 0-1 programming by lift-and-project in a branch-and-cut framework. Management Sci. 42(9):1229–1246.
- (2018) A spline theory of deep networks. Internat. Conf. Machine Learn. (ICML) (PMLR, New York).
- (2020) Adversarial training and provable defenses: Bridging the gap. Internat. Conf. Learn. Representations (ICLR).
- (2021) Efficient neural network verification via layer-based semidefinite relaxations and linear cuts. Zhou Z, ed. Proc. 30th Internat. Joint Conf. Artificial Intelligence (IJCAI), 2184–2190.
- (2009) Learning deep architectures for AI. Foundations Trends Machine Learn. 2(1):1–127.
- (2021) Machine learning for combinatorial optimization: A methodological tour d’horizon. Eur. J. Oper. Res. 290(2):405–421.
- (1992) Decision tree construction via linear programming. Technical report, University of Wisconsin-Madison Department of Computer Sciences, Madison.
- (1990) Neural network training via linear programming. Computer Sciences Technical Report 1067, University of Wisconsin–Madison Department of Computer Sciences, Madison.
- (1992) Robust linear programming discrimination of two linearly inseparable sets. Optim. Methods Software 1(1):23–34.
- (2022) Individual fairness guarantees for neural networks. De Raedt L, ed. Proc. 31st Internat. Joint Conf. Artificial Intelligence (IJCAI), 651–658.
- (2022) JANOS: An integrated predictive and prescriptive modeling framework. INFORMS J. Comput. 34(2):807–816.
- (2023) The bemi stardust: A structured ensemble of binarized neural networks. Sellmann M, Tierney K, eds. Learn. Intelligent Optim. LION 2023, Lecture Notes in Computer Science, vol. 14286 (Springer, Cham, Switzerland), 443–458.
- (2019) Dota 2 with large scale deep reinforcement learning. Preprint, submitted December 13, https://arxiv.org/abs/1912.06680.
- (2019) Deep Frank-Wolfe for neural network optimization. 7th Internat. Conf. Learn. Representations (ICLR 2019).
- (2024) Training fully connected neural networks is ∃ℝ-complete. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. NIPS’23: Proc. 37th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 36222–36237.
- (2018) Advances in surrogate based modeling, feasibility analysis, and optimization: A review. Comput. Chemical Engrg. 108:250–267.
- (2014) On the complexity of neural network classifiers: A comparison between shallow and deep architectures. IEEE Trans. Neural Networks Learn. Systems 25(8):1553–1565.
- (2021) Some theoretical insights into Wasserstein GANs. J. Machine Learn. Res. 22(1):5287–5331.
- (2018) LP formulations for polynomial optimization problems. SIAM J. Optim. 28(2):1121–1150.
- (2023) Principled deep neural network training through linear programming. Discrete Optim. 49:100795.
- (1992) Training a 3-node neural network is NP-complete. Neural Networks 5(1):117–127.
- (2020) Learning activation functions in deep (spline) neural networks. IEEE Open J. Signal Processing 1:295–309.
- (2015) On mathematical programming with indicator constraints. Math. Programming 151:191–223.
- (2022) Complexity of training ReLU neural network. Discrete Optim. 44:100620.
- (2020) Efficient verification of ReLU-based neural networks via dependency analysis. Proc. AAAI Conf. Artificial Intelligence 34(4):3291–3299.
- (2018) Optimization methods for large-scale machine learning. SIAM Rev. 60(2):223–311.
- (2025) Decomposition polyhedra of piecewise linear functions. Internat. Conf. Learn. Representations (ICLR).
- (1990) Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. Soulié FF, Hérault J, eds. Neurocomput., NATO ASI Series, vol. 68 (Springer, Berlin, Heidelberg), 227–236.
- (2015) Convex optimization: Algorithms and complexity. Foundations Trends Machine Learn. 8(3–4):231–357.
- (2020c) An efficient nonconvex reformulation of stagewise convex optimization problems. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 8247–8258.
- (2018) A unified view of piecewise linear neural network verification. Bengio S, Beygelzimer A, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 4795–4804.
- (2020b) Branch and bound for piecewise linear neural network verification. J. Machine Learn. Res. 21(1):1574–1612.
- (2020a) Lagrangian decomposition for neural network verification. Peters J, Sontag D, eds. Proc. 36th Conf. Uncertainty Artificial Intelligence (UAI), vol. 124 (PMLR, New York), 370–379.
- (2003) A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Programming 95(2):329–357.
- (2024) Constrained continuous-action reinforcement learning for supply chain inventory management. Comput. Chemical Engrg. 181:108518.
- (2023) Getting away with more network pruning: From sparsity to geometry and linear regions. Cire AA, ed. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2023, Lecture Notes in Computer Science, vol. 13884 (Springer, Cham, Switzerland), 200–218.
- (2024) Tightening convex relaxations of trained neural networks: A unified approach for convex and s-shaped activations. Preprint, submitted October 30, https://arxiv.org/abs/2410.23362.
- (2022) OMLT: Optimization & machine learning toolkit. J. Machine Learn. Res. 23(1):15829–15836.
- (2018) A tropical approach to neural networks with piecewise linear activations. Preprint, submitted May 22, https://arxiv.org/abs/1805.08749.
- (2020) Continual learning in low-rank orthogonal subspaces. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 9900–9911.
- (2022a) Improved bounds on neural complexity for representing piecewise linear functions. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. NIPS’22: Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 7167–7180.
- (2021) Neural architecture search on ImageNet in four GPU hours: A theoretically inspired perspective. Internat. Conf. Learn. Representations (ICLR).
- (2022b) Learning deep ReLU networks is fixed-parameter tractable. 2021 IEEE 62nd Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 696–707.
- (2023a) Lower and upper bounds for numbers of linear regions of graph convolutional networks. Neural Networks 168:394–404.
- (2020) Semialgebraic optimization for Lipschitz constants of ReLU networks. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 19189–19200.
- (2023b) Understanding and accelerating neural architecture search with training-free and theory-grounded metrics. IEEE Trans. Pattern Anal. Machine Intelligence 46(2):749–763.
- (2017) Maximum resilience of artificial neural networks. D’Souza D, Narayan Kumar K, eds. Automated Tech. Verification Anal. ATVA 2017, Lecture Notes in Computer Science, vol. 10482 (Springer, Cham, Switzerland), 251–268.
- (2018) Verification of binarized neural networks via inter-neuron factoring: (short paper). Piskac R, Rümmer P, eds. Verified Software. Theories Tools Experiments. VSTTE 2018, Lecture Notes in Computer Science, vol. 11294 (Springer, Cham, Switzerland), 279–290.
- (2022) An outer-approximation guided optimization approach for constrained neural network inverse problems. Math. Programming 196(1–2):173–202.
- (2018) Exact and consistent interpretation for piecewise linear neural networks: A closed form solution. KDD’18: Proc. ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1244–1253.
- (2012) Multi column deep neural network for traffic sign classification. Neural Networks 32:333–338.
- (2017) Parseval networks: Improving robustness to adversarial examples. Precup D, Teh YW, eds. ICML’17: Proc. 34th Internat. Conf. Machine Learn., vol. 70 (JMLR), 854–863.
- (2022) Understanding the evolution of linear regions in deep reinforcement learning. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. NIPS’22: Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 10891–10903.
- (2004) Large scale machine learning. PhD thesis, University of Paris, Paris.
- (2020) Lipschitz certificates for layered network structures driven by averaged activation operators. SIAM J. Math. Data Sci. 2(2):529–557.
- (2015) BinaryConnect: Training deep neural networks with binary weights during propagations. Cortes C, Lee DD, Sugiyama M, Garnett R, eds. NIPS’15: Proc. 29th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 3123–3131.
- (2020a) Investigating the compositional structure of deep neural networks. Nicosia G, ed. Machine Learn. Optim. Data Sci. LOD 2020, Lecture Notes in Computer Science, vol. 12565 (Springer, Cham, Switzerland), 322–334.
- (2020b) Understanding deep learning with activation pattern diagrams. CEUR Workshop Proc. 2742:119–126.
- (2018) A randomized gradient-free attack on ReLU networks. Brox T, Bruhn A, Fritz M, eds. Pattern Recognition. GCPR 2018, Lecture Notes in Computer Science, vol. 11269 (Springer, Cham, Switzerland), 215–227.
- (2019) Provable robustness of ReLU networks via maximization of linear regions. Chaudhuri K, Sugiyama M, eds. Proc. 22nd Internat. Conf. Artificial Intelligence Statist., vol. 89 (PMLR, New York), 2057–2066.
- (2020) Scaling up the randomized gradient-free adversarial attack reveals overestimation of robustness using established attacks. Internat. J. Comput. Vision 128:1028–1046.
- (2003) A comparison of mixed-integer programming models for nonconvex piecewise linear cost minimization problems. Management Sci. 49(9):1268–1273.
- (2017) Optimization methods for supervised machine learning: From linear models to deep learning. INFORMS TutORials in Operations Research (INFORMS, Catonsville, MD), 89–114.
- (1989) Approximation by superpositions of a sigmoidal function. Math. Control Signals Systems 2:303–314.
- (2007) Generating multiple solutions for mixed integer programming problems. Fischetti M, Williamson DP, eds. Integer Programming Combin. Optim. IPCO 2007, Lecture Notes in Computer Science, vol. 4513 (Springer, Cham, Switzerland), 280–294.
- (1960) On the significance of solving linear programming problems with some integer variables. Econometrica 28(1):30–44.
- (1973) Fourier-Motzkin elimination and its dual. J. Combin. Theory Series A 14(3):288–297.
- (2020) Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 5318–5331.
- (2022) Nonlinear approximation and (deep) ReLU networks. Constructive Approximation 55:127–172.
- (2021) Scaling the convex barrier with active sets. Internat. Conf. Learn. Representations (ICLR).
- (2020) Reinforcement learning with combinatorial actions: An application to vehicle routing. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 609–620.
- (2020) An analysis of adversarial attacks and defenses on autonomous driving models. 2020 IEEE Internat. Conf. Pervasive Comput. Comm. (PerCom) (IEEE, Piscataway, NJ), 1–10.
- (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Conf. North Amer. Chapter Assoc. Comput. Linguistics (NAACL) (Association for Computational Linguistics, Stroudsburg, PA), 4171–4186.
- (2020) Approximation algorithms for training one-node ReLU neural networks. IEEE Trans. Signal Processing 68:6696–6706.
- (2025) MathOptAI.jl: Embed trained machine learning predictors into JuMP models. Preprint, submitted July 3, https://arxiv.org/abs/2507.03159.
- (2022) Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomput. 503:92–108.
- (2018) Output range analysis for deep feedforward networks. Dutle A, Muñoz C, Narkawicz A, eds. NASA Formal Methods. NFM 2018, Lecture Notes in Computer Science, vol. 10811 (Springer, Cham, Switzerland), 121–138.
- (2018a) A dual approach to scalable verification of deep networks. Globerson A, Silva R, eds. Conf. Uncertainty Artificial Intelligence (UAI) (Monterey), 550–559.
- (2018b) Training verified learners with learned verifiers. Preprint, submitted May 25, https://arxiv.org/abs/1805.10265.
- (2020) Expression of fractals through neural network functions. IEEE J. Selected Areas Inform. Theory 1(1):57–66.
- (2017) Formal verification of piece-wise linear feed-forward neural networks. D’Souza D, Narayan Kumar K, eds. Automated Tech. Verification Anal. ATVA 2017, Lecture Notes in Computer Science, vol. 10482 (Springer, Cham, Switzerland), 269–286.
- (2020) Identifying efficient sub-networks using mixed integer programming. OPT2020: 12th Annual Workshop Optim. Machine Learn.
- (2019) Neural architecture search: A survey. J. Machine Learn. Res. 20(1):1997–2017.
- (2024) Topological expressivity of ReLU neural networks. Agrawal S, Roth A, eds. Proc. 37th Conf. Learn. Theory (COLT), vol. 247 (PMLR, New York), 1599–1642.
- (2020) Convex geometry of two-layer ReLU networks: Implicit autoencoding and interpretable models. Chiappa S, Calandra R, eds. Proc. 23rd Internat. Conf. Artificial Intelligence Statist., vol. 108 (PMLR, New York), 4024–4033.
- (2021a) Convex geometry and duality of over-parameterized neural networks. J. Machine Learn. Res. 22(1):9646–9708.
- (2021b) Global optimality beyond two layers: Training deep ReLU networks via convex programs. Proc. 38th Internat. Conf. Machine Learn. (ICML) (PMLR, New York), 2993–3003.
- (2021c) Implicit convex regularizers of CNN architectures: Convex optimization of two- and three-layer networks in polynomial time. Internat. Conf. Learn. Representations (ICLR).
- (2021d) Revealing the structure of deep neural networks via convex duality. Proc. 38th Internat. Conf. Machine Learn., vol. 139 (PMLR, New York), 3004–3014.
- (2024) Path regularization: A convexity and sparsity inducing regularization for parallel ReLU networks. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. NIPS’23: Proc. 37th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 59761–59786.
- (2025) The convex landscape of neural networks: Characterizing global optima and stationary points via Lasso models. IEEE Trans. Inform. Theory 71(5):3854–3870.
- (2023) Globally optimal training of neural networks with threshold activation functions. Internat. Conf. Learn. Representations (ICLR).
- (2022) Demystifying batch normalization in ReLU networks: Equivalent convex optimization models and implicit regularization. Internat. Conf. Learn. Representations (ICLR).
- (2018) Robust physical-world attacks on deep learning visual classification. 2018 IEEE/CVF Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE, Piscataway, NJ), 1625–1634.
- (2023a) Quasi-equivalence between width and depth of neural networks. J. Machine Learn. Res. 24(183):1–22.
- (2023b) Deep ReLU networks have surprisingly simple polytopes. Preprint, submitted May 16, https://arxiv.org/abs/2305.09145.
- (2020) Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming. IEEE Trans. Automatic Control 67(1):1–15.
- (2019) Efficient and accurate estimation of Lipschitz constants for deep neural networks. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. NIPS’19: Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 11427–11438.
- (2020) AReN: Assured ReLU NN architecture for model predictive control of LTI systems. HSCC’20: Proc. 23rd Internat. Conf. Hybrid Systems: Comput. Control (Association for Computing Machinery, New York), 1–11.
- (2022) Complete verification via multi-neuron relaxation guided branch-and-bound. Internat. Conf. Learn. Representations (ICLR).
- (2019) Adversarial attacks on medical machine learning. Science 363(6433):1287–1289.
- (2018) Deep neural networks and mixed integer linear optimization. Constraints 23:296–309.
- (1826) Solution d’une question particulière du calcul des inégalités. Nouveau Bull. des Sci. par la Société Philomatique de Paris.
- (2025) SMLE: Safe machine learning via embedded overapproximation. AAAI Conf. Artificial Intelligence 39(26):27286–27294.
- (1956) An algorithm for quadratic programming. Naval Res. Logist. Quart. 3(1–2):95–110.
- (2024) Training neural networks is NP-hard in fixed dimension. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. NIPS’23: Proc. 37th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 44039–44049.
- (2024) Complexity of injectivity and verification of ReLU neural networks. Preprint, submitted May 30, https://arxiv.org/abs/2405.19805.
- (2022) The computational complexity of ReLU network training parameterized by data dimensionality. J. Artificial Intelligence Res. 74:1775–1790.
- (1980) Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybernetics 36:193–202.
- (1989) On the approximate realization of continuous mappings by neural networks. Neural Networks 2(3):183–192.
- (2020) Hyperplane arrangements of trained ConvNets are biased. Preprint, submitted March 17, https://arxiv.org/abs/2003.07797.
- (2022) Are all linear regions created equal? Camps-Valls G, Ruiz FJR, Valera I, eds. Proc. 25th Internat. Conf. Artificial Intelligence Statist., vol. 151 (PMLR, New York), 6573–6590.
- (2021) Optimization problems for machine learning: A survey. Eur. J. Oper. Res. 290(3):807–828.
- (2020) VectorNet: Encoding HD maps and agent dynamics from vectorized representation. 2020 IEEE/CVF Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE, Piscataway, NJ), 11525–11534.
- (2012) Using piecewise linear functions for solving MINLPs. Lee J, Leyffer S, eds. Mixed Integer Nonlinear Programming, The IMA Volumes in Mathematics and its Applications, vol. 154 (Springer, New York), 287–314.
- (2021) Compressing interpretable representations of piecewise linear neural networks using neuro-fuzzy models. IEEE Sympos. Series Comput. Intelligence (SSCI) (IEEE, Piscataway, NJ), 2057–2066.
- (2011) Deep sparse rectifier neural networks. Gordon G, Dunson D, Dudík M, eds. Proc. 14th Internat. Conf. Artificial Intelligence Statist., vol. 15 (PMLR, New York), 315–323.
- (2021) Training of ReLU activated multilayerd neural networks with mixed integer linear programs. Technical Report No. 2021-01, Hochschule Niederrhein, Fachbereich Elektrotechnik & Informatik, Krefeld, Germany.
- (2021) Tight hardness results for training depth-2 ReLU networks. 12th Innovations Theoret. Comput. Sci. Conf. (ITCS 2021), Leibniz International Proceedings in Informatics (LIPIcs), vol. 185 (Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Wadern, Germany), 22:1–22:14.
- (2023) Data-driven robust optimization using deep neural networks. Comput. Oper. Res. 151:106087.
- (2016) Deep Learning (MIT Press, Cambridge, MA).
- (2015) Explaining and harnessing adversarial examples. Internat. Conf. Learn. Representations (ICLR).
- (2013) Maxout networks. Dasgupta S, McAllester D, eds. ICML’13: Proc. 30th Internat. Conf. Machine Learn. (ICML), vol. 28 (JMLR), 1319–1327.
- (2014) Generative adversarial nets. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. NIPS’14: Proc. 28th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 2672–2680.
- (2019) Property inference for deep neural networks. ASE’19: Proc. 34th IEEE/ACM Internat. Conf. Automated Software Engrg. (IEEE, Piscataway, NJ), 797–809.
- (2024) On the number of regions of piecewise linear neural networks. J. Comput. Appl. Math. 441:115667.
- (2018) On the effectiveness of interval bound propagation for training verifiably robust models. Preprint, submitted October 30, https://arxiv.org/abs/1810.12715.
- (2014) Towards end-to-end speech recognition with recurrent neural networks. Xing EP, Jebara T, eds. ICML’14: Proc. 31st Internat. Conf. Machine Learn. (ICML), vol. 32 (JMLR), II-1764–II-1772.
- (2022) On transversality of bent hyperplane arrangements and the topological expressiveness of ReLU neural networks. SIAM J. Appl. Algebra Geometry 6(2):216–242.
- (2023) Hidden symmetries of ReLU networks. Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds. ICML’23: Proc. 40th Internat. Conf. Machine Learn. (ICML) (JMLR), 11734–11760.
- (2025) Depth-bounds for neural networks via the braid arrangement. Thirty-ninth Annual Conf. Neural Inform. Processing Systems (OpenReview).
- (2019) ReLU networks as surrogate models in mixed-integer linear programs. Comput. Chemical Engrg. 131:106580.
- (2012) Generalized disjunctive programming: A framework for formulation and alternative algorithms for MINLP optimization. Lee J, Leyffer S, eds. Mixed Integer Nonlinear Programming, The IMA Volumes in Mathematics and its Applications, vol. 154 (Springer, New York), 93–115.
- (2023) Lower bounds on the depth of integral ReLU neural networks via lattice polytopes. Internat. Conf. Learn. Representations (ICLR).
- (2000) Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405:947–951.
- (2021) Single-neuron convexification for binarized neural networks. Preprint, submitted May 27, https://optimization-online.org/?p=17148.
- (2019a) Complexity of linear regions in deep networks. Internat. Conf. Machine Learn. (ICML) (PMLR, New York).
- (2019b) Deep ReLU networks have surprisingly few activation patterns. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. NIPS’19: Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 18293–18306.
- (2017) Approximating continuous functions by ReLU nets of minimal width. Preprint, submitted October 31, https://arxiv.org/abs/1710.11278.
- (2021) OSIP: Tightened bound propagation for the verification of ReLU neural networks. Calinescu R, Păsăreanu CS, eds. Software Engrg. Formal Methods. SEFM 2021, Lecture Notes in Computer Science, vol. 13085 (Springer, Cham, Switzerland), 463–480.
- (2021) Neural networks behave as hash encoders: An empirical study. Preprint, submitted January 14, https://arxiv.org/abs/2101.05490.
- (2020) ReLU deep neural networks and linear finite elements. J. Comput. Math. 38(3):502–527.
- (2015) Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. IEEE Internat. Conf. Comput. Vision (ICCV) (IEEE, Piscataway, NJ), 1026–1034.
- (2016) Deep residual learning for image recognition. 2016 IEEE Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE, Piscataway, NJ), 770–778.
- (2021) DEEPSPLIT: An efficient splitting method for neural network verification via indirect effect analysis. Zhou Z, ed. Proc. 30th Internat. Joint Conf. Artificial Intelligence (IJCAI), 2549–2555.
- (2022) Repairing misclassifications in neural networks using limited data. SAC’22: Proc. ACM/SIGAPP Sympos. Appl. Comput. (Association for Computing Machinery, New York), 1625–1634.
- (2024) Neural networks and (virtual) extended formulations. Preprint, submitted November 5, https://arxiv.org/abs/2411.03006.
- (2023) Towards lower bounds on the depth of ReLU neural networks. SIAM J. Discrete Math. 37(2):997–1029.
- (2012) Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine 29(6):82–97.
- (2021) Using activation histograms to bound the number of affine regions in ReLU feed-forward neural networks. Preprint, submitted March 31, https://arxiv.org/abs/2103.17174.
- (2019) A framework for the construction of upper bounds on the number of affine linear regions of ReLU feed-forward neural networks. IEEE Trans. Inform. Theory 65(11):7304–7324.
- (1997) Long short-term memory. Neural Comput. 9(8):1735–1780.
- (1982) Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79(8):2554–2558.
- (1989) Multilayer feedforward networks are universal approximators. Neural Networks 2(5):359–366.
- (2020a) Sharp rate of convergence for deep neural network classifiers under the teacher-student setting. Preprint, submitted January 19, https://arxiv.org/abs/2001.06892.
- (2020b) Measuring model complexity of neural networks with curve activation functions. KDD’20: Proc. 26th ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1521–1531.
- (2021) Model complexity of deep learning: A survey. Knowledge Inform. Systems 63:2585–2619.
- (2021) Training certifiably robust neural networks with efficient local Lipschitz bounds. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 22745–22757.
- (2020) A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability. Comput. Sci. Rev. 37:100270.
- (2022) Nonconvex piecewise linear functions: Advanced formulations and simple modeling tools. Oper. Res. 71(5):1835–1856.
- (2018) Limitations of the Lipschitz constant as a defense against adversarial examples. Alzate C, ed. ECML PKDD 2018 Workshops. ECML PKDD 2018, Lecture Notes in Computer Science, vol. 11329 (Springer, Cham, Switzerland), 16–29.
- (2020) Un-rectifying non-linear networks for signal representation. IEEE Trans. Signal Processing 68:196–210.
- (2019) Training binarized neural networks using MIP and CP. Schiex T, de Givry S, eds. Principles Practice Constraint Programming. CP 2019, Lecture Notes in Computer Science, vol. 11802 (Springer, Cham, Switzerland), 401–417.
- (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. Bach F, Blei D, eds. ICML’15: Proc. 32nd Internat. Conf. Machine Learn. (ICML), vol. 37 (JMLR), 448–456.
- (1984) Modelling with integer variables. Korte B, Ritter K, eds. Mathematical Programming at Oberwolfach II, Mathematical Programming Studies, vol. 22 (Springer, Berlin, Heidelberg), 167–184.
- (2020) Efficient exact verification of binarized neural networks. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1782–1795.
- (2020) ARCH-COMP20 category report: Artificial intelligence and neural network control systems (AINNCS) for continuous and hybrid systems plants. EPiC Series Comput. 74:107–139.
- (2020) Exactly computing the local Lipschitz constant of ReLU networks. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 7344–7353.
- (2019) Provable certificates for adversarial examples: Fitting a ball in the union of polytopes. Wallach HM, Larochelle HM, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. NIPS’19: Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 14082–14092.Google Scholar
- (2024) Neural embedded mixed-integer optimization for location-routing problems. Preprint, submitted December 7, https://arxiv.org/abs/2412.05665.Google Scholar
- (2021) Ordered counterfactual explanation by mixed-integer linear optimization. Proc. AAAI Conf. Artificial Intelligence 35(13):11564–11574.Crossref, Google Scholar
- (2020) Efficient representation and approximation of model predictive control laws via deep learning. IEEE Trans. Cybernetics 50(9):3866–3878.Crossref, Google Scholar
- (2025) Deterministic global optimization over trained Kolmogorov Arnold networks. Preprint, submitted March 4, https://arxiv.org/abs/2503.02807.Google Scholar
- (2020) Integrating deep learning models and multiparametric programming. Comput. Chemical Engrg. 136:106801.Crossref, Google Scholar
- (2017) Reluplex: An efficient SMT solver for verifying deep neural networks. Majumdar R, Kunčak V, eds. Comput. Aided Verification. CAV 2017, Lecture Notes in Computer Science, vol. 10426 (Springer, Cham, Switzerland), 97–117.Google Scholar
- (2019) The Marabou framework for verification and analysis of deep neural networks. Dillig I, Tasiran S, eds. Comput. Aided Verification. CAV 2019, Lecture Notes in Computer Science, vol. 11561 (Springer, Cham, Switzerland), 443–452.
- (2022) Origami in N dimensions: How feed-forward networks manufacture linear separability. Preprint, submitted March 21, https://arxiv.org/abs/2203.11355.
- (2022) Neural networks with linear threshold activations: Structure and algorithms. Aardal K, Sanità L, eds. Integer Programming Combin. Optim. IPCO 2022, Lecture Notes in Computer Science, vol. 13265 (Springer, Cham, Switzerland), 347–360.
- (2023) Neural networks with linear threshold activations: Structure and algorithms. Math. Programming 206:333–356.
- (2021) PEREGRiNN: Penalized-relaxation greedy neural network verifier. Internat. Conf. Comput. Aided Verification (Springer, Cham, Switzerland), 287–300.
- (2014) Adam: A method for stochastic optimization. Preprint, submitted December 22, https://arxiv.org/abs/1412.6980.
- (2022) Modeling the AC power flow equations with optimally compact neural networks: Application to unit commitment. Electric Power Systems Res. 213:108282.
- (2021) Formal analysis of neural network-based systems in the aircraft domain. Huisman M, Păsăreanu C, Zhan N, eds. Formal Methods. FM 2021, Lecture Notes in Computer Science, vol. 13047 (Springer, Cham, Switzerland), 730–740.
- (2012) ImageNet classification with deep convolutional neural networks. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. NIPS’12: Proc. 26th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1097–1105.
- (2021) Between steps: Intermediate relaxations between big-M and convex hull formulations. Stuckey PJ, ed. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2021, Lecture Notes in Computer Science, vol. 12735 (Springer, Cham, Switzerland), 200–218.
- (2025) P-split formulations: A class of intermediate formulations between big-M and convex hull for disjunctive constraints. Math. Programming, ePub ahead of print June 9, https://doi.org/10.1007/s10107-025-02232-1.
- (2019) Equivalent and approximate transformations of deep neural networks. Preprint, submitted April 11, https://arxiv.org/abs/1905.1142.
- (2013) Block-coordinate Frank-Wolfe optimization for structural SVMs. Dasgupta S, McAllester D, eds. ICML’13: Proc. 30th Internat. Conf. Machine Learn. (ICML), vol. 28 (PMLR), 53–61.
- (2022) Tight neural network verification via semidefinite relaxations and linear reformulations. Proc. AAAI Conf. Artificial Intelligence 36(7):7272–7280.
- (2020) Lipschitz constant estimation of neural networks via sparse polynomial optimization. Internat. Conf. Learn. Representations (ICLR).
- (2015) Deep learning. Nature 521(7553):436–444.
- (1998) Efficient BackProp. Montavon G, Orr G, Müller K, eds. Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, vol. 1524 (Springer, Berlin, Heidelberg), 9–50.
- (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4):541–551.
- (2001) Polyhedral methods for piecewise-linear functions I: The lambda method. Discrete Appl. Math. 108(3):269–285.
- (2019) Towards robust, locally linear deep networks. Internat. Conf. Learn. Representations (ICLR).
- (2021) Globally-robust neural networks. Meila M, Zhang T, eds. Internat. Conf. Machine Learn. (ICML), vol. 139 (PMLR, New York), 6212–6222.
- (2018) Automated verification of neural networks: Advances, challenges and perspectives. Preprint, submitted May 25, https://arxiv.org/abs/1805.09938.
- (2022) SoK: Certified robustness for deep neural networks. 2023 IEEE Sympos. Security and Privacy (SP) (IEEE Computer Society, Washington, DC), 1289–1310.
- (2021) Biased ReLU neural networks. Neurocomput. 423:71–79.
- (2015) Continuous control with deep reinforcement learning. Preprint, submitted September 9, https://arxiv.org/abs/1509.02971.
- (1970) The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master’s thesis, University of Helsinki, Helsinki, Finland. [In Finnish.]
- (1974) The existence of persistent states in the brain. Math. Biosci. 19(1–2):101–120.
- (2025) Optimization over trained neural networks: Difference-of-convex algorithm and application to data center scheduling. IEEE Control Systems Lett. 9:835–840.
- (2021) Optimal function approximation with ReLU neural networks. Neurocomput. 435:216–227.
- (2020) Certified monotonic neural networks. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 15427–15438.
- (2021) Algorithms for verifying deep neural networks. Foundations Trends Optim. 4(3–4):244–404.
- (2025) KAN: Kolmogorov-Arnold Networks. Internat. Conf. Learn. Representations (ICLR).
- (2017) Empirical decision model learning. Artificial Intelligence 244:343–367.
- (2017) An approach to reachability analysis for feed-forward ReLU neural networks. Preprint, submitted June 22, https://arxiv.org/abs/1706.07351.
- (2021) What training reveals about neural network complexity. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 494–508.
- (2017) The expressive power of neural networks: A view from the width. von Luxburg U, Guyon I, Bengio S, Wallach H, Fergus R, eds. NIPS’17: Proc. 31st Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6232–6240.
- (2021) reluMIP: Open source tool for MILP optimization of ReLU neural networks. Accessed February 25, 2026, https://zenodo.org/records/5601907.
- (2020) Fastened CROWN: Tightened neural network robustness certificates. Proc. AAAI Conf. Artificial Intelligence 34(4):5037–5044.
- (2013) Rectifier nonlinearities improve neural network acoustic models. ICML Workshop Deep Learn. Audio, Speech Language Processing.
- (2018) Towards deep learning models resistant to adversarial attacks. Internat. Conf. Learn. Representations (ICLR).
- (1989) Classification capabilities of two-layer neural nets. Internat. Conf. Acoustics Speech Signal Processing (ICASSP) (IEEE, Piscataway, NJ), 635–638.
- (2019) Is deeper better only when shallow is good? Wallach HM, Larochelle HM, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. NIPS’19: Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6429–6438.
- (1993) Mathematical programming in neural networks. ORSA J. Comput. 5(4):349–360.
- (2021) Tropical geometry and machine learning. Proc. IEEE 109(5):728–755.
- (2023) Mixed-integer optimization with constraint learning. Oper. Res. 73(2):1011–1028.
- (2024) Finding regions of counterfactual explanations via robust optimization. INFORMS J. Comput. 36(5):1316–1334.
- (2025) Algorithmic determination of the combinatorial structure of the linear regions of ReLU neural networks. SIAM J. Appl. Algebra Geometry 9(2):374–404.
- (2022) The theoretical expressiveness of maxpooling. Preprint, submitted March 2, https://arxiv.org/abs/2203.01016.
- (2002) Lectures on Discrete Geometry, Graduate Texts in Mathematics, vol. 212 (Springer, New York).
- (2019) Overview of surrogate modeling in chemical process engineering. Chemie Ingenieur Technik 91(3):228–239.
- (1943) A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5:115–133.
- (2024) Mixed-integer optimisation of graph neural networks for computer-aided molecular design. Comput. Chemical Engrg. 185:108660.
- (2020) Function approximation by deep networks. Comm. Pure Appl. Anal. 19(8):4085–4095.
- (1969) Perceptrons: An Introduction to Computational Geometry (MIT Press, Cambridge, MA).
- (2018) Differentiable abstract interpretation for provably robust neural networks. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn. (ICML), vol. 80 (PMLR, New York), 3578–3586.
- (2012) Global optimization of mixed-integer quadratically-constrained quadratic programs (MIQCQP) through piecewise-linear and edge-concave relaxations. Math. Programming 136(1):155–182.
- (2015) Human-level control through deep reinforcement learning. Nature 518:529–533.
- (2017) Notes on the number of linear regions of deep neural networks. Internat. Conf. Sampling Theory Appl. (SampTA) (IEEE, Piscataway, NJ).
- (2022) Sharp bounds for the number of regions of maxout networks and vertices of Minkowski sums. SIAM J. Appl. Algebra Geometry 6(4).
- (2014) On the number of linear regions of deep neural networks. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. NIPS’14: Proc. 28th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 2924–2932.
- (1936) Beiträge zur Theorie der linearen Ungleichungen. PhD thesis, University of Basel, Basel, Switzerland.
- (1993) A polynomial time algorithm for generating neural networks for pattern classification: Its stability properties and some test results. Neural Comput. 5(2):317–330.
- (2010) Rectified linear units improve restricted Boltzmann machines. Fürnkranz J, Joachims T, eds. ICML’10: Proc. 27th Internat. Conf. Machine Learn. (ICML) (Omnipress, Madison, WI), 807–814.
- (2018) Verifying properties of binarized deep neural networks. AAAI Conf. Artificial Intelligence 32(1).
- (2000) Local linear model trees (LOLIMOT) toolbox for nonlinear system identification. IFAC Proc. Vol. 33(15):845–850.
- (1983) A method of solving a convex programming problem with convergence rate O(1/k²). Proc. USSR Acad. Sci. 269:543–547.
- (2021) Exploiting sparsity for neural network verification. Jadbabaie A, Lygeros J, Pappas GJ, Parrilo PA, Recht B, Tomlin CJ, Zeilinger MN, eds. Proc. 3rd Conf. Learn. Dynamics Control (L4DC), vol. 144 (PMLR, New York), 715–727.
- (2022) Neural network verification as piecewise linear optimization: Formulations for the composition of staircase functions. Preprint, submitted November 17, https://arxiv.org/abs/2211.14706.
- (2018) Neural networks should be wide enough to learn disconnected decision regions. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn. (ICML), vol. 80 (PMLR, New York), 3740–3749.
- (2018) Sensitivity and generalization in neural networks: An empirical study. Internat. Conf. Learn. Representations (ICLR).
- OpenAI (2022) Introducing ChatGPT. Accessed February 25, 2026, https://openai.com/blog/chatgpt.
- (2000) Approximating separable nonlinear functions via mixed zero-one programs. Oper. Res. Lett. 27(1):1–5.
- (2022) Constrained discrete black-box optimization using mixed-integer programming. Chaudhuri K, Jegelka S, Song L, Szepesvari C, Niu G, Sabato S, eds. Internat. Conf. Machine Learn. (ICML), vol. 162 (PMLR, New York), 17295–17322.
- (2021a) Unsupervised representation learning via neural activation coding. Meila M, Zhang T, eds. Internat. Conf. Machine Learn. (ICML), vol. 139 (PMLR, New York), 8391–8400.
- (2021b) Minimum width for universal approximation. Internat. Conf. Learn. Representations (ICLR).
- (2019) SpecAugment: A simple data augmentation method for automatic speech recognition. Proc. Interspeech 2019 (International Speech Communication Association), 2613–2617.
- (2014) On the number of response regions of deep feedforward networks with piecewise linear activations. Internat. Conf. Learn. Representations (ICLR).
- (2022) Neur2SP: Neural two-stage stochastic programming. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. NIPS’22: Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 23992–24005.
- (2022) Optimizing objective functions from trained ReLU neural networks via sampling. Preprint, submitted May 27, https://arxiv.org/abs/2205.14189.
- (2018) Deep contextualized word representations. Proc. 2018 Conf. North Amer. Chapter Assoc. Comput. Linguistics (NAACL) (Association for Computational Linguistics, Stroudsburg, PA), 2227–2237.
- (2025) Optimization over trained (and sparse) neural networks: A surrogate within a surrogate. Preprint, submitted May 4, https://arxiv.org/abs/2505.01985.
- (2020) Functional vs. parametric equivalence of ReLU networks. Internat. Conf. Learn. Representations (ICLR).
- (2020) Neural networks are convex regularizers: Exact polynomial-time convex optimization formulations for two-layer networks. Daumé H, Singh A, eds. ICML’20: Proc. 37th Internat. Conf. Machine Learn. (ICML) (JMLR), 7695–7705.
- (2025) An analysis of optimization problems involving ReLU neural networks. Preprint, submitted February 5, https://arxiv.org/abs/2502.03016.
- (2020) Deep neural network training with Frank-Wolfe. Preprint, submitted October 14, https://arxiv.org/abs/2010.07243.
- (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5):1–17.
- (2010) An abstraction-refinement approach to verification of artificial neural networks. Touili T, Cook B, Jackson P, eds. Comput. Aided Verification. CAV 2010, Lecture Notes in Computer Science, vol. 6174 (Springer, Berlin, Heidelberg), 243–257.
- (2022) Globally injective ReLU networks. J. Machine Learn. Res. 23(1):4544–4598.
- (2018) Improving language understanding by generative pre-training. Technical report, OpenAI, https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
- (2017) On the expressive power of deep neural networks. Precup D, Teh YW, eds. ICML’17: Proc. 34th Internat. Conf. Machine Learn. (ICML), vol. 70 (JMLR), 2847–2854.
- (2018) Semidefinite relaxations for certifying robustness to adversarial examples. Bengio S, Beygelzimer A, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 10900–10910.
- (2018) Searching for activation functions. ICLR Workshop Track.
- (1994) Modelling and computational techniques for logic based integer programming. Comput. Chemical Engrg. 18(7):563–578.
- (2022) Hierarchical text-conditional image generation with CLIP latents. Preprint, submitted April 13, https://arxiv.org/abs/2204.06125.
- (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.
- (2019) Dissecting deep neural networks. Preprint, submitted October 9, https://arxiv.org/abs/1910.03879.
- (2020) Reverse-engineering deep ReLU networks. Daumé H, Singh A, eds. ICML’20: Proc. 37th Internat. Conf. Machine Learn. (ICML), vol. 119 (JMLR), 8178–8187.
- (1957) The perceptron—A perceiving and recognizing automaton. Technical Report 85-460-1, Cornell Aeronautical Laboratory, Buffalo, NY.
- (2021) Advances in verification of ReLU neural networks. J. Global Optim. 81:109–152.
- (2021) A primer on multi-neuron relaxation-based adversarial robustness certification. ICML 2021 Workshop Adversarial Machine Learn.
- (1993) A polynomial time algorithm for the construction and training of a class of multilayer perceptrons. Neural Networks 6(4):535–545.
- (2019) Fast neural network verification via shadow prices. Preprint, submitted June 21, https://arxiv.org/abs/1902.07247.
- (1986) Learning representations by back-propagating errors. Nature 323:533–536.
- (2020) CaQL: Continuous action Q-learning. Internat. Conf. Learn. Representations (ICLR).
- (2024) How many neurons does it take to approximate the maximum? Proc. 2024 Annual Sympos. Discrete Algorithms (SODA) (SIAM, Philadelphia), 3156–3183.
- (2021) Vector-output ReLU neural network problems are copositive programs: Convex analysis of two layer networks and polynomial-time algorithms. Internat. Conf. Learn. Representations (ICLR).
- (2024) Scaling convex neural networks with Burer-Monteiro factorization. 12th Internat. Conf. Learn. Representations (ICLR).
- (2019) A convex relaxation barrier to tight robustness verification of neural networks. Wallach HM, Larochelle HM, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. NIPS’19: Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 9835–9846.
- (2018) MobileNetV2: Inverted residuals and linear bottlenecks. 2018 IEEE/CVF Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE, Piscataway, NJ), 4510–4520.
- (2023) Locally linear attributes of ReLU neural networks. Frontiers Artificial Intelligence 6:1255192.
- (2017) Nonlinear hybrid planning with deep net learned transition models and mixed-integer linear programming. Proc. 26th Internat. Joint Conf. Artificial Intelligence (IJCAI), 750–756.
- (2018) Lipschitz regularity of deep neural networks: Analysis and efficient estimation. Bengio S, Beygelzimer A, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 3839–3848.
- (2015) Deep learning in neural networks: An overview. Neural Networks 61:85–117.
- (2003) On verification & validation of neural network based controllers. Engrg. Appl. Neural Networks (EANN).
- (2023) Stability verification of neural network controllers using mixed-integer programming. IEEE Trans. Automatic Control 68(12):7514–7529.
- (2019) Deterministic global optimization with artificial neural networks embedded. J. Optim. Theory Appl. 180(3):925–948.
- (2022) Obey validity limits of data-driven models through topological data analysis and one-class classification. Optim. Engrg. 23(2):855–876.
- (2021) Linear program powered attack. 2021 Internat. Joint Conf. Neural Networks (IJCNN) (IEEE, Piscataway, NJ), 1–8.
- (2020) Enumerative branching with less repetition. Hebrard E, Musliu N, eds. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2020, Lecture Notes in Computer Science, vol. 12296 (Springer, Cham, Switzerland), 399–416.
- (2020) Compact representation of near-optimal integer programming solutions. Math. Programming 182:199–232.
- (2020) Empirical bounds on linear regions of deep rectifier networks. Proc. AAAI Conf. Artificial Intelligence 34(4):5628–5635.
- (2020) Lossless compression of deep neural networks. Hebrard E, Musliu N, eds. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2020, Lecture Notes in Computer Science, vol. 12296 (Springer, Cham, Switzerland), 417–430.
- (2018) Bounding and counting linear regions of deep neural networks. Dy J, Krause A, eds. Internat. Conf. Machine Learn. (ICML), vol. 80 (PMLR, New York), 4558–4566.
- (2021) Scaling up exact neural network compression by ReLU stability. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 27081–27093.
- (2022) Careful! Training relevance is real. Preprint, submitted January 12, https://arxiv.org/abs/2201.04429.
- (2025) Neural network verification with branch-and-bound for general nonlinearities. Gurfinkel A, Heule M, eds. Tools Algorithms Construction Anal. Systems. TACAS 2025, Lecture Notes in Computer Science, vol. 15696 (Springer, Cham, Switzerland), 315–335.
- (2022) OVERT: An algorithm for safety verification of neural network control policies for nonlinear systems. J. Machine Learn. Res. 23(117):1–45.
- (2022) A mixed-integer linear programming based training and feature selection method for artificial neural networks using piece-wise linear approximations. Chem. Engrg. Sci. 249:117273.
- (2017) Mastering the game of Go without human knowledge. Nature 550:354–359.
- (2019a) Beyond the single neuron convex barrier for neural network certification. Wallach HM, Larochelle HM, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. NIPS’19: Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 15098–15109.
- (2019b) An abstract domain for certifying neural networks. Proc. ACM Programming Languages (POPL) 3:1–30.
- (2021) Overcoming the convex barrier for simplex inputs. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 4871–4882.
- (2018) Fast and effective robustness certification. Bengio S, Beygelzimer A, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 10825–10836.
- (2006) The optimizer’s curse: Skepticism and postdecision surprise in decision analysis. Management Sci. 52(3):311–322.
- (2024) Scaling mixed-integer programming for certification of neural network controllers using bounds tightening. 2024 IEEE 63rd Conf. Decision Control (CDC) (IEEE, Piscataway, NJ), 1645–1650.
- (2025) Certified robustness to data poisoning in gradient-based training. Trans. Machine Learn. Res. (TMLR).
- (2014) Dropout: A simple way to prevent neural networks from overfitting. J. Machine Learn. Res. 15(56):1929–1958.
- (2025) The computational complexity of counting linear regions in ReLU neural networks. Preprint, submitted May 22, https://arxiv.org/abs/2505.16716.
- (2022) ZoPE: A fast optimizer for ReLU networks with low-dimensional inputs. Deshmukh JV, Havelund K, Perez I, eds. NASA Formal Methods. NFM 2022, Lecture Notes in Computer Science, vol. 13260 (Springer, Cham, Switzerland), 299–317.
- (2021) Global optimization of objective functions represented by ReLU networks. Machine Learn. 112:3685–3712.
- (2020) Unwrapping the black box of deep ReLU networks: Interpretability, diagnostics, and simplification. Preprint, submitted November 8, https://arxiv.org/abs/2011.04041.
- (2014) Sequence to sequence learning with neural networks. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. NIPS’14: Proc. 28th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 3104–3112.
- (2013) On the importance of initialization and momentum in deep learning. Dasgupta S, McAllester D, eds. ICML’13: Proc. 30th Internat. Conf. Machine Learn. (ICML), vol. 28 (JMLR), 1139–1147.
- (2014) Intriguing properties of neural networks. Internat. Conf. Learn. Representations (ICLR).
- (2015) Going deeper with convolutions. 2015 IEEE/CVF Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE, Piscataway, NJ), 1–9.
- (2021) On the number of linear functions composing deep neural network: Towards a refined definition of neural networks complexity. Banerjee A, Fukumizu K, eds. Proc. 24th Internat. Conf. Artificial Intelligence Statist., vol. 130 (PMLR, New York), 3799–3809.
- (2022) Piecewise linear neural networks and deep learning. Nature Rev. Methods Primers 2:42.
- (2015) Representation benefits of deep feedforward networks. Preprint, submitted September 27, https://arxiv.org/abs/1509.08101.
- (2021) On training neural networks with mixed integer programming. IJCAI-PRICAI’20 Workshop Data Sci. Meets Optim., Yokohama, Japan.
- (2023) Optimal training of integer-valued neural networks with mixed integer programming. PLoS One 18(2):e0261029.
- (2022) Effects of data geometry in early deep learning. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. NIPS’22: Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 30099–30113.
- (2020) The convex relaxation barrier, revisited: Tightened single-neuron relaxations for neural network verification. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 21675–21686.
- (2019) Evaluating robustness of neural networks with mixed integer programming. Internat. Conf. Learn. Representations (ICLR).
- (2024) Optimization over trained neural networks: Taking a relaxing walk. Dilkina B, ed. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2024, Lecture Notes in Computer Science, vol. 14743 (Springer, Cham, Switzerland), 221–233.
- (2021) TropEx: An algorithm for extracting linear terms in deep neural networks. Internat. Conf. Learn. Representations (ICLR).
- (2019) 110th anniversary: Using data to bridge the time and length scales of process systems. Indust. Engrg. Chemistry Res. 58(36):16696–16708.
- (2021) Partition-based formulations for mixed-integer optimization of trained ReLU neural networks. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 3068–3080.
- (2021) On the expected complexity of maxout networks. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. NIPS’21: Proc. 35th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 28995–29008.
- (2025) PySCIPOpt-ML: Embedding trained machine learning models into mixed-integer programs. Tack G, ed. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2025, Lecture Notes in Computer Science, vol. 15763 (Springer, Cham, Switzerland), 218–234.
- (2019) A representer theorem for deep neural networks. J. Machine Learn. Res. 20(110):1–30.
- (2024) On minimal depth in neural networks. Preprint, submitted February 23, https://arxiv.org/abs/2402.15315.
- (2017) Attention is all you need. von Luxburg U, Guyon I, Bengio S, Wallach H, Fergus R, eds. NIPS’17: Proc. 31st Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6000–6010.
- (2015) Mixed integer linear programming formulation techniques. SIAM Rev. 57(1):3–57.
- (2019) Small and strong formulations for unions of convex sets from the Cayley embedding. Math. Programming 177(1–2):21–53.
- (2010) Mixed-integer models for nonseparable piecewise-linear optimization: Unifying framework and extensions. Oper. Res. 58(2):303–315.
- (2023) Any deep ReLU network is shallow. Preprint, submitted June 20, https://arxiv.org/abs/2306.11827.
- (2021) Reachable polyhedral marching (RPM): A safety verification algorithm for robotic systems with deep neural network components. IEEE Internat. Conf. Robotics Automation (ICRA) (IEEE, Piscataway, NJ), 9029–9035.
- (2017) StarCraft II: A new challenge for reinforcement learning. Preprint, submitted August 16, https://arxiv.org/abs/1708.04782.
- (2020) Meta-learning acquisition functions for transfer learning in Bayesian optimization. Internat. Conf. Learn. Representations (ICLR).
- (2022) Estimation and comparison of linear regions for ReLU networks. De Raedt L, ed. Proc. 31th Internat. Joint Conf. Artificial Intelligence (IJCAI), 3544–3550.Google Scholar
- (2005) Generalization of hinging hyperplanes. IEEE Trans. Inform. Theory 51(12):4425–4431.
- (2021) A two-stage exact algorithm for optimization of neural network ensemble. Stuckey PJ, ed. Integration Constraint Programming Artificial Intelligence Oper. Res. CPAIOR 2021, Lecture Notes in Computer Science, vol. 12735 (Springer, Cham, Switzerland), 106–114.
- (2023) Optimizing over an ensemble of trained neural networks. INFORMS J. Comput. 35(3):652–674.
- (2018a) Efficient formal safety analysis of neural networks. Bengio S, Beygelzimer A, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 6369–6379.
- (2018b) Formal security analysis of neural networks using symbolic intervals. SEC’18: Proc. 27th USENIX Conf. Security Sympos. (USENIX Association, San Francisco), 1599–1614.
- (1992) Cresceptron: A self-organizing neural network which grows adaptively. 1992 Internat. Joint Conf. Neural Networks (IJCNN), vol. 1 (IEEE, Piscataway, NJ), 576–581.
- (2018) Towards fast computation of certified robustness for ReLU networks. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn. (ICML), vol. 80 (PMLR, New York), 5276–5285.
- (1974) Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University, Cambridge, MA.
- (2023) Robust explanation constraints for neural networks. Internat. Conf. Learn. Representations (ICLR).
- (2020) Probabilistic safety for Bayesian neural networks. Peters J, Sontag D, eds. Proc. 36th Conf. Uncertainty Artificial Intelligence (UAI), vol. 124 (PMLR, New York), 1198–1207.
- (2025) Certification for differentially private prediction in gradient-based training. Singh A, Fazel M, Hsu D, Lacoste-Julien S, Berkenkamp F, Maharaj T, Wagstaff K, Zhu J, eds. Proc. 42nd Internat. Conf. Machine Learn. (ICML), vol. 267 (PMLR, New York), 66726–66745.
- (2022) Convex and concave envelopes of artificial neural network activation functions for deterministic global optimization. J. Global Optim. 85:569–594.
- (2025) Deterministic global optimization with trained neural networks: Is the envelope of single neurons worth it? Preprint, submitted April 28, https://optimization-online.org/2025/04/deterministic-global-optimization-with-trained-neural-networks-is-the-envelope-of-single-neurons-worth-it/.
- (2018) Provable defenses against adversarial examples via the convex outer adversarial polytope. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn. (ICML), vol. 80 (PMLR, New York), 5286–5295.
- (2018) Scaling provable adversarial defenses. Bengio S, Beygelzimer A, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 8410–8419.
- (2018) Optimization algorithms for data analysis. Mahoney MW, Duchi JC, Gilbert AC, eds. The Mathematics of Data, IAS/Park City Mathematics Series, vol. 25 (American Mathematical Society, Providence, RI), 49–98.
- (2020) Scalable planning with deep neural network learned transition models. J. Artificial Intelligence Res. 68:571–606.
- (2022) Efficient neural network analysis with sum-of-infeasibilities. Fisman D, Rosu G, eds. Tools Algorithms Construction Anal. Systems. TACAS 2022, Lecture Notes in Computer Science, vol. 13243 (Springer, Cham, Switzerland), 143–163.
- (2017) Reachable set computation and safety verification for neural networks with ReLU activations. Preprint, submitted December 21, https://arxiv.org/abs/1712.08163.
- (2019) Training for faster adversarial robustness verification via inducing ReLU stability. Internat. Conf. Learn. Representations (ICLR).
- (2020a) A general computational framework to measure the expressiveness of complex networks using a tighter upper bound of linear regions. Preprint, submitted December 8, https://arxiv.org/abs/2012.04428.
- (2020b) Self-training with noisy student improves ImageNet classification. 2020 IEEE/CVF Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE, Piscataway, NJ), 10684–10695.
- (2020c) Efficient projection-free online methods with stochastic recursive gradient. Proc. AAAI Conf. Artificial Intelligence 34(4):6446–6453.
- (2020) On the number of linear regions of convolutional neural networks. Daumé III H, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn. (ICML), vol. 119 (PMLR, New York), 10514–10523.
- (2022) Traversing the local polytopes of ReLU neural networks. AAAI Workshop AdvML (AAAI Press, Washington, DC).
- (2022) Modeling design and control problems involving neural network surrogates. Comput. Optim. Appl. 83:759–800.
- (2020) Reachability analysis for feed-forward neural networks using face lattices. Preprint, submitted March 2, https://arxiv.org/abs/2003.01226.
- (2021) Reachability analysis of convolutional neural networks. Preprint, submitted June 22, https://arxiv.org/abs/2106.12074.
- (2017) Error bounds for approximations with deep ReLU networks. Neural Networks 94:103–114.
- (2001) Verification of a trained neural network accuracy. Internat. Joint Conf. Neural Networks (IJCNN), vol. 3 (IEEE, Piscataway, NJ), 1657–1662.
- (2025) Linear-size neural network representation of piecewise affine functions in ℝ². Preprint, submitted March 17, https://arxiv.org/abs/2503.13001.
- (1975) Facing Up to Arrangements: Face-Count Formulas for Partitions of Space by Hyperplanes (American Mathematical Society, Providence, RI).
- (2020) On the tightness of semidefinite relaxations for certifying robustness to adversarial examples. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 3808–3820.
- (2020) Empirical studies on the properties of linear regions in deep neural networks. Internat. Conf. Learn. Representations (ICLR).
- (2018a) Tropical geometry of deep neural networks. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn. (ICML), vol. 80 (PMLR, New York), 5824–5832.
- (2023a) Dive into Deep Learning (Cambridge University Press, Cambridge, UK).
- (2018b) Efficient neural network robustness certification with general activation functions. Bengio S, Beygelzimer A, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 4944–4953.
- (2023b) Optimizing over trained GNNs via symmetry breaking. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. NIPS’23: Proc. 37th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 44898–44924.
- (2020) Towards stable and efficient training of verifiably robust neural networks. Internat. Conf. Learn. Representations (ICLR).
- (2022) General cutting planes for bound-propagation-based neural network verification. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. NIPS’22: Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1656–1670.
- (2023) Model-based feature selection for neural networks: A mixed-integer programming approach. Sellmann M, Tierney K, eds. Learn. Intelligent Optim. LION 2023, Lecture Notes in Computer Science, vol. 14286 (Springer, Cham, Switzerland), 223–238.
- (2019) An analysis of the expressiveness of deep neural network architectures based on their Lipschitz constants. Preprint, submitted January 18, https://arxiv.org/abs/1912.11511.
- (2024) Scalable neural network verification with branch-and-bound inferred cutting planes. Globerson A, Mackey L, Belgrave D, Fan A, Paquet U, Tomczak J, Zhang C, eds. NIPS’24: Proc. 38th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 29324–29353.
- (2020) Bounding the number of linear regions in local area for neural networks with ReLU activations. Preprint, submitted July 14, https://arxiv.org/abs/2007.06803.
- (2019) On Lipschitz bounds of general convolutional neural networks. IEEE Trans. Inform. Theory 66(3):1738–1759.

