Wasserstein Distributionally Robust Shallow Convex Neural Networks
Published Online: 26 Aug 2025
https://doi.org/10.1287/ijoo.2024.0048

