Wasserstein Distributionally Robust Shallow Convex Neural Networks

Published Online: https://doi.org/10.1287/ijoo.2024.0048
