Stochastic Zeroth-Order Functional Constrained Optimization: Oracle Complexity and Applications
Published Online:30 Nov 2022https://doi.org/10.1287/ijoo.2022.0085
References
- (2017) Practical Bayesian optimization for model fitting with Bayesian adaptive direct search. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 30 (Curran Associates, Red Hook, NY), 1836–1846.Google Scholar
- (2010) Optimal algorithms for online convex optimization with multi-point bandit feedback. Kalai AT, Mohri M, eds. 23rd Conf. Learn. Theory (Omnipress, Madison, WI), 28–40.Google Scholar
- (2018) Efficient solution of quadratically constrained quadratic subproblems within the mesh adaptive direct search algorithm. Eur. J. Oper. Res. 268(1):13–24.Google Scholar
- (2019) Bayesian Optimization and Data Science (Springer, Cham, Switzerland).Google Scholar
- (2019) ADMMBO: Bayesian optimization with unknown constraints using ADMM. J. Machine Learn. Res. 20(123):1–26.Google Scholar
- (2018) Mesh-based Nelder–Mead algorithm for inequality constrained optimization. Comput. Optim. Appl. 71(2):331–352.Google Scholar
- (2004) A pattern search filter method for nonlinear programming without derivatives. SIAM J. Optim. 14(4):980–1010.Google Scholar
- (2006) Mesh adaptive direct search algorithms for constrained optimization. SIAM J. Optim. 17(1):188–217.Google Scholar
- (2009) A progressive barrier for derivative-free nonlinear programming. SIAM J. Optim. 20(1):445–472.Google Scholar
- (2017) Derivative-Free and Blackbox Optimization (Springer, Cham, Switzerland).Google Scholar
- (2015) Linear equalities in blackbox optimization. Comput. Optim. Appl. 61(1):1–23.Google Scholar
- (2014) NOWPAC: A provably convergent derivative-free nonlinear optimizer with path-augmented constraints. Preprint, submitted March 8, https://doi.org/10.48550/arXiv.1403.1931.Google Scholar
- (2020) Gaussian process optimization with failures: Classification and convergence proof. J. Global Optim. 78(3):483–506.Google Scholar
- (2020) BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates, Red Hook, NY), 21524–21538.Google Scholar
- (2018) Zeroth-order (non)-convex stochastic optimization via conditional gradient and gradient updates. Bengio S, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 3459–3468.Google Scholar
- (2022) Zeroth-order nonconvex stochastic optimization: Handling constraints, high-dimensionality and saddle-points. Foundations Comput. Math. 22(1):35–76.Google Scholar
- (2022) Stochastic multilevel composition optimization algorithms with level-independent convergence rates. SIAM J. Optim. 32(2):519–544.Google Scholar
- (2017) First-Order Methods in Optimization (SIAM, Philadelphia).Google Scholar
- (2012) Smoothing and first order methods: A unified framework. SIAM J. Optim. 22(2):557–580.Google Scholar
- (2021) Global convergence rate analysis of a generic line search algorithm with noise. SIAM J. Optim. 31(2):1489–1518.Google Scholar
- (2012) Random search for hyper-parameter optimization. J. Machine Learn. Res. 13(2):281–305.Google Scholar
- (1954) Multidimensional stochastic approximation methods. Ann. Math. Statist. 25(4):737–744.Google Scholar
- (2022) Stochastic first-order methods for convex and nonconvex functional constrained optimization. Math. Programming, ePub ahead of print January 21, https://doi.org/10.1007/s10107-021-01742-y.Google Scholar
- (2013) Algorithms for Minimization Without Derivatives (Dover Publications, Mineola, NY).Google Scholar
- (2013) Inexact restoration method for derivative-free optimization with smooth constraints. SIAM J. Optim. 23(2):1189–1213.Google Scholar
- (2006) Grid restrained Nelder-Mead algorithm. Comput. Optim. Appl. 34(3):359–375.Google Scholar
- (2017) Stan: A probabilistic programming language. J. Statist. Software 76(1):1–32.Google Scholar
- (2014) Stochastic gradient Hamiltonian Monte Carlo. Xing EP, Jebara T, eds. Proc. 31st Internat. Conf. Machine Learn. (PMLR), 1683–1691.Google Scholar
- (2020) Provably robust blackbox optimization for reinforcement learning. Kaelbling LP, Kragic D, Sugiura K, eds. Proc. Conf. Robot Learn., vol. 100 (PMLR), 683–696.Google Scholar
- (2013) Use of quadratic models with mesh-adaptive direct search for constrained black box optimization. Optim. Methods Software 28(1):139–158.Google Scholar
- (2009) Introduction to Derivative-Free Optimization, vol. 8 (SIAM, Philadelphia).Google Scholar
- (2003) Relaxations and randomized methods for nonconvex QCQPs. EE392o Class Notes, Stanford University, Stanford, CA. https://web.stanford.edu/class/ee392o/relaxations.pdf.Google Scholar
- (2003) Accelerated randomized stochastic optimization. Ann. Statist. 31(4):1260–1281.Google Scholar
- (2017) The proximal point method revisited. Preprint, submitted December 17, https://doi.org/10.48550/arXiv.1712.06038.Google Scholar
- (2017) UCI Machine Learning Repository. Accessed August 1, 2022, http://archive.ics.uci.edu/ml.Google Scholar
- (1987) Hybrid Monte Carlo. Phys. Lett. B 195(2):216–222.Google Scholar
- (2015) Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Trans. Inform. Theory 61(5):2788–2806.Google Scholar
- (2022) Constrained stochastic blackbox optimization using a progressive barrier and probabilistic estimates. Math. Programming, ePub ahead of print March 30, https://doi.org/10.1007/s10107-022-01787-7.Google Scholar
- (2017) An inexact restoration derivative-free filter method for nonlinear programming. Comput. Appl. Math. 36(1):693–718.Google Scholar
- (2019) Neural architecture search: A survey. J. Machine Learn. Res. 20(1):1997–2017.Google Scholar
- (2021) Scalable constrained Bayesian optimization. Internat. Conf. Artificial Intelligence Statist. (PMLR), 730–738.Google Scholar
- (2014) A line-search based derivative-free approach for nonsmooth constrained optimization. SIAM J. Optim. 24(3):959–992.Google Scholar
- (2018) A tutorial on Bayesian optimization. Preprint, submitted July 8, https://doi.org/10.48550/arXiv.1807.02811.Google Scholar
- (2020) Robotic table tennis with model-free reinforcement learning. Preprint, submitted March 31, https://arxiv.org/abs/2003.14398.Google Scholar
- (2014) Bayesian optimization with inequality constraints. Xing EP, Jebara T, eds. Proc. 31st Internat. Conf. Machine Learn. (PMLR), 937–945.Google Scholar
- (2014) Bayesian optimization with unknown constraints. Zhang N, Tian J, eds. Proc. 30th Conf. Uncertainty Artificial Intelligence (AUAI Press, Arlington, VA), 250–259.Google Scholar
- (2016) A general framework for constrained Bayesian optimization using information-based search. J. Machine Learn. Res. 17(160):1–53.Google Scholar
- (1992) A single series from the Gibbs sampler provides a false sense of security. Bernardo JM, Berger JO, Dawid AP, Smith AFM, eds. Bayesian Statistics 4 (Oxford University Press, Oxford, UK), 625–632.Google Scholar
- (1991) Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. Research Development Staff Report 148, Federal Reserve Bank of Minneapolis, Minneapolis.Google Scholar
- (2013) Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4):2341–2368.Google Scholar
- (2020) A single timescale stochastic approximation method for nested stochastic optimization. SIAM J. Optim. 30(1):960–979.Google Scholar
- (2011) Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. Roy. Statist. Soc. Ser. B 73(2):123–214.Google Scholar
- (2017) Google Vizier: A service for black-box optimization. Proc. 23rd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 1487–1495.Google Scholar
- (2016) Deep Learning, vol. 1 (MIT Press, Cambridge, MA).Google Scholar
- (2016) laGP: Large-scale spatial modeling via local approximate Gaussian processes in R. J. Statist. Software 72(1):1–46.Google Scholar
- (2016) Modeling an augmented Lagrangian for blackbox constrained optimization. Technometrics 58(1):1–11.Google Scholar
- (2014) A merit function approach for direct search. SIAM J. Optim. 24(4):1980–1998.Google Scholar
- (2020) Bayesian optimization for adaptive experimental design: A review. IEEE Access 8:13937–13948.Google Scholar
- (2021) A primal-dual algorithm with line search for general convex-concave saddle point problems. SIAM J. Optim. 31(2):1299–1329.Google Scholar
- (2018) Hyperparameter optimization: A spectral approach. 6th Internat. Conf. Learn. Representations (Vancouver, Canada), https://openreview.net/forum?id=H1zriGeCZ.Google Scholar
- (2015) Predictive entropy search for Bayesian optimization with unknown constraints. Bach F, Blei D, eds. Proc. 32nd Internat. Conf. Machine Learn. (PMLR), 1699–1707.Google Scholar
- (2014) The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. J. Machine Learn. Res. 15(1):1593–1623.Google Scholar
- (1961) Direct search solution of numerical and statistical problems. J. ACM 8(2):212–229.Google Scholar
- (2012) Query complexity of derivative-free optimization. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Proc. 25th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 2672–2680.Google Scholar
- (2020) High-dimensional Bayesian optimization via nested Riemannian manifolds. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates, Red Hook, NY), 20939–20951.Google Scholar
- (2020) Bayesian optimization meets Riemannian manifolds in robot learning. Kaelbling LP, Kragic D, Sugiura K, eds. Proc. Conf. Robot Learn., vol. 100 (PMLR), 233–246.Google Scholar
- (1998) Markov chain Monte Carlo in practice: A roundtable discussion. Amer. Statist. 52(2):93–100.Google Scholar
- (1952) Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23(3):462–466.Google Scholar
- (2003) Optimization by direct search: New perspectives on some classical and modern methods. SIAM Rev. 45(3):385–482.Google Scholar
- (2009) Learning multiple layers of features from tiny images. Master’s thesis, University of Toronto, Toronto.Google Scholar
- (2017) Lookahead Bayesian optimization with inequality constraints. Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 30 (Curran Associates, Red Hook, NY), 1890–1900.Google Scholar
- (2019) Derivative-free optimization methods. Acta Numer. 28(May):287–404.Google Scholar
- (2011) A survey on wireless body area networks. Wireless Networks 17(1):1–18.Google Scholar
- (2010) MNIST handwritten digit database. Accessed August 1, 2022, http://yann.lecun.com/exdb/mnist/.Google Scholar
- (2015) A taxonomy of constraints in simulation-based optimization. Preprint, submitted May 28, https://doi.org/10.48550/arXiv.1505.07881.Google Scholar
- (2015) Molecular Dynamics: With Deterministic and Stochastic Numerical Methods (Springer, Cham, Switzerland).Google Scholar
- (2001) Some practical guidelines for effective sample size determination. Amer. Statist. 55(3):187–193.Google Scholar
- (2019) Constrained Bayesian optimization with noisy experiments. Bayesian Anal. 14(2):495–519.Google Scholar
- (2002) A globally convergent augmented Lagrangian pattern search algorithm for optimization with general constraints and simple bounds. SIAM J. Optim. 12(4):1075–1089.Google Scholar
- (2022) Stochastic zeroth-order Riemannian derivative estimation and optimization. Math. Oper. Res., epub ahead of print September 8, https://doi.org/10.1287/moor.2022.1302.Google Scholar
- (2017) Hyperband: A novel bandit-based approach to hyperparameter optimization. J. Machine Learn. Res. 18(1):6765–6816.Google Scholar
- (2020) A primer on zeroth-order optimization in signal processing and machine learning: Principals, recent advances, and applications. IEEE Signal Processing Magazine 37(5):43–54.Google Scholar
- (2009) A derivative-free algorithm for inequality constrained nonlinear programming via smoothing of an ℓ∞ penalty function. SIAM J. Optim. 20(1):1–29.Google Scholar
- (2010) Sequential penalty derivative-free methods for nonlinear constrained optimization. SIAM J. Optim. 20(5):2614–2635.Google Scholar
- (2012) Adaptive MCMC with Bayesian optimization. Teh YW, Titterington M, eds. Proc. 13th Conf. Artificial Intelligence Statist. (PMLR), 751–760.Google Scholar
- (2018) Simple random search provides a competitive approach to reinforcement learning. Preprint, submitted March 19, https://doi.org/10.48550/arXiv.1803.07055.Google Scholar
- (1994) Application of Bayesian approach to numerical methods of global and stochastic optimization. J. Global Optim. 4(4):347–365.Google Scholar
- (2012) Bayesian Approach to Global Optimization: Theory and Applications (Springer Science & Business Media, New York).Google Scholar
- (2007) A companion for the Kiefer–Wolfowitz–Blum stochastic approximation algorithm. Ann. Statist. 35(4):1749–1772.Google Scholar
- (2017) GOSAC: Global optimization with surrogate approximation of constraints. J. Global Optim. 69(1):117–136.Google Scholar
- (2011) MCMC using Hamiltonian dynamics. Brooks S, Gelman A, Jones G, Meng X-L, eds. Handbook of Markov Chain Monte Carlo (CRC Press, Boca Raton, FL), 113–162.Google Scholar
- (1965) A simplex method for function minimization. Comput. J. 7(4):308–313.Google Scholar
- (1983) Problem Complexity and Method Efficiency in Optimization (John Wiley & Sons, Chichester, UK).Google Scholar
- (2017) Random gradient-free minimization of convex functions. Foundations Comput. Math. 17(2):527–566.Google Scholar
- (2014) Proximal algorithms. Foundations Trends Optim. 1(3):127–239.Google Scholar
- (2019) Pytorch: An imperative style, high-performance deep learning library. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EA, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates, Red Hook, NY), 8024–8035.Google Scholar
- (2010) PyMC: Bayesian stochastic modelling in Python. J. Statist. Software 35(4):1–81.Google Scholar
- (2015) The emerging internet of things marketplace from an industrial perspective: A survey. IEEE Trans. Emerging Topics Comput. 3(4):585–598.Google Scholar
- (2016) Bayesian optimization under mixed constraints with a slack-variable augmented Lagrangian. Lee DD, Sugiyama M, von Luxburg U, Guyon I, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 29 (Curran Associates, Red Hook, NY), 1435–1443.Google Scholar
- (2006) CODA: Convergence diagnosis and output analysis for MCMC. R News 6(1):7–11.Google Scholar
- (2020) The statistical filter approach to constrained optimization. Technometrics 62(3):303–312.Google Scholar
- (1964) An efficient method for finding the minimum of a function of several variables without calculating derivatives. Comput. J. 7(2):155–162.Google Scholar
- (2015) Convex Analysis (Princeton University Press, Princeton, NJ).Google Scholar
- (2020) Learning to learn by zeroth-order oracle. Internat. Conf. Learn. Representations.Google Scholar
- (2021) A stochastic subgradient method for nonsmooth nonconvex multilevel composition optimization. SIAM J. Control Optim. 59(3):2301–2320.Google Scholar
- (2019) Toward gradient free and projection free stochastic optimization. Chaudhuri K, Sugiyama M, eds. 22nd Internat. Conf. Artificial Intelligence Statist. (PMLR), 3468–3477.Google Scholar
- (2017) Evolution strategies as a scalable alternative to reinforcement learning. Preprint, submitted March 10, https://doi.org/10.48550/arXiv.1703.03864.Google Scholar
- (2015) Taking the human out of the loop: A review of Bayesian optimization. Proc. IEEE 104(1):148–175.Google Scholar
- (2013) On the complexity of bandit and derivative-free stochastic convex optimization. Shalev-Shwartz S, Steinwart I, eds. 26th Annual Conf. Learn. Theory Proc. (PMLR), 3–24.Google Scholar
- (2012) Practical Bayesian optimization of machine learning algorithms. Pereira F, Burges CJ, Bottou L, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 25 (Curran Associates, Red Hook, NY), 2951–2959.Google Scholar
- (1987) A stochastic approximation technique for generating maximum likelihood parameter estimates. Proc. 1987 Amer. Control Conf. (IEEE, Piscataway, NJ), 1161–1167.Google Scholar
- (2005) Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control (John Wiley & Sons, Hoboken, NJ).Google Scholar
- (1962) Sequential application of simplex designs in optimisation and evolutionary operation. Technometrics 4(4):441–461.Google Scholar
- (1972) A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Le Cam LM, Neyman J, Scott EL, eds. Proc. Sixth Berkeley Sympos. Math. Statist. Probab., vol. 2: Probab. Theory (Regents of the University of California–Berkeley, Berkeley, CA), 583–602.Google Scholar
- (2013) On the importance of initialization and momentum in deep learning. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learning (PMLR), 1139–1147.Google Scholar
- (2016) A sequential quadratic programming algorithm for equality-constrained optimization without derivatives. Optim. Lett. 10(2):383–399.Google Scholar
- (2019) Safe convex learning under uncertain constraints. Proc. 22nd Internat. Conf. Artificial Intelligence Statist. (PMLR), 2106–2114.Google Scholar
- (2013) Adaptive Hamiltonian and Riemann manifold Monte Carlo. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn., vol. 89 (PMLR), 1462–1470.Google Scholar
- (2008) Distributed segmentation and classification of human actions using a wearable motion sensor network. 2008 IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition Workshops (IEEE, Piscataway, NJ), 1–8.Google Scholar

