Variance Regularization in Sequential Bayesian Optimization

Michael Jong Kim (Corresponding Author)
https://orcid.org/0000-0002-1858-4142
Sauder School of Business, University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada

Published Online: 14 Apr 2020, https://doi.org/10.1287/moor.2019.1019

Mathematics of Operations Research

Volume 45, Issue 3

August 2020

Pages 797-1192, C2

Article Information

Received: December 08, 2017
Accepted: June 07, 2019
Published Online: April 14, 2020

Cite as

Michael Jong Kim (2020) Variance Regularization in Sequential Bayesian Optimization. Mathematics of Operations Research 45(3):966-992.

https://doi.org/10.1287/moor.2019.1019
