Variance Regularization in Sequential Bayesian Optimization

Published Online: https://doi.org/10.1287/moor.2019.1019

References

  • [1] Afèche P, Ata B (2013) Bayesian dynamic pricing in queuing systems with unknown delay cost characteristics. Manufacturing Service Oper. Management 15(2):292–304.
  • [2] Ash RB (1972) Real Analysis and Probability (Academic Press, New York).
  • [3] Ban GY, El Karoui N, Lim AEB (2018) Machine learning and portfolio optimization. Management Sci. 64(3):1136–1154.
  • [4] Banjević D, Kim MJ (2019) Thompson sampling for stochastic control: The continuous parameter case. IEEE Trans. Automatic Control 64(10):4137–4152.
  • [5] Bensoussan A, Keppo J, Sethi S (2009) Optimal consumption and portfolio decisions with partially observed real prices. Math. Finance 19(2):215–236.
  • [6] Bertsekas DP (1999) Nonlinear Programming (Athena Scientific, Nashua, NH).
  • [7] Bertsekas DP (2012) Dynamic Programming and Optimal Control Volume II: Approximate Dynamic Programming (Athena Scientific, Nashua, NH).
  • [8] Bertsekas DP, Shreve SE (1978) Stochastic Optimal Control: The Discrete Time Case (Academic Press, New York).
  • [9] Boyd S, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).
  • [10] Bradtke SJ, Barto AG (1996) Linear least-squares algorithms for temporal difference learning. Machine Learning 22(1–3):33–57.
  • [11] Brezzi M, Lai TL (2002) Optimal learning and experimentation in bandit problems. J. Econom. Dynamics Control 27(1):87–108.
  • [12] Dai M, Huang S, Keppo J (2019) Opaque bank assets and optimal equity capital. J. Econom. Dynamics Control 100:369–394.
  • [13] Dayal S (1977) A converse of Taylor’s theorem for functions on Banach spaces. Proc. Amer. Math. Soc. 65(2):265–273.
  • [14] Dayanik S, Gürler Ü (2002) An adaptive Bayesian replacement policy with minimal repair. Oper. Res. 50(3):552–558.
  • [15] Duchi JC, Namkoong H (2019) Variance-based regularization with convex objectives. J. Machine Learn. Res. 20(68):1–55.
  • [16] Ghosal S, Van der Vaart AW (2007) Convergence rates of posterior distributions for non-i.i.d. observations. Ann. Statist. 35(1):192–223.
  • [17] Gotoh J, Kim MJ, Lim AEB (2018) Robust empirical optimization is almost the same as mean-variance optimization. Oper. Res. Lett. 46(4):448–452.
  • [18] Gotoh J, Kim MJ, Lim AEB (2017) Calibration of distributionally robust empirical optimization models. Preprint submitted November 17, https://arxiv.org/abs/1711.06565.
  • [19] Gotoh J, Shinozaki K, Takeda A (2013) Robust portfolio techniques for mitigating the fragility of CVaR minimization and generalization to coherent risk measures. Quant. Finance 13(10):1621–1635.
  • [20] Harrison JM, Keskin NB, Zeevi A (2012) Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Sci. 58(3):570–586.
  • [21] Hernández-Lerma O (1989) Adaptive Markov Control Processes (Springer-Verlag, New York).
  • [22] Jain A, Rudi N, Wang T (2015) Demand estimation and ordering under censoring: Stock-out timing is (almost) all you need. Oper. Res. 63(1):134–150.
  • [23] Keppo J, Moscarini G, Smith L (2008) The demand for more information: More heat than light. J. Econom. Theory 138(1):21–50.
  • [24] Keppo J, Shumway T, Weagley D (2019) Can individual investors time bubbles? Working paper, National University of Singapore, Singapore.
  • [25] Keppo J, Smith L, Davydov D (2008) Optimal electoral timing: Exercise wisely and you may live longer. Rev. Econom. Stud. 75(2):597–628.
  • [26] Keskin B (2014) Optimal dynamic pricing with demand model uncertainty: A squared-coefficient-of-variation rule for learning and earning. Working paper, Duke University, Durham, NC.
  • [27] Keskin B, Birge J (2019) Dynamic selling mechanisms for product differentiation and learning. Oper. Res. 67(4):1069–1089.
  • [28] Kim MJ (2016) Robust control of partially observable failing systems. Oper. Res. 64(4):999–1014.
  • [29] Kim MJ (2017) Thompson sampling for stochastic control: The finite parameter case. IEEE Trans. Automatic Control 62(12):6415–6422.
  • [30] Kim MJ, Lim AEB (2016) Robust multi-armed bandit problems. Management Sci. 62(1):264–285.
  • [31] Kim MJ, Lim AEB (2019) Approximating the Gittins index for Bayesian bandits. Working paper, University of British Columbia, Vancouver.
  • [32] Kim MJ, Makis V (2013) Joint optimization of sampling and control of partially observable failing systems. Oper. Res. 61(3):777–790.
  • [33] Lam H (2016) Robust sensitivity analysis for stochastic systems. Math. Oper. Res. 41(4):1248–1275.
  • [34] Nedić A, Bertsekas DP (2003) Least-squares policy evaluation algorithms with linear function approximation. J. Discrete Event Systems 13(1/2):79–110.
  • [35] Rieder U (1975) Bayesian dynamic programming. Adv. Appl. Probab. 7(2):330–348.
  • [36] Rockafellar RT, Uryasev S (2013) The fundamental risk quadrangle in risk management, optimization and statistical estimation. Surv. Oper. Res. Management Sci. 18(S1–S2):33–53.
  • [37] Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.
  • [38] Ryzhov IO, Powell WB, Frazier PI (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180–195.
  • [39] Shen X, Wasserman L (2001) Rates of convergence of posterior distributions. Ann. Statist. 29(3):687–714.
  • [40] Sutton RS (1988) Learning to predict by the methods of temporal differences. Machine Learn. 3(1):9–44.
  • [41] Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J. Royal Statist. Soc. B 58(1):267–288.
  • [42] Tsitsiklis JN, Van Roy B (1997) An analysis of temporal-difference learning with function approximation. IEEE Trans. Automatic Control 42(5):674–690.
  • [43] Walker S (2004) New approaches to Bayesian consistency. Ann. Statist. 32(5):2028–2043.
  • [44] Whittle P (2000) Probability via Expectation (Springer-Verlag, New York).
  • [45] Xu H, Caramanis C, Mannor S (2010) Robust regression and Lasso. IEEE Trans. Inform. Theory 56(7):3561–3574.