Episodic Bayesian Optimal Control with Unknown Randomness Distributions

Published Online: https://doi.org/10.1287/opre.2023.0446

References

  • Abbasi-Yadkori Y, Szepesvári C (2015) Bayesian optimal control of smoothly parameterized systems. UAI’15: Proc. 31st Conf. Uncertainty Artificial Intelligence (AUAI Press, Arlington, VA), 2–11.
  • Abeille M, Lazaric A (2018) Improved regret bounds for Thompson sampling in linear quadratic control problems. Internat. Conf. Machine Learn. (PMLR, New York), 1–9.
  • Basar T, Bernhard P (2008) H∞-Optimal Control and Related Minimax Design Problems – A Dynamic Game Approach (Birkhäuser, Boston).
  • Bertsekas D, Shreve S (1978) Stochastic Optimal Control, the Discrete Time Case (Academic Press, New York).
  • Bielecki TR, Chen T, Cialenco I, Cousin A, Jeanblanc M (2019) Adaptive robust control under model uncertainty. SIAM J. Control Optim. 57(2):925–946.
  • Blanchet J, Shapiro A (2023) Statistical limit theorems in distributionally robust optimization. WSC’23: Proc. Winter Simulation Conf. (IEEE Press, Piscataway, NJ), 31–45.
  • Doob JL (1948) Application of the theory of martingales. Le Calcul Des Probabilites et Ses Applications (Centre National de la Recherche Scientifique, Paris), 23–27. [In French.]
  • Duff MO (2002) Optimal Learning: Computational Procedures for Bayes-Adaptive Markov Decision Processes (University of Massachusetts Amherst, Amherst, MA).
  • Fernholz LT (1983) Von Mises Calculus for Statistical Functionals, Lecture Notes in Statistics, vol. 19 (Springer-Verlag, New York).
  • Gilboa I, Schmeidler D (1989) Maxmin expected utility with non-unique prior. J. Math. Econom. 18(2):141–153.
  • González-Trejo JI, Hernández-Lerma O, Hoyos-Reyes LF (2002) Minimax control of discrete-time stochastic systems. SIAM J. Control Optim. 41(5):1626–1659.
  • Hansen LP, Sargent TJ, Turmuhambetova G, Williams N (2006) Robust control and model misspecification. J. Econom. Theory 128(1):45–90.
  • Kumar PR, Varaiya P (2015) Bayesian adaptive control. Stochastic Systems: Estimation, Identification, and Adaptive Control (Society for Industrial and Applied Mathematics, Philadelphia), 231–258.
  • Lan G, Shapiro A (2024) Numerical methods for convex multistage stochastic optimization. Foundations Trends Optim. 6(2):63–144.
  • Lim AEB, Shanthikumar GJ, Shen ZJM (2006) Model uncertainty, robust optimization and learning. Models, Methods, and Applications for Innovative Decision Making, Tutorials in Operations Research (INFORMS, Catonsville, MD), 66–94.
  • Lin Y, Ren Y, Zhou E (2022) Bayesian risk Markov decision processes. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. NIPS’22: Proc. 36th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 17430–17442.
  • Liu T, Lin Y, Zhou E (2024) Bayesian stochastic gradient descent for stochastic optimization with streaming input data. SIAM J. Optim. 34(1):389–418.
  • Osband I, Van Roy B (2014) Near-optimal reinforcement learning in factored MDPs. NIPS’14: Proc. 28th Internat. Conf. Neural Inform. Processing Systems, vol. 1 (MIT Press, Cambridge, MA), 604–612.
  • Osband I, Van Roy B (2017) Why is posterior sampling better than optimism for reinforcement learning? Internat. Conf. Machine Learning (PMLR), 2701–2710.
  • Osband I, Russo D, Van Roy B (2013) (More) efficient reinforcement learning via posterior sampling. NIPS’13: Proc. 27th Internat. Conf. Neural Inform. Processing Systems, vol. 2 (Curran Associates, Red Hook, NY), 3003–3011.
  • Pereira M, Pinto L (1991) Multi-stage stochastic optimization applied to energy planning. Math. Programming 52(1–3):359–375.
  • Rieder U (1975) Bayesian dynamic programming. Adv. Appl. Probab. 7(2):330–348.
  • Schwartz L (1965) On Bayes procedures. Z. Wahrscheinlichkeitstheorie Verw. Gebiete 4:10–26.
  • Shapiro A (2012) Minimax and risk averse multistage stochastic programming. Eur. J. Oper. Res. 219(3):719–726.
  • Shapiro A, Cheng Y (2021) Central limit theorem and sample complexity of stationary stochastic programs. Oper. Res. Lett. 49(5):676–681.
  • Shapiro A, Dentcheva D, Ruszczyński A (2021) Lectures on Stochastic Programming: Modeling and Theory, 3rd ed. (Society for Industrial and Applied Mathematics, Philadelphia).
  • Shapiro A, Zhou E, Lin Y (2023) Bayesian distributionally robust optimization. SIAM J. Optim. 33(2):1279–1304.
  • Sîrbu M (2014) A note on the strong formulation of stochastic control problems with model uncertainty. Electronic Comm. Probab. 19(81):1–10.
  • Strens MJA (2000) A Bayesian framework for reinforcement learning. ICML’00: Proc. 17th Internat. Conf. Machine Learn. (Morgan Kaufmann Publishers Inc., San Francisco), 943–950.
  • Theocharous G, Wen Z, Abbasi-Yadkori Y, Vlassis N (2017) Posterior sampling for large scale reinforcement learning. Preprint, submitted November 21, https://arxiv.org/abs/1711.07979.
  • Tzortzis I, Charalambous CD, Charalambous T (2019) Infinite horizon average cost dynamic programming subject to total variation distance ambiguity. SIAM J. Control Optim. 57(4):2843–2872.
  • van der Vaart A (1998) Asymptotic Statistics (Cambridge University Press, Cambridge, UK).
  • Van Parys BPG, Kuhn D, Goulart PJ, Morari M (2016) Distributionally robust control of constrained stochastic systems. IEEE Trans. Automatic Control 61(2):430–442.
  • Wu D, Zhu H, Zhou E (2018) A Bayesian risk approach to data-driven stochastic optimization: Formulations and asymptotics. SIAM J. Optim. 28(2):1588–1612.
  • Yang I (2018) Wasserstein distributionally robust stochastic control: A data-driven approach. IEEE Trans. Automatic Control 66(8):3863–3870.
  • Zipkin P (2000) Foundations of Inventory Management (McGraw-Hill, New York).