Episodic Bayesian Optimal Control with Unknown Randomness Distributions

Alexander Shapiro
https://orcid.org/0000-0002-4776-0053
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332

Enlu Zhou (Corresponding Author)
https://orcid.org/0000-0001-5399-6508
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332

Yifan Lin
https://orcid.org/0000-0002-6967-8237
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332

Yuhao Wang
https://orcid.org/0000-0003-2943-434X
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332

Published Online: 23 Jul 2025
https://doi.org/10.1287/opre.2023.0446

Volume 74, Issue 3

May-June 2026

Pages v-x, 1153-1728, iii-iv

Received: August 15, 2023
Accepted: June 03, 2025
Published Online: July 23, 2025

Cite as

Alexander Shapiro, Enlu Zhou, Yifan Lin, Yuhao Wang (2025) Episodic Bayesian Optimal Control with Unknown Randomness Distributions. Operations Research 74(3):1437-1455.

https://doi.org/10.1287/opre.2023.0446
