Beyond Discounted Returns: Robust Markov Decision Processes with Average and Blackwell Optimality
Published Online: 4 Mar 2026 https://doi.org/10.1287/opre.2023.0694

