Polynomial Time Algorithms for Branching Markov Decision Processes and Probabilistic Min(Max) Polynomial Bellman Equations

Kousha Etessami
https://orcid.org/0000-0001-5700-3462
School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, United Kingdom

Alistair Stewart (Corresponding Author)
Department of Computer Science, University of Southern California, Los Angeles, California 90089

Mihalis Yannakakis
Department of Computer Science, Columbia University, New York, New York 10027

Published Online: 5 Dec 2019
https://doi.org/10.1287/moor.2018.0970

Abstract

We show that one can compute the least nonnegative solution (also known as the least fixed point) for a system of probabilistic min (max) polynomial equations, to any desired accuracy $\epsilon > 0$, in time polynomial in both the encoding size of the system and in $\log(1/\epsilon)$. These are Bellman optimality equations for important classes of infinite-state Markov decision processes (MDPs), including branching MDPs (BMDPs), which generalize classic multitype branching stochastic processes. We thus obtain the first polynomial time algorithm for computing, to any desired precision, optimal (maximum and minimum) extinction probabilities for BMDPs. Our algorithms are based on a novel generalization of Newton’s method, which employs linear programming in each iteration. We also provide polynomial-time (P-time) algorithms for computing an $\epsilon$-optimal policy for both maximizing and minimizing extinction probabilities in a BMDP, whereas we note a hardness result for computing an exact optimal policy. Furthermore, improving on prior results, we provide more efficient P-time algorithms for qualitative analysis of BMDPs, that is, for determining whether the maximum or minimum extinction probability is 1, and, if so, computing a policy that achieves this. We also observe some complexity consequences of our results for branching simple stochastic games, which generalize BMDPs.
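
To make the object of study concrete, the following is a minimal sketch of a one-variable max/min probabilistic polynomial system and a naive way to approximate its least fixed point. It deliberately does not implement the generalized Newton's method with linear programming described in the abstract; it uses plain value iteration from the zero vector, which converges to the least fixed point of such monotone systems but can be very slow in the worst case. The system `toy`, the data layout, and the function names are all hypothetical and chosen only for illustration.

```python
from math import prod

# Assumed encoding (not from the paper):
#   monomial   = (coefficient, exponents), one exponent per variable
#   polynomial = list of monomials with nonnegative coefficients summing to <= 1
#   system     = for each variable, a list of polynomials (one per action)

def eval_poly(poly, x):
    """Evaluate a probabilistic polynomial at the point x."""
    return sum(c * prod(xi ** e for xi, e in zip(x, exps)) for c, exps in poly)

def value_iteration(system, choose, iters):
    """Iterate x <- P(x) from x = 0, where P takes the max or min over actions.

    `choose` is the built-in `max` (maximize extinction probability)
    or `min` (minimize it).
    """
    x = [0.0] * len(system)
    for _ in range(iters):
        x = [choose(eval_poly(p, x) for p in actions) for actions in system]
    return x

# Hypothetical single-type BMDP with two actions:
#   action a: die with prob. 0.25, produce two children with prob. 0.75
#   action b: die with prob. 0.60, produce two children with prob. 0.40
toy = [[
    [(0.25, (0,)), (0.75, (2,))],
    [(0.60, (0,)), (0.40, (2,))],
]]

print(value_iteration(toy, min, 200))  # minimum extinction probability
print(value_iteration(toy, max, 200))  # maximum extinction probability
```

For this toy system the minimum extinction probability is approximately 1/3 (the least root of $x = 0.25 + 0.75x^2$) and the maximum is approximately 1 (the least root of $x = 0.6 + 0.4x^2$).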

Mathematics of Operations Research

Volume 45, Issue 1

February 2020

Pages 1-401, C2

Information

Received: January 22, 2016
Accepted: July 22, 2018
Published Online: December 05, 2019

Cite as

Kousha Etessami, Alistair Stewart, Mihalis Yannakakis (2019) Polynomial Time Algorithms for Branching Markov Decision Processes and Probabilistic Min(Max) Polynomial Bellman Equations. Mathematics of Operations Research 45(1):34-62.

https://doi.org/10.1287/moor.2018.0970
