Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time

Mengdi Wang
Corresponding Author
http://orcid.org/0000-0002-2101-9507
Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540

Published Online: 16 Oct 2019
https://doi.org/10.1287/moor.2019.1000

Abstract

We propose a novel randomized linear programming algorithm for approximating the optimal policy of the discounted-reward and average-reward Markov decision problems. By leveraging the value–policy duality, the algorithm adaptively samples state–action–state transitions and makes exponentiated primal–dual updates. We show that it finds an ɛ-optimal policy using nearly linear runtime in the worst case for a fixed value of the discount factor. When the Markov decision process is ergodic and specified in some special data formats, for fixed values of certain ergodicity parameters, the algorithm finds an ɛ-optimal policy using sample size and time linear in the total number of state–action pairs, which is sublinear in the input size. These results provide a new avenue and complexity benchmarks for solving stochastic dynamic programs.
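The abstract describes the method only at a high level. As an illustration of the two ingredients it names (sampled state–action–state transitions and exponentiated primal–dual updates), here is a minimal NumPy sketch of a stochastic primal–dual iteration on the saddle-point form of the discounted-MDP linear program. It is not the paper's exact algorithm, sampling scheme, or step-size schedule. The function name `stochastic_primal_dual_mdp`, the parameters `alpha`, `beta`, `n_iters`, and `seed`, the uniform initial distribution q, the uniform sampling of (i, a), and the assumption that rewards lie in [0, 1] are all hypothetical choices made for this sketch.

```python
import numpy as np


def stochastic_primal_dual_mdp(P, r, gamma, n_iters=20_000, alpha=0.01, beta=0.01, seed=0):
    """Hypothetical sketch of a randomized primal-dual method for a discounted MDP.

    Saddle point (q = uniform initial distribution, an assumption):
        min_{v in [0, 1/(1-gamma)]^S}  max_{mu in simplex over (i, a)}
            (1 - gamma) * q @ v + sum_{i,a} mu[i, a] * (r[i, a] + gamma * P[a, i] @ v - v[i])

    P: shape (A, S, S), P[a, i, j] = Pr(next state j | state i, action a)
    r: shape (S, A), rewards assumed to lie in [0, 1]
    """
    rng = np.random.default_rng(seed)
    n_actions, n_states, _ = P.shape
    n_pairs = n_states * n_actions
    v_max = 1.0 / (1.0 - gamma)  # box for the value vector when rewards lie in [0, 1]

    v = np.zeros(n_states)                               # primal variable: values
    mu = np.full((n_states, n_actions), 1.0 / n_pairs)   # dual variable: state-action distribution

    for _ in range(n_iters):
        # Sample a state-action pair uniformly, a next state from the simulator,
        # and an initial state from q (uniform here).
        i = rng.integers(n_states)
        a = rng.integers(n_actions)
        j = rng.choice(n_states, p=P[a, i])
        i0 = rng.integers(n_states)

        # Sampled Bellman residual for coordinate (i, a), using the current v.
        residual = r[i, a] + gamma * v[j] - v[i]

        # Unbiased estimate of the v-gradient:
        # (1 - gamma) e_{i0} + n_pairs * mu[i, a] * (gamma e_j - e_i)
        grad_v = np.zeros(n_states)
        grad_v[i0] += 1.0 - gamma
        grad_v[j] += n_pairs * mu[i, a] * gamma
        grad_v[i] -= n_pairs * mu[i, a]

        # Projected descent step on v (projection onto the box is a clip).
        v = np.clip(v - alpha * grad_v, 0.0, v_max)

        # Exponentiated (multiplicative-weights) ascent step on mu, then renormalize.
        mu[i, a] *= np.exp(beta * n_pairs * residual)
        mu /= mu.sum()

    policy = mu.argmax(axis=1)  # greedy read-out: most-weighted action in each state
    return v, mu, policy


# Two-state, two-action toy MDP (made-up numbers for illustration only).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.1, 0.9], [0.7, 0.3]],  # action 1
])
r = np.array([[1.0, 0.0], [0.0, 0.5]])
v, mu, policy = stochastic_primal_dual_mdp(P, r, gamma=0.9)
```

Sampling (i, a) uniformly and rescaling by the number of state–action pairs keeps both gradient estimates unbiased, at the cost of higher variance; the adaptive sampling described in the abstract would replace that uniform choice.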

Mathematics of Operations Research

Volume 45, Issue 2

May 2020

Issue pages 403-795, C2

Article Information


Received: May 28, 2018
Accepted: December 23, 2018
Published Online: October 16, 2019

Cite as

Mengdi Wang (2019) Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time. Mathematics of Operations Research 45(2):517-546.

https://doi.org/10.1287/moor.2019.1000
