The Simplex Method is Strongly Polynomial for Deterministic Markov Decision Processes

Ian Post
University of Waterloo, Waterloo, Ontario N2L 3G1 Canada

Yinyu Ye
Stanford University, Stanford, California 94305

Published Online: 10 Feb 2015
https://doi.org/10.1287/moor.2014.0699

Abstract

We prove that the simplex method with the highest gain/most-negative-reduced cost pivoting rule converges in strongly polynomial time for deterministic Markov decision processes (MDPs) regardless of the discount factor. For a deterministic MDP with n states and m actions, we prove the simplex method runs in O(n³m² log²n) iterations if the discount factor is uniform and O(n⁵m³ log²n) iterations if each action has a distinct discount factor. Previously the simplex method was known to run in polynomial time only for discounted MDPs where the discount was bounded away from 1.
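
To make the pivoting rule in the abstract concrete, here is a minimal sketch of a simplex-style single-switch loop on a tiny deterministic MDP with a uniform discount factor. It is an illustration under stated assumptions, not the authors' implementation. It assumes a reward-maximizing convention in which the gain of an action from state s to state t with reward r is r + γ·v[t] − v[s], so that the highest-gain action plays the role of the most negative reduced cost in a cost-minimizing LP. The function names, the action encoding `(source, target, reward)`, and the toy instance are all hypothetical.

```python
import numpy as np

def policy_values(n, actions, policy, gamma):
    """Solve v = r_pi + gamma * P_pi v for the current policy."""
    P = np.zeros((n, n))
    r = np.zeros(n)
    for s in range(n):
        _, target, reward = actions[policy[s]]
        P[s, target] = 1.0  # deterministic: each action has exactly one successor
        r[s] = reward
    return np.linalg.solve(np.eye(n) - gamma * P, r)

def simplex_highest_gain(n, actions, gamma, tol=1e-9, max_pivots=10_000):
    """Single-switch policy improvement with the highest-gain rule.

    actions: list of (source, target, reward) tuples (hypothetical encoding).
    Returns (policy, values, number_of_pivots).
    """
    # Initial policy: the first listed action out of each state.
    policy = [None] * n
    for idx, (s, _, _) in enumerate(actions):
        if policy[s] is None:
            policy[s] = idx
    if any(p is None for p in policy):
        raise ValueError("every state needs at least one action")

    for pivots in range(max_pivots):
        v = policy_values(n, actions, policy, gamma)
        # Gain of each action relative to the current values
        # (assumed reward-maximizing convention).
        gains = [r + gamma * v[t] - v[s] for (s, t, r) in actions]
        best = int(np.argmax(gains))
        if gains[best] <= tol:
            return policy, v, pivots  # no improving action: policy is optimal
        # Pivot: switch only the single action with the highest gain.
        policy[actions[best][0]] = best
    raise RuntimeError("pivot limit reached")

# Toy instance: 3 states, 5 actions, uniform discount 0.5.
actions = [
    (0, 0, 0.0),  # stay in state 0
    (0, 1, 1.0),  # move 0 -> 1
    (1, 1, 0.0),  # stay in state 1
    (1, 2, 0.0),  # move 1 -> 2
    (2, 2, 2.0),  # stay in state 2, collecting reward 2 each step
]
policy, values, pivots = simplex_highest_gain(3, actions, gamma=0.5)
print(policy, values, pivots)
```

On this toy instance the loop stops after two pivots, choosing the moves 0 -> 1 and 1 -> 2 and the self-loop at state 2. The iteration bounds in the abstract concern how many such pivots can occur in the worst case; the sketch only shows what a single pivot does.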

Mathematics of Operations Research

Volume 40, Issue 4

November 2015

Pages 797-1088

Article Information

Received: May 21, 2013
Published Online: February 10, 2015

Cite as

Ian Post, Yinyu Ye (2015) The Simplex Method is Strongly Polynomial for Deterministic Markov Decision Processes. Mathematics of Operations Research 40(4):859-868.

https://doi.org/10.1287/moor.2014.0699
