The Simplex Method is Strongly Polynomial for Deterministic Markov Decision Processes

Ian Post
Ian Post
[email protected]
University of Waterloo, Waterloo, Ontario N2L 3G1 Canada
Search for more papers by this author
,
Yinyu Ye
Yinyu Ye
[email protected]
Stanford University, Stanford, California 94305
Search for more papers by this author

Ian Post

[email protected]

University of Waterloo, Waterloo, Ontario N2L 3G1 Canada

Search for more papers by this author

Yinyu Ye

[email protected]

Stanford University, Stanford, California 94305

Search for more papers by this author

Published Online:10 Feb 2015https://doi.org/10.1287/moor.2014.0699

References

Bellman RE (1957) Dynamic Programming (Princeton University Press, Princeton, NJ).Google Scholar
Bertsekas DP (1996) Dynamic Programming and Optimal Control (Athena Scientific, Nashua, NH).Google Scholar
de Ghellinck GT (1960) Les problèmes de décisions sequentielles. Cahiers du Centre d’Etudes de Recherche Opérationelle 2:161–179.Google Scholar
D’Epenoux F (1960) Sur un problème de production et de stockage dans l’aléatoire. Revue Française de Recherche Opérationelle 14:3–16.Google Scholar
Fearnley J (2010) Exponential lower bounds for policy iteration. Automata, Languages and Programming, Lecture Notes in Computer Science, Vol. 6199 (Springer, Berlin), 551–562.Crossref, Google Scholar
Friedmann O (2009) An exponential lower bound for the parity game strategy improvement algorithm as we know it. Pitts A, ed. Proc. 24th ACM/IEEE Sympos. Logic in Computer Science, LICS ’09 (IEEE, Piscataway, NJ), 145–156.Crossref, Google Scholar
Friedmann O (2011) A subexponential lower bound for Zadeh’s pivoting rule for solving linear programs and games. Integer Programming and Combinatoral Optimization, Lecture Notes in Computer Science, Vol. 6655 (Springer, Berlin), 192–206.Crossref, Google Scholar
Friedmann O, Hansen TD, Zwick U (2011) Subexponential lower bounds for randomized pivoting rules for the simplex algorithm. Fortnow L, Vadhan SP, eds. Proc. 43rd ACM Sympos. Theory of Comput., STOC ’11 (ACM, New York), 283–292.Crossref, Google Scholar
Hansen TD, Kaplan H, Zwick U (2014) Dantzig’s pivoting rule for shortest paths, deterministic MDPs, and minimum cost to time ratio cycles. Chekuri C, ed. Proc. 25th ACM-SIAM Sympos. Discrete Algorithms, SODA ’14 (SIAM, Philadelphia), 847–860.Crossref, Google Scholar
Hansen TD, Miltersen PB, Zwick U (2013) Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. J. ACM 60(1):1:1–1:16.Crossref, Google Scholar
Hansen TD, Zwick U (2010) Lower bounds for Howard’s algorithm for finding minimum mean-cost cycles. Cheong O, Chwa K-Y, Park K, eds. Algorithms and Computation, Lecture Notes in Computer Science, Vol. 6506 (Springer, Berlin), 415–426.Crossref, Google Scholar
Hochbaum DS, Naor J(S) (1994) Simple and fast algorithms for linear and integer programs with two variables per inequality. SIAM J. Comput. 23(6):1179–1192.Crossref, Google Scholar
Howard R (1960) Dynamic Programming and Markov Decision Processes (MIT, Cambridge, MA).Google Scholar
Littman ML, Dean TL, Kaelbling LP (1995) On the complexity of solving Markov decision problems. Besnard P, Hanks S, eds. Proc. 11th Uncertainty in Artificial Intelligence, UAI ’95 (Morgan Kaufmann, San Francisco), 394–402.Google Scholar
Madani O (2002) On policy iteration as a Newton’s method and polynomial policy iteration algorithms. Dechter R, Sutton RS, eds. Proc. 18th Nat. Conf. Artificial Intelligence, AAAI ’02 (AAAI Press, Palo Alto, CA), 273–278.Google Scholar
Madani O, Thorup M, Zwick U (2010) Discounted deterministic Markov decision processes and discounted all-pairs shortest paths. ACM Trans. Algorithms 6(2):33:1–33:25.Crossref, Google Scholar
Manne AS (1960) Linear programming and sequential decisions. Management Sci. 6(3):259–267.Link, Google Scholar
Mansour Y, Singh S (1999) On the complexity of policy iteration. Laskey KB, Prade H, eds. Proc. 15th Uncertainty in Artificial Intelligence, UAI ’99 (Morgan Kaufmann, San Francisco), 401–408.Google Scholar
Melekopoglou M, Condon A (1994) On the complexity of the policy improvement algorithm for Markov decision processes. ORSA J. Comput. 6(2):188–192.Link, Google Scholar
Papadimitriou C, Tsitsiklis JN (1987) The complexity of Markov decision processes. Math. Oper. Res. 12(3):441–450.Link, Google Scholar
Post I, Ye Y (2013) The simplex method is strongly polynomial for deterministic Markov decision processes. Khanna S, ed. Proc. 24th ACM-SIAM Sympos. Discrete Algorithms, SODA ’13 (SIAM, Philadelphia), 1465–1473.Crossref, Google Scholar
Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, New York).Crossref, Google Scholar
Ye Y (2005) A new complexity result on solving the Markov decision problem. Math. Oper. Res. 30(3):733–749.Link, Google Scholar
Ye Y (2011) The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Math. Oper. Res. 36(4):593–603.Link, Google Scholar

cover image Mathematics of Operations Research

Volume 40, Issue 4

November 2015

Pages 797-1088

Article Information

Metrics

Information

Received:May 21, 2013
Published Online:February 10, 2015

Cite as

Ian Post, Yinyu Ye (2015) The Simplex Method is Strongly Polynomial for Deterministic Markov Decision Processes. Mathematics of Operations Research 40(4):859-868.

https://doi.org/10.1287/moor.2014.0699

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

The Simplex Method is Strongly Polynomial for Deterministic Markov Decision Processes

References

Volume 40, Issue 4

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News