Upper Bounds for All and Max-Gain Policy Iteration Algorithms on Deterministic MDPs

Ritesh Goenka
Corresponding Author
Ritesh Goenka
[email protected]
https://orcid.org/0000-0002-5004-8112
Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
Search for more papers by this author
,
Eashan Gupta
Eashan Gupta
[email protected]
https://orcid.org/0000-0002-9266-8731
Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, Illinois 61820
Search for more papers by this author
,
Sushil Khyalia
Sushil Khyalia
[email protected]
https://orcid.org/0009-0007-5688-3315
Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Search for more papers by this author
,
Shivaram Kalyanakrishnan
Shivaram Kalyanakrishnan
[email protected]
https://orcid.org/0009-0006-7707-6056
Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai 400076, India
Search for more papers by this author

Ritesh Goenka

Corresponding Author

Ritesh Goenka

[email protected]

https://orcid.org/0000-0002-5004-8112

Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom

Search for more papers by this author

Eashan Gupta

[email protected]

https://orcid.org/0000-0002-9266-8731

Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, Illinois 61820

Search for more papers by this author

Sushil Khyalia

[email protected]

https://orcid.org/0009-0007-5688-3315

Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

Search for more papers by this author

Shivaram Kalyanakrishnan

[email protected]

https://orcid.org/0009-0006-7707-6056

Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai 400076, India

Search for more papers by this author

Published Online:10 Apr 2025https://doi.org/10.1287/moor.2023.0317

References

[1] Ahrens W (1897) Ueber das Gleichungssystem einer Kirchhoff’schen galvanischen Stromverzweigung. Mathematische Annalen 49(2):311–324.Crossref, Google Scholar
[2] AlBdaiwi BF (2018) On the number of cycles in a graph. Math. Slovaca 68(1):1–10.Crossref, Google Scholar
[3] Aldred REL, Thomassen C (1997) On the number of cycles in 3-connected cubic graphs. J. Combin. Theory Ser. B 71(1):79–84.Crossref, Google Scholar
[4] Aldred REL, Thomassen C (2008) On the maximum number of cycles in a planar graph. J. Graph Theory 57(3):255–264.Crossref, Google Scholar
[5] Allender EW (1985) On the number of cycles possible in digraphs with large girth. Discrete Appl. Math. 10(3):211–225.Crossref, Google Scholar
[6] Alt H, Fuchs U, Kriegel K (1999) On the number of simple cycles in planar graphs. Combinatorics Probab. Comput. 8(5):397–405.Crossref, Google Scholar
[7] Arman A, Tsaturian S (2019) The maximum number of cycles in a graph with fixed number of edges. Electronic J. Combinatorics 26(4):P4.42.Crossref, Google Scholar
[8] Arman A, Gunderson DS, Tsaturian S (2016) Triangle-free graphs with the maximum number of cycles. Discrete Math. 339(2):699–711.Crossref, Google Scholar
[9] Ashutosh K, Consul S, Dedhia B, Khirwadkar P, Shah S, Kalyanakrishnan S (2020) Lower bounds for policy iteration on multi-action MDPs. 2020 59th IEEE Conf. Decision Control (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 1744–1749.Google Scholar
[10] Avis D, Moriyama S (2009) On combinatorial properties of linear program digraphs. Avis D, Bremner D, Deza A, eds. Polyhedral Computation, CRM Proceedings and Lecture Notes, vol. 48 (American Mathematical Society, Providence, RI), 1–13.Crossref, Google Scholar
[11] Bellman R (1957) Dynamic Programming, 1st ed. (Princeton University Press, Princeton, NJ).Google Scholar
[12] Brègman LM (1973) Certain properties of nonnegative matrices and their permanents. Doklady Akademii Nauk SSSR 211(1):27–30.Google Scholar
[13] Buchin K, Knauer C, Kriegel K, Schulz A, Seidel R (2007) On the number of cycles in planar graphs. Lin G, ed. Computing and Combinatorics (Springer, Berlin), 97–107.Crossref, Google Scholar
[14] Dantzig GB (1963) Linear Programming and Extensions (Princeton University Press, Princeton, NJ).Crossref, Google Scholar
[15] de Mier A, Noy M (2012) On the maximum number of cycles in outerplanar and series-parallel graphs. Graphs Combinatorics 28(2):265–275.Crossref, Google Scholar
[16] Delivorias PN, Richter RJ (1994) Maximum path digraphs. Discrete Appl. Math. 50(3):221–237.Crossref, Google Scholar
[17] Derman C (1970) Finite State Markovian Decision Processes, Mathematics in Science and Engineering, vol. 67 (Academic Press, New York).Google Scholar
[18] Dorfman D, Kaplan H, Zwick U (2019) A faster deterministic exponential time algorithm for energy games and mean payoff games. 46th Internat. Colloquium Automata Languages Programming, Leibniz International Proceedings in Informatics, vol. 132 (Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Wadern, Germany), 114.Google Scholar
[19] Durocher S, Gunderson DS, Li PC, Skala M (2015) Cycle-maximal triangle-free graphs. Discrete Math. 338(2):274–290.Crossref, Google Scholar
[20] Dvořák Z, Morrison N, Noel JA, Norin S, Postle L (2021) Bounding the number of cycles in a graph in terms of its degree sequence. Eur. J. Combinatorics 91:103206.Crossref, Google Scholar
[21] Entringer RC, Slater PJ (1981) On the maximum number of cycles in a graph. Ars Combinatoria 11:289–294.Google Scholar
[22] Fearnley J (2010) Exponential lower bounds for policy iteration. Abramsky S, Gavoille C, Kirchner C, Meyer auf der Heide F, Spirakis PG, eds. Automata, Languages and Programming (Springer, Berlin), 551–562.Crossref, Google Scholar
[23] Feinberg EA, Huang J (2014) The value iteration algorithm is not strongly polynomial for discounted dynamic programming. Oper. Res. Lett. 42(2):130–131.Crossref, Google Scholar
[24] Gerbner D, Keszegh B, Palmer C, Patkós B (2018) On the number of cycles in a graph with restricted cycle lengths. SIAM J. Discrete Math. 32(1):266–279.Crossref, Google Scholar
[25] Golumbic MC, Perl Y (1979) Generalized Fibonacci maximum path graphs. Discrete Math. 28(3):237–245.Crossref, Google Scholar
[26] Guichard DR (1996) The maximum number of cycles in graphs. Congressus Numerantium 121:211–215.Google Scholar
[27] Gupta A, Kalyanakrishnan S (2017) Improved strong worst-case upper bounds for MDP planning. Sierra C, ed. Proc. 26th Internat. Joint Conf. Artificial Intelligence (International Joint Conferences on Artificial Intelligence), 1788–1794.Google Scholar
[28] Hansen TD (2012) Worst-case analysis of strategy iteration and the simplex method. Unpublished PhD thesis, Aarhus University, Aarhus, Denmark.Google Scholar
[29] Hansen TD, Zwick U (2010) Lower bounds for Howard’s algorithm for finding minimum mean-cost cycles. Cheong O, Chwa KY, Park K, eds. Algorithms and Computation (Springer, Berlin), 415–426.Crossref, Google Scholar
[30] Hansen TD, Kaplan H, Zwick U (2014) Dantzig’s pivoting rule for shortest paths, deterministic MDPs, and minimum cost to time ratio cycles. Chekuri C, ed. Proc. 2014 Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 847–860.Google Scholar
[31] Hansen TD, Paterson M, Zwick U (2014) Improved upper bounds for random-edge and random-jump on abstract cubes. Chekuri C, ed. Proc. 2014 Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 874–881.Google Scholar
[32] Hollanders R, Delvenne JC, Jungers RM (2012) The complexity of policy iteration is exponential for discounted Markov decision processes. 2012 IEEE 51st IEEE Conf. Decision Control (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 5997–6002.Google Scholar
[33] Hollanders R, Gerencsér B, Delvenne JC, Jungers RM (2016) Improved bound on the worst case complexity of policy iteration. Oper. Res. Lett. 44(2):267–272.Crossref, Google Scholar
[34] Howard RA (1960) Dynamic Programming and Markov Processes (MIT Press, Cambridge, MA).Google Scholar
[35] Kalyanakrishnan S, Mall U, Goyal R (2016) Batch-switching policy iteration. Kambhampati S, ed. Proc. 25th Internat. Joint Conf. Artificial Intelligence (AAAI Press / International Joint Conferences on Artificial Intelligence, Palo Alto, CA), 3147–3153.Google Scholar
[36] Kalyanakrishnan S, Misra N, Gopalan A (2016) Randomised procedures for initialising and switching actions in policy iteration. Proc. 30th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 3145–3151.Google Scholar
[37] Kitahara T, Mizuno S (2013) A bound for the number of different basic solutions generated by the simplex method. Math. Programming 137(1–2):579–586.Crossref, Google Scholar
[38] Knor M (1994) On the number of cycles in k-connected graphs. Acta Math. Universitatis Comenianae 63(2):315–321.Google Scholar
[39] Littman ML, Dean TL, Kaelbling LP (1995) On the complexity of solving Markov decision problems. Besnard P, Hanks S, eds. Proc. 11th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers Inc., San Francisco), 394–402.Google Scholar
[40] Madani O, Thorup M, Zwick U (2010) Discounted deterministic Markov decision processes and discounted all-pairs shortest paths. ACM Trans. Algorithms 6(2):1–25.Crossref, Google Scholar
[41] Mansour Y, Singh S (1999) On the complexity of policy iteration. Laskey KB, Prade H, eds. Proc. 15th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers Inc., San Francisco), 401–408.Google Scholar
[42] Megiddo N (1983) Towards a genuinely polynomial algorithm for linear programming. SIAM J. Comput. 12(2):347–353.Crossref, Google Scholar
[43] Meier J (2008) Groups, Graphs and Trees: An Introduction to the Geometry of Infinite Groups (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[44] Melekopoglou M, Condon A (1994) On the complexity of the policy improvement algorithm for Markov decision processes. ORSA J. Comput. 6(2):188–192.Link, Google Scholar
[45] Morrison N, Roberts A, Scott A (2021) Maximising the number of cycles in graphs with forbidden subgraphs. J. Combin. Theory Ser. B 147:201–237.Crossref, Google Scholar
[46] Papadimitriou CH, Tsitsiklis JN (1987) The complexity of Markov decision processes. Math. Oper. Res. 12(3):441–450.Link, Google Scholar
[47] Perl Y (1987) Digraphs with maximum number of paths and cycles. Networks 17(3):295–305.Crossref, Google Scholar
[48] Post I, Ye Y (2015) The simplex method is strongly polynomial for deterministic Markov decision processes. Math. Oper. Res. 40(4):859–868.Link, Google Scholar
[49] Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics (John Wiley & Sons, Inc., New York).Google Scholar
[50] Rautenbach D, Stella I (2005) On the maximum number of cycles in a Hamiltonian graph. Discrete Math. 304(1):101–107.Crossref, Google Scholar
[51] Reid KB (1976) Cycles in the complement of a tree. Discrete Math. 15(2):163–174.Crossref, Google Scholar
[52] Scherrer B (2016) Improved and generalized upper bounds on the complexity of policy iteration. Math. Oper. Res. 41(3):758–774.Link, Google Scholar
[53] Shi Y (1994) The number of cycles in a Hamilton graph. Discrete Math. 133(1):249–257.Crossref, Google Scholar
[54] Soules GW (2005) Permanental bounds for nonnegative matrices via decomposition. Linear Algebra Its Appl. 394:73–89.Crossref, Google Scholar
[55] Szepesvári C (2010) Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning (Morgan & Claypool Publishers, San Rafael, CA).Crossref, Google Scholar
[56] Takács L (1988) On the limit distribution of the number of cycles in a random graph. J. Appl. Probab. 25(A):359–376.Crossref, Google Scholar
[57] Taraviya M, Kalyanakrishnan S (2020) A tighter analysis of randomised policy iteration. Adams RP, Gogate V, eds. Proc. 35th Uncertainty Artificial Intelligence Conf., Proceedings of Machine Learning Research, vol. 115 (PMLR, New York), 519–529.Google Scholar
[58] Volkmann L (1996) Estimations for the number of cycles in a graph. Periodica Math. Hungarica 33(2):153–161.Crossref, Google Scholar
[59] Ye Y (2011) The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Math. Oper. Res. 36(4):593–603.Link, Google Scholar
[60] Zhou B (1988) The maximum number of cycles in the complement of a tree. Discrete Math. 69(1):85–94.Crossref, Google Scholar

cover image Mathematics of Operations Research

Volume 51, Issue 1

February 2026

Pages iv-viii, 1-851

Article Information

Metrics

Information

Received:October 16, 2023
Accepted:February 20, 2025
Published Online:April 10, 2025

Cite as

Ritesh Goenka, Eashan Gupta, Sushil Khyalia, Shivaram Kalyanakrishnan (2025) Upper Bounds for All and Max-Gain Policy Iteration Algorithms on Deterministic MDPs. Mathematics of Operations Research 51(1):806-828.

https://doi.org/10.1287/moor.2023.0317

Keywords

Acknowledgments

The authors thank Pratyush Agarwal and Mulinti Shaik Wajid for carefully reading this manuscript and providing helpful suggestions. The authors also thank the anonymous reviewers and editors for their valuable comments, which helped improve the quality of this paper. Eashan Gupta and Sushil Khyalia contributed equally.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Upper Bounds for All and Max-Gain Policy Iteration Algorithms on Deterministic MDPs

References

Volume 51, Issue 1

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News