Upper Bounds for All and Max-Gain Policy Iteration Algorithms on Deterministic MDPs

Published Online:https://doi.org/10.1287/moor.2023.0317

References

  • [1] Ahrens W (1897) Ueber das Gleichungssystem einer Kirchhoff’schen galvanischen Stromverzweigung. Mathematische Annalen 49(2):311–324.CrossrefGoogle Scholar
  • [2] AlBdaiwi BF (2018) On the number of cycles in a graph. Math. Slovaca 68(1):1–10.CrossrefGoogle Scholar
  • [3] Aldred REL, Thomassen C (1997) On the number of cycles in 3-connected cubic graphs. J. Combin. Theory Ser. B 71(1):79–84.CrossrefGoogle Scholar
  • [4] Aldred REL, Thomassen C (2008) On the maximum number of cycles in a planar graph. J. Graph Theory 57(3):255–264.CrossrefGoogle Scholar
  • [5] Allender EW (1985) On the number of cycles possible in digraphs with large girth. Discrete Appl. Math. 10(3):211–225.CrossrefGoogle Scholar
  • [6] Alt H, Fuchs U, Kriegel K (1999) On the number of simple cycles in planar graphs. Combinatorics Probab. Comput. 8(5):397–405.CrossrefGoogle Scholar
  • [7] Arman A, Tsaturian S (2019) The maximum number of cycles in a graph with fixed number of edges. Electronic J. Combinatorics 26(4):P4.42.CrossrefGoogle Scholar
  • [8] Arman A, Gunderson DS, Tsaturian S (2016) Triangle-free graphs with the maximum number of cycles. Discrete Math. 339(2):699–711.CrossrefGoogle Scholar
  • [9] Ashutosh K, Consul S, Dedhia B, Khirwadkar P, Shah S, Kalyanakrishnan S (2020) Lower bounds for policy iteration on multi-action MDPs. 2020 59th IEEE Conf. Decision Control (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 1744–1749.Google Scholar
  • [10] Avis D, Moriyama S (2009) On combinatorial properties of linear program digraphs. Avis D, Bremner D, Deza A, eds. Polyhedral Computation, CRM Proceedings and Lecture Notes, vol. 48 (American Mathematical Society, Providence, RI), 1–13.CrossrefGoogle Scholar
  • [11] Bellman R (1957) Dynamic Programming, 1st ed. (Princeton University Press, Princeton, NJ).Google Scholar
  • [12] Brègman LM (1973) Certain properties of nonnegative matrices and their permanents. Doklady Akademii Nauk SSSR 211(1):27–30.Google Scholar
  • [13] Buchin K, Knauer C, Kriegel K, Schulz A, Seidel R (2007) On the number of cycles in planar graphs. Lin G, ed. Computing and Combinatorics (Springer, Berlin), 97–107.CrossrefGoogle Scholar
  • [14] Dantzig GB (1963) Linear Programming and Extensions (Princeton University Press, Princeton, NJ).CrossrefGoogle Scholar
  • [15] de Mier A, Noy M (2012) On the maximum number of cycles in outerplanar and series-parallel graphs. Graphs Combinatorics 28(2):265–275.CrossrefGoogle Scholar
  • [16] Delivorias PN, Richter RJ (1994) Maximum path digraphs. Discrete Appl. Math. 50(3):221–237.CrossrefGoogle Scholar
  • [17] Derman C (1970) Finite State Markovian Decision Processes, Mathematics in Science and Engineering, vol. 67 (Academic Press, New York).Google Scholar
  • [18] Dorfman D, Kaplan H, Zwick U (2019) A faster deterministic exponential time algorithm for energy games and mean payoff games. 46th Internat. Colloquium Automata Languages Programming, Leibniz International Proceedings in Informatics, vol. 132 (Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Wadern, Germany), 114.Google Scholar
  • [19] Durocher S, Gunderson DS, Li PC, Skala M (2015) Cycle-maximal triangle-free graphs. Discrete Math. 338(2):274–290.CrossrefGoogle Scholar
  • [20] Dvořák Z, Morrison N, Noel JA, Norin S, Postle L (2021) Bounding the number of cycles in a graph in terms of its degree sequence. Eur. J. Combinatorics 91:103206.CrossrefGoogle Scholar
  • [21] Entringer RC, Slater PJ (1981) On the maximum number of cycles in a graph. Ars Combinatoria 11:289–294.Google Scholar
  • [22] Fearnley J (2010) Exponential lower bounds for policy iteration. Abramsky S, Gavoille C, Kirchner C, Meyer auf der Heide F, Spirakis PG, eds. Automata, Languages and Programming (Springer, Berlin), 551–562.CrossrefGoogle Scholar
  • [23] Feinberg EA, Huang J (2014) The value iteration algorithm is not strongly polynomial for discounted dynamic programming. Oper. Res. Lett. 42(2):130–131.CrossrefGoogle Scholar
  • [24] Gerbner D, Keszegh B, Palmer C, Patkós B (2018) On the number of cycles in a graph with restricted cycle lengths. SIAM J. Discrete Math. 32(1):266–279.CrossrefGoogle Scholar
  • [25] Golumbic MC, Perl Y (1979) Generalized Fibonacci maximum path graphs. Discrete Math. 28(3):237–245.CrossrefGoogle Scholar
  • [26] Guichard DR (1996) The maximum number of cycles in graphs. Congressus Numerantium 121:211–215.Google Scholar
  • [27] Gupta A, Kalyanakrishnan S (2017) Improved strong worst-case upper bounds for MDP planning. Sierra C, ed. Proc. 26th Internat. Joint Conf. Artificial Intelligence (International Joint Conferences on Artificial Intelligence), 1788–1794.Google Scholar
  • [28] Hansen TD (2012) Worst-case analysis of strategy iteration and the simplex method. Unpublished PhD thesis, Aarhus University, Aarhus, Denmark.Google Scholar
  • [29] Hansen TD, Zwick U (2010) Lower bounds for Howard’s algorithm for finding minimum mean-cost cycles. Cheong O, Chwa KY, Park K, eds. Algorithms and Computation (Springer, Berlin), 415–426.CrossrefGoogle Scholar
  • [30] Hansen TD, Kaplan H, Zwick U (2014) Dantzig’s pivoting rule for shortest paths, deterministic MDPs, and minimum cost to time ratio cycles. Chekuri C, ed. Proc. 2014 Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 847–860.Google Scholar
  • [31] Hansen TD, Paterson M, Zwick U (2014) Improved upper bounds for random-edge and random-jump on abstract cubes. Chekuri C, ed. Proc. 2014 Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 874–881.Google Scholar
  • [32] Hollanders R, Delvenne JC, Jungers RM (2012) The complexity of policy iteration is exponential for discounted Markov decision processes. 2012 IEEE 51st IEEE Conf. Decision Control (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 5997–6002.Google Scholar
  • [33] Hollanders R, Gerencsér B, Delvenne JC, Jungers RM (2016) Improved bound on the worst case complexity of policy iteration. Oper. Res. Lett. 44(2):267–272.CrossrefGoogle Scholar
  • [34] Howard RA (1960) Dynamic Programming and Markov Processes (MIT Press, Cambridge, MA).Google Scholar
  • [35] Kalyanakrishnan S, Mall U, Goyal R (2016) Batch-switching policy iteration. Kambhampati S, ed. Proc. 25th Internat. Joint Conf. Artificial Intelligence (AAAI Press / International Joint Conferences on Artificial Intelligence, Palo Alto, CA), 3147–3153.Google Scholar
  • [36] Kalyanakrishnan S, Misra N, Gopalan A (2016) Randomised procedures for initialising and switching actions in policy iteration. Proc. 30th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 3145–3151.Google Scholar
  • [37] Kitahara T, Mizuno S (2013) A bound for the number of different basic solutions generated by the simplex method. Math. Programming 137(1–2):579–586.CrossrefGoogle Scholar
  • [38] Knor M (1994) On the number of cycles in k-connected graphs. Acta Math. Universitatis Comenianae 63(2):315–321.Google Scholar
  • [39] Littman ML, Dean TL, Kaelbling LP (1995) On the complexity of solving Markov decision problems. Besnard P, Hanks S, eds. Proc. 11th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers Inc., San Francisco), 394–402.Google Scholar
  • [40] Madani O, Thorup M, Zwick U (2010) Discounted deterministic Markov decision processes and discounted all-pairs shortest paths. ACM Trans. Algorithms 6(2):1–25.CrossrefGoogle Scholar
  • [41] Mansour Y, Singh S (1999) On the complexity of policy iteration. Laskey KB, Prade H, eds. Proc. 15th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers Inc., San Francisco), 401–408.Google Scholar
  • [42] Megiddo N (1983) Towards a genuinely polynomial algorithm for linear programming. SIAM J. Comput. 12(2):347–353.CrossrefGoogle Scholar
  • [43] Meier J (2008) Groups, Graphs and Trees: An Introduction to the Geometry of Infinite Groups (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [44] Melekopoglou M, Condon A (1994) On the complexity of the policy improvement algorithm for Markov decision processes. ORSA J. Comput. 6(2):188–192.LinkGoogle Scholar
  • [45] Morrison N, Roberts A, Scott A (2021) Maximising the number of cycles in graphs with forbidden subgraphs. J. Combin. Theory Ser. B 147:201–237.CrossrefGoogle Scholar
  • [46] Papadimitriou CH, Tsitsiklis JN (1987) The complexity of Markov decision processes. Math. Oper. Res. 12(3):441–450.LinkGoogle Scholar
  • [47] Perl Y (1987) Digraphs with maximum number of paths and cycles. Networks 17(3):295–305.CrossrefGoogle Scholar
  • [48] Post I, Ye Y (2015) The simplex method is strongly polynomial for deterministic Markov decision processes. Math. Oper. Res. 40(4):859–868.LinkGoogle Scholar
  • [49] Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics (John Wiley & Sons, Inc., New York).Google Scholar
  • [50] Rautenbach D, Stella I (2005) On the maximum number of cycles in a Hamiltonian graph. Discrete Math. 304(1):101–107.CrossrefGoogle Scholar
  • [51] Reid KB (1976) Cycles in the complement of a tree. Discrete Math. 15(2):163–174.CrossrefGoogle Scholar
  • [52] Scherrer B (2016) Improved and generalized upper bounds on the complexity of policy iteration. Math. Oper. Res. 41(3):758–774.LinkGoogle Scholar
  • [53] Shi Y (1994) The number of cycles in a Hamilton graph. Discrete Math. 133(1):249–257.CrossrefGoogle Scholar
  • [54] Soules GW (2005) Permanental bounds for nonnegative matrices via decomposition. Linear Algebra Its Appl. 394:73–89.CrossrefGoogle Scholar
  • [55] Szepesvári C (2010) Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning (Morgan & Claypool Publishers, San Rafael, CA).CrossrefGoogle Scholar
  • [56] Takács L (1988) On the limit distribution of the number of cycles in a random graph. J. Appl. Probab. 25(A):359–376.CrossrefGoogle Scholar
  • [57] Taraviya M, Kalyanakrishnan S (2020) A tighter analysis of randomised policy iteration. Adams RP, Gogate V, eds. Proc. 35th Uncertainty Artificial Intelligence Conf., Proceedings of Machine Learning Research, vol. 115 (PMLR, New York), 519–529.Google Scholar
  • [58] Volkmann L (1996) Estimations for the number of cycles in a graph. Periodica Math. Hungarica 33(2):153–161.CrossrefGoogle Scholar
  • [59] Ye Y (2011) The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Math. Oper. Res. 36(4):593–603.LinkGoogle Scholar
  • [60] Zhou B (1988) The maximum number of cycles in the complement of a tree. Discrete Math. 69(1):85–94.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.