Upper Bounds for All and Max-Gain Policy Iteration Algorithms on Deterministic MDPs
References
- [1] (1897) Ueber das Gleichungssystem einer Kirchhoff’schen galvanischen Stromverzweigung. Mathematische Annalen 49(2):311–324.Crossref, Google Scholar
- [2] (2018) On the number of cycles in a graph. Math. Slovaca 68(1):1–10.Crossref, Google Scholar
- [3] (1997) On the number of cycles in 3-connected cubic graphs. J. Combin. Theory Ser. B 71(1):79–84.Crossref, Google Scholar
- [4] (2008) On the maximum number of cycles in a planar graph. J. Graph Theory 57(3):255–264.Crossref, Google Scholar
- [5] (1985) On the number of cycles possible in digraphs with large girth. Discrete Appl. Math. 10(3):211–225.Crossref, Google Scholar
- [6] (1999) On the number of simple cycles in planar graphs. Combinatorics Probab. Comput. 8(5):397–405.Crossref, Google Scholar
- [7] (2019) The maximum number of cycles in a graph with fixed number of edges. Electronic J. Combinatorics 26(4):P4.42.Crossref, Google Scholar
- [8] (2016) Triangle-free graphs with the maximum number of cycles. Discrete Math. 339(2):699–711.Crossref, Google Scholar
- [9] (2020) Lower bounds for policy iteration on multi-action MDPs. 2020 59th IEEE Conf. Decision Control (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 1744–1749.Google Scholar
- [10] (2009) On combinatorial properties of linear program digraphs. Avis D, Bremner D, Deza A, eds. Polyhedral Computation, CRM Proceedings and Lecture Notes, vol. 48 (American Mathematical Society, Providence, RI), 1–13.Crossref, Google Scholar
- [11] (1957) Dynamic Programming, 1st ed. (Princeton University Press, Princeton, NJ).Google Scholar
- [12] (1973) Certain properties of nonnegative matrices and their permanents. Doklady Akademii Nauk SSSR 211(1):27–30.Google Scholar
- [13] (2007) On the number of cycles in planar graphs. Lin G, ed. Computing and Combinatorics (Springer, Berlin), 97–107.Crossref, Google Scholar
- [14] (1963) Linear Programming and Extensions (Princeton University Press, Princeton, NJ).Crossref, Google Scholar
- [15] (2012) On the maximum number of cycles in outerplanar and series-parallel graphs. Graphs Combinatorics 28(2):265–275.Crossref, Google Scholar
- [16] (1994) Maximum path digraphs. Discrete Appl. Math. 50(3):221–237.Crossref, Google Scholar
- [17] (1970) Finite State Markovian Decision Processes, Mathematics in Science and Engineering, vol. 67 (Academic Press, New York).Google Scholar
- [18] (2019) A faster deterministic exponential time algorithm for energy games and mean payoff games. 46th Internat. Colloquium Automata Languages Programming, Leibniz International Proceedings in Informatics, vol. 132 (Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Wadern, Germany), 114.Google Scholar
- [19] (2015) Cycle-maximal triangle-free graphs. Discrete Math. 338(2):274–290.Crossref, Google Scholar
- [20] (2021) Bounding the number of cycles in a graph in terms of its degree sequence. Eur. J. Combinatorics 91:103206.Crossref, Google Scholar
- [21] (1981) On the maximum number of cycles in a graph. Ars Combinatoria 11:289–294.Google Scholar
- [22] (2010) Exponential lower bounds for policy iteration. Abramsky S, Gavoille C, Kirchner C, Meyer auf der Heide F, Spirakis PG, eds. Automata, Languages and Programming (Springer, Berlin), 551–562.Crossref, Google Scholar
- [23] (2014) The value iteration algorithm is not strongly polynomial for discounted dynamic programming. Oper. Res. Lett. 42(2):130–131.Crossref, Google Scholar
- [24] (2018) On the number of cycles in a graph with restricted cycle lengths. SIAM J. Discrete Math. 32(1):266–279.Crossref, Google Scholar
- [25] (1979) Generalized Fibonacci maximum path graphs. Discrete Math. 28(3):237–245.Crossref, Google Scholar
- [26] (1996) The maximum number of cycles in graphs. Congressus Numerantium 121:211–215.Google Scholar
- [27] (2017) Improved strong worst-case upper bounds for MDP planning. Sierra C, ed. Proc. 26th Internat. Joint Conf. Artificial Intelligence (International Joint Conferences on Artificial Intelligence), 1788–1794.Google Scholar
- [28] (2012) Worst-case analysis of strategy iteration and the simplex method. Unpublished PhD thesis, Aarhus University, Aarhus, Denmark.Google Scholar
- [29] (2010) Lower bounds for Howard’s algorithm for finding minimum mean-cost cycles. Cheong O, Chwa KY, Park K, eds. Algorithms and Computation (Springer, Berlin), 415–426.Crossref, Google Scholar
- [30] (2014) Dantzig’s pivoting rule for shortest paths, deterministic MDPs, and minimum cost to time ratio cycles. Chekuri C, ed. Proc. 2014 Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 847–860.Google Scholar
- [31] (2014) Improved upper bounds for random-edge and random-jump on abstract cubes. Chekuri C, ed. Proc. 2014 Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 874–881.Google Scholar
- [32] (2012) The complexity of policy iteration is exponential for discounted Markov decision processes. 2012 IEEE 51st IEEE Conf. Decision Control (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 5997–6002.Google Scholar
- [33] (2016) Improved bound on the worst case complexity of policy iteration. Oper. Res. Lett. 44(2):267–272.Crossref, Google Scholar
- [34] (1960) Dynamic Programming and Markov Processes (MIT Press, Cambridge, MA).Google Scholar
- [35] (2016) Batch-switching policy iteration. Kambhampati S, ed. Proc. 25th Internat. Joint Conf. Artificial Intelligence (AAAI Press / International Joint Conferences on Artificial Intelligence, Palo Alto, CA), 3147–3153.Google Scholar
- [36] (2016) Randomised procedures for initialising and switching actions in policy iteration. Proc. 30th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 3145–3151.Google Scholar
- [37] (2013) A bound for the number of different basic solutions generated by the simplex method. Math. Programming 137(1–2):579–586.Crossref, Google Scholar
- [38] (1994) On the number of cycles in k-connected graphs. Acta Math. Universitatis Comenianae 63(2):315–321.Google Scholar
- [39] (1995) On the complexity of solving Markov decision problems. Besnard P, Hanks S, eds. Proc. 11th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers Inc., San Francisco), 394–402.Google Scholar
- [40] (2010) Discounted deterministic Markov decision processes and discounted all-pairs shortest paths. ACM Trans. Algorithms 6(2):1–25.Crossref, Google Scholar
- [41] (1999) On the complexity of policy iteration. Laskey KB, Prade H, eds. Proc. 15th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers Inc., San Francisco), 401–408.Google Scholar
- [42] (1983) Towards a genuinely polynomial algorithm for linear programming. SIAM J. Comput. 12(2):347–353.Crossref, Google Scholar
- [43] (2008) Groups, Graphs and Trees: An Introduction to the Geometry of Infinite Groups (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [44] (1994) On the complexity of the policy improvement algorithm for Markov decision processes. ORSA J. Comput. 6(2):188–192.Link, Google Scholar
- [45] (2021) Maximising the number of cycles in graphs with forbidden subgraphs. J. Combin. Theory Ser. B 147:201–237.Crossref, Google Scholar
- [46] (1987) The complexity of Markov decision processes. Math. Oper. Res. 12(3):441–450.Link, Google Scholar
- [47] (1987) Digraphs with maximum number of paths and cycles. Networks 17(3):295–305.Crossref, Google Scholar
- [48] (2015) The simplex method is strongly polynomial for deterministic Markov decision processes. Math. Oper. Res. 40(4):859–868.Link, Google Scholar
- [49] (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics (John Wiley & Sons, Inc., New York).Google Scholar
- [50] (2005) On the maximum number of cycles in a Hamiltonian graph. Discrete Math. 304(1):101–107.Crossref, Google Scholar
- [51] (1976) Cycles in the complement of a tree. Discrete Math. 15(2):163–174.Crossref, Google Scholar
- [52] (2016) Improved and generalized upper bounds on the complexity of policy iteration. Math. Oper. Res. 41(3):758–774.Link, Google Scholar
- [53] (1994) The number of cycles in a Hamilton graph. Discrete Math. 133(1):249–257.Crossref, Google Scholar
- [54] (2005) Permanental bounds for nonnegative matrices via decomposition. Linear Algebra Its Appl. 394:73–89.Crossref, Google Scholar
- [55] (2010) Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning (Morgan & Claypool Publishers, San Rafael, CA).Crossref, Google Scholar
- [56] (1988) On the limit distribution of the number of cycles in a random graph. J. Appl. Probab. 25(A):359–376.Crossref, Google Scholar
- [57] (2020) A tighter analysis of randomised policy iteration. Adams RP, Gogate V, eds. Proc. 35th Uncertainty Artificial Intelligence Conf., Proceedings of Machine Learning Research, vol. 115 (PMLR, New York), 519–529.Google Scholar
- [58] (1996) Estimations for the number of cycles in a graph. Periodica Math. Hungarica 33(2):153–161.Crossref, Google Scholar
- [59] (2011) The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Math. Oper. Res. 36(4):593–603.Link, Google Scholar
- [60] (1988) The maximum number of cycles in the complement of a tree. Discrete Math. 69(1):85–94.Crossref, Google Scholar

