Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time
Published Online:16 Oct 2019https://doi.org/10.1287/moor.2019.1000
References
- [1] (2013) Minimax pac bounds on the sample complexity of reinforcement learning with a generative model. Machine Learn. 91(3):325–349.Crossref, Google Scholar
- [2] (1957) Dynamic Programming (Princeton University Press, Princeton, NJ).Google Scholar
- [3] (2012) Adaptive Algorithms and Stochastic Approximations, vol. 22 (Springer Science & Business Media, Berlin).Google Scholar
- [4] (1995) Dynamic Programming and Optimal Control, vol. 1 (Athena Scientific, Belmont, MA).Google Scholar
- [5] (2013) Abstract Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
- [6] (1995) Neuro-dynamic programming: An overview. Proc. 34th IEEE Conf. Decision Control, vol. 1 (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 560–564.Google Scholar
- [7] (2008) Stochastic Approximation: A Dynamical Systems Viewpoint (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [8] (2012) Efficient sampling methods for discrete distributions. Czumaj A, Mehlhorn K, Pitts A, Wattenhofer R, eds. Automata, Languages, and Programming—ICALP 2012, Lecture Notes in Computer Science, vol. 7391 (Springer, Berlin, Heidelberg), 133–144.Crossref, Google Scholar
- [9] (2016) Stochastic primal–dual methods and sample complexity of reinforcement learning. Preprint, submitted December 8, https://arxiv.org/abs/1612.02516.Google Scholar
- [10] (2017) Lower bound on the computational complexity of discounted Markov decision problems. Preprint, submitted May 20, https://arxiv.org/abs/1705.07312.Google Scholar
- [11] (2012) Sublinear optimization for machine learning. J. ACM 59(5):23.Crossref, Google Scholar
- [12] (2012) Elements of Information Theory (John Wiley & Sons, Hoboken, NJ).Google Scholar
- [13] (2016) Linear Programming and Extensions (Princeton University Press, Princeton, NJ).Google Scholar
- [14] (2003) The linear programming approach to approximate dynamic programming. Oper. Res. 51(6):850–865.Link, Google Scholar
- [15] (1960) Les problemes de decisions sequentielles. Cahiers du Centre d’Études de Recherche Opérationnelle 2(2):161–179.Google Scholar
- [16] (1963) A probabilistic production and inventory problem. Management Sci. 10(1):98–108.Link, Google Scholar
- [17] (2014) The value iteration algorithm is not strongly polynomial for discounted dynamic programming. Oper. Res. Lett. 42(2):130–131.Crossref, Google Scholar
- [18] (1960) Dynamic Programming and Markov Processes (MIT Press, Cambridge, MA).Google Scholar
- [19] (2011) Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Systems 1(1):17–58.Link, Google Scholar
- [20] (2003) On the sample complexity of reinforcement learning. Unpublished PhD thesis, University of London, London.Google Scholar
- [21] (1999) Finite-sample convergence rates for Q-learning and indirect algorithms. Kearns MJ, Solla SA, Cohn DA, eds. Advances in Neural Information Processing Systems, vol. 11 (MIT Press, Cambridge, MA), 996–1002.Google Scholar
- [22] (1997) Exponentiated gradient vs. gradient descent for linear predictors. Inform. Comput. 132(1):1–63.Crossref, Google Scholar
- [23] (2003) Stochastic Approximation and Recursive Algorithms and Applications (Springer, Berlin).Google Scholar
- [24] (2014) Path finding methods for linear programming: Solving linear programs in Õ(vrank) iterations and faster algorithms for maximum flow. Proc. IEEE 55th Annual Sympos. Foundations Comput. Sci. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 424–433.Google Scholar
- [25] (2015) Efficient inverse maintenance and faster algorithms for linear programming. Proc. IEEE 56th Annual Sympos. Foundations Comput. Sci. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 230–249.Crossref, Google Scholar
- [26] (1988) Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learn. 2(4):285–318.Crossref, Google Scholar
- [27] (1995) On the complexity of solving Markov decision problems. Proc. 11th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers, Burlington, MA), 394–402.Google Scholar
- [28] (1999) On the complexity of policy iteration. Proc. 15th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers, Burlington, MA), 401–408.Google Scholar
- [29] (2005) An efficient stochastic approximation algorithm for stochastic saddle point problems. Modeling Uncertainty (Springer, Berlin), 156–184.Google Scholar
- [30] (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
- [31] (2014) Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, Hoboken, NJ).Google Scholar
- [32] (2013) Improved and generalized upper bounds on the complexity of policy iteration. Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 26 (Curran Associates, Red Hook, NY), 386–394.Google Scholar
- [33] (2018) Variance reduced value iteration and faster algorithms for solving Markov decision processes. Proc. 29th Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 770–787.Crossref, Google Scholar
- [34] (2018) Near-optimal time and sample complexities for solving discounted Markov decision process with a generative model. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Red Hook, NY), 5192–5202.Google Scholar
- [35] (1990) Solving h-horizon, stationary Markov decision problems in time proportional to log (h). Oper. Res. Lett. 9(5):287–297.Crossref, Google Scholar
- [36] (2016) An online primal–dual method for discounted Markov decision processes. Proc. IEEE Conf. Decisions Control (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 4516–4521.Crossref, Google Scholar
- [37] (1980) An efficient method for weighted sampling without replacement. SIAM J. Comput. 9(1):111–113.Crossref, Google Scholar
- [38] (2005) A new complexity result on solving the Markov decision problem. Math. Oper. Res. 30(3):733–749.Link, Google Scholar
- [39] (2011) The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Math. Oper. Res. 36(4):593–603.Link, Google Scholar

