Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time

Mengdi Wang
Corresponding Author
Mengdi Wang
http://orcid.org/0000-0002-2101-9507
Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540
Search for more papers by this author

Mengdi Wang

Corresponding Author

Mengdi Wang

http://orcid.org/0000-0002-2101-9507

Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540

Search for more papers by this author

Published Online:16 Oct 2019https://doi.org/10.1287/moor.2019.1000

References

[1] Azar MG, Munos R, Kappen HJ (2013) Minimax pac bounds on the sample complexity of reinforcement learning with a generative model. Machine Learn. 91(3):325–349.Crossref, Google Scholar
[2] Bellman R (1957) Dynamic Programming (Princeton University Press, Princeton, NJ).Google Scholar
[3] Benveniste A, Métivier M, Priouret P (2012) Adaptive Algorithms and Stochastic Approximations, vol. 22 (Springer Science & Business Media, Berlin).Google Scholar
[4] Bertsekas DP (1995) Dynamic Programming and Optimal Control, vol. 1 (Athena Scientific, Belmont, MA).Google Scholar
[5] Bertsekas DP (2013) Abstract Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
[6] Bertsekas DP, Tsitsiklis JN (1995) Neuro-dynamic programming: An overview. Proc. 34th IEEE Conf. Decision Control, vol. 1 (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 560–564.Google Scholar
[7] Borkar VS (2008) Stochastic Approximation: A Dynamical Systems Viewpoint (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[8] Bringmann K, Panagiotou K (2012) Efficient sampling methods for discrete distributions. Czumaj A, Mehlhorn K, Pitts A, Wattenhofer R, eds. Automata, Languages, and Programming—ICALP 2012, Lecture Notes in Computer Science, vol. 7391 (Springer, Berlin, Heidelberg), 133–144.Crossref, Google Scholar
[9] Chen Y, Wang M (2016) Stochastic primal–dual methods and sample complexity of reinforcement learning. Preprint, submitted December 8, https://arxiv.org/abs/1612.02516.Google Scholar
[10] Chen Y, Wang M (2017) Lower bound on the computational complexity of discounted Markov decision problems. Preprint, submitted May 20, https://arxiv.org/abs/1705.07312.Google Scholar
[11] Clarkson KL, Hazan E, Woodruff DP (2012) Sublinear optimization for machine learning. J. ACM 59(5):23.Crossref, Google Scholar
[12] Cover TM, Thomas JA (2012) Elements of Information Theory (John Wiley & Sons, Hoboken, NJ).Google Scholar
[13] Dantzig G (2016) Linear Programming and Extensions (Princeton University Press, Princeton, NJ).Google Scholar
[14] de Farias DP, Van Roy B (2003) The linear programming approach to approximate dynamic programming. Oper. Res. 51(6):850–865.Link, Google Scholar
[15] De Ghellinck G (1960) Les problemes de decisions sequentielles. Cahiers du Centre d’Études de Recherche Opérationnelle 2(2):161–179.Google Scholar
[16] d’Epenoux F (1963) A probabilistic production and inventory problem. Management Sci. 10(1):98–108.Link, Google Scholar
[17] Feinberg EA, Huang J (2014) The value iteration algorithm is not strongly polynomial for discounted dynamic programming. Oper. Res. Lett. 42(2):130–131.Crossref, Google Scholar
[18] Howard RA (1960) Dynamic Programming and Markov Processes (MIT Press, Cambridge, MA).Google Scholar
[19] Juditsky A, Nemirovski A, Tauvel C (2011) Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Systems 1(1):17–58.Link, Google Scholar
[20] Kakade SM (2003) On the sample complexity of reinforcement learning. Unpublished PhD thesis, University of London, London.Google Scholar
[21] Kearns MJ, Singh SP (1999) Finite-sample convergence rates for Q-learning and indirect algorithms. Kearns MJ, Solla SA, Cohn DA, eds. Advances in Neural Information Processing Systems, vol. 11 (MIT Press, Cambridge, MA), 996–1002.Google Scholar
[22] Kivinen J, Warmuth MK (1997) Exponentiated gradient vs. gradient descent for linear predictors. Inform. Comput. 132(1):1–63.Crossref, Google Scholar
[23] Kushner HJ, Yin G (2003) Stochastic Approximation and Recursive Algorithms and Applications (Springer, Berlin).Google Scholar
[24] Lee YT, Sidford A (2014) Path finding methods for linear programming: Solving linear programs in Õ(vrank) iterations and faster algorithms for maximum flow. Proc. IEEE 55th Annual Sympos. Foundations Comput. Sci. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 424–433.Google Scholar
[25] Lee YT, Sidford A (2015) Efficient inverse maintenance and faster algorithms for linear programming. Proc. IEEE 56th Annual Sympos. Foundations Comput. Sci. (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 230–249.Crossref, Google Scholar
[26] Littlestone N (1988) Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learn. 2(4):285–318.Crossref, Google Scholar
[27] LittmanML Dean TL, Kaelbling LP (1995) On the complexity of solving Markov decision problems. Proc. 11th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers, Burlington, MA), 394–402.Google Scholar
[28] Mansour Y, Singh S (1999) On the complexity of policy iteration. Proc. 15th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers, Burlington, MA), 401–408.Google Scholar
[29] Nemirovski A, Rubinstein RY (2005) An efficient stochastic approximation algorithm for stochastic saddle point problems. Modeling Uncertainty (Springer, Berlin), 156–184.Google Scholar
[30] Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
[31] Puterman LM (2014) Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, Hoboken, NJ).Google Scholar
[32] Scherrer B (2013) Improved and generalized upper bounds on the complexity of policy iteration. Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 26 (Curran Associates, Red Hook, NY), 386–394.Google Scholar
[33] Sidford A, Wang M, Wu X, Ye Y (2018) Variance reduced value iteration and faster algorithms for solving Markov decision processes. Proc. 29th Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 770–787.Crossref, Google Scholar
[34] Sidford A, Wang M, Wu X, Yang LF, Ye Y (2018) Near-optimal time and sample complexities for solving discounted Markov decision process with a generative model. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Red Hook, NY), 5192–5202.Google Scholar
[35] Tseng P (1990) Solving h-horizon, stationary Markov decision problems in time proportional to log (h). Oper. Res. Lett. 9(5):287–297.Crossref, Google Scholar
[36] Wang M, Chen Y (2016) An online primal–dual method for discounted Markov decision processes. Proc. IEEE Conf. Decisions Control (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 4516–4521.Crossref, Google Scholar
[37] Wong CK, Easton MC (1980) An efficient method for weighted sampling without replacement. SIAM J. Comput. 9(1):111–113.Crossref, Google Scholar
[38] Ye Y (2005) A new complexity result on solving the Markov decision problem. Math. Oper. Res. 30(3):733–749.Link, Google Scholar
[39] Ye Y (2011) The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Math. Oper. Res. 36(4):593–603.Link, Google Scholar

cover image Mathematics of Operations Research

Volume 45, Issue 2

May 2020

Pages 403-795, C2

Article Information

Metrics

Information

Received:May 28, 2018
Accepted:December 23, 2018
Published Online:October 16, 2019

Cite as

Mengdi Wang (2019) Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time. Mathematics of Operations Research 45(2):517-546.

https://doi.org/10.1287/moor.2019.1000

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time

References

Volume 45, Issue 2

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News