Polynomial-Time Computation of Strong and n-Present-Value Optimal Policies in Markov Decision Chains

Michael O’Sullivan
Corresponding Author
Michael O’Sullivan
[email protected]
Department of Engineering Science, University of Auckland, Auckland 1142, New Zealand
Search for more papers by this author
,
Arthur F. Veinott, Jr.
Arthur F. Veinott, Jr.
Formerly at the Department of Management Science and Engineering, Stanford University, Stanford California
Search for more papers by this author

Michael O’Sullivan

Corresponding Author

Michael O’Sullivan

[email protected]

Department of Engineering Science, University of Auckland, Auckland 1142, New Zealand

Search for more papers by this author

Arthur F. Veinott, Jr. (deceased)

Arthur F. Veinott, Jr.

Formerly at the Department of Management Science and Engineering, Stanford University, Stanford California

Search for more papers by this author

Published Online:14 Dec 2016https://doi.org/10.1287/moor.2016.0812

References

Andersen ED, Ye Y (1996) Combining interior-point and pivoting algorithms for linear programming. Management Sci. 42(12):1719–1731.Link, Google Scholar
Aspvall B, Shiloach Y (1980) A polynomial time algorithm for solving systems of linear inequalities with two variables per inequality. SIAM J. Comput. 9(4):827–845.Crossref, Google Scholar
Avrachenkov KE, Altman E (1999) Sensitive discount optimality via nested linear programs for ergodic Markov decision processes. Proc. Inform., Decision and Control, IDC ’99. (IEEE, Piscataway, NJ), 53–58.Crossref, Google Scholar
Balinski M (1961) On Solving Discrete Stochastic Decision Problems. U.S. Naval Supply System Research Study 2 (Mathematica, Princeton, NJ).Google Scholar
Bather JA (1973) Optimal decision procedures for finite Markov chains. Part III: General convex systems. Adv. Appl. Prob. 5(3):541–558.Crossref, Google Scholar
Bellman RA (1958) On a routing problem. Quart. Appl. Math. 16(1):87–90.Crossref, Google Scholar
Bertsekas DP, Tsitsiklis JN (1991) An analysis of stochastic shortest path problems. Math. Oper. Res. 16(3):580–595.Link, Google Scholar
Blackwell D (1962) Discrete dynamic programming. Ann. Math. Statist. 33(2):719–726.Crossref, Google Scholar
de Ghellinck G (1960) Les problèms de décisions séquentielles. Cahiers du Centre d’Etudes de Recherche Opérationnelle 2(2):161–179.Google Scholar
Denardo EV (1970) Computing a bias-optimal policy in a discrete-time Markov decision problem. Oper. Res. 18(2):279–289.Link, Google Scholar
Denardo EV (1970) On linear programming in a Markov decision problem. Management Sci. 16(5):281–288.Link, Google Scholar
Denardo EV (1971) Markov renewal programs with small interest rates. Ann. Math. Statist. 42(2):477–496.Crossref, Google Scholar
Denardo EV, Fox BL (1968) Multichain Markov renewal programs. SIAM J. Appl. Math. 16(3):468–487.Crossref, Google Scholar
d’Epenoux F (1960) Sur un problème de production et de stockage dans l’aléatoire. Revue Française de Recherche Opérationnelle 4(14):3–13. An English translation appears as: A probabilistic production and inventory problem. Management Sci. 10(1):98–108.Google Scholar
Derman C (1962) On sequential decisions and Markov chains. Management Sci. 9(1):16–24.Link, Google Scholar
Eaves BC, Veinott AF Jr (1974) Policy improvement in positive, negative and stopping Markov decision chains. Unpublished manuscript.Google Scholar
Eaves BC, Veinott AF Jr (2014) Maximum-stopping-value policies in finite Markov population decision chains. Math. Oper. Res. 39(3):597–606.Link, Google Scholar
Filar AE, Altman E, Avrachenkov K (2002) An asymptotic simplex method for singularly perturbed linear programs. Oper. Res. Lett. 30(5):295–307.Crossref, Google Scholar
Ford LR (1956) Network flow theory. Technical Report P-923, The RAND Corporation, Santa Monica, CA.Google Scholar
Hartman M, Arguelles C (1999) Transience bounds for long walks. Math. Oper. Res. 24(2):414–439.Link, Google Scholar
Haviv M, Puterman ML (1991) An improved algorithm for solving communicating average reward Markov decision processes. Ann. Oper. Res. 28(1):229–242.Crossref, Google Scholar
Hochbaum DS, Naor J (1994) Simple and fast algorithms for linear and integer programs with two variables per inequality. SIAM J. Comput. 23(6):1179–1192.Crossref, Google Scholar
Hordijk A, Kallenberg LCM (1979) Linear programming and Markov decision chains. Management Sci. 25(4):352–362.Link, Google Scholar
Hordijk A, Yushkevich AA (2002) Blackwell optimality. Feinberg EA, Shwartz A, eds. Handbook of Markov Decision Processes (Kluwer, Boston), 231–267.Crossref, Google Scholar
Hordijk A, Dekker R, Kallenberg LCM (1985) Sensitivity-analysis in discounted Markovian decision problems. Oper.-Res.-Spektrum 7(3):143–151.Crossref, Google Scholar
Howard RA (1960) Dynamic Programming and Markov Processes (Technology Press-Wiley, Cambridge, MA).Google Scholar
Huang Y, Veinott AF (1995) Markov branching decision chains with interest-rate-dependent rewards. Probab. Engrg. Informational Sci. 9(1):99–121.Crossref, Google Scholar
Jeroslow RG (1973) Asymptotic linear programming. Oper. Res. 21(5):1128–1141.Link, Google Scholar
Kallenberg LCM (1980) Optimal Transient Policies, Section 3.3 (Mathematisch Centrum, Amsterdam), 49–63.Google Scholar
Karp RM (1978) A characterization of the minimum cycle mean in a digraph. Discrete Math. 23(3):309–311.Crossref, Google Scholar
Manne AS (1960) Linear programming and sequential decisions. Management Sci. 6(3):259–267.Link, Google Scholar
Miller BL, Veinott AF Jr (1969) Discrete dynamic programming with a small interest rate. Ann. Math. Statist. 40(2):366–370.Crossref, Google Scholar
Papadimitriou CH, Tsitsiklis JN (1987) The complexity of Markov decision processes. Math. Oper. Res. 12(3):441–450.Link, Google Scholar
Renegar J (1988) A polynomial-time algorithm, based on Newton’s method, for linear programming. Math. Prog. 40(1):59–93.Crossref, Google Scholar
Ross KW, Varadarajan R (1991) Multichain Markov decision processes with a sample path constraint: A decomposition approach. Math. Oper. Res. 16(1):195–207.Link, Google Scholar
Rothblum UG, Veinott AF Jr (1992) Markov branching decision chains: Immigration- induced optimality. Technical Report 45, Department of Operations Research, Stanford University, Stanford, CA.Google Scholar
Tarjan RE (1972) Depth-first search and linear graph algorithms. SIAM J. Comput. 1(2):146–160.Crossref, Google Scholar
Vaidya PM (1990) An algorithm for linear programming which requires o(((m + n)n2 + (m + n)1.5n)l) arithmetic operations. Math. Programming 47(1–3):175–201.Crossref, Google Scholar
Veinott AF Jr (1966) On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Statist. 37(5):1284–1294.Crossref, Google Scholar
Veinott AF Jr (1968) Discrete dynamic programming with sensitive optimality criteria. Ann. Math. Statist. 39:1372. Preliminary Report.Google Scholar
Veinott AF Jr (1969) Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. 40(5):1635–1660.Crossref, Google Scholar
Veinott AF Jr (1969) Problem 3, stopping problems, homework due November 6, 1969. Operations Research 351, Dynamic Programming and Stochastic Control (Department Operations Research, Stanford University, Stanford, CA).Google Scholar
Veinott AF Jr (1974) Markov decision chains. Eaves BC, Dantzig GB, eds. Studies in Optimization. MAA Studies in Mathematics, Vol. 10 (Mathematical Association of America, Washington, DC), 124–159.Google Scholar
Ye Y (1991) An o(n3l) potential reduction algorithm for linear programming. Math. Prog. 50(1):239–258.Crossref, Google Scholar
Ye Y, Todd MJ, Mizuno S (1994) An o(nl)-iteration homogeneous and self-dual linear programming algorithm. Math. Oper. Res. 19(1):53–67.Link, Google Scholar

cover image Mathematics of Operations Research

Volume 42, Issue 3

August 2017

Pages 577-896

Article Information

Metrics

Information

Received:January 22, 2015
Accepted:June 07, 2016
Published Online:December 14, 2016

Cite as

Michael O’Sullivan, Arthur F. Veinott, Jr. (2016) Polynomial-Time Computation of Strong and n-Present-Value Optimal Policies in Markov Decision Chains. Mathematics of Operations Research 42(3):577-598.

https://doi.org/10.1287/moor.2016.0812

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Polynomial-Time Computation of Strong and n-Present-Value Optimal Policies in Markov Decision Chains

References

Volume 42, Issue 3

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News