Polynomial-Time Computation of Strong and n-Present-Value Optimal Policies in Markov Decision Chains
Published Online:14 Dec 2016https://doi.org/10.1287/moor.2016.0812
References
- (1996) Combining interior-point and pivoting algorithms for linear programming. Management Sci. 42(12):1719–1731.Link, Google Scholar
- (1980) A polynomial time algorithm for solving systems of linear inequalities with two variables per inequality. SIAM J. Comput. 9(4):827–845.Crossref, Google Scholar
- (1999) Sensitive discount optimality via nested linear programs for ergodic Markov decision processes. Proc. Inform., Decision and Control, IDC ’99. (IEEE, Piscataway, NJ), 53–58.Crossref, Google Scholar
- (1961) On Solving Discrete Stochastic Decision Problems. U.S. Naval Supply System Research Study 2 (Mathematica, Princeton, NJ).Google Scholar
- (1973) Optimal decision procedures for finite Markov chains. Part III: General convex systems. Adv. Appl. Prob. 5(3):541–558.Crossref, Google Scholar
- (1958) On a routing problem. Quart. Appl. Math. 16(1):87–90.Crossref, Google Scholar
- (1991) An analysis of stochastic shortest path problems. Math. Oper. Res. 16(3):580–595.Link, Google Scholar
- (1962) Discrete dynamic programming. Ann. Math. Statist. 33(2):719–726.Crossref, Google Scholar
- de Ghellinck G (1960) Les problèms de décisions séquentielles. Cahiers du Centre d’Etudes de Recherche Opérationnelle 2(2):161–179.Google Scholar
- (1970) Computing a bias-optimal policy in a discrete-time Markov decision problem. Oper. Res. 18(2):279–289.Link, Google Scholar
- (1970) On linear programming in a Markov decision problem. Management Sci. 16(5):281–288.Link, Google Scholar
- (1971) Markov renewal programs with small interest rates. Ann. Math. Statist. 42(2):477–496.Crossref, Google Scholar
- (1968) Multichain Markov renewal programs. SIAM J. Appl. Math. 16(3):468–487.Crossref, Google Scholar
- (1960) Sur un problème de production et de stockage dans l’aléatoire. Revue Française de Recherche Opérationnelle 4(14):3–13. An English translation appears as: A probabilistic production and inventory problem. Management Sci. 10(1):98–108.Google Scholar
- (1962) On sequential decisions and Markov chains. Management Sci. 9(1):16–24.Link, Google Scholar
- (1974) Policy improvement in positive, negative and stopping Markov decision chains. Unpublished manuscript.Google Scholar
- (2014) Maximum-stopping-value policies in finite Markov population decision chains. Math. Oper. Res. 39(3):597–606.Link, Google Scholar
- (2002) An asymptotic simplex method for singularly perturbed linear programs. Oper. Res. Lett. 30(5):295–307.Crossref, Google Scholar
- (1956) Network flow theory. Technical Report P-923, The RAND Corporation, Santa Monica, CA.Google Scholar
- (1999) Transience bounds for long walks. Math. Oper. Res. 24(2):414–439.Link, Google Scholar
- (1991) An improved algorithm for solving communicating average reward Markov decision processes. Ann. Oper. Res. 28(1):229–242.Crossref, Google Scholar
- (1994) Simple and fast algorithms for linear and integer programs with two variables per inequality. SIAM J. Comput. 23(6):1179–1192.Crossref, Google Scholar
- (1979) Linear programming and Markov decision chains. Management Sci. 25(4):352–362.Link, Google Scholar
- (2002) Blackwell optimality. Feinberg EA, Shwartz A, eds. Handbook of Markov Decision Processes (Kluwer, Boston), 231–267.Crossref, Google Scholar
- (1985) Sensitivity-analysis in discounted Markovian decision problems. Oper.-Res.-Spektrum 7(3):143–151.Crossref, Google Scholar
- (1960) Dynamic Programming and Markov Processes (Technology Press-Wiley, Cambridge, MA).Google Scholar
- (1995) Markov branching decision chains with interest-rate-dependent rewards. Probab. Engrg. Informational Sci. 9(1):99–121.Crossref, Google Scholar
- (1973) Asymptotic linear programming. Oper. Res. 21(5):1128–1141.Link, Google Scholar
- (1980) Optimal Transient Policies, Section 3.3 (Mathematisch Centrum, Amsterdam), 49–63.Google Scholar
- (1978) A characterization of the minimum cycle mean in a digraph. Discrete Math. 23(3):309–311.Crossref, Google Scholar
- (1960) Linear programming and sequential decisions. Management Sci. 6(3):259–267.Link, Google Scholar
- (1969) Discrete dynamic programming with a small interest rate. Ann. Math. Statist. 40(2):366–370.Crossref, Google Scholar
- (1987) The complexity of Markov decision processes. Math. Oper. Res. 12(3):441–450.Link, Google Scholar
- (1988) A polynomial-time algorithm, based on Newton’s method, for linear programming. Math. Prog. 40(1):59–93.Crossref, Google Scholar
- (1991) Multichain Markov decision processes with a sample path constraint: A decomposition approach. Math. Oper. Res. 16(1):195–207.Link, Google Scholar
- (1992) Markov branching decision chains: Immigration- induced optimality. Technical Report 45, Department of Operations Research, Stanford University, Stanford, CA.Google Scholar
- (1972) Depth-first search and linear graph algorithms. SIAM J. Comput. 1(2):146–160.Crossref, Google Scholar
- (1990) An algorithm for linear programming which requires o(((m + n)n2 + (m + n)1.5n)l) arithmetic operations. Math. Programming 47(1–3):175–201.Crossref, Google Scholar
- (1966) On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Statist. 37(5):1284–1294.Crossref, Google Scholar
- (1968) Discrete dynamic programming with sensitive optimality criteria. Ann. Math. Statist. 39:1372. Preliminary Report.Google Scholar
- (1969) Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. 40(5):1635–1660.Crossref, Google Scholar
- (1969) Problem 3, stopping problems, homework due November 6, 1969. Operations Research 351, Dynamic Programming and Stochastic Control (Department Operations Research, Stanford University, Stanford, CA).Google Scholar
- (1974) Markov decision chains. Eaves BC, Dantzig GB, eds. Studies in Optimization. MAA Studies in Mathematics, Vol. 10 (Mathematical Association of America, Washington, DC), 124–159.Google Scholar
- (1991) An o(n3l) potential reduction algorithm for linear programming. Math. Prog. 50(1):239–258.Crossref, Google Scholar
- (1994) An o(nl)-iteration homogeneous and self-dual linear programming algorithm. Math. Oper. Res. 19(1):53–67.Link, Google Scholar

