Polynomial-Time Computation of Strong and n-Present-Value Optimal Policies in Markov Decision Chains

Published Online:https://doi.org/10.1287/moor.2016.0812

References

  • Andersen ED, Ye Y (1996) Combining interior-point and pivoting algorithms for linear programming. Management Sci. 42(12):1719–1731.LinkGoogle Scholar
  • Aspvall B, Shiloach Y (1980) A polynomial time algorithm for solving systems of linear inequalities with two variables per inequality. SIAM J. Comput. 9(4):827–845.CrossrefGoogle Scholar
  • Avrachenkov KE, Altman E (1999) Sensitive discount optimality via nested linear programs for ergodic Markov decision processes. Proc. Inform., Decision and Control, IDC ’99. (IEEE, Piscataway, NJ), 53–58.CrossrefGoogle Scholar
  • Balinski M (1961) On Solving Discrete Stochastic Decision Problems. U.S. Naval Supply System Research Study 2 (Mathematica, Princeton, NJ).Google Scholar
  • Bather JA (1973) Optimal decision procedures for finite Markov chains. Part III: General convex systems. Adv. Appl. Prob. 5(3):541–558.CrossrefGoogle Scholar
  • Bellman RA (1958) On a routing problem. Quart. Appl. Math. 16(1):87–90.CrossrefGoogle Scholar
  • Bertsekas DP, Tsitsiklis JN (1991) An analysis of stochastic shortest path problems. Math. Oper. Res. 16(3):580–595.LinkGoogle Scholar
  • Blackwell D (1962) Discrete dynamic programming. Ann. Math. Statist. 33(2):719–726.CrossrefGoogle Scholar
  • de Ghellinck G (1960) Les problèms de décisions séquentielles. Cahiers du Centre d’Etudes de Recherche Opérationnelle 2(2):161–179.Google Scholar
  • Denardo EV (1970) Computing a bias-optimal policy in a discrete-time Markov decision problem. Oper. Res. 18(2):279–289.LinkGoogle Scholar
  • Denardo EV (1970) On linear programming in a Markov decision problem. Management Sci. 16(5):281–288.LinkGoogle Scholar
  • Denardo EV (1971) Markov renewal programs with small interest rates. Ann. Math. Statist. 42(2):477–496.CrossrefGoogle Scholar
  • Denardo EV, Fox BL (1968) Multichain Markov renewal programs. SIAM J. Appl. Math. 16(3):468–487.CrossrefGoogle Scholar
  • d’Epenoux F (1960) Sur un problème de production et de stockage dans l’aléatoire. Revue Française de Recherche Opérationnelle 4(14):3–13. An English translation appears as: A probabilistic production and inventory problem. Management Sci. 10(1):98–108.Google Scholar
  • Derman C (1962) On sequential decisions and Markov chains. Management Sci. 9(1):16–24.LinkGoogle Scholar
  • Eaves BC, Veinott AF Jr (1974) Policy improvement in positive, negative and stopping Markov decision chains. Unpublished manuscript.Google Scholar
  • Eaves BC, Veinott AF Jr (2014) Maximum-stopping-value policies in finite Markov population decision chains. Math. Oper. Res. 39(3):597–606.LinkGoogle Scholar
  • Filar AE, Altman E, Avrachenkov K (2002) An asymptotic simplex method for singularly perturbed linear programs. Oper. Res. Lett. 30(5):295–307.CrossrefGoogle Scholar
  • Ford LR (1956) Network flow theory. Technical Report P-923, The RAND Corporation, Santa Monica, CA.Google Scholar
  • Hartman M, Arguelles C (1999) Transience bounds for long walks. Math. Oper. Res. 24(2):414–439.LinkGoogle Scholar
  • Haviv M, Puterman ML (1991) An improved algorithm for solving communicating average reward Markov decision processes. Ann. Oper. Res. 28(1):229–242.CrossrefGoogle Scholar
  • Hochbaum DS, Naor J (1994) Simple and fast algorithms for linear and integer programs with two variables per inequality. SIAM J. Comput. 23(6):1179–1192.CrossrefGoogle Scholar
  • Hordijk A, Kallenberg LCM (1979) Linear programming and Markov decision chains. Management Sci. 25(4):352–362.LinkGoogle Scholar
  • Hordijk A, Yushkevich AA (2002) Blackwell optimality. Feinberg EA, Shwartz A, eds. Handbook of Markov Decision Processes (Kluwer, Boston), 231–267.CrossrefGoogle Scholar
  • Hordijk A, Dekker R, Kallenberg LCM (1985) Sensitivity-analysis in discounted Markovian decision problems. Oper.-Res.-Spektrum 7(3):143–151.CrossrefGoogle Scholar
  • Howard RA (1960) Dynamic Programming and Markov Processes (Technology Press-Wiley, Cambridge, MA).Google Scholar
  • Huang Y, Veinott AF (1995) Markov branching decision chains with interest-rate-dependent rewards. Probab. Engrg. Informational Sci. 9(1):99–121.CrossrefGoogle Scholar
  • Jeroslow RG (1973) Asymptotic linear programming. Oper. Res. 21(5):1128–1141.LinkGoogle Scholar
  • Kallenberg LCM (1980) Optimal Transient Policies, Section 3.3 (Mathematisch Centrum, Amsterdam), 49–63.Google Scholar
  • Karp RM (1978) A characterization of the minimum cycle mean in a digraph. Discrete Math. 23(3):309–311.CrossrefGoogle Scholar
  • Manne AS (1960) Linear programming and sequential decisions. Management Sci. 6(3):259–267.LinkGoogle Scholar
  • Miller BL, Veinott AF Jr (1969) Discrete dynamic programming with a small interest rate. Ann. Math. Statist. 40(2):366–370.CrossrefGoogle Scholar
  • Papadimitriou CH, Tsitsiklis JN (1987) The complexity of Markov decision processes. Math. Oper. Res. 12(3):441–450.LinkGoogle Scholar
  • Renegar J (1988) A polynomial-time algorithm, based on Newton’s method, for linear programming. Math. Prog. 40(1):59–93.CrossrefGoogle Scholar
  • Ross KW, Varadarajan R (1991) Multichain Markov decision processes with a sample path constraint: A decomposition approach. Math. Oper. Res. 16(1):195–207.LinkGoogle Scholar
  • Rothblum UG, Veinott AF Jr (1992) Markov branching decision chains: Immigration- induced optimality. Technical Report 45, Department of Operations Research, Stanford University, Stanford, CA.Google Scholar
  • Tarjan RE (1972) Depth-first search and linear graph algorithms. SIAM J. Comput. 1(2):146–160.CrossrefGoogle Scholar
  • Vaidya PM (1990) An algorithm for linear programming which requires o(((m + n)n2 + (m + n)1.5n)l) arithmetic operations. Math. Programming 47(1–3):175–201.CrossrefGoogle Scholar
  • Veinott AF Jr (1966) On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Statist. 37(5):1284–1294.CrossrefGoogle Scholar
  • Veinott AF Jr (1968) Discrete dynamic programming with sensitive optimality criteria. Ann. Math. Statist. 39:1372. Preliminary Report.Google Scholar
  • Veinott AF Jr (1969) Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. 40(5):1635–1660.CrossrefGoogle Scholar
  • Veinott AF Jr (1969) Problem 3, stopping problems, homework due November 6, 1969. Operations Research 351, Dynamic Programming and Stochastic Control (Department Operations Research, Stanford University, Stanford, CA).Google Scholar
  • Veinott AF Jr (1974) Markov decision chains. Eaves BC, Dantzig GB, eds. Studies in Optimization. MAA Studies in Mathematics, Vol. 10 (Mathematical Association of America, Washington, DC), 124–159.Google Scholar
  • Ye Y (1991) An o(n3l) potential reduction algorithm for linear programming. Math. Prog. 50(1):239–258.CrossrefGoogle Scholar
  • Ye Y, Todd MJ, Mizuno S (1994) An o(nl)-iteration homogeneous and self-dual linear programming algorithm. Math. Oper. Res. 19(1):53–67.LinkGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.