Robust Modified Policy Iteration

Published Online: https://doi.org/10.1287/ijoc.1120.0509

References

  • Bagnell J, Ng A, Schneider J. Solving uncertain Markov decision problems. (2001). Technical report, Robotics Inst., CMU, 1–11
  • Bertsimas D, Iancu DA, Parrilo P. Optimality of affine policies in multistage robust optimization. Math. Oper. Res. (2010) 35(2):363–394
  • Bienstock D, Özbay N. Computing robust basestock levels. Discrete Optim. (2008) 5(2):389–414
  • Delage E, Mannor S. Percentile optimization for Markov decision processes with parameter uncertainty. Oper. Res. (2010) 58(1):203–213
  • Harmanec D. Generalizing Markov decision processes to imprecise probabilities. J. Statist. Planning and Inference (2002) 105(1):199–213
  • Iyengar G. Robust dynamic programming. Math. Oper. Res. (2005) 30(2):257–280
  • Li B, Si J. Robust optimality for discounted infinite-horizon Markov decision processes with uncertain transition matrices. IEEE Trans. Automatic Control (2008) 53(9):2112–2116
  • Mannor S, Simester D, Sun P, Tsitsiklis JN. Bias and variance approximation in value function estimates. Management Sci. (2007) 53(2):308–322
  • Martin JJ. Bayesian Decision Problems and Markov Chains (1967) (John Wiley and Sons, Inc., New York) No. 13 in Publications in Operations Research
  • Nilim A, El Ghaoui L. Robust control of Markov decision processes with uncertain transition matrices. Oper. Res. (2005) 53(5):780–798
  • Puterman ML. Markov Decision Processes: Discrete Stochastic Dynamic Programming (1994) (John Wiley and Sons, Inc., New York) Wiley Series in Probability and Mathematical Statistics
  • Puterman ML, Shin MC. Modified policy iteration algorithms for discounted Markov decision problems. Management Sci. (1978) 24(11):1127–1137
  • Satia JK, Lave RE. Markov decision processes with uncertain transition probabilities. Oper. Res. (1973) 21(3):728–740
  • Scarf HE. A min-max solution of an inventory problem. Studies in the Mathematical Theory of Inventory and Production (1958) (Stanford University Press, Stanford, CA)
  • See C-T, Sim M. Robust approximation to multi-period inventory management. Oper. Res. (2010) 58(3):583–594
  • Silver EA. Markov decision processes with uncertain transition probabilities or rewards. (1963). Technical Report 1, Research in the Control of Complex Systems, Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA
  • Singh SP, Gullapalli V. Asynchronous modified policy iteration with single-sided updates. (1993). Unpublished manuscript. 1 Feb. 2011. http://www.eecs.umich.edu/~baveja/papers/single-sided.ps.gz
  • Wee KE, Dada M. Optimal policies for transshipping inventory in a retail network. Management Sci. (2005) 51(10):1519–1533
  • White CC, Eldeib HK. Markov decision processes with imprecise transition probabilities. Oper. Res. (1994) 42(4):739–749
  • White CC, Thomas LC, Scherer WT. Reward revision for discounted Markov decision problems. Oper. Res. (1985) 33(6):1299–1315
  • Williams RJ, Baird LC. Analysis of some incremental variants of policy iteration: First steps towards understanding actor-critic learning systems. (1993). Technical Report NU-CCS-93-11, Northeastern University, College of Computer Science, Boston, MA 02115