Modified Policy Iteration Algorithms for Discounted Markov Decision Problems

Martin L. Puterman
University of British Columbia

Moon Chirl Shin
University of British Columbia
Published Online: 1 Jul 1978
https://doi.org/10.1287/mnsc.24.11.1127

Abstract

In this paper we study a class of modified policy iteration algorithms for solving Markov decision problems. These correspond to performing policy evaluation by successive approximations. We discuss the relationship of these algorithms to Newton-Kantorovich iteration and demonstrate their convergence. We show that all of these algorithms converge at least as quickly as successive approximations and obtain estimates of their rates of convergence. An analysis of the computational requirements of these algorithms suggests that they may be appropriate for solving problems with either large numbers of actions, large numbers of states, sparse transition matrices, or small discount rates. These algorithms are compared to policy iteration, successive approximations, and Gauss-Seidel methods on large randomly generated test problems.
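
The idea summarized in the abstract, replacing exact policy evaluation with a few sweeps of successive approximations, can be illustrated with a minimal sketch. This is not the authors' code: the function name, the array layout (`P[a][s][t]`, `r[s][a]`), the sweep count `m`, the starting vector, and the stopping rule are all assumptions chosen for illustration.

```python
def modified_policy_iteration(P, r, discount, m, tol=1e-10, max_iter=10_000):
    """Sketch of modified policy iteration for a finite discounted MDP.

    P[a][s][t] is the probability of moving from state s to t under action a,
    r[s][a] is the one-step reward, and m is the number of extra
    successive-approximation sweeps used for partial policy evaluation.
    All names and the stopping rule are illustrative assumptions.
    """
    n_states, n_actions = len(r), len(r[0])

    def q(s, a, v):
        # One-step lookahead value of taking action a in state s.
        return r[s][a] + discount * sum(P[a][s][t] * v[t] for t in range(n_states))

    # Assumed start: a constant vector chosen so that one improvement
    # step cannot decrease it (T v >= v).
    r_min = min(min(row) for row in r)
    v = [r_min / (1.0 - discount)] * n_states
    policy = [0] * n_states

    for _ in range(max_iter):
        # Policy improvement: act greedily with respect to the current v.
        policy = [max(range(n_actions), key=lambda a: q(s, a, v))
                  for s in range(n_states)]
        u = [q(s, policy[s], v) for s in range(n_states)]
        # Simple stopping rule (an assumption): stop when T v is close to v.
        if max(abs(u[s] - v[s]) for s in range(n_states)) < tol:
            return u, policy
        # Partial policy evaluation: m sweeps of successive approximations
        # for the fixed policy instead of solving the linear system exactly.
        for _ in range(m):
            u = [q(s, policy[s], u) for s in range(n_states)]
        v = u
    return v, policy
```

In this sketch, `m = 0` reduces to successive approximations (value iteration), while a large `m` approaches ordinary policy iteration, since the inner sweeps then nearly solve the evaluation equations for the current policy.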

Volume 24, Issue 11

July 1978

Pages 1095-1207

Cite as

Martin L. Puterman, Moon Chirl Shin (1978) Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. Management Science 24(11):1127-1137.

https://doi.org/10.1287/mnsc.24.11.1127
