On the Convergence of Policy Iteration in Stationary Dynamic Programming

Martin L. Puterman
Faculty of Commerce, University of British Columbia, 2075 Wesbrook Mall, Vancouver, British Columbia, Canada V6T 1W5
Shelby L. Brumelle
Faculty of Commerce, University of British Columbia, 2075 Wesbrook Mall, Vancouver, British Columbia, Canada V6T 1W5

Published Online: 1 Feb 1979
https://doi.org/10.1287/moor.4.1.60

Abstract

The policy iteration method of dynamic programming is studied in an abstract setting. It is shown to be equivalent to the Newton-Kantorovich iteration procedure applied to the functional equation of dynamic programming. This equivalence is used to obtain the rate of convergence and error bounds for the sequence of values generated by policy iteration. These results are discussed in the context of the finite state Markovian decision problem with compact action space. An example is analyzed in detail.
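
The equivalence stated in the abstract can be illustrated on a small discounted problem. The sketch below is not taken from the paper: it assumes a finite action set (the paper treats a compact action space), a maximization convention, and a made-up two-state, two-action example. All function names are hypothetical. Writing T v = max_a (r_a + beta P_a v) and B(v) = T v - v, a Newton step that linearizes B at v using the greedy decision rule d gives v + (I - beta P_d)^{-1} (r_d + beta P_d v - v), which simplifies to (I - beta P_d)^{-1} r_d, the value of policy d. That is one step of policy iteration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for c in range(col, n + 1):
                M[i][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def greedy(P, r, beta, v):
    """In each state, pick an action maximizing r_a + beta * (P_a v)."""
    n = len(v)

    def q(a, s):
        return r[a][s] + beta * sum(P[a][s][t] * v[t] for t in range(n))

    return [max(range(len(P)), key=lambda a: q(a, s)) for s in range(n)]


def system_matrix(P, beta, d):
    """Build I - beta * P_d for the decision rule d."""
    n = len(d)
    return [[(1.0 if s == t else 0.0) - beta * P[d[s]][s][t]
             for t in range(n)] for s in range(n)]


def evaluate(P, r, beta, d):
    """Policy evaluation: solve (I - beta P_d) v = r_d."""
    return solve(system_matrix(P, beta, d), [r[d[s]][s] for s in range(len(d))])


def newton_step(P, r, beta, v):
    """One Newton step on B(v) = T v - v, linearized at the greedy rule."""
    n = len(v)
    d = greedy(P, r, beta, v)
    # Residual B(v) = r_d + beta P_d v - v, evaluated at the greedy rule d.
    res = [r[d[s]][s] + beta * sum(P[d[s]][s][t] * v[t] for t in range(n)) - v[s]
           for s in range(n)]
    step = solve(system_matrix(P, beta, d), res)
    return [v[s] + step[s] for s in range(n)]


def policy_iteration(P, r, beta, v0, max_iter=50):
    """Alternate improvement and evaluation until the decision rule repeats."""
    v, d = v0, None
    for _ in range(max_iter):
        new_d = greedy(P, r, beta, v)
        if new_d == d:
            break
        d = new_d
        v = evaluate(P, r, beta, d)
    return v, d


def example_mdp():
    """Hypothetical 2-state, 2-action data: P[a][s][t], r[a][s], discount."""
    P = [[[0.9, 0.1], [0.2, 0.8]],
         [[0.5, 0.5], [0.6, 0.4]]]
    r = [[1.0, 0.0],
         [0.5, 0.8]]
    return P, r, 0.9
```

Calling `newton_step(P, r, beta, v)` and `evaluate(P, r, beta, greedy(P, r, beta, v))` on the same `v` should agree up to rounding, which is the finite-action version of the equivalence. Where the greedy action is not unique, the linearization depends on which maximizer is chosen; this sketch simply takes the first one.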

Mathematics of Operations Research

Volume 4, Issue 1

February 1979

Pages 1-97

Cite as

Martin L. Puterman, Shelby L. Brumelle (1979) On the Convergence of Policy Iteration in Stationary Dynamic Programming. Mathematics of Operations Research 4(1):60-69.

https://doi.org/10.1287/moor.4.1.60