Optimal Adaptive Policies for Markov Decision Processes

Apostolos N. Burnetas
Department of Operations Research, Case Western Reserve University, Cleveland, Ohio 44106

Michael N. Katehakis
Faculty of Management and RUTCOR, Rutgers University, Newark, New Jersey 07102

Published Online: 1 Feb 1997
https://doi.org/10.1287/moor.22.1.222

Abstract

In this paper we consider the problem of adaptive control for Markov Decision Processes. We give the explicit form for a class of adaptive policies that possess optimal increase rate properties for the total expected finite horizon reward, under sufficient assumptions of finite state-action spaces and irreducibility of the transition law. A main feature of the proposed policies is that the choice of actions, at each state and time period, is based on indices that are inflations of the right-hand side of the estimated average reward optimality equations.
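
For orientation, the index construction described in the abstract can be sketched in symbols. The notation here is an assumption introduced for illustration, since the abstract itself fixes no symbols: $S$ is the finite state space, $A(x)$ the actions available in state $x$, $r(x,a)$ the one-step reward, $p(\cdot \mid x,a)$ the transition law, $g$ and $h$ the optimal average reward and bias, $\hat p_t$ and $\hat h_t$ their estimates at time $t$, $N_t(x,a)$ the number of times action $a$ has been taken in state $x$, and $\mathrm{KL}$ the Kullback-Leibler divergence. The particular inflation shown, a Kullback-Leibler ball of radius $\log t / N_t(x,a)$ around the estimated transition law, is one common form of such an index and should be read as a hedged sketch rather than as the paper's exact statement.

```latex
% Average reward optimality equations under a known transition law p
\[
  g + h(x) \;=\; \max_{a \in A(x)} \Big\{ r(x,a) + \sum_{y \in S} p(y \mid x,a)\, h(y) \Big\},
  \qquad x \in S .
\]

% Assumed form of an inflated index: the estimated right-hand side,
% maximized over transition laws q close to the estimate \hat p_t
\[
  U_t(x,a) \;=\; \sup_{q \in \Delta(S)} \Big\{ r(x,a) + \sum_{y \in S} q(y)\, \hat h_t(y)
  \;:\; \mathrm{KL}\big(\hat p_t(\cdot \mid x,a) \,\big\|\, q\big) \le \frac{\log t}{N_t(x,a)} \Big\} .
\]
```

Under this sketch, the constraint set shrinks as $N_t(x,a)$ grows relative to $\log t$, so the index of a frequently tried action approaches the plain estimated right-hand side, while rarely tried actions keep a larger inflation.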

Mathematics of Operations Research

Volume 22, Issue 1

February 1997

Pages 1-255

Cite as

Apostolos N. Burnetas, Michael N. Katehakis (1997) Optimal Adaptive Policies for Markov Decision Processes. Mathematics of Operations Research 22(1):222-255.

https://doi.org/10.1287/moor.22.1.222
