Open Access

Empirical Q-Value Iteration

Dileep Kalathil (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-7403-4006
Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843

Vivek S. Borkar
[email protected]
Department of Electrical Engineering, Indian Institute of Technology Mumbai, Mumbai 400076, India

Rahul Jain
[email protected]
Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089

Published Online: 9 Oct 2020
https://doi.org/10.1287/stsy.2019.0062

Abstract

We propose a new, simple, and natural algorithm for learning the optimal $Q$-value function of a discounted-cost Markov decision process (MDP) when the transition kernels are unknown. Unlike classical learning algorithms for MDPs, such as $Q$-learning and actor-critic algorithms, this algorithm does not rely on stochastic approximation. We show that our algorithm, which we call the empirical $Q$-value iteration algorithm, converges to the optimal $Q$-value function. We also give a rate of convergence in the form of a nonasymptotic sample complexity bound, and show that an asynchronous (or online) version of the algorithm also works. Preliminary experimental results suggest that our algorithm converges to a ballpark estimate faster than stochastic approximation-based algorithms.
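The abstract describes the method only at a high level. The following is a minimal Python sketch of the general idea, assuming that "empirical" means the unknown transition kernel in $Q$-value iteration is replaced by an empirical distribution of simulated next states, refined with one new sample per state-action pair at every iteration, so that the update is $Q_{k+1}(s,a) = c(s,a) + \gamma \sum_{s'} \hat{p}_k(s' \mid s,a) \min_b Q_k(s',b)$. The paper's exact sampling schedule may differ. The functions `empirical_q_value_iteration`, `exact_q_value_iteration`, and `sample_next_state`, and the two-state example MDP, are hypothetical and exist only for illustration. Because the MDP is discounted-cost, the update minimizes over actions.

```python
import numpy as np


def empirical_q_value_iteration(cost, sample_next_state, gamma, n_iters, seed=0):
    """Illustrative sketch (ASSUMPTION, not the paper's exact scheme):
    each iteration draws one new next state per (s, a) from a simulator,
    updates a count table, and applies the Q-value Bellman operator with
    the empirical kernel p_hat(s' | s, a) = counts / k. No step sizes."""
    rng = np.random.default_rng(seed)
    num_states, num_actions = cost.shape
    counts = np.zeros((num_states, num_actions, num_states))  # counts[s, a, s']
    Q = np.zeros((num_states, num_actions))
    for k in range(1, n_iters + 1):
        for s in range(num_states):
            for a in range(num_actions):
                counts[s, a, sample_next_state(s, a, rng)] += 1.0
        p_hat = counts / k             # every (s, a) has exactly k draws so far
        v = Q.min(axis=1)              # cost MDP: v(s') = min_b Q(s', b)
        Q = cost + gamma * (p_hat @ v)  # empirical Bellman update, shape (S, A)
    return Q


def exact_q_value_iteration(cost, P, gamma, n_iters):
    """Reference: classical Q-value iteration with the true kernel P."""
    Q = np.zeros(cost.shape)
    for _ in range(n_iters):
        Q = cost + gamma * (P @ Q.min(axis=1))
    return Q


# Hypothetical two-state, two-action MDP; P[s, a, s'] is the true kernel,
# which the empirical algorithm only accesses through sampling.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
cost = np.array([[1.0, 2.0],
                 [0.5, 3.0]])
gamma = 0.9


def sample_next_state(s, a, rng):
    """Simulator: draw s' ~ P(. | s, a)."""
    return rng.choice(P.shape[2], p=P[s, a])


Q_emp = empirical_q_value_iteration(cost, sample_next_state, gamma, n_iters=2000)
Q_star = exact_q_value_iteration(cost, P, gamma, n_iters=500)
print("empirical Q:\n", Q_emp)
print("exact Q*:\n", Q_star)
print("greedy policy (empirical):", Q_emp.argmin(axis=1))
```

Unlike $Q$-learning, this sketch has no step-size sequence: each iteration applies the full Bellman operator with the current empirical kernel, and randomness enters only through `p_hat`. For this toy MDP the exact values are $Q^*(0,0) = 9.296875$ and $Q^*(1,0) = 8.515625$, and the empirical estimate should land close to them with the same greedy policy.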

Volume 11, Issue 1

March 2021

Pages 1-81

Article Information

Received: April 28, 2020
Accepted: April 29, 2020
Published Online: October 09, 2020

Cite as

Dileep Kalathil, Vivek S. Borkar, Rahul Jain (2020) Empirical Q-Value Iteration. Stochastic Systems 11(1):1-18.

https://doi.org/10.1287/stsy.2019.0062
