Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning

Prasenjit Karmakar (Corresponding Author)
http://orcid.org/0000-0001-6895-2364
Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560012, India

Shalabh Bhatnagar
Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560012, India

Published Online: 13 Jul 2017
https://doi.org/10.1287/moor.2017.0855

Abstract

We present, for the first time, an asymptotic convergence analysis of two time-scale stochastic approximation driven by “controlled” Markov noise. In particular, both the faster and the slower recursions have nonadditive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions on both time scales, defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Finally, using these results, we present a solution to the off-policy convergence problem for temporal-difference learning with linear function approximation.
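To make the two time-scale structure concrete, here is a minimal Python sketch, assuming a TDC-style (gradient-TD) off-policy update with importance-sampling ratios on a hypothetical two-state chain. The environment, the policy probabilities PI_1 and MU_1, the step-size schedules a_n = (n+1)^-0.6 and b_n = (n+1)^-1, and the function names are illustrative assumptions, not the algorithm or constants of the paper. The auxiliary weights w move on the faster time scale (step a_n) and the value weights theta on the slower one (step b_n, with b_n/a_n -> 0). The behaviour policy is fixed here, so the state sequence is Markov noise but is not controlled by the iterates; this is a simplification of the setting analyzed in the paper.

```python
import random

GAMMA = 0.5   # discount factor (illustrative assumption)
PI_1 = 0.9    # target policy: probability of action 1 (hypothetical)
MU_1 = 0.5    # behaviour policy: probability of action 1 (hypothetical)


def features(s):
    # One-hot features on a 2-state chain (tabular = special case of linear).
    return [1.0, 0.0] if s == 0 else [0.0, 1.0]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def off_policy_tdc(num_steps=50_000, seed=0):
    """Hypothetical two time-scale off-policy TD sketch (TDC-style update)."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # slow iterate: value-function weights
    w = [0.0, 0.0]      # fast iterate: auxiliary weights
    s = 0
    for n in range(num_steps):
        a_n = 1.0 / (n + 1) ** 0.6  # fast step size
        b_n = 1.0 / (n + 1)         # slow step size; b_n / a_n -> 0
        # Sample an action from the behaviour policy and form the IS ratio.
        action = 1 if rng.random() < MU_1 else 0
        pi_a = PI_1 if action == 1 else 1.0 - PI_1
        mu_a = MU_1 if action == 1 else 1.0 - MU_1
        rho = pi_a / mu_a
        # Toy dynamics: the chosen action is the next state; reward 1 in state 1.
        s_next = action
        r = 1.0 if s_next == 1 else 0.0
        phi, phi_next = features(s), features(s_next)
        delta = r + GAMMA * dot(theta, phi_next) - dot(theta, phi)
        phi_w = dot(phi, w)  # uses w before this step's update
        # Faster recursion (auxiliary weights).
        w = [wi + a_n * (rho * delta - phi_w) * p for wi, p in zip(w, phi)]
        # Slower recursion (value weights), with the TDC correction term.
        theta = [ti + b_n * rho * (delta * p - GAMMA * pn * phi_w)
                 for ti, p, pn in zip(theta, phi, phi_next)]
        s = s_next
    return theta, w


if __name__ == "__main__":
    theta, w = off_policy_tdc(num_steps=50_000, seed=0)
    print("theta:", theta, "w:", w)
```

With one-hot features the problem is tabular, and because the next state equals the action, the target-policy value is PI_1 / (1 - GAMMA) = 1.8 in both states; this is the fixed point the slow iterate is expected to approach in the limit, though no convergence rate is claimed for these schedules.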

Mathematics of Operations Research, Volume 43, Issue 1 (February 2018), issue pages 1-346, C2

Article Information

Received: April 13, 2015
Accepted: February 07, 2017

Cite as

Prasenjit Karmakar, Shalabh Bhatnagar (2017) Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning. Mathematics of Operations Research 43(1):130-151.
