Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning

Published Online: https://doi.org/10.1287/moor.2017.0855

References

  • Aubin J, Cellina A (1984) Differential Inclusions: Set-Valued Maps and Viability Theory (Springer, Berlin).
  • Benaïm M (1999) Dynamics of stochastic approximation algorithms. Azéma J, Émery M, Ledoux M, Yor M, eds. Séminaire de probabilités XXXIII (Springer, Berlin), 1–68.
  • Benaïm M, Hofbauer J, Sorin S (2005) Stochastic approximations and differential inclusions. SIAM J. Control Optim. 44(1):328–348.
  • Benveniste A, Metivier M, Priouret P (1990) Adaptive Algorithms and Stochastic Approximation (Springer, New York).
  • Borkar VS (1995) Probability Theory: An Advanced Course (Springer, New York).
  • Borkar VS (1997) Stochastic approximation with two time scales. Systems Control Lett. 29(5):291–294.
  • Borkar VS (2006) Stochastic approximation with “controlled Markov noise.” Systems Control Lett. 55(2):139–145.
  • Borkar VS (2008) Stochastic Approximation: A Dynamic Systems Viewpoint (Cambridge University Press, Cambridge, UK).
  • Degris T, White M, Sutton RS (2012) Linear off-policy actor-critic. Proc. 29th Internat. Conf. Machine Learning, ICML ’12 (Omnipress, Madison, WI).
  • Konda VR, Tsitsiklis JN (2003) Linear stochastic approximation driven by slowly varying Markov chains. Systems Control Lett. 50(2):95–102.
  • Konda VR, Tsitsiklis JN (2003) On actor-critic algorithms. SIAM J. Control Optim. 42(4):1143–1166.
  • Ma DJ, Makowski AM, Shwartz A (1990) Stochastic approximations for finite state Markov chains. Stochastic Processes Their Appl. 35(1):27–45.
  • Maei HR (2011) Gradient temporal-difference learning algorithms. PhD thesis, University of Alberta, Alberta, Canada.
  • Menache I, Mannor S, Shimkin N (2005) Basis function adaptation in temporal difference reinforcement learning. Ann. Oper. Res. 134(1):215–238.
  • Metivier M, Priouret P (1984) Applications of a Kushner and Clark lemma to general classes of stochastic algorithms. IEEE Trans. Inform. Theory 30(2):140–151.
  • Rudin W (1976) Principles of Mathematical Analysis, 3rd ed. (McGraw-Hill, New York).
  • Sutton RS, Maei HR, Szepesvári C (2008) A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. Koller D, Schuurmans D, Bengio Y, Bottou L, eds. Adv. Neural Inform. Processing Systems 21, NIPS ’08.
  • Sutton RS, Maei HR, Precup D, Bhatnagar S, Silver D, Wiewiora E (2009) Fast gradient-descent methods for temporal-difference learning with linear function approximation. Pohoreckyj Danyluk A, Bottou L, Littman ML, eds. Proc. 26th Internat. Conf. Machine Learning, ICML ’09 (ACM, New York), 993–1000.
  • Tadić VB (2004) Almost sure convergence of two time-scale stochastic approximation algorithms. Proc. 2004 Amer. Control Conf. (IEEE, Piscataway, NJ).
  • Tadić VB (2015) Convergence and convergence rate of stochastic gradient search in the case of multiple and non-isolated extrema. Stochastic Processes Their Appl. 125(5):1715–1755.
  • Yu H (2012) Least squares temporal difference methods: An analysis under general conditions. SIAM J. Control Optim. 50(6):3310–3343.
  • Yu H (2016) Weak convergence properties of constrained emphatic temporal-difference learning with constant and slowly diminishing stepsize. J. Machine Learning Res. 17(220):1–58.