A Learning Algorithm for Risk-Sensitive Cost

Arnab Basu
[email protected]
Quantitative Methods and Information Systems Area, Indian Institute of Management Bangalore, Bangalore 560076, India

Tirthankar Bhattacharyya
[email protected]
Department of Mathematics, Indian Institute of Science, Bangalore 560012, India

Vivek S. Borkar
[email protected]
School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai 400005, India

Published Online: 17 Oct 2008
https://doi.org/10.1287/moor.1080.0324

Mathematics of Operations Research
Volume 33, Issue 4, November 2008 (issue pages 769–1024)

Received: December 29, 2006

Cite as

Arnab Basu, Tirthankar Bhattacharyya, Vivek S. Borkar (2008) A Learning Algorithm for Risk-Sensitive Cost. Mathematics of Operations Research 33(4):880–898.
