A Learning Algorithm for Risk-Sensitive Cost

Published Online: https://doi.org/10.1287/moor.1080.0324

References

  • Abramovich Y. A., Aliprantis C. D. An Invitation to Operator Theory (2002) (American Mathematical Society, Providence, RI)
  • Bagchi A., Sureshkumar K., Yong J. Dynamic asset management: Risk sensitive criterion with positive factors constraints. Recent Developments in Mathematical Finance (2002) (World Scientific, Hong Kong) 1–11
  • Balaji S., Meyn S. P. Multiplicative ergodicity and large deviations for an irreducible Markov chain. Stochastic Processes Their Appl. (2000) 90:123–144
  • Bapat R. B., Raghavan T. E. S. Nonnegative Matrices and Applications (1997) (Cambridge University Press, Cambridge, UK)
  • Barto A. G., Sutton R. S., Anderson C. Neuron-like elements that can solve difficult learning control problems. IEEE Trans. Systems Man Cybernetics (1983) 13:835–846
  • Benaim M., Azéma J., Emery M., Ledoux M., Yor M. Dynamics of stochastic approximation algorithms. Le Séminaire de Probabilités. Springer Lecture Notes in Mathematics (1999) 1709 (Springer Verlag, Berlin-Heidelberg) 1–68
  • Benveniste A., Metivier M., Priouret P. Adaptive Algorithms and Stochastic Approximations (1991) (Springer Verlag, Berlin-Heidelberg)
  • Bertsekas D. P., Nedic A. Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamical Systems (2003) 13:79–110
  • Bertsekas D. P., Tsitsiklis J. N. Neurodynamic Programming (1996) (Athena Scientific, Belmont, MA)
  • Bertsekas D. P., Borkar V. S., Nedic A., Si J., Barto A. G., Powell W. B., Wunsch D. Improved temporal difference methods with linear function approximation. Handbook of Learning and Approximate Dynamic Programming (2004) (IEEE Press, New York) 235–259
  • Bhatia R. Matrix Analysis (1997) (Springer Verlag, New York)
  • Bielecki T. R., Pliska S. R. Risk sensitive dynamic asset management. Appl. Math. Optim. (1999) 39:337–360
  • Bielecki T. R., Pliska S. R. Risk sensitive asset management with transaction costs. Finance Stochastics (2000) 4:1–33
  • Bielecki T. R., Pliska S. R. Economic properties of the risk-sensitive criterion for portfolio management. Rev. Account. Finance (2003) 2:3–17
  • Borkar V. S. A sensitivity formula for risk-sensitive cost and the actor-critic algorithm. Systems Control Lett. (2001) 44:339–346
  • Borkar V. S. Q-learning for risk-sensitive control. Math. Oper. Res. (2002) 27:294–311
  • Borkar V. S., Meyn S. P. The o.d.e. method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim. (2000) 38:447–469
  • Borkar V. S., Meyn S. P. Risk-sensitive optimal control for Markov decision processes with monotone costs. Math. Oper. Res. (2002) 27:192–209
  • Derevitskii D. P., Fradkov A. L. Two models for analyzing the dynamics of adaptation algorithms. Automation Remote Control (1974) 35:59–67
  • Di Masi G. B., Stettner L. On adaptive and multiplicative (controlled) Poisson equations: Approximation and probability. Banach Center Publications (2006) 72:57–70
  • Di Masi G. B., Stettner L. Infinite horizon risk sensitive control of discrete time Markov processes under minorization property. SIAM J. Control Optim. (2007) 46:231–252
  • Golub G. H., Van Loan C. F. Matrix Computations (1996) (Johns Hopkins University Press, Baltimore, MD)
  • Hirsch M. W. Convergent activation dynamics in continuous time networks. Neural Networks (1989) 2:331–349
  • Karatzas I., Shreve S. E. Brownian Motion and Stochastic Calculus (1988) (Springer-Verlag, New York)
  • Kontoyiannis I., Meyn S. P. Spectral theory and limit theorems for geometrically ergodic Markov processes. Ann. Appl. Probab. (2003) 13:304–362
  • Kontoyiannis I., Meyn S. P. Large deviations asymptotics and the spectral theory of multiplicatively regular Markov processes. Electronic J. Probab. (2005) 10:61–123
  • Kushner H. J., Dupuis P. Numerical Methods for Stochastic Control Problems in Continuous Time (2001) 2nd ed. (Springer-Verlag, New York)
  • Si J., Barto A. G., Powell W. B., Wunsch D. Handbook of Learning and Approximate Dynamic Programming (2004) (IEEE Press, New York)
  • Stewart G. W. Matrix Algorithms (2001) II (SIAM, Philadelphia)
  • Sutton R. S., Barto A. G. Reinforcement Learning (1998) (MIT Press, Cambridge, MA)
  • Tsitsiklis J. N., Van Roy B. An analysis of temporal-difference learning with function approximation. IEEE Trans. Automatic Control (1997) 42:674–690
  • Watkins C. J. C. H. Learning from delayed rewards. (1989) Ph.D. thesis, University of Cambridge, Cambridge, UK
  • Wilson F. W. Smoothing derivatives of functions and applications. Trans. Amer. Math. Soc. (1969) 139:413–428