A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

