On the Convergence of Modified Policy Iteration in Risk-Sensitive Exponential Cost Markov Decision Processes

