Markov Decision Processes with Observation Costs: Framework and Computation with a Penalty Scheme

