Markov Decision Processes with Observation Costs: Framework and Computation with a Penalty Scheme

Published Online: https://doi.org/10.1287/moor.2023.0172

References

  • [1] Anderson RF, Friedman A (1977) Optimal inspections in a stochastic control problem with costly observations. Math. Oper. Res. 2(2):155–190.
  • [2] Anderson RF, Friedman A (1978) Optimal inspections in a stochastic control problem with costly observations II. Math. Oper. Res. 3(1):67–81.
  • [3] Azimzadeh P, Forsyth PA (2016) Weakly chained matrices, policy iteration, and impulse control. SIAM J. Numerical Anal. 54(3):1341–1364.
  • [4] Bayraktar E, Kravitz R (2015) Quickest detection with discretely controlled observations. Sequential Anal. 34(1):77–133.
  • [5] Bayraktar E, Ekström E, Guo J (2022) Disorder detection with costly observations. J. Appl. Probab. 59(2):338–349.
  • [6] Bellinger C, Coles R, Crowley M, Tamblyn I (2021) Active measure reinforcement learning for observation cost minimization. Antonie L, Zadeh PM, eds. Proc. 34th CAIAC (Canadian Artificial Intelligence Association, Waterloo, ON, Canada).
  • [7] Bellinger C, Drozdyuk A, Crowley M, Tamblyn I (2022) Balancing information with observation costs in deep reinforcement learning. Kiringa I, Gambs S, Kalala KH, eds. Proc. 35th CAIAC (Canadian Artificial Intelligence Association, Waterloo, ON, Canada).
  • [8] Briani A, Camilli F, Zidani H (2012) Approximation schemes for monotone systems of nonlinear second order partial differential equations: Convergence result and error estimate. Differential Equations Appl. 4(2):297–317.
  • [9] Cooper C, Hahi N (1971) An optimal stochastic control problem with observation cost. IEEE Trans. Automatic Control 16(2):185–189.
  • [10] Dalang RC, Shiryaev AN (2015) A quickest detection problem with an observation cost. Ann. Appl. Probab. 25(3):1475–1512.
  • [11] Dyrssen H, Ekström E (2018) Sequential testing of a Wiener process with costly observations. Sequential Anal. 37(1):47–58.
  • [12] Forsyth PA, Vetzal KR (2002) Quadratic convergence for valuing American options using a penalty method. SIAM J. Sci. Comput. 23(6):2095–2122.
  • [13] Guo N, Kostina V (2021) Optimal causal rate-constrained sampling for a class of continuous Markov processes. IEEE Trans. Inform. Theory 67(12):7876–7890.
  • [14] Hajek B, Mitzel K, Yang S (2008) Paging and registration in cellular networks: Jointly optimal policies and an iterative algorithm. IEEE Trans. Inform. Theory 54(2):608–622.
  • [15] Hernández-Lerma O (1989) Adaptive Markov Control Processes, Applied Mathematical Sciences (Springer, New York).
  • [16] Huang C, Wang S (2010) A power penalty approach to a nonlinear complementarity problem. Oper. Res. Lett. 38(1):72–76.
  • [17] Huang Y, Zhu Q (2021) Self-triggered Markov decision processes. Proc. 60th IEEE CDC (Austin, TX), 4507–4514.
  • [18] Ito K, Kunisch K (2006) Parabolic variational inequalities: The Lagrange multiplier approach. J. Mathématiques Pures Appliquées 85(3):415–449.
  • [19] Krueger D, Leike J, Evans O, Salvatier J (2020) Active reinforcement learning: Observing rewards at a cost. Preprint, submitted November 24, https://arxiv.org/abs/2011.06709.
  • [20] Kushner H (1964) On the optimum timing of observations for linear control systems with unknown initial state. IEEE Trans. Automatic Control 9(2):144–150.
  • [21] Kushner HJ, Dupuis PG (1992) Numerical Methods for Stochastic Control Problems in Continuous Time (Springer, Berlin).
  • [22] Meier L, Peschon J, Dressler R (1967) Optimal control of measurement subsystems. IEEE Trans. Automatic Control 12(5):528–536.
  • [23] Nayyar A, Başar T, Teneketzis D, Veeravalli VV (2013) Optimal strategies for communication and remote estimation with an energy harvesting sensor. IEEE Trans. Automatic Control 58(9):2246–2260.
  • [24] Pham H (2009) Continuous-Time Stochastic Control and Optimization with Financial Applications, 1st ed. (Springer, Berlin).
  • [25] Reis R (2006) Inattentive consumers. J. Monetary Econom. 53(8):1761–1800.
  • [26] Reis R (2006) Inattentive producers. Rev. Econom. Stud. 73(3):793–821.
  • [27] Reisinger C, Witte JH (2012) On the use of policy iteration as an easy way of pricing American options. SIAM J. Financial Math. 3(1):459–478.
  • [28] Reisinger C, Zhang Y (2019) A penalty scheme for monotone systems with interconnected obstacles: Convergence and error estimates. SIAM J. Numerical Anal. 57(4):1625–1648.
  • [29] Reisinger C, Zhang Y (2020) Error estimates of penalty schemes for quasi-variational inequalities arising from impulse control problems. SIAM J. Control Optim. 58(1):243–276.
  • [30] Reisinger C, Zhang Y (2021) A penalty scheme and policy iteration for nonlocal HJB variational inequalities with monotone nonlinearities. Comput. Math. Appl. 93:199–213.
  • [31] Rust J (1996) Numerical dynamic programming in economics. Amman HM, Kendrick DA, Rust J, eds. Handbook of Computational Economics, vol. 1 (Elsevier, Amsterdam), 619–729.
  • [32] Tzoumas V, Carlone L, Pappas GJ, Jadbabaie A (2020) LQG control and sensing co-design. IEEE Trans. Automatic Control 66(4):1468–1483.
  • [33] Winkelmann S (2013) Markov decision processes with information costs. PhD thesis, Freie Universität Berlin, Berlin.
  • [34] Winkelmann S, Schütte C, Von Kleist M (2014) Markov control processes with rare state observation: Theory and application to treatment scheduling in HIV-1. Comm. Math. Sci. 12(5):859–877.
  • [35] Witte JH, Reisinger C (2011) A penalty method for the numerical solution of Hamilton–Jacobi–Bellman (HJB) equations in finance. SIAM J. Numerical Anal. 49(1):213–231.
  • [36] Witte JH, Reisinger C (2012) Penalty methods for the solution of discrete HJB equations—Continuous control and obstacle problems. SIAM J. Numerical Anal. 50(2):595–625.
  • [37] Wu W, Arapostathis A (2008) Optimal sensor querying: General Markovian and LQG models with controlled observations. IEEE Trans. Automatic Control 53(6):1392–1405.
  • [38] Yoshioka H, Tsujimura M (2020) Analysis and computation of an optimality equation arising in an impulse control problem with discrete and costly observations. J. Comput. Appl. Math. 366:112399.
  • [39] Yoshioka H, Tsujimura M, Hamagami K, Yoshioka Y (2020) A hybrid stochastic river environmental restoration modeling with discrete and costly observations. Optimal Control Appl. Methods 41(6):1964–1994.
  • [40] Yoshioka H, Yaegashi Y, Tsujimura M, Yoshioka Y (2021) Cost-efficient monitoring of continuous-time stochastic processes based on discrete observations. Appl. Stochastic Models Bus. Indust. 37(1):113–138.
  • [41] Yoshioka H, Yoshioka Y, Yaegashi Y, Tanaka T, Horinouchi M, Aranishi F (2020) Analysis and computation of a discrete costly observation model for growth estimation and management of biological resources. Comput. Math. Appl. 79(4):1072–1093.