Markov Decision Processes with Observation Costs: Framework and Computation with a Penalty Scheme

Christoph Reisinger
[email protected]
https://orcid.org/0000-0003-4027-5298
Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
Jonathan Tam
Corresponding Author
[email protected]
https://orcid.org/0000-0003-1896-1056
Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom

Published Online: 23 May 2024
https://doi.org/10.1287/moor.2023.0172

References

[1] Anderson RF, Friedman A (1977) Optimal inspections in a stochastic control problem with costly observations. Math. Oper. Res. 2(2):155–190.
[2] Anderson RF, Friedman A (1978) Optimal inspections in a stochastic control problem with costly observations II. Math. Oper. Res. 3(1):67–81.
[3] Azimzadeh P, Forsyth PA (2016) Weakly chained matrices, policy iteration, and impulse control. SIAM J. Numerical Anal. 54(3):1341–1364.
[4] Bayraktar E, Kravitz R (2015) Quickest detection with discretely controlled observations. Sequential Anal. 34(1):77–133.
[5] Bayraktar E, Ekström E, Guo J (2022) Disorder detection with costly observations. J. Appl. Probab. 59(2):338–349.
[6] Bellinger C, Coles R, Crowley M, Tamblyn I (2021) Active measure reinforcement learning for observation cost minimization. Antonie L, Zadeh PM, eds. Proc. 34th CAIAC (Canadian Artificial Intelligence Association, Waterloo, ON, Canada).
[7] Bellinger C, Drozdyuk A, Crowley M, Tamblyn I (2022) Balancing information with observation costs in deep reinforcement learning. Kiringa I, Gambs S, Kalala KH, eds. Proc. 35th CAIAC (Canadian Artificial Intelligence Association, Waterloo, ON, Canada).
[8] Briani A, Camilli F, Zidani H (2012) Approximation schemes for monotone systems of nonlinear second order partial differential equations: Convergence result and error estimate. Differential Equations Appl. 4(2):297–317.
[9] Cooper C, Nahi N (1971) An optimal stochastic control problem with observation cost. IEEE Trans. Automatic Control 16(2):185–189.
[10] Dalang RC, Shiryaev AN (2015) A quickest detection problem with an observation cost. Ann. Appl. Probab. 25(3):1475–1512.
[11] Dyrssen H, Ekström E (2018) Sequential testing of a Wiener process with costly observations. Sequential Anal. 37(1):47–58.
[12] Forsyth PA, Vetzal KR (2002) Quadratic convergence for valuing American options using a penalty method. SIAM J. Sci. Comput. 23(6):2095–2122.
[13] Guo N, Kostina V (2021) Optimal causal rate-constrained sampling for a class of continuous Markov processes. IEEE Trans. Inform. Theory 67(12):7876–7890.
[14] Hajek B, Mitzel K, Yang S (2008) Paging and registration in cellular networks: Jointly optimal policies and an iterative algorithm. IEEE Trans. Inform. Theory 54(2):608–622.
[15] Hernández-Lerma O (1989) Adaptive Markov Control Processes, Applied Mathematical Sciences (Springer, New York).
[16] Huang C, Wang S (2010) A power penalty approach to a nonlinear complementarity problem. Oper. Res. Lett. 38(1):72–76.
[17] Huang Y, Zhu Q (2021) Self-triggered Markov decision processes. Proc. 60th IEEE CDC (Austin, TX), 4507–4514.
[18] Ito K, Kunisch K (2006) Parabolic variational inequalities: The Lagrange multiplier approach. J. Mathématiques Pures Appliquées 85(3):415–449.
[19] Krueger D, Leike J, Evans O, Salvatier J (2020) Active reinforcement learning: Observing rewards at a cost. Preprint, submitted November 24, https://arxiv.org/abs/2011.06709.
[20] Kushner H (1964) On the optimum timing of observations for linear control systems with unknown initial state. IEEE Trans. Automatic Control 9(2):144–150.
[21] Kushner HJ, Dupuis PG (1992) Numerical Methods for Stochastic Control Problems in Continuous Time (Springer, Berlin).
[22] Meier L, Peschon J, Dressler R (1967) Optimal control of measurement subsystems. IEEE Trans. Automatic Control 12(5):528–536.
[23] Nayyar A, Başar T, Teneketzis D, Veeravalli VV (2013) Optimal strategies for communication and remote estimation with an energy harvesting sensor. IEEE Trans. Automatic Control 58(9):2246–2260.
[24] Pham H (2009) Continuous-Time Stochastic Control and Optimization with Financial Applications, 1st ed. (Springer, Berlin).
[25] Reis R (2006) Inattentive consumers. J. Monetary Econom. 53(8):1761–1800.
[26] Reis R (2006) Inattentive producers. Rev. Econom. Stud. 73(3):793–821.
[27] Reisinger C, Witte JH (2012) On the use of policy iteration as an easy way of pricing American options. SIAM J. Financial Math. 3(1):459–478.
[28] Reisinger C, Zhang Y (2019) A penalty scheme for monotone systems with interconnected obstacles: Convergence and error estimates. SIAM J. Numerical Anal. 57(4):1625–1648.
[29] Reisinger C, Zhang Y (2020) Error estimates of penalty schemes for quasi-variational inequalities arising from impulse control problems. SIAM J. Control Optim. 58(1):243–276.
[30] Reisinger C, Zhang Y (2021) A penalty scheme and policy iteration for nonlocal HJB variational inequalities with monotone nonlinearities. Comput. Math. Appl. 93:199–213.
[31] Rust J (1996) Numerical dynamic programming in economics. Amman HM, Kendrick DA, Rust J, eds. Handbook of Computational Economics, vol. 1 (Elsevier, Amsterdam), 619–729.
[32] Tzoumas V, Carlone L, Pappas GJ, Jadbabaie A (2020) LQG control and sensing co-design. IEEE Trans. Automatic Control 66(4):1468–1483.
[33] Winkelmann S (2013) Markov decision processes with information costs. PhD thesis, Freie Universität Berlin, Berlin.
[34] Winkelmann S, Schütte C, Von Kleist M (2014) Markov control processes with rare state observation: Theory and application to treatment scheduling in HIV-1. Comm. Math. Sci. 12(5):859–877.
[35] Witte JH, Reisinger C (2011) A penalty method for the numerical solution of Hamilton–Jacobi–Bellman (HJB) equations in finance. SIAM J. Numerical Anal. 49(1):213–231.
[36] Witte JH, Reisinger C (2012) Penalty methods for the solution of discrete HJB equations: Continuous control and obstacle problems. SIAM J. Numerical Anal. 50(2):595–625.
[37] Wu W, Arapostathis A (2008) Optimal sensor querying: General Markovian and LQG models with controlled observations. IEEE Trans. Automatic Control 53(6):1392–1405.
[38] Yoshioka H, Tsujimura M (2020) Analysis and computation of an optimality equation arising in an impulse control problem with discrete and costly observations. J. Comput. Appl. Math. 366:112399.
[39] Yoshioka H, Tsujimura M, Hamagami K, Yoshioka Y (2020) A hybrid stochastic river environmental restoration modeling with discrete and costly observations. Optimal Control Appl. Methods 41(6):1964–1994.
[40] Yoshioka H, Yaegashi Y, Tsujimura M, Yoshioka Y (2021) Cost-efficient monitoring of continuous-time stochastic processes based on discrete observations. Appl. Stochastic Models Bus. Indust. 37(1):113–138.
[41] Yoshioka H, Yoshioka Y, Yaegashi Y, Tanaka T, Horinouchi M, Aranishi F (2020) Analysis and computation of a discrete costly observation model for growth estimation and management of biological resources. Comput. Math. Appl. 79(4):1072–1093.

Mathematics of Operations Research

Volume 50, Issue 2

May 2025

Pages iii, 783-1583

Article Information

Received: June 05, 2023
Accepted: March 25, 2024
Published Online: May 23, 2024

Cite as

Christoph Reisinger, Jonathan Tam (2024) Markov Decision Processes with Observation Costs: Framework and Computation with a Penalty Scheme. Mathematics of Operations Research 50(2):1305-1332.

https://doi.org/10.1287/moor.2023.0172

Acknowledgments

The authors thank Prof. Dr. Dirk Becherer (Humboldt Universität zu Berlin) for his insightful suggestions during discussion as well as the two anonymous referees for their feedback.
