Markov Decision Processes with Observation Costs: Framework and Computation with a Penalty Scheme

Christoph Reisinger
https://orcid.org/0000-0003-4027-5298
Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
Jonathan Tam
Corresponding Author
https://orcid.org/0000-0003-1896-1056
Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom

Published Online: 23 May 2024
https://doi.org/10.1287/moor.2023.0172

Abstract

We consider Markov decision processes where the state of the chain is only given at chosen observation times and at a cost. Optimal strategies involve the optimization of observation times as well as the subsequent action values. We consider the finite horizon and discounted infinite horizon problems as well as an extension with parameter uncertainty. By including the time elapsed since the last observation as part of the augmented Markov system, the value function satisfies a system of quasivariational inequalities (QVIs). Such a class of QVIs can be seen as an extension of the interconnected obstacle problem. We prove a comparison principle for this class of QVIs, which implies the uniqueness of solutions to our proposed problem. Penalty methods are then utilized to obtain arbitrarily accurate solutions. Finally, we perform numerical experiments on three applications that illustrate our framework.
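
To make the augmented-state idea concrete, the following is a minimal sketch of the discounted infinite horizon case on a small finite chain. It is an illustration under stated assumptions, not the authors' scheme: it assumes the action is held fixed between observations, it forces an observation after `K` unobserved steps (a truncation introduced here only to keep the state space finite), and it uses plain value iteration rather than the penalty method studied in the paper. The function name `solve_with_observation_costs` and all parameter names are hypothetical.

```python
def propagate(belief, P_a):
    """One step of the chain: row vector times transition matrix."""
    n = len(P_a)
    return [sum(belief[i] * P_a[i][j] for i in range(n)) for j in range(n)]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def solve_with_observation_costs(P, r, c_obs, gamma=0.9, K=20,
                                 tol=1e-10, max_iter=10_000):
    """Value iteration on the augmented state (a, x, k).

    P[a][i][j]: transition probability i -> j under action a.
    r[a][i]:    running cost of action a in state i.
    c_obs:      cost paid each time the state is observed.
    Returns Q with Q[a][x][k] = expected discounted cost when the last
    observation was state x, k steps ago, and action a is being applied.
    Assumption (not from the source): an observation is forced at k = K.
    """
    A, n = len(P), len(P[0])
    # beliefs[a][x][k]: law of the state k steps after observing x under a
    beliefs = [[[None] * (K + 2) for _ in range(n)] for _ in range(A)]
    for a in range(A):
        for x in range(n):
            b = [1.0 if j == x else 0.0 for j in range(n)]
            for k in range(K + 2):
                beliefs[a][x][k] = b
                b = propagate(b, P[a])

    Q = [[[0.0] * (K + 1) for _ in range(n)] for _ in range(A)]
    for _ in range(max_iter):
        # W[y]: value right after observing y, when a fresh action is chosen
        W = [min(Q[a][y][0] for a in range(A)) for y in range(n)]
        Q_new = [[[0.0] * (K + 1) for _ in range(n)] for _ in range(A)]
        diff = 0.0
        for a in range(A):
            for x in range(n):
                for k in range(K + 1):
                    # Option 1: pay c_obs, learn the next state, re-optimize
                    observe = c_obs + dot(beliefs[a][x][k + 1], W)
                    # Option 2: keep the action and stay unobserved
                    wait = Q[a][x][k + 1] if k < K else float("inf")
                    running = dot(beliefs[a][x][k], r[a])
                    Q_new[a][x][k] = running + gamma * min(wait, observe)
                    diff = max(diff, abs(Q_new[a][x][k] - Q[a][x][k]))
        Q = Q_new
        if diff < tol:
            break
    return Q
```

In this sketch the minimum over "wait" and "observe" plays the role of the obstacle in the QVI: the value of continuing unobserved can never exceed the value of paying the observation cost and restarting from the newly observed state. With `c_obs = 0` observing is always at least as good as waiting, so the value at `k = 0` reduces to that of the fully observed chain.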

Funding: J. Tam is supported by the Engineering and Physical Sciences Research Council [Grant 2269738].

Mathematics of Operations Research

Volume 50, Issue 2

May 2025

Pages iii, 783-1583

Article Information

Received: June 05, 2023
Accepted: March 25, 2024
Published Online: May 23, 2024

Cite as

Christoph Reisinger, Jonathan Tam (2024) Markov Decision Processes with Observation Costs: Framework and Computation with a Penalty Scheme. Mathematics of Operations Research 50(2):1305-1332.

https://doi.org/10.1287/moor.2023.0172

Acknowledgments

The authors thank Prof. Dr. Dirk Becherer (Humboldt-Universität zu Berlin) for his insightful suggestions during discussions, as well as the two anonymous referees for their feedback.
