On Strategic Measures and Optimality Properties in Discrete-Time Stochastic Control with Universally Measurable Policies

Huizhen Yu
https://orcid.org/0000-0002-3673-0094
Department of Computing Science, University of Alberta, Edmonton, Alberta T6G2E8, Canada

Published Online: 24 Oct 2023
https://doi.org/10.1287/moor.2022.0188

References

[1] Arapostathis A, Borkar VS, Fernández-Gaucherand E, Ghosh MK, Marcus SI (1993) Discrete-time controlled Markov processes with average cost criterion: A survey. SIAM J. Control Optim. 31(2):282–344.
[2] Ash R (1972) Real Analysis and Probability (Academic Press, New York).
[3] Balder EJ (1989) On compactness of the space of policies in stochastic dynamic programming. Stochastic Proc. Appl. 32:141–150.
[4] Bäuerle N, Rieder U (2014) More risk-sensitive Markov decision processes. Math. Oper. Res. 39:105–120.
[5] Bertsekas DP, Shreve SE (1978) Stochastic Optimal Control: The Discrete Time Case (Academic Press, New York).
[6] Bierth KJ (1987) An expected average reward criterion. Stochastic Proc. Appl. 26:123–140.
[7] Blackwell D (1968) A Borel set not containing a graph. Ann. Math. Statist. 39:1345–1347.
[8] Blackwell D (1976) The stochastic processes of Borel gambling and dynamic programming. Ann. Statist. 4:370–374.
[9] Blackwell D, Ryll-Nardzewski C (1963) Non-existence of everywhere proper conditional distributions. Ann. Math. Statist. 34:223–225.
[10] Blackwell D, Freedman D, Orkin M (1974) The optimal reward operator in dynamic programming. Ann. Probability 2(5):926–941.
[11] Borkar VS (2002) Convex analytic methods in Markov decision processes. Feinberg EA, Shwartz A, eds. Handbook of Markov Decision Processes: Methods and Applications (Springer Science+Business Media, New York), 347–375.
[12] Brown LD, Purves R (1973) Measurable selections of extrema. Ann. Statist. 1:902–912.
[13] Cavazos-Cadena R, Salem-Silva F (2010) The discounted method and equivalence of average criteria for risk-sensitive Markov decision processes on Borel spaces. Appl. Math. Optim. 61:167–190.
[14] Dubins LE, Freedman D (1964) Measurable sets of measures. Pacific J. Math. 14:1211–1222.
[15] Dudley RM (2002) Real Analysis and Probability (Cambridge University Press, Cambridge, UK).
[16] Dynkin EB, Yushkevich AA (1979) Controlled Markov Processes (Springer, New York).
[17] Feinberg EA (1980) An ϵ-optimality control of a finite Markov chain with an average reward criterion. Theory Probability Appl. 25(1):70–81.
[18] Feinberg EA (1982a) Controlled Markov processes with arbitrary numerical criteria. Theory Probability Appl. 27(3):486–503.
[19] Feinberg EA (1982b) Non-randomized Markov and semi-Markov strategies in dynamic programming. Theory Probability Appl. 27(1):116–126.
[20] Feinberg EA (1991) Non-randomized strategies in stochastic decision processes. Ann. Oper. Res. 29:315–332.
[21] Feinberg EA (1996) On measurability and representation of strategic measures in Markov decision processes. Statistics, Probability and Game Theory, Lecture Notes–Monograph Series (IMS), vol. 30, 29–43.
[22] Feinberg EA, Kasyanov PO (2021) MDPs with setwise continuous transition probabilities. Oper. Res. Lett. 49:734–740.
[23] Feinberg EA, Kasyanov PO, Liang Y (2020) Fatou’s lemma in its classic form and Lebesgue’s convergence theorems for varying measures with applications to MDPs. Theory Probability Appl. 65:270–291.
[24] Gikhman II, Skorokhod AV (1979) Controlled Stochastic Processes (Springer, New York).
[25] Gödel K (1938) The consistency of the axiom of choice and of the generalized continuum hypothesis. Proc. National Acad. Sci. USA 24:556–557.
[26] González-Trejo JI, Hernández-Lerma O, Hoyos-Reyes LF (2002) Minimax control of discrete-time stochastic systems. SIAM J. Control Optim. 41(5):1626–1659.
[27] Hernández-Lerma O, Lasserre JB (1996) Discrete-Time Markov Control Processes: Basic Optimality Criteria (Springer, New York).
[28] Hernández-Lerma O, Lasserre JB (1999) Further Topics on Discrete-Time Markov Control Processes (Springer, New York).
[29] Hernández-Lerma O, Lasserre JB (2002) The linear programming approach. Feinberg EA, Shwartz A, eds. Handbook of Markov Decision Processes: Methods and Applications (Springer Science+Business Media, New York).
[30] Jaśkiewicz A (2007) Average optimality for risk-sensitive control with general state space. Ann. Appl. Probability 17:654–675.
[31] Jaśkiewicz A, Nowak AS (2014) Robust Markov control processes. J. Math. Anal. Appl. 420:1337–1353.
[32] Jaśkiewicz A, Nowak AS (2018) Zero-sum stochastic games. Başar T, Zaccour G, eds. Handbook of Dynamic Game Theory (Springer International Publishing, Cham, Switzerland), 215–279.
[33] Kechris AS (1995) Classical Descriptive Set Theory (Springer-Verlag, New York).
[34] Koellner P (2014) Large cardinals and determinacy. Zalta EN, ed. The Stanford Encyclopedia of Philosophy (Metaphysics Research Laboratory, Stanford University, Stanford, CA). https://plato.stanford.edu/archives/spr2014/entries/large-cardinals-determinacy/.
[35] Krengel U (1985) Ergodic Theorems (Walter de Gruyter, Berlin).
[36] Maitra A, Sudderth W (1993) Borel stochastic games with limsup payoffs. Ann. Probability 21:861–885.
[37] Maitra A, Sudderth W (1998) Finitely additive stochastic games with Borel measurable payoffs. Internat. J. Game Theory 27:257–267.
[38] Maitra A, Purves R, Sudderth W (1990) Leavable gambling problems with unbounded utilities. Trans. Amer. Math. Soc. 320:543–567.
[39] Martin DA, Steel JR (1988) Projective determinacy. Proc. National Acad. Sci. USA 85:6582–6586.
[40] Nowak AS (1985) Universally measurable strategies in zero-sum stochastic games. Ann. Probability 13(1):269–287.
[41] Nowak AS (2010) On measurable minimax selectors. J. Math. Anal. Appl. 366:385–388.
[42] Parthasarathy KR (1967) Probability Measures on Metric Spaces (Academic Press, New York).
[43] Prikry K, Sudderth WD (2016) Measurability of the value of a parametrized game. Internat. J. Game Theory 45:675–683.
[44] Rieder U (1991) Non-cooperative dynamic games with general utility functions. Raghavan TES, Ferguson TS, Parthasarathy T, Vrieze OJ, eds. Stochastic Games and Related Topics (Kluwer, Dordrecht, Netherlands), 161–174.
[45] Schäl M (1975) On dynamic programming: Compactness of the space of policies. Stochastic Proc. Appl. 3:345–364.
[46] Shapiro A, Dentcheva D, Ruszczyński A (2021) Lectures on Stochastic Programming: Modeling and Theory, 3rd ed. (Society for Industrial and Applied Mathematics and Mathematical Optimization Society, Philadelphia).
[47] Shreve SE, Bertsekas DP (1978) Alternative theoretical frameworks for finite horizon discrete-time stochastic optimal control. SIAM J. Control Optim. 16(6):953–978.
[48] Shreve SE, Bertsekas DP (1979) Universally measurable policies in dynamic programming. Math. Oper. Res. 4(1):15–30.
[49] Strauch RE (1966) Negative dynamic programming. Ann. Math. Statist. 37:871–890.
[50] Sudderth W (1969) On the existence of good stationary strategies. Trans. Amer. Math. Soc. 135:399–414.
[51] Vega-Amaya O (2018) Solutions of the average cost optimality equation for Markov decision processes with weakly continuous kernel: The fixed-point approach revisited. J. Math. Anal. Appl. 464:152–163.
[52] Wei C, Fauß M, Chapman MP (2022) CVaR-based safety analysis in the infinite time horizon setting. Proc. Amer. Control Conf. (IEEE, Piscataway, NJ), 2863–2870.
[53] Yu H (2015) On convergence of value iteration for a class of total cost Markov decision processes. SIAM J. Control Optim. 53(4):1982–2016.
[54] Yu H (2020) Average cost optimality inequality for Markov decision processes with Borel spaces and universally measurable policies. SIAM J. Control Optim. 58(4):2469–2502.
[55] Yu H (2022) On structural properties of optimal average cost functions in Markov decision processes with Borel spaces and universally measurable policies. J. Math. Anal. Appl. 509:125954.
[56] Yu H (2023) On strategic measures and optimality properties in discrete-time stochastic control with universally measurable policies. Preprint, https://arxiv.org/abs/2206.06492.
[57] Yüksel S (2020) A universal dynamic program and refined existence results for decentralized stochastic control. SIAM J. Control Optim. 58(5):2711–2739.
[58] Yüksel S, Saldi N (2017) Convex analysis in decentralized stochastic control, strategic measures, and optimal solutions. SIAM J. Control Optim. 55(1):1–28.

Mathematics of Operations Research, Volume 49, Issue 3, August 2024 (issue pages 1303–2047, C2)

Article Information

Received: July 17, 2022
Accepted: July 09, 2023
Published Online: October 24, 2023

Cite as

Huizhen Yu (2023) On Strategic Measures and Optimality Properties in Discrete-Time Stochastic Control with Universally Measurable Policies. Mathematics of Operations Research 49(3):1734-1760.

https://doi.org/10.1287/moor.2022.0188

Acknowledgments

The author is grateful to Professor Eugene Feinberg for pointing to several important references on strategic measures and stochastic games and for valuable feedback on earlier versions of this work. The author also thanks Professor William Sudderth for mentioning the early work (Maitra et al. [38]) on Borel gambling problems, which also used Kondô’s uniformization theorem in its analysis; Professor Serdar Yüksel for helpful discussion on minimax control problems; and an anonymous reviewer, whose critical comments helped improve the paper. This paper is dedicated to the memory of Professor Sanjoy K. Mitter, an inspiring mentor.
