Information Relaxation Bounds for Infinite Horizon Markov Decision Processes

David B. Brown
Corresponding Author
David B. Brown
[email protected]
http://orcid.org/0000-0002-5458-9098
Fuqua School of Business, Duke University, Durham, North Carolina 27708
Search for more papers by this author
,
Martin B. Haugh
Martin B. Haugh
[email protected]
Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027
Search for more papers by this author

David B. Brown

Corresponding Author

David B. Brown

[email protected]

http://orcid.org/0000-0002-5458-9098

Fuqua School of Business, Duke University, Durham, North Carolina 27708

Search for more papers by this author

Martin B. Haugh

[email protected]

Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027

Search for more papers by this author

Published Online:2 Aug 2017https://doi.org/10.1287/opre.2017.1631

References

Adelman D, Mersereau AJ (2008) Relaxations of weakly coupled stochastic dynamic programs. Oper. Res. 56(3):712–727.Link, Google Scholar
Andersen L, Broadie M (2004) Primal-dual simulation algorithm for pricing multidimensional American options. Management Sci. 50(9):1222–1234.Link, Google Scholar
Ansell PS, Glazebrook KD, Niño-Mora J, O’Keeffe M (2003) Whittle’s index policy for a multi-class queueing system with convex holding costs. Math. Methods Oper. Res. 57(1):21–39.Crossref, Google Scholar
Bertsekas DP, Shreve SE (1996) Stochastic Optimal Control: The Discrete-Time Case (Athena Scientific, Belmont, MA).Google Scholar
Brown DB, Smith JE (2011) Dynamic portfolio optimization with transaction costs: Heuristics and dual bounds. Management Sci. 57(10):1752–1770.Link, Google Scholar
Brown DB, Smith JE (2014) Information relaxations, duality, and convex stochastic dynamic programs. Oper. Res. 62(6):1394–1415.Link, Google Scholar
Brown DB, Smith JE, Sun P (2010) Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4, Part 1):785–801.Link, Google Scholar
Chen N, Glasserman P (2007) Additive and multiplicative duals for American option pricing. Finance Stochastics 11(2):153–179.Crossref, Google Scholar
Cox DR, Smith WL (1961) Queues (Methuen, London).Google Scholar
de Farias DP, Van Roy B (2003) The linear programming approach to approximate dynamic programming. Oper. Res. 51(6):850–865.Link, Google Scholar
Desai VV, Farias VF, Moallemi CC (2011) Bounds for Markov decision processes. Lewis FL, Liu D, eds. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (IEEE Press, Piscataway, NJ), 452–473.Google Scholar
Desai VV, Farias VF, Moallemi CC (2012) Pathwise optimization for optimal stopping problems. Management Sci. 58(12):2292–2308.Link, Google Scholar
Devalkar S, Anupindi R, Sinha A (2011) Integrated optimization of procurement, processing, and trade of commodities. Oper. Res. 59(6):1369–1381.Link, Google Scholar
Feinberg EA (2011) Total expected discounted reward MDPs: Existence of optimal policies. Cochran JJ, ed. Wiley Encyclopedia of Operations Research and Management Science (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
Feinberg EA, Huang J (2015) On the reduction of total-cost and average-cost MDPs to discounted MDPs. Preprint arXiv:1507.00664.Google Scholar
Glasserman P (2004) Monte Carlo Methods in Financial Engineering (Springer-Verlag, New York).Crossref, Google Scholar
Graves SC (1999) A single-item inventory model for a nonstationary demand process. Manufacturing Service Oper. Management 1(1):50–61.Link, Google Scholar
Harrison JM (1975) Dynamic scheduling of a multiclass queue: Discount optimality. Oper. Res. 23(2):270–281.Link, Google Scholar
Haugh MB, Kogan L (2004) Pricing American options: A duality approach. Oper. Res. 52(2):258–270.Link, Google Scholar
Haugh MB, Wang C (2014a) Dynamic portfolio execution and information relaxations. SIAM J. Financial Math. 5(1):316–359.Crossref, Google Scholar
Haugh MB, Wang C (2014b) Information relaxations and dynamic zero-sum games. Working paper, Columbia University, New York.Google Scholar
Haugh MB, Iyengar G, Wang C (2016) Tax-aware dynamic asset allocation. Oper. Res. 64(4):849–866.Link, Google Scholar
Hawkins J (2003) A Lagrangian decomposition approach to weakly coupled dynamic optimization problems and its applications. Ph.D. thesis, Operations Research Center, Massachusetts Institute of Technology, Cambridge.Google Scholar
Henderson SG, Glynn PW (2002) Approximating martingales for variance reduction in Markov process simulation. Math. Oper. Res. 27(2):253–271.Link, Google Scholar
Johnson G, Thompson H (1975) Optimality of myopic inventory policies for certain dependent demand processes. Management Sci. 21(11):1303–1307.Link, Google Scholar
Kim MJ, Lim AEB (2016) Robust multiarmed bandit problems. Management Sci. 62(1):264–285.Link, Google Scholar
Kogan L, Mitra I (2013) Accuracy verification for numerical solutions of equilibrium models. Working paper, Massachusetts Institute of Technology, Cambridge.Google Scholar
Lai G, Margot F, Secomandi N (2010) An approximate dynamic programming approach to benchmark practice-based heuristics for natural gas storage valuation. Oper. Res. 58(3):564–582.Link, Google Scholar
Lu X, Song J-S, Regan A (2006) Inventory planning with forecast updates: Approximate solutions and cost error bounds. Oper. Res. 54(6):1079–1097.Link, Google Scholar
Niño-Mora J (2006) Marginal productivity index policies for scheduling a multiclass delay-/loss-sensitive queue. Queueing Syst. 54(4):281–312.Crossref, Google Scholar
Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, New York).Crossref, Google Scholar
Rogers LCG (2002) Monte Carlo valuation of American options. Math. Finance 12(3):271–286.Crossref, Google Scholar
Rogers LCG (2007) Pathwise stochastic optimal control. SIAM J. Control Optim. 46(3):1116–1132.Crossref, Google Scholar
Rudin W (1987) Real and Complex Analysis (McGraw-Hill, New York).Google Scholar
Van Mieghem JA (1995) Dynamic scheduling with convex delay costs: The generalized cμ rule. Ann. Appl. Probab. 5(3):809–833.Crossref, Google Scholar
Veatch MH, Wein LM (1996) Scheduling a make-to-stock queue: Index policies and hedging points. Oper. Res. 44(4):634–647.Link, Google Scholar
Veinott AF Jr (1965) Optimal policy for a multi-product, dynamic, nonstationary inventory system. Management Sci. 12(3):206–222.Link, Google Scholar
Veinott AF Jr (1969) Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. 40(5):1635–1660.Crossref, Google Scholar
Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25A:287–298.Crossref, Google Scholar
Williams D (1991) Probability with Martingales (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Ye F, Zhu H, Zhou E (2014) Weakly coupled dynamic program: Information and Lagrangian relaxations. Working paper, Georgia Institute of Technology, Atlanta.Google Scholar

Volume 65, Issue 5

September-October 2017

Pages ii-v, 1115-1428

Article Information

Metrics

Information

Received:September 04, 2014
Accepted:March 30, 2017
Published Online:August 02, 2017

Cite as

David B. Brown, Martin B. Haugh (2017) Information Relaxation Bounds for Infinite Horizon Markov Decision Processes. Operations Research 65(5):1355-1379.

https://doi.org/10.1287/opre.2017.1631

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Information Relaxation Bounds for Infinite Horizon Markov Decision Processes

References

Volume 65, Issue 5

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News