Dynamic Programs with Shared Resources and Signals: Dynamic Fluid Policies and Asymptotic Optimality

David B. Brown
https://orcid.org/0000-0002-5458-9098
Fuqua School of Business, Duke University, Durham, North Carolina 27708
Jingwei Zhang
https://orcid.org/0000-0003-2684-284X
Fuqua School of Business, Duke University, Durham, North Carolina 27708

Published Online: 16 Dec 2021
https://doi.org/10.1287/opre.2021.2181

Volume 70, Issue 5

September-October 2022

Pages iii-vi, 2597-3033, C2-C3

Information

Received: November 16, 2020
Accepted: July 25, 2021
Published Online: December 16, 2021

Cite as

David B. Brown, Jingwei Zhang (2021) Dynamic Programs with Shared Resources and Signals: Dynamic Fluid Policies and Asymptotic Optimality. Operations Research 70(5):3015-3033.

https://doi.org/10.1287/opre.2021.2181

Acknowledgments

The authors thank the referee team (the area editor, an anonymous associate editor, and two anonymous referees) for a number of insightful comments that improved the paper.
