Dynamic Programs with Shared Resources and Signals: Dynamic Fluid Policies and Asymptotic Optimality

Published Online: https://doi.org/10.1287/opre.2021.2181

References

  • Adelman D, Mersereau AJ (2008) Relaxations of weakly coupled stochastic dynamic programs. Oper. Res. 56(3):712–727.
  • Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learning, Atlanta, June 17–19, 127–135.
  • Ahuja V, Birge JR (2016) Response-adaptive designs for clinical trials: Simultaneous learning from multiple patients. Eur. J. Oper. Res. 248(2):619–633.
  • Balseiro SR, Brown DB, Chen C (2021) Dynamic pricing of relocating resources in large networks. Management Sci. 67(7):4075–4094.
  • Bertsimas D, Mersereau AJ (2007) A learning approach for interactive marketing to a customer segment. Oper. Res. 55(6):1120–1135.
  • Bertsimas D, Mišić VV (2016) Decomposable Markov decision processes: A fluid optimization approach. Oper. Res. 64(6):1537–1555.
  • Bhat N, Farias VF, Moallemi CC, Sinha D (2020) Near-optimal A-B testing. Management Sci. 66(10):4477–4495.
  • Birge JR, Zhao G (2008) Successive linear approximation solution of infinite-horizon dynamic stochastic programs. SIAM J. Optim. 18(4):1165–1186.
  • Bray RL (2019) Markov decision processes with exogenous variables. Management Sci. 65(10):4598–4606.
  • Brown DB, Smith JE (2020) Index policies and performance bounds for dynamic selection problems. Management Sci. 66(7):3029–3050.
  • Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.
  • Chen F, Song J-S (2001) Optimal policies for multiechelon inventory problems with Markov-modulated demand. Oper. Res. 49(2):226–234.
  • Chen Z-L, Powell WB (1999) Convergent cutting-plane and partial-sampling algorithm for multistage stochastic linear programs with recourse. J. Optim. Theory Appl. 102(3):497–524.
  • Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. 21st Annual Conf. Learning Theory, Helsinki, Finland, July 9–12, 355–366.
  • DeCroix GA, Arreola-Risa A (1998) Optimal production and inventory policy for multiple products under resource constraints. Management Sci. 44(7):950–961.
  • Erkip N, Hausman WH, Nahmias S (1990) Optimal centralized ordering policies in multi-echelon inventory systems with correlated demands. Management Sci. 36(3):381–392.
  • Hawkins JT (2003) A Lagrangian decomposition approach to weakly coupled dynamic optimization problems and its applications. Unpublished doctoral dissertation, Massachusetts Institute of Technology.
  • Hu W, Frazier P (2017) An asymptotically optimal index policy for finite-horizon restless bandits. Preprint, submitted July 1, https://arxiv.org/abs/1707.00205.
  • Linowsky K, Philpott AB (2005) On the convergence of sampling-based decomposition algorithms for multistage stochastic programs. J. Optim. Theory Appl. 125(2):349–366.
  • Miao S, Jasin S, Chao X (2020) Asymptotically optimal Lagrangian policies for one-warehouse multi-store system with lost sales. Preprint, submitted April 6, https://doi.org/10.2139/ssrn.3552995.
  • North CM, Dougan ML, Sacks CA (2020) Improving clinical trial enrollment—in the Covid-19 era and beyond. New England J. Medicine 383(15):1406–1408.
  • Pereira MV, Pinto LM (1991) Multi-stage stochastic optimization applied to energy planning. Math. Programming 52(1-3):359–375.
  • Philpott AB, Guan Z (2008) On the convergence of stochastic dual dynamic programming and related methods. Oper. Res. Lett. 36(4):450–455.
  • Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, New York).
  • Russo D, Van Roy B (2018) Learning to optimize via information-directed sampling. Oper. Res. 66(1):230–252.
  • Shapiro A (2011) Analysis of stochastic dual dynamic programming method. Eur. J. Oper. Res. 209(1):63–72.
  • Song J-S, Zipkin P (1993) Inventory control in a fluctuating demand environment. Oper. Res. 41(2):351–370.
  • Topaloglu H (2009) Using Lagrangian relaxation to compute capacity-dependent bid prices in network revenue management. Oper. Res. 57(3):637–649.
  • Weber RR, Weiss G (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.
  • Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25:287–298.
  • Zayas-Caban G, Jasin S, Wang G (2019) An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits. Adv. Appl. Probab. 51(3):745–772.