Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management

Published online. DOI: https://doi.org/10.1287/opre.2022.2263
