Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management
Published Online: 25 Mar 2022, https://doi.org/10.1287/opre.2022.2263

