Marrying Stochastic Gradient Descent with Bandits: Learning Algorithms for Inventory Systems with Fixed Costs

Hao Yuan (Corresponding Author)
https://orcid.org/0000-0003-1738-8673
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48105

Qi Luo (Corresponding Author)
https://orcid.org/0000-0002-4103-7112
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48105

Cong Shi (Corresponding Author)
https://orcid.org/0000-0003-3564-3391
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48105

Published Online: 8 Feb 2021, https://doi.org/10.1287/mnsc.2020.3799

Volume 67, Issue 10

October 2021

Pages 5969-6627, iii-iv

Received: February 11, 2019
Accepted: August 2, 2020
Published Online: February 8, 2021

Cite as

Hao Yuan, Qi Luo, Cong Shi (2021) Marrying Stochastic Gradient Descent with Bandits: Learning Algorithms for Inventory Systems with Fixed Costs. Management Science 67(10):6089–6115.

https://doi.org/10.1287/mnsc.2020.3799
