Online Joint Assortment-Inventory Optimization Under MNL Choices

Published Online: https://doi.org/10.1287/opre.2023.0167

References

  • Agrawal R (1995) Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Adv. Appl. Probability 27(4):1054–1078.
  • Agrawal S (2019) Recent advances in multiarmed bandits for sequential decision making. Operations Research and Management Science in the Age of Analytics (INFORMS, Catonsville, MD), 167–188.
  • Agrawal S, Goyal N (2013) Further optimal regret bounds for Thompson sampling. Carvalho CM, Ravikumar P, eds. Proc. 16th Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 99–107.
  • Agrawal S, Goyal N (2017) Near-optimal regret bounds for Thompson sampling. J. ACM 64(5):1–24.
  • Agrawal S, Avadhanula V, Goyal V, Zeevi A (2017) Thompson sampling for the MNL-bandit. Kale S, Shamir O, eds. Proc. 2017 Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 65 (PMLR, New York), 76–78.
  • Agrawal S, Avadhanula V, Goyal V, Zeevi A (2019) MNL-bandit: A dynamic learning approach to assortment selection. Oper. Res. 67(5):1453–1485.
  • Aouad A, Segev D (2022) The stability of MNL-based demand under dynamic customer substitution and its algorithmic implications. Oper. Res. 71(4):1216–1249.
  • Aouad A, Levi R, Segev D (2018) Greedy-like algorithms for dynamic assortment planning under multinomial logit preferences. Oper. Res. 66(5):1321–1345.
  • Aouad A, Levi R, Segev D (2019) Approximation algorithms for dynamic assortment optimization models. Math. Oper. Res. 44(2):487–511.
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.
  • Ban GY, Keskin NB (2021) Personalized dynamic pricing with machine learning: High-dimensional features and heterogeneous elasticity. Management Sci. 67(9):5549–5568.
  • Bensoussan A, Çakanyıldırım M, Sethi SP (2007) A multiperiod newsvendor problem with partially observed demand. Math. Oper. Res. 32(2):322–344.
  • Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.
  • Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.
  • Chen B, Chao X (2020) Dynamic inventory control with stockout substitution and demand learning. Management Sci. 66(11):5108–5127.
  • Chen X, Wang Y (2018) A note on a tight lower bound for capacitated MNL-bandit assortment selection models. Oper. Res. Lett. 46(5):534–537.
  • Chen W, Wang Y, Yuan Y (2013) Combinatorial multi-armed bandit: General framework and applications. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 28 (PMLR, New York), 151–159.
  • Chen X, Wang Y, Zhou Y (2020) Dynamic assortment optimization with changing contextual information. J. Machine Learn. Res. 21(216):1–44.
  • Chen X, Wang Y, Zhou Y (2021a) Optimal policy for dynamic assortment planning under multinomial logit models. Math. Oper. Res. 46(4):1639–1657.
  • Chen X, Shi C, Wang Y, Zhou Y (2021b) Dynamic assortment planning under nested logit models. Production Oper. Management 30(1):85–102.
  • Cheung WC, Ma W, Simchi-Levi D, Wang X (2022) Inventory balancing with online learning. Management Sci. 68(3):1776–1807.
  • Ding X, Puterman ML, Bisi A (2002) The censored newsvendor and the optimal acquisition of information. Oper. Res. 50(3):517–527.
  • Dobson AJ, Barnett AG (2018) An Introduction to Generalized Linear Models (Chapman and Hall/CRC, New York).
  • Dzyabura D, Jagabathula S (2018) Offline assortment optimization in the presence of an online channel. Management Sci. 64(6):2767–2786.
  • Farahat A, Lee J (2018) The multiproduct newsvendor problem with customer choice. Oper. Res. 66(1):123–136.
  • Farias VF, Jagabathula S, Shah D (2013) A nonparametric approach to modeling choice with limited data. Management Sci. 59(2):305–322.
  • Feldman J, Zhang DJ, Liu X, Zhang N (2022) Customer choice models vs. machine learning: Finding optimal product displays on Alibaba. Oper. Res. 70(1):309–328.
  • Filippi S, Cappé O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Proc. 24th Internat. Conf. Neural Inform. Process. Syst., vol. 1 (Curran Associates Inc., Red Hook, NY), 586–594.
  • Gao X, Zhang H (2022) An efficient learning framework for multiproduct inventory systems with customer choices. Production Oper. Management 31(6):2492–2516.
  • Gao P, Ma Y, Chen N, Gallego G, Li A, Rusmevichientong P, Topaloglu H (2021) Assortment optimization and pricing under the multinomial logit model with impatient customers: Sequential recommendation and selection. Oper. Res. 69(5):1509–1532.
  • Goyal V, Levi R, Segev D (2016) Near-optimal algorithms for the assortment planning problem under dynamic substitution and stochastic demand. Oper. Res. 64(1):219–235.
  • Honhon D, Seshadri S (2013) Fixed vs. random proportions demand models for the assortment planning problem under stockout-based substitution. Manufacturing Service Oper. Management 15(3):378–386.
  • Honhon D, Gaur V, Seshadri S (2010) Assortment planning and inventory decisions under stockout-based substitution. Oper. Res. 58(5):1364–1379.
  • Kamishima T (2003) Nantonac collaborative filtering: Recommendation based on order responses. Proc. 9th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 583–588.
  • Kök AG, Fisher ML, Vaidyanathan R (2008) Assortment planning: Review of literature and industry practice. Retail Supply Chain Management 122(1):99–153.
  • Lai T, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
  • Li G, Rusmevichientong P, Topaloglu H (2015) The d-level nested logit model: Assortment and price optimization problems. Oper. Res. 63(2):325–342.
  • Liang A, Jasin S, Uichanco J (2021) Assortment and inventory planning under dynamic substitution with MNL model: An LP approach and an asymptotically optimal policy. Technical report, University of Michigan, Ann Arbor.
  • Lu X, Song JS, Zhu K (2005) On “the censored newsvendor and the optimal acquisition of information.” Oper. Res. 53(6):1024–1026.
  • Luo Y, Sun WW, Liu Y (2024) Distribution-free contextual dynamic pricing. Math. Oper. Res. 49(1):599–618.
  • Lyu J, Xie J, Yuan S, Zhou Y (2025) A minibatch stochastic gradient descent-based learning metapolicy for inventory systems with myopic optimal policy. Management Sci. 71(7):5572–5588.
  • Mahajan S, van Ryzin G (2001) Stocking retail assortments under dynamic consumer substitution. Oper. Res. 49(3):334–351.
  • Mouchtaki O, Housni OE, Gallego G, Goyal V, Humair S, Kim S, Sadighian A, et al. (2026) Joint assortment and inventory planning under the Markov chain choice model. Management Sci., ePub ahead of print February 4, https://doi.org/10.1287/mnsc.2023.01322.
  • Oh M, Iyengar G (2019) Thompson sampling for multinomial logit contextual bandits. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY).
  • Oh M, Iyengar G (2021) Multinomial logit contextual bandits: Provable optimality and practicality. Proc. AAAI Conf. Artificial Intelligence 35(10):9205–9213.
  • Rusmevichientong P, Shen ZJM, Shmoys DB (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Oper. Res. 58(6):1666–1680.
  • Rusmevichientong P, Shmoys D, Tong C, Topaloglu H (2014) Assortment optimization under the multinomial logit model with random choice parameters. Production Oper. Management 23(11):2023–2039.
  • Sauré D, Zeevi A (2013) Optimal dynamic assortment planning with demand learning. Manufacturing Service Oper. Management 15(3):387–404.
  • Segev D (2019) Assortment planning with nested preferences: Dynamic programming with distributions as states? Algorithmica 81(1):393–417.
  • Slivkins A (2019) Introduction to multi-armed bandits. Foundations Trends Machine Learn. 12(1–2):1–286.
  • Smith SA, Agrawal N (2000) Management of multi-item retail inventory systems with demand substitution. Oper. Res. 48(1):50–64.
  • Sumida M, Gallego G, Rusmevichientong P, Topaloglu H, Davis J (2021) Revenue-utility tradeoff in assortment optimization under the multinomial logit model with totally unimodular constraints. Management Sci. 67(5):2845–2869.
  • Sun S, Udwani R, Shen ZJM (2025) A unified algorithmic framework for dynamic assortment optimization under MNL choice. Proc. 26th ACM Conf. Econom. Comput. (ACM, New York), 789.
  • Talluri K, van Ryzin G (2004) Revenue management under a general discrete choice model of consumer behavior. Management Sci. 50(1):15–33.
  • Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.
  • van Ryzin G, Mahajan S (1999) On the relationship between inventory costs and variety benefits in retail assortments. Management Sci. 45(11):1496–1509.
  • Zhang J, Ma W, Topaloglu H (2025) Technical note—Leveraging the degree of dynamic substitution in assortment and inventory planning. Oper. Res. 73(3):1248–1259.