Online Joint Assortment-Inventory Optimization Under MNL Choices

Yong Liang
[email protected]
https://orcid.org/0000-0002-7052-2248
Research Center for Contemporary Management and School of Economics and Management, Tsinghua University, Beijing 100084, China

Xiaojie Mao (Corresponding Author)
[email protected]
https://orcid.org/0000-0003-2985-1741
Research Center for Contemporary Management and School of Economics and Management, Tsinghua University, Beijing 100084, China

Shiyuan Wang
[email protected]
https://orcid.org/0009-0001-6199-3283
Department of Operations Management, College of Business, Shanghai University of Finance and Economics, Shanghai 200433, China

Published Online: 7 Apr 2026
https://doi.org/10.1287/opre.2023.0167

References

Agrawal R (1995) Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Adv. Appl. Probability 27(4):1054–1078.
Agrawal S (2019) Recent advances in multiarmed bandits for sequential decision making. Operations Research and Management Science in the Age of Analytics (INFORMS, Catonsville, MD), 167–188.
Agrawal S, Goyal N (2013) Further optimal regret bounds for Thompson sampling. Carvalho CM, Ravikumar P, eds. Proc. 16th Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 99–107.
Agrawal S, Goyal N (2017) Near-optimal regret bounds for Thompson sampling. J. ACM 64(5):1–24.
Agrawal S, Avadhanula V, Goyal V, Zeevi A (2017) Thompson sampling for the MNL-bandit. Kale S, Shamir O, eds. Proc. 2017 Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 65 (PMLR, New York), 76–78.
Agrawal S, Avadhanula V, Goyal V, Zeevi A (2019) MNL-bandit: A dynamic learning approach to assortment selection. Oper. Res. 67(5):1453–1485.
Aouad A, Segev D (2022) The stability of MNL-based demand under dynamic customer substitution and its algorithmic implications. Oper. Res. 71(4):1216–1249.
Aouad A, Levi R, Segev D (2018) Greedy-like algorithms for dynamic assortment planning under multinomial logit preferences. Oper. Res. 66(5):1321–1345.
Aouad A, Levi R, Segev D (2019) Approximation algorithms for dynamic assortment optimization models. Math. Oper. Res. 44(2):487–511.
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.
Ban GY, Keskin NB (2021) Personalized dynamic pricing with machine learning: High-dimensional features and heterogeneous elasticity. Management Sci. 67(9):5549–5568.
Bensoussan A, Çakanyıldırım M, Sethi SP (2007) A multiperiod newsvendor problem with partially observed demand. Math. Oper. Res. 32(2):322–344.
Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.
Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.
Chen B, Chao X (2020) Dynamic inventory control with stockout substitution and demand learning. Management Sci. 66(11):5108–5127.
Chen X, Wang Y (2018) A note on a tight lower bound for capacitated MNL-bandit assortment selection models. Oper. Res. Lett. 46(5):534–537.
Chen W, Wang Y, Yuan Y (2013) Combinatorial multi-armed bandit: General framework and applications. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 28 (PMLR, New York), 151–159.
Chen X, Wang Y, Zhou Y (2020) Dynamic assortment optimization with changing contextual information. J. Machine Learn. Res. 21(216):1–44.
Chen X, Wang Y, Zhou Y (2021a) Optimal policy for dynamic assortment planning under multinomial logit models. Math. Oper. Res. 46(4):1639–1657.
Chen X, Shi C, Wang Y, Zhou Y (2021b) Dynamic assortment planning under nested logit models. Production Oper. Management 30(1):85–102.
Cheung WC, Ma W, Simchi-Levi D, Wang X (2022) Inventory balancing with online learning. Management Sci. 68(3):1776–1807.
Ding X, Puterman ML, Bisi A (2002) The censored newsvendor and the optimal acquisition of information. Oper. Res. 50(3):517–527.
Dobson AJ, Barnett AG (2018) An Introduction to Generalized Linear Models (Chapman and Hall/CRC, New York).
Dzyabura D, Jagabathula S (2018) Offline assortment optimization in the presence of an online channel. Management Sci. 64(6):2767–2786.
Farahat A, Lee J (2018) The multiproduct newsvendor problem with customer choice. Oper. Res. 66(1):123–136.
Farias VF, Jagabathula S, Shah D (2013) A nonparametric approach to modeling choice with limited data. Management Sci. 59(2):305–322.
Feldman J, Zhang DJ, Liu X, Zhang N (2022) Customer choice models vs. machine learning: Finding optimal product displays on Alibaba. Oper. Res. 70(1):309–328.
Filippi S, Cappé O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Proc. 24th Internat. Conf. Neural Inform. Process. Syst., vol. 1 (Curran Associates Inc., Red Hook, NY), 586–594.
Gao X, Zhang H (2022) An efficient learning framework for multiproduct inventory systems with customer choices. Production Oper. Management 31(6):2492–2516.
Gao P, Ma Y, Chen N, Gallego G, Li A, Rusmevichientong P, Topaloglu H (2021) Assortment optimization and pricing under the multinomial logit model with impatient customers: Sequential recommendation and selection. Oper. Res. 69(5):1509–1532.
Goyal V, Levi R, Segev D (2016) Near-optimal algorithms for the assortment planning problem under dynamic substitution and stochastic demand. Oper. Res. 64(1):219–235.
Honhon D, Seshadri S (2013) Fixed vs. random proportions demand models for the assortment planning problem under stockout-based substitution. Manufacturing Service Oper. Management 15(3):378–386.
Honhon D, Gaur V, Seshadri S (2010) Assortment planning and inventory decisions under stockout-based substitution. Oper. Res. 58(5):1364–1379.
Kamishima T (2003) Nantonac collaborative filtering: Recommendation based on order responses. Proc. 9th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 583–588.
Kök AG, Fisher ML, Vaidyanathan R (2008) Assortment planning: Review of literature and industry practice. Retail Supply Chain Management 122(1):99–153.
Lai T, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.
Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
Li G, Rusmevichientong P, Topaloglu H (2015) The d-level nested logit model: Assortment and price optimization problems. Oper. Res. 63(2):325–342.
Liang A, Jasin S, Uichanco J (2021) Assortment and inventory planning under dynamic substitution with MNL model: An LP approach and an asymptotically optimal policy. Technical report, University of Michigan, Ann Arbor.
Lu X, Song JS, Zhu K (2005) On “the censored newsvendor and the optimal acquisition of information.” Oper. Res. 53(6):1024–1026.
Luo Y, Sun WW, Liu Y (2024) Distribution-free contextual dynamic pricing. Math. Oper. Res. 49(1):599–618.
Lyu J, Xie J, Yuan S, Zhou Y (2025) A minibatch stochastic gradient descent-based learning metapolicy for inventory systems with myopic optimal policy. Management Sci. 71(7):5572–5588.
Mahajan S, van Ryzin G (2001) Stocking retail assortments under dynamic consumer substitution. Oper. Res. 49(3):334–351.
Mouchtaki O, Housni OE, Gallego G, Goyal V, Humair S, Kim S, Sadighian A, et al. (2026) Joint assortment and inventory planning under the Markov chain choice model. Management Sci., ePub ahead of print February 4, https://doi.org/10.1287/mnsc.2023.01322.
Oh M, Iyengar G (2019) Thompson sampling for multinomial logit contextual bandits. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY).
Oh M, Iyengar G (2021) Multinomial logit contextual bandits: Provable optimality and practicality. Proc. AAAI Conf. Artificial Intelligence 35(10):9205–9213.
Rusmevichientong P, Shen ZJM, Shmoys DB (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Oper. Res. 58(6):1666–1680.
Rusmevichientong P, Shmoys D, Tong C, Topaloglu H (2014) Assortment optimization under the multinomial logit model with random choice parameters. Production Oper. Management 23(11):2023–2039.
Sauré D, Zeevi A (2013) Optimal dynamic assortment planning with demand learning. Manufacturing Service Oper. Management 15(3):387–404.
Segev D (2019) Assortment planning with nested preferences: Dynamic programming with distributions as states? Algorithmica 81(1):393–417.
Slivkins A (2019) Introduction to multi-armed bandits. Foundations Trends Machine Learn. 12(1–2):1–286.
Smith SA, Agrawal N (2000) Management of multi-item retail inventory systems with demand substitution. Oper. Res. 48(1):50–64.
Sumida M, Gallego G, Rusmevichientong P, Topaloglu H, Davis J (2021) Revenue-utility tradeoff in assortment optimization under the multinomial logit model with totally unimodular constraints. Management Sci. 67(5):2845–2869.
Sun S, Udwani R, Shen ZJM (2025) A unified algorithmic framework for dynamic assortment optimization under MNL choice. Proc. 26th ACM Conf. Econom. Comput. (ACM, New York), 789.
Talluri K, van Ryzin G (2004) Revenue management under a general discrete choice model of consumer behavior. Management Sci. 50(1):15–33.
Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.
van Ryzin G, Mahajan S (1999) On the relationship between inventory costs and variety benefits in retail assortments. Management Sci. 45(11):1496–1509.
Zhang J, Ma W, Topaloglu H (2025) Technical note—Leveraging the degree of dynamic substitution in assortment and inventory planning. Oper. Res. 73(3):1248–1259.

Articles In Advance

Information

Received: April 03, 2023
Accepted: February 08, 2026
Published Online: April 07, 2026

Cite as

Yong Liang, Xiaojie Mao, Shiyuan Wang (2026) Online Joint Assortment-Inventory Optimization Under MNL Choices. Operations Research 0(0).

https://doi.org/10.1287/opre.2023.0167

Acknowledgments

The authors thank the area editor, Professor Gustavo Vulcano, the anonymous associate editor, and the reviewers for insightful comments and suggestions that led to significant improvement of this paper. Authors are listed in alphabetical order.
