Rate-Optimal Online Learning for Dynamic Assortment Selection with Positioning

Yiyun Luo
[email protected]
https://orcid.org/0009-0004-3788-1902
School of Statistics and Data Science, Shanghai University of Finance and Economics, Shanghai 200433, China

Will Wei Sun
[email protected]
https://orcid.org/0000-0002-8412-6430
Daniels School of Business, Purdue University, West Lafayette, Indiana 47907

Yufeng Liu (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-1686-0545
Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599

Published Online: 11 Aug 2025
https://doi.org/10.1287/opre.2024.1556

Volume 74, Issue 1

January-February 2026

Pages iii-vii, 1-571, C2-C3

Information

Received: March 04, 2024
Accepted: June 23, 2025
Published Online: August 11, 2025

Cite as

Yiyun Luo, Will Wei Sun, Yufeng Liu (2025) Rate-Optimal Online Learning for Dynamic Assortment Selection with Positioning. Operations Research 74(1):224-242.

https://doi.org/10.1287/opre.2024.1556

Acknowledgments

The authors thank the editor-in-chief (Amy R. Ward) and area editor (Xi Chen) for guidance and oversight throughout the review process and the associate editor and anonymous reviewers for insightful comments and constructive suggestions. The code and data to support the numerical experiments in this paper can be found at https://github.com/yiyun851/Assortment-Positioning/blob/main/Code_Data.zip.
