Tiered Assortment: Optimization and Online Learning

Junyu Cao
Corresponding Author
Junyu Cao
[email protected]
https://orcid.org/0000-0001-9235-1411
McCombs School of Business, The University of Texas at Austin, Austin, Texas 78712;
Search for more papers by this author
,
Wei Sun
Wei Sun
[email protected]
https://orcid.org/0000-0002-5352-7629
IBM Research, Yorktown Heights, New York 10598
Search for more papers by this author

Junyu Cao

Corresponding Author

Junyu Cao

[email protected]

https://orcid.org/0000-0001-9235-1411

McCombs School of Business, The University of Texas at Austin, Austin, Texas 78712;

Search for more papers by this author

Wei Sun

[email protected]

https://orcid.org/0000-0002-5352-7629

IBM Research, Yorktown Heights, New York 10598

Search for more papers by this author

Published Online:4 Oct 2023https://doi.org/10.1287/mnsc.2023.4940

References

Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Adv. Neural Inform. Process. Systems 24:2312–2320.Google Scholar
Agrawal P, Tulabandhula T, Avadhanula V (2023) A tractable online learning algorithm for the multinomial logit contextual bandit. Eur. J. Oper. Res. Forthcoming.Crossref, Google Scholar
Agrawal S, Avadhanula V, Goyal V, Zeevi A (2017) Thompson sampling for the MNL-bandit. Proc. 2017 Conf. Learn. Theory, vol. 65 (PMLR, New York), 76–78.Google Scholar
Agrawal S, Avadhanula V, Goyal V, Zeevi A (2019) MNL-bandit: A dynamic learning approach to assortment selection. Oper. Res. 67(5):1453–1485.Link, Google Scholar
Aouad A, Segev D (2021) Display optimization for vertically differentiated locations under multinomial logit preferences. Management Sci. 67(6):3519–3550.Link, Google Scholar
Aouad A, Levi R, Segev D (2018) Greedy-like algorithms for dynamic assortment planning under multinomial logit preferences. Oper. Res. 66(5):1321–1345.Link, Google Scholar
Asadpour A, Saberi A (2010) An approximation algorithm for max-min fair allocation of indivisible goods. SIAM J. Comput. 39(7):2970–2989.Crossref, Google Scholar
Baltas G (2003) Determinants of Internet advertising effectiveness: An empirical study. Internat. J. Marketing Res. 45(4):1–9.Google Scholar
Ban GY, Keskin NB (2021) Personalized dynamic pricing with machine learning: High-dimensional features and heterogeneous elasticity. Management Sci. 67(9):5549–5568.Link, Google Scholar
Bansal N, Sviridenko M (2006) The Santa Claus problem. Proc. 38th Annual ACM Sympos. Theory Comput., STOC ’06 (Association for Computing Machinery, New York), 31–40.Google Scholar
Bateni M, Charikar M, Guruswami V (2009) MaxMin allocation via degree lower-bounded arborescences. Proc. 41st Annual ACM Sympos. Theory Comput., STOC ’09 (Association for Computing Machinery, New York), 543–552.Google Scholar
Bezáková I, Dani V (2005) Allocating indivisible goods. ACM SIGecom Exchanges 5(3):11–18.Crossref, Google Scholar
Bietti A, Agarwal A, Langford J (2021) A contextual bandit bake-off. J. Machine Learn. Res. 22:5928–5976.Google Scholar
Broussard G (2000) How advertising frequency can work to build online advertising effectiveness. Internat. J. Marketing Res. 42(4):1–13.Google Scholar
Cao J, Sun W (2019a) Dynamic learning of sequential choice bandit problem under marketing fatigue. Proc. 33rd AAAI Conf. Artificial Intelligence (AAAI-19), vol. 33 (AAAI Press, Palo Alto, CA).Google Scholar
Cao J, Sun W (2019b) Dynamic learning with frequent new product launches: A sequential multinomial logit bandit problem. Proc. 36th Internat. Conf. Machine Learn., vol. 97 (PMLR, New York), 912–920.Google Scholar
Cao J, Sun W, Shen ZJM (2023) Doubly adaptive cascading bandits with user abandonment. Preprint, submitted March 18, https://dx.doi.org/10.2139/ssrn.3355211.Google Scholar
Caprara A, Kellerer H, Pferschy U, Pisinger D (2000) Approximation algorithms for knapsack problems with cardinality constraints. Eur. J. Oper. Res. 123(2):333–345.Crossref, Google Scholar
Chakrabarty D, Chuzhoy J, Khanna S (2009) On allocating goods to maximize fairness. 2009 50th Annual IEEE Sympos. Foundations Comput. Sci. (IEEE, Piscataway, NJ), 107–116.Google Scholar
Chen N, Li A, Yang S (2021) Revenue maximization and learning in products ranking. Proc. 22nd ACM Conf. Econom. Comput. (ACM, New York), 316–317.Google Scholar
Chen W, Wang Y, Yuan Y (2013) Combinatorial multi-armed bandit: General framework and applications. Proc. 30th Internat. Conf. Machine Learn., vol. 28 (PMLR, New York), 151–159.Google Scholar
Chen W, Wang Y, Yuan Y, Wang Q (2016) Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. J. Machine Learn. Res. 17(1):1746–1778.Google Scholar
Chen X, Wang Y (2018) A note on a tight lower bound for capacitated MNL-bandit assortment selection models. Oper. Res. Lett. 46(5):534–537.Crossref, Google Scholar
Chen X, Wang Y, Zhou Y (2020) Dynamic assortment optimization with changing contextual information. J. Machine Learn. Res. 21(1):8918–8961.Google Scholar
Cheung WC, Simchi-Levi D (2017) Thompson sampling for online personalized assortment optimization problems with multinomial logit choice models. Preprint, submitted November 21, https://dx.doi.org/10.2139/ssrn.3075658.Google Scholar
Cheung WC, Tan V, Zhong Z (2019) A Thompson sampling algorithm for cascading bandits. Proc. 22nd Internat. Conf. Artificial Intelligence Statist., vol. 89 (PMLR, New York), 438–447.Google Scholar
Chu W, Li L, Reyzin L, Schapire R (2011) Contextual bandits with linear payoff functions. Proc. 14th Internat. Conf. Artificial Intelligence Statist., vol. 15 (PMLR, New York), 208–214.Google Scholar
Cohen MC, Lobel I, Paes Leme R (2020) Feature-based dynamic pricing. Management Sci. 66(11):4921–4943.Link, Google Scholar
Combes R, Magureanu S, Proutiere A, Laroche C (2015a) Learning to rank: Regret lower bounds and efficient algorithms. Proc. 2015 ACM SIGMETRICS Internat. Conf. Measurement Modeling Comput. Systems (ACM, New York), 231–244.Google Scholar
Combes R, Talebi Mazraeh Shahi MS, Proutiere A, Lelarge M (2015b) Combinatorial bandits revisited. Proc. 28th Internat. Conf. Neural Inform. Processing Systems, NIPS’15, vol. 2 (MIT Press, Cambridge, MA), 2116–2124.Google Scholar
Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. Proc. 21st Conf. Learn. Theory, 355–366.Google Scholar
Dell’Amico M, Martello S (1999) Reduction of the three-partition problem. J. Combin. Optim. 3(1):17–30.Crossref, Google Scholar
den Boer AV, Keskin NB (2022) Dynamic pricing with demand learning and reference effects. Management Sci. 68(10):7112–130.Link, Google Scholar
Derakhshan M, Golrezaei N, Manshadi V, Mirrokni V (2022) Product ranking on online platforms. Management Sci. 68(6):4024–4041.Link, Google Scholar
Fata E, Ma W, Simchi-Levi D (2019) Multi-stage and multi-customer assortment optimization with inventory constraints. Preprint, submitted August 26, https://arxiv.org/abs/1908.09808.Google Scholar
Feldman J, Segev D (2022) The multinomial logit model with sequential offerings: Algorithmic frameworks for product recommendation displays. Oper. Res. 70(4):2162–2184.Link, Google Scholar
Ferreira KJ, Parthasarathy S, Sekar S (2022) Learning to rank an assortment of products. Management Sci. 68(3):1828–1848.Link, Google Scholar
Fessenden T (2018) Scrolling and attention. Accessed February 26, 2021, https://www.nngroup.com/articles/scrolling-and-attention/.Google Scholar
Filippi S, Cappe O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Proc. 23rd Internat. Conf. Neural Inform. Processing Systems, NIPS’10, vol. 1 (Curran Associates Inc., Red Hook, NY), 586–594.Google Scholar
Flores A, Berbeglia G, Van Hentenryck P (2019) Assortment optimization under the sequential multinomial logit model. Eur. J. Oper. Res. 273(3):1052–1064.Crossref, Google Scholar
Gallego G, Iyengar G, Phillips R, Dubey A (2004) Managing flexible products on a network. Preprint, submitted July 1, https://dx.doi.org/10.2139/ssrn.3567371.Google Scholar
Gao P, Ma Y, Chen N, Gallego G, Li A, Rusmevichientong P, Topaloglu H (2021) Assortment optimization and pricing under the multinomial logit model with impatient customers: Sequential recommendation and selection. Oper. Res. 69:1509–1532.Link, Google Scholar
Gao X, Jasin S, Najafi S, Zhang H (2022) Joint learning and optimization for multi-product pricing (and ranking) under a general cascade click model. Management Sci. 68:7362–7382.Link, Google Scholar
Golrezaei N, Manshadi V, Schneider J, Sekar S (2022) Learning product rankings robust to fake users. Oper. Res. 71(4):1171–1196.Google Scholar
Jasin S, Lyu C, Najafi S, Zhang H (2023) Assortment optimization with multi-item basket purchase under multivariate MNL model. Manufacturing Service Oper. Management, ePub ahead of print July 24, https://pubsonline.informs.org/doi/abs/10.1287/msom.2021.0526.Google Scholar
Javanmard A, Nazerzadeh H (2019) Dynamic pricing in high-dimensions. J. Machine Learn. Res. 20(1):315–363.Google Scholar
Javanmard A, Nazerzadeh H, Shao S (2020) Multi-product dynamic pricing in high-dimensions with heterogeneous price sensitivity. 2020 IEEE Internat. Sympos. Inform. Theory (ISIT) (IEEE, Piscataway, NJ), 2652–2657.Google Scholar
Kök AG, Fisher ML, Vaidyanathan R (2008) Assortment Planning: Review of Literature and Industry Practice. Retail Supply Chain Management. International Series in Operations Research & Management Science, vol. 122 (Springer, Boston), 99–153.Google Scholar
Kveton B, Szepesvari C, Wen Z, Ashkan A (2015a) Cascading bandits: Learning to rank in the cascade model. Proc. 32nd Internat. Conf. Internat. Conf. Machine Learn., vol. 37 (PMLR, New York), 767–776.Google Scholar
Kveton B, Wen Z, Ashkan A, Szepesvari C (2015b) Combinatorial cascading bandits. Proc. 28th Internat. Conf. Neural Inform. Processing Systems, NIPS’15, vol. 1 (MIT Press, Cambridge, MA), 1450–1458.Google Scholar
Kveton B, Zaheer M, Szepesvari C, Li L, Ghavamzadeh M, Boutilier C (2020) Randomized exploration in generalized linear bandits. Proc. 23rd Internat. Conf. Artificial Intelligence and Statist., vol. 108 (PMLR, New York), 2066–2076.Google Scholar
Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Lei Y, Jasin S, Uichanco J, Vakhutinsky A (2021) Joint product framing (display, ranking, pricing) and order fulfillment under the multinomial logit model for e-commerce retailers. Manufacturing Service Oper. Management 24(3):1529–1546.Link, Google Scholar
Li H, Huh WT (2011) Pricing multiple products with the multinomial logit and nested logit models: Concavity and implications. Manufacturing Service Oper. Management 13(4):549–563.Link, Google Scholar
Li L, Lu Y, Zhou D (2017) Provably optimal algorithms for generalized linear contextual bandits. Proc. 34th Internat. Conf. Machine Learn., ICML’17, vol. 70 ( JMLR.org) (ICML, San Diego, CA), 2071–2080.Google Scholar
Li S, Luo Q, Huang Z, Shi C (2022a) Online learning for constrained assortment optimization under Markov chain choice model. Preprint, submitted April 9, https://dx.doi.org/10.2139/ssrn.4079753.Google Scholar
Li W, Lee J, Shroff N (2022b) A faster FPTAS for knapsack problem with cardinality constraint. Discrete Appl. Math. 315:71–85.Crossref, Google Scholar
Liu N, Ma Y, Topaloglu H (2020) Assortment optimization under the multinomial logit model with sequential offerings. INFORMS J. Comput. 32:835–853.Link, Google Scholar
Liu Q, Van Ryzin G (2008) On the choice-based linear programming model for network revenue management. Manufacturing Service Oper. Management 10(2):288–310.Link, Google Scholar
Miao S, Chao X (2021) Dynamic joint assortment and pricing optimization with demand learning. Manufacturing Service Oper. Management 23(2):525–545.Google Scholar
Mutznik O (2020) State of pagination in ecommerce: Case study & analysis of the top 150 UK fashion retailers. Accessed March 3, 2021, https://www.deepcrawl.com/blog/best-practice/state-of-pagination-in-ecommerce/.Google Scholar
Oh Mh, Iyengar G (2019) Thompson sampling for multinomial logit contextual bandits. Adv. Neural Inform. Process. Systems 32:3151–3161.Google Scholar
Oh Mh, Iyengar G (2021) Multinomial logit contextual bandits: Provable optimality and practicality. Proc. Conf. AAAI Artif. Intell. 35:9205–9213.Google Scholar
Ou M, Li N, Zhu S, Jin R (2018) Multinomial logit bandit with linear utility functions. Preprint, submitted May 8, https://arxiv.org/abs/1805.02971.Google Scholar
Qiang S, Bayati M (2016) Dynamic pricing with demand covariates. Preprint, submitted April 25, https://arxiv.org/abs/1604.07463.Google Scholar
Robbins H (1985) Some aspects of the sequential design of experiments. Herbert Robbins Selected Papers. Bull. Amer. Math. Soc. 58(5):527–535.Google Scholar
Rusmevichientong P, Tsitsiklis JN (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.Link, Google Scholar
Rusmevichientong P, Shen ZJM, Shmoys DB (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Oper. Res. 58(6):1666–1680.Link, Google Scholar
Rusmevichientong P, Shmoys D, Tong C, Topaloglu H (2014) Assortment optimization under the multinomial logit model with random choice parameters. Production Oper. Management 23(11):2023–2039.Crossref, Google Scholar
Sauré D, Zeevi A (2013) Optimal dynamic assortment planning with demand learning. Manufacturing Service Oper. Management 15(3):387–404.Link, Google Scholar
Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).Google Scholar
Talluri K, Van Ryzin G (2004) Revenue management under a general discrete choice model of consumer behavior. Management Sci. 50(1):15–33.Link, Google Scholar
Topaloglu H (2013) Joint stocking and product offer decisions under the multinomial logit model. Production Oper. Management 22(5):1182–1199.Crossref, Google Scholar
Train KE (2009) Discrete Choice Methods with Simulation (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Wang R (2012) Capacitated assortment and price optimization under the multinomial logit model. Oper. Res. Lett. 40(6):492–497.Crossref, Google Scholar
Xu Y, Wang Z (2023) Assortment optimization for a multistage choice model. Manufacturing Service Oper. Management 25(5):1748–1764.Google Scholar
Zhong Z, Chueng WC, Tan VY (2021) Thompson sampling algorithms for cascading bandits. J. Machine Learn. Res. 22(218):1–66.Google Scholar
Zong S, Ni H, Sung K, Ke NR, Wen Z, Kveton B (2016) Cascading bandits for large-scale recommendation problems. Preprint, submitted March 17, https://arxiv.org/abs/1603.05359.Google Scholar

Volume 70, Issue 8

August 2024

Pages v-vii, 4953-5625, iii-v

Article Information

Supplemental Material

Metrics

Information

Received:September 27, 2021
Accepted:April 09, 2023
Published Online:October 04, 2023

Cite as

Junyu Cao, Wei Sun (2023) Tiered Assortment: Optimization and Online Learning. Management Science 70(8):5481-5501.

https://doi.org/10.1287/mnsc.2023.4940

Keywords

Acknowledgments

The authors thank J. George Shanthikumar (department editor), the associate editor, and three anonymous referees for their constructive and insightful review comments.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Tiered Assortment: Optimization and Online Learning

References

Volume 70, Issue 8

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News