MNL-Bandit: A Dynamic Learning Approach to Assortment Selection

Shipra Agrawal
Department of Industrial Engineering and Operations Research, Fu Foundation School of Engineering and Applied Science, Columbia University, New York, New York 10027

Vashist Avadhanula (Corresponding Author)
http://orcid.org/0000-0002-3045-691X
Decision, Risk, and Operations Division, Columbia Business School, Columbia University, New York, New York 10027

Vineet Goyal
Department of Industrial Engineering and Operations Research, Fu Foundation School of Engineering and Applied Science, Columbia University, New York, New York 10027

Assaf Zeevi
Decision, Risk, and Operations Division, Columbia Business School, Columbia University, New York, New York 10027

Published Online: 10 Sep 2019
https://doi.org/10.1287/opre.2018.1832


Volume 67, Issue 5

September-October 2019

Pages ii-iv, 1209-1502

Article Information

Received: February 27, 2017
Accepted: October 29, 2018
Published Online: September 10, 2019

Cite as

Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi (2019) MNL-Bandit: A Dynamic Learning Approach to Assortment Selection. Operations Research 67(5):1453-1485.

https://doi.org/10.1287/opre.2018.1832
