MNL-Bandit: A Dynamic Learning Approach to Assortment Selection

Published Online: https://doi.org/10.1287/opre.2018.1832

References

  • Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Proc. Machine Learn. Res. 28:127–135.
  • Agrawal S, Goyal N (2017) Near-optimal regret bounds for Thompson sampling. J. ACM 64(5):30:1–30:24.
  • Agrawal S, Avadhanula V, Goyal V, Zeevi A (2017) Thompson sampling for the MNL-bandit. Proc. Machine Learn. Res. 65:76–78.
  • Angluin D, Valiant LG (1977) Fast probabilistic algorithms for Hamiltonian circuits and matchings. Proc. 9th Annual ACM Sympos. Theory Comput. (STOC ’77) (Elsevier, New York), 30–41.
  • Auer P (2003) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(November):397–422.
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.
  • Avadhanula V, Bhandari J, Goyal V, Zeevi A (2016) On the tightness of an LP relaxation for rational optimization and its applications. Oper. Res. Lett. 44(5):612–617.
  • Babaioff M, Dughmi S, Kleinberg R, Slivkins A (2015) Dynamic pricing with limited supply. ACM Trans. Econom. Comput. 3(1):Article 4.
  • Ben-Akiva M, Lerman S (1985) Discrete Choice Analysis: Theory and Application to Travel Demand, MIT Press Series in Transportation Studies, vol. 9 (MIT Press, Cambridge, MA).
  • Blanchet J, Gallego G, Goyal V (2016) A Markov chain approximation to choice modeling. Oper. Res. 64(4):886–905.
  • Borovkov AA (1984) Mathematical Statistics: Estimation of Parameters, Testing of Hypotheses (in Russian) (Nauka, Moscow).
  • Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found. Trends® Machine Learn. 5(1):1–122.
  • Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.
  • Chapelle O, Li L (2011) An empirical evaluation of Thompson sampling. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 24 (Curran Associates, Red Hook, NY), 2249–2257.
  • Chen W, Wang Y, Yuan Y (2013) Combinatorial multi-armed bandit: General framework, results and applications. Proc. Machine Learn. Res. 28:151–159.
  • Chen X, Wang Y (2018) A note on tight lower bound for capacitated MNL-bandit assortment selection models. Oper. Res. Lett. 46(5):534–537.
  • Davis J, Gallego G, Topaloglu H (2013) Assortment planning under the multinomial logit model with totally unimodular constraint structures. Technical report, Cornell University, Ithaca, NY.
  • Davis JM, Gallego G, Topaloglu H (2014) Assortment optimization under variants of the nested logit model. Oper. Res. 62(2):250–273.
  • Désir A, Goyal V, Zhang J (2014) Near-optimal algorithms for capacity constrained assortment optimization. Working paper, Columbia University, New York.
  • Désir A, Goyal V, Segev D, Ye C (2015) Capacity constrained assortment optimization under the Markov chain based choice model. Working paper, Columbia University, New York.
  • Farias V, Jagabathula S, Shah D (2013) A nonparametric approach to modeling choice with limited data. Management Sci. 59(2):305–322.
  • Filippi S, Cappé O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Lafferty JD, Williams CKI, Shawe-Taylor J, Zemel RS, Culotta A, eds. Advances in Neural Information Processing Systems, vol. 23 (Curran Associates, Red Hook, NY), 586–594.
  • Gallego G, Topaloglu H (2014) Constrained assortment optimization for the nested logit model. Management Sci. 60(10):2583–2601.
  • Gallego G, Ratliff R, Shebalov S (2014) A general attraction model and sales-based linear program for network revenue management under customer choice. Oper. Res. 63(1):212–232.
  • Kallus N, Udell M (2016) Dynamic assortment personalization in high dimensions. Working paper, Cornell University, Ithaca, NY.
  • Kleinberg R, Slivkins A, Upfal E (2008) Multi-armed bandits in metric spaces. Proc. 40th Annual ACM Sympos. Theory Comput. (STOC ’08) (ACM, New York), 681–690.
  • Kök AG, Fisher ML (2007) Demand estimation and assortment optimization under substitution: Methodology and application. Oper. Res. 55(6):1001–1021.
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.
  • Li G, Rusmevichientong P, Topaloglu H (2015) The d-level nested logit model: Assortment and price optimization problems. Oper. Res. 63(2):325–342.
  • Lichman M (2013) UCI Machine Learning Repository. Accessed March 27, 2019, http://archive.ics.uci.edu/ml/datasets/car+evaluation.
  • Luce RD (1959) Individual Choice Behavior: A Theoretical Analysis (John Wiley & Sons, New York).
  • May BC, Korda N, Lee A, Leslie DS (2012) Optimistic Bayesian sampling in contextual-bandit problems. J. Machine Learn. Res. 13(1):2069–2106.
  • McFadden D (1978) Modeling the choice of residential location. Transportation Res. Record (673):72–77.
  • Mitzenmacher M, Upfal E (2005) Probability and Computing: Randomized Algorithms and Probabilistic Analysis (Cambridge University Press, Cambridge, UK).
  • Plackett RL (1975) The analysis of permutations. Appl. Statist. 24(2):193–202.
  • Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.
  • Rusmevichientong P, Tsitsiklis JN (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.
  • Rusmevichientong P, Shen ZM, Shmoys DB (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Oper. Res. 58(6):1666–1680.
  • Sauré D, Zeevi A (2013) Optimal dynamic assortment planning with demand learning. Manufacturing Service Oper. Management 15(3):387–404.
  • Talluri K, van Ryzin G (2004) Revenue management under a general discrete choice model of consumer behavior. Management Sci. 50(1):15–33.
  • Train KE (2009) Discrete Choice Methods with Simulation, 2nd ed. (Cambridge University Press, New York).
  • Williams HCWL (1977) On the formation of travel demand models and economic evaluation measures of user benefit. Environ. Planning A Econom. Space 9(3):285–344.