Thompson Sampling for the Multinomial Logit Bandit

Published Online:https://doi.org/10.1287/moor.2020.0096

References

  • [1] Abramowitz M, Stegun IA (1964) Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables (US Government Printing Office, Washington, DC).Google Scholar
  • [2] Agrawal S, Goyal N (2013) Further optimal regret bounds for Thompson sampling. Proc. Sixteenth Internat. Conf. Artificial Intelligence Statist. (AISTATS), vol. 31 (JMLR W&CP, Scotsdale, AZ), 99–107.Google Scholar
  • [3] Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Proc. 30th Internat. Conf. Machine Learn., vol. 28 (JMLR W&CP, Atlanta), 127–135.Google Scholar
  • [4] Agrawal S, Avadhanula V, Goyal V, Zeevi A (2016) A near-optimal exploration-exploitation approach for assortment selection. Proc. 2016 ACM Conf. Econom. Comput. (EC) (Association for Computing Machinery, New York), 599–600.Google Scholar
  • [5] Auer P (2003) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3:397–422.Google Scholar
  • [6] Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47:235–256.CrossrefGoogle Scholar
  • [7] Avadhanula V, Bhandari J, Goyal V, Zeevi A (2016) On the tightness of an LP relaxation for rational optimization and its applications. Oper. Res. Lett. 44(5):612–617.CrossrefGoogle Scholar
  • [8] Ben-Akiva M, Lerman S (1985) Discrete Choice Analysis: Theory and Application to Travel Demand, vol. 9 (MIT Press, Cambridge, MA).Google Scholar
  • [9] Chen X, Krishnamurthy A, Wang Y (2023) Robust dynamic assortment optimization in the presence of outlier customers. Oper. Res. 72(3):999–1015.Google Scholar
  • [10] Chen X, Wang Y, Zhou Y (2020) Dynamic assortment optimization with changing contextual information. J. Machine Learn. Res. 21(1).Google Scholar
  • [11] Chen X, Wang Y, Zhou Y (2021) Dynamic assortment selection under the nested logit models. Production Oper. Management 30:85–102.Google Scholar
  • [12] Cheung W, Simchi-Levi D (2017) Thompson sampling for online personalized assortment optimization problems with multinomial logit choice models. Preprint, submitted November 27, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3075658.Google Scholar
  • [13] Davis J, Gallego G, Topaloglu H (2013) Assortment planning under the multinomial logit model with totally unimodular constraint structures. Technical report, Cornell University, Ithaca, NY.Google Scholar
  • [14] Feng Y, Caldentey R, Christopher TR (2021) Robust learning of consumer preferences. Oper. Res. 70(2):918–962.Google Scholar
  • [15] Gopalan A, Mannor S, Mansour Y (2014) Thompson sampling for complex online problems. Proc. 31st Internat. Conf. Machine Learn., vol. 32 (JMLR W&CP, Beijing), 100–108.Google Scholar
  • [16] Graepel T, Candela JQ, Borchert T, Herbrich R (2010) Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft’s Bing search engine. Proc. 27th Internat. Conf. Machine Learn. (ICML) (Omnipress, Madison, WI), 13–20.Google Scholar
  • [17] Kaufmann E, Korda N, Munos R (2012) Thompson sampling: An asymptotically optimal finite-time analysis. Proc. 23rd Internat. Conf. Algorithmic Learn. Theory (Springer-Verlag, Berlin, Heidelberg), 199–213.Google Scholar
  • [18] Korda N, Kaufmann E, Munos R (2013) Thompson sampling for 1-dimensional exponential family bandits. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).Google Scholar
  • [19] Luce RD (1959) Individual Choice Behavior: A Theoretical Analysis (Wiley, New York).Google Scholar
  • [20] May BC, Korda N, Lee A, Leslie DS (2012) Optimistic Bayesian sampling in contextual-bandit problems. J. Machine Learn. Res. 13(1):2069–2106.Google Scholar
  • [21] McFadden D (1978) Modelling the Choice of Residential Location (Institute of Transportation Studies, University of California, Berkeley).Google Scholar
  • [22] Miao S, Chao X (2020) Dynamic joint assortment and pricing optimization with demand learning. Manufacturing Service Oper. Management 23(2):525–545.Google Scholar
  • [23] Miao S, Chao X (2019) Fast algorithms for online personalized assortment optimization in a big data regime. Preprint, submitted August 5, http://dx.doi.org/10.2139/ssrn.3432574.Google Scholar
  • [24] Oh M, Iyengar G (2019) Multinomial logit contextual bandits. Working paper, Columbia University, New York.Google Scholar
  • [25] Oh M, Iyengar G (2019) Thompson sampling for multinomial logit contextual bandits. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 3145–3155.Google Scholar
  • [26] Oliver C, Li L (2011) An empirical evaluation of Thompson sampling. Adv. Neural Inform. Processing Systems (NIPS), vol. 24 (Curran Associates, Inc., Red Hook, NY), 2249–2257.Google Scholar
  • [27] Plackett RL (1975) The analysis of permutations. J. Roy. Statist. Soc. Ser. C Appl. Statist. 24(2):193–202.Google Scholar
  • [28] Rusmevichientong P, Tsitsiklis JN (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.LinkGoogle Scholar
  • [29] Rusmevichientong P, Shen ZM, Shmoys DB (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Oper. Res. 58(6):1666–1680.LinkGoogle Scholar
  • [30] Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.LinkGoogle Scholar
  • [31] Russo D, Van Roy B, Kazerouni A, Osband I, Wen Z (2018) A Tutorial on Thompson Sampling (Now Foundations and Trends, Norwell, MA).CrossrefGoogle Scholar
  • [32] Saha A, Gopalan A (2019) Regret minimisation in multinomial logit bandits. Preprint, submitted March 1, https://arxiv.org/abs/1903.00543v1.Google Scholar
  • [33] Sauré D, Zeevi A (2013) Optimal dynamic assortment planning with demand learning. Manufacturing Service Oper. Management 15(3):387–404.LinkGoogle Scholar
  • [34] Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.CrossrefGoogle Scholar
  • [35] Train K (2003) Discrete Choice Methods with Simulation (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [36] Wang Y, Chen X, Zhou Y (2018) Near-optimal policies for dynamic multinomial logit assortment selection models. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 3101–3110.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.