Thompson Sampling for the Multinomial Logit Bandit

