MNL-Bandit: A Dynamic Learning Approach to Assortment Selection
Published Online: 10 Sep 2019 https://doi.org/10.1287/opre.2018.1832

