The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity
References
- (2012) Analysis of Thompson Sampling for the multi-armed bandit problem. Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Conf. Learn. Theory (COLT) (PMLR), 23:39.1–39.26.Google Scholar
- (2013) Further optimal regret bounds for Thompson sampling. Carvalho CM, Ravikumar P, eds. Proc. 16th Internat. Conf. Artificial Intelligence Statist. (AISTATS), 99–107.Google Scholar
- (2016) The Probabilistic Method, 4th ed., Wiley-Interscience Series in Discrete Mathematics and Optimization (John Wiley & Sons, New York).Google Scholar
- (2011) A primer on strategic games. Preprint, submitted February 1, https://arxiv.org/abs/1102.0203.Google Scholar
- (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Crossref, Google Scholar
- (2016) Economic recommendation systems. Proc. 16th ACM Conf. Electronic Commerce (ACM-EC) (Association for Computing Machinery, New York), 757.Google Scholar
- (2019) Social learning and the innkeeper’s challenge. ACM Conf. Econom. Comput. (ACM-EC), (Association for Computing Machinery, New York), 153–170.Google Scholar
- (2019) Information design: A unified perspective. J. Econom. Literature 57(1):44–95.Crossref, Google Scholar
- (2018) Crowdsourcing exploration. Management Sci. 64(4):1727–1746.Google Scholar
- (2012) Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems, Foundations and Trends in Machine Learning (now Publishers, Inc., Boston).Crossref, Google Scholar
- (2013) Prior-free and prior-dependent regret bounds for Thompson sampling. Proc. 26th Adv. Neural Inform. Processing Systems (NIPS), vol. 26 (Curran Associates, Inc., Red Hook, NY), 638–646.Google Scholar
- (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (2018) Recommender systems as mechanisms for social learning. Quart. J. Econom. 133(2):871–925.Crossref, Google Scholar
- (2018) Incentivizing exploration by heterogeneous users. Bubeck S, Perchet V, Rigollet P, eds. Proc. 31st Conf. Learn. Theory (COLT), vol. 75 (PMLR), 798–818.Google Scholar
- (2001) Stochastic monotonicity and realizable monotonicity. Ann. Probab. 29(2):938–978.Crossref, Google Scholar
- (2014) Incentivizing exploration. Proc. 15th ACM Conf. Econom. Comput. (ACM-EC) (Association for Computing Machinery, New York), 5–22.Google Scholar
- (2011) Multi-Armed Bandit Allocation Indices, 2nd ed. (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
- (2016) Learning in social networks. Bramoullé Y, Galeotti A, Rogers B, eds. The Oxford Handbook of the Economics of Networks (Oxford University Press, Oxford, UK), 504–542.Google Scholar
- (2017) Learning, experimentation, and information design. Honoré B, Pakes A, Piazzesi M, Samuelson L, eds. Adv. Econom. Econometrics: 11th World Congress, vol. 1 (Cambridge University Press, Cambridge, UK), 63–98.Google Scholar
- (2019) Bayesian exploration with heterogenous agents. Web Conf. (International World Wide Web Conference Committee, Geneva), 751–761.Google Scholar
- (2020) Incentivizing exploration with selective data disclosure. Proc. 21st ACM Conf. Econom. Comput. (ACM-EC) (Association for Computing Machinery, New York), 647–648.Google Scholar
- (2019) Bayesian persuasion and information design. Annual Rev. Econom. 11(1):249–272.Crossref, Google Scholar
- (2012) Thompson sampling: An asymptotically optimal finite-time analysis. Bshouty NH, Stoltz G, Vayatis N, Zeugmann T, eds. 23rd Internat. Conf. Algorithmic Learn. Theory (ALT) (Springer, Berlin, Heidelberg), 199–213.Google Scholar
- (2014) Implementing the “wisdom of the crowd.” J. Political Econom. 122(5):988–1012.Crossref, Google Scholar
- (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
- (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (2015) Bayesian incentive-compatible bandit exploration. Proc. 16th ACM Conf. Econom. Comput. (ACM-EC) (Association for Computing Machinery, New York), 565–582.Google Scholar
- (2020) Bayesian incentive-compatible bandit exploration. Oper. Res. 68(4):1132–1161.Link, Google Scholar
- (2022) Bayesian exploration: Incentivizing exploration in Bayesian games. Oper. Res. 70(2):1105–1127.Link, Google Scholar
- (2020) Free energy upper bound for mean-field vector spin glasses. Preprint, submitted October 18, https://arxiv.org/abs/2010.09114v1.Google Scholar
- (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.Link, Google Scholar
- (2018) A Tutorial on Thompson Sampling, Foundations and Trends in Machine Learning (now Publishers, Inc., Boston), 11(1):1–96.Crossref, Google Scholar
- (2019) Introduction to Multi-Armed Bandits, Foundations and Trends in Machine Learning (now Publishers, Inc., Boston).Crossref, Google Scholar
- (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3-4):285–294.Crossref, Google Scholar

