The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity

Published Online:https://doi.org/10.1287/opre.2022.2401

References

  • Agrawal S, Goyal N (2012) Analysis of Thompson Sampling for the multi-armed bandit problem. Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Conf. Learn. Theory (COLT) (PMLR), 23:39.1–39.26.Google Scholar
  • Agrawal S, Goyal N (2013) Further optimal regret bounds for Thompson sampling. Carvalho CM, Ravikumar P, eds. Proc. 16th Internat. Conf. Artificial Intelligence Statist. (AISTATS), 99–107.Google Scholar
  • Alon N, Spencer J (2016) The Probabilistic Method, 4th ed., Wiley-Interscience Series in Discrete Mathematics and Optimization (John Wiley & Sons, New York).Google Scholar
  • Apt KR (2011) A primer on strategic games. Preprint, submitted February 1, https://arxiv.org/abs/1102.0203.Google Scholar
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.CrossrefGoogle Scholar
  • Bahar G, Smorodinsky R, Tennenholtz M (2016) Economic recommendation systems. Proc. 16th ACM Conf. Electronic Commerce (ACM-EC) (Association for Computing Machinery, New York), 757.Google Scholar
  • Bahar G, Smorodinsky R, Tennenholtz M (2019) Social learning and the innkeeper’s challenge. ACM Conf. Econom. Comput. (ACM-EC), (Association for Computing Machinery, New York), 153–170.Google Scholar
  • Bergemann D, Morris S (2019) Information design: A unified perspective. J. Econom. Literature 57(1):44–95.CrossrefGoogle Scholar
  • Bimpikis K, Papanastasiou Y, Savva N (2018) Crowdsourcing exploration. Management Sci. 64(4):1727–1746.Google Scholar
  • Bubeck S, Cesa-Bianchi N (2012) Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems, Foundations and Trends in Machine Learning (now Publishers, Inc., Boston).CrossrefGoogle Scholar
  • Bubeck S, Liu C-Y (2013) Prior-free and prior-dependent regret bounds for Thompson sampling. Proc. 26th Adv. Neural Inform. Processing Systems (NIPS), vol. 26 (Curran Associates, Inc., Red Hook, NY), 638–646.Google Scholar
  • Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Che Y-K, Hörner J (2018) Recommender systems as mechanisms for social learning. Quart. J. Econom. 133(2):871–925.CrossrefGoogle Scholar
  • Chen B, Frazier PI, Kempe D (2018) Incentivizing exploration by heterogeneous users. Bubeck S, Perchet V, Rigollet P, eds. Proc. 31st Conf. Learn. Theory (COLT), vol. 75 (PMLR), 798–818.Google Scholar
  • Fill JA, Machida M (2001) Stochastic monotonicity and realizable monotonicity. Ann. Probab. 29(2):938–978.CrossrefGoogle Scholar
  • Frazier P, Kempe D, Kleinberg JM, Kleinberg R (2014) Incentivizing exploration. Proc. 15th ACM Conf. Econom. Comput. (ACM-EC) (Association for Computing Machinery, New York), 5–22.Google Scholar
  • Gittins J, Glazebrook K, Weber R (2011) Multi-Armed Bandit Allocation Indices, 2nd ed. (John Wiley & Sons, Hoboken, NJ).CrossrefGoogle Scholar
  • Golub B, Sadler ED (2016) Learning in social networks. Bramoullé Y, Galeotti A, Rogers B, eds. The Oxford Handbook of the Economics of Networks (Oxford University Press, Oxford, UK), 504–542.Google Scholar
  • Hörner J, Skrzypacz A (2017) Learning, experimentation, and information design. Honoré B, Pakes A, Piazzesi M, Samuelson L, eds. Adv. Econom. Econometrics: 11th World Congress, vol. 1 (Cambridge University Press, Cambridge, UK), 63–98.Google Scholar
  • Immorlica N, Mao J, Slivkins A, Wu S (2019) Bayesian exploration with heterogenous agents. Web Conf. (International World Wide Web Conference Committee, Geneva), 751–761.Google Scholar
  • Immorlica N, Mao J, Slivkins A, Wu S (2020) Incentivizing exploration with selective data disclosure. Proc. 21st ACM Conf. Econom. Comput. (ACM-EC) (Association for Computing Machinery, New York), 647–648.Google Scholar
  • Kamenica E (2019) Bayesian persuasion and information design. Annual Rev. Econom. 11(1):249–272.CrossrefGoogle Scholar
  • Kaufmann E, Korda N, Munos R (2012) Thompson sampling: An asymptotically optimal finite-time analysis. Bshouty NH, Stoltz G, Vayatis N, Zeugmann T, eds. 23rd Internat. Conf. Algorithmic Learn. Theory (ALT) (Springer, Berlin, Heidelberg), 199–213.Google Scholar
  • Kremer I, Mansour Y, Perry M (2014) Implementing the “wisdom of the crowd.” J. Political Econom. 122(5):988–1012.CrossrefGoogle Scholar
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.CrossrefGoogle Scholar
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Mansour Y, Slivkins A, Syrgkanis V (2015) Bayesian incentive-compatible bandit exploration. Proc. 16th ACM Conf. Econom. Comput. (ACM-EC) (Association for Computing Machinery, New York), 565–582.Google Scholar
  • Mansour Y, Slivkins A, Syrgkanis V (2020) Bayesian incentive-compatible bandit exploration. Oper. Res. 68(4):1132–1161.LinkGoogle Scholar
  • Mansour Y, Slivkins A, Syrgkanis V, Wu S (2022) Bayesian exploration: Incentivizing exploration in Bayesian games. Oper. Res. 70(2):1105–1127.LinkGoogle Scholar
  • Mourrat J-C (2020) Free energy upper bound for mean-field vector spin glasses. Preprint, submitted October 18, https://arxiv.org/abs/2010.09114v1.Google Scholar
  • Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.LinkGoogle Scholar
  • Russo D, Van Roy B, Kazerouni A, Osband I, Wen Z (2018) A Tutorial on Thompson Sampling, Foundations and Trends in Machine Learning (now Publishers, Inc., Boston), 11(1):1–96.CrossrefGoogle Scholar
  • Slivkins A (2019) Introduction to Multi-Armed Bandits, Foundations and Trends in Machine Learning (now Publishers, Inc., Boston).CrossrefGoogle Scholar
  • Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3-4):285–294.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.