The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity

Mark Sellke
Corresponding Author
Mark Sellke
[email protected]
https://orcid.org/0000-0001-9166-8185
School of Mathematics, Institute for Advanced Study, Princeton, New Jersey 08540;
Search for more papers by this author
,
Aleksandrs Slivkins
Aleksandrs Slivkins
[email protected]
https://orcid.org/0000-0001-6899-6383
Microsoft Research, New York, New York 10012
Search for more papers by this author

Mark Sellke

Corresponding Author

Mark Sellke

[email protected]

https://orcid.org/0000-0001-9166-8185

School of Mathematics, Institute for Advanced Study, Princeton, New Jersey 08540;

Search for more papers by this author

Aleksandrs Slivkins

[email protected]

https://orcid.org/0000-0001-6899-6383

Microsoft Research, New York, New York 10012

Search for more papers by this author

Published Online:29 Nov 2022https://doi.org/10.1287/opre.2022.2401

References

Agrawal S, Goyal N (2012) Analysis of Thompson Sampling for the multi-armed bandit problem. Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Conf. Learn. Theory (COLT) (PMLR), 23:39.1–39.26.Google Scholar
Agrawal S, Goyal N (2013) Further optimal regret bounds for Thompson sampling. Carvalho CM, Ravikumar P, eds. Proc. 16th Internat. Conf. Artificial Intelligence Statist. (AISTATS), 99–107.Google Scholar
Alon N, Spencer J (2016) The Probabilistic Method, 4th ed., Wiley-Interscience Series in Discrete Mathematics and Optimization (John Wiley & Sons, New York).Google Scholar
Apt KR (2011) A primer on strategic games. Preprint, submitted February 1, https://arxiv.org/abs/1102.0203.Google Scholar
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Crossref, Google Scholar
Bahar G, Smorodinsky R, Tennenholtz M (2016) Economic recommendation systems. Proc. 16th ACM Conf. Electronic Commerce (ACM-EC) (Association for Computing Machinery, New York), 757.Google Scholar
Bahar G, Smorodinsky R, Tennenholtz M (2019) Social learning and the innkeeper’s challenge. ACM Conf. Econom. Comput. (ACM-EC), (Association for Computing Machinery, New York), 153–170.Google Scholar
Bergemann D, Morris S (2019) Information design: A unified perspective. J. Econom. Literature 57(1):44–95.Crossref, Google Scholar
Bimpikis K, Papanastasiou Y, Savva N (2018) Crowdsourcing exploration. Management Sci. 64(4):1727–1746.Google Scholar
Bubeck S, Cesa-Bianchi N (2012) Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems, Foundations and Trends in Machine Learning (now Publishers, Inc., Boston).Crossref, Google Scholar
Bubeck S, Liu C-Y (2013) Prior-free and prior-dependent regret bounds for Thompson sampling. Proc. 26th Adv. Neural Inform. Processing Systems (NIPS), vol. 26 (Curran Associates, Inc., Red Hook, NY), 638–646.Google Scholar
Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Che Y-K, Hörner J (2018) Recommender systems as mechanisms for social learning. Quart. J. Econom. 133(2):871–925.Crossref, Google Scholar
Chen B, Frazier PI, Kempe D (2018) Incentivizing exploration by heterogeneous users. Bubeck S, Perchet V, Rigollet P, eds. Proc. 31st Conf. Learn. Theory (COLT), vol. 75 (PMLR), 798–818.Google Scholar
Fill JA, Machida M (2001) Stochastic monotonicity and realizable monotonicity. Ann. Probab. 29(2):938–978.Crossref, Google Scholar
Frazier P, Kempe D, Kleinberg JM, Kleinberg R (2014) Incentivizing exploration. Proc. 15th ACM Conf. Econom. Comput. (ACM-EC) (Association for Computing Machinery, New York), 5–22.Google Scholar
Gittins J, Glazebrook K, Weber R (2011) Multi-Armed Bandit Allocation Indices, 2nd ed. (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
Golub B, Sadler ED (2016) Learning in social networks. Bramoullé Y, Galeotti A, Rogers B, eds. The Oxford Handbook of the Economics of Networks (Oxford University Press, Oxford, UK), 504–542.Google Scholar
Hörner J, Skrzypacz A (2017) Learning, experimentation, and information design. Honoré B, Pakes A, Piazzesi M, Samuelson L, eds. Adv. Econom. Econometrics: 11th World Congress, vol. 1 (Cambridge University Press, Cambridge, UK), 63–98.Google Scholar
Immorlica N, Mao J, Slivkins A, Wu S (2019) Bayesian exploration with heterogenous agents. Web Conf. (International World Wide Web Conference Committee, Geneva), 751–761.Google Scholar
Immorlica N, Mao J, Slivkins A, Wu S (2020) Incentivizing exploration with selective data disclosure. Proc. 21st ACM Conf. Econom. Comput. (ACM-EC) (Association for Computing Machinery, New York), 647–648.Google Scholar
Kamenica E (2019) Bayesian persuasion and information design. Annual Rev. Econom. 11(1):249–272.Crossref, Google Scholar
Kaufmann E, Korda N, Munos R (2012) Thompson sampling: An asymptotically optimal finite-time analysis. Bshouty NH, Stoltz G, Vayatis N, Zeugmann T, eds. 23rd Internat. Conf. Algorithmic Learn. Theory (ALT) (Springer, Berlin, Heidelberg), 199–213.Google Scholar
Kremer I, Mansour Y, Perry M (2014) Implementing the “wisdom of the crowd.” J. Political Econom. 122(5):988–1012.Crossref, Google Scholar
Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Mansour Y, Slivkins A, Syrgkanis V (2015) Bayesian incentive-compatible bandit exploration. Proc. 16th ACM Conf. Econom. Comput. (ACM-EC) (Association for Computing Machinery, New York), 565–582.Google Scholar
Mansour Y, Slivkins A, Syrgkanis V (2020) Bayesian incentive-compatible bandit exploration. Oper. Res. 68(4):1132–1161.Link, Google Scholar
Mansour Y, Slivkins A, Syrgkanis V, Wu S (2022) Bayesian exploration: Incentivizing exploration in Bayesian games. Oper. Res. 70(2):1105–1127.Link, Google Scholar
Mourrat J-C (2020) Free energy upper bound for mean-field vector spin glasses. Preprint, submitted October 18, https://arxiv.org/abs/2010.09114v1.Google Scholar
Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.Link, Google Scholar
Russo D, Van Roy B, Kazerouni A, Osband I, Wen Z (2018) A Tutorial on Thompson Sampling, Foundations and Trends in Machine Learning (now Publishers, Inc., Boston), 11(1):1–96.Crossref, Google Scholar
Slivkins A (2019) Introduction to Multi-Armed Bandits, Foundations and Trends in Machine Learning (now Publishers, Inc., Boston).Crossref, Google Scholar
Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3-4):285–294.Crossref, Google Scholar

Volume 71, Issue 5

September-October 2023

Pages iii-vii, 1441-1924, C2-C3

Article Information

Metrics

Information

Received:August 13, 2021
Accepted:September 29, 2022
Published Online:November 29, 2022

Cite as

Mark Sellke, Aleksandrs Slivkins (2022) The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity. Operations Research 71(5):1706-1732.

https://doi.org/10.1287/opre.2022.2401

Keywords

Acknowledgments

An extended abstract of this paper appeared at the Association for Computing Machinery Symposium on Economics and Computation 2021.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity

References

Volume 71, Issue 5

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News