Aridor G, Mansour Y, Slivkins A, Wu S (2020) Competing bandits: The perils of exploration under competition. Preprint, submitted July 20, https://arxiv.org/abs/2007.10144.
Athey S, Segal I (2013) An efficient dynamic mechanism. Econometrica 81(6):2463–2485.
Auer P, Cesa-Bianchi N, Fischer P (2002a) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.
Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002b) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.
Babaioff M, Kleinberg R, Slivkins A (2015a) Truthful mechanisms with implicit payment computation. J. ACM 62(2):1–37.
Babaioff M, Sharma Y, Slivkins A (2014) Characterizing truthful multi-armed bandit mechanisms. SIAM J. Comput. 43(1):194–230.
Babaioff M, Dughmi S, Kleinberg RD, Slivkins A (2015b) Dynamic pricing with limited supply. ACM Trans. Econom. Comput. 3(1):1–26.
Bahar G, Smorodinsky R, Tennenholtz M (2016) Economic recommendation systems. 16th ACM Conf. Electronic Commerce (Association for Computing Machinery, New York).
Bahar G, Smorodinsky R, Tennenholtz M (2019) Social learning and the innkeeper’s challenge. ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 153–170.
Bergemann D, Morris S (2013) Robust predictions in games with incomplete information. Econometrica 81(4):1251–1308.
Bergemann D, Morris S (2019) Information design: A unified perspective. J. Econom. Literature 57(1):44–95.
Bergemann D, Välimäki J (2000) Experimentation in markets. Rev. Econom. Stud. 67(2):213–234.
Bergemann D, Välimäki J (2010) The dynamic pivot mechanism. Econometrica 78(2):771–789.
Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.
Bimpikis K, Papanastasiou Y, Savva N (2018) Crowdsourcing exploration. Management Sci. 64(4):1727–1746.
Bolton P, Harris C (1999) Strategic experimentation. Econometrica 67(2):349–374.
Bradonjic M, Ercal G, Meyerson A, Roytman A (2014) The price of mediation. Discrete Math. Theoretical Comput. Sci. 16(1):31–60.
Bubeck S, Cesa-Bianchi N (2012) Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning, vol. 5 (Now Publishers, Boston).
Che Y-K, Hörner J (2018) Recommender systems as mechanisms for social learning. Quart. J. Econom. 133(2):871–925.
Devanur N, Kakade SM (2009) The price of truthfulness for pay-per-click auctions. 10th ACM Conf. Electronic Commerce (Association for Computing Machinery, New York), 99–106.
Dughmi S, Xu H (2016) Algorithmic Bayesian persuasion. 48th ACM Sympos. Theory Comput. (Association for Computing Machinery, New York), 412–425.
Engelbrecht-Wiggans R (1986) On the value of private information in an auction: Ignorance may be bliss. Working Paper 1242, Bureau of Economic and Business Research, University of Illinois at Urbana-Champaign.
Frazier P, Kempe D, Kleinberg JM, Kleinberg R (2014) Incentivizing exploration. ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 5–22.
Fudenberg D, Levine DK (1998) The Theory of Learning in Games (MIT Press, Boston).
Ghosh A, Hummel P (2013) Learning and incentives in user-generated content: multi-armed bandits with endogenous arms. Innovations Theoretical Comput. Sci. Conf., 233–246.
Gittins JC (1979) Bandit processes and dynamic allocation indices (with discussion). J. Roy. Statist. Soc. B 41:148–177.
Gittins J, Glazebrook K, Weber R (2011) Multi-Armed Bandit Allocation Indices (John Wiley & Sons, Hoboken, NJ).
Golub B, Sadler ED (2016) Learning in social networks. Bramoullé Y, Galeotti A, Rogers B, eds. The Oxford Handbook of the Economics of Networks (Oxford University Press).
Ho CJ, Slivkins A, Wortman Vaughan J (2016) Adaptive contract design for crowdsourcing markets: Bandit algorithms for repeated principal-agent problems. J. Artificial Intelligence Res. 55(1):317–359.
Hörner J, Skrzypacz A (2017) Learning, experimentation, and information design. Honoré B, Pakes A, Piazzesi M, Samuelson L, eds. Advances in Economics and Econometrics: 11th World Congress, vol. 1 (Cambridge University Press, Cambridge, UK), 63–98.
Immorlica N, Mao J, Slivkins A, Wu S (2019) Bayesian exploration with heterogeneous agents. Web Conf. (The International World Wide Web Conference Committee, Geneva), 751–761.
Immorlica N, Mao J, Slivkins A, Wu S (2020) Incentivizing exploration with selective data disclosure. Preprint, submitted November 14, 2018; revised December 29, https://arxiv.org/abs/1811.06026.
Kakade SM, Lobel I, Nazerzadeh H (2013) Optimal dynamic mechanism design and the virtual-pivot mechanism. Oper. Res. 61(4):837–854.
Kamenica E (2019) Bayesian persuasion and information design. Annual Rev. Econom. 11(1):249–272.
Kamenica E, Gentzkow M (2011) Bayesian persuasion. Amer. Econom. Rev. 101(6):2590–2615.
Keller G, Rady S (2003) Price dispersion and learning in a dynamic differentiated-goods duopoly. RAND J. Econom. 34(1):138–165.
Kessler A (1998) The value of ignorance. RAND J. Econom. 29(2):339–354.
Kleinberg RD, Leighton FT (2003) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. IEEE Sympos. Foundations Comput. Sci. (Institute of Electrical and Electronics Engineers, New York), 594–605.
Kremer I, Mansour Y, Perry M (2014) Implementing the “wisdom of the crowd.” J. Political Econom. 122(5):988–1012.
Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6:4–22.
Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
Mansour Y, Slivkins A, Syrgkanis V (2015) Bayesian incentive-compatible bandit exploration. 16th ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 565–582.
Mansour Y, Slivkins A, Syrgkanis V (2020) Bayesian incentive-compatible bandit exploration. Oper. Res. 68(4):1132–1161.
Mansour Y, Slivkins A, Syrgkanis V, Wu ZS (2016) Bayesian exploration: Incentivizing exploration in Bayesian games. Preprint, submitted February 24, https://arxiv.org/abs/1602.07570.
Sellke M, Slivkins A (2021) The price of incentivizing exploration: A characterization via Thompson sampling and sample complexity. 22nd ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 795–796.
Simchowitz M, Slivkins A (2021) Incentives and exploration in reinforcement learning. Preprint, submitted February 28, https://arxiv.org/abs/2103.00360.
Singla A, Krause A (2013) Truthful incentives in crowdsourcing tasks using regret minimization mechanisms. 22nd Internat. World Wide Web Conf. (The International World Wide Web Conference Committee, Geneva), 1167–1178.
Slivkins A (2019) Introduction to Multi-Armed Bandits. Foundations and Trends in Machine Learning, vol. 12 (Now Publishers, Boston).
Syrgkanis V, Kempe D, Tardos E (2015) Information asymmetries in common-value auctions with discrete signals. ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 303.
Wang Z, Deng S, Ye Y (2014) Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Oper. Res. 62(2):318–331.

Volume 70, Issue 2

March-April 2022

Pages iii-viii, 641-1291, C2-C3

Received: December 21, 2018
Accepted: August 11, 2021
Published Online: December 29, 2021

Cite as

Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis, Zhiwei Steven Wu (2022) Bayesian Exploration: Incentivizing Exploration in Bayesian Games. Operations Research 70(2):1105-1127.

https://doi.org/10.1287/opre.2021.2205
