Bayesian Exploration: Incentivizing Exploration in Bayesian Games

Published Online: https://doi.org/10.1287/opre.2021.2205

References

  • Aridor G, Mansour Y, Slivkins A, Wu S (2020) Competing bandits: The perils of exploration under competition. Preprint, submitted July 20, https://arxiv.org/abs/2007.10144.
  • Athey S, Segal I (2013) An efficient dynamic mechanism. Econometrica 81(6):2463–2485.
  • Auer P, Cesa-Bianchi N, Fischer P (2002a) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.
  • Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002b) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.
  • Babaioff M, Kleinberg R, Slivkins A (2015a) Truthful mechanisms with implicit payment computation. J. ACM 62(2):1–37.
  • Babaioff M, Sharma Y, Slivkins A (2014) Characterizing truthful multi-armed bandit mechanisms. SIAM J. Comput. 43(1):194–230.
  • Babaioff M, Dughmi S, Kleinberg RD, Slivkins A (2015b) Dynamic pricing with limited supply. ACM Trans. Econom. Comput. 3(1):1–26.
  • Bahar G, Smorodinsky R, Tennenholtz M (2016) Economic recommendation systems. 16th ACM Conf. Electronic Commerce (Association for Computing Machinery, New York).
  • Bahar G, Smorodinsky R, Tennenholtz M (2019) Social learning and the innkeeper’s challenge. ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 153–170.
  • Bergemann D, Morris S (2013) Robust predictions in games with incomplete information. Econometrica 81(4):1251–1308.
  • Bergemann D, Morris S (2019) Information design: A unified perspective. J. Econom. Literature 57(1):44–95.
  • Bergemann D, Välimäki J (2000) Experimentation in markets. Rev. Econom. Stud. 67(2):213–234.
  • Bergemann D, Välimäki J (2010) The dynamic pivot mechanism. Econometrica 78(2):771–789.
  • Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.
  • Bimpikis K, Papanastasiou Y, Savva N (2018) Crowdsourcing exploration. Management Sci. 64(4):1727–1746.
  • Bolton P, Harris C (1999) Strategic experimentation. Econometrica 67(2):349–374.
  • Bradonjic M, Ercal G, Meyerson A, Roytman A (2014) The price of mediation. Discrete Math. Theoretical Comput. Sci. 16(1):31–60.
  • Bubeck S, Cesa-Bianchi N (2012) Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning, vol. 5 (Now Publishers, Boston).
  • Che Y-K, Hörner J (2018) Recommender systems as mechanisms for social learning. Quart. J. Econom. 133(2):871–925.
  • Devanur N, Kakade SM (2009) The price of truthfulness for pay-per-click auctions. 10th ACM Conf. Electronic Commerce (Association for Computing Machinery, New York), 99–106.
  • Dughmi S, Xu H (2016) Algorithmic Bayesian persuasion. 48th ACM Sympos. Theory Comput. (Association for Computing Machinery, New York), 412–425.
  • Engelbrecht-Wiggans R (1986) On the value of private information in an auction: Ignorance may be bliss. Working Paper 1242, Bureau of Economic and Business Research, University of Illinois at Urbana-Champaign.
  • Frazier P, Kempe D, Kleinberg JM, Kleinberg R (2014) Incentivizing exploration. ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 5–22.
  • Fudenberg D, Levine DK (1998) The Theory of Learning in Games (MIT Press, Boston).
  • Ghosh A, Hummel P (2013) Learning and incentives in user-generated content: multi-armed bandits with endogenous arms. Innovations Theoretical Comput. Sci. Conf., 233–246.
  • Gittins JC (1979) Bandit processes and dynamic allocation indices (with discussion). J. Roy. Statist. Soc. B 41:148–177.
  • Gittins J, Glazebrook K, Weber R (2011) Multi-Armed Bandit Allocation Indices (John Wiley & Sons, Hoboken, NJ).
  • Golub B, Sadler ED (2016) Learning in social networks. Bramoullé Y, Galeotti A, Rogers B, eds. The Oxford Handbook of the Economics of Networks (Oxford University Press).
  • Ho CJ, Slivkins A, Wortman Vaughan J (2016) Adaptive contract design for crowdsourcing markets: Bandit algorithms for repeated principal-agent problems. J. Artificial Intelligence Res. 55(1):317–359.
  • Hörner J, Skrzypacz A (2017) Learning, experimentation, and information design. Honoré B, Pakes A, Piazzesi M, Samuelson L, eds. Advances in Economics and Econometrics: 11th World Congress, vol. 1 (Cambridge University Press, Cambridge, UK), 63–98.
  • Immorlica N, Mao J, Slivkins A, Wu S (2019) Bayesian exploration with heterogeneous agents. Web Conf. (The International World Wide Web Conference Committee, Geneva), 751–761.
  • Immorlica N, Mao J, Slivkins A, Wu S (2020) Incentivizing exploration with selective data disclosure. Preprint, submitted November 14, 2018; revised December 29, https://arxiv.org/abs/1811.06026.
  • Kakade SM, Lobel I, Nazerzadeh H (2013) Optimal dynamic mechanism design and the virtual-pivot mechanism. Oper. Res. 61(4):837–854.
  • Kamenica E (2019) Bayesian persuasion and information design. Annual Rev. Econom. 11(1):249–272.
  • Kamenica E, Gentzkow M (2011) Bayesian persuasion. Amer. Econom. Rev. 101(6):2590–2615.
  • Keller G, Rady S (2003) Price dispersion and learning in a dynamic differentiated-goods duopoly. RAND J. Econom. 34(1):138–165.
  • Kessler A (1998) The value of ignorance. RAND J. Econom. 29(2):339–354.
  • Kleinberg RD, Leighton FT (2003) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. IEEE Sympos. Foundations Comput. Sci. (Institute of Electrical and Electronics Engineers, New York), 594–605.
  • Kremer I, Mansour Y, Perry M (2014) Implementing the “wisdom of the crowd.” J. Political Econom. 122(5):988–1012.
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6:4–22.
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
  • Mansour Y, Slivkins A, Syrgkanis V (2015) Bayesian incentive-compatible bandit exploration. 16th ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 565–582.
  • Mansour Y, Slivkins A, Syrgkanis V (2020) Bayesian incentive-compatible bandit exploration. Oper. Res. 68(4):1132–1161.
  • Mansour Y, Slivkins A, Syrgkanis V, Wu ZS (2016) Bayesian exploration: Incentivizing exploration in Bayesian games. Preprint, submitted February 24, https://arxiv.org/abs/1602.07570.
  • Sellke M, Slivkins A (2021) The price of incentivizing exploration: A characterization via Thompson sampling and sample complexity. 22nd ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 795–796.
  • Simchowitz M, Slivkins A (2021) Incentives and exploration in reinforcement learning. Preprint, submitted February 28, https://arxiv.org/abs/2103.00360.
  • Singla A, Krause A (2013) Truthful incentives in crowdsourcing tasks using regret minimization mechanisms. 22nd Internat. World Wide Web Conf. (The International World Wide Web Conference Committee, Geneva), 1167–1178.
  • Slivkins A (2019) Introduction to Multi-Armed Bandits. Foundations and Trends in Machine Learning, vol. 12 (Now Publishers, Boston).
  • Syrgkanis V, Kempe D, Tardos E (2015) Information asymmetries in common-value auctions with discrete signals. ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 303.
  • Wang Z, Deng S, Ye Y (2014) Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Oper. Res. 62(2):318–331.