Agarwal A, Hsu D, Kale S, Langford J, Li L, Schapire R (2014) Taming the monster: A fast and simple algorithm for contextual bandits. 31st Internat. Conf. Machine Learn. (ICML), 1638–1646.Google Scholar
Alon N, Cesa-Bianchi N, Dekel O, Koren T (2015) Online learning with feedback graphs: Beyond bandits. 28th Conf. Learn Theory (COLT), 23–35.Google Scholar
Alon N, Cesa-Bianchi N, Gentile C, Mansour Y (2013) From bandits to experts: A tale of domination and independence. Advances in Neural Information Processing Systems (NIPS), vol. 27 (Curran Associates, Red Hook, NY), 1610–1618.Google Scholar
Arango J, Arts K, Braunschweiger P, Chadwick GL, Forster DG, Hansen K (2012) Good Clinical Practice Guide (CITI Program, Miami).Google Scholar
Athey S, Segal I (2013) An efficient dynamic mechanism. Econometrica 81(6):2463–2485.Google Scholar
Audibert JY, Bubeck S (2010) Regret bounds and minimax policies under partial monitoring. J. Machine Learn. Res. 11:2785–2836.Google Scholar
Audibert J-Y, Bubeck S, Lugosi G (2011) Minimax policies for combinatorial prediction games. 24th Conf. Learn. Theory (COLT), 107–132.Google Scholar
Auer P, Cesa-Bianchi N, Fischer P (2002a) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Crossref, Google Scholar
Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002b) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Google Scholar
Babaioff M, Kleinberg R, Slivkins A (2015a) Truthful mechanisms with implicit payment computation. J. ACM 62(2):Article 10.Google Scholar
Babaioff M, Sharma, Y, Slivkins A (2014) Characterizing truthful multi-armed bandit mechanisms. SIAM J. Comput. 43(1):194–230.Google Scholar
Babaioff M, Dughmi S, Kleinberg RD, Slivkins A (2015b) Dynamic pricing with limited supply. ACM Trans. Econom. Comput. 3(1):4.Google Scholar
Badanidiyuru A, Kleinberg R, Slivkins A (2018) Bandits with knapsacks. J. ACM 65(3):Article 13.Google Scholar
Bahar G, Smorodinsky R, Tennenholtz M (2016) Economic recommendation systems. 16th ACM Conf. Electronic Commerce (EC) (ACM, New York), 757.Google Scholar
Bartók G, Foster DP, Pál D, Rakhlin A, Szepesvári C (2014) Partial monitoring - classification, regret bounds, and algorithms. Math. Oper. Res. 39(4):967–997.Link, Google Scholar
Bastani H, Bayati M, Khosravi K (2017) Mostly exploration-free algorithms for contextual bandits. Preprint, submitted April 28, https://arxiv.org/abs/1704.09011.Google Scholar
Bergemann D, Morris S (2016) Information design, Bayesian persuasion and Bayes correlated equilibrium. Amer. Econom. Rev. 106(5):586–591.Google Scholar
Bergemann D, Välimäki J (2010) The dynamic pivot mechanism. Econometrica 78(2):771–789.Google Scholar
Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.Link, Google Scholar
Bimpikis K, Papanastasiou Y, Savva N (2018) Crowdsourcing exploration. Management Sci. 64(4):1477–1973.Google Scholar
Bolton P, Harris C (1999) Strategic experimentation. Econometrica 67(2):349–374.Crossref, Google Scholar
Bubeck S, Cesa-Bianchi N (2012) Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. (Now Publishers, Boston).Google Scholar
Bubeck S, Munos R, Stoltz G (2011) Pure exploration in multi-armed bandit problems. Theoret. Comput. Sci. 412(19):1832–1852.Crossref, Google Scholar
Celis LE, Vishnoi NK (2017) Fair personalization. Preprint, submitted July 7, https://arxiv.org/abs/1707.02260.Google Scholar
Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Che Y-K, Hörner J (2019) Optimal design for social learning. Quart. J. Econom. 133(2):871–925.Google Scholar
Chen B, Frazier PI, Kempe D (2018) Incentivizing exploration by heterogeneous users. Conf. Learn Theory (COLT), 798–818.Google Scholar
Chouldechova A (2017) Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data 5(2):153–163.Crossref, Google Scholar
Chow S-C, Chang M (2008) Adaptive design methods in clinical trials – A review. Orphanet J. Rare Diseases 3:11.Crossref, Google Scholar
Detry, MA, Lewis RJ, Broglio KR, Connor JT, Berry SM, Berry DA (2012) Standards for the design, conduct, and evaluation of adaptive randomized clinical trials. Report, Patient-Centered Outcomes Research Institute (PCORI), Washington, DC.Google Scholar
Devanur N, Kakade SM (2009) The price of truthfulness for pay-per-click auctions. 10th ACM Conf. Electronic Commerce (EC), 99–106.Google Scholar
Doob JL (1949) Application of the theory of martingales. Le Calcul des Probabilités et ses Applications (Centre National de la Recherche Scientifique, Paris), 23–27.Google Scholar
Dudík M, Hsu D, Kale S, Karampatziakis N, Langford J, Reyzin L, Zhang T (2011) Efficient optimal leanring for contextual bandits. 27th Conf. Uncertainty Artificial Intelligence (UAI) (AUAI Press, Arlington, VA), 169–178.Google Scholar
Dwork C, Hardt M, Pitassi T, Reingold O, Zemel R (2012) Fairness through awareness. Innovations Theoret. Comput. Sci. Conf. (ITCS) (ACM, New York), 214–226.Google Scholar
Ely J, Frankel A, Kamenica E (2015) Suspense and Surprise. J. Political Econom. 123(1):215–260.Crossref, Google Scholar
Even-Dar E, Mannor S, Mansour Y (2002) PAC bounds for multi-armed bandit and Markov decision processes. 15th Conf. Learn. Theory (COLT) Lecture Notes in Computer Science, vol 2375 (Springer, Berlin, Heidelberg), 255–270.Google Scholar
Even-Dar E, Mannor S, Mansour Y (2006) Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. Machine Learn. Res. 7:1079–1105.Google Scholar
Frazier P, Kempe D, Kleinberg JM, Kleinberg R (2014) Incentivizing exploration. ACM Conf. Econom. Comput. (ACM EC) (ACM, New York), 5–22.Google Scholar
Freidlin B, Simon R (2005) Adaptive signature design: An adaptive clinical trial design for generating and prospectively testing a gene expression signature for sensitive patients. Clinical Cancer Res. 11(21):7872–7878.Crossref, Google Scholar
Freidlin B, Jiang W, Simon R (2007) Biomarker adaptive threshold design: A procedure for evaluating treatment with possible biomarker-defined subset effect. J. National Cancer Institute 99(13):1036–1043.Crossref, Google Scholar
Freidlin B, Jiang W, Simon R (2010) Cross-validated adaptive signature design. Clinical Cancer Res. 16(2):691–698.Crossref, Google Scholar
Freidlin B, Korn EL, Gray R, Martin A (2008) Multi-arm clinical trials of new agents: Some design considerations. Clinical Cancer Res. 14(14):43–68.Crossref, Google Scholar
Garimella S (2015) New oncology clinical trial designs: What works and what doesn’t? Evidence-Based Oncology 21(10):349–350.Google Scholar
Ghosal S (1996) A review of consistency and convergence rates of posterior distribution. Proc. Varanashi Sympos. Bayesian Inference, Varanasi, India.Google Scholar
Ghosh A, Hummel P (2013) Learning and incentives in user-generated content: multi-armed bandits with endogenous arms. Innovations in Theoretical Computer Science Conf. (ITCS) (ACM, New York), 233–246.Google Scholar
Gittins JC (1979) Bandit processes and dynamic allocation indices (with discussion). J. Royal Statist. Soc. B 41(2):148–177.Google Scholar
Gittins J, Glazebrook K, Weber R (2011) Multi-Armed Bandit Allocation Indices (John Wiley & Sons).Crossref, Google Scholar
Goel A, Khanna S, Null B (2009) The ratio index for budgeted learning, with applications. 20th ACM-SIAM Sympos. Discrete Algorithms (SODA) (SIAM, Philadelphia), 18–27.Google Scholar
Guha S, Munagala K (2007) Approximation algorithms for budgeted learning problems. 39th ACM Sympos. Theory of Computing (STOC) (ACM, New York), 104–113.Google Scholar
György A, Linder T, Lugosi G, Ottucsák G (2007) The on-line shortest path problem under partial monitoring. J. Machine Learn. Res. 8:2369–2403.Google Scholar
Hardt M, Price E, Srebro N (2016) Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems (NIPS), vol. 29 (Curran Associates, Red Hook, NY), 3315–3323.Google Scholar
Hellmich M (2001) Monitoring clinical trials with multiple arms. Biometrics 57(3):892–898.Crossref, Google Scholar
Ho C-J, Slivkins A, Vaughan JW (2016) Adaptive contract design for crowdsourcing markets: Bandit algorithms for repeated principal-agent problems. J. Artificial Intelligence Res. 55:317–359.Google Scholar
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58(301):13–30.Crossref, Google Scholar
Hörner J, Skrzypacz A (2015) Selling information. J. Political Econom. 24(6):1515–1562.Google Scholar
Immorlica N, Mao J, Slivkins A, Wu S (2019) Bayesian exploration with heterogenous agents. Internat. World Wide Web Conf. Committee (IW3C2), Geneva, Switzerland, 751–761.Google Scholar
Joseph M, Kearns M, Morgenstern JH, Roth A (2016) Fairness in learning: Classic and contextual bandits. Advances in Neural Information Processing Systems (NIPS), vol. 29 (Curran Associates, Red Hook, NY), 325–333..Google Scholar
Kakade SM, Lobel I, Nazerzadeh H (2013) Optimal dynamic mechanism design and the virtual-pivot mechanism. Oper. Res. 61(4):837–854.Link, Google Scholar
Kale S, Reyzin L, Schapire RE (2010) Non-stochastic bandit slate problems. Advances in Neural Information Processing Systems (NIPS), vol. 23 (Curran Associates, Red Hook, NY), 1054–1062.Google Scholar
Kamenica E, Gentzkow M (2011) Bayesian persuasion. Amer. Econom. Rev. 101(6):2590–2615.Crossref, Google Scholar
Kannan S, Kearns M, Morgenstern J, Pai M, Roth A, Vohra R, Wu ZS (2017) Fairness incentives for myopic agents. ACM Conf. Econom. Comput. (ACM EC) (ACM, New York), 369–386.Google Scholar
Kannan S, Morgenstern J, Roth A, Waggoner B, Wu ZS (2018) A smoothed analysis of the greedy algorithm for the linear contextual bandit problem. Advances in Neural Information Processing Systems (NIPS), vol. 31 (Curran Associates, Red Hook, NY), 369–386.Google Scholar
Kearns M, Roth A, Wu ZS (2017) Meritocratic fairness for cross-population selection. Internat. Conf. Machine Learn. (ICML), 1828–1836.Google Scholar
Keller G, Rady S, Cripps M (2005) Strategic experimentation with exponential bandits. Econometrica 73(1):39–68.Crossref, Google Scholar
Kleinberg J, Mullainathan S, Raghavan M (2017) Inherent trade-offs in the fair determination of risk scores. Innovations Theoret. Comput. Sci. Conf. (ITCS), 43:1–43:23.Google Scholar
Kleinberg RD, Leighton FT (2003), The value of knowing a demand curve: Bounds on regret for online posted-price auctions. IEEE Sympos. Foundations Comput. Sci. (FOCS), (IEEE, Piscataway, NJ), 594–605.Google Scholar
Kleinberg RD, Waggoner B, Weyl EG (2016) Descending price optimally coordinates search. 17th ACM Conf. Econom. Comput. (ACM-EC) (ACM, New York), 23–24.Google Scholar
Kremer I, Mansour Y, Perry M (2014) Implementing the “wisdom of the crowd.” J. Political Econom. 122(5):988–1012.Google Scholar
Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
Langford J, Zhang T (2007) The epoch-greedy algorithm for contextual multi-armed bandits. Advances in Neural Information Processing Systems (NIPS), vol. 20 (Curran Associates, Red Hook, NY), 325–333.Google Scholar
Liu Y, Radanovic G, Dimitrakakis C, Mandal D, Parkes DC (2017) Calibrated fairness in bandits. Preprint, submitted July 6, https://arxiv.org/abs/1707.01875.Google Scholar
Maitland ML, Schilsky RL (2011) Clinical trials in the era of personalized oncology. CA: Cancer J. Clinicians 61(6):365–381.Crossref, Google Scholar
Mannor S, Tsitsiklis JN (2004) The sample complexity of exploration in the multi-armed bandit problem. J. Machine Learn. Res. 5:623–648.Google Scholar
Mansour Y, Slivkins A, Syrgkanis V, Wu S (2016) Bayesian exploration: Incentivizing exploration in Bayesian games. Preprint, submitted February 24, https://arxiv.org/abs/1602.07570.Google Scholar
Maron O, Moore AW (1993) Hoeffding races: Accelerating model selection search for classification and function approximation. Advances in Neural Information Processing Systems (NIPS), vol. 6 (Curran Associates, Red Hook, NY), 59–66.Google Scholar
Maron O, Moore AW (1997) The racing algorithm: Model selection for lazy learners. Artificial Intelligence Rev. 11(1–5):193–225.Crossref, Google Scholar
Mitzenmacher M, Upfal E (2005) Probability and Computing: Randomized Algorithms and Probabilistic Analysis (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Parmar MKB, Carpenter J, Sydes MR (2014) More multiarm randomised trials of superiority are needed. Lancet 384(9940):283–284.Crossref, Google Scholar
Raghavan M, Slivkins A, Vaughan JW, Wu ZS (2018) The externalities of exploration and how data diversity helps exploitation. Conf. Learn. Theory (COLT), 1724–1738.Google Scholar
Rayo L, Segal I (2010) Optimal information disclosure. J. Political Econom. 118(5):949–987.Crossref, Google Scholar
Redig A, Jänne P (2015) Basket trials and the evolution of clinical trial design in an era of genomic medicine. J. Clinical Oncology 33(9):975–977.Crossref, Google Scholar
Schmit S, Riquelme C (2018) Human interaction with recommendation systems. Internat. Conf. Artificial Intelligence Statist. (AISTATS), 862–870.Google Scholar
Singla A, Krause A (2013) Truthful incentives in crowdsourcing tasks using regret minimization mechanisms. 22nd Internat. World Wide Web Conf. (WWW), Geneva, Switzerland, 1167–1178.Google Scholar
Slivkins A, Vaughan JW (2013) Online decision making in crowdsourcing markets: Theoretical challenges. SIGecom Exchanges 12(2):4–23.Google Scholar
Taneva I (2016) Information design. Working paper, University of Edinburgh, Edinburgh, UK.Google Scholar
Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3–4):285–294.Crossref, Google Scholar
Vaidyanathan G (2012) Redefining clinical trials: The age of personalized medicine. Cell 148(6):1079–1080.Crossref, Google Scholar
Wang C-C, Kulkarni SR, Poor HV (2005) Bandit problems with side observations. IEEE Trans. Automat. Control. 50(3):338–355.Crossref, Google Scholar
Wen Z, Kveton B, Ashkan A (2015) Efficient learning in large-scale combinatorial semi-bandits. 32nd Internat. Conf. Machine Learn. (ICML), 1113–1122.Google Scholar
Woodroofe M (1979) A one-armed bandit problem with a concomitant variable. J. Amer. Statist. Assoc. 74(368):799–806.Crossref, Google Scholar

Volume 68, Issue 4

July-August 2020

Pages ii-v, 965-1284, C2-C3

Article Information

Metrics

Information

Received:May 02, 2018
Accepted:July 17, 2019
Published Online:July 02, 2020

Cite as

Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis (2020) Bayesian Incentive-Compatible Bandit Exploration. Operations Research 68(4):1132-1161.

https://doi.org/10.1287/opre.2019.1949

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Bayesian Incentive-Compatible Bandit Exploration

References

Volume 68, Issue 4

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News