Bayesian Incentive-Compatible Bandit Exploration
Published Online:2 Jul 2020https://doi.org/10.1287/opre.2019.1949
References
- (2014) Taming the monster: A fast and simple algorithm for contextual bandits. 31st Internat. Conf. Machine Learn. (ICML), 1638–1646.Google Scholar
- (2015) Online learning with feedback graphs: Beyond bandits. 28th Conf. Learn Theory (COLT), 23–35.Google Scholar
- (2013) From bandits to experts: A tale of domination and independence. Advances in Neural Information Processing Systems (NIPS), vol. 27 (Curran Associates, Red Hook, NY), 1610–1618.Google Scholar
- (2012) Good Clinical Practice Guide (CITI Program, Miami).Google Scholar
- (2013) An efficient dynamic mechanism. Econometrica 81(6):2463–2485.Google Scholar
- (2010) Regret bounds and minimax policies under partial monitoring. J. Machine Learn. Res. 11:2785–2836.Google Scholar
- (2011) Minimax policies for combinatorial prediction games. 24th Conf. Learn. Theory (COLT), 107–132.Google Scholar
- (2002a) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Crossref, Google Scholar
- (2002b) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Google Scholar
- (2015a) Truthful mechanisms with implicit payment computation. J. ACM 62(2):Article 10.Google Scholar
- (2014) Characterizing truthful multi-armed bandit mechanisms. SIAM J. Comput. 43(1):194–230.Google Scholar
- (2015b) Dynamic pricing with limited supply. ACM Trans. Econom. Comput. 3(1):4.Google Scholar
- (2018) Bandits with knapsacks. J. ACM 65(3):Article 13.Google Scholar
- (2016) Economic recommendation systems. 16th ACM Conf. Electronic Commerce (EC) (ACM, New York), 757.Google Scholar
- (2014) Partial monitoring - classification, regret bounds, and algorithms. Math. Oper. Res. 39(4):967–997.Link, Google Scholar
- (2017) Mostly exploration-free algorithms for contextual bandits. Preprint, submitted April 28, https://arxiv.org/abs/1704.09011.Google Scholar
- (2016) Information design, Bayesian persuasion and Bayes correlated equilibrium. Amer. Econom. Rev. 106(5):586–591.Google Scholar
- (2010) The dynamic pivot mechanism. Econometrica 78(2):771–789.Google Scholar
- (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.Link, Google Scholar
- (2018) Crowdsourcing exploration. Management Sci. 64(4):1477–1973.Google Scholar
- (1999) Strategic experimentation. Econometrica 67(2):349–374.Crossref, Google Scholar
- (2012) Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. (Now Publishers, Boston).Google Scholar
- (2011) Pure exploration in multi-armed bandit problems. Theoret. Comput. Sci. 412(19):1832–1852.Crossref, Google Scholar
- (2017) Fair personalization. Preprint, submitted July 7, https://arxiv.org/abs/1707.02260.Google Scholar
- (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (2019) Optimal design for social learning. Quart. J. Econom. 133(2):871–925.Google Scholar
- (2018) Incentivizing exploration by heterogeneous users. Conf. Learn Theory (COLT), 798–818.Google Scholar
- (2017) Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data 5(2):153–163.Crossref, Google Scholar
- (2008) Adaptive design methods in clinical trials – A review. Orphanet J. Rare Diseases 3:11.Crossref, Google Scholar
- (2012) Standards for the design, conduct, and evaluation of adaptive randomized clinical trials. Report, Patient-Centered Outcomes Research Institute (PCORI), Washington, DC.Google Scholar
- (2009) The price of truthfulness for pay-per-click auctions. 10th ACM Conf. Electronic Commerce (EC), 99–106.Google Scholar
- (1949) Application of the theory of martingales. Le Calcul des Probabilités et ses Applications (Centre National de la Recherche Scientifique, Paris), 23–27.Google Scholar
- (2011) Efficient optimal leanring for contextual bandits. 27th Conf. Uncertainty Artificial Intelligence (UAI) (AUAI Press, Arlington, VA), 169–178.Google Scholar
- (2012) Fairness through awareness. Innovations Theoret. Comput. Sci. Conf. (ITCS) (ACM, New York), 214–226.Google Scholar
- (2015) Suspense and Surprise. J. Political Econom. 123(1):215–260.Crossref, Google Scholar
- (2002) PAC bounds for multi-armed bandit and Markov decision processes. 15th Conf. Learn. Theory (COLT) Lecture Notes in Computer Science, vol 2375 (Springer, Berlin, Heidelberg), 255–270.Google Scholar
- (2006) Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. Machine Learn. Res. 7:1079–1105.Google Scholar
- (2014) Incentivizing exploration. ACM Conf. Econom. Comput. (ACM EC) (ACM, New York), 5–22.Google Scholar
- (2005) Adaptive signature design: An adaptive clinical trial design for generating and prospectively testing a gene expression signature for sensitive patients. Clinical Cancer Res. 11(21):7872–7878.Crossref, Google Scholar
- (2007) Biomarker adaptive threshold design: A procedure for evaluating treatment with possible biomarker-defined subset effect. J. National Cancer Institute 99(13):1036–1043.Crossref, Google Scholar
- (2010) Cross-validated adaptive signature design. Clinical Cancer Res. 16(2):691–698.Crossref, Google Scholar
- (2008) Multi-arm clinical trials of new agents: Some design considerations. Clinical Cancer Res. 14(14):43–68.Crossref, Google Scholar
- (2015) New oncology clinical trial designs: What works and what doesn’t? Evidence-Based Oncology 21(10):349–350.Google Scholar
- (1996) A review of consistency and convergence rates of posterior distribution. Proc. Varanashi Sympos. Bayesian Inference, Varanasi, India.Google Scholar
- (2013) Learning and incentives in user-generated content: multi-armed bandits with endogenous arms. Innovations in Theoretical Computer Science Conf. (ITCS) (ACM, New York), 233–246.Google Scholar
- (1979) Bandit processes and dynamic allocation indices (with discussion). J. Royal Statist. Soc. B 41(2):148–177.Google Scholar
- (2011) Multi-Armed Bandit Allocation Indices (John Wiley & Sons).Crossref, Google Scholar
- (2009) The ratio index for budgeted learning, with applications. 20th ACM-SIAM Sympos. Discrete Algorithms (SODA) (SIAM, Philadelphia), 18–27.Google Scholar
- (2007) Approximation algorithms for budgeted learning problems. 39th ACM Sympos. Theory of Computing (STOC) (ACM, New York), 104–113.Google Scholar
- (2007) The on-line shortest path problem under partial monitoring. J. Machine Learn. Res. 8:2369–2403.Google Scholar
- (2016) Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems (NIPS), vol. 29 (Curran Associates, Red Hook, NY), 3315–3323.Google Scholar
- (2001) Monitoring clinical trials with multiple arms. Biometrics 57(3):892–898.Crossref, Google Scholar
- (2016) Adaptive contract design for crowdsourcing markets: Bandit algorithms for repeated principal-agent problems. J. Artificial Intelligence Res. 55:317–359.Google Scholar
- (1963) Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58(301):13–30.Crossref, Google Scholar
- (2015) Selling information. J. Political Econom. 24(6):1515–1562.Google Scholar
- (2019) Bayesian exploration with heterogenous agents. Internat. World Wide Web Conf. Committee (IW3C2), Geneva, Switzerland, 751–761.Google Scholar
- (2016) Fairness in learning: Classic and contextual bandits. Advances in Neural Information Processing Systems (NIPS), vol. 29 (Curran Associates, Red Hook, NY), 325–333..Google Scholar
- (2013) Optimal dynamic mechanism design and the virtual-pivot mechanism. Oper. Res. 61(4):837–854.Link, Google Scholar
- (2010) Non-stochastic bandit slate problems. Advances in Neural Information Processing Systems (NIPS), vol. 23 (Curran Associates, Red Hook, NY), 1054–1062.Google Scholar
- (2011) Bayesian persuasion. Amer. Econom. Rev. 101(6):2590–2615.Crossref, Google Scholar
- (2017) Fairness incentives for myopic agents. ACM Conf. Econom. Comput. (ACM EC) (ACM, New York), 369–386.Google Scholar
- (2018) A smoothed analysis of the greedy algorithm for the linear contextual bandit problem. Advances in Neural Information Processing Systems (NIPS), vol. 31 (Curran Associates, Red Hook, NY), 369–386.Google Scholar
- (2017) Meritocratic fairness for cross-population selection. Internat. Conf. Machine Learn. (ICML), 1828–1836.Google Scholar
- (2005) Strategic experimentation with exponential bandits. Econometrica 73(1):39–68.Crossref, Google Scholar
- (2017) Inherent trade-offs in the fair determination of risk scores. Innovations Theoret. Comput. Sci. Conf. (ITCS), 43:1–43:23.Google Scholar
- (2003), The value of knowing a demand curve: Bounds on regret for online posted-price auctions. IEEE Sympos. Foundations Comput. Sci. (FOCS), (IEEE, Piscataway, NJ), 594–605.Google Scholar
- (2016) Descending price optimally coordinates search. 17th ACM Conf. Econom. Comput. (ACM-EC) (ACM, New York), 23–24.Google Scholar
- (2014) Implementing the “wisdom of the crowd.” J. Political Econom. 122(5):988–1012.Google Scholar
- (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
- (2007) The epoch-greedy algorithm for contextual multi-armed bandits. Advances in Neural Information Processing Systems (NIPS), vol. 20 (Curran Associates, Red Hook, NY), 325–333.Google Scholar
- (2017) Calibrated fairness in bandits. Preprint, submitted July 6, https://arxiv.org/abs/1707.01875.Google Scholar
- (2011) Clinical trials in the era of personalized oncology. CA: Cancer J. Clinicians 61(6):365–381.Crossref, Google Scholar
- (2004) The sample complexity of exploration in the multi-armed bandit problem. J. Machine Learn. Res. 5:623–648.Google Scholar
- (2016) Bayesian exploration: Incentivizing exploration in Bayesian games. Preprint, submitted February 24, https://arxiv.org/abs/1602.07570.Google Scholar
- (1993) Hoeffding races: Accelerating model selection search for classification and function approximation. Advances in Neural Information Processing Systems (NIPS), vol. 6 (Curran Associates, Red Hook, NY), 59–66.Google Scholar
- (1997) The racing algorithm: Model selection for lazy learners. Artificial Intelligence Rev. 11(1–5):193–225.Crossref, Google Scholar
- (2005) Probability and Computing: Randomized Algorithms and Probabilistic Analysis (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (2014) More multiarm randomised trials of superiority are needed. Lancet 384(9940):283–284.Crossref, Google Scholar
- (2018) The externalities of exploration and how data diversity helps exploitation. Conf. Learn. Theory (COLT), 1724–1738.Google Scholar
- (2010) Optimal information disclosure. J. Political Econom. 118(5):949–987.Crossref, Google Scholar
- (2015) Basket trials and the evolution of clinical trial design in an era of genomic medicine. J. Clinical Oncology 33(9):975–977.Crossref, Google Scholar
- (2018) Human interaction with recommendation systems. Internat. Conf. Artificial Intelligence Statist. (AISTATS), 862–870.Google Scholar
- (2013) Truthful incentives in crowdsourcing tasks using regret minimization mechanisms. 22nd Internat. World Wide Web Conf. (WWW), Geneva, Switzerland, 1167–1178.Google Scholar
- (2013) Online decision making in crowdsourcing markets: Theoretical challenges. SIGecom Exchanges 12(2):4–23.Google Scholar
- (2016) Information design. Working paper, University of Edinburgh, Edinburgh, UK.Google Scholar
- (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3–4):285–294.Crossref, Google Scholar
- (2012) Redefining clinical trials: The age of personalized medicine. Cell 148(6):1079–1080.Crossref, Google Scholar
- (2005) Bandit problems with side observations. IEEE Trans. Automat. Control. 50(3):338–355.Crossref, Google Scholar
- (2015) Efficient learning in large-scale combinatorial semi-bandits. 32nd Internat. Conf. Machine Learn. (ICML), 1113–1122.Google Scholar
- (1979) A one-armed bandit problem with a concomitant variable. J. Amer. Statist. Assoc. 74(368):799–806.Crossref, Google Scholar

