Fair Exploration via Axiomatic Bargaining

Published Online: https://doi.org/10.1287/mnsc.2022.01985
