Maximal Objectives in the Multiarmed Bandit with Applications

Published Online:https://doi.org/10.1287/mnsc.2022.00801

References

  • Agrawal R (1995) Sample mean based index policies by O(log n) regret for the multi-armed bandit problem. Adv. Appl. Probab. 27(4):1054–1078.CrossrefGoogle Scholar
  • Audibert J-Y, Bubeck S (2009) Minimax policies for adversarial and stochastic bandits. Proc. 22nd Annual Conf. Learn. Theory, Quebec, Canada, 217–226.Google Scholar
  • Audibert J-Y, Bubeck S, Munos R (2010) Best arm identification in multi-armed bandits. Proc. 23rd Annual Conf. Learn. Theory (Omnipress, Madison, WI), 41–53.Google Scholar
  • Auer P, Ortner R (2010) UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Mathematica Hungarica 61(1–2):55–65.CrossrefGoogle Scholar
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47:235–256.CrossrefGoogle Scholar
  • Baudry D, Russac Y, Kaufmann E (2022) Efficient algorithms for extreme bandits. Preprint, submitted March 21, https://arxiv.org/abs/2203.10883.Google Scholar
  • Bhatt S, Li P, Samorodnitsky G (2022) Extreme bandits using robust statistics. IEEE Trans. Inform. Theory 69(3):1761–1776.CrossrefGoogle Scholar
  • Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.CrossrefGoogle Scholar
  • Bubeck S, Munos R, Stoltz G (2009) Pure exploration in multi-armed bandits problems. Proc. 20th Internat. Conf. Algorithmic Learn. Theory (Springer, Berlin, Heidelberg), 23–37.Google Scholar
  • Bubeck S, Munos R, Stoltz G (2011) Pure exploration in finitely-armed and continuous-armed bandits. Theoretical Comput. Sci. 412(19):1832–1852.CrossrefGoogle Scholar
  • Bubeck S, Wang T, Viswanathan N (2013) Multiple identifications in multi-armed bandits. Proc. 26th Annual Conf. Learn. Theory (JMLR.org), 258–265.Google Scholar
  • Carpentier A, Locatelli A (2016) Tight (lower) bounds for the fixed budget best arm identification bandit problem. Proc. 29th Annual Conf. Learn. Theory (JMLR.org), 590–604.Google Scholar
  • Carpentier A, Valko M (2014) Extreme bandits. Adv. Neural Inform. Processing Systems 27: 28th Annual Conf. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 1089–1097.Google Scholar
  • Carpentier A, Valko M (2015) Simple regret for infinitely many armed bandits. Proc. 32nd Internat. Conf. Machine Learn. (PMLR, New York), 1133–1141.Google Scholar
  • Cesa-Bianchi N, Dekel O, Shamir O (2013) Online learning with switching costs and other adaptive adversaries. Adv. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1160–1168.Google Scholar
  • Cicirello VA, Smith SF (2005) The max k-armed bandit: A new model of exploration applied to search heuristic selection. Proc. 20th National Conf. Artificial Intelligence, vol. 3 (AAAI Press, Palo Alto, CA), 1355–1361.Google Scholar
  • Dekel O, Ding J, Koren T, Peres Y (2014) Bandits with switching costs: T2/3 regret. Proc. 46th Annual ACM Sympos. Theory Comput. (ACM, New York), 459–467.Google Scholar
  • Esfandiari H, Karbasi A, Mehrabian A, Mirrokni V (2021) Regret bounds for batched bandits. Proc. Conf. AAAI Artificial Intelligence, vol. 35 (AAAI Press, Palo Alto, CA), 7340–7348.Google Scholar
  • Even-Dar E, Mannor S, Mansour Y (2002) PAC bounds for multi-armed bandit and Markov decision processes. Proc. 15th Internat. Conf. Comput. Learn. Theory (Springer, Berlin, Heidelberg), 255–270.Google Scholar
  • Even-Dar E, Mannor S, Mansour Y (2010) Learning with global cost in stochastic environments. Kalai AT, Mohri M, eds. Proc. 23rd Annual Conf. Learn. Theory, vol. 1 (Omnipress, Madison, WI), 80–92.Google Scholar
  • Even-Dar E, Kleinberg R, Mannor S, Mansour Y (2009) Online learning for global cost functions. Proc. 22nd Annual Conf. Learn. Theory, Montreal, Canada, 1527–1537.Google Scholar
  • Even-Dar E, Mannor S, Mansour Y, Mahadevan S (2006) Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. Machine Learn. Res. 7(6):1079–1105.Google Scholar
  • Gao Z, Han Y, Ren Z, Zhou Z (2019) Batched multi-armed bandits problem. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY), 503–513.Google Scholar
  • Garivier A, Cappé O (2011) The KL-UCB algorithm for bounded stochastic bandits and beyond. Proc. 24th Annual Conf. Learn. Theory (JMLR.org), 359–376.Google Scholar
  • Garivier A, Lattimore T, Kaufmann E (2016) On explore-then-commit strategies. Adv. Neural Inform. Processing Systems, vol. 29.Google Scholar
  • Gittins J, Glazebrook K, Weber R (2011) Multi-Armed Bandit Allocation Indices (John Wiley & Sons, Hoboken, NJ).CrossrefGoogle Scholar
  • Gourville JT, Soman D (2005) Overchoice and assortment type: When and why variety backfires. Marketing Sci. 24(3):382–395.LinkGoogle Scholar
  • György A, Kocsis L (2011) Efficient multi-start strategies for local search algorithms. J. Artificial Intelligence Res. 41:407–444.CrossrefGoogle Scholar
  • Hsu W-K, Xu J, Lin X, Bell MR (2022) Integrated online learning and adaptive control in queueing systems with uncertain payoffs. Oper. Res. 70(2):1166–1181.LinkGoogle Scholar
  • Jamieson K, Malloy M, Nowak R, Bubeck S (2014) lil’UCB: An optimal exploration algorithm for multi-armed bandits. Proc. 27th Annual Conf. Learn. Theory (JMLR.org), 423–439.Google Scholar
  • Jin T, Xu P, Xiao X, Gu Q (2021) Double explore-then-commit: Asymptotic optimality and beyond. Proc. 34th Annual Conf. Learn. Theory (PMLR, New York), 2584–2633.Google Scholar
  • Johari R, Kamble V, Kanoria Y (2021) Matching while learning. Oper. Res. 69(2):655–681.LinkGoogle Scholar
  • Jun K-S, Jamieson K, Nowak R, Zhu X (2016) Top arm identification in multi-armed bandits with batch arm pulls. Artificial Intelligence Statist., 139–148.Google Scholar
  • Kalyanakrishnan S, Tewari A, Auer P, Stone P (2012) PAC subset selection in stochastic multi-armed bandits. Proc. 29th Internat. Conf. Machine Learn., 227–234.Google Scholar
  • Karnin Z, Koren T, Somekh O (2013) Almost optimal exploration in multi-armed bandits. Proc. 30th Internat. Conf. Machine Learn. (JMLR.org), 1238–1246.Google Scholar
  • Kaufmann E, Kalyanakrishnan S (2013) Information complexity in bandit subset selection. Proc. 26th Annual Conf. Learn. Theory (JMLR.org), 228–251.Google Scholar
  • Kaufmann E, Cappé O, Garivier A (2016) On the complexity of best-arm identification in multi-armed bandit models. J. Machine Learn. Res. 17(1):1–42.Google Scholar
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.CrossrefGoogle Scholar
  • Lattimore T (2018) Refining the confidence level for optimistic bandit strategies. J. Machine Learn. Res. 19(1):765–796.Google Scholar
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge).CrossrefGoogle Scholar
  • Mannor S, Tsitsiklis JN (2004) The sample complexity of exploration in the multi-armed bandit problem. J. Machine Learn. Res. 5:623–648.Google Scholar
  • Martí R, Lozano JA, Mendiburu A, Hernando L (2018) Multi-start methods. Handbook of Heuristics (Springer, Berlin, Heidelberg), 155–175.CrossrefGoogle Scholar
  • Martí R, Aceves R, León MT, Moreno-Vega JM, Duarte A (2019) Intelligent multi-start methods. Handbook of Metaheuristics, 221–243.Google Scholar
  • Massoulié L, Xu K (2018) On the capacity of information processing systems. Oper. Res. 66(2):568–586.LinkGoogle Scholar
  • Ozbay E, Kamble V (2022) Incentives for exploration at market equilibrium. Preprint, submitted March 28, https://dx.doi.org/10.2139/ssrn.4041075.Google Scholar
  • Perchet V, Rigollet P (2013) The multi-armed bandit problem with covariates. Ann. Statist. 41(2):693–721.CrossrefGoogle Scholar
  • Perchet V, Rigollet P, Chassang S, Snowberg E (2016) Batched bandit problems. Ann. Statist. 44(2):660–681.CrossrefGoogle Scholar
  • Russo DJ, Van Roy B, Kazerouni A, Osband I, Wen Z (2018) A tutorial on thompson sampling. Foundations Trends Machine Learn., 11(1):1–96.CrossrefGoogle Scholar
  • Settle RB, Golden LL (1974) Consumer perceptions: Overchoice in the market place. Adv. Consumer Res. 1:29–37.Google Scholar
  • Shah V, Gulikers L, Massoulié L, Vojnović M (2020) Adaptive matching for expert systems with uncertain task types. Oper. Res. 68(5):1403–1424.LinkGoogle Scholar
  • Slivkins A (2019) Introduction to multi-armed bandits. Foundations Trends Machine Learn. 12(1–2):1–286.CrossrefGoogle Scholar
  • Smith-Miles KA (2009) Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Comput. Surveys 41(6):1–25.CrossrefGoogle Scholar
  • Streeter MJ, Smith SF (2006a) An asymptotically optimal algorithm for the max k-armed bandit problem. Proc. 21st National Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 135–142.Google Scholar
  • Streeter MJ, Smith SF (2006b) A simple distribution-free approach to the max k-armed bandit problem. Proc. 12th Internat. Conf. Principles Practice Constraint Programming (Springer, Berlin, Heidelberg), 560–574.Google Scholar
  • Sun X, Zhao J (2022) Congestion-aware matching and learning for service platforms.Google Scholar
  • Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.CrossrefGoogle Scholar
  • Vaidhiyan NK, Sundaresan R (2017) Learning to detect an oddball target. IEEE Trans. Inform. Theory 64(2):831–852.CrossrefGoogle Scholar
  • Yu H, Zhang D, Rauchwerger L (2004) An adaptive algorithm selection framework. Proc. 13th Internat. Conf. Parallel Architecture Compilation Techniques (IEEE, Piscataway, NJ), 278–289.Google Scholar
  • Zhou Y, Chen X, Li J (2014) Optimal PAC multiple arm identification with applications to crowdsourcing. Proc. 31st Internat. Conf. Machine Learn. (JMLR.org), 217–225.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.