Maximal Objectives in the Multiarmed Bandit with Applications
Published Online:15 Mar 2024https://doi.org/10.1287/mnsc.2022.00801
References
- (1995) Sample mean based index policies by O(log n) regret for the multi-armed bandit problem. Adv. Appl. Probab. 27(4):1054–1078.Crossref, Google Scholar
- (2009) Minimax policies for adversarial and stochastic bandits. Proc. 22nd Annual Conf. Learn. Theory, Quebec, Canada, 217–226.Google Scholar
- (2010) Best arm identification in multi-armed bandits. Proc. 23rd Annual Conf. Learn. Theory (Omnipress, Madison, WI), 41–53.Google Scholar
- (2010) UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Mathematica Hungarica 61(1–2):55–65.Crossref, Google Scholar
- (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47:235–256.Crossref, Google Scholar
- (2022) Efficient algorithms for extreme bandits. Preprint, submitted March 21, https://arxiv.org/abs/2203.10883.Google Scholar
- (2022) Extreme bandits using robust statistics. IEEE Trans. Inform. Theory 69(3):1761–1776.Crossref, Google Scholar
- (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.Crossref, Google Scholar
- (2009) Pure exploration in multi-armed bandits problems. Proc. 20th Internat. Conf. Algorithmic Learn. Theory (Springer, Berlin, Heidelberg), 23–37.Google Scholar
- (2011) Pure exploration in finitely-armed and continuous-armed bandits. Theoretical Comput. Sci. 412(19):1832–1852.Crossref, Google Scholar
- (2013) Multiple identifications in multi-armed bandits. Proc. 26th Annual Conf. Learn. Theory (JMLR.org), 258–265.Google Scholar
- (2016) Tight (lower) bounds for the fixed budget best arm identification bandit problem. Proc. 29th Annual Conf. Learn. Theory (JMLR.org), 590–604.Google Scholar
- (2014) Extreme bandits. Adv. Neural Inform. Processing Systems 27: 28th Annual Conf. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 1089–1097.Google Scholar
- (2015) Simple regret for infinitely many armed bandits. Proc. 32nd Internat. Conf. Machine Learn. (PMLR, New York), 1133–1141.Google Scholar
- (2013) Online learning with switching costs and other adaptive adversaries. Adv. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1160–1168.Google Scholar
- (2005) The max k-armed bandit: A new model of exploration applied to search heuristic selection. Proc. 20th National Conf. Artificial Intelligence, vol. 3 (AAAI Press, Palo Alto, CA), 1355–1361.Google Scholar
- (2014) Bandits with switching costs: T2/3 regret. Proc. 46th Annual ACM Sympos. Theory Comput. (ACM, New York), 459–467.Google Scholar
- (2021) Regret bounds for batched bandits. Proc. Conf. AAAI Artificial Intelligence, vol. 35 (AAAI Press, Palo Alto, CA), 7340–7348.Google Scholar
- (2002) PAC bounds for multi-armed bandit and Markov decision processes. Proc. 15th Internat. Conf. Comput. Learn. Theory (Springer, Berlin, Heidelberg), 255–270.Google Scholar
- (2010) Learning with global cost in stochastic environments. Kalai AT, Mohri M, eds. Proc. 23rd Annual Conf. Learn. Theory, vol. 1 (Omnipress, Madison, WI), 80–92.Google Scholar
- (2009) Online learning for global cost functions. Proc. 22nd Annual Conf. Learn. Theory, Montreal, Canada, 1527–1537.Google Scholar
- (2006) Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. Machine Learn. Res. 7(6):1079–1105.Google Scholar
- (2019) Batched multi-armed bandits problem. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY), 503–513.Google Scholar
- (2011) The KL-UCB algorithm for bounded stochastic bandits and beyond. Proc. 24th Annual Conf. Learn. Theory (JMLR.org), 359–376.Google Scholar
- (2016) On explore-then-commit strategies. Adv. Neural Inform. Processing Systems, vol. 29.Google Scholar
- (2011) Multi-Armed Bandit Allocation Indices (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
- (2005) Overchoice and assortment type: When and why variety backfires. Marketing Sci. 24(3):382–395.Link, Google Scholar
- (2011) Efficient multi-start strategies for local search algorithms. J. Artificial Intelligence Res. 41:407–444.Crossref, Google Scholar
- (2022) Integrated online learning and adaptive control in queueing systems with uncertain payoffs. Oper. Res. 70(2):1166–1181.Link, Google Scholar
- (2014) lil’UCB: An optimal exploration algorithm for multi-armed bandits. Proc. 27th Annual Conf. Learn. Theory (JMLR.org), 423–439.Google Scholar
- (2021) Double explore-then-commit: Asymptotic optimality and beyond. Proc. 34th Annual Conf. Learn. Theory (PMLR, New York), 2584–2633.Google Scholar
- (2021) Matching while learning. Oper. Res. 69(2):655–681.Link, Google Scholar
- (2016) Top arm identification in multi-armed bandits with batch arm pulls. Artificial Intelligence Statist., 139–148.Google Scholar
- (2012) PAC subset selection in stochastic multi-armed bandits. Proc. 29th Internat. Conf. Machine Learn., 227–234.Google Scholar
- (2013) Almost optimal exploration in multi-armed bandits. Proc. 30th Internat. Conf. Machine Learn. (JMLR.org), 1238–1246.Google Scholar
- (2013) Information complexity in bandit subset selection. Proc. 26th Annual Conf. Learn. Theory (JMLR.org), 228–251.Google Scholar
- (2016) On the complexity of best-arm identification in multi-armed bandit models. J. Machine Learn. Res. 17(1):1–42.Google Scholar
- (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
- (2018) Refining the confidence level for optimistic bandit strategies. J. Machine Learn. Res. 19(1):765–796.Google Scholar
- (2020) Bandit Algorithms (Cambridge University Press, Cambridge).Crossref, Google Scholar
- (2004) The sample complexity of exploration in the multi-armed bandit problem. J. Machine Learn. Res. 5:623–648.Google Scholar
- (2018) Multi-start methods. Handbook of Heuristics (Springer, Berlin, Heidelberg), 155–175.Crossref, Google Scholar
- (2019) Intelligent multi-start methods. Handbook of Metaheuristics, 221–243.Google Scholar
- (2018) On the capacity of information processing systems. Oper. Res. 66(2):568–586.Link, Google Scholar
- (2022) Incentives for exploration at market equilibrium. Preprint, submitted March 28, https://dx.doi.org/10.2139/ssrn.4041075.Google Scholar
- (2013) The multi-armed bandit problem with covariates. Ann. Statist. 41(2):693–721.Crossref, Google Scholar
- (2016) Batched bandit problems. Ann. Statist. 44(2):660–681.Crossref, Google Scholar
- (2018) A tutorial on thompson sampling. Foundations Trends Machine Learn., 11(1):1–96.Crossref, Google Scholar
- (1974) Consumer perceptions: Overchoice in the market place. Adv. Consumer Res. 1:29–37.Google Scholar
- (2020) Adaptive matching for expert systems with uncertain task types. Oper. Res. 68(5):1403–1424.Link, Google Scholar
- (2019) Introduction to multi-armed bandits. Foundations Trends Machine Learn. 12(1–2):1–286.Crossref, Google Scholar
- (2009) Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Comput. Surveys 41(6):1–25.Crossref, Google Scholar
- (2006a) An asymptotically optimal algorithm for the max k-armed bandit problem. Proc. 21st National Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 135–142.Google Scholar
- (2006b) A simple distribution-free approach to the max k-armed bandit problem. Proc. 12th Internat. Conf. Principles Practice Constraint Programming (Springer, Berlin, Heidelberg), 560–574.Google Scholar
- (2022) Congestion-aware matching and learning for service platforms.Google Scholar
- (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.Crossref, Google Scholar
- (2017) Learning to detect an oddball target. IEEE Trans. Inform. Theory 64(2):831–852.Crossref, Google Scholar
- (2004) An adaptive algorithm selection framework. Proc. 13th Internat. Conf. Parallel Architecture Compilation Techniques (IEEE, Piscataway, NJ), 278–289.Google Scholar
- (2014) Optimal PAC multiple arm identification with applications to crowdsourcing. Proc. 31st Internat. Conf. Machine Learn. (JMLR.org), 217–225.Google Scholar

