Satisficing in Time-Sensitive Bandit Learning

Published Online:https://doi.org/10.1287/moor.2021.1229

References

  • [1] Agrawal S, Goyal N (2013) Further optimal regret bounds for Thompson sampling. Proc. Sixteenth Internat. Conf. Artificial Intelligence Statist., 99–107.Google Scholar
  • [2] Berry DA, Chen RW, Zame A, Heath DC, Shepp LA (1997) Bandit problems with infinitely many arms. Ann. Statist. 25(5):2103–2116.CrossrefGoogle Scholar
  • [3] Bonald T, Proutiere A (2013) Two-target algorithms for infinite-armed bandits with Bernoulli rewards. Advances in Neural Information Processing Systems, vol. 26 (Curran Associates, Inc., Red Hook, NY), 2184–2192.Google Scholar
  • [4] Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.CrossrefGoogle Scholar
  • [5] Bubeck S, Eldan R (2016) Multi-scale exploration of convex functions and bandit convex optimization. Conf. Learn. Theory (PMLR), 583–589.Google Scholar
  • [6] Bubeck S, Munos R, Stoltz G, Szepesvári C (2011) X-armed bandits. J. Machine Learn. Res. 12:1655–1695.Google Scholar
  • [7] Cappé O, Garivier A, Maillard OA, Munos R, Stoltz G (2013) Kullback-Leibler upper confidence bounds for optimal sequential allocation. Ann. Statist. 41(3):1516–1541.CrossrefGoogle Scholar
  • [8] Chapelle O, Li L (2011) An empirical evaluation of Thompson sampling. Shawe-Taylor J, Zemel RS, Bartlett PL, eds. Proc. 24th Internat. Conf. Neural Inform. Processing System (Curran Associates Inc., Red Hook, NY), 2249–2257.Google Scholar
  • [9] Cover T, Thomas J (2012) Elements of Information Theory (John Wiley & Sons, New York).Google Scholar
  • [10] Deshpande Y, Montanari A (2012) Linear bandits in high dimension and recommendation systems. 50th Annual Allerton Conf. Comm. Control Comput. (IEEE, Piscataway, NJ), 1750–1754.Google Scholar
  • [11] Francetich A, Kreps D (2020) Choosing a good toolkit, i: Prior-free heuristics. J. Econom. Dynam. Control 111:103813.CrossrefGoogle Scholar
  • [12] Francetich A, Kreps D (2020) Choosing a good toolkit, ii: Bayes-rule based heuristics. J. Econom. Dynam. Control 111:103814.CrossrefGoogle Scholar
  • [13] Kaufmann E, Cappé O, Garivier A (2016) On the complexity of best-arm identification in multi-armed bandit models. J. Machine Learn. Res. 17(1):1–42.Google Scholar
  • [14] Kleinberg R, Slivkins A, Upfal E (2008) Multi-armed bandits in metric spaces. Proc. 40th ACM Sympos. Theory Comput. (ACM, New York).CrossrefGoogle Scholar
  • [15] Lai T, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Advances Appl. Math. 6(1):4–22.CrossrefGoogle Scholar
  • [16] Lattimore T, Szepesvári C (2019) An information-theoretic approach to minimax regret in partial monitoring. Beygelzimer A, Hsu D, eds. Proc. Thirty-Second Conf. Learn. Theory, vol. 99 (PMLR), 2111–2139.Google Scholar
  • [17] Liu F, Buccapatnam S, Shroff N (2018) Information directed sampling for stochastic bandits with graph feedback. Proc. AAAI Conf. Artificial Intelligence, vol. 32 (AAAI, Palo Alto CA).Google Scholar
  • [18] Lu X, Van Roy B (2019) Information-theoretic confidence bounds for reinforcement learning. Advances in Neural Information Processing Systems (Neurips) 32 (Curran Associates, Inc., Red Hook, NY), 2458–2466.Google Scholar
  • [19] Rusmevichientong P, Tsitsiklis J (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.LinkGoogle Scholar
  • [20] Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.LinkGoogle Scholar
  • [21] Russo D, Van Roy B (2016) An information-theoretic analysis of Thompson sampling. J. Machine Learn. Res. 17(68):1–30.Google Scholar
  • [22] Russo DJ, Van Roy B, Kazerouni A, Osband I, Wen Z (2018) A tutorial on Thompson sampling. Foundations Trends Machine Learn. 11(1):1–96.CrossrefGoogle Scholar
  • [23] Ryzhov I, Powell W, Frazier P (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180–195.LinkGoogle Scholar
  • [24] Scott S (2010) A modern Bayesian look at the multi-armed bandit. Appl. Stochastic Models Bus. Indust. 26(6):639–658.CrossrefGoogle Scholar
  • [25] Simon HA (1979) Rational decision making in business organizations. Amer. Econom. Rev. 69(4):493–513.Google Scholar
  • [26] Thompson W (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.CrossrefGoogle Scholar
  • [27] Tossou AC, Dimitrakakis C, Dubhashi DP (2017) Thompson sampling for stochastic bandits with graph feedback. Proc. 31st AAAI Conf. Artificial Intelligence (AAAI, Palo Alto, CA), 2660–2666.CrossrefGoogle Scholar
  • [28] Tsybakov AB (2008) Introduction to Nonparametric Estimation (Springer Science & Business Media, Berlin).Google Scholar
  • [29] Wang Y, Audibert JY, Munos R (2009) Algorithms for infinitely many-armed bandits. Advances in Neural Information Processing Systems (NIPS), (Curran Associates, Inc., Red Hook, NY), 1729–1736.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.