Satisficing in Time-Sensitive Bandit Learning
References
- [1] (2013) Further optimal regret bounds for Thompson sampling. Proc. Sixteenth Internat. Conf. Artificial Intelligence Statist., 99–107.Google Scholar
- [2] (1997) Bandit problems with infinitely many arms. Ann. Statist. 25(5):2103–2116.Crossref, Google Scholar
- [3] (2013) Two-target algorithms for infinite-armed bandits with Bernoulli rewards. Advances in Neural Information Processing Systems, vol. 26 (Curran Associates, Inc., Red Hook, NY), 2184–2192.Google Scholar
- [4] (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.Crossref, Google Scholar
- [5] (2016) Multi-scale exploration of convex functions and bandit convex optimization. Conf. Learn. Theory (PMLR), 583–589.Google Scholar
- [6] (2011) X-armed bandits. J. Machine Learn. Res. 12:1655–1695.Google Scholar
- [7] (2013) Kullback-Leibler upper confidence bounds for optimal sequential allocation. Ann. Statist. 41(3):1516–1541.Crossref, Google Scholar
- [8] (2011) An empirical evaluation of Thompson sampling. Shawe-Taylor J, Zemel RS, Bartlett PL, eds. Proc. 24th Internat. Conf. Neural Inform. Processing System (Curran Associates Inc., Red Hook, NY), 2249–2257.Google Scholar
- [9] (2012) Elements of Information Theory (John Wiley & Sons, New York).Google Scholar
- [10] (2012) Linear bandits in high dimension and recommendation systems. 50th Annual Allerton Conf. Comm. Control Comput. (IEEE, Piscataway, NJ), 1750–1754.Google Scholar
- [11] (2020) Choosing a good toolkit, i: Prior-free heuristics. J. Econom. Dynam. Control 111:103813.Crossref, Google Scholar
- [12] (2020) Choosing a good toolkit, ii: Bayes-rule based heuristics. J. Econom. Dynam. Control 111:103814.Crossref, Google Scholar
- [13] (2016) On the complexity of best-arm identification in multi-armed bandit models. J. Machine Learn. Res. 17(1):1–42.Google Scholar
- [14] (2008) Multi-armed bandits in metric spaces. Proc. 40th ACM Sympos. Theory Comput. (ACM, New York).Crossref, Google Scholar
- [15] (1985) Asymptotically efficient adaptive allocation rules. Advances Appl. Math. 6(1):4–22.Crossref, Google Scholar
- [16] (2019) An information-theoretic approach to minimax regret in partial monitoring. Beygelzimer A, Hsu D, eds. Proc. Thirty-Second Conf. Learn. Theory, vol. 99 (PMLR), 2111–2139.Google Scholar
- [17] (2018) Information directed sampling for stochastic bandits with graph feedback. Proc. AAAI Conf. Artificial Intelligence, vol. 32 (AAAI, Palo Alto CA).Google Scholar
- [18] (2019) Information-theoretic confidence bounds for reinforcement learning. Advances in Neural Information Processing Systems (Neurips) 32 (Curran Associates, Inc., Red Hook, NY), 2458–2466.Google Scholar
- [19] (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.Link, Google Scholar
- [20] (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.Link, Google Scholar
- [21] (2016) An information-theoretic analysis of Thompson sampling. J. Machine Learn. Res. 17(68):1–30.Google Scholar
- [22] (2018) A tutorial on Thompson sampling. Foundations Trends Machine Learn. 11(1):1–96.Crossref, Google Scholar
- [23] (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180–195.Link, Google Scholar
- [24] (2010) A modern Bayesian look at the multi-armed bandit. Appl. Stochastic Models Bus. Indust. 26(6):639–658.Crossref, Google Scholar
- [25] (1979) Rational decision making in business organizations. Amer. Econom. Rev. 69(4):493–513.Google Scholar
- [26] (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.Crossref, Google Scholar
- [27] (2017) Thompson sampling for stochastic bandits with graph feedback. Proc. 31st AAAI Conf. Artificial Intelligence (AAAI, Palo Alto, CA), 2660–2666.Crossref, Google Scholar
- [28] (2008) Introduction to Nonparametric Estimation (Springer Science & Business Media, Berlin).Google Scholar
- [29] (2009) Algorithms for infinitely many-armed bandits. Advances in Neural Information Processing Systems (NIPS), (Curran Associates, Inc., Red Hook, NY), 1729–1736.Google Scholar

