A General Framework for Bandit Problems Beyond Cumulative Objectives
References
- [1] (2019) Bandits with global convex constraints and objective. Oper. Res. 67(5):1486–1502.Link, Google Scholar
- [2] (2012) Analysis of Thompson sampling for the multi-armed bandit problem. Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Annual Conf. Learn. Theory, vol. 23, Proceedings of Machine Learning Research Series (PMLR, Edinburgh, UK), 39.1–39.26.Google Scholar
- [3] (1999) Coherent measures of risk. Math. Finance 9(3):203–228.Crossref, Google Scholar
- [4] (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Crossref, Google Scholar
- [5] (1995) Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proc. 36th Annual Sympos. Foundations Comput. Sci. (IEEE, Washington, DC), 322–331.Google Scholar
- [6] (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends® in Machine Learn. 5(1):1–122.Google Scholar
- [7] (2018) A general approach to multi-armed bandits under risk criteria. Bubeck S, Perchet V, Rigollet P, eds. Proc. 31st Conf. Learn. Theory, vol. 75, Proceedings of Machine Learning Research Series (PMLR), 1295–1306.Google Scholar
- [8] (2018) PAC bandits with risk constraints. Internat. Sympos. Artificial Intelligence Math. ISAIM 2018, Fort Lauderdale, FL. http://isaim2018.cs.virginia.edu/papers/ISAIM2018_ML_David_etal.pdf.Google Scholar
- [9] (1992) On the law of the iterated logarithm for martingales. Ann. Probab. 20(2):675–680.Crossref, Google Scholar
- [10] (2015) On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Related Fields 162(3):707–738.Crossref, Google Scholar
- [11] (2013) Exploration vs exploitation vs safety: Risk-aware multi-armed bandits. Ong CS, Ho TB, eds. Proc. 5th Asian Conf. Machine Learn., vol. 29, Proceedings of Machine Learning Research Series (PMLR, Australian National University, Canberra, Australia). 245–260.Google Scholar
- [12] (2020) Learning with CVaR-based feedback under potentially heavy tails. Preprint, submitted June 3, https://arxiv.org/abs/2006.02001.Google Scholar
- [13] (2018) Risk-averse approximate dynamic programming with quantile-based risk measures. Math. Oper. Res. 43(2):554–579.Link, Google Scholar
- [14] (2019) Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., New York), 11272–11281.Google Scholar
- [15] (2014) Law of the Iterated Logarithm (Springer, London), 509–519.Google Scholar
- [16] (2022) A Wasserstein distance approach for concentration of empirical risk estimates. J. Machine Learn. Res. 23(238):1–61.Google Scholar
- [17] (2020) Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions. Proc. Machine Learn. Systems (PMLR), 3657–3666.Google Scholar
- [18] (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
- [19] (2013) Robust risk-averse stochastic multi-armed bandits. Jain S, Munos R, Stephan F, Zeugmann T, eds. Algorithmic Learning Theory (Springer, Cham, Switzerland), 218–233.Google Scholar
- [20] (2011) A finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences. Kakade SM, von Luxburg U, eds. Proc. 24th Annual Conf. Learn. Theory, vol. 19, Proceedings of Machine Learning Research Series (PMLR, Budapest, Hungary), 497–514.Google Scholar
- [21] (1990) The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Probab. 18(3):1269–1283.Crossref, Google Scholar
- [22] (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.Crossref, Google Scholar
- [23] (2012) Risk-aversion in multi-armed bandits. Proc. 25th Internat. Conf. Neural Inform. Processing Systems, vol. 2 (Curran Associates Inc., Red Hook, NY), 3275–3283.Google Scholar
- [24] (1996) The Strong Law of Large Numbers (Springer, New York), 311–325.Google Scholar
- [25] (2019) Distributionally-aware exploration for CVaR bandits. NeurIPS 2019 Workshop Safety Robustness Decision Making RLDM 2019.Google Scholar
- [26] (2019) X-armed bandits: Optimizing quantiles, CVaR and other risks. Lee WS, Suzuki T, eds. Proc. Machine Learn. Res., vol. 101, Proceedings of Machine Learning Research Series (PMLR, Nagoya, Japan), 252–267.Google Scholar
- [27] (2014) Functional bandits. Preprint, submitted May 10, https://arxiv.org/abs/1405.2432.Google Scholar
- [28] (2015) Mean-variance and value at risk in multi-armed bandit problems. 2015 53rd Annual Allerton Conf. Comm. Control Comput. (Allerton) (IEEE), 1330–1335.Google Scholar
- [29] (2016) Risk-averse multi-armed bandit problems under mean-variance measure. IEEE J. Selected Topics Signal Processing 10(6):1093–1111.Crossref, Google Scholar
- [30] (2000) Asymptotic Statistics, vol. 3 (Cambridge University Press, Cambridge, UK).Google Scholar
- [31] (2017) Risk control of best arm identification in multi-armed bandits via successive rejects. 2017 IEEE Internat. Conf. Data Mining (ICDM) (IEEE, Los Alamitos, CA), 1147–1152.Google Scholar
- [32] (2020) Thompson sampling algorithms for mean-variance bandits. Proc. Machine Learn. Systems 2020:2645–2654.Google Scholar
- [33] (2014) Generalized risk-aversion in stochastic multi-armed bandits. Preprint, submitted May 5, https://arxiv.org/abs/1405.0833.Google Scholar

