Optimal Learning for Structured Bandits
Published Online:16 Aug 2023https://doi.org/10.1287/mnsc.2020.02108
References
- (1989) Asymptotically efficient adaptive allocation schemes for controlled iid processes: Finite parameter space. IEEE Trans. Automat. Control 34(3):258–267.Crossref, Google Scholar
- (2009) Set-Valued Analysis (Springer Science & Business Media, New York).Crossref, Google Scholar
- (2019) Contextual bandits with cross-learning. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inf. Process. Syst. (Curran Associates, Inc., Red Hook, NY), 9676–9685.Google Scholar
- (1982) Non-Linear Parametric Optimization (Springer, Wiesbaden, Germany).Crossref, Google Scholar
- (2002) A Course in Convexity, vol. 54 (American Mathematical Society, Providence, RI).Crossref, Google Scholar
- (1997) Topological Spaces: Including a Treatment of Multi-Valued Functions, Vector Spaces, and Convexity (Dover Publications, Mineola, NY).Google Scholar
- (2009) Convex Optimization Theory (Athena Scientific, Belmont, MA).Google Scholar
- (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.Link, Google Scholar
- (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends. Machine Learning 5(1):1–122.Crossref, Google Scholar
- (2019) Multi-scale online learning: Theory and applications to online auctions and pricing. J. Mach. Learn. Res. 20:1–37. Google Scholar
- (2013) Kullback-leibler upper confidence bounds for optimal sequential allocation. Ann. Statist. 41(3):1516–1541.Crossref, Google Scholar
- (2017) Relative entropy optimization and its applications. Math. Programming 161(1–2):1–32.Crossref, Google Scholar
- (2017) Minimal exploration in structured stochastic bandits. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Adv. Neural Inf. Process. Syst. (Curran Associates, Inc., Red Hook, NY), 1763–1771.Google Scholar
- (2014) Unimodal bandits: Regret lower bounds and optimal algorithms. Internat. Conf. Machine Learn., 521–529.Google Scholar
- (2018) Perspective functions: Properties, constructions, and examples. Set-Valued Var. Anal. 26(2):247–264.Crossref, Google Scholar
- (2012) Elements of Information Theory (John Wiley & Sons, New York).Google Scholar
- (2008) Stochastic linear optimization under bandit feedback. Internat. Conf. Algorithmic Learn. Theory, 355–366.Google Scholar
- (2010) Parametric bandits: The generalized linear case. NIPS 23:586–594.Google Scholar
- (2011) The KL-UCB algorithm for bounded stochastic bandits and beyond. Proc. 24th Annual Conf. Learn. Theory, 359–376.Google Scholar
- (2019) Dynamic incentive-aware learning: Robust pricing in contextual auctions. Adv. Neural Inf. Process. Syst. (NIPS, San Diego), 9756–9766.Google Scholar
- (1997) Asymptotically efficient adaptive choice of control laws incontrolled Markov chains. SIAM J. Control Optim. 35(3):715–743.Crossref, Google Scholar
- (2021) Multi-armed bandits with correlated arms. IEEE Trans. Inform. Theory 67(10):6711–6732.Crossref, Google Scholar
- (2020) A unified approach to translate classical bandit algorithms to the structured bandit setting. IEEE J. Selected Areas Inform. Theory 1(3):840–853.Crossref, Google Scholar
- (1994) Probability inequalities for sums of bounded random variables. The Collected Works of Wassily Hoeffding (Springer, New York), 409–426.Crossref, Google Scholar
- (2020) Crush optimism with pessimism: Structured bandits beyond asymptotic optimality. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Adv. Neural Inf. Process. Syst. (Curran Associates, Inc., Red Hook, NY), 6366–6376.Google Scholar
- (2012) Thompson sampling: An asymptotically optimal finite-time analysis. International Conference on Algorithmic Learning Theory (Springer, New York), 199–213.Google Scholar
- (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167.Link, Google Scholar
- (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
- (2014) Bounded regret for finite-armed structured bandits. Adv. Neural Inf. Process. Syst. 27:550–558.Google Scholar
- (2017) The end of optimism? An asymptotic analysis of finite-armed linear bandits. Proceedings of Machine Learning Research, vol. 54 (PMLR, Fort Lauderdale, FL), 728–737Google Scholar
- (2014) Lipschitz bandits: Regret lower bound and optimal algorithms. Conf. Learn. Theory, Proceedings of Machine Learning Research Series, vol. 35 (PMLR, New York), 975–999.Google Scholar
- (2018) Contextual pricing for Lipschitz buyers. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Adv. Neural Inf. Process. Syst. (Curran Associates, Inc., Red Hook, NY), 5643–5651.Google Scholar
- (2009) A structured multiarmed bandit problem and the greedy policy. IEEE Trans. Automat. Control. 54(12):2787–2802.Crossref, Google Scholar
- (2004) Introductory Lectures on Convex Optimization: A Basic Course (Kluwer Academic Publishers, Amsterdam).Crossref, Google Scholar
- (2020) Online learning via offline greedy: Applications in market design and optimization. Preprint, submitted May 29, http://dx.doi.org/10.2139/ssrn.3613756.Google Scholar
- (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.Link, Google Scholar
- (2018) Learning to optimize via information-directed sampling. Oper. Res. 66(1):230–252.Link, Google Scholar
- (2011) Contextual bandits with similarity information. Proc. 24th Annual Conf. Learn. Theory, 679–702.Google Scholar
- (2008) An online algorithm for maximizing submodular functions. Koller D, Schuurmans D, Bengio Y, Bottou L, eds. Adv. Neural Inf. Process. Syst. (Curran Associates, Inc., Red Hook, NY), 1577–1584.Google Scholar
- (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.Crossref, Google Scholar
- (2020) One sample stochastic frank-wolfe. Internat. Conf. Artificial Intelligence Statist., 4012–4023.Google Scholar

