Bandits with Global Convex Constraints and Objective

Shipra Agrawal
http://orcid.org/0000-0003-4486-3871
Industrial Engineering and Operations Research, Columbia University, New York, New York 10027

Nikhil R. Devanur
Microsoft Research, Redmond, Washington 98052

Published Online: 7 Aug 2019
https://doi.org/10.1287/opre.2019.1840

Abstract

We consider a very general model for managing the exploration–exploitation trade-off, which allows global convex constraints and a concave objective on the aggregate decisions over time, in addition to the customary limit on the time horizon. This model provides a natural framework to study many sequential decision-making problems with long-term convex constraints and concave utility, and it subsumes the classic multiarmed bandit (MAB) model and the bandits with knapsacks problem as special cases. We demonstrate that a natural extension of the upper confidence bound family of algorithms for MAB provides a polynomial-time algorithm with near-optimal regret guarantees for this substantially more general model. We also provide computationally more efficient algorithms by establishing interesting connections between this problem and other well-studied problems and algorithms, such as the Blackwell approachability problem, online convex optimization, and the Frank–Wolfe technique for convex optimization. We give several concrete examples of applications, particularly in risk-sensitive revenue management under unknown demand distributions, in which this more general bandit model of sequential decision making allows for richer formulations and more efficient solutions of the problem.
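The abstract describes the authors' algorithm as a natural extension of the upper confidence bound (UCB) family of algorithms for the classic MAB problem. For readers unfamiliar with that family, the following is a minimal sketch of the classic UCB1 index rule in Python. It is not the authors' algorithm and handles neither global convex constraints nor a concave objective; the Bernoulli reward model, the function name `ucb1`, and the arm means in the usage line are illustrative assumptions.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Classic UCB1 on Bernoulli arms (illustrative sketch, not the paper's method).

    arm_means: hypothetical true success probability of each arm.
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # n_i: number of times arm i has been pulled
    sums = [0.0] * k    # cumulative reward observed from arm i
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull every arm once so each n_i > 0.
            arm = t - 1
        else:
            # Optimism under uncertainty: empirical mean plus a confidence
            # radius sqrt(2 ln t / n_i) that shrinks as arm i is sampled.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Bernoulli reward draw (an assumption of this sketch).
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total


if __name__ == "__main__":
    counts, total = ucb1([0.2, 0.5, 0.8], horizon=10_000)
    print("pulls per arm:", counts, "total reward:", total)
```

With these hypothetical means, most of the 10,000 pulls should go to the arm with mean 0.8: a suboptimal arm's confidence radius shrinks only as that arm is sampled, so its optimistic index eventually falls below the best arm's and it is pulled only a logarithmic number of times.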

Volume 67, Issue 5

September-October 2019

Pages ii-iv, 1209-1502

Information

Received: December 09, 2015
Accepted: November 01, 2018
Published Online: August 07, 2019

Cite as

Shipra Agrawal, Nikhil R. Devanur (2019) Bandits with Global Convex Constraints and Objective. Operations Research 67(5):1486-1502.

https://doi.org/10.1287/opre.2019.1840
