Bandits with Global Convex Constraints and Objective

Shipra Agrawal
Shipra Agrawal
http://orcid.org/0000-0003-4486-3871
Industrial Engineering and Operations Research, Columbia University, New York, New York 10027;
Search for more papers by this author
,
Nikhil R. Devanur
Nikhil R. Devanur
Microsoft Research, Redmond, Washington 98052
Search for more papers by this author

Shipra Agrawal

http://orcid.org/0000-0003-4486-3871

Industrial Engineering and Operations Research, Columbia University, New York, New York 10027;

Search for more papers by this author

Nikhil R. Devanur

Microsoft Research, Redmond, Washington 98052

Search for more papers by this author

Published Online:7 Aug 2019https://doi.org/10.1287/opre.2019.1840

References

Abbasi-Yadkori Y, Dávid P, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 24 (MIT Press, Cambridge, MA), 2312–2320.Google Scholar
Abernethy JD, Bartlett PL, Hazan E (2011) Blackwell approachability and no-regret learning are equivalent. PMLR 19:27–46.Google Scholar
Agrawal S, Devanur NR (2015) Fast algorithms for online stochastic convex programming. Proc. 26th Annual ACM-SIAM Sympos. Discrete Algorithms, SODA 2015 (SIAM, Philadelphia), 1405–1424.Crossref, Google Scholar
Agrawal S, Wang Z, Ye Y (2014) A dynamic near-optimal algorithm for online linear programming. Oper. Res. 62(4):876–890.Link, Google Scholar
Auer P (2003) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3:397–422.Google Scholar
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2/3):235–256.Crossref, Google Scholar
Azar Y, Felge U, Feldman M, Tennenholtz M (2014) Sequential decision making with vector outcomes. Proc. 5th Conf. Innovations Theoret. Comput. Sci. ITCS ’14 (ACM, New York), 195–206.Crossref, Google Scholar
Babaioff M, Dughmi S, Kleinberg R, Slivkins A (2012) Dynamic pricing with limited supply. Proc. 13th ACM Conf. Electronic Commerce. EC ’12 (ACM, New York), 74–91.Crossref, Google Scholar
Badanidiyuru A, Kleinberg R, Singer Y (2012) Learning on a budget: Posted price mechanisms for online procurement. Proc. 13th ACM Conf. Electronic Commerce. EC ’12 (ACM, New York), 128–145.Crossref, Google Scholar
Badanidiyuru A, Kleinberg R, Slivkins A (2013) Bandits with knapsacks. 54th Annual IEEE Sympos. Foundations Comput. Sci., FOCS 2013 (ACM, New York), 207–216.Crossref, Google Scholar
Barz C, Waldmann KH (2007) Risk-sensitive capacity control in revenue management. Math. Methods Oper. Res. 65(3):565–579.Crossref, Google Scholar
Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.Link, Google Scholar
Besbes O, Zeevi A (2012) Blind network revenue management. Oper. Res. 60(6):1537–1550.Link, Google Scholar
Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pacific J. Math. 6(1):1–8.Crossref, Google Scholar
Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.Crossref, Google Scholar
Chakrabarti D, Kumar R, Radlinski F, Upfal E (2009) Mortal multi-armed bandits. Koller D, Schuurmans D, Bengio Y, Bottou L, eds. Advances in Neural Information Processing Systems, vol. 21 (MIT Press, Cambridge, MA), 273–280.Google Scholar
Devanur NR, Jain K, Sivan B, Wilkens CA (2011) Near optimal online algorithms and fast approximation algorithms for resource allocation problems. Proc. 12th ACM Conf. Electronic Commerce. EC ’11 (ACM, New York), 29–38.Crossref, Google Scholar
Even-Dar E, Mannor S, Mansour Y (2010) Learning with global cost in stochastic environments. COLT 2010–23rd Conf. Learn. Theory, Haifa, Israel, 80–92.Google Scholar
Feldman J, Henzinger M, Korula N, Mirrokni VS, Stein C (2010) Online stochastic packing applied to display ad allocation. Proc. 18th Annual Eur. Conf. Algorithms: Part I. ESA’10 (Springer-Verlag, Berlin), 182–194.Crossref, Google Scholar
Feng Y, Xiao B (2008) A risk-sensitive model for managing perishable products. Oper. Res. 56(5):1305–1311.Link, Google Scholar
Ferreira KJ, Simchi-Levi D, Wang H (2015) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.Google Scholar
Frank M, Wolfe P (1956) An algorithm for quadratic programming. Naval Res. Logist. 3 95110.Crossref, Google Scholar
Freund Y, Schapire RE (1995) A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55(1):119–139.Google Scholar
Gönsch J, Hassler M (2014) Optimizing the conditional value-at-risk in revenue management. Rev. Managerial Sci. 8(4):495–521.Crossref, Google Scholar
Grötschel M, Lovász L, Alexander S (1988) Geometric Algorithms and Combinatorial Optimization (Springer-Verlag, New York).Crossref, Google Scholar
Guha S, Munagala K (2007) Approximation algorithms for budgeted learning problems. Proc. 39th Annual ACM Sympos. Theory Comput. STOC ’07 (ACM, New York), 104–113.Crossref, Google Scholar
Kleinberg R, Slivkins A, Upfal E (2008) Multi-armed bandits in metric spaces. Proc. 40th Annual ACM Sympos. Theory Comput. STOC ’08 (ACM, New York), 681–690.Crossref, Google Scholar
Lancaster J (2003) The financial risk of airline revenue management. J. Revenue Pricing Management 2(2):158–165.Crossref, Google Scholar
Madani O, Lizotte DJ, Greiner R (2004) The budgeted multi-armed bandit problem. Learning Theory (Springer, New York), 643–645.Crossref, Google Scholar
Mannor S, Tsitsiklis JN, Jia YY (2009) Online learning with sample path constraints. J. Machine Learn. Res. 10:569–590.Google Scholar
Nesterov Y (2005) Smooth minimization of non-smooth functions. Math. Programming 103(1):127–152.Crossref, Google Scholar
Pandey S, Olston C (2006) Handling advertisements of unknown quality in search advertising. Adv. Neural Inform. Processing Systems 19:1065–1072.Google Scholar
Shalev-Shwartz S (2012) Online learning and online convex optimization. Foundations Trends Machine Learn. 4(2):107–194.Crossref, Google Scholar
Singh P (2011) Risk in revenue management. Informs Analytics Magazine (May/June), http://analytics-magazine.org/risk-in-revenue-management/.Google Scholar
Singla A, Krause A (2013) Truthful incentives in crowdsourcing tasks using regret minimization mechanisms. Proc. 22nd Internat. Conf. World Wide Web. WWW ’13 (ACM, New York), 1167–1178.Crossref, Google Scholar
Slivkins A, Vaughan JW (2014) Online decision making in crowdsourcing markets: Theoretical challenges. ACM SIGecom Exchanges 12(2):4–23.Crossref, Google Scholar
Tran-Thanh L, Chapman A, Rogers A, Jennings NR (2012) Knapsack based optimal policies for budget-limited multi-armed bandits. Proc. 26th AAAI Conf. on Artificial Intelligence. AAAI’12, Toronto, Ontario, 1134–1140.Google Scholar
Tran-Thanh L, Chapman A, Munoz de Cote E, Rogers A, Jennings NR (2010) Epsilon-first policies for budget-limited multi-armed bandits. 24th AAAI Conf. Artificial Intelligence, Atlanta.Google Scholar
Weatherford LR (2004) EMSR versus EMSU: Revenue or utility? J. Revenue Pricing Management 3(3):277–284.Crossref, Google Scholar
Xiong H, Xie J, Deng X (2011) Risk-averse decision making in overbooking problem. J. Oper. Res. Soc. 62(9):1655–1665.Crossref, Google Scholar
Zinkevich M (2003) Online convex programming and generalized infinitesimal gradient ascent. Machine Learn., Proc. 20th Internat. Conf., Washington, DC, 928–936.Google Scholar

Volume 67, Issue 5

September-October 2019

Pages ii-iv, 1209-1502

Article Information

Supplemental Material

Metrics

Information

Received:December 09, 2015
Accepted:November 01, 2018
Published Online:August 07, 2019

Cite as

Shipra Agrawal, Nikhil R. Devanur (2019) Bandits with Global Convex Constraints and Objective. Operations Research 67(5):1486-1502.

https://doi.org/10.1287/opre.2019.1840

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Bandits with Global Convex Constraints and Objective

References

Volume 67, Issue 5

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News