Bandits with Global Convex Constraints and Objective

Published Online:https://doi.org/10.1287/opre.2019.1840

References

  • Abbasi-Yadkori Y, Dávid P, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 24 (MIT Press, Cambridge, MA), 2312–2320.Google Scholar
  • Abernethy JD, Bartlett PL, Hazan E (2011) Blackwell approachability and no-regret learning are equivalent. PMLR 19:27–46.Google Scholar
  • Agrawal S, Devanur NR (2015) Fast algorithms for online stochastic convex programming. Proc. 26th Annual ACM-SIAM Sympos. Discrete Algorithms, SODA 2015 (SIAM, Philadelphia), 1405–1424.CrossrefGoogle Scholar
  • Agrawal S, Wang Z, Ye Y (2014) A dynamic near-optimal algorithm for online linear programming. Oper. Res. 62(4):876–890.LinkGoogle Scholar
  • Auer P (2003) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3:397–422.Google Scholar
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2/3):235–256.CrossrefGoogle Scholar
  • Azar Y, Felge U, Feldman M, Tennenholtz M (2014) Sequential decision making with vector outcomes. Proc. 5th Conf. Innovations Theoret. Comput. Sci. ITCS ’14 (ACM, New York), 195–206.CrossrefGoogle Scholar
  • Babaioff M, Dughmi S, Kleinberg R, Slivkins A (2012) Dynamic pricing with limited supply. Proc. 13th ACM Conf. Electronic Commerce. EC ’12 (ACM, New York), 74–91.CrossrefGoogle Scholar
  • Badanidiyuru A, Kleinberg R, Singer Y (2012) Learning on a budget: Posted price mechanisms for online procurement. Proc. 13th ACM Conf. Electronic Commerce. EC ’12 (ACM, New York), 128–145.CrossrefGoogle Scholar
  • Badanidiyuru A, Kleinberg R, Slivkins A (2013) Bandits with knapsacks. 54th Annual IEEE Sympos. Foundations Comput. Sci., FOCS 2013 (ACM, New York), 207–216.CrossrefGoogle Scholar
  • Barz C, Waldmann KH (2007) Risk-sensitive capacity control in revenue management. Math. Methods Oper. Res. 65(3):565–579.CrossrefGoogle Scholar
  • Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.LinkGoogle Scholar
  • Besbes O, Zeevi A (2012) Blind network revenue management. Oper. Res. 60(6):1537–1550.LinkGoogle Scholar
  • Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pacific J. Math. 6(1):1–8.CrossrefGoogle Scholar
  • Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.CrossrefGoogle Scholar
  • Chakrabarti D, Kumar R, Radlinski F, Upfal E (2009) Mortal multi-armed bandits. Koller D, Schuurmans D, Bengio Y, Bottou L, eds. Advances in Neural Information Processing Systems, vol. 21 (MIT Press, Cambridge, MA), 273–280.Google Scholar
  • Devanur NR, Jain K, Sivan B, Wilkens CA (2011) Near optimal online algorithms and fast approximation algorithms for resource allocation problems. Proc. 12th ACM Conf. Electronic Commerce. EC ’11 (ACM, New York), 29–38.CrossrefGoogle Scholar
  • Even-Dar E, Mannor S, Mansour Y (2010) Learning with global cost in stochastic environments. COLT 2010–23rd Conf. Learn. Theory, Haifa, Israel, 80–92.Google Scholar
  • Feldman J, Henzinger M, Korula N, Mirrokni VS, Stein C (2010) Online stochastic packing applied to display ad allocation. Proc. 18th Annual Eur. Conf. Algorithms: Part I. ESA’10 (Springer-Verlag, Berlin), 182–194.CrossrefGoogle Scholar
  • Feng Y, Xiao B (2008) A risk-sensitive model for managing perishable products. Oper. Res. 56(5):1305–1311.LinkGoogle Scholar
  • Ferreira KJ, Simchi-Levi D, Wang H (2015) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.Google Scholar
  • Frank M, Wolfe P (1956) An algorithm for quadratic programming. Naval Res. Logist. 3 95110.CrossrefGoogle Scholar
  • Freund Y, Schapire RE (1995) A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55(1):119–139.Google Scholar
  • Gönsch J, Hassler M (2014) Optimizing the conditional value-at-risk in revenue management. Rev. Managerial Sci. 8(4):495–521.CrossrefGoogle Scholar
  • Grötschel M, Lovász L, Alexander S (1988) Geometric Algorithms and Combinatorial Optimization (Springer-Verlag, New York).CrossrefGoogle Scholar
  • Guha S, Munagala K (2007) Approximation algorithms for budgeted learning problems. Proc. 39th Annual ACM Sympos. Theory Comput. STOC ’07 (ACM, New York), 104–113.CrossrefGoogle Scholar
  • Kleinberg R, Slivkins A, Upfal E (2008) Multi-armed bandits in metric spaces. Proc. 40th Annual ACM Sympos. Theory Comput. STOC ’08 (ACM, New York), 681–690.CrossrefGoogle Scholar
  • Lancaster J (2003) The financial risk of airline revenue management. J. Revenue Pricing Management 2(2):158–165.CrossrefGoogle Scholar
  • Madani O, Lizotte DJ, Greiner R (2004) The budgeted multi-armed bandit problem. Learning Theory (Springer, New York), 643–645.CrossrefGoogle Scholar
  • Mannor S, Tsitsiklis JN, Jia YY (2009) Online learning with sample path constraints. J. Machine Learn. Res. 10:569–590.Google Scholar
  • Nesterov Y (2005) Smooth minimization of non-smooth functions. Math. Programming 103(1):127–152.CrossrefGoogle Scholar
  • Pandey S, Olston C (2006) Handling advertisements of unknown quality in search advertising. Adv. Neural Inform. Processing Systems 19:1065–1072.Google Scholar
  • Shalev-Shwartz S (2012) Online learning and online convex optimization. Foundations Trends Machine Learn. 4(2):107–194.CrossrefGoogle Scholar
  • Singh P (2011) Risk in revenue management. Informs Analytics Magazine (May/June), http://analytics-magazine.org/risk-in-revenue-management/.Google Scholar
  • Singla A, Krause A (2013) Truthful incentives in crowdsourcing tasks using regret minimization mechanisms. Proc. 22nd Internat. Conf. World Wide Web. WWW ’13 (ACM, New York), 1167–1178.CrossrefGoogle Scholar
  • Slivkins A, Vaughan JW (2014) Online decision making in crowdsourcing markets: Theoretical challenges. ACM SIGecom Exchanges 12(2):4–23.CrossrefGoogle Scholar
  • Tran-Thanh L, Chapman A, Rogers A, Jennings NR (2012) Knapsack based optimal policies for budget-limited multi-armed bandits. Proc. 26th AAAI Conf. on Artificial Intelligence. AAAI’12, Toronto, Ontario, 1134–1140.Google Scholar
  • Tran-Thanh L, Chapman A, Munoz de Cote E, Rogers A, Jennings NR (2010) Epsilon-first policies for budget-limited multi-armed bandits. 24th AAAI Conf. Artificial Intelligence, Atlanta.Google Scholar
  • Weatherford LR (2004) EMSR versus EMSU: Revenue or utility? J. Revenue Pricing Management 3(3):277–284.CrossrefGoogle Scholar
  • Xiong H, Xie J, Deng X (2011) Risk-averse decision making in overbooking problem. J. Oper. Res. Soc. 62(9):1655–1665.CrossrefGoogle Scholar
  • Zinkevich M (2003) Online convex programming and generalized infinitesimal gradient ascent. Machine Learn., Proc. 20th Internat. Conf., Washington, DC, 928–936.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.