Bandits with Global Convex Constraints and Objective
Published Online:7 Aug 2019https://doi.org/10.1287/opre.2019.1840
References
- (2011) Improved algorithms for linear stochastic bandits. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 24 (MIT Press, Cambridge, MA), 2312–2320.Google Scholar
- (2011) Blackwell approachability and no-regret learning are equivalent. PMLR 19:27–46.Google Scholar
- (2015) Fast algorithms for online stochastic convex programming. Proc. 26th Annual ACM-SIAM Sympos. Discrete Algorithms, SODA 2015 (SIAM, Philadelphia), 1405–1424.Crossref, Google Scholar
- (2014) A dynamic near-optimal algorithm for online linear programming. Oper. Res. 62(4):876–890.Link, Google Scholar
- (2003) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3:397–422.Google Scholar
- (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2/3):235–256.Crossref, Google Scholar
- (2014) Sequential decision making with vector outcomes. Proc. 5th Conf. Innovations Theoret. Comput. Sci. ITCS ’14 (ACM, New York), 195–206.Crossref, Google Scholar
- (2012) Dynamic pricing with limited supply. Proc. 13th ACM Conf. Electronic Commerce. EC ’12 (ACM, New York), 74–91.Crossref, Google Scholar
- (2012) Learning on a budget: Posted price mechanisms for online procurement. Proc. 13th ACM Conf. Electronic Commerce. EC ’12 (ACM, New York), 128–145.Crossref, Google Scholar
- (2013) Bandits with knapsacks. 54th Annual IEEE Sympos. Foundations Comput. Sci., FOCS 2013 (ACM, New York), 207–216.Crossref, Google Scholar
- (2007) Risk-sensitive capacity control in revenue management. Math. Methods Oper. Res. 65(3):565–579.Crossref, Google Scholar
- (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.Link, Google Scholar
- (2012) Blind network revenue management. Oper. Res. 60(6):1537–1550.Link, Google Scholar
- (1956) An analog of the minimax theorem for vector payoffs. Pacific J. Math. 6(1):1–8.Crossref, Google Scholar
- (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.Crossref, Google Scholar
- (2009) Mortal multi-armed bandits. Koller D, Schuurmans D, Bengio Y, Bottou L, eds. Advances in Neural Information Processing Systems, vol. 21 (MIT Press, Cambridge, MA), 273–280.Google Scholar
- (2011) Near optimal online algorithms and fast approximation algorithms for resource allocation problems. Proc. 12th ACM Conf. Electronic Commerce. EC ’11 (ACM, New York), 29–38.Crossref, Google Scholar
- (2010) Learning with global cost in stochastic environments. COLT 2010–23rd Conf. Learn. Theory, Haifa, Israel, 80–92.Google Scholar
- (2010) Online stochastic packing applied to display ad allocation. Proc. 18th Annual Eur. Conf. Algorithms: Part I. ESA’10 (Springer-Verlag, Berlin), 182–194.Crossref, Google Scholar
- (2008) A risk-sensitive model for managing perishable products. Oper. Res. 56(5):1305–1311.Link, Google Scholar
- (2015) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.Google Scholar
- (1956) An algorithm for quadratic programming. Naval Res. Logist. 3 95110.Crossref, Google Scholar
- (1995) A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55(1):119–139.Google Scholar
- (2014) Optimizing the conditional value-at-risk in revenue management. Rev. Managerial Sci. 8(4):495–521.Crossref, Google Scholar
- (1988) Geometric Algorithms and Combinatorial Optimization (Springer-Verlag, New York).Crossref, Google Scholar
- (2007) Approximation algorithms for budgeted learning problems. Proc. 39th Annual ACM Sympos. Theory Comput. STOC ’07 (ACM, New York), 104–113.Crossref, Google Scholar
- (2008) Multi-armed bandits in metric spaces. Proc. 40th Annual ACM Sympos. Theory Comput. STOC ’08 (ACM, New York), 681–690.Crossref, Google Scholar
- (2003) The financial risk of airline revenue management. J. Revenue Pricing Management 2(2):158–165.Crossref, Google Scholar
- (2004) The budgeted multi-armed bandit problem. Learning Theory (Springer, New York), 643–645.Crossref, Google Scholar
- (2009) Online learning with sample path constraints. J. Machine Learn. Res. 10:569–590.Google Scholar
- (2005) Smooth minimization of non-smooth functions. Math. Programming 103(1):127–152.Crossref, Google Scholar
- (2006) Handling advertisements of unknown quality in search advertising. Adv. Neural Inform. Processing Systems 19:1065–1072.Google Scholar
- (2012) Online learning and online convex optimization. Foundations Trends Machine Learn. 4(2):107–194.Crossref, Google Scholar
- (2011) Risk in revenue management. Informs Analytics Magazine (May/June), http://analytics-magazine.org/risk-in-revenue-management/.Google Scholar
- (2013) Truthful incentives in crowdsourcing tasks using regret minimization mechanisms. Proc. 22nd Internat. Conf. World Wide Web. WWW ’13 (ACM, New York), 1167–1178.Crossref, Google Scholar
- (2014) Online decision making in crowdsourcing markets: Theoretical challenges. ACM SIGecom Exchanges 12(2):4–23.Crossref, Google Scholar
- (2012) Knapsack based optimal policies for budget-limited multi-armed bandits. Proc. 26th AAAI Conf. on Artificial Intelligence. AAAI’12, Toronto, Ontario, 1134–1140.Google Scholar
- (2010) Epsilon-first policies for budget-limited multi-armed bandits. 24th AAAI Conf. Artificial Intelligence, Atlanta.Google Scholar
- (2004) EMSR versus EMSU: Revenue or utility? J. Revenue Pricing Management 3(3):277–284.Crossref, Google Scholar
- (2011) Risk-averse decision making in overbooking problem. J. Oper. Res. Soc. 62(9):1655–1665.Crossref, Google Scholar
- (2003) Online convex programming and generalized infinitesimal gradient ascent. Machine Learn., Proc. 20th Internat. Conf., Washington, DC, 928–936.Google Scholar

