Differentially Private and Budget-Limited Bandit Learning over Matroids

Published Online:https://doi.org/10.1287/ijoc.2019.0903

References

  • Agrawal S, Devanur NR (2014) Bandits with concave rewards and convex knapsacks. Proc. 15th ACM Conf. Econom. Comput. (ACM Press, New York), 989–1006.Google Scholar
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.CrossrefGoogle Scholar
  • Badanidiyuru A, Kleinberg R, Slivkins A (2013) Bandits with knapsacks. Proc. IEEE 54th Annual Sympos. Foundations Comput. Sci. (IEEE, Piscataway, NJ), 207–216.Google Scholar
  • Barry D, Parlange JY, Li L, Prommer H, Cunningham C, Stagnitti F (2000) Analytical approximations for real values of the Lambert W-function. Math. Comput. Simulation 53(1):95–103.CrossrefGoogle Scholar
  • Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.CrossrefGoogle Scholar
  • Buccapatnam S, Eryilmaz A, Shroff NB (2014) Stochastic bandits with side observations on networks. Proc. 2014 ACM Internat. Conf. Measurement Model. Comput. Systems (ACM Press, New York), 289–300.Google Scholar
  • Calinescu G, Chekuri C, Pál M, Vondrák J (2011) Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput. 40(6):1740–1766.CrossrefGoogle Scholar
  • Chan THH, Shi E, Song D (2011) Private and continual release of statistics. ACM Trans. Inform. Systems Security 14(3):Article 26.CrossrefGoogle Scholar
  • Chen L, Gupta A, Li J (2016a) Pure exploration of multi-armed bandit under matroid constraints. Proc. 29th Annual Conf. Learn. Theory (PMLR, Brookline, MA), 647–669.Google Scholar
  • Chen W, Wang Y, Yuan Y (2013) Combinatorial multi-armed bandit: General framework and applications. Proc. 30th Internat. Conf. Machine Learn. (PMLR, Brookline, MA), 151–159.Google Scholar
  • Chen W, Hu W, Li F, Li J, Liu Y, Lu P (2016b) Combinatorial multi-armed bandit with general reward functions. Proc. 30th Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 1659–1667.Google Scholar
  • Combes R, Jiang C, Srikant R (2015a) Bandits with budgets: Regret lower bounds and optimal algorithms. Performance Evaluation Rev. 43(1):245–257.CrossrefGoogle Scholar
  • Combes R, Shahi MSTM, Proutiere A, Lelarge M (2015b) Combinatorial bandits revisited. Proc. 29th Conf. Neural Information Processing Systems (Curran Associates, Red Hook, NY), 2116–2124.Google Scholar
  • Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to Algorithms (MIT Press, Cambridge, MA).Google Scholar
  • Ding W, Qin T, Zhang XD, Liu TY (2013) Multi-armed bandit with budget constraint and variable costs. Proc. 27th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 232–238.Google Scholar
  • Dwork C, Roth A (2014) The algorithmic foundations of differential privacy. Foundations Trends Theoret. Comput. Sci. 9(3–4):211–407.Google Scholar
  • Flajolet A, Jaillet P (2015) Low regret bounds for bandits with knapsacks. Preprint, submitted October 7, https://arxiv.org/abs/1510.01800.Google Scholar
  • Floudas CA, Pardalos PM (2009) Encyclopedia of Optimization (Springer, New York).CrossrefGoogle Scholar
  • Gai Y, Krishnamachari B, Jain R (2012) Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations. IEEE/ACM Trans. Networking 20(5):1466–1478.CrossrefGoogle Scholar
  • Gan G, Ma C, Xie H (2014) Measure, Probability, and Mathematical Finance: A Problem-Oriented Approach (John Wiley & Sons, Hoboken, NJ).Google Scholar
  • Garivier A, Cappé O (2011) The KL-UCB algorithm for bounded stochastic bandits and beyond. Proc. 24th Annual Conf. Learn. Theory (PMLR, Brookline, MA), 359–376.Google Scholar
  • Grimmett G, Stirzaker D (2001) Probability and Random Processes (Oxford University Press, Oxford, UK).CrossrefGoogle Scholar
  • Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58(301):13–30.CrossrefGoogle Scholar
  • IBM CPLEX (2009) IBM ILOG CPLEX Optimization Studio. https://www.ibm.com/developerworks/downloads/ws/ilogcplex/.Google Scholar
  • Jain P, Kothari P, Thakurta A (2012) Differentially private online learning. Proc. 25th Annual Conf. Learn. Theory (PMLR, Brookline, MA), 24.1–24.34.Google Scholar
  • Jin H, Su L, Ding B, Nahrstedt K, Borisov N (2016) Enabling privacy-preserving incentives for mobile crowd sensing systems. Proc. IEEE 36th Internat. Conf. Distributed Comput. Systems (IEEE, Piscataway, NJ), 344–353.Google Scholar
  • Komiyama J, Honda J, Nakagawa H (2015) Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays. Proc. 32nd Internat. Conf. Machine Learn. (PMLR, Brookline, MA), 1152–1161.Google Scholar
  • Kveton B, Wen Z, Ashkan A, Szepesvari C (2015) Tight regret bounds for stochastic combinatorial semi-bandits. Proc. 18th Internat. Conf. Artificial Intelligence Statist. (PMLR, Brookline, MA), 535–543.Google Scholar
  • Kveton B, Wen Z, Ashkan A, Eydgahi H, Eriksson B (2014) Matroid bandits: Fast combinatorial optimization with learning. Proc. 30th Conf. Uncertainty Artificial Intelligence (AUAI Press, Corvallis, OR), 420–429.Google Scholar
  • McSherry FD (2009) Privacy integrated queries: An extensible platform for privacy-preserving data analysis. Proc. 2009 ACM SIGMOD Internat. Conf. Management Data (ACM Press, New York), 19–30.CrossrefGoogle Scholar
  • Megiddo N (1979) Combinatorial optimization with rational objective functions. Math. Oper. Res. 4(4):414–424.LinkGoogle Scholar
  • Mishra N, Thakurta A (2015) (Nearly) optimal differentially private stochastic multi-arm bandits. Proc. 31st Conf. Uncertainty Artificial Intelligence (AUAI Press, Corvallis, OR), 592–601.Google Scholar
  • Mitzenmacher M, Upfal E (2005) Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Oxley J (2011) Matroid Theory (Oxford University Press, Oxford, UK).CrossrefGoogle Scholar
  • Talebi MS, Proutiere A (2016) An optimal algorithm for stochastic matroid bandit optimization. Proc. 2016 Internat. Conf. Autonomous Agents Multiagent Systems (ACM Press, New York), 548–556.Google Scholar
  • Talebi MS, Zou Z, Combes R, Proutière A, Johansson M (2018) Stochastic online shortest path routing: The value of feedback. IEEE Trans. Automatic Control 63(4):915–930.CrossrefGoogle Scholar
  • Tekin C, Liu M (2015) Online learning methods for networking. Foundations Trends Networking 8(4):281–409.CrossrefGoogle Scholar
  • Thakurta AG, Smith A (2013) (Nearly) optimal algorithms for private online learning in full-information and bandit settings. Proc. 27th Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 2733–2741.Google Scholar
  • Tossou ACY, Dimitrakakis C (2016) Algorithms for differentially private multi-armed bandits. Proc. 30th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 2087–2093.Google Scholar
  • Tran-Thanh L, Chapman A, Rogers A, Jennings NR (2012) Knapsack based optimal policies for budget-limited multi-armed bandits. Proc. 26th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 1134–1140.Google Scholar
  • Tran-Thanh L, Chapman A, de Cote EM, Rogers A, Jennings NR (2010) Epsilon-first policies for budget-limited multi-armed bandits. Proc. 24th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 1211–1216.Google Scholar
  • Vaswani S, Kveton B, Wen Z, Ghavamzadeh M, Lakshmanan LVS, Schmidt M (2017) Model-independent online learning for influence maximization. Proc. 34th Internat. Conf. Machine Learn. (PMLR, Brookline, MA), 3530–3539.Google Scholar
  • Wang Q, Chen W (2017) Improving regret bounds for combinatorial semi-bandits with probabilistically triggered arms and its applications. Proc. 31st Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 1161–1171.Google Scholar
  • Wen Z, Kveton B, Valko M, Vaswani S (2017) Online influence maximization under independent cascade model with semi-bandit feedback. Proc. 31st Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 3026–3036.Google Scholar
  • Wu H, Srikant R, Liu X, Jiang C (2015) Algorithms with logarithmic or sublinear regret for constrained contextual bandits. Proc. 29th Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 433–441.Google Scholar
  • Xia Y, Li H, Qin T, Yu N, Liu TY (2015) Thompson sampling for budgeted multi-armed bandits. Proc. 24th Internat. Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 3960–3966.Google Scholar
  • Xia Y, Qin T, Ma W, Yu N, Liu T (2016) Budgeted multi-armed bandits with multiple plays. Proc. 25th Internat. Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 2210–2216.Google Scholar
  • Yi S, Nelson PW, Ulsoy AG (2010) Time-delay Systems: Analysis and Control Using the Lambert W Function (World Scientific, Singapore).CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.