Differentially Private and Budget-Limited Bandit Learning over Matroids

Kai Han
Kai Han
School of Computer Science and Technology/Suzhou Institute for Advanced Study, University of Science and Technology of China, Hefei, Anhui, 23006, People’s Republic of China;
Search for more papers by this author
,
Yuntian He
Yuntian He
School of Computer Science and Technology/Suzhou Institute for Advanced Study, University of Science and Technology of China, Hefei, Anhui, 23006, People’s Republic of China;
Search for more papers by this author
,
Alex X. Liu
Alex X. Liu
Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan 48824;
Search for more papers by this author
,
Shaojie Tang
Corresponding Author
Shaojie Tang
https://orcid.org/0000-0001-9261-5210
Naveen Jindal School of Management, University of Texas at Dallas, Richardson, Texas 75080;
Search for more papers by this author
,
He Huang
He Huang
School of Computer Science and Technology, Soochow University, Suzhou 215006, People’s Republic of China
Search for more papers by this author

School of Computer Science and Technology/Suzhou Institute for Advanced Study, University of Science and Technology of China, Hefei, Anhui, 23006, People’s Republic of China;

Search for more papers by this author

Yuntian He

School of Computer Science and Technology/Suzhou Institute for Advanced Study, University of Science and Technology of China, Hefei, Anhui, 23006, People’s Republic of China;

Search for more papers by this author

Alex X. Liu

Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan 48824;

Search for more papers by this author

Shaojie Tang

Corresponding Author

Shaojie Tang

https://orcid.org/0000-0001-9261-5210

Naveen Jindal School of Management, University of Texas at Dallas, Richardson, Texas 75080;

Search for more papers by this author

He Huang

School of Computer Science and Technology, Soochow University, Suzhou 215006, People’s Republic of China

Search for more papers by this author

Published Online:16 Mar 2020https://doi.org/10.1287/ijoc.2019.0903

References

Agrawal S, Devanur NR (2014) Bandits with concave rewards and convex knapsacks. Proc. 15th ACM Conf. Econom. Comput. (ACM Press, New York), 989–1006.Google Scholar
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Crossref, Google Scholar
Badanidiyuru A, Kleinberg R, Slivkins A (2013) Bandits with knapsacks. Proc. IEEE 54th Annual Sympos. Foundations Comput. Sci. (IEEE, Piscataway, NJ), 207–216.Google Scholar
Barry D, Parlange JY, Li L, Prommer H, Cunningham C, Stagnitti F (2000) Analytical approximations for real values of the Lambert W-function. Math. Comput. Simulation 53(1):95–103.Crossref, Google Scholar
Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.Crossref, Google Scholar
Buccapatnam S, Eryilmaz A, Shroff NB (2014) Stochastic bandits with side observations on networks. Proc. 2014 ACM Internat. Conf. Measurement Model. Comput. Systems (ACM Press, New York), 289–300.Google Scholar
Calinescu G, Chekuri C, Pál M, Vondrák J (2011) Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput. 40(6):1740–1766.Crossref, Google Scholar
Chan THH, Shi E, Song D (2011) Private and continual release of statistics. ACM Trans. Inform. Systems Security 14(3):Article 26.Crossref, Google Scholar
Chen L, Gupta A, Li J (2016a) Pure exploration of multi-armed bandit under matroid constraints. Proc. 29th Annual Conf. Learn. Theory (PMLR, Brookline, MA), 647–669.Google Scholar
Chen W, Wang Y, Yuan Y (2013) Combinatorial multi-armed bandit: General framework and applications. Proc. 30th Internat. Conf. Machine Learn. (PMLR, Brookline, MA), 151–159.Google Scholar
Chen W, Hu W, Li F, Li J, Liu Y, Lu P (2016b) Combinatorial multi-armed bandit with general reward functions. Proc. 30th Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 1659–1667.Google Scholar
Combes R, Jiang C, Srikant R (2015a) Bandits with budgets: Regret lower bounds and optimal algorithms. Performance Evaluation Rev. 43(1):245–257.Crossref, Google Scholar
Combes R, Shahi MSTM, Proutiere A, Lelarge M (2015b) Combinatorial bandits revisited. Proc. 29th Conf. Neural Information Processing Systems (Curran Associates, Red Hook, NY), 2116–2124.Google Scholar
Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to Algorithms (MIT Press, Cambridge, MA).Google Scholar
Ding W, Qin T, Zhang XD, Liu TY (2013) Multi-armed bandit with budget constraint and variable costs. Proc. 27th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 232–238.Google Scholar
Dwork C, Roth A (2014) The algorithmic foundations of differential privacy. Foundations Trends Theoret. Comput. Sci. 9(3–4):211–407.Google Scholar
Flajolet A, Jaillet P (2015) Low regret bounds for bandits with knapsacks. Preprint, submitted October 7, https://arxiv.org/abs/1510.01800.Google Scholar
Floudas CA, Pardalos PM (2009) Encyclopedia of Optimization (Springer, New York).Crossref, Google Scholar
Gai Y, Krishnamachari B, Jain R (2012) Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations. IEEE/ACM Trans. Networking 20(5):1466–1478.Crossref, Google Scholar
Gan G, Ma C, Xie H (2014) Measure, Probability, and Mathematical Finance: A Problem-Oriented Approach (John Wiley & Sons, Hoboken, NJ).Google Scholar
Garivier A, Cappé O (2011) The KL-UCB algorithm for bounded stochastic bandits and beyond. Proc. 24th Annual Conf. Learn. Theory (PMLR, Brookline, MA), 359–376.Google Scholar
Grimmett G, Stirzaker D (2001) Probability and Random Processes (Oxford University Press, Oxford, UK).Crossref, Google Scholar
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58(301):13–30.Crossref, Google Scholar
IBM CPLEX (2009) IBM ILOG CPLEX Optimization Studio. https://www.ibm.com/developerworks/downloads/ws/ilogcplex/.Google Scholar
Jain P, Kothari P, Thakurta A (2012) Differentially private online learning. Proc. 25th Annual Conf. Learn. Theory (PMLR, Brookline, MA), 24.1–24.34.Google Scholar
Jin H, Su L, Ding B, Nahrstedt K, Borisov N (2016) Enabling privacy-preserving incentives for mobile crowd sensing systems. Proc. IEEE 36th Internat. Conf. Distributed Comput. Systems (IEEE, Piscataway, NJ), 344–353.Google Scholar
Komiyama J, Honda J, Nakagawa H (2015) Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays. Proc. 32nd Internat. Conf. Machine Learn. (PMLR, Brookline, MA), 1152–1161.Google Scholar
Kveton B, Wen Z, Ashkan A, Szepesvari C (2015) Tight regret bounds for stochastic combinatorial semi-bandits. Proc. 18th Internat. Conf. Artificial Intelligence Statist. (PMLR, Brookline, MA), 535–543.Google Scholar
Kveton B, Wen Z, Ashkan A, Eydgahi H, Eriksson B (2014) Matroid bandits: Fast combinatorial optimization with learning. Proc. 30th Conf. Uncertainty Artificial Intelligence (AUAI Press, Corvallis, OR), 420–429.Google Scholar
McSherry FD (2009) Privacy integrated queries: An extensible platform for privacy-preserving data analysis. Proc. 2009 ACM SIGMOD Internat. Conf. Management Data (ACM Press, New York), 19–30.Crossref, Google Scholar
Megiddo N (1979) Combinatorial optimization with rational objective functions. Math. Oper. Res. 4(4):414–424.Link, Google Scholar
Mishra N, Thakurta A (2015) (Nearly) optimal differentially private stochastic multi-arm bandits. Proc. 31st Conf. Uncertainty Artificial Intelligence (AUAI Press, Corvallis, OR), 592–601.Google Scholar
Mitzenmacher M, Upfal E (2005) Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Oxley J (2011) Matroid Theory (Oxford University Press, Oxford, UK).Crossref, Google Scholar
Talebi MS, Proutiere A (2016) An optimal algorithm for stochastic matroid bandit optimization. Proc. 2016 Internat. Conf. Autonomous Agents Multiagent Systems (ACM Press, New York), 548–556.Google Scholar
Talebi MS, Zou Z, Combes R, Proutière A, Johansson M (2018) Stochastic online shortest path routing: The value of feedback. IEEE Trans. Automatic Control 63(4):915–930.Crossref, Google Scholar
Tekin C, Liu M (2015) Online learning methods for networking. Foundations Trends Networking 8(4):281–409.Crossref, Google Scholar
Thakurta AG, Smith A (2013) (Nearly) optimal algorithms for private online learning in full-information and bandit settings. Proc. 27th Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 2733–2741.Google Scholar
Tossou ACY, Dimitrakakis C (2016) Algorithms for differentially private multi-armed bandits. Proc. 30th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 2087–2093.Google Scholar
Tran-Thanh L, Chapman A, Rogers A, Jennings NR (2012) Knapsack based optimal policies for budget-limited multi-armed bandits. Proc. 26th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 1134–1140.Google Scholar
Tran-Thanh L, Chapman A, de Cote EM, Rogers A, Jennings NR (2010) Epsilon-first policies for budget-limited multi-armed bandits. Proc. 24th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 1211–1216.Google Scholar
Vaswani S, Kveton B, Wen Z, Ghavamzadeh M, Lakshmanan LVS, Schmidt M (2017) Model-independent online learning for influence maximization. Proc. 34th Internat. Conf. Machine Learn. (PMLR, Brookline, MA), 3530–3539.Google Scholar
Wang Q, Chen W (2017) Improving regret bounds for combinatorial semi-bandits with probabilistically triggered arms and its applications. Proc. 31st Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 1161–1171.Google Scholar
Wen Z, Kveton B, Valko M, Vaswani S (2017) Online influence maximization under independent cascade model with semi-bandit feedback. Proc. 31st Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 3026–3036.Google Scholar
Wu H, Srikant R, Liu X, Jiang C (2015) Algorithms with logarithmic or sublinear regret for constrained contextual bandits. Proc. 29th Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 433–441.Google Scholar
Xia Y, Li H, Qin T, Yu N, Liu TY (2015) Thompson sampling for budgeted multi-armed bandits. Proc. 24th Internat. Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 3960–3966.Google Scholar
Xia Y, Qin T, Ma W, Yu N, Liu T (2016) Budgeted multi-armed bandits with multiple plays. Proc. 25th Internat. Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 2210–2216.Google Scholar
Yi S, Nelson PW, Ulsoy AG (2010) Time-delay Systems: Analysis and Control Using the Lambert W Function (World Scientific, Singapore).Crossref, Google Scholar

cover image INFORMS Journal on Computing

Volume 32, Issue 3

Summer 2020

Pages 531-853, C2

Article Information

Supplemental Material

Metrics

Information

Received:March 10, 2018
Accepted:April 08, 2019
Published Online:March 16, 2020

Cite as

Kai Han, Yuntian He, Alex X. Liu, Shaojie Tang, He Huang (2020) Differentially Private and Budget-Limited Bandit Learning over Matroids. INFORMS Journal on Computing 32(3):790-804.

https://doi.org/10.1287/ijoc.2019.0903

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Differentially Private and Budget-Limited Bandit Learning over Matroids

References

Volume 32, Issue 3

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News