Differentially Private and Budget-Limited Bandit Learning over Matroids
Published Online:16 Mar 2020https://doi.org/10.1287/ijoc.2019.0903
References
- (2014) Bandits with concave rewards and convex knapsacks. Proc. 15th ACM Conf. Econom. Comput. (ACM Press, New York), 989–1006.Google Scholar
- (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Crossref, Google Scholar
- (2013) Bandits with knapsacks. Proc. IEEE 54th Annual Sympos. Foundations Comput. Sci. (IEEE, Piscataway, NJ), 207–216.Google Scholar
- (2000) Analytical approximations for real values of the Lambert W-function. Math. Comput. Simulation 53(1):95–103.Crossref, Google Scholar
- (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.Crossref, Google Scholar
- (2014) Stochastic bandits with side observations on networks. Proc. 2014 ACM Internat. Conf. Measurement Model. Comput. Systems (ACM Press, New York), 289–300.Google Scholar
- (2011) Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput. 40(6):1740–1766.Crossref, Google Scholar
- (2011) Private and continual release of statistics. ACM Trans. Inform. Systems Security 14(3):Article 26.Crossref, Google Scholar
- (2016a) Pure exploration of multi-armed bandit under matroid constraints. Proc. 29th Annual Conf. Learn. Theory (PMLR, Brookline, MA), 647–669.Google Scholar
- (2013) Combinatorial multi-armed bandit: General framework and applications. Proc. 30th Internat. Conf. Machine Learn. (PMLR, Brookline, MA), 151–159.Google Scholar
- (2016b) Combinatorial multi-armed bandit with general reward functions. Proc. 30th Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 1659–1667.Google Scholar
- (2015a) Bandits with budgets: Regret lower bounds and optimal algorithms. Performance Evaluation Rev. 43(1):245–257.Crossref, Google Scholar
- (2015b) Combinatorial bandits revisited. Proc. 29th Conf. Neural Information Processing Systems (Curran Associates, Red Hook, NY), 2116–2124.Google Scholar
- (2009) Introduction to Algorithms (MIT Press, Cambridge, MA).Google Scholar
- (2013) Multi-armed bandit with budget constraint and variable costs. Proc. 27th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 232–238.Google Scholar
- (2014) The algorithmic foundations of differential privacy. Foundations Trends Theoret. Comput. Sci. 9(3–4):211–407.Google Scholar
- (2015) Low regret bounds for bandits with knapsacks. Preprint, submitted October 7, https://arxiv.org/abs/1510.01800.Google Scholar
- (2009) Encyclopedia of Optimization (Springer, New York).Crossref, Google Scholar
- (2012) Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations. IEEE/ACM Trans. Networking 20(5):1466–1478.Crossref, Google Scholar
- (2014) Measure, Probability, and Mathematical Finance: A Problem-Oriented Approach (John Wiley & Sons, Hoboken, NJ).Google Scholar
- (2011) The KL-UCB algorithm for bounded stochastic bandits and beyond. Proc. 24th Annual Conf. Learn. Theory (PMLR, Brookline, MA), 359–376.Google Scholar
- (2001) Probability and Random Processes (Oxford University Press, Oxford, UK).Crossref, Google Scholar
- (1963) Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58(301):13–30.Crossref, Google Scholar
- IBM CPLEX (2009) IBM ILOG CPLEX Optimization Studio. https://www.ibm.com/developerworks/downloads/ws/ilogcplex/.Google Scholar
- (2012) Differentially private online learning. Proc. 25th Annual Conf. Learn. Theory (PMLR, Brookline, MA), 24.1–24.34.Google Scholar
- (2016) Enabling privacy-preserving incentives for mobile crowd sensing systems. Proc. IEEE 36th Internat. Conf. Distributed Comput. Systems (IEEE, Piscataway, NJ), 344–353.Google Scholar
- (2015) Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays. Proc. 32nd Internat. Conf. Machine Learn. (PMLR, Brookline, MA), 1152–1161.Google Scholar
- (2015) Tight regret bounds for stochastic combinatorial semi-bandits. Proc. 18th Internat. Conf. Artificial Intelligence Statist. (PMLR, Brookline, MA), 535–543.Google Scholar
- (2014) Matroid bandits: Fast combinatorial optimization with learning. Proc. 30th Conf. Uncertainty Artificial Intelligence (AUAI Press, Corvallis, OR), 420–429.Google Scholar
- (2009) Privacy integrated queries: An extensible platform for privacy-preserving data analysis. Proc. 2009 ACM SIGMOD Internat. Conf. Management Data (ACM Press, New York), 19–30.Crossref, Google Scholar
- (1979) Combinatorial optimization with rational objective functions. Math. Oper. Res. 4(4):414–424.Link, Google Scholar
- (2015) (Nearly) optimal differentially private stochastic multi-arm bandits. Proc. 31st Conf. Uncertainty Artificial Intelligence (AUAI Press, Corvallis, OR), 592–601.Google Scholar
- (2005) Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (2011) Matroid Theory (Oxford University Press, Oxford, UK).Crossref, Google Scholar
- (2016) An optimal algorithm for stochastic matroid bandit optimization. Proc. 2016 Internat. Conf. Autonomous Agents Multiagent Systems (ACM Press, New York), 548–556.Google Scholar
- (2018) Stochastic online shortest path routing: The value of feedback. IEEE Trans. Automatic Control 63(4):915–930.Crossref, Google Scholar
- (2015) Online learning methods for networking. Foundations Trends Networking 8(4):281–409.Crossref, Google Scholar
- (2013) (Nearly) optimal algorithms for private online learning in full-information and bandit settings. Proc. 27th Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 2733–2741.Google Scholar
- (2016) Algorithms for differentially private multi-armed bandits. Proc. 30th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 2087–2093.Google Scholar
- (2012) Knapsack based optimal policies for budget-limited multi-armed bandits. Proc. 26th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 1134–1140.Google Scholar
- (2010) Epsilon-first policies for budget-limited multi-armed bandits. Proc. 24th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 1211–1216.Google Scholar
- (2017) Model-independent online learning for influence maximization. Proc. 34th Internat. Conf. Machine Learn. (PMLR, Brookline, MA), 3530–3539.Google Scholar
- (2017) Improving regret bounds for combinatorial semi-bandits with probabilistically triggered arms and its applications. Proc. 31st Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 1161–1171.Google Scholar
- (2017) Online influence maximization under independent cascade model with semi-bandit feedback. Proc. 31st Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 3026–3036.Google Scholar
- (2015) Algorithms with logarithmic or sublinear regret for constrained contextual bandits. Proc. 29th Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 433–441.Google Scholar
- (2015) Thompson sampling for budgeted multi-armed bandits. Proc. 24th Internat. Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 3960–3966.Google Scholar
- (2016) Budgeted multi-armed bandits with multiple plays. Proc. 25th Internat. Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 2210–2216.Google Scholar
- (2010) Time-delay Systems: Analysis and Control Using the Lambert W Function (World Scientific, Singapore).Crossref, Google Scholar

