Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization

Published Online:https://doi.org/10.1287/moor.2020.1111

References

  • [1] Agarwal A , Bartlett P , Ravikumar P , Wainwright M (2012) Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization. IEEE Trans. Inform. Theory 58(5):3235–3249.CrossrefGoogle Scholar
  • [2] Balcan M , Blum A (2006) On a theory of learning with similarity functions. ICML, 73–80.Google Scholar
  • [3] Balcan MF , Feldman V (2013) Statistical active learning algorithms. NIPS, 1295–1303.Google Scholar
  • [4] Balcan M , Blum A , Fine S , Mansour Y (2012) Distributed learning, communication complexity and privacy. COLT, 26.1–26.22.Google Scholar
  • [5] Ball K , Carlen E , Lieb E (1994) Sharp uniform convexity and smoothness inequalities for trace norms. Inventiones Mathematicae 115(1):463–482.Google Scholar
  • [6] Bassily R , Smith A , Thakurta A (2014) Private empirical risk minimization: Efficient algorithms and tight error bounds. FOCS, 464–473.Google Scholar
  • [7] Bassily R , Nissim K , Smith AD , Steinke T , Stemmer U , Ullman J (2016) Algorithmic stability for adaptive data analysis. Proc. 48th Annual ACM Sympos. Theory Comput. (ACM, New York), 1046–1059.Google Scholar
  • [8] Belloni A , Liang T , Narayanan H , Rakhlin A (2015) Escaping the local minima via simulated annealing: Optimization of approximately convex functions. Grunwald P, Hazan E, Kale S, eds. Proc. Machine Learn. Res. (Paris), 240–265.Google Scholar
  • [9] Ben-David S , Dichterman E (1998) Learning with restricted focus of attention. J. Comput. System Sci. 56(3):277–298.CrossrefGoogle Scholar
  • [10] Ben-Tal A , Nemirovski A (2013) Lectures on modern convex optimization. Accessed July 2016, http://www2.isye.gatech.edu/∼nemirovs/.Google Scholar
  • [11] Bertsimas D , Vempala S (2004) Solving convex programs by random walks. J. ACM 51(4):540–556.CrossrefGoogle Scholar
  • [12] Blum A , Dwork C , McSherry F , Nissim K (2005) Practical privacy: The SuLQ framework. Li C, ed. Proc. PODS (ACM, New York), 128–138.Google Scholar
  • [13] Blum A , Frieze A , Kannan R , Vempala S (1997) A polynomial time algorithm for learning noisy linear threshold functions. Algorithmica 22(1/2):35–52.Google Scholar
  • [14] Blum A , Furst M , Jackson J , Kearns M , Mansour Y , Rudich S (1994) Weakly learning DNF and characterizing statistical query learning using Fourier analysis. Proc. STOC, 253–262.Google Scholar
  • [15] Braun G , Guzmán C , Pokutta S (2017) Lower bounds on the oracle complexity of nonsmooth convex optimization via information theory. IEEE Trans. Inform. Theory 63(7):4709–4724.Google Scholar
  • [16] Bresler G , Gamarnik D , Shah D (2014) Structure learning of antiferromagnetic ising models. NIPS, 2852–2860.Google Scholar
  • [17] Bylander T (1994) Learning linear threshold functions in the presence of classification noise. Proc. COLT, 340–347.Google Scholar
  • [18] Cesa-Bianchi N , Conconi A , Gentile C (2004) On the generalization ability of on-line learning algorithms. IEEE Trans. Inform. Theory 50(9):2050–2057.CrossrefGoogle Scholar
  • [19] Chaudhuri K , Monteleoni C , Sarwate AD (2011) Differentially private empirical risk minimization. J. Machine Learn. Res. 12:1069–1109.Google Scholar
  • [20] Chu C , Kim S , Lin Y , Yu Y , Bradski G , Ng A , Olukotun K (2006) Map-reduce for machine learning on multicore. Proc. NIPS, 281–288.Google Scholar
  • [21] Coja-Oghlan A , Cooper C , Frieze A (2010) An efficient sparse regularity concept. SIAM J. Discrete Math. 23(4):2000–2034.CrossrefGoogle Scholar
  • [22] Dasgupta S , Kalai AT , Monteleoni C (2009) Analysis of perceptron-based active learning. J. Machine Learn. Res. 10:281–299.Google Scholar
  • [23] d’Aspremont A (2008) Smooth optimization with approximate gradient. SIAM J. Optim. 19(3):1171–1183.CrossrefGoogle Scholar
  • [24] Devolder O , Glineur F , Nesterov Y (2013) First-order methods with inexact oracle: The strongly convex case. CORE Discussion Papers 2013016, Université Catholique de Louvain. Accessed July 1, 2016, http://EconPapers.repec.org/RePEc:cor:louvco:2013016.Google Scholar
  • [25] Devolder O , Glineur F , Nesterov Y (2014) First-order methods of smooth convex optimization with inexact oracle. Math. Programming 146(1–2):37–75.CrossrefGoogle Scholar
  • [26] Dinur I , Nissim K (2003) Revealing information while preserving privacy. PODS, 202–210.Google Scholar
  • [27] Duchi JC , Jordan MI , Wainwright MJ (2013) Local privacy and statistical minimax rates. FOCS, 429–438.Google Scholar
  • [28] Duchi J , Jordan M , Wainwright M (2014) Privacy aware learning. J. ACM 61(6):1–57. CrossrefGoogle Scholar
  • [29] Dunagan J , Vempala S (2008) A simple polynomial-time rescaling algorithm for solving linear programs. Math. Programming 114(1):101–114.CrossrefGoogle Scholar
  • [30] Dwork C , Roth A (2014) The Algorithmic Foundations of Differential Privacy. Foundations Trends Theories Comput. Sci. 9(3–4):211–407.Google Scholar
  • [31] Dwork C , McSherry F , Nissim K , Smith A (2006) Calibrating noise to sensitivity in private data analysis. TCC, 265–284.Google Scholar
  • [32] Dwork C , Feldman V , Hardt M , Pitassi T , Reingold O , Roth A (2014) Preserving statistical validity in adaptive data analysis. Preprint, submitted November 10, http://arxiv.org/abs/1411.2664.Google Scholar
  • [33] Dwork C , Feldman V , Hardt M , Pitassi T , Reingold O , Roth A (2015) Generalization in adaptive data analysis and holdout reuse. Preprint, submitted June 8, http://arxiv.org/abs/1506.02629. Google Scholar
  • [34] Feige U (2002) Relations between average case complexity and approximation complexity. STOC. (ACM), 534–543.Google Scholar
  • [35] Feldman V (2008) Evolvability from learning algorithms. Proc. STOC, 619–628.Google Scholar
  • [36] Feldman V (2009) A complete characterization of statistical query learning with applications to evolvability. Proc. FOCS, 375–384.Google Scholar
  • [37] Feldman V (2016) Dealing with range anxiety in mean estimation via statistical queries. Preprint, submitted November 20, http://arxiv.org/abs/1611.06475.Google Scholar
  • [38] Feldman V , Lee H , Servedio R (2011) Lower bounds and hardness amplification for learning shallow monotone formulas. COLT. vol. 19, 273–292.Google Scholar
  • [39] Feldman V , Perkins W , Vempala S (2013) On the complexity of random satisfiability problems with planted solutions. Preprint, submitted November 19, http://arxiv.org/abs/1311.4821.Google Scholar
  • [40] Feldman V , Grigorescu E , Reyzin L , Vempala S , Xiao Y (2012) Statistical algorithms and a lower bound for detecting planted cliques. Preprint, submitted January 5, http://arxiv.org/abs/1201.1214.Google Scholar
  • [41] Fiorini S , Massar S , Pokutta S , Tiwary H , de Wolf R (2012) Linear vs. semidefinite extended formulations: Exponential separation and strong lower bounds. STOC, 95–106.Google Scholar
  • [42] Forster J (2002) A linear lower bound on the unbounded error probabilistic communication complexity. J. Comput. System Sci. 65(4):612–625.CrossrefGoogle Scholar
  • [43] Freund Y , Schapire R (1998) Large margin classification using the Perceptron algorithm. COLT, 209–217.Google Scholar
  • [44] Goldreich O (2011) Candidate one-way functions based on expander graphs. Goldreich O, ed. Studies in Complexity and Cryptography: Miscellanea on the Interplay Between Randomness and Computation, Lecture Notes in ComputerScience, vol 6650 (Springer, Berlin, Heidelberg), 76–87.Google Scholar
  • [45] Grötschel M , Lovász L , Schrijver A (1988) Geometric Algorithms and Combinatorial Optimization (Springer, Berlin).CrossrefGoogle Scholar
  • [46] Grove A , Littlestone N , Schuurmans D (1997) General convergence results for linear discriminant updates. Proc. 10th Annual Conf. Comput. Learn. Theory, 171–183.Google Scholar
  • [47] Guzmán C , Nemirovski A (2015) On lower complexity bounds for large-scale smooth convex optimization. J. Complexity 31(1):1–14.CrossrefGoogle Scholar
  • [48] Hardt M , Rothblum G (2010) A multiplicative weights mechanism for privacy-preserving data analysis. FOCS, 61–70.Google Scholar
  • [49] Hardt M , Ullman J (2014) Preventing false discovery in interactive data analysis is hard. FOCS, 454–463.Google Scholar
  • [50] Hedayat A , Wallis WD (1978) Hadamard matrices and their applications. Annals Statist. 6(6):1184–1238.Google Scholar
  • [51] Hsu D , Sabato S (2013) Approximate loss minimization with heavy tails. Preprint, submitted July 7, http://arxiv.org/abs/1307.1827.Google Scholar
  • [52] John F (1948) Extremum problems with inequalities as subsidiary conditions. Studies and Essays Presented to R. Courant (Interscience Publishers, New York), 187–204.Google Scholar
  • [53] Kakade S , Sridharan K , Tewari A (2008) On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. NIPS (Curran Associates, Inc.), 793–800.Google Scholar
  • [54] Kalai AT , Vempala S (2006) Simulated annealing for convex optimization. Math. Oper. Res. 31(2):253–266.LinkGoogle Scholar
  • [55] Kallweit M , Simon H (2011) A close look to margin complexity and related parameters. COLT, 437–456.Google Scholar
  • [56] Kashin B (1977) The widths of certain finite dimensional sets and classes of smooth functions. Izv. Akad. Nauk SSSR Ser. Mat, 334–351.Google Scholar
  • [57] Kasiviswanathan SP , Lee HK , Nissim K , Raskhodnikova S , Smith A (2011) What can we learn privately? SIAM J. Comput. 40(3):793–826.CrossrefGoogle Scholar
  • [58] Kearns M (1998) Efficient noise-tolerant learning from statistical queries. J. ACM 45(6):983–1006.CrossrefGoogle Scholar
  • [59] Khachiyan LG (1996) Rounding of polytopes in the real number model of computation. Math. Oper. Res. 21(2):307–320.LinkGoogle Scholar
  • [60] Klivans A , Sherstov A (2007) Unconditional lower bounds for learning intersections of halfspaces. Machine Learn . 69(2–3):97–114.CrossrefGoogle Scholar
  • [61] Lee J , Raghavendra P , Steurer D (2015) Lower bounds on the size of semidefinite programming relaxations. STOC, 567–576.Google Scholar
  • [62] Li J , Nikolov A , Razenshteyn I , Waingarten E (2019) On mean estimation for general norms with statistical queries. Proc. Machine Learn. Res., vol. 99 (PMLR, Phoenix, AZ), 2158–2172.Google Scholar
  • [63] Littlestone N (1987) Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learn . 2:285–318.CrossrefGoogle Scholar
  • [64] Lovász L , Vempala S (2006) Fast algorithms for logconcave functions: Sampling, rounding, integration and optimization. FOCS, 57–68.Google Scholar
  • [65] Lovász L , Vempala S (2006) Simulated annealing in convex bodies and an O*(n 4) volume algorithm. J. Comput. System Sci. 72(2):392–417.Google Scholar
  • [66] Lyubarskii Y , Vershynin R (2010) Uncertainty principles and vector quantization. IEEE Trans.Inform. Theory 56(7):3491–3501.Google Scholar
  • [67] Meka R , Potechin A , Wigderson A (2015) Sum-of-squares lower bounds for planted clique. STOC, 87–96.Google Scholar
  • [68] Nemirovsky A , Yudin D (1983) Problem Complexity and Method Efficiency in Optimization (J. Wiley Sons, New York).Google Scholar
  • [69] Nesterov Y (1983) A method of solving a convex programming problem with convergence rate o (1/k2). Soviet Math. Doklady 27(2):372–376.Google Scholar
  • [70] Novikoff A (1962) On convergence proofs on perceptrons. Proc. Sympos. Math. Theory Automata, vol. XII, 615–622.Google Scholar
  • [71] Pisier G (2016) Martingales in Banach Spaces. Cambridge Studies in Advanced Mathematics (Cambridge University Press, Cambridge, MA).Google Scholar
  • [72] Poljak B (1987) Introduction to Optimization (Optimization Software, New York).Google Scholar
  • [73] Raginsky M , Rakhlin A (2011) Information-based complexity, feedback and dynamics in convex programming. IEEE Trans. Inform. Theory 57(10):7036–7056.CrossrefGoogle Scholar
  • [74] Rockafellar R (1974) Conjugate Duality and Optimization. Regional conference series in applied mathematics (Society for Industrial and Applied Mathematics, Philadelphia).CrossrefGoogle Scholar
  • [75] Rosenblatt F (1958) The perceptron: A probabilistic model for information storage and organization in the brain. Psych. Rev. 65(6):386–407.CrossrefGoogle Scholar
  • [76] Roth A , Roughgarden T (2010) Interactive privacy via the median mechanism. STOC, 765–774.Google Scholar
  • [77] Rothvoß T (2014) The matching polytope has exponential extension complexity. STOC, 263–272.Google Scholar
  • [78] Roy I , Setty STV , Kilzer A , Shmatikov V , Witchel E (2010) Airavat: Security and privacy for MapReduce. NSDI, 297–312.Google Scholar
  • [79] Schoenebeck G (2008) Linear level lasserre lower bounds for certain k-csps. FOCS, 593–602.Google Scholar
  • [80] Servedio R (1999) On PAC learning using Winnow, Perceptron, and a Perceptron-like algorithm. Proc. 12th Annual Conf. Computat. Learn. Theory, 296–307.Google Scholar
  • [81] Shalev-Shwartz S , Ben-David S (2014) Understanding Machine Learning: From Theory to Algorithms (Cambridge University Press, Cambridge, MA).CrossrefGoogle Scholar
  • [82] Shalev-Shwartz S , Shamir O , Srebro N , Sridharan K (2009) Stochastic convex optimization. COLT.Google Scholar
  • [83] Sherstov AA (2008) Halfspace matrices. Comput. Complexity 17(2):149–178.CrossrefGoogle Scholar
  • [84] Shor N (2011) Nondifferentiable Optimization and Polynomial Problems (Springer, Berlin).Google Scholar
  • [85] Simon H (2007) A characterization of strong learnability in the statistical query model. Proc. Sympos. Theoretical Aspects Comput. Sci., 393–404.Google Scholar
  • [86] Srebro N , Tewari A (2010) Stochastic optimization: ICML 2010 tutorial. Accessed July 1, 2016, http://www.ttic.edu/icml2010stochopt/.Google Scholar
  • [87] Steinhardt J , Valiant G , Wager S (2016) Memory, communication, and statistical queries. COLT, 1490–1516.Google Scholar
  • [88] Steinke T , Ullman J (2015) Interactive fingerprinting codes and the hardness of preventing false discovery. COLT, 1588–1628.Google Scholar
  • [89] Studer C , Goldstein T , Yin W , Baraniuk R (2014) Democratic representations. Preprint, submitted January 15, http://arxiv.org/abs/1401.3420.Google Scholar
  • [90] Sujeeth AK , Lee H , Brown KJ , Chafi H , Wu M , Atreya AR , Olukotun K , Rompf T , Odersky M (2011) OptiML: an implicitly parallel domain specific language for machine learning. Proc. 28th Internat. Conf. Machine Learn. (ACM, New York), 609–616.Google Scholar
  • [91] Ullman J (2015) Private multiplicative weights beyond linear queries. PODS, 303–312.Google Scholar
  • [92] Valiant LG (2009) Evolvability. J. ACM 56(1):3.1–3.216.Google Scholar
  • [93] Valiant P (2014) Evolvability of real functions. ACM Trans. Comput. Theory 6(3):12.1–12.19.Google Scholar
  • [94] Wang Z , Gu Q , Liu H (2014) Statistical-Computational phase transitions in planted models: The high-dimensional setting. Proc. 31st Internat. Conf. Machine Learn. 32(2):244–252.Google Scholar
  • [95] Warner SL (1965) Randomized response: A survey technique for eliminating evasive answer bias. J. Amer. Statist. Assoc. 60(309):63–69.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.