Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization

Vitaly Feldman
Vitaly Feldman
[email protected]
Apple, Inc., Cupertino, California 95014;
Search for more papers by this author
,
Cristóbal Guzmán
Corresponding Author
Cristóbal Guzmán
[email protected]
https://orcid.org/0000-0002-1498-2055
Institute for Mathematical and Computational Engineering, Facultad de Matemáticas & Escuela de Ingeniería, Pontificia Universidad Católica de Chile, Region Metropolitana, 3580000 Santiago, Chile;
Search for more papers by this author
,
Santosh Vempala
Santosh Vempala
[email protected]
ANID – Millennium Science Initiative Program – Millennium Nucleus Center for the Discovery of Structures in Complex Data, 7820244 Santiago, Chile;School of Computer Science, Georgia Institute of Technology, Atlanta, Georgia 30332
Search for more papers by this author

Vitaly Feldman

[email protected]

Apple, Inc., Cupertino, California 95014;

Search for more papers by this author

Cristóbal Guzmán

Corresponding Author

Cristóbal Guzmán

[email protected]

https://orcid.org/0000-0002-1498-2055

Institute for Mathematical and Computational Engineering, Facultad de Matemáticas & Escuela de Ingeniería, Pontificia Universidad Católica de Chile, Region Metropolitana, 3580000 Santiago, Chile;

Search for more papers by this author

Santosh Vempala

[email protected]

ANID – Millennium Science Initiative Program – Millennium Nucleus Center for the Discovery of Structures in Complex Data, 7820244 Santiago, Chile;School of Computer Science, Georgia Institute of Technology, Atlanta, Georgia 30332

Search for more papers by this author

Published Online:8 Mar 2021https://doi.org/10.1287/moor.2020.1111

References

[1] Agarwal A , Bartlett P , Ravikumar P , Wainwright M (2012) Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization. IEEE Trans. Inform. Theory 58(5):3235–3249.Crossref, Google Scholar
[2] Balcan M , Blum A (2006) On a theory of learning with similarity functions. ICML, 73–80.Google Scholar
[3] Balcan MF , Feldman V (2013) Statistical active learning algorithms. NIPS, 1295–1303.Google Scholar
[4] Balcan M , Blum A , Fine S , Mansour Y (2012) Distributed learning, communication complexity and privacy. COLT, 26.1–26.22.Google Scholar
[5] Ball K , Carlen E , Lieb E (1994) Sharp uniform convexity and smoothness inequalities for trace norms. Inventiones Mathematicae 115(1):463–482.Google Scholar
[6] Bassily R , Smith A , Thakurta A (2014) Private empirical risk minimization: Efficient algorithms and tight error bounds. FOCS, 464–473.Google Scholar
[7] Bassily R , Nissim K , Smith AD , Steinke T , Stemmer U , Ullman J (2016) Algorithmic stability for adaptive data analysis. Proc. 48th Annual ACM Sympos. Theory Comput. (ACM, New York), 1046–1059.Google Scholar
[8] Belloni A , Liang T , Narayanan H , Rakhlin A (2015) Escaping the local minima via simulated annealing: Optimization of approximately convex functions. Grunwald P, Hazan E, Kale S, eds. Proc. Machine Learn. Res. (Paris), 240–265.Google Scholar
[9] Ben-David S , Dichterman E (1998) Learning with restricted focus of attention. J. Comput. System Sci. 56(3):277–298.Crossref, Google Scholar
[10] Ben-Tal A , Nemirovski A (2013) Lectures on modern convex optimization. Accessed July 2016, http://www2.isye.gatech.edu/∼nemirovs/.Google Scholar
[11] Bertsimas D , Vempala S (2004) Solving convex programs by random walks. J. ACM 51(4):540–556.Crossref, Google Scholar
[12] Blum A , Dwork C , McSherry F , Nissim K (2005) Practical privacy: The SuLQ framework. Li C, ed. Proc. PODS (ACM, New York), 128–138.Google Scholar
[13] Blum A , Frieze A , Kannan R , Vempala S (1997) A polynomial time algorithm for learning noisy linear threshold functions. Algorithmica 22(1/2):35–52.Google Scholar
[14] Blum A , Furst M , Jackson J , Kearns M , Mansour Y , Rudich S (1994) Weakly learning DNF and characterizing statistical query learning using Fourier analysis. Proc. STOC, 253–262.Google Scholar
[15] Braun G , Guzmán C , Pokutta S (2017) Lower bounds on the oracle complexity of nonsmooth convex optimization via information theory. IEEE Trans. Inform. Theory 63(7):4709–4724.Google Scholar
[16] Bresler G , Gamarnik D , Shah D (2014) Structure learning of antiferromagnetic ising models. NIPS, 2852–2860.Google Scholar
[17] Bylander T (1994) Learning linear threshold functions in the presence of classification noise. Proc. COLT, 340–347.Google Scholar
[18] Cesa-Bianchi N , Conconi A , Gentile C (2004) On the generalization ability of on-line learning algorithms. IEEE Trans. Inform. Theory 50(9):2050–2057.Crossref, Google Scholar
[19] Chaudhuri K , Monteleoni C , Sarwate AD (2011) Differentially private empirical risk minimization. J. Machine Learn. Res. 12:1069–1109.Google Scholar
[20] Chu C , Kim S , Lin Y , Yu Y , Bradski G , Ng A , Olukotun K (2006) Map-reduce for machine learning on multicore. Proc. NIPS, 281–288.Google Scholar
[21] Coja-Oghlan A , Cooper C , Frieze A (2010) An efficient sparse regularity concept. SIAM J. Discrete Math. 23(4):2000–2034.Crossref, Google Scholar
[22] Dasgupta S , Kalai AT , Monteleoni C (2009) Analysis of perceptron-based active learning. J. Machine Learn. Res. 10:281–299.Google Scholar
[23] d’Aspremont A (2008) Smooth optimization with approximate gradient. SIAM J. Optim. 19(3):1171–1183.Crossref, Google Scholar
[24] Devolder O , Glineur F , Nesterov Y (2013) First-order methods with inexact oracle: The strongly convex case. CORE Discussion Papers 2013016, Université Catholique de Louvain. Accessed July 1, 2016, http://EconPapers.repec.org/RePEc:cor:louvco:2013016.Google Scholar
[25] Devolder O , Glineur F , Nesterov Y (2014) First-order methods of smooth convex optimization with inexact oracle. Math. Programming 146(1–2):37–75.Crossref, Google Scholar
[26] Dinur I , Nissim K (2003) Revealing information while preserving privacy. PODS, 202–210.Google Scholar
[27] Duchi JC , Jordan MI , Wainwright MJ (2013) Local privacy and statistical minimax rates. FOCS, 429–438.Google Scholar
[28] Duchi J , Jordan M , Wainwright M (2014) Privacy aware learning. J. ACM 61(6):1–57. Crossref, Google Scholar
[29] Dunagan J , Vempala S (2008) A simple polynomial-time rescaling algorithm for solving linear programs. Math. Programming 114(1):101–114.Crossref, Google Scholar
[30] Dwork C , Roth A (2014) The Algorithmic Foundations of Differential Privacy. Foundations Trends Theories Comput. Sci. 9(3–4):211–407.Google Scholar
[31] Dwork C , McSherry F , Nissim K , Smith A (2006) Calibrating noise to sensitivity in private data analysis. TCC, 265–284.Google Scholar
[32] Dwork C , Feldman V , Hardt M , Pitassi T , Reingold O , Roth A (2014) Preserving statistical validity in adaptive data analysis. Preprint, submitted November 10, http://arxiv.org/abs/1411.2664.Google Scholar
[33] Dwork C , Feldman V , Hardt M , Pitassi T , Reingold O , Roth A (2015) Generalization in adaptive data analysis and holdout reuse. Preprint, submitted June 8, http://arxiv.org/abs/1506.02629. Google Scholar
[34] Feige U (2002) Relations between average case complexity and approximation complexity. STOC. (ACM), 534–543.Google Scholar
[35] Feldman V (2008) Evolvability from learning algorithms. Proc. STOC, 619–628.Google Scholar
[36] Feldman V (2009) A complete characterization of statistical query learning with applications to evolvability. Proc. FOCS, 375–384.Google Scholar
[37] Feldman V (2016) Dealing with range anxiety in mean estimation via statistical queries. Preprint, submitted November 20, http://arxiv.org/abs/1611.06475.Google Scholar
[38] Feldman V , Lee H , Servedio R (2011) Lower bounds and hardness amplification for learning shallow monotone formulas. COLT. vol. 19, 273–292.Google Scholar
[39] Feldman V , Perkins W , Vempala S (2013) On the complexity of random satisfiability problems with planted solutions. Preprint, submitted November 19, http://arxiv.org/abs/1311.4821.Google Scholar
[40] Feldman V , Grigorescu E , Reyzin L , Vempala S , Xiao Y (2012) Statistical algorithms and a lower bound for detecting planted cliques. Preprint, submitted January 5, http://arxiv.org/abs/1201.1214.Google Scholar
[41] Fiorini S , Massar S , Pokutta S , Tiwary H , de Wolf R (2012) Linear vs. semidefinite extended formulations: Exponential separation and strong lower bounds. STOC, 95–106.Google Scholar
[42] Forster J (2002) A linear lower bound on the unbounded error probabilistic communication complexity. J. Comput. System Sci. 65(4):612–625.Crossref, Google Scholar
[43] Freund Y , Schapire R (1998) Large margin classification using the Perceptron algorithm. COLT, 209–217.Google Scholar
[44] Goldreich O (2011) Candidate one-way functions based on expander graphs. Goldreich O, ed. Studies in Complexity and Cryptography: Miscellanea on the Interplay Between Randomness and Computation, Lecture Notes in ComputerScience, vol 6650 (Springer, Berlin, Heidelberg), 76–87.Google Scholar
[45] Grötschel M , Lovász L , Schrijver A (1988) Geometric Algorithms and Combinatorial Optimization (Springer, Berlin).Crossref, Google Scholar
[46] Grove A , Littlestone N , Schuurmans D (1997) General convergence results for linear discriminant updates. Proc. 10th Annual Conf. Comput. Learn. Theory, 171–183.Google Scholar
[47] Guzmán C , Nemirovski A (2015) On lower complexity bounds for large-scale smooth convex optimization. J. Complexity 31(1):1–14.Crossref, Google Scholar
[48] Hardt M , Rothblum G (2010) A multiplicative weights mechanism for privacy-preserving data analysis. FOCS, 61–70.Google Scholar
[49] Hardt M , Ullman J (2014) Preventing false discovery in interactive data analysis is hard. FOCS, 454–463.Google Scholar
[50] Hedayat A , Wallis WD (1978) Hadamard matrices and their applications. Annals Statist. 6(6):1184–1238.Google Scholar
[51] Hsu D , Sabato S (2013) Approximate loss minimization with heavy tails. Preprint, submitted July 7, http://arxiv.org/abs/1307.1827.Google Scholar
[52] John F (1948) Extremum problems with inequalities as subsidiary conditions. Studies and Essays Presented to R. Courant (Interscience Publishers, New York), 187–204.Google Scholar
[53] Kakade S , Sridharan K , Tewari A (2008) On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. NIPS (Curran Associates, Inc.), 793–800.Google Scholar
[54] Kalai AT , Vempala S (2006) Simulated annealing for convex optimization. Math. Oper. Res. 31(2):253–266.Link, Google Scholar
[55] Kallweit M , Simon H (2011) A close look to margin complexity and related parameters. COLT, 437–456.Google Scholar
[56] Kashin B (1977) The widths of certain finite dimensional sets and classes of smooth functions. Izv. Akad. Nauk SSSR Ser. Mat, 334–351.Google Scholar
[57] Kasiviswanathan SP , Lee HK , Nissim K , Raskhodnikova S , Smith A (2011) What can we learn privately? SIAM J. Comput. 40(3):793–826.Crossref, Google Scholar
[58] Kearns M (1998) Efficient noise-tolerant learning from statistical queries. J. ACM 45(6):983–1006.Crossref, Google Scholar
[59] Khachiyan LG (1996) Rounding of polytopes in the real number model of computation. Math. Oper. Res. 21(2):307–320.Link, Google Scholar
[60] Klivans A , Sherstov A (2007) Unconditional lower bounds for learning intersections of halfspaces. Machine Learn . 69(2–3):97–114.Crossref, Google Scholar
[61] Lee J , Raghavendra P , Steurer D (2015) Lower bounds on the size of semidefinite programming relaxations. STOC, 567–576.Google Scholar
[62] Li J , Nikolov A , Razenshteyn I , Waingarten E (2019) On mean estimation for general norms with statistical queries. Proc. Machine Learn. Res., vol. 99 (PMLR, Phoenix, AZ), 2158–2172.Google Scholar
[63] Littlestone N (1987) Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learn . 2:285–318.Crossref, Google Scholar
[64] Lovász L , Vempala S (2006) Fast algorithms for logconcave functions: Sampling, rounding, integration and optimization. FOCS, 57–68.Google Scholar
[65] Lovász L , Vempala S (2006) Simulated annealing in convex bodies and an O*(n 4) volume algorithm. J. Comput. System Sci. 72(2):392–417.Google Scholar
[66] Lyubarskii Y , Vershynin R (2010) Uncertainty principles and vector quantization. IEEE Trans.Inform. Theory 56(7):3491–3501.Google Scholar
[67] Meka R , Potechin A , Wigderson A (2015) Sum-of-squares lower bounds for planted clique. STOC, 87–96.Google Scholar
[68] Nemirovsky A , Yudin D (1983) Problem Complexity and Method Efficiency in Optimization (J. Wiley Sons, New York).Google Scholar
[69] Nesterov Y (1983) A method of solving a convex programming problem with convergence rate o (1/k2). Soviet Math. Doklady 27(2):372–376.Google Scholar
[70] Novikoff A (1962) On convergence proofs on perceptrons. Proc. Sympos. Math. Theory Automata, vol. XII, 615–622.Google Scholar
[71] Pisier G (2016) Martingales in Banach Spaces. Cambridge Studies in Advanced Mathematics (Cambridge University Press, Cambridge, MA).Google Scholar
[72] Poljak B (1987) Introduction to Optimization (Optimization Software, New York).Google Scholar
[73] Raginsky M , Rakhlin A (2011) Information-based complexity, feedback and dynamics in convex programming. IEEE Trans. Inform. Theory 57(10):7036–7056.Crossref, Google Scholar
[74] Rockafellar R (1974) Conjugate Duality and Optimization. Regional conference series in applied mathematics (Society for Industrial and Applied Mathematics, Philadelphia).Crossref, Google Scholar
[75] Rosenblatt F (1958) The perceptron: A probabilistic model for information storage and organization in the brain. Psych. Rev. 65(6):386–407.Crossref, Google Scholar
[76] Roth A , Roughgarden T (2010) Interactive privacy via the median mechanism. STOC, 765–774.Google Scholar
[77] Rothvoß T (2014) The matching polytope has exponential extension complexity. STOC, 263–272.Google Scholar
[78] Roy I , Setty STV , Kilzer A , Shmatikov V , Witchel E (2010) Airavat: Security and privacy for MapReduce. NSDI, 297–312.Google Scholar
[79] Schoenebeck G (2008) Linear level lasserre lower bounds for certain k-csps. FOCS, 593–602.Google Scholar
[80] Servedio R (1999) On PAC learning using Winnow, Perceptron, and a Perceptron-like algorithm. Proc. 12th Annual Conf. Computat. Learn. Theory, 296–307.Google Scholar
[81] Shalev-Shwartz S , Ben-David S (2014) Understanding Machine Learning: From Theory to Algorithms (Cambridge University Press, Cambridge, MA).Crossref, Google Scholar
[82] Shalev-Shwartz S , Shamir O , Srebro N , Sridharan K (2009) Stochastic convex optimization. COLT.Google Scholar
[83] Sherstov AA (2008) Halfspace matrices. Comput. Complexity 17(2):149–178.Crossref, Google Scholar
[84] Shor N (2011) Nondifferentiable Optimization and Polynomial Problems (Springer, Berlin).Google Scholar
[85] Simon H (2007) A characterization of strong learnability in the statistical query model. Proc. Sympos. Theoretical Aspects Comput. Sci., 393–404.Google Scholar
[86] Srebro N , Tewari A (2010) Stochastic optimization: ICML 2010 tutorial. Accessed July 1, 2016, http://www.ttic.edu/icml2010stochopt/.Google Scholar
[87] Steinhardt J , Valiant G , Wager S (2016) Memory, communication, and statistical queries. COLT, 1490–1516.Google Scholar
[88] Steinke T , Ullman J (2015) Interactive fingerprinting codes and the hardness of preventing false discovery. COLT, 1588–1628.Google Scholar
[89] Studer C , Goldstein T , Yin W , Baraniuk R (2014) Democratic representations. Preprint, submitted January 15, http://arxiv.org/abs/1401.3420.Google Scholar
[90] Sujeeth AK , Lee H , Brown KJ , Chafi H , Wu M , Atreya AR , Olukotun K , Rompf T , Odersky M (2011) OptiML: an implicitly parallel domain specific language for machine learning. Proc. 28th Internat. Conf. Machine Learn. (ACM, New York), 609–616.Google Scholar
[91] Ullman J (2015) Private multiplicative weights beyond linear queries. PODS, 303–312.Google Scholar
[92] Valiant LG (2009) Evolvability. J. ACM 56(1):3.1–3.216.Google Scholar
[93] Valiant P (2014) Evolvability of real functions. ACM Trans. Comput. Theory 6(3):12.1–12.19.Google Scholar
[94] Wang Z , Gu Q , Liu H (2014) Statistical-Computational phase transitions in planted models: The high-dimensional setting. Proc. 31st Internat. Conf. Machine Learn. 32(2):244–252.Google Scholar
[95] Warner SL (1965) Randomized response: A survey technique for eliminating evasive answer bias. J. Amer. Statist. Assoc. 60(309):63–69.Crossref, Google Scholar

cover image Mathematics of Operations Research

Volume 46, Issue 3

August 2021

Pages 835-1234, C2

Article Information

Metrics

Information

Received:June 23, 2018
Accepted:September 27, 2020
Published Online:March 08, 2021

Cite as

Vitaly Feldman , Cristóbal Guzmán , Santosh Vempala (2021) Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization. Mathematics of Operations Research 46(3):912-945.

https://doi.org/10.1287/moor.2020.1111

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization

References

Volume 46, Issue 3

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News