Trimmed Statistical Estimation via Variance Reduction

Aleksandr Aravkin (Corresponding Author)
Department of Applied Mathematics, University of Washington, Seattle, Washington 98195

Damek Davis
http://orcid.org/0000-0003-2105-4641
School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14850

Published Online: 5 Jul 2019
https://doi.org/10.1287/moor.2019.0992

Mathematics of Operations Research

Volume 45, Issue 1

February 2020

Pages 1-401, C2

Received: February 20, 2018
Accepted: December 23, 2018
Published Online: July 05, 2019

Cite as

Aleksandr Aravkin, Damek Davis (2019) Trimmed Statistical Estimation via Variance Reduction. Mathematics of Operations Research 45(1):292-322.

https://doi.org/10.1287/moor.2019.0992
