A New Homotopy Proximal Variable-Metric Framework for Composite Convex Minimization

Quoc Tran-Dinh
Quoc Tran-Dinh
[email protected]
https://orcid.org/0000-0002-1077-2579
The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
Search for more papers by this author
,
Ling Liang
Ling Liang
[email protected]
Department of Mathematics and Institute of Operations Research and Analytics, National University of Singapore, Singapore 119076
Search for more papers by this author
,
Kim-Chuan Toh
Kim-Chuan Toh
[email protected]
Department of Mathematics and Institute of Operations Research and Analytics, National University of Singapore, Singapore 119076
Search for more papers by this author

The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599

Department of Mathematics and Institute of Operations Research and Analytics, National University of Singapore, Singapore 119076

Search for more papers by this author

Kim-Chuan Toh

[email protected]

Department of Mathematics and Institute of Operations Research and Analytics, National University of Singapore, Singapore 119076

Search for more papers by this author

Published Online:4 Oct 2021https://doi.org/10.1287/moor.2021.1138

References

[1] Banerjee O, El Ghaoui L, d’Aspremont A (2008) Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Machine Learn. Res. 9(15):485–516.Google Scholar
[2] Bauschke HH, Combettes P (2017) Convex Analysis and Monotone Operators Theory in HilbertSspaces, 2nd ed. (Springer-Verlag, New York).Crossref, Google Scholar
[3] Beck A, Teboulle M (2009) Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Processing 18(11):2419–2434.Crossref, Google Scholar
[4] Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding agorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1):183–202.Crossref, Google Scholar
[5] Beck A, Teboulle M (2012) Smoothing and first order methods: A unified framework. SIAM J. Optim. 22(2):557–580.Crossref, Google Scholar
[6] Becker S, Fadili M (2012) A quasi-Newton proximal splitting method. Adv. Neural Inform. Processing Systems Foundation, vol. 25, 1–8.Google Scholar
[7] Becker S, Candès EJ, Grant M (2011) Templates for convex cone problems with applications to sparse signal recovery. Math. Programming. Comput. 3(3):165–218.Crossref, Google Scholar
[8] Bertsekas D (2011) Incremental gradient, subgradient, and proximal methods for convex optimization: A survey. Sra S, Nowozin S, Wright SJ, eds. Optimization for Machine Learning, 1–38.Google Scholar
[9] Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations Trends Machine Learn. 3(1):1–122.Crossref, Google Scholar
[10] Chambolle A, Pock T (2011) A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision 40(1):120–145.Crossref, Google Scholar
[11] Dalalyan AS, Hebiri M, Meziani K, Salmon J (2013) Learning heteroscedastic models by convex programming under group sparsity. Proc. Internat. Conf. Machine Learn. (PMLR, Atlanta), 379–387.Google Scholar
[12] Davis D (2015) Convergence rate analysis of the forward-Douglas-Rachford splitting scheme. SIAM J. Optim. 25(3):1760–1786.Crossref, Google Scholar
[13] Defazio A, Bach F, Lacoste-Julien S (2014) SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. Adv. Neural Inform. Processing Systems, 1646–1654.Google Scholar
[14] Deuflhard P (2006) Newton Methods for Nonlinear Problems—Affine Invariance and Adaptative Algorithms, Springer Series in Computational Mathematics, vol. 35, 2nd ed. (Springer Science & Business Media).Google Scholar
[15] Devolder O, Glineur F, Nesterov Y (2014) First-order methods of smooth convex optimization with inexact oracle. Math. Programming 146(1–2):37–75.Crossref, Google Scholar
[16] Esser JE (2010) Primal-dual algorithm for convex models and applications to image restoration, registration and nonlocal inpainting. Unpublished PhD thesis, University of California, Los Angeles.Google Scholar
[17] Fercoq O, Richtárik P (2015) Accelerated, parallel, and proximal coordinate descent. SIAM J. Optim. 25(4):1997–2023.Crossref, Google Scholar
[18] Friedlander M, Goh G (2017) Efficient evaluation of scaled proximal operators. Electronic Trans. Numerical Anal. 46:1–22.Google Scholar
[19] Friedman J, Hastie T, Tibshirani R (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3):432–441.Crossref, Google Scholar
[20] Giselsson P, Boyd S (2014) Diagonal scaling in Douglas-Rachford splitting and ADMM. 53rd IEEE Conf. Decision Control (IEEE, Los Angeles), 5033–5039.Google Scholar
[21] Goldfarb D, Ma S, Scheinberg K (2012) Fast alternating linearization methods of minimization of the sum of two convex functions. Math. Programming Ser. A 141(1):349–382.Google Scholar
[22] Goldstein T, Osher S (2009) The split Bregman method for l1-pegularized problems. SIAM J. Imaging Sci. 2(2):323–343.Crossref, Google Scholar
[23] Grant M, Boyd S, Ye Y (2006) Disciplined convex programming. Liberti L, Maculan N, eds. Global Optimization: From Theory to Implementation, Nonconvex Optimization and its Applications (Springer, Boston), 155–210.Crossref, Google Scholar
[24] Harchaoui Z, Juditsky A, Nemirovski A (2012) Conditional gradient algorithms for machine learning. NIPS Workshop Optim. Machine Learn. 3(1):3–2.Google Scholar
[25] Harman R, Trnovská M (2009) Approximate D-optimal designs of experiments on the convex hull of a finite set of information matrices. Math. Slovaca 59(6):693–704.Crossref, Google Scholar
[26] Hsieh CJ, Sustik M, Dhillon I, Ravikumar P (2011) Sparse inverse covariance matrix estimation using quadratic approximation. Adv. Neural Inform. Processing Systems, vol. 24, 1–18.Google Scholar
[27] Hsieh CJ, Sustik MA, Dhillon IS, Ravikumar PK, Poldrack R (2013) BIG & QUIC: Sparse inverse covariance estimation for a million variables. NIPS, 3165–3173.Google Scholar
[28] Ivanoff S, Picard F, Rivoirard V (2016) Adaptive LASSO and group-LASSO for functional Poisson regression. J. Machine Learn. Res. 17(1):1903–1948.Google Scholar
[29] Jaggi M (2013) Revisiting Frank-Wolfe: Projection-free sparse convex optimization. J. Machine Learn. Res. 28(1):427–435.Google Scholar
[30] Jia J, Xie F, Xu L (2019) Sparse Poisson regression with penalized weighted score function. Electronic J. Statist. 13(2):2898–2920.Google Scholar
[31] Karimi S, Vavasis S (2017) IMRO: A proximal quasi-Newton method for solving ℓ1-regularized least squares problems. SIAM J. Optim. 27(2):583–615.Crossref, Google Scholar
[32] Karimireddy S, Stich S, Jaggi M (2018) Global linear convergence of Newton’s method without strong-convexity or Lipschitz gradients. Preprint, submitted June 1, https://arxiv.org/abs/1806.00413.Google Scholar
[33] Kyrillidis A, Karimi R, Tran-Dinh Q, Cevher V (2014) Scalable sparse covariance estimation via self-concordance. Proc. 28th AAAI Conf. Artificial Intelligence, Québec, Canada, 1946–1952.Google Scholar
[34] Lam XY, Marron S, Sun D, Toh KC (2018) Fast algorithms for large-scale generalized distance weighted discrimination. J. Comput. Graphical Statist. 27(2):1–12.Crossref, Google Scholar
[35] Lee J, Sun Y, Saunders M (2014) Proximal Newton-type methods for convex optimization. SIAM J. Optim. 24(3):1420–1443.Crossref, Google Scholar
[36] Lefkimmiatis S, Unser M (2013) Poisson image reconstruction with Hessian Schatten-norm regularization. IEEE Trans. Image Processing 22(11):4314–4327.Crossref, Google Scholar
[37] Li L, Toh K (2010) An inexact interior point method for λ -regularized sparse covariance selection. Math. Programming Comput. 2(3):291–315.Crossref, Google Scholar
[38] Li X, Sun D, Toh KC (2018) A highly efficient semismooth Newton augmented Lagrangian method for solving LASSO problems. SIAM J. Optim. 28(1):433–458.Crossref, Google Scholar
[39] Lichman M (2017) UCI machine learning repository. https://archive-beta.ics.uci.edu.Google Scholar
[40] Lu Z, Pong T (2013) Computing optimal experimental designs via interior point method. SIAM J. Matrix Anal. Appl. 34(4):1556–1580.Crossref, Google Scholar
[41] Marron SJ, Todd MJ, Ahn J (2007) Distance-weighted discrimination. J. Amer. Statist. Assoc. 102(480):1267–1271.Crossref, Google Scholar
[42] Nesterov Y (2004) Introductory Lectures on Convex Optimization: A Basic Course, Applied Optimization, vol. 87 (Kluwer Academic Publishers).Crossref, Google Scholar
[43] Nesterov Y (2005) Smooth minimization of non-smooth functions. Math. Programming 103(1):127–152.Crossref, Google Scholar
[44] Nesterov Y (2008) Accelerating the cubic regularization of Newton’s method on convex problems. Math. Programming 112:159–181.Crossref, Google Scholar
[45] Nesterov Y (2013) Gradient methods for minimizing composite objective function. Math. Programming 140(1):125–161.Crossref, Google Scholar
[46] Nesterov Y, Nemirovski A (1994) Interior-point Polynomial Algorithms in Convex Programming (SIAM).Crossref, Google Scholar
[47] Nesterov Y, Polyak B (2006) Cubic regularization of Newton method and its global performance. Math. Programming 108(1):177–205.Crossref, Google Scholar
[48] Nesterov Y, Stich SU (2017) Efficiency of the accelerated coordinate descent method on structured optimization problems. SIAM J. Optim. 27(1):110–123.Crossref, Google Scholar
[49] Nitanda A (2014) Stochastic proximal gradient descent with acceleration techniques. Adv. Neural Inform. Processing Systems, vol. 27, 1574–1582.Google Scholar
[50] Ostrovskii DM, Bach F (2021) Finite-sample analysis of M-estimators using self-concordance. Electronic J. Statist. 15(1):326–391.Google Scholar
[51] Parikh N, Boyd S (2013) Proximal algorithms. Foundations Trends Optim. 1(3):123–231.Google Scholar
[52] Richtárik P, Takáč M (2012) Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Programming 144(1–2):1–38.Crossref, Google Scholar
[53] Rockafellar RT (1970) Convex Analysis, Princeton Mathematics Series, vol. 28 (Princeton University Press).Crossref, Google Scholar
[54] Schmidt M, Roux N, Bach F (2011) Convergence rates of inexact proximal-gradient methods for convex optimization. Adv. Neural Inform. Processing Systems, vol. 24, 1–9.Google Scholar
[55] Shalev-Shwartz S, Zhang T (2013) Stochastic dual coordinate ascent methods for regularized loss minimization. J. Machine Learn. Res. 14:567–599.Google Scholar
[56] Shefi R, Teboulle M (2014) Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization. SIAM J. Optim. 24(1):269–297.Crossref, Google Scholar
[57] Stella L, Themelis A, Patrinos P (2017) Forward–backward quasi-Newton methods for nonsmooth optimization problems. Comput. Optim. Appl. 67(3):443–487.Crossref, Google Scholar
[58] Su W, Boyd S, Candes E. (2014) A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights. Adv. Neural Inform. Processing Systems, 2510–2518.Google Scholar
[59] Sun T, Tran-Dinh Q (2019) Generalized self-concordant functions: A recipe for Newton-type methods. Math. Programming 178(1–2):145–213.Crossref, Google Scholar
[60] Toh KC, Todd M, Tütüncü R (2010) On the implementation and usage of SDPT3—A Matlab software package for semidefinite-quadratic-linear programming. Technical Report 4, National University of Singapore.Google Scholar
[61] Tran-Dinh Q, Fercoq O, Cevher V (2018) A smooth primal-dual optimization framework for nonsmooth composite convex minimization. SIAM J. Optim. 28(1):96–134.Crossref, Google Scholar
[62] Tran-Dinh Q, Kyrillidis A, Cevher V (2014) An inexact proximal path-following algorithm for constrained convex minimization. SIAM J. Optim. 24(4):1718–1745.Crossref, Google Scholar
[63] Tran-Dinh Q, Kyrillidis A, Cevher V (2015) Composite self-concordant minimization. J. Machine Learn. Res. 15:374–416.Google Scholar
[64] Tran-Dinh Q, Sun T, Lu S (2019) Self-concordant inclusions: A unified framework for path-following generalized Newton-type algorithms. Math. Programming 177(1–2):173–223.Crossref, Google Scholar
[65] Wright SJ (2015) Coordinate descent algorithms. Math. Programming 151(1):3–34.Crossref, Google Scholar
[66] Wright NRSJ, Figueiredo M (2009) Sparse reconstruction by separable approximation. IEEE Trans. Signal Processing 57:2479–2493.Crossref, Google Scholar
[67] Yang L, Sun D, Toh KC (2015) SDPNAL+: A majorized semismooth Newton-CG augmented Lagrangian method for semidefinite programming with nonnegative constraints. Math. Programming Comput. 7(3):331–366.Crossref, Google Scholar
[68] Zhao XY, Sun D, Toh KC (2010) A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. Optim. 20(4):1737–1765.Crossref, Google Scholar
[69] Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. B 67(2):301–320.Crossref, Google Scholar

cover image Mathematics of Operations Research

Volume 47, Issue 1

February 2022

Pages 1-846, C2

Article Information

Metrics

Information

Received:December 23, 2018
Accepted:October 28, 2020
Published Online:October 04, 2021

Cite as

Quoc Tran-Dinh, Ling Liang, Kim-Chuan Toh (2021) A New Homotopy Proximal Variable-Metric Framework for Composite Convex Minimization. Mathematics of Operations Research 47(1):508-539.

https://doi.org/10.1287/moor.2021.1138

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

A New Homotopy Proximal Variable-Metric Framework for Composite Convex Minimization

References

Volume 47, Issue 1

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News