Proximal Gradient Methods with Adaptive Subspace Sampling
Published Online:23 Feb 2021https://doi.org/10.1287/moor.2020.1092
References
- [1] (2012) Optimization with sparsity-inducing penalties. Foundations Trends Machine Learning 4(1):1–106.Google Scholar
- [2] (2011) Convex Analysis and Monotone Operator Theory in Hilbert Spaces (Springer Science & Business Media, Berlin).Crossref, Google Scholar
- [3] (1976) On the Goldstein-Levitin-Polyak gradient projection method. IEEE Trans. Automatic Control 21(2):174–184.Crossref, Google Scholar
- [4] (2016) A coordinate descent primal-dual algorithm and application to distributed asynchronous optimization. IEEE Trans. Automatic Control 61(10):2947–2957.Crossref, Google Scholar
- [5] (2015) Convex optimization: Algorithms and complexity. Foundations Trends Machine Learn. 8(3-4):231–357.Google Scholar
- [6] (1988) On the identification of active constraints. SIAM J. Numerical Anal. 25(5):1197–1211.Crossref, Google Scholar
- [7] (2008) Enhancing sparsity by reweighted ℓ 1 minimization. J. Fourier Anal. Appl. 14(5-6):877–905.Crossref, Google Scholar
- [8] (2007) Proximal thresholding algorithm for minimization over orthonormal bases. SIAM J. Optim. 18(4):1351–1376.Crossref, Google Scholar
- [9] (2011) Proximal splitting methods in signal processing. Bauschke H, Burachik R, Combettes P, Elser V, Luke D, Wolkowicz H, eds. Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer Optimization and its Applications, vol. 49 (Springer, New York), 185–212.Crossref, Google Scholar
- [10] (2013) A direct algorithm for 1D total variation denoising. IEEE Signal Processing Lett. 20(11):1054–1057.Crossref, Google Scholar
- [11] (2011) Nearest neighbor based greedy coordinate descent. Adv. Neural Inform Processing Systems 24:2160–2168.Google Scholar
- [12] (1995) De-noising by soft-thresholding. IEEE Trans. Inform. Theory 41(3):613–627.Crossref, Google Scholar
- [13] (2014) Optimality, identifiability, and sensitivity. Math. Programming 147(1-2):467–498.Crossref, Google Scholar
- [14] (2019) Model consistency for learning with mirror-stratifiable regularizers. Chaudhuri K, Sugiyama M, eds. 22nd Internat. Conf. Artificial Intelligence Statist. (Proc. of Machine Learn. Res.) 89:1236–1244.Google Scholar
- [15] (2018) Sensitivity analysis for mirror-stratifiable convex functions. SIAM J. Optim. 28(4):2975–3000.Crossref, Google Scholar
- [16] (2015) Mind the duality gap: Safer rules for the lasso. Bach F, Blei D, eds. Proc. 32nd Internat. Conf. Machine Learn. (MLResearchPress), 333–342.Google Scholar
- [17] (2015) Convergence analysis of prediction markets via randomized subspace descent. Adv. Neural Inform. Processing Systems 28:3034–3042.Google Scholar
- [18] (2013) Accelerated coordinate descent with adaptive coordinate frequencies. Proc. 5th Asian Conf. Machine Learn. (Proc. Machine Learn. Res.), 29:72–86.Google Scholar
- [19] (2018) Asynchronous distributed learning with sparse communications and identification. Preprint, submitted December 10, https://arxiv.org/abs/1812.03871.Google Scholar
- [20] (2018) SEGA: Variance reduction via gradient sketching. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Adv. Neural Inform. Processing Systems, (Curran Associates, Montreal, Canada), 2083–2094.Google Scholar
- [21] (2004) Identifying active constraints via partial smoothness and prox-regularity. J. Convex Anal. 11(2):251–266.Google Scholar
- [22] (2002) Active sets, nonsmoothness, and sensitivity. SIAM J. Optim. 13(3):702–725.Crossref, Google Scholar
- [23] (2018) Partial smoothness and constant rank. Preprint, submitted July 9, https://arxiv.org/abs/1807.03134.Google Scholar
- [24] (2017) Activity identification and local linear convergence of forward–backward-type methods. SIAM J. Optim. 27(1):408–437.Crossref, Google Scholar
- [25] (2011) Adaptive coordinate descent. Proc. 13th Annu. Conf. Genetic Evolutionary Comput. (Association for Computing Machinery, New York), 885–892.Google Scholar
- [26] (2020) A distributed flexible delay-tolerant proximal gradient algorithm. SIAM J. Optim. 30(1):933–959.Crossref, Google Scholar
- [27] (2017) Adaptive sampling probabilities for non-smooth optimization. Proc. 34th Internat. Conf. Machine Learn. (Proc. Machine Learn. Res.), 70:2574–2583.Google Scholar
- [28] (2014) A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints. Comput. Optim. Appl. 57(2):307–337.Crossref, Google Scholar
- [29] (2012) Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2):341–362.Crossref, Google Scholar
- [30] (2017) Let’s make block coordinate descent go fast: Faster greedy rules, message-passing, active-set complexity, and superlinear convergence. Preprint, submitted December 23, https://arxiv.org/abs/1712.08859.Google Scholar
- [31] (2015) Coordinate descent converges faster with the Gauss-Southwell rule than random selection. Bach F, Blei D, eds. Proc. 32nd Internat. Conf. Machine Learn. (Proc. Machine Learn. Res.), 37:1632–1641.Google Scholar
- [32] (2013) Safe screening of non-support vectors in pathwise SVM computation. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn. (Proc. Machine Learn. Res.), 38:1382–1390.Google Scholar
- [33] (2017) Faster coordinate descent via adaptive importance sampling. Singh A, Zhu J, eds. Proc. 20th Internat. Conf. Artificial Intelligence Statist. (Proc. Machine Learn. Res.), 54:869–877.Google Scholar
- [34] (2018) Local convergence properties of SAGA/Prox-SVRG and acceleration. Proc. 35th Internat. Conf. Machine Learn. (Proc. Machine Learn. Res.), 80:4124–4132.Google Scholar
- [35] (2016) Coordinate descent with arbitrary sampling I: Algorithms and complexity. Optim. Methods Software 31(5):829–857.Crossref, Google Scholar
- [36] (2014) Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Programing 144(1-2):1–38.Crossref, Google Scholar
- [37] (2016) On optimal probabilities in stochastic coordinate descent methods. Optim. Lett. 10(6):1233–1243.Crossref, Google Scholar
- [38] (2017) Safe adaptive importance sampling. Adv. Neural Inform. Processing Systems 30:4381–4391.Google Scholar
- [39] (2019) Are we there yet? Manifold identification of gradient-related proximal methods. 22nd Internat. Conf. Artificial Intelligence Statist. (Proc. Machine Learn. Res.), 89:1110–1119.Google Scholar
- [40] (2018) A simplified view of first order methods for optimization. Math. Programming 40(1):1–30.Google Scholar
- [41] (2005) Sparsity and smoothness via the fused lasso. J. Roy. Statist. Soc. Ser. B. Statist. Methodological 67(1):91–108.Crossref, Google Scholar
- [42] (2001) Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3):475–494.Crossref, Google Scholar
- [43] (2015) Model selection with low complexity priors. Inform. Inference 4(3):230–287.Google Scholar
- [44] (1993) Identifiable surfaces in constrained optimization. SIAM J. Control Optim. 31(4):1063–1079.Crossref, Google Scholar
- [45] (2012) Accelerated block-coordinate relaxation for regularized optimization. SIAM J. Optim. 22(1):159–186.Crossref, Google Scholar
- [46] (2015) Coordinate descent algorithms. Math. Programming 151(1):3–34.Crossref, Google Scholar
- [47] (2011) Efficient methods for overlapping group lasso. Adv. Neural Inform. Processing Systems 24:352–360.Google Scholar
- [48] (2015) Stochastic optimization with importance sampling for regularized loss minimization. Bach F, Blei D, eds. Proc. 32nd Internat. Conf. Machine Learn. (Proc. Machine Learn. Res.), 37:1–9.Google Scholar

