Randomized Block-Coordinate Optimistic Gradient Algorithms for Root-Finding Problems

Published Online:https://doi.org/10.1287/moor.2024.0414

References

  • [1] Abhishek VA, Binny S, Johan TR, Nithin R, Vishal T (2022) Federated learning: Collaborative machine learning without centralized training data. Internat. J. Engrg. Technol. Management Sci. 6(5):355–359.Google Scholar
  • [2] Agarwal RP, Meehan M, O’Regan D (2001) Fixed Point Theory and Applications, vol. 141 (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [3] Alacaoglu A, Tran-Dinh Q, Fercoq O, Cevher V (2017) Smooth primal-dual coordinate descent algorithms for nonsmooth convex optimization. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 30 (Curran Associates, Inc., Red Hook, NY), 1–9.Google Scholar
  • [4] Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (JMLR.org), 214–223.Google Scholar
  • [5] Attouch H, Cabot A (2020) Convergence of a relaxed inertial proximal algorithm for maximally monotone operators. Math. Programmng 184(1):243–287.CrossrefGoogle Scholar
  • [6] Attouch H, Peypouquet J, Redont P (2018) Backward–forward algorithms for structured monotone inclusions in Hilbert spaces. J. Math. Anal. Appl. 457(2):1095–1117.CrossrefGoogle Scholar
  • [7] Auslender A (1971) Méthodes numériques pour la décomposition et la minimisation de fonctions non différentiables. Numerische Mathematik 18(3):213–223.CrossrefGoogle Scholar
  • [8] Bauschke HH, Combettes P (2017) Convex Analysis and Monotone Operators Theory in Hilbert Spaces, 2nd ed. (Springer-Verlag, New York).CrossrefGoogle Scholar
  • [9] Beck A, Pauwels E, Sabach S (2015) The cyclic block conditional gradient method for convex optimization problems. SIAM J. Optim. 25(4):2024–2049.CrossrefGoogle Scholar
  • [10] Ben-Tal A, El Ghaoui L, Nemirovski A (2009) Robust Optimization (Princeton University Press, Princeton, NJ).CrossrefGoogle Scholar
  • [11] Ben-Tal A, Margalit T, Nemirovski A (2001) The ordered subsets mirror descent optimization method with applications to tomography. SIAM J. Optim. 12(1):79–108.CrossrefGoogle Scholar
  • [12] Bertsekas D, Tsitsiklis JN (1989) Parallel and Distributed Computation: Numerical Methods (Prentice Hall, Englewood Cliffs, NJ).Google Scholar
  • [13] Böhm A (2023) Solving nonconvex-nonconcave min-max problems exhibiting weak Minty solutions. Trans. Machine Learn. Res., vol. 39 (Journal of Machine Learning Research Inc., New York).Google Scholar
  • [14] Bot RI, Sedlmayer M, Vuong PT (2023) A relaxed inertial forward-backward-forward algorithm for solving monotone inclusions with application to GANs. J. Machine Learn. Res. 24(8):1–37.Google Scholar
  • [15] Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Rev. 60(2):223–311.CrossrefGoogle Scholar
  • [16] Burachik RS, Iusem A (2008) Set-Valued Mappings and Enlargements of Monotone Operators (Springer, New York).Google Scholar
  • [17] Chang CC, Lin CJ (2011) LIBSVM: A library for support vector machines. ACM Trans. Intelligent Systems Tech. 2(3): 1–27.CrossrefGoogle Scholar
  • [18] Chen Y, Lan G, Ouyang Y (2017) Accelerated schemes for a class of variational inequalities. Math. Programming 165(1):113–149.CrossrefGoogle Scholar
  • [19] Combettes PL, Eckstein J (2018) Asynchronous block-iterative primal-dual decomposition methods for monotone inclusions. Math. Programming 168(1):645–672.CrossrefGoogle Scholar
  • [20] Combettes PL, Pesquet JC (2011) Proximal splitting methods in signal processing. Fixed-Point Algorithms for Inverse Problems in Science and Engineering (Springer-Verlag, Berlin), 185–212.CrossrefGoogle Scholar
  • [21] Combettes PL, Pesquet JC (2015) Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM J. Optim. 25(2):1221–1248.CrossrefGoogle Scholar
  • [22] Combettes PL, Wajs VR (2005) Signal recovery by proximal forward-backward splitting. Multiscale Model. Simulation 4(4):1168–1200.CrossrefGoogle Scholar
  • [23] Daskalakis C, Ilyas A, Syrgkanis V, Zeng H (2018) Training GANs with optimism. 6th Internat. Conf. Learn. Representations (ICLR 2018) (Vancouver).Google Scholar
  • [24] Davis D (2022) Variance reduction for root-finding problems. Math. Programming 197:375–410.CrossrefGoogle Scholar
  • [25] Davis D, Yin W (2017) A three-operator splitting scheme and its optimization applications. Set-Valued Variational Anal. 25(4):829–858.CrossrefGoogle Scholar
  • [26] Defazio A, Bach F, Lacoste-Julien S (2014) SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates, Inc., Red Hook, NY), 1646–1654.Google Scholar
  • [27] Diakonikolas J (2020) Halpern iteration for near-optimal and parameter-free monotone inclusion and strong solutions to variational inequalities. Conf. Learn. Theory (PMLR, New York), 1428–1451.Google Scholar
  • [28] Diakonikolas J, Daskalakis C, Jordan M (2021) Efficient methods for structured nonconvex-nonconcave min-max optimization. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 2746–2754.Google Scholar
  • [29] Drusvyatskiy D, Lewis A (2018) Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3):919–948.LinkGoogle Scholar
  • [30] Du W, Xu D, Wu X, Tong H (2021) Fairness-aware agnostic federated learning. Proc. 2021 SIAM Internat. Conf. Data Mining (SIAM, Philadelphia), 181–189.Google Scholar
  • [31] Facchinei F, Pang JS (2003) Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. 1–2 (Springer-Verlag, Berlin).Google Scholar
  • [32] Fercoq O, Richtárik P (2015) Accelerated, parallel, and proximal coordinate descent. SIAM J. Optim. 25(4):1997–2023.CrossrefGoogle Scholar
  • [33] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates, Inc., Red Hook, NY), 2672–2680.Google Scholar
  • [34] Halpern B (1967) Fixed points of nonexpanding maps. Bull. Amer. Math. Soc. 73(6):957–961.CrossrefGoogle Scholar
  • [35] Ho E, Rajagopalan A, Skvortsov A, Arulampalam S, Piraveenan M (2022) Game theory in defence applications: A review. Sensors 22(3):1032.CrossrefGoogle Scholar
  • [36] Hsieh CJ, Sustik MA, Dhillon IS, Ravikumar PK, Poldrack R (2013) BIG & QUIC: Sparse inverse covariance estimation for a million variables. Advances in Neural Information Processing Systems, vol. 26 (Curran Associates, Inc., Red Hook, NY), 3165–3173.Google Scholar
  • [37] Kairouz P, McMahan HB, Avent B, Bellet A, Bennis M, Bhagoji AN, Bonawitz K, et al. (2021) Advances and open problems in federated learning. Found Trends Machine Learn. 14(1–2):1–210.Google Scholar
  • [38] Kim D (2021) Accelerated proximal point method for maximally monotone operators. Math. Programming 190:57–87.CrossrefGoogle Scholar
  • [39] Kim D, Fessler JA (2016) Optimized first-order methods for smooth convex minimization. Math. Programming 159(1–2):81–107.CrossrefGoogle Scholar
  • [40] Konečnỳ J, McMahan HB, Ramage D, Richtárik P (2016) Federated optimization: Distributed machine learning for on-device intelligence. Preprint, submitted October 8, https://arxiv.org/abs/1610.02527.Google Scholar
  • [41] Konnov I (2001) Combined Relaxation Methods for Variational Inequalities (Springer-Verlag, Berlin).CrossrefGoogle Scholar
  • [42] Korpelevich G (1976) The extragradient method for finding saddle points and other problems. Ekonomika i Matematicheskie Metody 12(4):747–756.Google Scholar
  • [43] Kotsalis G, Lan G, Li T (2022) Simple and optimal methods for stochastic variational inequalities, I: Operator extrapolation. SIAM J. Optim. 32(3):2041–2073.CrossrefGoogle Scholar
  • [44] Kuhn HW, Harsanyi J, Selten R, Weibul J, van Damme E (1996) The work of John Nash in game theory. J. Econom. Theory 69(1):153–185.CrossrefGoogle Scholar
  • [45] Lan G (2020) First-Order and Stochastic Optimization Methods for Machine Learning (Springer, Cham, Switzerland).CrossrefGoogle Scholar
  • [46] Latafat P, Themelis A, Patrinos P (2019) Block-coordinate and incremental aggregated proximal gradient methods for nonsmooth nonconvex problems. Math. Programming 193:195–224.CrossrefGoogle Scholar
  • [47] Lee S, Kim D (2021) Fast extra gradient methods for smooth structured nonconvex-nonconcave minimax problems. Advances in Neural Information Processing Systems, vol. 34 (Curran Associates Inc., Red Hook, NY), 16924–16936.Google Scholar
  • [48] Li T, Sahu AK, Talwalkar A, Smith V (2020) Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine 37(3):50–60.CrossrefGoogle Scholar
  • [49] Li T, Sahu AK, Zaheer M, Sanjabi M, Talwalkar A, Smith V (2020) Federated optimization in heterogeneous networks. Dhillon I, Papailiopoulos D, Sze V, eds. Proceedings of Machine Learning and Systems, vol. 2, 429–450.Google Scholar
  • [50] Lieder F (2021) On the convergence rate of the Halpern-iteration. Optim. Lett. 15(2):405–418.CrossrefGoogle Scholar
  • [51] Lions PL, Mercier B (1979) Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numerical Anal. 16(6):964–979.CrossrefGoogle Scholar
  • [52] Luo ZQ, Tseng P (1993) Error bounds and convergence analysis of feasible descent methods: A general approach. Ann. Oper. Res. 46(1):157–178.CrossrefGoogle Scholar
  • [53] Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A (2018) Towards deep learning models resistant to adversarial attacks. Proc. Internat. Conf. Learn. Representations (ICLR) (Vancouver).Google Scholar
  • [54] Maingé PE (2021) Accelerated proximal algorithms with a correction term for monotone inclusions. Appl. Math. Optim. 84(2):2027–2061.CrossrefGoogle Scholar
  • [55] Maingé PE (2022) Fast convergence of generalized forward-backward algorithms for structured monotone inclusions. J. Convex Anal. 29:893–920.Google Scholar
  • [56] Malitsky Y (2015) Projected reflected gradient methods for monotone variational inequalities. SIAM J. Optim. 25(1):502–520.CrossrefGoogle Scholar
  • [57] Malitsky Y, Tam MK (2020) A forward-backward splitting method for monotone inclusions without cocoercivity. SIAM J. Optim. 30(2):1451–1472.CrossrefGoogle Scholar
  • [58] Martinez N, Bertran M, Sapiro G (2020) Minimax pareto fairness: A multi objective perspective. Internat. Conf. Machine Learn. (PMLR, New York), 6755–6764.Google Scholar
  • [59] McMahan HB, Moore E, Ramage D, Hampson S, Aguera y Arcas B (2017) Communication-efficient learning of deep networks from decentralized data. Artificial Intelligence Statist. (PMLR, New York), 1273–1282.Google Scholar
  • [60] Mokhtari A, Ozdaglar A, Pattathil S (2020) A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 1497–1507.Google Scholar
  • [61] Namkoong H, Duchi JC (2016) Stochastic gradient methods for distributionally robust optimization with f-divergences. Advances in Neural Information Processing Systems, vol. 29 (Curran Associates Inc., Red Hook, NY).Google Scholar
  • [62] Nedíc A, Ozdaglar A (2009) Distributed subgradient methods for multi-agent optimization. IEEE Trans. Automatic Control 54(1):48–61.CrossrefGoogle Scholar
  • [63] Nemirovskii A (2004) Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1):229–251.CrossrefGoogle Scholar
  • [64] Nesterov Y (1983) A method for unconstrained convex minimization problem with the rate of convergence O(1/k2). Soviet Math. Doklady 269:543–547.Google Scholar
  • [65] Nesterov Y (2004) Introductory Lectures on Convex Optimization: A Basic Course, Applied Optimization, vol. 87 (Kluwer Academic Publishers, Boston).CrossrefGoogle Scholar
  • [66] Nesterov Y (2007) Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Programming 109(2–3):319–344.CrossrefGoogle Scholar
  • [67] Nesterov Y (2012) Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2):341–362.CrossrefGoogle Scholar
  • [68] Nesterov Y, Stich SU (2017) Efficiency of the accelerated coordinate descent method on structured optimization problems. SIAM J. Optim. 27(1):110–123.CrossrefGoogle Scholar
  • [69] Ortega JM, Rheinboldt WC (2000) Iterative Solution of Nonlinear Equations in Several Variables (Society for Industrial and Applied Mathematics, Philadelphia).CrossrefGoogle Scholar
  • [70] Ouyang Y, Xu Y (2021) Lower complexity bounds of first-order methods for convex-concave bilinear saddle-point problems. Math. Programming 185(1–2):1–35.CrossrefGoogle Scholar
  • [71] Peng Z, Xu Y, Yan M, Yin W (2016) ARock: An algorithmic framework for asynchronous parallel coordinate updates. SIAM J. Sci. Comput. 38(5):2851–2879.CrossrefGoogle Scholar
  • [72] Pethick T, Patrinos P, Fercoq O, Cevher V (2022) Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems. Proc. 10th Internat. Conf. Learn. Representations (ICLR 2022) (Virtual Conference: International Conference on Learning Representations).Google Scholar
  • [73] Peypouquet J, Sorin S (2010) Evolution equations for maximal monotone operators: Asymptotic analysis in continuous and discrete time. J. Convex Anal. (Oxford) 17(3 & 4):1113–1163.Google Scholar
  • [74] Phelps RR (2009) Convex Functions, Monotone Operators and Differentiability, vol. 1364 (Springer, New York).Google Scholar
  • [75] Popov LD (1980) A modification of the Arrow-Hurwicz method for search of saddle points. Math. Notes Acad. of Sci. USSR 28(5):845–848.Google Scholar
  • [76] Rahimian H, Mehrotra S (2019) Distributionally robust optimization: A review. Preprint, submitted August 13, https://arxiv.org/abs/1908.05659.Google Scholar
  • [77] Richtárik P, Takáč M (2016) Parallel coordinate descent methods for big data optimization. Math. Programming 156(1–2):433–484.CrossrefGoogle Scholar
  • [78] Robbins H, Siegmund D (1971) A convergence theorem for non negative almost supermartingales and some applications. Rustagi JS, ed. Optimizing Methods in Statistics (Academic Press, New York), 233–257.Google Scholar
  • [79] Rockafellar R (1976) Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5):877–898.CrossrefGoogle Scholar
  • [80] Rockafellar RT, Wets RJ-B, eds. (2004) Variational Analysis, vol. 317 (Springer-Verlag, Berlin).Google Scholar
  • [81] Ryu EK, Boyd S (2016) Primer on monotone operator methods. Appl. Comput. Math. 15(1):3–43.Google Scholar
  • [82] Ryu EK, Yin W (2022) Large-Scale Convex Optimization: Algorithms & Analyses via Monotone Operators (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [83] Sharma P, Panda R, Joshi G, Varshney P (2022) Federated minimax optimization: Improved convergence analyses and algorithms. Internat. Conf. Machine Learn. (PMLR, New York), 19683–19730.Google Scholar
  • [84] Sra S, Nowozin S, Wright SJ (2012) Optimization for Machine Learning (MIT Press, Cambridge, MA).Google Scholar
  • [85] Tarzanagh DA, Li M, Thrampoulidis C, Oymak S (2022) Fednest: Federated bilevel, minimax, and compositional optimization. Internat. Conf. Machine Learn. (PMLR, New York), 21146–21179.Google Scholar
  • [86] Tran-Dinh Q (2023) Extragradient-type methods with O(1/k)-convergence rates for co-hypomonotone inclusions. J. Global Optim. 89(1):197–221.CrossrefGoogle Scholar
  • [87] Tran-Dinh Q (2024) From Halpern’s fixed-point iterations to Nesterov’s accelerated interpretations for root-finding problems. Comput. Optim. Appl. 87(1):181–218.CrossrefGoogle Scholar
  • [88] Tran-Dinh Q, Liu D (2022) A new randomized primal-dual algorithm for convex optimization with optimal last iterate rates. Optim. Methods Software 38(1):184–217.CrossrefGoogle Scholar
  • [89] Tran-Dinh Q, Luo Y (2021) Halpern-type accelerated and splitting algorithms for monotone inclusions. Preprint, submitted October 15, https://arxiv.org/abs/2110.08150.Google Scholar
  • [90] Tran-Dinh Q, Pham NH, Phan DT, Nguyen LM (2021) FedDR—Randomized Douglas-Rachford splitting algorithms for nonconvex federated composite optimization. Advances in Neural Information Processing Systems, 1–39.Google Scholar
  • [91] Tseng P (2000) A modified forward-backward splitting method for maximal monotone mappings. SIAM J. Control Optim. 38(2):431–446.CrossrefGoogle Scholar
  • [92] Wright SJ (2015) Coordinate descent algorithms. Math. Programming 151(1):3–34.CrossrefGoogle Scholar
  • [93] Wright SJ (2017) Optimization algorithms for data analysis. IAS/Park City Mathematics Series, vol. 25 (American Mathematical Society, Providence, RI), 1–49.Google Scholar
  • [94] Yoon T, Ryu EK (2021) Accelerated algorithms for smooth convex-concave minimax problems with O(1/k2) rate on squared gradient norm. Internat. Conf. Machine Learn. (PMLR, New York), 12098–12109.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.