A Randomized Block-Coordinate Primal-Dual Method for Large-Scale Stochastic Saddle Point Problems

Published Online:https://doi.org/10.1287/ijoo.2024.0056

References

  • Alacaoglu A, Cevher V, Wright SJ (2022a) On the complexity of a practical primal-dual coordinate method. Preprint, submitted January 19, https://arxiv.org/abs/2201.07684.Google Scholar
  • Alacaoglu A, Fercoq O, Cevher V (2022b) On the convergence of stochastic primal-dual hybrid gradient. SIAM J. Optim. 32(2):1288–1318.Google Scholar
  • Ben-Tal A, El Ghaoui L, Nemirovski A (2009) Robust Optimization (Princeton University Press, Princeton, NJ).Google Scholar
  • Byrd RH, Chin GM, Nocedal J, Wu Y (2012) Sample size selection in optimization methods for machine learning. Math. Programming 134(1):127–155.Google Scholar
  • Chambolle A, Pock T (2011) A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision 40(1):120–145.Google Scholar
  • Chambolle A, Pock T (2016) On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Programming 159(1–2):253–287.Google Scholar
  • Chambolle A, Ehrhardt MJ, Richtárik P, Schönlieb CB (2018) Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging application. SIAM J. Optim. 28(4):2783–2808.Google Scholar
  • Chang CC, Lin CJ (2011) Libsvm: A library for support vector machines. ACM Trans. Intelligent Systems Tech. 2(3):1–27.Google Scholar
  • Chen Y, Lan G, Ouyang Y (2014) Optimal primal-dual methods for a class of saddle point problems. SIAM J. Optim. 24(4):1779–1814.Google Scholar
  • Friedlander MP, Schmidt M (2012) Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3):A1380–A1405.Google Scholar
  • Ghadimi S, Lan G (2016) Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Programming 156(1–2):59–99.Google Scholar
  • Ghadimi S, Lan G, Zhang H (2016) Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Programming 155(1–2, Ser. A):267–305.Google Scholar
  • Gönen M, Alpaydın E (2011) Multiple kernel learning algorithms. J. Machine Learn. Res. 12(July):2211–2268. Google Scholar
  • Hamedani EY, Aybat NS (2021) A primal-dual algorithm with line search for general convex-concave saddle point problems. SIAM J. Optim. 31(2):1299–1329.Google Scholar
  • Hamedani EY, Jalilzadeh A, Aybat NS (2023) Randomized primal-dual methods with adaptive step-sizes. Ruiz F, Dy J, van de Meent JW, eds. Proc. 26th Internat. Conf. Artificial Intelligence Statist., vol. 206 (PMLR, New York), 11185–11212.Google Scholar
  • He Y, Monteiro RD (2016) An accelerated HPE-type algorithm for a class of composite convex-concave saddle-point problems. SIAM J. Optim. 26(1):29–56.Google Scholar
  • Jalilzadeh A, Shanbhag UV, Blanchet JH, Glynn PW (2018) Optimal smoothed variable sample-size accelerated proximal methods for structured nonsmooth stochastic convex programs. Preprint, submitted March 2, https://arxiv.org/abs/1803.00718.Google Scholar
  • Juditsky A, Nemirovski A, Tauvel C (2011) Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Systems 1(1):17–58.LinkGoogle Scholar
  • Kolossoski O, Monteiro RD (2017) An accelerated non-Euclidean hybrid proximal extragradient-type algorithm for convex–concave saddle-point problems. Optim. Methods Software 32(6):1244–1272.Google Scholar
  • Lanckriet GR, Cristianini N, Bartlett P, Ghaoui LE, Jordan MI (2004) Learning the kernel matrix with semidefinite programming. J. Machine Learn. Res. 5(Jan):27–72.Google Scholar
  • Lei J, Shanbhag UV (2022) Asynchronous variance-reduced schemes for composite non-convex stochastic optimization: Block-specific step lengths and adapted batch-sizes. Optim. Methods Software 37(1):264–294.Google Scholar
  • Namkoong H, Duchi JC (2016) Stochastic gradient methods for distributionally robust optimization with f-divergences. Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 26 (Curran Associates, Inc., Red Hook, NY) 2208–2216.Google Scholar
  • Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Google Scholar
  • Nesterov Y (2012) Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2):341–362.Google Scholar
  • Peng Z, Wu T, Xu Y, Yan M, Yin W (2016) Coordinate friendly structures, algorithms and applications. Preprint, submitted January 5, https://arxiv.org/abs/1601.00863.Google Scholar
  • Richtárik P, Takáč M (2014) Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Programming 144(1–2):1–38.Google Scholar
  • Robbins H, Siegmund D (1971) A convergence theorem for non negative almost supermartingales and some applications. Rustagi J, ed. Optimizing Methods in Statistics (Academic Press, New York), 233–257.Google Scholar
  • Shalev-Shwartz S, Wexler Y (2016) Minimizing the maximal loss: How and why. Balcan MF, Weinberger KQ, eds. Proc. 33rd Internat. Conf. Machine Learn., vol. 48 (PMLR, New York), 793–801.Google Scholar
  • Shivaswamy PK, Jebara T (2007) Ellipsoidal machines. Meila M, Shen X, eds. Proc. Eleventh Internat. Conf. Artificial Intelligence Statist., vol. 2 (PMLR, New York), 484–491.Google Scholar
  • Tseng P (2008) On accelerated proximal gradient methods for convex-concave optimization. Accessed January 31, 2026, http://www.mit.edu/∼dimitrib/PTseng/papers/apgm.pdf.Google Scholar
  • Xing EP, Jordan MI, Russell SJ, Ng AY (2003) Distance metric learning with application to clustering with side-information. Becker S, Thrun S, Obermayer K, eds. Adv. Neural Inform. Processing Systems, vol. 15 (Curran Associates, Inc. Red Hook, NY), 521–528.Google Scholar
  • Xu Y (2017) First-order methods for constrained convex programming based on linearized augmented Lagrangian function. Preprint, submitted November 21, https://arxiv.org/abs/1711.08020.Google Scholar
  • Xu Y (2018) Primal-dual stochastic gradient method for convex programs with many functional constraints. Preprint, submitted February 8, https://arxiv.org/abs/1802.02724.Google Scholar
  • Xu Y, Yin W (2013) A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3):1758–1789.Google Scholar
  • Yu H, Neely MJ (2017) A primal-dual parallel method with O(1/ϵ) convergence for constrained composite convex programs. Preprint, submitted July 30, https://arxiv.org/abs/1708.00322.Google Scholar
  • Zeng ZQ, Yu HB, Xu HR, Xie YQ, Gao J (2008) Fast training support vector machines using parallel sequential minimal optimization. Proc. 3rd Internat. Conf. Intelligent System Knowledge Engrg., vol. 1 (IEEE, New York), 997–1001.Google Scholar
  • Zhang X, Aybat NS, Gurbuzbalaban M (2022) SAPD+: An accelerated stochastic method for nonconvex-concave minimax problems. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates, Inc. Red Hook, NY), 21668–21681.Google Scholar
  • Zhang X, Aybat NS, Gürbüzbalaban M (2024) Robust accelerated primal-dual methods for computing saddle points. SIAM J. Optim. 34(1):1097–1130.Google Scholar
  • Zhang J, Wang M, Hong M, Zhang S (2023) Primal-dual first-order methods for affinely constrained multi-block saddle point problems. SIAM J. Optim. 33(2):1035–1060.Google Scholar
  • Zhao R (2022) Accelerated stochastic algorithms for convex-concave saddle-point problems. Math. Oper. Res. 47(2):1443–1473.LinkGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.