Federated Learning Under Adversarial Silence Attacks

Published Online: https://doi.org/10.1287/ijoc.2024.1017

References

  • Allouah Y, Farhadkhani S, Guerraoui R, Gupta N, Pinot R, Stephan J (2023) Fixing by mixing: A recipe for optimal byzantine ML under heterogeneity. Ruiz F, Dy J, van de Meent JW, eds. Proc. 26th Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 206 (PMLR, New York), 1232–1300.
  • Allouah Y, Farhadkhani S, Guerraoui R, Gupta N, Pinot R, Rizk G, Voitovych S (2024) Byzantine-robust federated learning: Impact of client subsampling and local updates. Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F, eds. Proc. 41st Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 235 (PMLR, New York), 1078–1114.
  • Arjevani Y, Carmon Y, Duchi JC, Foster DJ, Srebro N, Woodworth B (2023) Lower bounds for non-convex stochastic optimization. Math. Programming 199:165–214.
  • Blanchard P, El Mhamdi EM, Guerraoui R, Stainer J (2017) Machine learning with adversaries: Byzantine tolerant gradient descent. Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 30 (Curran Associates, Inc., Red Hook, NY).
  • Boche H, Schaefer RF, Poor HV (2020) Denial-of-service attacks on communication systems: Detectability and jammer knowledge. IEEE Trans. Signal Processing 68:3754–3768.
  • Bonomi S, Del Pozzo A, Potop-Butucaru M, Tixeuil S (2019) Approximate agreement under mobile byzantine faults. Theoret. Comput. Sci. 758:17–29.
  • Caldas S, Duddu SMK, Wu P, Li T, Konečný J, McMahan HB, Smith V, Talwalkar A (2018) LEAF: A benchmark for federated settings. Preprint, submitted December 3, https://arxiv.org/abs/1812.01097.
  • Chen J, Micali S (2016) Algorand. Preprint, submitted July 5, https://arxiv.org/abs/1607.01341.
  • Chen Y, Su L, Xu J (2017) Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proc. ACM Measurement Anal. Comput. Systems 1(2):1–25.
  • Cho YJ, Wang J, Joshi G (2022) Towards understanding biased client selection in federated learning. Camps-Valls G, Ruiz FJR, Valera I, eds. Proc. 25th Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 151 (PMLR, New York), 10351–10375.
  • Data D, Diggavi S (2021) Byzantine-resilient high-dimensional SGD with local iterations on heterogeneous data. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 139 (PMLR, New York), 2478–2488.
  • Farhadkhani S, Guerraoui R, Gupta N, Pinot R, Stephan J (2022) Byzantine machine learning made easy by resilient averaging of momentums. Chaudhuri K, Jegelka S, Song L, Szepesvari C, Niu G, Sabato S, eds. Proc. 39th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 162 (PMLR, New York), 6246–6283.
  • Feng J, Xu H, Mannor S (2014) Distributed robust learning. Preprint, submitted September 21, https://arxiv.org/abs/1409.5937.
  • Ghadimi S, Lan G (2013) Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4):2341–2368.
  • Ghosh A, Maity RK, Kadhe S, Mazumdar A, Ramachandran K (2020) Communication efficient and byzantine tolerant distributed learning. Kim YH, Oggier F, Wornell G, Yu W, eds. 2020 IEEE Internat. Sympos. Inform. Theory (IEEE, Piscataway, NJ), 2545–2550.
  • Gu X, Huang K, Zhang J, Huang L (2021) Fast federated learning in the presence of arbitrary device unavailability. Ranzato M, Beygelzimer A, Dauphin Y, Liang P, Vaughan JW, eds. Adv. Neural Inform. Processing Systems, vol. 34 (Curran Associates, Inc., Red Hook, NY), 12052–12064.
  • Hsu H, Qi H, Brown M (2019) Measuring the effects of non-identical data distribution for federated visual classification. Preprint, submitted September 13, https://arxiv.org/abs/1909.06335.
  • Jhunjhunwala D, Sharma P, Nagarkatti A, Joshi G (2022) FedVARP: Tackling the variance due to partial client participation in federated learning. Cussens J, Zhang K, eds. Proc. 38th Conf. Uncertainty in Artificial Intelligence, Proceedings of Machine Learning Research, vol. 180 (PMLR, New York), 906–916.
  • Kairouz P, McMahan HB, Avent B, Bellet A, Bennis M, Bhagoji AN, Bonawitz K, et al. (2021) Advances and open problems in federated learning. Foundations Trends Machine Learn. 14(1–2):1–210.
  • Karimireddy SP, He L, Jaggi M (2021) Learning from history for Byzantine robust optimization. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 139 (PMLR, New York), 5311–5319.
  • Karimireddy SP, He L, Jaggi M (2022) Byzantine-robust learning on heterogeneous datasets via bucketing. Finn C, Choi Y, Deisenroth M, eds. Internat. Conf. Learn. Representations (Curran Associates, Inc., Red Hook, NY).
  • Karimireddy SP, Kale S, Mohri M, Reddi S, Stich S, Suresh AT (2020) Scaffold: Stochastic controlled averaging for federated learning. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 119 (PMLR, New York), 5132–5143.
  • Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images, https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf.
  • LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc. IEEE 86(11):2278–2324.
  • Li X, Huang K, Yang W, Wang S, Zhang Z (2020b) On the convergence of FedAvg on non-IID data. White M, Cho K, Song D, eds. Internat. Conf. Learn. Representations (Curran Associates, Inc., Red Hook, NY).
  • Li T, Sahu AK, Zaheer M, Sanjabi M, Talwalkar A, Smith V (2020a) Federated optimization in heterogeneous networks. Proc. Machine Learn. Systems 2:429–450.
  • Lynch NA (1996) Distributed Algorithms (Morgan Kaufmann, San Mateo).
  • Malinovsky G, Richtárik P, Horváth S, Gorbunov E (2023) Byzantine robustness and partial participation can be achieved simultaneously: Just clip gradient differences. Preprint, submitted November 23, https://arxiv.org/abs/2311.14127.
  • McMahan B, Moore E, Ramage D, Hampson S, Agüera y Arcas B (2017) Communication-efficient learning of deep networks from decentralized data. Singh A, Zhu J, eds. Proc. 20th Internat. Conf. Artificial Intelligence and Statist., Proceedings of Machine Learning Research, vol. 54 (PMLR, New York), 1273–1282.
  • Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.
  • Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, et al. (2019) PyTorch: An imperative style, high-performance deep learning library. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY).
  • Perazzone J, Wang S, Ji M, Chan KS (2022) Communication-efficient device scheduling for federated learning using stochastic optimization. Chen Y, Eryilmaz A, Widmer J, eds. IEEE INFOCOM 2022-IEEE Conf. Comput. Comm. (IEEE, Piscataway, NJ), 1449–1458.
  • Philippenko C, Dieuleveut A (2020) Bidirectional compression in heterogeneous settings for distributed or federated learning with partial participation: Tight convergence guarantees. Preprint, submitted June 25, https://arxiv.org/abs/2006.14591.
  • Pillutla K, Kakade SM, Harchaoui Z (2022) Robust aggregation for federated learning. IEEE Trans. Signal Processing 70:1142–1154.
  • Ruan Y, Zhang X, Liang SC, Joe-Wong C (2021) Towards flexible device participation in federated learning. Banerjee A, Fukumizu K, eds. Proc. 24th Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 130 (PMLR, New York), 3403–3411.
  • Shamir O, Srebro N, Zhang T (2014) Communication-efficient distributed optimization using an approximate Newton-type method. Xing EP, Jebara T, eds. Proc. 31st Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 32 (PMLR, New York), 1000–1008.
  • Su L, Vaidya NH (2016) Fault-tolerant multi-agent optimization: Optimal iterative distributed algorithms. Pelc A, ed. Proc. 2016 ACM Sympos. Principles Distributed Comput. (ACM, New York), 425–434.
  • Su L, Xu J (2019) Securing distributed gradient descent in high dimensional statistical learning. Proc. ACM Measurement Anal. Comput. Systems 3(1):1–41.
  • Su L, Xiang M, Xu J, Yang P (2026) Federated learning under adversarial silence attacks. https://doi.org/10.1287/ijoc.2024.1017.cd, https://github.com/INFORMSJoC/2024.1017.
  • Sundaram S, Gharesifard B (2015) Consensus-based distributed optimization with malicious nodes. Beck C, Nedich A, Olshevsky A, eds. 2015 53rd Annual Allerton Conf. Comm. Control Comput. (Allerton) (IEEE, Piscataway, NJ), 244–249.
  • Wang S, Ji M (2022) A unified analysis of federated learning with arbitrary client participation. Oh AH, Agarwal A, Belgrave D, Cho K, eds. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates, Inc., Red Hook, NY), 19124–19137.
  • Wang S, Ji M (2023) A lightweight method for tackling unknown participation probabilities in federated averaging. Preprint, submitted June 6, https://arxiv.org/abs/2306.03401.
  • Wang J, Liu Q, Liang H, Joshi G, Poor HV (2020b) Tackling the objective inconsistency problem in heterogeneous federated optimization. Larochelle H, Ranzato M, Hadsell R, Balcan M, Lin H, eds. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates, Inc., Red Hook, NY), 7611–7623.
  • Wang H, Yurochkin M, Sun Y, Papailiopoulos D, Khazaeni Y (2020a) Federated learning with matched averaging. White M, Cho K, Song D, eds. Internat. Conf. Learn. Representations (Curran Associates, Inc., Red Hook, NY).
  • Xie C, Koyejo S, Gupta I (2019) Zeno: Distributed stochastic gradient descent with suspicion-based fault-tolerance. Chaudhuri K, Salakhutdinov R, eds. Proc. 36th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 97 (PMLR, New York), 6893–6901.
  • Yan Y, Niu C, Ding Y, Zheng Z, Tang S, Li Q, Wu F, Lyu C, Feng Y, Chen G (2023) Federated optimization under intermittent client availability. INFORMS J. Comput. 36(1):185–202.
  • Yang H, Zhang X, Khanduri P, Liu J (2022) Anarchic federated learning. Chaudhuri K, Jegelka S, Song L, Szepesvari C, Niu G, Sabato S, eds. Proc. 39th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 162 (PMLR, New York), 25331–25363.
  • Yin D, Chen Y, Kannan R, Bartlett P (2018) Byzantine-robust distributed learning: Towards optimal statistical rates. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 80 (PMLR, New York), 5650–5659.
  • Yuan X, Li P (2022) On convergence of FedProx: Local dissimilarity invariant bounds, non-smoothness and beyond. Oh AH, Agarwal A, Belgrave D, Cho K, eds. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates, Inc., Red Hook, NY), 10752–10765.