Adversarial Robustness for Latent Models: Revisiting the Robust-Standard Accuracies Tradeoff

Published Online: https://doi.org/10.1287/opre.2022.0162

Over the past few years, several adversarial training methods have been proposed to improve the robustness of machine learning models against adversarial perturbations in the input. Despite remarkable progress in this regard, adversarial training is often observed to reduce standard test accuracy. This phenomenon has prompted the research community to investigate the potential tradeoff between standard accuracy (a.k.a. generalization) and robust accuracy (a.k.a. robust generalization) as two performance measures. In this paper, we revisit this tradeoff for latent models and argue that it is mitigated when the data enjoys a low-dimensional structure. In particular, we consider binary classification under two data generative models, namely, the Gaussian mixture model and the generalized linear model, where the feature data lie on a low-dimensional manifold. We develop a theory to show that the low-dimensional manifold structure allows one to obtain models that are nearly optimal with respect to both the standard accuracy and the robust accuracy measures. We further corroborate our theory with several numerical experiments, including a Mixture of Factor Analyzers (MFA) model trained on the MNIST data set.
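To make the two performance measures concrete, the following is a minimal illustrative sketch, not the paper's experimental code. It assumes a hypothetical latent Gaussian mixture in which labels are ±1, a latent vector of dimension k is drawn around a class mean, and the observed features are a linear embedding of that vector into a higher dimension d. It also assumes a linear classifier and ℓ2-bounded perturbations, for which the robust accuracy has a closed form: a point survives every perturbation of norm at most eps exactly when its margin exceeds eps times the ℓ2 norm of the weight vector. All names and parameter values (make_latent_gmm, signal, eps, and so on) are assumptions introduced for illustration.

```python
import numpy as np


def make_latent_gmm(n, d, k, rng, signal=2.0):
    """Hypothetical latent Gaussian mixture (an assumption, not the paper's exact model).

    Labels y are in {-1, +1}, latent z lies in R^k, and features are x = W z in R^d,
    so the features lie on a k-dimensional linear subspace of R^d.
    """
    # d x k matrix with orthonormal columns (reduced QR of a Gaussian matrix).
    W, _ = np.linalg.qr(rng.standard_normal((d, k)))
    # Latent class mean with Euclidean norm equal to `signal`.
    mu = np.full(k, signal / np.sqrt(k))
    y = rng.choice([-1, 1], size=n)
    z = y[:, None] * mu + rng.standard_normal((n, k))
    X = z @ W.T
    return X, y, W, mu


def standard_accuracy(theta, X, y):
    """Fraction of points classified correctly by sign(<theta, x>)."""
    return float(np.mean(y * (X @ theta) > 0))


def robust_accuracy(theta, X, y, eps):
    """Fraction of points classified correctly under every l2 perturbation of norm <= eps.

    For a linear classifier the worst-case perturbation lowers the margin
    y * <theta, x> by exactly eps * ||theta||_2.
    """
    return float(np.mean(y * (X @ theta) > eps * np.linalg.norm(theta)))


rng = np.random.default_rng(0)
X, y, W, mu = make_latent_gmm(n=5000, d=100, k=5, rng=rng)

# A classifier aligned with the embedded class mean (an illustrative choice).
theta = W @ mu

std_acc = standard_accuracy(theta, X, y)
rob_acc = robust_accuracy(theta, X, y, eps=0.5)
print(f"standard accuracy: {std_acc:.3f}")
print(f"robust accuracy (eps=0.5): {rob_acc:.3f}")
```

By construction the robust accuracy can never exceed the standard accuracy, and the two coincide at eps = 0. Varying d, k, and eps in this sketch gives a rough feel for how the gap between the two measures behaves in a toy low-dimensional setting; it makes no claim about the paper's theoretical results.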

Funding: A. Javanmard was partially supported by the Sloan Research Fellowship in mathematics, an Adobe Data Science Faculty Research Award, the National Science Foundation Career Award DMS-1844481, and the National Science Foundation Award 2311024.

Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2022.0162.
