A Bilevel Optimization Approach for Computing Synthetic Data to Mitigate Unfairness in Collaborative Machine Learning
Abstract
In distributed computing, collaborative machine learning enables multiple clients to jointly train a global model. In this paper, we present two strategies for addressing fairness in collaborative machine learning. In both, each client generates a synthetic data set by solving a bilevel optimization problem designed so that the global model yields fair predictions. In our first strategy, clients pass their synthetic data sets (or closely related synthetic data sets that additionally preserve differential privacy) to the server. The server uses these data sets to train the global model with conventional machine learning techniques, eliminating the need for fairness-specific aggregation. This approach requires only a single communication round, maintains data privacy (when we integrate the differential privacy-preserving enhancement), and promotes fairness. Our second strategy employs the well-known federated averaging framework, except that clients use their synthetic data sets to compute the local model updates that are passed frequently to the server. For this approach, privacy is preserved by passing only model parameters, and fairness is attained through the use of our synthetic data sets. Our two approaches (one-shot versus many-shot) are complementary. Empirical results demonstrate that our methods are effective in reducing unfairness.
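To make the many-shot strategy concrete, the following is a minimal sketch of federated averaging in which each client computes its local update on a synthetic data set rather than on its raw data. It is illustrative only and rests on several assumptions not taken from the paper: the model is a logistic regression, the function names `local_update` and `federated_averaging` are hypothetical, and the "synthetic" data sets are random placeholders rather than solutions of the bilevel optimization problem.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def local_update(w, X, y, lr=0.1, steps=5):
    """Run a few gradient steps of logistic regression on one client's data set."""
    w = w.copy()
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w


def federated_averaging(client_data, dim, rounds=20):
    """Server loop: broadcast w, collect local updates, average them by data set size."""
    w = np.zeros(dim)
    sizes = np.array([len(y) for _, y in client_data], dtype=float)
    for _ in range(rounds):
        updates = [local_update(w, X, y) for X, y in client_data]
        w = np.average(updates, axis=0, weights=sizes)
    return w


# Placeholder data sets: random stand-ins (an assumption for illustration),
# NOT the synthetic data produced by the bilevel optimization problem.
rng = np.random.default_rng(0)
w_true = np.array([1.5, -2.0, 0.5])
client_data = []
for n in (40, 60, 100):
    X = rng.normal(size=(n, 3))
    y = (X @ w_true > 0).astype(float)
    client_data.append((X, y))

w_global = federated_averaging(client_data, dim=3)
```

Only model parameters cross the client-server boundary in this loop. In the one-shot strategy, by contrast, the server would receive the synthetic data sets themselves and train on their union in a single round.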

