A Bilevel Optimization Approach for Computing Synthetic Data to Mitigate Unfairness in Collaborative Machine Learning

Chia-Yuan Wu (Corresponding Author)
https://orcid.org/0009-0008-2019-0960
Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, Pennsylvania 18015

Frank E. Curtis
https://orcid.org/0000-0001-7214-9187
Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, Pennsylvania 18015

Daniel P. Robinson
https://orcid.org/0000-0003-0251-4227
Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, Pennsylvania 18015

Published Online: 29 Jan 2026
https://doi.org/10.1287/ijoo.2025.0091

Abstract

In distributed computing, collaborative machine learning enables multiple clients to jointly train a global model. In this paper, we present two strategies for addressing fairness in collaborative machine learning, both based on each client generating a synthetic data set by solving a bilevel optimization problem aimed at ensuring that the global model yields fair predictions. In our first strategy, clients pass their synthetic data sets (or closely related synthetic data sets that additionally preserve differential privacy) to the server. The server uses these data sets to train the global model with conventional machine learning techniques, eliminating the need for fairness-specific aggregation. This approach requires only a single communication round, maintains data privacy (when we integrate a differential privacy-preserving enhancement), and promotes fairness. Our second strategy employs a well-known federated averaging framework in which clients use their synthetic data sets to compute the required local model updates, which are passed frequently to the server. For this approach, privacy is preserved by passing model parameters, and fairness is attained by using our synthetic data sets. Our two approaches (one-shot versus many-shot) are complementary. Empirical results demonstrate that our methods are effective in reducing unfairness.
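
The many-shot strategy described in the abstract can be pictured with a minimal sketch of generic federated averaging, in which each client computes a local model update on its own data set and the server averages the results. This is not the authors' implementation: the abstract does not specify the model, step sizes, or aggregation weights, so logistic regression, the learning rate, and size-weighted averaging are all assumptions. The helper `make_toy_client` is a hypothetical stand-in that produces toy data; in the paper each client's data set would instead come from solving a bilevel optimization problem, which is not shown here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent for logistic regression on one
    client's data set, starting from the current global weights."""
    w = list(weights)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def federated_averaging(client_data, dim, rounds=20):
    """Generic federated averaging: in each round every client trains locally
    and the server averages the returned weights, weighted by data set size
    (an assumed aggregation rule)."""
    global_w = [0.0] * dim
    total = sum(len(d) for d in client_data)
    for _ in range(rounds):
        local_ws = [local_update(global_w, d) for d in client_data]
        global_w = [
            sum(len(d) * w[j] for d, w in zip(client_data, local_ws)) / total
            for j in range(dim)
        ]
    return global_w


def make_toy_client(rng, n):
    """Hypothetical stand-in for a client's synthetic data set: points
    (bias, x1) labeled 1 when x1 > 0. NOT produced by bilevel optimization."""
    data = []
    for _ in range(n):
        x1 = rng.uniform(-1.0, 1.0)
        data.append(([1.0, x1], 1.0 if x1 > 0 else 0.0))
    return data


rng = random.Random(0)
clients = [make_toy_client(rng, n) for n in (30, 50, 20)]
w = federated_averaging(clients, dim=2)
```

Only model weights cross the client/server boundary in this sketch, which mirrors the abstract's remark that privacy is preserved by passing model parameters. The one-shot strategy would instead send each client's data set to the server once and train a single model on the pooled data.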

INFORMS Journal on Optimization, Articles In Advance

Information

Received: June 30, 2025
Accepted: December 23, 2025
Published Online: January 29, 2026

Cite as

Chia-Yuan Wu, Frank E. Curtis, Daniel P. Robinson (2026) A Bilevel Optimization Approach for Computing Synthetic Data to Mitigate Unfairness in Collaborative Machine Learning. INFORMS Journal on Optimization 0(0).

https://doi.org/10.1287/ijoo.2025.0091
