A Bilevel Optimization Approach for Computing Synthetic Data to Mitigate Unfairness in Collaborative Machine Learning

Chia-Yuan Wu (Corresponding Author)
https://orcid.org/0009-0008-2019-0960
Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, Pennsylvania 18015

Frank E. Curtis
https://orcid.org/0000-0001-7214-9187
Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, Pennsylvania 18015

Daniel P. Robinson
https://orcid.org/0000-0003-0251-4227
Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, Pennsylvania 18015

Published Online: 29 Jan 2026
https://doi.org/10.1287/ijoo.2025.0091

Received: June 30, 2025
Accepted: December 23, 2025
Published Online: January 29, 2026

Cite as

Chia-Yuan Wu, Frank E. Curtis, Daniel P. Robinson (2026) A Bilevel Optimization Approach for Computing Synthetic Data to Mitigate Unfairness in Collaborative Machine Learning. INFORMS Journal on Optimization 0(0).

https://doi.org/10.1287/ijoo.2025.0091
