Implications of Data Anonymization on the Statistical Evidence of Disparity

Published Online: https://doi.org/10.1287/mnsc.2021.4028

References

  • Abowd JM, Schmutte IM (2019) An economic analysis of privacy protection and statistical accuracy as social choices. Amer. Econom. Rev. 109(1):171–202.
  • Acquisti A, Fong C (2020) An experiment in hiring discrimination via online social networks. Management Sci. 66(3):1005–1024.
  • Acquisti A, Brandimarte L, Loewenstein G (2015) Privacy and human behavior in the age of information. Science 347(6221):509–514.
  • Adler NE, Stead WW (2015) Patients in context: EHR capture of social and behavioral determinants of health. Obstet. Gynecol. Survey 70(6):388–390.
  • Agarwal S (2020) Trade-offs between fairness, interpretability, and privacy in machine learning. Master’s thesis, University of Waterloo, Waterloo, Canada.
  • Aggarwal G, Feder T, Kenthapadi K, Motwani R, Panigrahy R, Thomas D, Zhu A (2005) Approximation algorithms for k-anonymity. J. Privacy Tech.
  • Agrawal R, Srikant R (2000) Privacy-preserving data mining. SIGMOD’00 Proc. 2000 ACM SIGMOD Internat. Conf. Management Data (Association for Computing Machinery, New York), 439–450.
  • Agresti A (2002) Categorical Data Analysis, 2nd ed. (John Wiley & Sons, New York).
  • Ali J (2018) Validating leaked passwords with k-anonymity. Accessed April 29, 2020, https://blog.cloudflare.com/validating-leaked-passwords-with-k-anonymity/.
  • Apple Inc. (2020) Differential privacy. Accessed April 29, 2020, https://www.apple.com/privacy/docs/Differential_Privacy_Overview.pdf.
  • Article 29 Data Protection Working Party (2014) Opinion 05/2014 on anonymisation techniques. Accessed April 29, 2020, https://ec.europa.eu/justice/article-29/documentation/.
  • Bagdasaryan E, Poursaeed O, Shmatikov V (2019) Differential privacy has disparate impact on model accuracy. Adv. Neural Inform. Processing Systems 32:15479–15488.
  • Bambauer J, Muralidhar K, Sarathy R (2013) Fool’s gold: An illustrated critique of differential privacy. Vanderbilt J. Entertainment Tech. Law 16:701–755.
  • Barnett A (1982) An underestimated threat to multiple regression analyses used in job discrimination cases. Indust. Relations Law J. 5(1):156–173.
  • Barnum P, Liden RC, DiTomaso N (1995) Double jeopardy for women and minorities: Pay differences with age. Acad. Management J. 38(3):863–880.
  • Bland JM, Altman DG (2000) Statistics notes. The odds ratio. BMJ 320(7247):1468.
  • Castilla EJ (2008) Gender, race, and meritocracy in organizational careers. Amer. J. Sociol. 113(6):1479–1526.
  • Chang KW, Prabhakaran V, Ordonez V (2019) Bias and fairness in natural language processing. Baldwin T, Carpuat M, eds. Proc. 2019 Conf. Empirical Methods Natural Language Processing (EMNLP-IJCNLP): Tutorial Abstracts (Association for Computational Linguistics, Stroudsburg, PA).
  • Chaudhuri K, Monteleoni C, Sarwate AD (2011) Differentially private empirical risk minimization. J. Machine Learn. Res. 12:1069–1109.
  • Cohen A, Nissim K (2020) Toward formalizing the GDPR’s notion of singling out. Proc. Natl. Acad. Sci. USA 117(15):8344–8352.
  • Cowgill B, Tucker CE (2020) Algorithmic fairness and economics. Preprint, revised September 25, 2020, http://dx.doi.org/10.2139/ssrn.3361280.
  • Cumming G (2009) Inference by eye: Reading the overlap of independent confidence intervals. Statist. Med. 28(2):205–220.
  • Danziger J, Ángel Armengol de la Hoz M, Li W, Komorowski M, Octávio Deliberato R, Rush BN, Mukamal KJ, Celi L, Badawi O (2020) Temporal trends in critical care outcomes in United States minority serving hospitals. Amer. J. Respiratory Critical Care Med. 201(6):681–687.
  • Du W, Teng Z, Zhu Z (2008) Privacy-MaxEnt: Integrating background knowledge in privacy quantification. SIGMOD’08 Proc. 2008 ACM SIGMOD Internat. Conf. Management Data (Association for Computing Machinery, New York), 459–472.
  • Dwork C, Mulligan DK (2013) It’s not privacy, and it’s not fair. Stanford Law Rev. Online 66:35–40.
  • Dwork C, Kohli N, Mulligan D (2019) Differential privacy in practice: Expose your epsilons! J. Privacy Confidentiality 9(2):1–22.
  • Dwork C, McSherry F, Nissim K, Smith A (2016) Calibrating noise to sensitivity in private data analysis. J. Privacy Confidentiality 7(3):17–51.
  • Dwork C, Smith A, Steinke T, Ullman J (2017) Exposed! A survey of attacks on private data. Annual Rev. Statist. Appl. 4:61–84.
  • Ekstrand MD, Joshaghani R, Mehrpouyan H (2018) Privacy for all: Ensuring fair and equitable privacy protections. Proc. First Conf. Fairness, Accountability and Transparency, Proceedings of Machine Learning Research, vol. 81 (Microtome Publishing, Brookline, MA), 35–47.
  • Éltetö Ö, Frigyes E (1968) New income inequality measures as efficient tools for causal analysis and planning. Econometrica 36(2):383–396.
  • Erlingsson Ú, Pihur V, Korolova A (2014) RAPPOR: Randomized aggregatable privacy-preserving ordinal response. CCS’14 Proc. 2014 ACM SIGSAC Conf. Comput. Comm. Security (Association for Computing Machinery, New York), 1054–1067.
  • Everett RS, Wojtkiewicz RA (2002) Difference, disparity, and race/ethnic bias in federal sentencing. J. Quant. Criminol. 18(2):189–211.
  • Finnish Social Science Data Archive (2020) Data management guidelines. Accessed April 29, 2020, https://www.fsd.tuni.fi/aineistonhallinta/en/.
  • Firth D (1993) Bias reduction of maximum likelihood estimates. Biometrika 80(1):27–38.
  • Foulds J, Geumlek J, Welling M, Chaudhuri K (2016) On the theory and practice of privacy-preserving Bayesian data analysis. Ihler A, Janzing D, eds. UAI’16 Proc. 32nd Conf. Uncertainty Artificial Intelligence (AUAI Press, Arlington, VA), 192–201.
  • Fung BC, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surveys 42(4):1–53.
  • Ganta SR, Kasiviswanathan SP, Smith A (2008) Composition attacks and auxiliary information in data privacy. KDD’08 Proc. ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 265–273.
  • Garaud MC (1990) Legal standards and statistical proof in Title VII litigation: In search of a coherent disparate impact model. Univ. Pennsylvania Law Rev. 139(2):455–503.
  • Gastwirth JL (2019) The role of statistical evidence in civil cases. Annual Rev. Statist. Appl. 7:39–60.
  • Goldwasser S, Micali S, Rackoff C (1989) The knowledge complexity of interactive proof systems. SIAM J. Comput. 18(1):186–208.
  • Gupta VK, Mortal SC, Silveri S, Sun M, Turban DB (2020) You’re fired! Gender disparities in CEO dismissal. J. Management 46(4):560–582.
  • Hajian S, Domingo-Ferrer J, Monreale A, Pedreschi D, Giannotti F (2015) Discrimination- and privacy-aware patterns. Data Mining Knowledge Discovery 29(6):1733–1782.
  • Hay M, Machanavajjhala A, Miklau G, Chen Y, Zhang D (2016) Principled evaluation of differentially private algorithms using DPBench. SIGMOD’16 Proc. 2016 ACM SIGMOD Internat. Conf. Management Data (Association for Computing Machinery, New York), 139–154.
  • Hebert PL, Sisk JE, Howell EA (2008) When does a difference become a disparity? Conceptualizing racial and ethnic disparities in health. Health Affairs 27(2):374–382.
  • Holtzclaw Williams P (2011) Policy framework for rare disease health disparities. Policy Polit. Nursing Practice 12(2):114–118.
  • Huang Z, Du W, Chen B (2005) Deriving private information from randomized data. SIGMOD’05 Proc. 2005 ACM SIGMOD Internat. Conf. Management Data (Association for Computing Machinery), 37–48.
  • John LK, Loewenstein G, Acquisti A, Vosgerau J (2018) When and why randomized response techniques (fail to) elicit the truth. Organ. Behav. Human Decision Processes 148:101–123.
  • Johnson N, Near JP, Song D (2018) Toward practical differential privacy for SQL queries. Proc. VLDB Endowment 11(5):526–539.
  • Kashid A, Kulkarni V, Patankar R (2015) Discrimination prevention using privacy preserving techniques. Internat. J. Comput. Appl. 120(1):45–49.
  • Kelley E, Moy E, Stryer D, Burstin H, Clancy C (2005) The national healthcare quality and disparities reports: An overview. Med. Care 43(3 Suppl.):I3–I8.
  • Kifer D, Machanavajjhala A (2011) No free lunch in data privacy. SIGMOD’11 Proc. 2011 ACM SIGMOD Internat. Conf. Management of Data (Association for Computing Machinery, New York), 193–204.
  • King AG (2006) Gross statistical disparities as evidence of a pattern and practice of discrimination: Statistical vs. legal significance. Labor Lawyer 22(3):271–292.
  • Kleinberg J, Mullainathan S (2019) Simplicity creates inequity: Implications for fairness, stereotypes, and interpretability. EC’19 Proc. 2019 ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 807–808.
  • Li XB, Sarkar S (2013) Class-restricted clustering and microperturbation for data privacy. Management Sci. 59(4):796–812.
  • Li N, Qardaji W, Su D (2012) On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. ASIACCS’12 Proc. 7th ACM Sympos. Inform. Comput. Comm. Security (Association for Computing Machinery, New York), 32–33.
  • Li C, Hay M, Miklau G, Wang Y (2014) A data- and workload-aware algorithm for range queries under differential privacy. Proc. VLDB Endowment 7(5):341–352.
  • Lipton Z, McAuley J, Chouldechova A (2018) Does mitigating ML’s impact disparity require treatment disparity? Adv. Neural Inform. Processing Systems 31:8125–8135.
  • Macagnone M (2019) Efforts to safeguard census data could muddy federal data. Government Tech. (December 17), https://www.govtech.com/analytics/Efforts-to-Safeguard-Census-Data-Could-Muddy-Federal-Data.html.
  • Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) ℓ-diversity: Privacy beyond k-anonymity. ACM Trans. Knowledge Discovery Data 1(1):3.
  • May P, Garrido MM, Cassel JB, Morrison RS, Normand C (2016) Using length of stay to control for unobserved heterogeneity when estimating treatment effect on hospital costs with observational data: Issues of reliability, robustness, and usefulness. Health Services Res. 51(5):2020–2043.
  • Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. PODS’04 Proc. Twenty-Third ACM SIGMOD-SIGACT-SIGART Sympos. Principles Database Systems (Association for Computing Machinery, New York), 223–228.
  • Muralidhar K, Batra D, Kirs PJ (1995) Accessibility, security, and accuracy in statistical databases: The case for the multiplicative fixed data perturbation approach. Management Sci. 41(9):1549–1564.
  • Muralidhar K, Parsa R, Sarathy R (1999) A general additive data perturbation method for database security. Management Sci. 45(10):1399–1415.
  • National Research Council (2004) Measuring Racial Discrimination (National Academies Press, Washington, DC).
  • Nolan J (2003) Stable Distributions: Models for Heavy-Tailed Data (Birkhauser, New York).
  • Osborne C (1991) Statistical calibration: A review. Internat. Statist. Rev. 59(3):309–336.
  • Pager D, Shepherd H (2008) The sociology of discrimination: Racial discrimination in employment, housing, credit, and consumer markets. Annual Rev. Sociol. 34:181–209.
  • Pitoura E, Tsaparas P, Flouris G, Fundulaki I, Papadakos P, Abiteboul S, Weikum G (2018) On measuring bias in online information. SIGMOD Rec. 46(4):16–21.
  • Pujol D, McKenna R, Kuppam S, Hay M, Machanavajjhala A, Miklau G (2020) Fair decision making using privacy-protected data. FAT*’20 Proc. 2020 Conf. Fairness Accountability Transparency (Association for Computing Machinery, New York), 189–199.
  • Rambachan A, Kleinberg J, Mullainathan S, Ludwig J (2020) An economic approach to regulating algorithms. NBER Working Paper 27111, National Bureau of Economic Research, Cambridge, MA.
  • Rocher L, Hendrickx JM, De Montjoye YA (2019) Estimating the success of re-identifications in incomplete datasets using generative models. Nature Commun. 10(1):1–9.
  • Rosenbaum PR (2010) Design of Observational Studies, 1st ed., Springer Series in Statistics (Springer, New York).
  • Ruggieri S, Hajian S, Kamiran F, Zhang X (2014) Anti-discrimination analysis using privacy attack strategies. Calders T, Esposito F, Hüllermeier E, Meo R, eds. Proc. Joint Eur. Conf. Machine Learning Knowledge Discovery Databases, Lecture Notes in Computer Science, vol. 8725 (Springer, Berlin), 694–710.
  • Santos-Lozada AR, Howard JT, Verdery AM (2020) How differential privacy will affect our understanding of health disparities in the United States. Proc. Natl. Acad. Sci. USA 117(24):13405–13412.
  • Siegel PA, Hambrick DC (2005) Pay disparities within top management groups: Evidence of harmful effects on performance of high-technology firms. Organ. Sci. 16(3):259–274.
  • Steen P, Cherney B (1996) Evolution of analytical tools by Mediqual Systems, Inc. Amer. J. Med. Qual. 11(1):S15–S17.
  • Sweeney L (2000) Simple demographics often identify people uniquely. Health 671:1–34.
  • Sweeney L (2002a) Achieving k-anonymity privacy protection using generalization and suppression. Internat. J. Uncertainty Fuzziness Knowledge-Based Systems 10(05):571–588.
  • Sweeney L (2002b) k-anonymity: A model for protecting privacy. Internat. J. Uncertainty Fuzziness Knowledge-Based Systems 10(05):557–570.
  • Tang J, Korolova A, Bai X, Wang X, Wang X (2017) Privacy loss in Apple’s implementation of differential privacy on MacOS 10.12. Preprint, submitted September 8, https://arxiv.org/abs/1709.02753.
  • Templ M, Kowarik A, Meindl B (2015) Statistical disclosure control for micro-data using the R package sdcMicro. J. Statist. Software 67(1):1–36.
  • Texas Department of State Health Services (2019) Texas hospital inpatient discharge public use data file. Accessed April 29, 2020, https://www.dshs.state.tx.us/thcic/hospitals/Inpatientpudf.shtm.
  • Traub JF, Yemini Y, Woźniakowski H (1984) The statistical security of a statistical database. ACM Trans. Database Systems 9(4):672–679.
  • U.S. Agency for Healthcare Research and Quality (2018) Central distributor SID: Description of data elements. Accessed April 29, 2020, https://www.hcup-us.ahrq.gov/db/vars/siddistnote.jsp.
  • U.S. Department of Health and Human Services (2012) Guidance regarding methods for de-identification of protected health information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) privacy rule. Accessed April 29, 2020, https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html.
  • Wang YX (2018) Revisiting differentially private linear regression: Optimal and adaptive prediction & estimation in unbounded domain. Proc. 34th Conf. Uncertainty Artificial Intelligence (AUAI Press, Arlington, VA), 93–103.
  • Xu H, Zhang N (2019) Privacy in health disparity research. Med. Care 57(Suppl. 2):S172–S175.
  • Zhang X, Pérez-Stable EJ, Bourne PE, Peprah E, Duru OK, Breen N, Berrigan D, et al. (2017) Big data science: Opportunities and challenges to address minority health and health disparities in the 21st century. Ethnicity Disparities 27(2):95–106.