Implications of Data Anonymization on the Statistical Evidence of Disparity
Published Online: 4 Jun 2021. https://doi.org/10.1287/mnsc.2021.4028

