Anonymizing and Sharing Medical Text Records
Published Online: 12 Apr 2017 https://doi.org/10.1287/isre.2016.0676

