Anonymizing and Sharing Medical Text Records

Xiao-Bai Li (Corresponding Author)
http://orcid.org/0000-0001-8009-8439
Department of Operations and Information Systems, Manning School of Business, University of Massachusetts Lowell, Lowell, Massachusetts 01854

Jialun Qin
Department of Operations and Information Systems, Manning School of Business, University of Massachusetts Lowell, Lowell, Massachusetts 01854

Published Online: 12 Apr 2017
https://doi.org/10.1287/isre.2016.0676

References

Aggarwal CC, Yu PS, eds. (2008) Privacy-Preserving Data Mining: Models and Algorithms (Springer, New York).
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. Proc. 20th Internat. Conf. Very Large Databases (Morgan Kaufmann, San Francisco), 487–499.
Agrawal R, Srikant R (2000) Privacy-preserving data mining. Proc. 2000 ACM SIGMOD Internat. Conf. Management Data (ACM, New York), 439–450.
Carpineto C, Osinski S, Romano G, Weiss D (2009) A survey of Web clustering engines. ACM Comput. Surveys 41(3):Article 17.
Carter JH (2008) What is the electronic health record? Carter JH, ed. Electronic Health Records: A Guide for Clinicians and Administrators, 2nd ed. (ACP Press, Philadelphia), 3–20.
Cooper T, Collman J (2005) Managing information security and privacy in healthcare data mining: State of the art. Chen H, Fuller SS, Friedman C, Hersh W, eds. Medical Informatics: Knowledge Management and Data Mining in Biomedicine (Springer, New York), 95–137.
Cortes C, Vapnik V (1995) Support-vector networks. Machine Learn. 20(3):273–297.
Dalenius T, Reiss SP (1982) Data swapping: A technique for disclosure control. J. Statist. Planning Inference 6(1):73–85.
Department of Health and Human Services (DHHS) (2000) Standards for privacy of individually identifiable health information. Federal Register 65(250):82462–82829.
Duncan GT, Lambert D (1989) The risk of disclosure for microdata. J. Bus. Econom. Statist. 7(2):201–217.
Friedlin FJ, McDonald CJ (2008) A software tool for removing patient identifying information from clinical documents. J. Amer. Medical Informatics Assoc. 15(5):601–610.
Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surveys 42(4):Article 14.
Gardner J, Xiong L (2009) An integrated framework for de-identifying unstructured medical data. Data Knowledge Engrg. 68(12):1441–1451.
Garfinkel R, Gopal R, Thompson S (2007) Releasing individually identifiable microdata with privacy protection against stochastic threat: An application to health information. Inform. Systems Res. 18(1):23–41.
Health Information Technology for Economic and Clinical Health Act (HITECH Act) (2009) Title XIII of Division A and Title IV of Division B of the American Recovery and Reinvestment Act of 2009 (ARRA) (Pub. L. 111-5). https://www.healthit.gov/sites/default/files/hitech_act_excerpt_from_arra_with_index.pdf.
Jensen PB, Jensen LJ, Brunak S (2012) Mining electronic health records: Towards better research applications and clinical care. Nature Rev. Genetics 13(6):395–405.
Kuang D, Ding C, Park H (2012) Symmetric nonnegative matrix factorization for graph clustering. Proc. 12th SIAM Internat. Conf. Data Mining (SIAM, Philadelphia), 106–117.
Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proc. 18th Internat. Conf. Machine Learn. (Morgan Kaufmann, San Francisco), 282–289.
Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791.
Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. Dietterich TG, Tresp V, eds. Advances in Neural Information Processing Systems, Vol. 13 (MIT Press, Cambridge, MA), 556–562.
Li N, Li T, Venkatasubramanian S (2007) t-Closeness: Privacy beyond k-anonymity and l-diversity. Proc. 23rd IEEE Internat. Conf. Data Engrg. (IEEE Computer Society, Washington, DC), 106–115.
Li X-B, Sarkar S (2011) Protecting privacy against record linkage disclosure: A bounded swapping approach for numeric data. Inform. Systems Res. 22(4):774–789.
Li X-B, Sarkar S (2013) Class-restricted clustering and microperturbation for data privacy. Management Sci. 59(4):796–812.
Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) l-Diversity: Privacy beyond k-anonymity. Proc. 22nd IEEE Internat. Conf. Data Engrg. (IEEE Computer Society, Washington, DC), 24–35.
Melville N, McQuaid M (2012) Generating shareable statistical databases for business value: Multiple imputation with multimodal perturbation. Inform. Systems Res. 23(2):559–574.
Menon S, Sarkar S, Mukherjee S (2005) Maximizing accuracy of shared databases when concealing sensitive patterns. Inform. Systems Res. 16(3):256–270.
Meystre SM, Savova GK, Kipper-Schuler KC, Hurdle JF (2008) Extracting information from textual documents in the electronic health record: A review of recent research. Geissbuhler A, Kulikowski C, eds. IMIA Yearbook of Medical Informatics 2008 (Schattauer Publishers, Stuttgart, Germany), 128–144.
Meystre SM, Friedlin FJ, South BR, Shen S, Samore MH (2010) Automatic de-identification of textual documents in the electronic health record: A review of recent research. BMC Medical Res. Methodology 10:Article 70.
Murphy SN, Gainer V, Mendis M, Churchill S, Kohane I (2011) Strategies for maintaining patient privacy in i2b2. J. Amer. Medical Informatics Assoc. 18(Suppl 1):i103–i108.
Murphy SN, Weber G, Mendis M, Gainer V, Chueh HC, Churchill S, Kohane I (2010) Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). J. Amer. Medical Informatics Assoc. 17(2):124–130.
Office for Civil Rights (OCR) (2012) Guidance regarding methods for de-identification of protected health information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Department of Health and Human Services, Washington, DC, http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html#protected.
Platt J (1998) Fast training of support vector machines using sequential minimal optimization. Schölkopf B, Burges C, Smola AJ, eds. Advances in Kernel Methods: Support Vector Learning (MIT Press, Cambridge, MA), 185–209.
Polikar R (2006) Ensemble based systems in decision making. IEEE Circuits Systems Magazine 6(3):21–45.
Rizvi SJ, Haritsa JR (2002) Maintaining data privacy in association rule mining. Proc. 28th Very Large Data Base Conf. (Morgan Kaufmann, San Francisco), 682–693.
Safran C, Bloomrosen M, Hammond WE, Labkoff S, Markel-Fox S, Tang PC, Detmer DE (2007) Toward a national framework for the secondary use of health data: An American Medical Informatics Association white paper. J. Amer. Medical Informatics Assoc. 14(1):1–9.
Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, Chute CG (2010) Mayo clinical text analysis and knowledge extraction system (cTAKES): Architecture, component evaluation and applications. J. Amer. Medical Informatics Assoc. 17(5):507–513.
Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput. Surveys 34(1):1–47.
Sutton C, McCallum A (2012) An introduction to conditional random fields. Foundations Trends Machine Learn. 4(4):267–373.
Sweeney L (2002) k-Anonymity: A model for protecting privacy. Internat. J. Uncertainty, Fuzziness Knowledge-based Systems 10(5):557–570.
Tan A, Gilbert D (2003) Ensemble machine learning on gene expression data for cancer classification. Appl. Bioinformatics 2(3 Suppl):S75–S83.
Uzuner O, Luo Y, Szolovits P (2007) Evaluating the state-of-the-art in automatic de-identification. J. Amer. Medical Informatics Assoc. 14(5):550–563.
Wellner B, Huyck M, Mardis S, Aberdeen J, Morgan A, Peshkin L, Yeh A, Hitzeman J, Hirschman L (2007) Rapidly retargetable approaches to de-identification in medical records. J. Amer. Medical Informatics Assoc. 14(5):564–573.
Wolpert DH (1992) Stacked generalization. Neural Networks 5(2):241–259.
Wylie JE, Mineau GP (2003) Biomedical databases: Protecting privacy and promoting research. Trends Biotechnology 21(3):113–116.
Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. Proc. 26th Annual Internat. ACM SIGIR Conf. Res. Development Inform. Retrieval (ACM, New York), 267–273.

Information Systems Research

Volume 28, Issue 2

June 2017

Pages iii-vi, 203-449

Article Information

Received: June 12, 2014
Accepted: August 28, 2016
Published Online: April 12, 2017

Cite as

Xiao-Bai Li, Jialun Qin (2017) Anonymizing and Sharing Medical Text Records. Information Systems Research 28(2):332-352.

https://doi.org/10.1287/isre.2016.0676
