Anonymizing and Sharing Medical Text Records

Published online: https://doi.org/10.1287/isre.2016.0676

References

  • Aggarwal CC, Yu PS, eds. (2008) Privacy-Preserving Data Mining: Models and Algorithms (Springer, New York).
  • Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. Proc. 20th Internat. Conf. Very Large Databases (Morgan Kaufmann, San Francisco), 487–499.
  • Agrawal R, Srikant R (2000) Privacy-preserving data mining. Proc. 2000 ACM SIGMOD Internat. Conf. Management Data (ACM, New York), 439–450.
  • Carpineto C, Osinski S, Romano G, Weiss D (2009) A survey of Web clustering engines. ACM Comput. Surveys 41(3):Article 17.
  • Carter JH (2008) What is the electronic health record? Carter JH, ed. Electronic Health Records: A Guide for Clinicians and Administrators, 2nd ed. (ACP Press, Philadelphia), 3–20.
  • Cooper T, Collman J (2005) Managing information security and privacy in healthcare data mining: State of the art. Chen H, Fuller SS, Friedman C, Hersh W, eds. Medical Informatics: Knowledge Management and Data Mining in Biomedicine (Springer, New York), 95–137.
  • Cortes C, Vapnik V (1995) Support-vector networks. Machine Learn. 20(3):273–297.
  • Dalenius T, Reiss SP (1982) Data swapping: A technique for disclosure control. J. Statist. Planning Inference 6(1):73–85.
  • Department of Health and Human Services (DHHS) (2000) Standards for privacy of individually identifiable health information. Federal Register 65(250):82462–82829.
  • Duncan GT, Lambert D (1989) The risk of disclosure for microdata. J. Bus. Econom. Statist. 7(2):201–217.
  • Friedlin FJ, McDonald CJ (2008) A software tool for removing patient identifying information from clinical documents. J. Amer. Medical Informatics Assoc. 15(5):601–610.
  • Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surveys 42(4):Article 14.
  • Gardner J, Xiong L (2009) An integrated framework for de-identifying unstructured medical data. Data Knowledge Engrg. 68(12):1441–1451.
  • Garfinkel R, Gopal R, Thompson S (2007) Releasing individually identifiable microdata with privacy protection against stochastic threat: An application to health information. Inform. Systems Res. 18(1):23–41.
  • Health Information Technology for Economic and Clinical Health Act (HITECH Act) (2009) Title XIII of Division A and Title IV of Division B of the American Recovery and Reinvestment Act of 2009 (ARRA) (Pub. L. 111-5). https://www.healthit.gov/sites/default/files/hitech_act_excerpt_from_arra_with_index.pdf.
  • Jensen PB, Jensen LJ, Brunak S (2012) Mining electronic health records: Towards better research applications and clinical care. Nature Rev. Genetics 13(6):395–405.
  • Kuang D, Ding C, Park H (2012) Symmetric nonnegative matrix factorization for graph clustering. Proc. 12th SIAM Internat. Conf. Data Mining (SIAM, Philadelphia), 106–117.
  • Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proc. 18th Internat. Conf. Machine Learn. (Morgan Kaufmann, San Francisco), 282–289.
  • Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791.
  • Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. Dietterich TG, Tresp V, eds. Advances in Neural Information Processing Systems, Vol. 13 (MIT Press, Cambridge, MA), 556–562.
  • Li N, Li T, Venkatasubramanian S (2007) t-Closeness: Privacy beyond k-anonymity and l-diversity. Proc. 23rd IEEE Internat. Conf. Data Engrg. (IEEE Computer Society, Washington, DC), 106–115.
  • Li X-B, Sarkar S (2011) Protecting privacy against record linkage disclosure: A bounded swapping approach for numeric data. Inform. Systems Res. 22(4):774–789.
  • Li X-B, Sarkar S (2013) Class-restricted clustering and microperturbation for data privacy. Management Sci. 59(4):796–812.
  • Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) l-Diversity: Privacy beyond k-anonymity. Proc. 22nd IEEE Internat. Conf. Data Engrg. (IEEE Computer Society, Washington, DC), 24–35.
  • Melville N, McQuaid M (2012) Generating shareable statistical databases for business value: Multiple imputation with multimodal perturbation. Inform. Systems Res. 23(2):559–574.
  • Menon S, Sarkar S, Mukherjee S (2005) Maximizing accuracy of shared databases when concealing sensitive patterns. Inform. Systems Res. 16(3):256–270.
  • Meystre SM, Savova GK, Kipper-Schuler KC, Hurdle JF (2008) Extracting information from textual documents in the electronic health record: A review of recent research. Geissbuhler A, Kulikowski C, eds. IMIA Yearbook of Medical Informatics 2008 (Schattauer Publishers, Stuttgart, Germany), 128–144.
  • Meystre SM, Friedlin FJ, South BR, Shen S, Samore MH (2010) Automatic de-identification of textual documents in the electronic health record: A review of recent research. BMC Medical Res. Methodology 10:Article 70.
  • Murphy SN, Gainer V, Mendis M, Churchill S, Kohane I (2011) Strategies for maintaining patient privacy in i2b2. J. Amer. Medical Informatics Assoc. 18(Suppl 1):i103–i108.
  • Murphy SN, Weber G, Mendis M, Gainer V, Chueh HC, Churchill S, Kohane I (2010) Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). J. Amer. Medical Informatics Assoc. 17(2):124–130.
  • Office for Civil Rights (OCR) (2012) Guidance regarding methods for de-identification of protected health information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Department of Health and Human Services, Washington, DC, http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html#protected.
  • Platt J (1998) Fast training of support vector machines using sequential minimal optimization. Schölkopf B, Burges C, Smola AJ, eds. Advances in Kernel Methods—Support Vector Learning (MIT Press, Cambridge, MA), 185–209.
  • Polikar R (2006) Ensemble based systems in decision making. IEEE Circuits Systems Magazine 6(3):21–45.
  • Rizvi SJ, Haritsa JR (2002) Maintaining data privacy in association rule mining. Proc. 28th Very Large Data Base Conf. (Morgan Kaufmann, San Francisco), 682–693.
  • Safran C, Bloomrosen M, Hammond WE, Labkoff S, Markel-Fox S, Tang PC, Detmer DE (2007) Toward a national framework for the secondary use of health data: An American Medical Informatics Association white paper. J. Amer. Medical Informatics Assoc. 14(1):1–9.
  • Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, Chute CG (2010) Mayo clinical text analysis and knowledge extraction system (cTAKES): Architecture, component evaluation and applications. J. Amer. Medical Informatics Assoc. 17(5):507–513.
  • Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput. Surveys 34(1):1–47.
  • Sutton C, McCallum A (2012) An introduction to conditional random fields. Foundations Trends Machine Learn. 4(4):267–373.
  • Sweeney L (2002) k-Anonymity: A model for protecting privacy. Internat. J. Uncertainty, Fuzziness Knowledge-based Systems 10(5):557–570.
  • Tan A, Gilbert D (2003) Ensemble machine learning on gene expression data for cancer classification. Appl. Bioinformatics 2(3 Suppl):S75–S83.
  • Uzuner O, Luo Y, Szolovits P (2007) Evaluating the state-of-the-art in automatic de-identification. J. Amer. Medical Informatics Assoc. 14(5):550–563.
  • Wellner B, Huyck M, Mardis S, Aberdeen J, Morgan A, Peshkin L, Yeh A, Hitzeman J, Hirschman L (2007) Rapidly retargetable approaches to de-identification in medical records. J. Amer. Medical Informatics Assoc. 14(5):564–573.
  • Wolpert DH (1992) Stacked generalization. Neural Networks 5(2):241–259.
  • Wylie JE, Mineau GP (2003) Biomedical databases: Protecting privacy and promoting research. Trends Biotechnology 21(3):113–116.
  • Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. Proc. 26th Annual Internat. ACM SIGIR Conf. Res. Development Inform. Retrieval (ACM, New York), 267–273.