Research Note—Generating Shareable Statistical Databases for Business Value: Multiple Imputation with Multimodal Perturbation
Published Online: 16 Jun 2011
https://doi.org/10.1287/isre.1110.0361

