Research Note—Generating Shareable Statistical Databases for Business Value: Multiple Imputation with Multimodal Perturbation

Nigel Melville
Stephen M. Ross School of Business, University of Michigan, Ann Arbor, Michigan 48109

Michael McQuaid
School of Information, University of Michigan, Ann Arbor, Michigan 48109

Published Online: 16 Jun 2011
https://doi.org/10.1287/isre.1110.0361

Volume 23, Issue 2

June 2012

Pages 287-598

Received: September 30, 2008
Published Online: June 16, 2011

Cite as

Nigel Melville, Michael McQuaid (2011) Research Note—Generating Shareable Statistical Databases for Business Value: Multiple Imputation with Multimodal Perturbation. Information Systems Research 23(2):559-574.

