Record Matching in Data Warehouses: A Decision Model for Data Consolidation

References

  • Anahory S., Murray D. Data Warehousing in the Real World (1997) (Addison-Wesley, Reading, MA)
  • Batini C., Lenzerini M., Navathe S. B. A comparative analysis of methodologies for database schema integration. ACM Comput. Surveys (1986) 18(4):323–364
  • Belin T. R., Rubin D. B. A method for calibrating false-match rates in record linkage. J. Amer. Statist. Assoc. (1995) 90(430):694–707
  • Bell G. B., Sethi A. Matching records in a national medical patient index. Comm. ACM (2001) 44(9):83–88
  • Berge C. Balanced matrices. Math. Programming (1972) 2(1):19–31
  • Berge C. Hypergraphs (1989) (North Holland, New York)
  • Bischoff J., Alexander T. Data Warehouse: Practical Advice from the Experts (1997) (Prentice Hall, Upper Saddle River, NJ)
  • Bright M. W., Hurson A. R., Pakzad S. Automated resolution of semantic heterogeneity in multidatabases. ACM Trans. Database Systems (1994) 19(2):212–253
  • Chatterjee A., Segev A. Rule based joins in heterogeneous databases. Dec. Support Systems (1995) 13(1):313–333
  • Cheeseman P., Kelly J., Self M., Stutz J., Taylor W., Friedman D. Autoclass: A Bayesian classification system. Proc. Fifth Internat. Conf. on Machine Learning (1988) (Ann Arbor, MI) 54–64
  • Copas J. B., Hilton F. J. Record linkage: Statistical models for matching computer records. J. Royal Statist. Soc. (1990) 153(3):287–320
  • Dayal U., Hwang H.-Y. View definition and generalization for database integration in a multidatabase system. IEEE Trans. Software Engrg. (1984) SE-10(6):628–645
  • Dey D., Sarkar S. A probabilistic relational model and algebra. ACM Trans. Database Systems (1996) 21(3):339–369
  • Dey D., Sarkar S. Modifications of uncertain data: A Bayesian framework for belief revision. Inform. Systems Res. (2000) 11(1):1–16
  • Dey D., Sarkar S., De P. A probabilistic decision model for entity matching in heterogeneous databases. Management Sci. (1998) 44(10):1379–1395
  • Fang D., Hammer J., McLeod D. The identification and resolution of semantic heterogeneity in multidatabase systems. Proc. First Internat. Workshop on Interoperability in Multidatabase Systems (1991) (Kyoto, Japan)
  • Fellegi I. P., Sunter A. B. A theory of record linkage. J. Amer. Statist. Assoc. (1969) 64:1183–1210
  • Hernández M. A., Stolfo S. J. The merge/purge problem for large databases. Proc. 1995 ACM SIGMOD Conference (1995) (San Jose, CA) 127–138
  • Hurwicz M. Take your data to the cleaners. Byte (1997) 22(1):97–102
  • Inmon W. H. Managing the data warehouse environment. Data Management Rev. (1996) 6(2):8
  • Jarke M., Lenzerini M., Vassiliou Y., Vassiliadis P. Fundamentals of Data Warehouse (2000) (Springer, New York)
  • Jaro M. A. Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. J. Amer. Statist. Assoc. (1989) 84(406):414–420
  • Kelley R. P. Blocking considerations for record linkage under conditions of uncertainty. Proc. Section on Social Statistics (1984) (American Statistical Association, Alexandria, VA) 602–605
  • Kim W., Choi I., Gala S., Scheevel M. On resolving semantic heterogeneity in multidatabase systems. Distributed and Parallel Databases (1993) 1(3):251–279
  • Kononenko I., Bratko I. Information-based evaluation criterion for classifier's performance. Machine Learning (1991) 6:67–80
  • Kuhn H. W. The Hungarian method for the assignment problem. Naval Res. Logist. Quart. (1955) 2(1–2):83–97
  • Larsen M. D., Rubin D. B. Iterative automated record linkage using mixture models. J. Amer. Statist. Assoc. (2001) 96(453):32–41
  • Larson J. A., Navathe S. B., Elmasri R. A theory of attribute equivalence in databases with application to schema integration. IEEE Trans. Software Engrg. (1989) 15(4):449–463
  • Lawler E. Combinatorial Optimization: Networks and Matroids (1976) (Holt, Rinehart, and Winston, New York)
  • Li W.-S., Clifton C. Semantic integration in heterogeneous databases using neural networks. Proc. 20th Internat. Conf. on Very Large Data Bases (1994) (Santiago, Chile) 1–12
  • Lovász L. Normal hypergraphs and perfect graph conjecture. Discrete Math. (1972) 2:253–267
  • McCallum A., Nigam K., Ungar L. H. Efficient clustering of high-dimensional data sets with application to reference matching. Proc. Sixth ACM SIGKDD Internat. Conf. on Knowledge Discovery and Data Mining (2000) (Boston, MA) 1–12
  • Nigam K., McCallum A. K., Thrun S., Mitchell T. Text classification from labeled and unlabeled documents using EM. Machine Learning (2000) 39(2/3):103–134
  • Papadimitriou C. H., Steiglitz K. Combinatorial Optimization: Algorithms and Complexity (1982) (Prentice Hall, Englewood Cliffs, NJ)
  • Rusinkiewicz M., Sheth A., Karabatis G. Specifying inter-database dependencies in a multidatabase environment. IEEE Comput. (1991) 24(12):46–53
  • Schrijver A. Theory of Linear and Integer Programming (1986) (Wiley, New York)
  • Tepping B. J. A model for optimal linkage of records. J. Amer. Statist. Assoc. (1968) 63:1321–1332
  • Ventrone V., Heiler S. Semantic heterogeneity as a result of domain evolution. ACM SIGMOD Record (1991) 20(4):16–20
  • Wang Y. R., Madnick S. The interdatabase instance identification problem in integrating autonomous systems. Proc. Fifth Internat. Conf. on Data Engrg. (1989) (Los Angeles, CA) 46–55
  • Winkler W. E., Alvey W., Kliss B. Exact matching lists of businesses: Blocking, subfield identification, information theory. Record Linkage Techniques 1985 (1985) 227–241. U.S. Internal Revenue Service Publication 1299
  • Winkler W. E. Advanced methods for record linkage. Proc. Section on Survey Res. Methods (1994) (American Statistical Association, Alexandria, VA) 467–472