Record Matching in Data Warehouses: A Decision Model for Data Consolidation
Published Online: 1 Apr 2003
https://doi.org/10.1287/opre.51.2.240.12779