2005-09-01 Conference publication DOI: 10.1007/11547686
Declarative Data Fusion
Syntax, Semantics, and Implementation
Mathematisch-Naturwissenschaftliche Fakultät II
In today's integrating information systems, data fusion, i.e., the merging of multiple tuples about the same real-world object into a single tuple, is left to ETL tools and other specialized software. While much attention has been paid to architecture, query languages, and query execution, the final step of actually fusing data from multiple sources into a consistent and homogeneous set is often ignored. This paper states the formal problem of data fusion in relational databases and discusses which parts of the problem can already be solved with standard SQL. To bridge the final gap, we propose the SQL Fuse By statement and define its syntax and semantics. A first implementation of the statement in a prototype database system shows the usefulness and feasibility of the new operator.
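As a rough illustration of the part of the problem that standard SQL can already cover, the following sketch fuses tuples that share an identifier by grouping on it and resolving conflicts with an aggregate. It is not the Fuse By statement proposed in the paper: the `person` table, its columns, the sample rows, and the choice of `MAX` as the conflict-resolution function are all assumptions made for this example, and SQLite is used only because it ships with Python.

```python
import sqlite3


def fuse_with_group_by(rows):
    """Fuse tuples describing the same object using plain GROUP BY.

    Hypothetical schema: (source, name, age, city). Tuples with the
    same `name` are assumed to describe the same real-world object.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (source TEXT, name TEXT, age INTEGER, city TEXT)"
    )
    conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", rows)
    # MAX ignores NULLs, so a value missing in one source is filled in
    # from another; when sources disagree, MAX simply picks the largest
    # value, which is an arbitrary resolution strategy chosen here.
    fused = conn.execute(
        "SELECT name, MAX(age), MAX(city) "
        "FROM person GROUP BY name ORDER BY name"
    ).fetchall()
    conn.close()
    return fused


# Two hypothetical sources, s1 and s2, with gaps and one conflict.
rows = [
    ("s1", "Alice", 30, None),
    ("s2", "Alice", None, "Berlin"),
    ("s1", "Bob", 41, "Hamburg"),
    ("s2", "Bob", 42, "Hamburg"),
]
fused = fuse_with_group_by(rows)
print(fused)
```

The sketch only works because both sources happen to share a clean identifier and because a built-in aggregate is an acceptable way to settle each conflict. Richer strategies, such as preferring one source over another, are awkward to express this way.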