Object Identification Quality
Humboldt-Universität zu Berlin
Research and industry has tackled the object identification problem of data integration in many different ways. This paper presents a framework, that allows the evaluation of competing approaches. To this end, complexity measures and data characteristics are introduced, which reflect the hardness of a given object identification problem. All characteristics can be estimated by use of simple SQL queries and simple calculations. Following the principle of benchmark definitions we specify a test framework. It consists of a test database and its characteristics, quality criteria, and a test specification. Adequate measures needed for the correctness criterion of the benchmark are given. A running example of the Berlin Online Apartment-Advertisements database (BOA) illustrates the approach. The BOA-database is freely available at www.wiwiss.fu-berlin.de/lenz/boa/.