
2006-01-01 Book DOI: 10.18452/2468
Relationship-Based Duplicate Detection
dc.contributor.author: Weis, Melanie
dc.contributor.author: Naumann, Felix
dc.date.accessioned: 2017-06-15T17:11:16Z
dc.date.available: 2017-06-15T17:11:16Z
dc.date.created: 2006-12-07
dc.date.issued: 2006-01-01
dc.identifier.issn: 0863-095X
dc.identifier.uri: http://edoc.hu-berlin.de/18452/3120
dc.description.abstract: Recent work in both the relational and the XML world has shown that the efficacy and efficiency of duplicate detection are enhanced by considering relationships between ancestors and descendants. We present a novel comparison strategy that uses relationships but dispenses with the strict bottom-up and top-down approaches proposed for hierarchical data. Instead, pairs of objects at any level of the hierarchy are compared in an order that depends on their relationships: objects with many dependents influence many other duplicate decisions, so it should be decided early whether they are duplicates themselves. We apply this ordering strategy to two algorithms. RECONA allows an object to be re-examined if its influencing neighbors turn out to be duplicates; here, ordering reduces the number of such re-comparisons. ADAMA is more efficient because it does not allow any re-comparison; here, the order minimizes the number of mistakes made.
dc.language.iso: eng
dc.publisher: Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, Institut für Informatik
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject.ddc: 004 Computer science
dc.title: Relationship-Based Duplicate Detection
dc.type: book
dc.identifier.urn: urn:nbn:de:kobv:11-10071454
dc.identifier.doi: http://dx.doi.org/10.18452/2468
dc.subject.dnb: 28 Computer science, data processing
local.edoc.pages: 20
local.edoc.type-name: Book
local.edoc.container-type: series
local.edoc.container-type-name: Series
local.edoc.container-year: 2006
dc.identifier.zdb: 2942054-4
bua.series.name: Informatik-Berichte
bua.series.issuenumber: 2006,205
