edoc-Server
Open-Access-Publikationsserver der Humboldt-Universität
  • edoc-Server Home
  • Schriftenreihen und Sammelbände
  • Fakultäten und Institute der HU
  • Institut für Informatik
  • Informatik-Berichte
2006-01-01 | Book | DOI: 10.18452/2468
Relationship-Based Duplicate Detection
Weis, Melanie
Naumann, Felix
Recent work in both the relational and the XML world has shown that the efficacy and efficiency of duplicate detection are enhanced by taking relationships between ancestors and descendants into account. We present a novel comparison strategy that uses relationships but dispenses with the strict bottom-up and top-down approaches proposed for hierarchical data. Instead, pairs of objects at any level of the hierarchy are compared in an order that depends on their relationships: objects with many dependents influence many other duplicate decisions, so it should be decided early whether they are duplicates themselves. We apply this ordering strategy to two algorithms. RECONA allows an object to be re-examined if its influencing neighbors turn out to be duplicates; here, ordering reduces the number of such re-comparisons. ADAMA is more efficient because it does not allow any re-comparison; here, the order minimizes the number of mistakes made.
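The ordering idea in the abstract can be illustrated with a small Python sketch. This is not the RECONA or ADAMA algorithm from the report; it is a toy model built only on the abstract's description. The names `order_by_influence`, `detect`, `influences`, and `is_duplicate`, as well as the movie/actor data, are hypothetical and chosen for illustration. Each candidate pair maps to the set of pairs whose decision it influences, and pairs with more dependents are compared first.

```python
from collections import deque

def order_by_influence(influences):
    """Candidate pairs sorted so that pairs with the most dependents come first."""
    return sorted(influences, key=lambda pair: len(influences[pair]), reverse=True)

def detect(order, influences, is_duplicate, allow_recompare):
    """Compare pairs in the given order.

    allow_recompare=True loosely mimics re-examination: when a pair is found
    to be a duplicate, its not-yet-duplicate dependents are queued again.
    allow_recompare=False compares every pair exactly once.
    Returns (set of pairs classified as duplicates, number of comparisons).
    """
    queue = deque(order)
    queued = set(order)
    duplicates = set()
    comparisons = 0
    while queue:
        pair = queue.popleft()
        queued.discard(pair)
        comparisons += 1
        if is_duplicate(pair, duplicates):
            duplicates.add(pair)
            if allow_recompare:
                for dep in influences[pair]:
                    if dep not in duplicates and dep not in queued:
                        queue.append(dep)
                        queued.add(dep)
    return duplicates, comparisons

# Hypothetical toy data: each actor pair influences the movie pair.
influences = {"movie": set(), "actor1": {"movie"}, "actor2": {"movie"}}

def is_duplicate(pair, duplicates):
    # Assumed rule: the movie pair counts as a duplicate only once both
    # actor pairs are known duplicates; actor pairs always match.
    if pair == "movie":
        return {"actor1", "actor2"} <= duplicates
    return True

good_order = order_by_influence(influences)   # actor pairs first
bad_order = ["movie", "actor1", "actor2"]     # most-influenced pair first

for label, order in (("ordered", good_order), ("unordered", bad_order)):
    for recompare in (True, False):
        found, n = detect(order, influences, is_duplicate, recompare)
        print(label, "recompare" if recompare else "single pass", sorted(found), n)
```

In this toy setting, comparing the influential actor pairs first avoids a re-comparison when re-examination is allowed, and avoids a missed duplicate when each pair is compared only once, which mirrors the two effects the abstract attributes to the ordering.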
Files in this item
205.pdf — Adobe PDF — 732.6 Kb
MD5: 78886ff28538bd8e7eb382f1d2b40bee
In Copyright
© Humboldt-Universität zu Berlin
 
Permanent URL
https://doi.org/10.18452/2468