Humboldt-Universität zu Berlin
edoc-Server
Open-access publication server of Humboldt-Universität
2005-04-01 · Conference publication · DOI: 10.18452/9202
Schema Matching using Duplicates
Bilke, Alexander
Naumann, Felix
Mathematisch-Naturwissenschaftliche Fakultät II
Most data integration applications require a matching between the schemas of the data sets involved. We show how the existence of duplicates within these data sets can be exploited to automatically identify matching attributes. We describe an algorithm that first discovers duplicates among data sets with unaligned schemas and then uses these duplicates to perform schema matching between schemas with opaque column names. Discovering duplicates among data sets with unaligned schemas is more difficult than in the usual setting, because it is not clear which fields in one object should be compared with which fields in the other. We have developed a new algorithm that efficiently finds the most likely duplicates in such a setting. Given these duplicates, our schema matching algorithm identifies corresponding attributes by comparing the data values within the duplicate records. An experimental study on real-world data shows the effectiveness of this approach.
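The two-phase idea described in the abstract (find duplicates without knowing the column correspondence, then use them to align columns) can be illustrated with a deliberately simplified Python sketch. This is not the paper's algorithm: it swaps in plain token-set Jaccard similarity for both phases, assumes the number of duplicates `k` to keep is known in advance, and pairs columns greedily one-to-one. All function names and the sample data are hypothetical.

```python
from itertools import product


def tokens(value):
    """Lowercased whitespace-separated tokens of one field value."""
    return set(str(value).lower().split())


def record_tokens(record):
    """Union of tokens over all fields, so column names and order are ignored."""
    result = set()
    for value in record:
        result |= tokens(value)
    return result


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_duplicates(table_a, table_b, k):
    """Phase 1: return the k most similar (i, j) record pairs.

    Whole records are compared as token bags because it is unknown which
    field of one table corresponds to which field of the other.
    The parameter k (how many duplicates to keep) is an assumption of this sketch.
    """
    scored = []
    for i, ra in enumerate(table_a):
        ta = record_tokens(ra)
        for j, rb in enumerate(table_b):
            scored.append((jaccard(ta, record_tokens(rb)), i, j))
    scored.sort(reverse=True)
    return [(i, j) for _, i, j in scored[:k]]


def match_attributes(table_a, table_b, duplicates):
    """Phase 2: map column indices of table_a to column indices of table_b.

    Field-to-field similarities are averaged over the duplicate pairs, then
    columns are paired greedily (highest similarity first, one-to-one).
    Assumes both tables are non-empty.
    """
    n_a, n_b = len(table_a[0]), len(table_b[0])
    sim = [[0.0] * n_b for _ in range(n_a)]
    for i, j in duplicates:
        for x, y in product(range(n_a), range(n_b)):
            sim[x][y] += jaccard(tokens(table_a[i][x]), tokens(table_b[j][y]))
    if duplicates:
        sim = [[s / len(duplicates) for s in row] for row in sim]

    candidates = sorted(
        ((sim[x][y], x, y) for x in range(n_a) for y in range(n_b)), reverse=True
    )
    used_a, used_b, matches = set(), set(), {}
    for score, x, y in candidates:
        if score > 0 and x not in used_a and y not in used_b:
            matches[x] = y
            used_a.add(x)
            used_b.add(y)
    return matches


# Hypothetical sample data; the matcher never sees column names.
table_a = [  # name, city, year
    ("anna schmidt", "berlin", "1990"),
    ("jan novak", "prague", "1985"),
    ("maria rossi", "rome", "1979"),
]
table_b = [  # year, name, city (different order, one non-duplicate row)
    ("1985", "novak jan", "prague"),
    ("1990", "schmidt anna", "berlin"),
    ("2001", "li wei", "beijing"),
]

dups = find_duplicates(table_a, table_b, k=2)
print(sorted(dups))  # [(0, 1), (1, 0)]
print(sorted(match_attributes(table_a, table_b, dups).items()))  # [(0, 1), (1, 2), (2, 0)]
```

Because duplicates are detected from whole-record token overlap, the sketch tolerates reordered columns and reordered words within a value. Real data would likely call for a more robust similarity measure and a threshold instead of a fixed `k`.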
Files in this item
2424X0HVRpwc.pdf — Adobe PDF — 139.1 Kb
MD5: 4048ead631037a7181e5d9d7cda8c9ca
Rights statement: In Copyright
A service of University Library and Computer and Media Service
© Humboldt-Universität zu Berlin

DOI: 10.18452/9202
Permanent URL: https://doi.org/10.18452/9202