| edoc-Server der Humboldt-Universität zu Berlin |
| Autor(en): | Heiko Müller; Johann Christoph Freytag; Ulf Leser | Titel: | On the Distance of Databases |
| Erscheinungsjahr: | 2006 |
| Erschienen in: |
Informatik-Berichte 199 ISSN: 0863-095X |
| Volltext: | pdf (urn:nbn:de:kobv:11-10062551) |
| Fachgebiet(e): | Informatik |
| Herausgeber: | Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, Institut für Informatik |
| Metadatenexport:
|
Endnote Bibtex |
| print on demand:
|
|
| Diese Seite taggen:
|
| Abstract (eng): | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| We study the novel problem of efficiently computing the update distance for a pair of relational databases. In analogy to the edit distance of strings, we define the update distance of two databases as the minimal number of set-oriented insert, delete and modification operations necessary to transform one database into the other. We show how this distance can be computed by traversing a search space of database instances connected by update operations. This insight leads to a family of algorithms that compute the update distance or approximations of it. In our experiments we observed that a simple heuristic performs surprisingly well in most considered cases. Our motivation for studying distance measures for databases stems from the field of scientific databases. There, replicas of a single database are often maintained at different sites, which typically leads to (accidental or planned) divergence of their content. To re-create a consistent view, these differences must be resolved. Such an effort requires an understanding of the process that produced them. We found that minimal update sequences are a proper representation of systematic errors, thus giving valuable clues to domain experts responsible for conflict resolution. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Zugriffsstatistik:
Bei Formatversionen eines Dokuments, die aus mehreren Dateien bestehen (insbesondere HTML), wird jeweils der monatlich höchste Zugriffswert auf eine der Dateien (Kapitel) des Dokuments angezeigt. Um die detaillierten Zugriffszahlen zu sehen, fahren Sie bitte mit dem Mauszeiger über die einzelnen Balken des Diagramms. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Gesamtzahl der Zugriffe seit May 2011:
|