XStruct
dc.contributor.author | Hegewald, Jan | |
dc.contributor.author | Naumann, Felix | |
dc.contributor.author | Weis, Melanie | |
dc.date.accessioned | 2017-06-17T00:23:16Z | |
dc.date.available | 2017-06-17T00:23:16Z | |
dc.date.created | 2006-07-05 | |
dc.date.issued | 2006-04-01 | |
dc.identifier.uri | http://edoc.hu-berlin.de/18452/9866 | |
dc.description.abstract | XML is the de facto standard format for data exchange on the Web. While it is fairly simple to generate XML data, it is a complex task to design a schema and then guarantee that the generated data is valid according to that schema. As a consequence, much XML data does not have a schema or is not accompanied by its schema. To gain the benefits of having a schema (efficient querying and storage of XML data, semantic verification, data integration, etc.), this schema must be extracted. In this paper we present an automatic technique, XStruct, for XML Schema extraction. Based on ideas of [5], XStruct extracts a schema for XML data by applying several heuristics to deduce regular expressions that are 1-unambiguous and describe each element’s contents correctly, yet generalized to a reasonable degree. Our approach offers several advantages over known techniques: XStruct scales to very large documents (beyond 1 GB) in both time and memory consumption; it is able to extract a general, complete, correct, minimal, and understandable schema for multiple documents; and it detects datatypes and attributes. Experiments confirm these features and properties. | eng |
dc.language.iso | eng | |
dc.publisher | Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II | |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | |
dc.subject | Metadata | eng |
dc.subject | XML Schema | eng |
dc.subject.ddc | 004 Computer science | |
dc.title | XStruct | |
dc.type | conferenceObject | |
dc.identifier.urn | urn:nbn:de:kobv:11-10065894 | |
dc.identifier.doi | http://dx.doi.org/10.18452/9214 | |
local.edoc.type-name | Conference publication | |
local.edoc.container-type | conference | |
local.edoc.container-type-name | Conference | |
local.edoc.container-year | 2006 | |
dc.description.version | Peer Reviewed | |
dc.description.event | Proceedings of the 22nd International Conference on Data Engineering Workshops, ICDE 2006, 3-7 April 2006, pp. 81-81, 22nd International Conference on Data Engineering Workshops (ICDEW 06), Atlanta, Georgia, USA, 03.04.2006 - 07.04.2006 | |
dcterms.bibliographicCitation.doi | 10.1109/ICDEW.2006.166 | |
dcterms.bibliographicCitation.booktitle | 22nd International Conference on Data Engineering Workshops (ICDEW'06) | |
dcterms.bibliographicCitation.booktitle | Proceedings of the 22nd International Conference on Data Engineering Workshops, ICDE 2006, 3-7 April 2006 | |
dcterms.bibliographicCitation.originalpublishername | IEEE Computer Society | |
dcterms.bibliographicCitation.originalpublisherplace | Atlanta, Georgia, USA | |
dcterms.bibliographicCitation.pagestart | 81 | |
dcterms.bibliographicCitation.pageend | 81 | |
bua.department | Mathematisch-Naturwissenschaftliche Fakultät II |