Using XML for Long-term Preservation
Experiences from the DiVA Project
Eva Müller
Uwe Klosa
Peter Hansson
Stefan Andersson
Erik Siira

Uppsala University Library, Electronic Publishing Centre

eva.muller@ub.uu.se
uwe.klosa@ub.uu.se
peter.hansson@ub.uu.se
stefan.andersson@ub.uu.se
erik.siira@ub.uu.se

Box 510, 75 120 Uppsala, Sweden

http://publications.uu.se

Keywords:
long-term preservation, XML, XML Schema, DiVA, DiVA Document Format, DiVA Archive, URN, URN:NBN

Abstract

One of the objectives of the DiVA project is to explore the possibility of using XML as a format for long-term preservation. For this reason, the practical use of XML in different parts of the system was evaluated before deciding on the design.

The DiVA Document Format, defined by an XML schema, was developed to describe the inter-relationships among the various data elements and processes, and to support long-term preservation of the actual documents.

XML Schema provides a means of defining the structure, content and semantics of XML documents. It is an XML-based alternative to the XML Document Type Definition (DTD). Because one of the primary reasons for using XML was to support long-term preservation, the most popular DTDs for documents, DocBook and TEI, were evaluated. Both DTDs showed limitations in their metadata descriptions, so the decision was made to develop a new structure for DiVA using XML Schema. This schema combines the DocBook Schema (derived from the DocBook DTD) for the textual parts of the document with an internal schema for all metadata (bibliographic and administrative data).
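As a rough illustration of this combination, the following minimal sketch builds a single XML document whose metadata and textual parts live in separate namespaces. The element names, the function build_document and both namespace URIs are assumptions made for this example only; they are not taken from the actual DiVA Document Format schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, chosen for illustration only.
# The real DiVA schema namespaces are not stated in this section.
META_NS = "http://example.org/diva/metadata"
TEXT_NS = "http://example.org/diva/docbook-text"


def build_document(title, authors, paragraphs):
    """Build one XML tree holding metadata and DocBook-like text."""
    ET.register_namespace("meta", META_NS)
    ET.register_namespace("db", TEXT_NS)

    root = ET.Element(f"{{{META_NS}}}document")

    # Bibliographic metadata, governed by the (assumed) internal schema.
    metadata = ET.SubElement(root, f"{{{META_NS}}}metadata")
    ET.SubElement(metadata, f"{{{META_NS}}}title").text = title
    for name in authors:
        ET.SubElement(metadata, f"{{{META_NS}}}author").text = name

    # Textual part, using DocBook-style element names (article, para).
    body = ET.SubElement(root, f"{{{TEXT_NS}}}article")
    for text in paragraphs:
        ET.SubElement(body, f"{{{TEXT_NS}}}para").text = text

    return root


if __name__ == "__main__":
    doc = build_document(
        "Using XML for Long-term Preservation",
        ["Eva Müller", "Uwe Klosa"],
        ["First paragraph of the full text."],
    )
    print(ET.tostring(doc, encoding="unicode"))
```

Keeping the two parts in separate namespaces means each part can be checked against its own schema, which is one plausible way to reuse a DocBook-derived schema unchanged next to a locally defined metadata schema.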

Several applications were developed that use the DiVA Document Format for content management and inter-process communication. Some of these applications serve purposes that are essential for long-term preservation.

Currently, the file archives for long-term preservation contain the original full-text file in various formats and the DiVA Document Format file, which contains all the metadata about the document. Furthermore, the DiVA Document Format file contains all parts of the full-text file that can be converted into XML. In the future it might be possible to convert the entire full text into XML, in which case the file archives would contain only DiVA Document Format files.


Table of Contents

Front page: Using XML for Long-term Preservation
Preface
1 XML as Long-term Preservation Format
1.1 XML Schema
1.2 Comparison of DocBook and TEI
1.3 DiVA Document Format
2 Long-term Preservation in the DiVA Project
2.1 Uniform Resource Name (URN) and National Bibliographic Number (NBN)
2.2 The DiVA Archive
3 Conclusions
Appendix A

Table of Figures

Figure 1: Structure of the DiVA Archive
Figure 2: Graphical representation of the complex type personType
Figure 3: Graphical representation of the complex type organisationType


© This publication and its compilation in form and content are protected by copyright. Any use that is not explicitly permitted by copyright law requires a written agreement. This holds in particular for reprography and for processing or storage in electronic systems.

ETD Proceeding DTD
HTML version created: Tue May 20 15:50:59 2003