Eva Müller, Uwe Klosa, Peter Hansson, Stefan Andersson, Erik Siira: Using XML for Long-term Preservation

Preface

DiVA, Digitala vetenskapliga arkivet (DiVA Archive), is the collective name for a searchable archive containing all documents published in electronic form at Uppsala University in Sweden. Other Swedish universities also co-operate in the project within the DiVA framework. One part of this archive is the database containing theses published at Uppsala University from 1998 to date.

In September 2000 an Electronic Publishing Centre was established at Uppsala University Library. Its primary assignment was a project to create technical solutions and a well-functioning workflow for electronic posting and full-text publication of doctoral theses, essays, working papers and other types of scientific publications.

The first phase of the project was completed in 2002 and the result was the DiVA Publishing System - a system for electronic publishing of different types of publications.

One of the goals has been to create a long-term archive containing all digital documents published at Uppsala University. The assignment involves both technical and organisational issues, and the developer team faced many questions. How can the loss of data be avoided? What kind of descriptive and administrative metadata is useful for archiving? What is the appropriate metadata format for long-term preservation? How important is the layout of the objects, and how should it be handled? How can images and formulas be handled?

Because of those questions, XML was discussed early on as a format for storing descriptive and administrative metadata as well as the complete content of the documents. XML is a format that both humans and machines can easily restore and understand.

This paper describes the current status of the XML implementation in the DiVA Archive and the surrounding applications, and explains why XML is an important format for long-term preservation.



© This publication and its compilation in form and content is copyrighted. Every realization which is not explicitly allowed by copyright law requires a written agreement. Especially, this holds for reprography and processing / storing by electronic systems.
