Yahia Bakelli, Sabrina Benrahmoun: Long-term preservation of ETDs in Algeria
Regarding the way the current CERIST ETD system operates, we note that thesis files are stored in PDF format on two separate hard disks. The original floppy disks and CD-ROMs are kept in boxes and cupboards as submitted by students. Moreover, although bibliographic records are entered in compliance with UNIMARC, they are saved in a proprietary .THE format generated by the SYNGEB software. These observations lead us to ask some questions regarding the future use of these stored files:
PDF is highly recommended as a delivery format, but what would guarantee the independence of the archived files from Adobe's future business plans?
In fact, these are predictable problems rather than open questions. We therefore argue that the archiving module of the CERIST ETD system must be redesigned to avoid all these constraints. Procedures and tools must be integrated into the system to keep the media safe at all times and to keep their files permanently readable and independent of the evolution of machines, platforms, and software. ETD content must also be archived so that it can be manipulated and reused directly, without preliminary operations. It must also be stored economically, in a way that allows the same content to be delivered in different forms and contexts (full-text database, OPAC, Internet portal, digital library...) and for different user profiles.
International experience teaches us that two main levels must be distinguished:
a) Conservation of the digital media itself.
b) Preservation of data and content of the ETD.
Currently we are running three sets of experiments on a sample of submitted electronic theses. The sample comprises about 430 media (30% of the whole collection). The tests concern both levels mentioned above; however, we focus mainly on the second, i.e. the preservation of content:
Do we have to opt for well-formed XML files or valid XML files? Well-formed XML has the advantage of being simple to produce and economical for high-volume workflow chains, but it has the drawback of reducing the possibilities for later automated manipulation. Valid XML is of course preferable, but it requires many manual corrections and more time before a file can be saved into the archive.
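The trade-off can be illustrated with a minimal sketch, assuming Python and its standard library only. The first function tests well-formedness. The second is merely a stand-in for DTD validation: it checks for child elements that a DTD might require. The element names `thesis`, `title` and `author` are hypothetical and are not taken from DiML or TEI Lite; real validation would need a validating parser and the actual DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical required elements; a real DTD would define these rules.
REQUIRED_CHILDREN = ("title", "author")


def is_well_formed(xml_text):
    """Return True if the text parses as XML (tags nested and closed)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def missing_required(xml_text):
    """Toy stand-in for validity: list required children that are absent."""
    root = ET.fromstring(xml_text)
    return [name for name in REQUIRED_CHILDREN if root.find(name) is None]


broken = "<thesis><title>Sample</thesis>"            # not well-formed
partial = "<thesis><title>Sample</title></thesis>"   # well-formed, incomplete
complete = "<thesis><title>Sample</title><author>A. Student</author></thesis>"

for label, doc in (("broken", broken), ("partial", partial), ("complete", complete)):
    if not is_well_formed(doc):
        print(label, "-> rejected by the parser")
    else:
        print(label, "-> well-formed, missing:", missing_required(doc))
```

The `partial` document passes the cheap check but would still need manual correction before it satisfied a stricter rule set, which is the extra cost that valid XML entails.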
We are now comparing two existing XML DTDs:
a) DiML, developed at the Humboldt University of Berlin (http://edoc.hu-berlin.de/diml/), which is adapted from the DTD developed in 1985 at Virginia Tech (http://etd.vt.edu/).
b) The TeiLite DTD, as adopted by certain ETD chains such as those of the Presses de l'Université de Montréal (Canada) and Université Lumière Lyon 2 (France) within the Cyberthèses chain.
This comparative study of DTDs is based on the following parameters:
As digital content archiving concerns not only the full text of theses but also their metadata, we are generating metadata for the chosen sample of ETDs. For this we decided to adopt the ETDMS model of Virginia Tech, an adapted DC (Dublin Core) metadata set, which we are generating in XML format. One of the most interesting results of this test is the demonstration that this standard is feasible not only for texts in Latin-script languages but also for Arabic texts.
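As a minimal sketch of such a record, assuming Python's standard library, the following builds a small Dublin Core description in XML and shows that an Arabic title survives a round trip when the file is encoded in UTF-8. Only plain Dublin Core elements are used, not the full ETDMS element set; the wrapper element `record` and all field values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Serialize a dict of Dublin Core fields to UTF-8 encoded XML bytes."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        child.text = value
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Hypothetical sample values, including an Arabic title.
sample = {
    "title": "أطروحة تجريبية",
    "creator": "A. Student",
    "language": "ar",
    "type": "Electronic Thesis or Dissertation",
}

xml_bytes = build_record(sample)
parsed = ET.fromstring(xml_bytes)
round_trip_title = parsed.find("{%s}title" % DC_NS).text
print(xml_bytes.decode("utf-8"))
```

Because XML is defined over Unicode, no special handling of right-to-left text is needed at the storage level; display direction is a concern for the delivery layer.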
As another output of our experiments, we are designing a naming scheme for the dissertation collection to serve as a protocol for how files must be stored in directories. This scheme must take into account the adopted codification system (see section 2. c). However, we aim to go beyond the simple classification by discipline<1>. To this end, the URN handle-server technique is currently being applied to the CERIST ETD sample.
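One possible shape for such a scheme is sketched below in Python. It is an illustration only, not the scheme being designed at CERIST: the discipline codes 1 to 3 follow the codification in footnote 1, while the year, the serial number, the identifier pattern and the file name are all assumptions introduced for the example.

```python
from pathlib import PurePosixPath

# Discipline codes as given in footnote 1.
DISCIPLINES = {
    1: "technology and pure sciences",
    2: "medicine and life sciences",
    3: "human and social sciences",
}


def storage_path(discipline, year, serial, filename):
    """Build a directory path for one thesis file.

    Hypothetical layout: <discipline>/<year>/<discipline>-<year>-<serial>/<file>
    """
    if discipline not in DISCIPLINES:
        raise ValueError("unknown discipline code: %r" % (discipline,))
    thesis_id = "%d-%d-%04d" % (discipline, year, serial)
    return PurePosixPath(str(discipline), str(year), thesis_id, filename)


example = storage_path(2, 2003, 42, "thesis.xml")
print(example)  # 2/2003/2-2003-0042/thesis.xml
```

Deriving the path from a single identifier keeps the directory layout predictable, so a persistent identifier such as a URN could later be mapped to a file location without consulting a separate index.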
Footnotes:
<1> 1 for technology and pure sciences, 2 for medicine and life sciences, and 3 for human and social sciences.
© This publication and its compilation in form and content is copyrighted. Every realization which is not explicitly allowed by copyright law requires a written agreement. Especially, this holds for reprography and processing / storing by electronic systems.