Kelsey Libner: CFP: Call for preservation!

3. Discussion of selected results

3.1.1 File formats: A closer look at Portable Document Format (PDF)

3.1.1.1 PDF and Adobe Systems Incorporated

Adobe's commitment to maintaining PDF as an open and published standard mitigates concerns about its copyright on the standard. However, the format should not necessarily be viewed as the format of record over the very long term. Mark Ockerbloom writes:

As data formats go, PDF is particularly likely to be supported for a long time, and to spawn migration paths... Even so, it is likely that PDF will one day be superseded by another format. It may be a successor format (as PDF is to Postscript), or it may be a completely different format that users prefer over PDF. Hence, it is necessary to have migration strategies planned for PDF. (Ockerbloom, 2001)

Despite these long-term concerns, PDF is valuable in that it provides a relatively faithful rendering of a page and document across a range of platforms. To borrow the phrase of Michael J. Patrick of Ansyr Technology Corporation, PDF has good “paper fidelity”. At the same time, because the imaging model describes contents in an abstract way<4>, the format allows much more than simple rendering of page images. It allows text search; integration of multimedia objects; hyperlinks within and outside the document; access to content using alternative reading devices; and, using Adobe InDesign, export to XML. PDF may also contain raster images such as TIFF files. Because it can either render pages from abstract descriptions or display flat page images, PDF works, for the moment, as a convenient transitional format from theses and dissertations on paper to those in digital form.

3.1.1.2 In search of an archival PDF format

The underlying model of PDF, based on the PostScript Page Description Language, has been relatively long in development and is not likely to change significantly. This foundation of the format seems fairly solid from a preservation perspective. It also offers the possibility of future export to another format. However, recent changes to the specification concerning annotation, highlighting, digital signatures, and object transparency, together with the possibility of future changes, raise concerns about using PDF as the format of record.
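One practical consequence of a changing specification is that an archive may want to know which version of the format each submitted file declares. The following is a minimal sketch, not a validation tool: it assumes the file begins with a header comment of the form "%PDF-1.4", and the function names and the baseline version (1, 3) are hypothetical choices made for illustration. The declared version is only a first hint and says nothing about which features a given document actually uses.

```python
import re
from typing import Optional, Tuple

# Assumption: the file starts with a header comment such as "%PDF-1.4".
# The header is only a first hint about the specification version.
_PDF_HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the header in the first 1024 bytes, or None."""
    match = _PDF_HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def flag_for_review(data: bytes,
                    reviewed_up_to: Tuple[int, int] = (1, 3)) -> bool:
    """Flag files that are not recognisable as PDF, or that declare a
    version newer than a (hypothetical) locally reviewed baseline."""
    version = pdf_header_version(data)
    return version is None or version > reviewed_up_to
```

In use, the first kilobyte of each submitted file would be passed to flag_for_review, and flagged files would be routed to a person for inspection rather than rejected automatically.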

A committee under the joint auspices of NPES, The Association for Suppliers of Printing, Publishing and Converting Technologies, and AIIM, the Association for Information and Image Management, International, has been formed to develop a standard archival format of PDF called PDF/A. The stated goal is to “develop an International standard that defines the use of the Portable Document Format (PDF) for archiving and preserving documents” (NPES/AIIM, 2002). Unless and until a viable archival-PDF standard emerges, preservation plans must account for the limitations of PDF as we know it, which we might conservatively call non-archival PDF.

3.1.1.3 Archival storage of the main document

In a paper based on a presentation given at the ARMA 2000 conference, Steve Gilheany argues for the retention of a range of different formats: the native-format document, the vector-based PDF document, and TIFF files (Gilheany, 2000, cited in Teper and Kraemer, 2002). Each has advantages and disadvantages:

Table 1: Advantages and disadvantages

| Format | Pro | Con |
| --- | --- | --- |
| Word, LaTeX or other native format | Can be edited. Because it is the original form of the document, it does not contain conversion anomalies. | Relatively short lifespan. |
| PDF-vector | More durable than Word format. Contains machine-readable text allowing search, document rendering on multiple devices, and greater accessibility than TIFF files. | Subject to changes in PDF specification. Migration may change document formatting. |
| TIFF files | Extremely simple and durable. A de facto standard for digital masters (see Kenney, Rieger, & Entlich, 2003). | File is “flat”, sacrificing functionality such as non-image multimedia files. Larger file size. |

Questions to consider

3.1.2 File formats: Policies on supplementary files

Anecdotal evidence from the survey suggests that the percentage of ETDs submitted in formats other than PDF, HTML, and JPEG runs quite low (for one institution, about two percent). Policies on supplementary files fall into two general classes, “conservative” and “liberal”. Under a “conservative” policy, a limited number of alternative file formats is accepted, but a strong commitment is made to preserving all files. Under a “liberal” policy, a broader range of file formats is accepted, but the preservation commitment may vary for different formats.
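The difference between the two classes of policy can be made concrete as a small lookup from file format to preservation commitment. This is an illustrative sketch only: the class name, the commitment labels ("full", "bit-level"), and the formats beyond PDF, HTML, and JPEG are assumptions invented for the example and do not describe any surveyed institution's actual policy.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SupplementaryFilePolicy:
    """Maps accepted file extensions to a preservation commitment label."""
    name: str
    commitments: Dict[str, str]

    def commitment_for(self, extension: str) -> Optional[str]:
        """Return the commitment for an extension, or None if not accepted."""
        return self.commitments.get(extension.lower().lstrip("."))


# Conservative: few formats accepted, every one with the same strong commitment.
CONSERVATIVE = SupplementaryFilePolicy(
    name="conservative",
    commitments={"pdf": "full", "html": "full", "jpeg": "full"},
)

# Liberal: more formats accepted, but the commitment varies by format.
# The extra formats and the "bit-level" label are hypothetical.
LIBERAL = SupplementaryFilePolicy(
    name="liberal",
    commitments={
        "pdf": "full", "html": "full", "jpeg": "full",
        "wav": "bit-level", "mov": "bit-level", "xls": "bit-level",
    },
)
```

Under this sketch, a submission system would call commitment_for on each supplementary file: a result of None means the file is refused, while any other value records what the institution has promised for that file.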

Questions to consider


Footnotes:

<4>

“A high-level imaging model enables applications to describe the appearance of pages containing text, graphical shapes, and sampled images in terms of abstract graphical elements rather than directly in terms of device pixels.” (Adobe Systems Incorporated, 2001, p. 10).

<5>

See the Policy Guide of the Digital Repository Service at http://hul.harvard.edu/ois/systems/drs/policyguide.html.



© This publication and its compilation in form and content is copyrighted. Every realization which is not explicitly allowed by copyright law requires a written agreement. Especially, this holds for reprography and processing / storing by electronic systems.
