Session A: Changes in University Organisation and Structure
Stefan Gradmann: Reducing White Noise

4. Semantic noise reduction

The core field of noise reduction is the semantic level of information, since the actual relevance of information, which is what focusing, filtering and aggregation depend on, can only be determined on this level. The examples given below illustrate possible lines of action in this area and indicate two of the potential players that have to interact in order to reduce information overload (or to prevent it from being generated in the first place).

4.1 The role of Libraries

Apart from acquiring content and making it available to their users, libraries have always been systematically concerned with content selection and thus with the semantic focusing of information systems. This concern was never exclusively motivated by the scarcity of material resources: a prominent objective of this activity has always been to distinguish potentially relevant items from clearly irrelevant information and, furthermore, to aggregate potentially relevant material by means of subject indexing, classification and the creation of bibliographies. One of the most important activities in this respect was the cataloguing of information items, in other words the generation of metadata. Libraries thus have a long tradition in fine-granularity noise reduction and content focusing - but in the world of printed documents only, with hardly any expertise in the only just emerging techniques that are relevant for electronic content.

However, the traditional strengths and competences of libraries can contribute efficiently to the goal of noise reduction in the changed information paradigm of networked electronic information resources, provided these libraries extend their expertise with new working techniques and co-operate closely with the new players in this rapidly changing infrastructure.

One of the traditional areas of librarian competence most heavily affected by this need is the process of metadata generation. Cataloguing as traditionally practised in libraries is a time-consuming and expensive activity even in its original context of printed publications. This process becomes completely inappropriate and impossible to sustain with the advent of networked electronic information resources: the traditional cataloguing approach has not the slightest chance of catching up with these rapidly proliferating bits of information, and even if this were theoretically possible, no institution could pay the price of such an attempt.

On the other hand, metadata are an excellent antidote to white noise: they are among the most efficient means of enhancing retrieval precision and thus of reducing the potential for information overload. Unfortunately, metadata - as far as they are already available - are seldom used by popular search engines, mainly because of past abuse of HTML meta tags by commercial players wishing to secure a prominent position for their pages in the result rankings of these search engines. New, sustainable strategies for generating metadata are thus required, and means have to be sought to massively increase the use of meta-information in the WWW infrastructure. One way to do so is to involve the producers of electronic information on a large scale in the process of metadata generation. This process must itself be substantially simplified in comparison with traditional bibliographic standards and at the same time ensure a minimum of coherence and consistency in its output, by producing meta-information that complies with an institutional quality policy and is certified by the institution.
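A simplified, producer-side procedure of this kind might look roughly like the following Python sketch. It is an illustration under stated assumptions only: the text does not prescribe a metadata scheme, so the sketch assumes Dublin Core style element names embedded as HTML meta tags, and the list of required fields stands in for a hypothetical institutional quality policy. The function names and the sample record are likewise invented for the example.

```python
from html import escape

# Hypothetical institutional policy: fields every producer must supply.
REQUIRED_FIELDS = ("title", "creator", "date", "language")


def validate_record(record):
    """Return the required fields that are missing or empty in the record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


def to_meta_tags(record):
    """Render a record as Dublin Core style HTML <meta> elements.

    Raises ValueError if the record does not satisfy the (assumed) policy,
    so that incomplete metadata never reaches the published document.
    """
    missing = validate_record(record)
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    lines = []
    for field, value in record.items():
        content = escape(str(value), quote=True)
        lines.append('<meta name="DC.%s" content="%s">' % (field, content))
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative record filled in by the author of a document.
    record = {
        "title": "Reducing White Noise",
        "creator": "Gradmann, Stefan",
        "date": "2001",
        "language": "en",
    }
    print(to_meta_tags(record))
```

The point of the sketch is the division of labour: the producer fills in a handful of fields, while the institution's policy is enforced mechanically by the validation step rather than by a cataloguer.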

At Hamburg University, we try to put such concepts to work in two ways, both of them heavily involving our libraries but - unlike in the former cataloguing models - in close co-operation with our scientific staff:

However, such approaches - even if they may substantially improve the information ecology - only apply to 'syntactic' metadata (the equivalents of bibliographic descriptions) and do not solve the problem of semantic aggregation (assignment of subject indexing terms and/or keywords as well as classification), which cannot be done intellectually/manually for the multitude of resources concerned. Furthermore, the two steps sketched above only concern resources we produce ourselves at Hamburg University. External information resources integrated into our systems do not necessarily comply with our internal standards: they may contain metadata complying with different - even superior - standards, or no metadata at all.

We are therefore currently preparing a project intended to deliver tools for use by librarians and scientific staff in two areas:

detection, extraction and - if needed - on-the-fly conversion of metadata present in external document resources, so that these metadata can be ingested into our information system together with the document resources themselves

automated generation of 'semantic' metadata (keywords, lexical clusters) using lexicon-based approaches for semantic extraction, aggregation and filtering, combined with morphological normalization techniques, in order to produce controlled vocabulary associated with the documents ingested into our repositories as well as automatically generated abstracting information.
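The two tool functions listed above can be illustrated with a minimal Python sketch. It is not the project's design: the mapping table, the tiny lexicon and the crude suffix stripping used as "morphological normalization" are all assumptions made for the example, and a real system would rely on proper linguistic resources. Only the standard-library `html.parser` module is used.

```python
import re
from html.parser import HTMLParser

# Hypothetical mapping from foreign meta tag names to internal field names.
NAME_MAP = {
    "author": "creator",
    "keywords": "subject",
    "description": "description",
    "dc.title": "title",
    "dc.creator": "creator",
    "dc.subject": "subject",
}

# Hypothetical controlled vocabulary: normalized stem -> preferred term.
LEXICON = {"librar": "Libraries", "catalogu": "Cataloguing", "metadata": "Metadata"}

# Very crude stand-in for morphological normalization.
SUFFIXES = ("ing", "ies", "es", "s", "y", "e")


class MetaExtractor(HTMLParser):
    """Collect <meta name=... content=...> pairs and convert their names."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name in NAME_MAP and content:
            self.fields.setdefault(NAME_MAP[name], []).append(content)


def extract_metadata(html_text):
    """Step 1: detect, extract and convert metadata found in a document."""
    parser = MetaExtractor()
    parser.feed(html_text)
    return parser.fields


def normalize(word):
    """Lower-case a word and strip one common suffix, keeping >= 4 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def assign_keywords(text, min_count=1):
    """Step 2: map normalized words to controlled-vocabulary terms."""
    counts = {}
    for word in re.findall(r"[A-Za-z]+", text):
        term = LEXICON.get(normalize(word))
        if term:
            counts[term] = counts.get(term, 0) + 1
    return sorted(term for term, n in counts.items() if n >= min_count)
```

In this sketch, `extract_metadata` covers the 'syntactic' side (whatever metadata an external document already carries), while `assign_keywords` covers the 'semantic' side by attaching controlled terms to the text itself. Raising `min_count` is one simple way to filter out terms that occur only in passing.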

In preparing this project, we build heavily on existing know-how and work already done, especially in the domain of metadata extraction, by such institutions as UKOLN ('DC-DOT') or OCLC. Partners with strong experience in language engineering will supply the linguistic techniques needed for semantic aggregation procedures. Furthermore, ongoing national and international projects and initiatives are closely monitored, if only to keep the amount of work done locally as small as possible. Examples of such external initiatives are the German Carmen project or the Open Archives Initiative, but also current work being done by scientific publishers. Our specific task is to pull these elements together and to define a consistent information policy for our university, integrating these and other technological approaches into an institution-wide strategy for white noise reduction and for preventing information overload.

4.2 The Centre for Media Competence

Institutional players such as libraries and computer centres can thus do a lot to reduce white noise in information services, but these efforts are of very moderate use as long as the users of such services are not aware of the problem and have not been taught to use and generate information resources efficiently themselves. The building of a centre for media competence currently under way at Hamburg University is - among other concerns - a reaction to this aspect of the problem. To some extent, the centre for media competence is conceived as a 'traditional' multi-media centre providing the relevant infrastructure for the use and production of multi-media resources and for archiving and preserving multi-media content.

A very strong emphasis, however, is put on building competence within the academic user community and thus enabling students and researchers to make efficient use of multi-media resources in their learning and teaching work, with specific emphasis on the building of efficient environments for multi-media based tele-teaching and tele-learning. Multi-media can do a lot of harm by adding tremendous volumes of white noise to the information ecology of an academic institution. It would therefore not have been sufficient to conceive our multi-media centre as a mere institution for the generation and accumulation of multi-media content: user education is seen as a key factor if such an institution is to make a useful contribution to the university's information infrastructure instead of just setting up yet another powerful means of information pollution.



