Modeling institutional research data repositories using the DCAT3 Data Catalog Vocabulary
A case study on TUdatalib
Philosophische Fakultät
Semantic Web and Linked Data technologies may help solve issues that arise from
research data being published by independent providers. To benefit fully
from these technologies, metadata should be as standardized as possible.
The Data Catalog Vocabulary (DCAT), a W3C Recommendation, is potentially
valuable for exposing research data metadata as Linked Data.
The suitability of DCAT for institutional research data repositories was investigated
using the TUdatalib repository as a case study. A model for TUdatalib
metadata was developed based on an analysis of selected resources and
guided by a draft of DCAT 3. The model could represent the essential
information about the repository's structure and contents, indicating that
the vocabulary is suitable, and should, conceptually, permit automated conversion
of data from the repository system to DCAT 3. Some expressiveness is lost
because dataset series were omitted. Conforming to the DCAT 3 class definitions
resulted in a highly complex model, which poses challenges for its technical
implementation. A comparative study showed that two other repositories use
simpler models; however, implementing the TUdatalib model or a similar one
could improve their alignment with the DCAT specifications.
DCAT 3 appears to be a promising option for exposing institutional research
data repository metadata as Linked Data, and the TUdatalib model could serve
as a basis for developing a general DCAT 3 application profile for institutional
and other research data repositories.
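As a rough illustration of what an automated conversion from repository records to DCAT 3 might look like, the following Python sketch maps a hypothetical, heavily simplified record (a repository containing items, each with files) onto dcat:Catalog, dcat:Dataset and dcat:Distribution and serializes the result as N-Triples. The record classes, their field names and the example.org URIs are assumptions made for illustration only; this is not the TUdatalib model developed in the thesis, which is considerably more complex. Like that model, the sketch does not use dcat:DatasetSeries.

```python
from dataclasses import dataclass, field

# Namespace IRIs (W3C DCAT, DCMI Terms, RDF) and the IANA media type registry.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
IANA_MEDIA = "https://www.iana.org/assignments/media-types/"


# Hypothetical, simplified repository records (not TUdatalib's actual schema).
@dataclass
class FileRecord:
    uri: str
    download_url: str
    media_type: str  # e.g. "text/csv"


@dataclass
class ItemRecord:
    uri: str
    title: str
    files: list = field(default_factory=list)


@dataclass
class RepositoryRecord:
    uri: str
    title: str
    items: list = field(default_factory=list)


def iri(value: str) -> str:
    """Render an IRI term in N-Triples syntax."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Render a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_dcat(repo: RepositoryRecord) -> list:
    """Map repository -> dcat:Catalog, item -> dcat:Dataset, file -> dcat:Distribution."""
    triples = [
        (iri(repo.uri), iri(RDF_TYPE), iri(DCAT + "Catalog")),
        (iri(repo.uri), iri(DCT + "title"), literal(repo.title)),
    ]
    for item in repo.items:
        triples += [
            (iri(repo.uri), iri(DCAT + "dataset"), iri(item.uri)),
            (iri(item.uri), iri(RDF_TYPE), iri(DCAT + "Dataset")),
            (iri(item.uri), iri(DCT + "title"), literal(item.title)),
        ]
        for f in item.files:
            triples += [
                (iri(item.uri), iri(DCAT + "distribution"), iri(f.uri)),
                (iri(f.uri), iri(RDF_TYPE), iri(DCAT + "Distribution")),
                (iri(f.uri), iri(DCAT + "downloadURL"), iri(f.download_url)),
                (iri(f.uri), iri(DCAT + "mediaType"), iri(IANA_MEDIA + f.media_type)),
            ]
    return triples


def to_ntriples(triples: list) -> str:
    """Serialize (subject, predicate, object) term tuples as N-Triples lines."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


if __name__ == "__main__":
    repo = RepositoryRecord(
        uri="https://example.org/catalog",
        title="Example institutional repository",
        items=[
            ItemRecord(
                uri="https://example.org/dataset/1",
                title="Measurement data",
                files=[
                    FileRecord(
                        uri="https://example.org/dataset/1/distribution/1",
                        download_url="https://example.org/files/data.csv",
                        media_type="text/csv",
                    )
                ],
            )
        ],
    )
    print(to_ntriples(to_dcat(repo)), end="")
```

Run as a script, this prints nine N-Triples statements: two for the catalog, three for the dataset and four for its single distribution. A real conversion would need many more properties (publisher, license, identifiers and so on), and the class and property choices made for TUdatalib in the thesis are not reproduced here.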