Implementing Linguistic Query Languages Using LoToS
A linguistic database is a collection of texts in which sentences and words are annotated with linguistic information such as part of speech, morphology, and syntactic sentence structure. While early linguistic databases focused on word annotations and later also on parse trees of sentences (so-called treebanks), recent years have seen a growing interest in richly annotated corpora of historic texts that include not only syntactic annotations but also further complex annotations, such as alignments between related text layers. This raises the issue of how to query such complexly structured linguistic databases efficiently. We present a generic approach for defining domain-specific query languages, which we use to develop a query language for richly annotated historic corpora. In our approach, a query language is defined as a set of predicates. A query in the form of a logic rule is translated by our LoToS query compiler into a single, possibly deeply nested SQL query. In contrast to previous approaches, the annotation structures that can be queried need not be trees but can also form directed acyclic graphs (DAGs) or, for a restricted class of recursive queries, arbitrary graphs. To this end, LoToS offers an operator for computing transitive closures using the recursive capabilities of modern database systems. We believe that this is the first approach to use modern SQL capabilities for evaluating recursive predicates in logic-based query languages.
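To illustrate the general idea of evaluating a recursive predicate with SQL's recursive capabilities, here is a minimal hand-written sketch. It is not output of the LoToS compiler, and its schema is not the one LoToS uses. The table names `node` and `edge`, the predicate name `dominates_plus`, and the use of SQLite (via Python's `sqlite3` module) are all assumptions made for illustration. The single SQL statement corresponds to a logic rule that asks for all `NN` nodes transitively dominated by an `S` node. The transitive closure of the edge relation is computed with `WITH RECURSIVE`.

```python
import sqlite3

# Hypothetical toy schema (not LoToS's actual one): annotation nodes with a
# label, and a "dominance" edge relation that may form a DAG or even a cycle.
SCHEMA = """
CREATE TABLE node (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE edge (parent INTEGER, child INTEGER);
"""

# Hand-written SQL for a rule such as
#   result(Y) :- node(X, 'S'), dominates_plus(X, Y), node(Y, 'NN').
# dominates_plus (hypothetical name) is the transitive closure of edge.
# UNION (not UNION ALL) discards duplicate rows, so the recursion terminates
# even when the edge relation contains cycles.
QUERY = """
WITH RECURSIVE dominates_plus(ancestor, descendant) AS (
    SELECT parent, child FROM edge
    UNION
    SELECT d.ancestor, e.child
    FROM dominates_plus AS d
    JOIN edge AS e ON e.parent = d.descendant
)
SELECT DISTINCT y.id
FROM node AS x
JOIN dominates_plus AS dp ON dp.ancestor = x.id
JOIN node AS y ON y.id = dp.descendant
WHERE x.label = 'S' AND y.label = 'NN'
ORDER BY y.id
"""


def build_db(nodes, edges):
    """Create an in-memory SQLite database holding the toy annotation graph."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO node VALUES (?, ?)", nodes)
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    return conn


def nouns_dominated_by_s(conn):
    """Return ids of 'NN' nodes transitively dominated by some 'S' node."""
    return [row[0] for row in conn.execute(QUERY)]


if __name__ == "__main__":
    nodes = [(1, "S"), (2, "NP"), (3, "VP"), (4, "NN"), (5, "VB"), (6, "NN")]
    # Node 6 has two parents (2 and 3), so the structure is a DAG, not a tree.
    dag = [(1, 2), (1, 3), (2, 4), (3, 5), (3, 6), (2, 6)]
    print(nouns_dominated_by_s(build_db(nodes, dag)))  # [4, 6]
    # Adding a back edge 6 -> 1 creates a cycle; the query still terminates.
    print(nouns_dominated_by_s(build_db(nodes, dag + [(6, 1)])))  # [4, 6]
```

`UNION` is used here rather than `UNION ALL` because SQLite only adds rows to a recursive CTE that have not been produced before. That is what lets the same closure query run over cyclic graphs as well as over trees and DAGs. Other database systems may need an explicit cycle check instead.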