2005-03-29, Book, DOI: 10.18452/2437
A Query Language for Biological Networks
Many areas of modern biology are concerned with the management, storage, visualization, comparison, and analysis of networks. For instance, networks are used to model signal transduction and metabolic pathways, gene regulation, and the interaction of molecules in general. A large number of databases have emerged that collect and provide information on cellular networks and protein interactions. However, most users and applications are not concerned with entire databases, but search for specific subsets of the data. For these purposes, it is essential to be able to describe the desired sub-network as specifically as necessary and as simply as possible. Despite the increased importance of network data in biology, there is still no proper language for describing and retrieving specific parts of a network. In this paper, we introduce the pathway query language (PQL) for retrieving specific parts of large, complex networks. The language is based on a simple graph data model with extensions reflecting properties of biological objects. PQL queries match arbitrary subgraphs in the database based on node properties and paths between nodes. PQL is a powerful language, able to express graph isomorphism. A specific feature is that the result of a query is decoupled from the matched subgraph. Thus, a query may require a certain structure to exist in the database, but return a different subgraph. Furthermore, the result of a PQL query is itself a graph and can be used in further queries, which allows for query composition, query nesting, and graph views, features well known from relational databases. PQL is easy to learn for anyone with a basic knowledge of SQL. It is implemented on top of a relational database. A query is compiled into a stored procedure that returns the resulting graph in temporary tables. All computation is performed by relational queries, thus exploiting the capabilities of modern database systems in terms of query optimization and memory management.
The code is freely available from the author.
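To make the idea of answering graph queries with relational queries more concrete, the following is a minimal sketch only. It is not PQL syntax and not the author's implementation: the table layout (`nodes`, `edges`), the function names `build_demo_db` and `reachable`, and the toy data are all assumptions chosen for illustration. It uses SQLite and a recursive common table expression rather than a stored procedure, and it covers only one ingredient of the abstract: selecting nodes by a property and by the existence of a bounded-length path.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory graph store. The schema is a hypothetical
    node/edge layout, not the one used by the PQL implementation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
        CREATE TABLE edges (src INTEGER, dst INTEGER);
        """
    )
    # Toy data for illustration only; not curated biology.
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [
            (1, "glucose", "metabolite"),
            (2, "hexokinase", "enzyme"),
            (3, "glucose-6-phosphate", "metabolite"),
            (4, "pyruvate", "metabolite"),
        ],
    )
    conn.executemany("INSERT INTO edges VALUES (?, ?)", [(1, 2), (2, 3), (3, 4)])
    return conn


def reachable(conn, start_name, kind, max_len):
    """Return names of nodes of the given kind that can be reached from
    start_name by a directed path of 1..max_len edges."""
    sql = """
        WITH RECURSIVE walk(node, depth) AS (
            SELECT id, 0 FROM nodes WHERE name = ?
            UNION
            SELECT e.dst, w.depth + 1
            FROM walk AS w JOIN edges AS e ON e.src = w.node
            WHERE w.depth < ?
        )
        SELECT DISTINCT n.name
        FROM walk AS w JOIN nodes AS n ON n.id = w.node
        WHERE n.kind = ? AND w.depth > 0
        ORDER BY n.name
    """
    return [row[0] for row in conn.execute(sql, (start_name, max_len, kind))]


if __name__ == "__main__":
    db = build_demo_db()
    print(reachable(db, "glucose", "metabolite", 3))
```

The whole path search runs inside the database engine, which is the point the abstract makes about leaving optimization and memory management to the relational system. Returning the result as a graph (nodes plus edges) rather than a list of names would require a second query over `edges` and is omitted here.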
Is Part Of Series: Informatik-Berichte - 187, ISSN:0863-095X