Approximate tree embedding for querying XML data
Mathematisch-Naturwissenschaftliche Fakultät II
Querying heterogeneous collections of data-centric XML documents requires combining database query languages with concepts from information retrieval, in particular similarity search and ranking. In this paper we present an approach to finding approximate answers to formal user queries. We reduce the problem of answering queries against XML document collections to the well-known unordered tree inclusion problem. We extend this problem to an optimization problem by applying a cost model to the embeddings, which lets us determine how closely parts of an XML document match a user query. We present an efficient algorithm that finds all approximate matches and ranks them according to their similarity to the query.
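To make the idea of cost-ranked embeddings concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a deliberately simplified setting: a query node may only map to a document node with the same label, a query child must map to a proper descendant of its parent's image, and each document node skipped on the way costs `skip_cost`. Unlike true unordered tree inclusion, sibling query nodes are matched independently, so two of them may share a document node. All names (`Node`, `embed_cost`, `best_below`, `ranked_matches`, `skip_cost`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from math import inf


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def embed_cost(q, d, skip_cost=1):
    """Cost of embedding query subtree q with its root mapped onto d (inf if impossible)."""
    if q.label != d.label:
        return inf
    total = 0
    for qc in q.children:
        # Each query child must land on a proper descendant of d (order is ignored).
        total += min((best_below(qc, dc, skip_cost) for dc in d.children), default=inf)
    return total


def best_below(q, d, skip_cost=1):
    """Cheapest embedding of q at d or below d; each level descended skips one node."""
    best = embed_cost(q, d, skip_cost)
    for dc in d.children:
        best = min(best, skip_cost + best_below(q, dc, skip_cost))
    return best


def ranked_matches(query, doc, skip_cost=1):
    """All document nodes where the query embeds, cheapest (most similar) first."""
    results, stack = [], [doc]
    while stack:
        d = stack.pop()
        cost = embed_cost(query, d, skip_cost)
        if cost < inf:
            results.append((cost, d))
        stack.extend(d.children)
    results.sort(key=lambda pair: pair[0])
    return results


# Illustrative data (assumed, not from the paper): two <book> elements,
# one of which nests <author> inside an extra <info> element.
book1 = Node("book", [Node("title"), Node("info", [Node("author")])])
book2 = Node("book", [Node("title"), Node("author")])
doc = Node("catalog", [book1, book2])
query = Node("book", [Node("title"), Node("author")])

matches = ranked_matches(query, doc)
```

Here `book2` matches the query exactly, while `book1` matches only after skipping the intermediate `info` element, so it is ranked second. The sketch recomputes subproblems and is exponential in the worst case; memoizing on `(q, d)` pairs would be the natural first refinement.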