An Efficient XML Query Processing Based on Combining T-Bitmap and Index Techniques
Yin-Fu Huang and Shing-Hang Wang
Graduate School of Computer Science and Information Engineering
National Yunlin University of Science and Technology, Taiwan
[email protected], [email protected]
Presenter: 劉芸如
2010/3/4
Outline


Introduction
Query Processing
  XML Storage Model
  Preprocessing
  Query Processing Algorithm
Experiments
  Experiments on XMark
  Experiments on Synthesis Documents
Conclusions
Introduction





According to their properties, XML query processing algorithms can be classified into two types: navigation and structure join.
Navigation traverses XML documents to find the answers.
Structure join relies on a numbering scheme to determine element relationships.
In the paper, for an XML document, we build a T-Bitmap for each element to filter out useless nodes.
In addition, two indices, called the tag index and the value index, are built to improve search efficiency when processing the ancestor-descendant axis and value nodes.
XML Storage Model





Here, we parse an XML document and build the relevant indices on top of a storage model with a DOM-like interface.
To facilitate query processing, the storage model keeps two kinds of information: the containing code and the T-Bitmap.
We assign each node a containing code in the storage model, as shown in Figure 2.
From the T-Bitmap value of a structure node, we can tell which descendant nodes lie beneath the node.
A T-Bitmap is a bit string in which each bit position corresponds to a distinct tag name.
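The bit-string idea can be illustrated with plain integers. The following is a minimal sketch, not the paper's implementation: the tag vocabulary, the bit order, and the helper names `tag_mask` and `may_contain` are all assumptions made for illustration.

```python
# Hypothetical tag vocabulary; the bit assigned to each tag is an assumption.
TAG_BITS = {"person": 0, "name": 1, "TEL": 2, "address": 3}


def tag_mask(tag: str) -> int:
    """Return the single-bit mask for a tag name."""
    return 1 << TAG_BITS[tag]


def may_contain(t_bitmap: int, tag: str) -> bool:
    """True if the bit for `tag` is set in the given T-Bitmap."""
    return bool(t_bitmap & tag_mask(tag))


# A T-Bitmap with the "name" and "TEL" bits set.
person_bitmap = tag_mask("name") | tag_mask("TEL")

print(format(person_bitmap, "04b"))           # 0110
print(may_contain(person_bitmap, "TEL"))      # True
print(may_contain(person_bitmap, "address"))  # False
```

A single AND against the mask answers whether a tag occurs in a subtree, so a subtree with no matching bit never has to be opened.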
Preprocessing




To assign a containing code to each node, we use depth-first search to traverse the XML document tree.
The next task is to label the T-Bitmap of each structure node.
We compute the T-Bitmap of a node by combining the T-Bitmaps of all its children.
Here, we use bit-wise "OR" operators to combine them, which yields the signatures of all descendant nodes.
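Both preprocessing steps fit in one depth-first pass. The sketch below makes several assumptions the slides do not confirm: the containing code is a (start, end) pair drawn from a single counter that ticks on entry and on exit, and each node's T-Bitmap includes its own tag bit so that OR-ing the children's bitmaps covers all descendants. The names `Node` and `preprocess` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list[Node] = field(default_factory=list)
    start: int = 0     # containing code, first half
    end: int = 0       # containing code, second half
    t_bitmap: int = 0  # bit string stored as an int


def preprocess(root: Node, tag_bits: dict[str, int]) -> None:
    """Assign (start, end) codes and T-Bitmaps in one depth-first pass."""
    counter = 0

    def visit(node: Node) -> None:
        nonlocal counter
        counter += 1
        node.start = counter
        # Assumption: a node's signature starts with its own tag bit.
        node.t_bitmap = 1 << tag_bits[node.tag]
        for child in node.children:
            visit(child)
            # Bit-wise OR folds the child's subtree signature into the parent.
            node.t_bitmap |= child.t_bitmap
        counter += 1
        node.end = counter

    visit(root)


tag_bits = {"people": 0, "person": 1, "name": 2, "TEL": 3}
root = Node("people", [Node("person", [Node("name"), Node("TEL")])])
preprocess(root, tag_bits)
person = root.children[0]

print((root.start, root.end), (person.start, person.end))  # (1, 8) (2, 7)
print(format(person.t_bitmap, "04b"))                      # 1110
```

With this numbering, a descendant's code always falls strictly inside its ancestor's code, which is what makes the (start, end) pair usable for ancestor-descendant tests.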




Two kinds of indices are built in the storage model: one is a tag index, and the other is a value index.
We use the well-known B+-tree algorithm to construct a tag index tree in which the start value of a containing code serves as the index key.
In the tag index tree for "TEL", an internal node has m keys and m+1 pointers to the next level, whereas the leaf nodes, with entries of the format [start, end, pointer], represent the multiple "TEL" tags within a document.
The (start, end) value is a containing code, and the corresponding pointer points to where the data is stored.
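To show how entries keyed on the start value support an ancestor-descendant lookup, here is a minimal sketch that swaps the B+-tree for a sorted list searched with `bisect`; it is a stand-in, not a B+-tree. The class name `TagIndex`, the method `descendants_of`, the string pointers, and the containment rule (a descendant's code lies strictly inside its ancestor's code) are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right


class TagIndex:
    """Entries (start, end, pointer) for one tag name, sorted by start."""

    def __init__(self, entries):
        self.entries = sorted(entries)
        self.starts = [entry[0] for entry in self.entries]

    def descendants_of(self, start, end):
        """Return entries whose containing code lies inside (start, end)."""
        lo = bisect_right(self.starts, start)  # first entry with key > start
        hi = bisect_left(self.starts, end)     # first entry with key >= end
        return [entry for entry in self.entries[lo:hi] if entry[1] < end]


# Hypothetical leaf entries for the "TEL" tag.
tel_index = TagIndex([(5, 6, "tel#1"), (11, 12, "tel#2"), (20, 21, "tel#3")])

# All "TEL" nodes beneath an ancestor whose containing code is (2, 13).
print(tel_index.descendants_of(2, 13))
# [(5, 6, 'tel#1'), (11, 12, 'tel#2')]
```

Because the keys are ordered by start value, the lookup is a range scan between two key positions; a B+-tree serves the same range scan from its linked leaf level.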
Query Processing Algorithm




FetchNode() fetches nodes according to the specified query.
OutputANS() outputs all the query results as an XML document.
It uses a stack to accelerate the output.
ReturnNode() checks the fetched nodes and returns them in sequence to OutputANS().
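The slides do not give the bodies of these functions, so the following does not reproduce FetchNode(). It is only a sketch of the filtering idea behind it: during navigation, a subtree whose T-Bitmap lacks the target tag's bit is skipped without being visited. The `Node` class, `TAG_BITS`, `find_descendants`, and the hand-written bitmaps are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical bit assignment for the tags in the example tree.
TAG_BITS = {"people": 0, "person": 1, "name": 2, "TEL": 3}


@dataclass
class Node:
    tag: str
    t_bitmap: int  # assumed to cover the node's own tag and all descendants
    children: list[Node] = field(default_factory=list)


def find_descendants(node: Node, target: str, visited: list[Node]) -> list[Node]:
    """Collect descendants with tag `target`, pruning subtrees by T-Bitmap."""
    mask = 1 << TAG_BITS[target]
    results = []
    for child in node.children:
        if not (child.t_bitmap & mask):
            continue  # no target tag anywhere in this subtree: skip it
        visited.append(child)
        if child.tag == target:
            results.append(child)
        results.extend(find_descendants(child, target, visited))
    return results


p1 = Node("person", 0b1110, [Node("name", 0b0100), Node("TEL", 0b1000)])
p2 = Node("person", 0b0110, [Node("name", 0b0100)])
root = Node("people", 0b1111, [p1, p2])

visited: list[Node] = []
hits = find_descendants(root, "TEL", visited)
print([n.tag for n in hits])     # ['TEL']
print([n.tag for n in visited])  # ['person', 'TEL']
```

Only two of the five nodes below the root are visited: the second "person" subtree and the "name" leaf are rejected by a single AND each.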
Figure 4. Algorithm structure
Figure 5. Navigation using T-Bitmap
Experiments on XMark
Experiments on Synthesis Documents
Figure 7. Execution time under different fanout
Conclusions



The algorithm is designed to reduce the number of accessed nodes, thereby improving query performance on XML documents.
To achieve this goal, a T-Bitmap, a tag index, and a value index are built for XML documents to filter out the nodes that do not contribute to the final results.
The experimental results show that our method performs better than the others.