An Efficient XML Query Processing Based on Combining T-Bitmap and Index Techniques
Yin-Fu Huang and Shing-Hang Wang
Graduate School of Computer Science and Information Engineering
National Yunlin University of Science and Technology, Taiwan
[email protected], [email protected]
Presenter: 劉芸如
2010/3/4

Outline
Introduction
Query Processing
  XML Storage Model
  Preprocessing
  Query Processing Algorithm
Experiments
  Experiments on XMark
  Experiments on Synthesis Documents
Conclusions

Introduction
According to their properties, XML query processing algorithms can be classified into two types: navigation and structure join. Navigation traverses the XML document to find the answers. Structure join relies on a numbering scheme to determine the relationships between elements.
In this paper, we build a T-Bitmap for each element of an XML document to filter out useless nodes. In addition, two indices, a tag index and a value index, are built to improve search efficiency when processing the ancestor-descendant axis and value nodes.

XML Storage Model
We parse an XML document and build the relevant indices on a storage model with a DOM-like interface. To facilitate query processing, the storage model keeps two kinds of information: the containing code and the T-Bitmap.
Each node in the storage model is assigned a containing code, as shown in Figure 2.
A T-Bitmap is a bit string in which each bit position corresponds to a distinct tag name. From the T-Bitmap value of a structure node, we can tell which descendant nodes lie beneath that node.

Preprocessing
To assign a containing code to each node, we traverse the XML document tree with a depth-first search.
To label the T-Bitmap of each structure node, we compute the T-Bitmap of a node by combining the T-Bitmaps of all its children. We combine them with the bit-wise "OR" operator and thereby obtain the signatures of all descendant nodes.
Two kinds of indices are built in the storage model: a tag index and a value index.
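
The labeling described above (a containing code from a depth-first traversal, and a T-Bitmap obtained by OR-ing the children's bitmaps) can be sketched roughly as follows. This is only an illustration under assumptions that are not in the slides: the containing code is taken to be a (start, end) pair drawn from one counter on entering and leaving a node, each node's bitmap is assumed to include its own tag bit, and the names label, may_contain, and is_ancestor are hypothetical.

```python
import xml.etree.ElementTree as ET


def label(root):
    """Return ({element: (start, end, tbitmap)}, {tag: bit}) for a parsed tree."""
    tag_bit = {}   # one bit position per distinct tag name
    labels = {}
    counter = 0

    def visit(elem):
        nonlocal counter
        counter += 1
        start = counter                      # assigned on entering the node
        # Assumption: a node's T-Bitmap includes its own tag bit.
        bitmap = tag_bit.setdefault(elem.tag, 1 << len(tag_bit))
        for child in elem:
            bitmap |= visit(child)           # bit-wise OR of the children's T-Bitmaps
        counter += 1                         # end value assigned on leaving the node
        labels[elem] = (start, counter, bitmap)
        return bitmap

    visit(root)
    return labels, tag_bit


def may_contain(bitmap, tag_bit, tag):
    """True if the subtree carrying this T-Bitmap can hold an element named tag."""
    return bool(bitmap & tag_bit.get(tag, 0))


def is_ancestor(a, d):
    """Containment test on two (start, end, ...) labels."""
    return a[0] < d[0] and d[1] < a[1]


doc = ET.fromstring(
    "<PERSON><NAME>Ann</NAME>"
    "<CONTACT><TEL>1</TEL><TEL>2</TEL></CONTACT></PERSON>"
)
labels, tag_bit = label(doc)
```

With such labels, a navigation step looking for "TEL" could skip the NAME subtree, because the "TEL" bit is not set in its bitmap, while the CONTACT subtree would still be visited.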
We use the well-known B+-tree algorithm to construct a tag index tree, with the start value of a containing code as the index key. In the tag index tree for "TEL", an internal node has m keys and m+1 pointers to the next level, whereas a leaf node, with entries in the format [start, end, pointer], represents the multiple "TEL" tags within a document. The (start, end) value is a containing code, and the corresponding pointer points to where the data is stored.

Query Processing Algorithm
FetchNode() fetches nodes according to the specified query.
OutputANS() outputs all the query results in an XML document; it uses a stack technique to accelerate the output.
ReturnNode() checks the fetched nodes and returns them in sequence to OutputANS().
Figure 4. Algorithm structure
Figure 5. Navigation using T-Bitmap

Experiments on XMark

Experiments on Synthesis Documents
Figure 7. Execution time under different fanout

Conclusions
The algorithm aims to reduce the number of accessed nodes and thereby improve query performance on XML documents. To achieve this, a T-Bitmap, a tag index, and a value index are built for XML documents to filter out the nodes that do not contribute to the final results. The experimental results show that our method performs better than the others.
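
As a supplementary illustration of the tag index, the sketch below shows how [start, end, pointer] entries keyed on the start value can answer an ancestor-descendant lookup. It is not the authors' implementation: a sorted list searched with Python's bisect module stands in for the leaf level of the B+-tree, and the class name TagIndex, the method descendants_of, and the sample entries are hypothetical.

```python
from bisect import bisect_left, bisect_right


class TagIndex:
    """Entries (start, end, pointer) for one tag name, sorted by start value.

    A sorted list replaces the B+-tree here; only the key ordering matters
    for the lookup being illustrated.
    """

    def __init__(self, entries):
        self.entries = sorted(entries)
        self.starts = [entry[0] for entry in self.entries]

    def descendants_of(self, start, end):
        """Return the entries whose start lies strictly inside (start, end).

        With properly nested containing codes, such entries are exactly the
        descendants of the node labeled (start, end).
        """
        lo = bisect_right(self.starts, start)
        hi = bisect_left(self.starts, end)
        return self.entries[lo:hi]


# Hypothetical "TEL" entries; the pointers are placeholders for stored data.
tel_index = TagIndex([(15, 16, "TEL#3"), (5, 6, "TEL#1"), (7, 8, "TEL#2")])
inside = tel_index.descendants_of(4, 9)
outside = tel_index.descendants_of(10, 14)
```

Because the entries are ordered by start value, all "TEL" descendants of a node form one contiguous range, so the ancestor-descendant axis can be answered with two key searches rather than a traversal of the subtree.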