Streaming Processing of Large XML Data
Jana Dvořáková, Filip Zavoral
• processing of large XML data using XSLT with optimal memory complexity
• formal model / implementation framework
• analyzer, SSXT / BUXT transformer
SSXT - streaming transducer
• Simple Streaming XML Transducer
• no backward axis, no predicates, no variables
• order-preserving
• branch-disjoint
• memory: stack bounded by document depth
• BUXT - Buffering Transducer
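
The "stack / document depth" point can be illustrated with a minimal sketch, assuming a plain SAX pass in Python rather than the SSXT implementation itself. The names `DepthTracker` and `scan` are hypothetical. The handler keeps only the path from the root to the current element, so its stack grows with nesting depth and not with the number of elements read.

```python
import xml.sax


class DepthTracker(xml.sax.ContentHandler):
    """Hypothetical handler: stores only the root-to-current-element path."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open elements, one entry per document level
        self.max_depth = 0   # largest stack size seen during the pass
        self.elements = 0    # total number of start-tags read

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.elements += 1
        self.max_depth = max(self.max_depth, len(self.stack))

    def endElement(self, name):
        # Leaving an element releases its stack entry.
        self.stack.pop()


def scan(xml_bytes):
    """Stream over the input once; return (element count, maximum depth)."""
    handler = DepthTracker()
    xml.sax.parseString(xml_bytes, handler)
    return handler.elements, handler.max_depth


# A flat document: many elements, small depth.
doc = b"<dblp>" + b"<article><title>t</title></article>" * 1000 + b"</dblp>"
print(scan(doc))  # (2001, 3)
```

The input has 2001 elements, yet the stack never holds more than three entries.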
Xord framework - Analyzer
• input: XSLT & XSD, virtually applies templates to schema
– all possible node sequences are processed
• regexp
– all possible node sequences selected by XPath expressions
– possible reading orders of the elements
• names
– sequence of element names in the order they are called
– represents the processing order of the elements
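
One way to read the two analyzer outputs is as an order check: the processing order of the elements should not run against their reading order. The sketch below is an assumption about how such a check could look, not the Xord analyzer. It simplifies the schema to a fixed sequence of child names instead of a regular expression, and the function name `is_order_preserving` is hypothetical.

```python
def is_order_preserving(schema_order, call_order):
    """Return True if the names in call_order never step backwards
    relative to schema_order.

    schema_order: child element names in the order the schema allows them
                  (simplifying assumption: a plain sequence, no choice or
                  repetition groups).
    call_order:   element names in the order the templates call them.
    """
    position = {name: i for i, name in enumerate(schema_order)}
    last = -1
    for name in call_order:
        idx = position.get(name)
        if idx is None:
            continue  # name not in the schema: the call selects nothing
        if idx <= last:
            return False  # would need an element that was already read
        last = idx
    return True


schema = ["title", "author", "year"]
print(is_order_preserving(schema, ["title", "year"]))  # True
print(is_order_preserving(schema, ["year", "title"]))  # False
```

Under this reading, a stylesheet that fails the check is a candidate for the buffering transducer rather than SSXT.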
SSXT Transformer
• Polymorphic stack
– two types of transformation states: DFA states and cycle configurations (CC)
– related to current document level
• sequence of deterministic finite automata states
– concurrent evaluation of XPath expressions
– single DFA for each expression
– start-tag → DFA transition
– final state → template call
• cycle configuration
– template and template call being processed
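
The DFA part of the stack can be sketched as follows. This is a minimal illustration under stated assumptions, not the SSXT transformer: it covers only absolute child-axis paths such as `/lib/book`, it leaves out cycle configurations, and the names `PathDFA`, `Transducer` and `transform` are hypothetical. Each stack entry holds one state per expression for the current document level; a start-tag triggers a transition in every DFA, and reaching a final state calls the matching template.

```python
import xml.sax

DEAD = -1  # state from which the expression can no longer match


class PathDFA:
    """DFA for one absolute child-axis path; state = number of steps matched."""

    def __init__(self, path):
        self.steps = path.strip("/").split("/")

    def step(self, state, name):
        if state != DEAD and state < len(self.steps) and self.steps[state] == name:
            return state + 1
        return DEAD

    def is_final(self, state):
        return state == len(self.steps)


class Transducer(xml.sax.ContentHandler):
    def __init__(self, templates):
        super().__init__()
        self.dfas = [PathDFA(path) for path in templates]
        self.calls = list(templates.values())
        # One tuple of DFA states per open document level.
        self.stack = [tuple(0 for _ in self.dfas)]
        self.output = []

    def startElement(self, name, attrs):
        # start-tag -> one transition in every DFA, evaluated side by side
        states = tuple(d.step(s, name) for d, s in zip(self.dfas, self.stack[-1]))
        self.stack.append(states)
        for dfa, state, call in zip(self.dfas, states, self.calls):
            if dfa.is_final(state):  # final state -> template call
                self.output.append(call(name, dict(attrs.items())))

    def endElement(self, name):
        self.stack.pop()  # return to the states of the parent level


def transform(xml_bytes, templates):
    handler = Transducer(templates)
    xml.sax.parseString(xml_bytes, handler)
    return handler.output


templates = {
    "/lib/book": lambda name, attrs: "book:" + attrs.get("id", "?"),
    "/lib/book/title": lambda name, attrs: "title",
}
doc = b'<lib><book id="1"><title>A</title></book><cd/><book id="2"><title>B</title></book></lib>'
print(transform(doc, templates))  # ['book:1', 'title', 'book:2', 'title']
```

Popping on each end-tag restores the states of the parent level, so no part of the document is revisited.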
Evaluation & Comparison
Figure: memory consumption (MB) of the SSXT algorithm and tree-based XSLT processors (Saxon, Xerces, LibXslt) for input XML data of different sizes
• chart 1: memory (MB) against input size in elements (10K, 30K, 100K, 300K, 1M, 10M)
• chart 2: DBLP.xml ≈ 700 MB
Future work
• buffering transformer optimizations and evaluation
• multipass streaming algorithms
• overcoming some restrictions to XSLT constructs