Streaming Processing of Large XML Data
Jana Dvořáková, Filip Zavoral
• processing of large XML data using XSLT with optimal memory complexity
• formal model / implementation framework
• analyzer, SSXT / BUXT transformer

SSXT – streaming transducer
• Simple Streaming XML Transducer
• no backward axis, no predicates, no variables
• order-preserving
• branch-disjoint
• memory: a stack proportional to the document depth
• BUXT – Buffering Transducer

Xord framework – Analyzer
• input: XSLT stylesheet & XSD schema
• virtually applies the templates to the schema, so all possible node sequences are processed
• regexp: all possible node sequences selected by the XPath expressions
  – possible reading orders of the element names
• sequence of element names in the order in which they are called
  – represents the processing order of the elements

SSXT Transformer
• polymorphic stack with two types of transformation states (DFA & CC), each related to the current document level
• sequence of deterministic finite automata (DFA) states
  – concurrent evaluation of XPath expressions
  – a single DFA for each expression
  – start-tag → DFA transition
  – final state → template call
• cycle configuration (CC)
  – template and template call being processed

Evaluation & Comparison
[Figure: memory consumption (MB) of the SSXT algorithm and the tree-based XSLT processors Saxon, Xerces and LibXslt for input XML data of different sizes (x-axis: number of elements, 10K to 10M); second panel: DBLP.xml (≈ 700 MB)]

Future work
• buffering transformer optimizations and evaluation
• multipass streaming algorithms
• overcoming some restrictions on XSLT constructs
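The Analyzer slide says Xord takes an XSLT stylesheet and an XSD schema and derives two orders: the possible reading orders of element names and the order in which templates call them. The slides do not spell out how the two are compared, so the Python sketch below is a hypothetical, heavily simplified illustration, not the Xord analyzer. It assumes that the comparison checks whether the processing order ever jumps backwards relative to the reading order (presumably the case a single forward pass cannot handle without buffering). It also replaces the regular expressions over all possible node sequences with one fixed xs:sequence, and assumes every xsl:apply-templates select is a plain child name. The function names `reading_order`, `processing_order` and `is_order_preserving` are illustrative only.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
XSL = "{http://www.w3.org/1999/XSL/Transform}"


def reading_order(xsd: str, parent: str) -> list[str]:
    """Child element names of `parent` in the order its xs:sequence declares them."""
    for el in ET.fromstring(xsd).iter(XS + "element"):
        if el.get("name") == parent:
            seq = el.find(f"{XS}complexType/{XS}sequence")
            if seq is None:
                raise ValueError(f"{parent!r} has no xs:sequence")
            return [c.get("name") for c in seq.findall(XS + "element")]
    raise KeyError(parent)


def processing_order(xslt: str, match: str) -> list[str]:
    """Select values of xsl:apply-templates in the template for `match`, in call order.

    Assumption: selects are plain child names and no conditionals are involved.
    """
    for tpl in ET.fromstring(xslt).iter(XSL + "template"):
        if tpl.get("match") == match:
            return [a.get("select") for a in tpl.iter(XSL + "apply-templates")]
    raise KeyError(match)


def is_order_preserving(reading: list[str], processing: list[str]) -> bool:
    """Toy check: True if the calls visit children strictly forward in reading order."""
    position = {name: i for i, name in enumerate(reading)}
    last = -1
    for name in processing:
        if name not in position:
            raise ValueError(f"{name!r} is not declared by the schema")
        if position[name] <= last:  # backward jump, or the same children selected twice
            return False
        last = position[name]
    return True


XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="article">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="author" maxOccurs="unbounded"/>
        <xs:element name="title"/>
        <xs:element name="year"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

XSLT = """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="article">
    <entry><xsl:apply-templates select="author"/><xsl:apply-templates select="year"/></entry>
  </xsl:template>
</xsl:stylesheet>"""

if __name__ == "__main__":
    reading = reading_order(XSD, "article")          # ['author', 'title', 'year']
    processing = processing_order(XSLT, "article")   # ['author', 'year']
    print(is_order_preserving(reading, processing))       # True
    print(is_order_preserving(reading, ["year", "title"]))  # False
```

Selecting `year` before `title` fails the check because `title` would already have been read by the time it is needed; in this toy model that is the situation a buffering transformer would have to cover.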
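The SSXT Transformer slide describes concurrent evaluation of XPath expressions with a single DFA per expression: a start-tag triggers a DFA transition, a final state triggers a template call, and the states live on a stack tied to the current document level. The Python sketch below illustrates only that DFA part under stated assumptions. Expressions are restricted absolute paths using `/`, `//` and `*` (no predicates or backward axes, in the spirit of the SSXT restrictions). Each DFA is built lazily by subset construction over the path's steps, and a "template call" is modelled by recording a hypothetical template name. The cycle configuration is not modelled. `parse_path`, `StreamingMatcher` and the template names are illustrative, not the authors' implementation. The standard-library `xml.sax` parser supplies the start-tag and end-tag events.

```python
import xml.sax


def parse_path(path: str) -> list[tuple[str, str]]:
    """Split a restricted absolute path such as '/a//b/*' into (axis, name-test) steps."""
    if not path.startswith("/"):
        raise ValueError(f"absolute path expected: {path!r}")
    steps, i = [], 0
    while i < len(path):
        axis = "desc" if path.startswith("//", i) else "child"
        i += 2 if axis == "desc" else 1
        j = path.find("/", i)
        j = len(path) if j == -1 else j
        name = path[i:j]
        # Reject empty steps, '.', '..', explicit axes, predicates and attributes.
        if not name or name in (".", "..") or any(c in name for c in "[]:@"):
            raise ValueError(f"unsupported step {name!r} in {path!r}")
        steps.append((axis, name))
        i = j
    return steps


class StreamingMatcher(xml.sax.ContentHandler):
    """Hypothetical SSXT-like matcher: one lazy DFA per path, one stack frame per open element."""

    def __init__(self, templates: dict[str, str]):
        super().__init__()
        self.templates = templates  # path -> hypothetical template name
        self.paths = list(templates)
        self.steps = [parse_path(p) for p in self.paths]
        self.cache: dict[tuple[int, frozenset, str], frozenset] = {}
        # Bottom frame: every DFA starts in the state "no step matched yet".
        self.stack = [tuple(frozenset({0}) for _ in self.paths)]
        self.calls: list[tuple[str, str]] = []
        self.max_depth = 0

    def _next(self, k: int, state: frozenset, name: str) -> frozenset:
        """DFA transition for expression k, built on demand by subset construction."""
        key = (k, state, name)
        if key not in self.cache:
            steps, nxt = self.steps[k], set()
            for pos in state:
                if pos == len(steps):
                    continue  # whole path already matched: no outgoing edges
                axis, test = steps[pos]
                if axis == "desc":
                    nxt.add(pos)  # '//' may skip over this element
                if test in ("*", name):
                    nxt.add(pos + 1)
            self.cache[key] = frozenset(nxt)
        return self.cache[key]

    def startElement(self, name, attrs):
        # Start-tag: advance every DFA from the parent's frame and push the result.
        frame = tuple(self._next(k, s, name) for k, s in enumerate(self.stack[-1]))
        self.stack.append(frame)
        self.max_depth = max(self.max_depth, len(self.stack) - 1)
        for k, state in enumerate(frame):
            if len(self.steps[k]) in state:  # final state reached: "call" the template
                self.calls.append((self.templates[self.paths[k]], name))

    def endElement(self, name):
        # End-tag: drop the frame of the element that just closed.
        self.stack.pop()


if __name__ == "__main__":
    matcher = StreamingMatcher({"/a/b": "T1", "//b": "T2", "/a/*/b": "T3"})
    xml.sax.parseString(b"<a><b><c/><b/></b><d><b/></d></a>", matcher)
    print(matcher.calls)
    print(matcher.max_depth)  # 3
```

Because every frame is popped at its end-tag, the stack never holds more frames than the document is deep (`max_depth`), and each frame stores one DFA state per expression. This reflects the stack-versus-document-depth point on the SSXT slide, independent of the overall input size.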