Natix - Al Akhawayn University

Natix
Presented by
Asmaa Hassanain
CSC 5370
Dr. Hachim Haddoutti
12/8/2003
Contents
XML Data Management Techniques
What is Natix
Natix Architecture
Storage Layer: Logical Data Model
Mapping between XML and the Logical Model
XML Page Interpreter Storage Format
XML Segment Mapping for Large Trees
Index Structures
Natix Physical Algebra
Example Plans
To do...
CSC 5370 XML and Data Management
2
XML Data Management Techniques
Data-centric view:
• Map data to a relational database
  But: large number of tables
• Unnormalized relations
• Store data as objects
  But: OOD systems are not developed enough to provide efficient querying capabilities
Document-centric view:
• Store data as a plain text file
  But: need to parse the entire file to process every query
• Store all information in a single data item (e.g. CLOB)
• Design native XML database systems from scratch
Natix
What is Natix?
• Natix is a native XML repository
• Proposed by Kanne and Moerkotte at the University of Mannheim (Germany)
• Natix requires Linux to run (kernel 2.2.16 or later, or 2.4.*), with CODA support enabled in the kernel
• Still under development
Natix Architecture
Natix Architecture
Binding Layer: maps between the Natix Engine Interface and the different application interfaces
Natix Architecture
e.g. NatixFS:
• File system interface: Natix can be mounted like an ordinary file system
• Allows an XML tree to be viewed as a file system tree
• Importing a document: just copy it to a directory, e.g. cp bib.xml /natix
• Exporting a document: just open it, e.g. more /natix/bib.xml
• Removing a document: just delete the file, e.g. rm /natix/bib.xml
• XPath expressions: just use them as file names, e.g. more /natix/{%%title}
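To make "view an XML tree as a file system tree" concrete, the Python sketch below lists a directory-style path for every element of a document. It does not talk to NatixFS: the /natix mount point comes from the slide, but the path scheme (including the [n] suffix for repeated sibling tags) is a hypothetical choice made only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tree_paths(xml_text: str, mount: str = "/natix") -> list[str]:
    """List a directory-style path for every element, in document order.

    Hypothetical naming scheme: siblings sharing a tag get a 1-based [n]
    suffix so each path is unique, like file names within one directory.
    """
    root = ET.fromstring(xml_text)
    paths: list[str] = []

    def walk(elem: ET.Element, path: str) -> None:
        paths.append(path)
        totals = Counter(child.tag for child in elem)  # how often each tag occurs here
        seen: Counter = Counter()
        for child in elem:
            seen[child.tag] += 1
            name = child.tag if totals[child.tag] == 1 else f"{child.tag}[{seen[child.tag]}]"
            walk(child, f"{path}/{name}")

    walk(root, f"{mount}/{root.tag}")
    return paths
```

For a bibliography with two books, each book becomes its own "directory" (/natix/bib/book[1], /natix/bib/book[2]) with a title "file" underneath.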
Natix Architecture
Service Layer: provides all DBMS functionality required in addition to simple storage and retrieval
• Natix Engine Interface
• Query execution engine
• Query compiler
• Transaction manager
• Object manager
Natix Architecture
Natix Engine Interface:
• The interface through which the database services communicate with each other and with applications
• Provides a unified facade for specifying requests to the database system
Natix Architecture
Query compiler: translates queries
expressed in XML query languages
into optimized query execution plans
Natix Architecture
Query execution engine: evaluates queries
• Interprets the plan passed on by the query compiler
• Able to execute all queries expressible in a typical XML query language such as XQuery
Natix Architecture
Transaction management: contains classes that provide ACID-style transactions, plus components for recovery
• Adapts the ARIES protocol for recovery
• For synchronization, an S2PL-based (strict two-phase locking) scheduler is introduced
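Strict two-phase locking means a transaction only acquires locks while it runs and releases all of them together at commit or abort. The Python sketch below shows that rule over opaque item ids; it is a generic S2PL lock table, not Natix's scheduler (which works on XML trees), and it raises LockConflict instead of blocking so it stays single-threaded. Undo and redo are left out, since that is the recovery protocol's job.

```python
from collections import defaultdict


class LockConflict(Exception):
    """Raised instead of blocking, so the sketch stays single-threaded."""


class S2PLScheduler:
    """Strict two-phase locking over opaque item ids (illustrative sketch).

    Locks are acquired only while a transaction is active (growing phase)
    and are all released together when it finishes (strictness).
    """

    def __init__(self) -> None:
        self.shared = defaultdict(set)  # item -> transactions holding an S lock
        self.exclusive = {}             # item -> transaction holding the X lock
        self.held = defaultdict(set)    # transaction -> items it has locked
        self.finished = set()           # transactions that committed or aborted

    def _check_active(self, txn: str) -> None:
        if txn in self.finished:
            raise RuntimeError(f"{txn} has finished and may not acquire new locks")

    def read_lock(self, txn: str, item: str) -> None:
        self._check_active(txn)
        owner = self.exclusive.get(item)
        if owner is not None and owner != txn:
            raise LockConflict(f"{txn} wants S on {item}, X held by {owner}")
        self.shared[item].add(txn)
        self.held[txn].add(item)

    def write_lock(self, txn: str, item: str) -> None:
        self._check_active(txn)
        owner = self.exclusive.get(item)
        other_readers = self.shared[item] - {txn}
        if (owner is not None and owner != txn) or other_readers:
            raise LockConflict(f"{txn} wants X on {item}, held by others")
        self.exclusive[item] = txn  # also covers an S-to-X upgrade
        self.held[txn].add(item)

    def finish(self, txn: str) -> None:
        """Commit or abort: release every lock of txn at once."""
        for item in self.held.pop(txn, set()):
            self.shared[item].discard(txn)
            if self.exclusive.get(item) == txn:
                del self.exclusive[item]
        self.finished.add(txn)
```

Holding every lock until the end is what prevents other transactions from reading uncommitted data, which keeps rollback during recovery simple.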
Natix Architecture
Storage Layer: manages all persistent data structures and their transfer between main and secondary memory
• Contains classes for efficient XML storage, indexes and metadata storage
• Manages the storage of the recovery log and controls the transfer of data between main and secondary storage
• Accesses raw disks or file system files and provides a memory space divided into segments, each of which is a linear collection of equal-sized pages
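Because all pages in a segment have the same size, locating a page is simple arithmetic on its number. The Python sketch below illustrates this; the io.BytesIO backing store and the default page size are assumptions standing in for a raw disk or file system file.

```python
import io


class Segment:
    """A linear collection of equal-sized pages over a byte store (sketch).

    The BytesIO backing store and the default page size are assumptions;
    the real storage layer sits on raw disks or file system files.
    """

    def __init__(self, store: io.BytesIO, page_size: int = 4096) -> None:
        self.store = store
        self.page_size = page_size

    def page_offset(self, page_no: int) -> int:
        # Equal-sized pages make a page's location pure arithmetic.
        return page_no * self.page_size

    def write_page(self, page_no: int, data: bytes) -> None:
        if len(data) > self.page_size:
            raise ValueError("data does not fit in one page")
        self.store.seek(self.page_offset(page_no))
        self.store.write(data.ljust(self.page_size, b"\0"))  # pad to a full page

    def read_page(self, page_no: int) -> bytes:
        self.store.seek(self.page_offset(page_no))
        return self.store.read(self.page_size)

    def page_count(self) -> int:
        end = self.store.seek(0, io.SEEK_END)  # seek returns the new position
        return end // self.page_size
```

Writing page 2 before page 1 leaves a zero-filled gap, so the segment stays a contiguous run of pages.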
Storage Layer: Logical Data Model
Logical Data Model: a logical tree
• New nodes can be inserted as children or siblings of existing nodes
• Any node can be removed
• Individual documents are represented as ordered trees
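The three operations of the logical model can be sketched in Python as an ordered tree whose children are kept in a list. The class and method names are hypothetical and only illustrate the operations listed on the slide, not Natix's actual interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity equality, so list.index/remove find *this* node
class Node:
    label: str
    parent: Optional[Node] = field(default=None, repr=False)
    children: list[Node] = field(default_factory=list)

    def insert_child(self, label: str, index: Optional[int] = None) -> Node:
        """Insert a new child at position `index` (default: as last child)."""
        node = Node(label, parent=self)
        if index is None:
            self.children.append(node)
        else:
            self.children.insert(index, node)
        return node

    def insert_sibling(self, label: str, after: bool = True) -> Node:
        """Insert a new sibling immediately after (or before) this node."""
        if self.parent is None:
            raise ValueError("the root node has no siblings")
        pos = self.parent.children.index(self)
        return self.parent.insert_child(label, pos + 1 if after else pos)

    def remove(self) -> None:
        """Detach this node, together with its whole subtree, from the tree."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

    def labels(self) -> list[str]:
        """Labels in document (preorder) order, to observe the ordering."""
        out = [self.label]
        for child in self.children:
            out.extend(child.labels())
        return out
```

Keeping children in a list is what makes the tree ordered: a sibling inserted "before" a node really ends up in front of it in document order.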
Mapping between XML and the
Logical Model
A small wrapper class maps the XML model, with its node types and attributes, to a simple tree model and vice versa:
• Elements are mapped one-to-one to tree nodes of the logical data model
• Attributes are mapped to child nodes of an additional attribute container child node
• The names of referenced entities are retained in special internal nodes
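The first two mapping rules can be sketched in Python with xml.etree.ElementTree, producing (label, children) tuples. The "#attributes" container label and the "@name" attribute labels are hypothetical, and text is simplified to leaf nodes labelled with the text itself. Entity nodes are not shown, because ElementTree resolves or rejects entity references while parsing.

```python
import xml.etree.ElementTree as ET

ATTR_CONTAINER = "#attributes"  # hypothetical label for the attribute container node


def to_logical(elem: ET.Element) -> tuple:
    """Map an element to a (label, children) tree.

    Text becomes leaf nodes labelled with the text itself (a simplification).
    """
    children = []
    if elem.attrib:
        # All attributes hang below one extra container child.
        attrs = [(f"@{name}", [(value, [])]) for name, value in elem.attrib.items()]
        children.append((ATTR_CONTAINER, attrs))
    if elem.text and elem.text.strip():
        children.append((elem.text.strip(), []))
    for child in elem:
        children.append(to_logical(child))  # elements map one-to-one to nodes
        if child.tail and child.tail.strip():
            children.append((child.tail.strip(), []))
    return (elem.tag, children)
```

Grouping attributes under one container keeps them apart from the ordered element children, so the logical tree itself needs no special attribute node type.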
XML Page Interpreter Storage Format
• The logical data tree is partitioned into subtrees
• Each subtree is stored in a single record of variable length
• Each record contains a pointer to the record containing the parent node, and the document identifier
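A record as described above can be sketched in Python as a fixed header (document id, parent record id, payload length) followed by a variable-length payload. The header layout, the use of -1 for "no parent record", and the opaque payload bytes are assumptions; the slides do not specify Natix's on-page format.

```python
import struct
from dataclasses import dataclass

# Assumed header: document id, parent record id (-1 = none), payload length.
HEADER = struct.Struct("<IqI")


@dataclass
class SubtreeRecord:
    doc_id: int
    parent_record: int  # record holding this subtree's parent node; -1 for the root record
    payload: bytes      # serialized subtree (format left abstract here)

    def encode(self) -> bytes:
        header = HEADER.pack(self.doc_id, self.parent_record, len(self.payload))
        return header + self.payload  # total length varies with the subtree

    @classmethod
    def decode(cls, raw: bytes) -> "SubtreeRecord":
        doc_id, parent, length = HEADER.unpack_from(raw)
        start = HEADER.size
        return cls(doc_id, parent, raw[start:start + length])
```

The explicit length field is what lets records of different sizes share a page.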
XML Page Interpreter Storage Format
• Subtrees of the original XML document are stored together in a single physical record
• Connected subtrees of the document tree are clustered into large records, and intra-record references are represented differently from inter-record references
• The inner structure of the subtrees is retained
XML Segment Mapping for Large Trees
Proxy nodes refer to connected subtrees not stored in the same record
Helper aggregate nodes group together a subset of the children of a node
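The role of proxy nodes can be illustrated with a simplified greedy partitioner in Python: once a node's children fit, the largest child subtree is moved into its own record and replaced by a proxy until the node fits the record capacity. Ordinary child lists act as intra-record references and Proxy objects as inter-record references. This is a sketch under a node-count capacity assumption, not Natix's actual split algorithm, and it raises an error where Natix would introduce helper aggregate nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Proxy:
    record_id: int  # inter-record reference: the subtree lives in another record


@dataclass
class Node:
    label: str
    children: list[Union[Node, Proxy]] = field(default_factory=list)


def size(n: Union[Node, Proxy]) -> int:
    """Nodes occupying the current record; a proxy counts as one node."""
    if isinstance(n, Proxy):
        return 1
    return 1 + sum(size(c) for c in n.children)


def partition(root: Node, capacity: int) -> dict[int, Node]:
    """Split the tree (in place) into records of at most `capacity` nodes.

    Record 0 holds the root. Simplified greedy strategy: after the children
    fit, move the largest child subtree behind a proxy until the node fits.
    """
    records: dict[int, Node] = {}

    def place(node: Node) -> None:
        for child in node.children:
            if isinstance(child, Node):
                place(child)  # children are made to fit before the parent is checked
        while size(node) > capacity:
            movable = [i for i, c in enumerate(node.children) if isinstance(c, Node)]
            if not movable:
                # Natix would group children under helper aggregate nodes here.
                raise ValueError(f"{node.label!r} has too many children for one record")
            i = max(movable, key=lambda j: size(node.children[j]))
            record_id = len(records) + 1
            records[record_id] = node.children[i]  # moved subtree becomes its own record
            node.children[i] = Proxy(record_id)

    place(root)
    records[0] = root
    return records
```

With a capacity of four nodes, a root a with children b(c, d, e) and f(g) ends up with b in its own record and a proxy in its place.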
Index Structures
Natix uses two index structures:
• Full-text index framework (inverted files): stores lists of document references to indicate in which documents search terms appear
  - Index: provides the main interface to the user; maps search terms to list identifiers and stores these mappings persistently
  - List Manager: maps the list identifiers to the actual lists (managing the directory of the inverted file)
  - FragmentedList: lists are divided into linked fragments that each fit on a page and can be traversed sequentially; manages all the fragments of one list and controls insertions and deletions on this list
  - ContextDescription: establishes the actual representation in which data is stored in a list
• eXtended Access Support Relation (XASR)
  - Preserves the parent/child, ancestor/descendant, and preceding/following relationships between nodes
  - Works together with inverted files: the XASR combined with a full-text index provides a powerful method to search on the contents of nodes
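How the two structures cooperate can be sketched in Python: an inverted file whose posting lists are split into page-sized fragments, plus XASR-style rows numbered in preorder and postorder, where node a is an ancestor of node d exactly when pre(a) < pre(d) and post(a) > post(d). The pre/post numbering scheme, the PAGE_CAPACITY constant, and all class and function names are assumptions for illustration. The real XASR also stores further columns such as the parent and document id, and the real Index and List Manager persist their mappings.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

PAGE_CAPACITY = 4  # hypothetical number of postings that fit on one page


class FragmentedList:
    """A posting list kept as a chain of page-sized fragments."""

    def __init__(self) -> None:
        self.fragments: list[list[int]] = [[]]

    def append(self, posting: int) -> None:
        if len(self.fragments[-1]) == PAGE_CAPACITY:
            self.fragments.append([])  # current page is full: start a new one
        self.fragments[-1].append(posting)

    def __iter__(self):
        for fragment in self.fragments:  # sequential traversal across fragments
            yield from fragment


class InvertedFile:
    def __init__(self) -> None:
        self.list_ids: dict[str, int] = {}          # "Index" role: term -> list id
        self.lists: dict[int, FragmentedList] = {}  # "List Manager" role: id -> list

    def add(self, term: str, posting: int) -> None:
        list_id = self.list_ids.setdefault(term, len(self.list_ids))
        self.lists.setdefault(list_id, FragmentedList()).append(posting)

    def lookup(self, term: str) -> list[int]:
        list_id = self.list_ids.get(term)
        return [] if list_id is None else list(self.lists[list_id])


@dataclass(frozen=True)
class XasrRow:
    pre: int   # preorder rank of the node
    post: int  # postorder rank of the node
    tag: str


def is_ancestor(a: XasrRow, d: XasrRow) -> bool:
    return a.pre < d.pre and a.post > d.post


def build(xml_text: str):
    """Number every element and index the words of its direct text."""
    rows: dict[int, XasrRow] = {}
    index = InvertedFile()
    counter = {"pre": 0, "post": 0}

    def walk(elem: ET.Element) -> None:
        pre = counter["pre"]
        counter["pre"] += 1
        for word in (elem.text or "").lower().split():
            index.add(word, pre)  # posting = the node's preorder rank
        for child in elem:
            walk(child)
        rows[pre] = XasrRow(pre, counter["post"], elem.tag)
        counter["post"] += 1

    walk(ET.fromstring(xml_text))
    return rows, index


def search_under(rows, index, term: str, ancestor_tag: str) -> list[int]:
    """Nodes whose text contains `term` and that lie below an `ancestor_tag` element."""
    anchors = [r for r in rows.values() if r.tag == ancestor_tag]
    hits = {rows[p] for p in index.lookup(term)}
    return sorted(h.pre for h in hits if any(is_ancestor(a, h) for a in anchors))
```

The inverted file answers "which nodes mention this word", and the pre/post comparison then filters those hits structurally without walking the document tree.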
Natix Physical Algebra
The ‘let’, ‘for’, ‘where’ and ‘return’ clauses of XQuery are supported
‘Select’, ‘map’, ‘join’, ‘grouping’ and ‘sort’ operations are performed by standard algebraic operators borrowed from the relational context
‘D-join’ (dependent join) and ‘unary and binary grouping’ are borrowed from the object-oriented context
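The two operators borrowed from the object-oriented context can be sketched in Python over tuples modelled as dicts. In a dependent join, the right input is re-evaluated for every left tuple. In binary grouping (shown here with equality as the grouping predicate), each left tuple is extended with an aggregate over its matching right tuples. The function names and the dict representation are assumptions; Natix's operators work on its own tuple and node representation.

```python
from typing import Callable, Iterable, Iterator

Row = dict  # a tuple is modelled as a dict from attribute name to value


def d_join(left: Iterable[Row], right: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    """Dependent join: the right input is re-evaluated for every left tuple."""
    for t in left:
        for u in right(t):
            yield {**t, **u}


def binary_grouping(left: Iterable[Row], right: Iterable[Row], left_attr: str,
                    right_attr: str, group_attr: str, agg=list) -> Iterator[Row]:
    """Attach to each left tuple the aggregate of its matching right tuples."""
    right = list(right)  # the right input is scanned once per left tuple
    for t in left:
        matches = [u for u in right if u[right_attr] == t[left_attr]]
        yield {**t, group_attr: agg(matches)}
```

Unlike a join, binary grouping keeps a left tuple even when nothing matches (its group is simply empty), which suits nested XML results such as an author element listing zero books.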
Natix Physical Algebra
Scan operations, e.g. ExpressionScan
ExpressionScan: generates a tuple containing the root of the document identified by its name, by evaluating a given expression
UnnestMap is used to generate variable bindings for XPath expressions, e.g. /a//b/c →
UnnestMap$4=child($3,c)(
  UnnestMap$3=desc($2,b)(
    UnnestMap$2=child($1,a)([$1])))
‘BA-Map’, ‘FL-Map’, ‘Groupify-GroupApply’ and ‘NGroupify-NGroupApply’ are used to construct the XML result
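The nested UnnestMap plan for /a//b/c can be mimicked in Python with generators over xml.etree.ElementTree nodes: each unnest_map call extends every incoming tuple with one new binding per node that its axis step returns, and the innermost operator consumes the initial tuple [$1]. The helper names are hypothetical, and a synthetic "#document" element stands in for the document node, since ElementTree has none.

```python
import xml.etree.ElementTree as ET


def child(node: ET.Element, tag: str) -> list[ET.Element]:
    return [c for c in node if c.tag == tag]


def desc(node: ET.Element, tag: str) -> list[ET.Element]:
    # Proper descendants with the given tag, in document order.
    return [d for d in node.iter(tag) if d is not node]


def unnest_map(tuples, new_var, step, from_var, tag):
    """UnnestMap$new=step($from,tag): one output tuple per node the step returns."""
    for t in tuples:
        for n in step(t[from_var], tag):
            yield {**t, new_var: n}


def eval_a_desc_b_c(xml_text: str) -> list[str]:
    """Evaluate /a//b/c with the nested UnnestMap plan."""
    doc = ET.Element("#document")  # stand-in for the document node bound to $1
    doc.append(ET.fromstring(xml_text))
    t1 = [{"$1": doc}]
    t2 = unnest_map(t1, "$2", child, "$1", "a")
    t3 = unnest_map(t2, "$3", desc, "$2", "b")
    t4 = unnest_map(t3, "$4", child, "$3", "c")
    return [t["$4"].text for t in t4]
```

Because the operators are generators, tuples flow through the plan one at a time, in the same pipelined way an iterator-based execution engine would evaluate it.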
Example Plans (1):
This query retrieves the
title and the year for all
recent books
Example Plans (2):
To do...
• Support for functions inside XPath expressions
• Support for importing DTDs (not possible as of now)
• Support for different character encodings
• Support for XML namespaces
Work is under way on the first full commercial end-user release of Natix, which may support all these features
Questions?
Thank You