BinX – A Tool for Binary File Access eDIKT project team Ted Wen


e-Science Data Information and Knowledge Transformation
BinX – A Tool for Binary File Access
eDIKT project team
Ted Wen [email protected]
Robert Carroll [email protected]
What is BinX?
• Binary in XML
  – Annotation language
    • Using XML
    • Descriptive
    • Low-level
  – Software components
    • BinX library
    • Generic utilities
    • API
www.edikt.org
How and Why BinX is used
[Diagram: a binary data file that only a special application program can read, contrasted with the same file described by a BinX document (<dataset> …… </dataset>) and read by ordinary application programs through the BinX library.]
The BinX Language
Annotating a binary data stream
Mark up data types
Mark up sequences
Mark up arrays
Complex structures
Primitive Data Types
• Mark up data types
1. FF 7F          <short-16 byteOrder="littleEndian">32767</short-16>
2. 7F FF FF FF    <integer-32 byteOrder="bigEndian">2147483647</integer-32>
3. 00 00 C8 42    <float-32 byteOrder="littleEndian">100.0</float-32>
4. 42 C8 00 00    <float-32 byteOrder="bigEndian">100.0</float-32>
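The four annotations above can be checked by decoding the same bytes by hand. The following is a minimal sketch using Python's standard `struct` module rather than the BinX library; the mapping from BinX element names to `struct` format codes is an assumption made for illustration.

```python
import struct

# Byte sequences from the slide, paired with a struct format:
# "<" = little-endian, ">" = big-endian, h = 16-bit int, i = 32-bit int, f = 32-bit float.
samples = [
    ("short-16, littleEndian", bytes.fromhex("FF7F"), "<h"),
    ("integer-32, bigEndian", bytes.fromhex("7FFFFFFF"), ">i"),
    ("float-32, littleEndian", bytes.fromhex("0000C842"), "<f"),
    ("float-32, bigEndian", bytes.fromhex("42C80000"), ">f"),
]

# Decode each byte sequence into a single Python value.
decoded = [struct.unpack(fmt, raw)[0] for _, raw, fmt in samples]

for (label, _, _), value in zip(samples, decoded):
    print(label, "->", value)
```

The same bytes read with the wrong byte order give a different number, which is why the `byteOrder` attribute is part of the annotation.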
Abstract “struct” types
• Mark up a sequence
Screen descriptor in GIF:
Screen width: unsigned short;
Screen height: unsigned short;
Packed field: a byte
Background colour index: byte
Pixel aspect ratio: byte
<struct>
<unsignedShort-16 />
<unsignedShort-16 />
<byte-8 />
<byte-8 />
<byte-8 />
</struct>
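As a rough illustration of what the struct above describes, the five fields can be read with Python's `struct` module. This is a sketch, not BinX library code: the sample bytes are hypothetical, the fields are read little-endian (GIF stores multi-byte integers least significant byte first), and `byte-8` is treated as unsigned, which is an assumption.

```python
import struct

# Hypothetical 7-byte screen descriptor: 640 x 480, packed field 0xF7,
# background colour index 0, pixel aspect ratio 0.
raw = bytes([0x80, 0x02, 0xE0, 0x01, 0xF7, 0x00, 0x00])

# "<" = little-endian, H = unsigned 16-bit, B = unsigned 8-bit.
# The format string lists the fields in the same order as the <struct>.
SCREEN_DESCRIPTOR = "<HHBBB"

width, height, packed, background, aspect = struct.unpack(SCREEN_DESCRIPTOR, raw)
print(width, height, hex(packed), background, aspect)
```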
Abstract “array” types
• Mark up an array
A 2-dimensional array containing 10-by-100 32-bit integers
<arrayFixed>
  <integer-32 />
  <dim indexTo="99">
    <dim indexTo="9" />
  </dim>
</arrayFixed>
Embedded abstract types
• Complex structures
<struct>
  <short-16 />
  <arrayFixed>
    <byte-8 />
    <dim indexTo="7" />
  </arrayFixed>
  <struct>
    <integer-32 />
    <float-32 />
    <double-64 />
  </struct>
</struct>
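Because a BinX description fixes the type and count of every field, the size of the described binary data follows directly from the markup. The sketch below walks a description with Python's `xml.etree.ElementTree`; it is not the BinX library, the function names are hypothetical, and it assumes that the numeric suffix of a primitive tag is its width in bits and that `indexTo="N"` means indices 0 to N.

```python
import xml.etree.ElementTree as ET

def primitive_size(tag):
    # "integer-32" -> 32 bits -> 4 bytes
    return int(tag.rsplit("-", 1)[1]) // 8

def dim_count(dim):
    # count="4" gives the length directly; indexTo="7" means indices 0..7.
    count = dim.get("count")
    own = int(count) if count is not None else int(dim.get("indexTo")) + 1
    nested = dim.find("dim")
    return own * (dim_count(nested) if nested is not None else 1)

def byte_size(elem):
    if elem.tag == "struct":
        return sum(byte_size(child) for child in elem)
    if elem.tag == "arrayFixed":
        element_type = next(c for c in elem if c.tag != "dim")
        return byte_size(element_type) * dim_count(elem.find("dim"))
    return primitive_size(elem.tag)

COMPLEX = """
<struct>
  <short-16 />
  <arrayFixed><byte-8 /><dim indexTo="7" /></arrayFixed>
  <struct><integer-32 /><float-32 /><double-64 /></struct>
</struct>
"""

ARRAY = """
<arrayFixed>
  <integer-32 />
  <dim indexTo="99"><dim indexTo="9" /></dim>
</arrayFixed>
"""

print(byte_size(ET.fromstring(COMPLEX)))  # 2 + 8 + (4 + 4 + 8)
print(byte_size(ET.fromstring(ARRAY)))    # 100 * 10 * 4
```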
User-defined metadata
• Label the data types and structures
<struct varName="Data Sample">
  <short-16 varName="ID" />
  <arrayFixed varName="List of 10 complex numbers">
    <struct varName="Complex">
      <float-32 varName="Real" />
      <float-32 varName="Imaginary" />
    </struct>
    <dim indexTo="9" />
  </arrayFixed>
</struct>
Reusable type definitions
• Define macros for reuse
<definitions>
  <defineType typeName="FourCC">
    <arrayFixed>
      <character-8 />
      <dim count="4" />
    </arrayFixed>
  </defineType>
</definitions>
<struct varName="Wave_Header">
  <useType typeName="FourCC" varName="Keyword" />
  <integer-32 varName="Chunk_Size" />
</struct>
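One way to picture the macro mechanism is as textual expansion: each `useType` is replaced by a copy of the body of the matching `defineType`. The sketch below does this with `xml.etree.ElementTree`. It is an illustration under assumptions, not how the BinX library is known to work: the `expand` helper is hypothetical, the enclosing `<binx>` root is added only so the fragment parses as one document, and carrying the `varName` of the `useType` over to the expanded element is a guess at the intended semantics.

```python
import xml.etree.ElementTree as ET

DOC = """
<binx>
  <definitions>
    <defineType typeName="FourCC">
      <arrayFixed>
        <character-8 />
        <dim count="4" />
      </arrayFixed>
    </defineType>
  </definitions>
  <struct varName="Wave_Header">
    <useType typeName="FourCC" varName="Keyword" />
    <integer-32 varName="Chunk_Size" />
  </struct>
</binx>
"""

def expand(elem, types):
    """Return a copy of elem in which every useType is replaced by its definition."""
    if elem.tag == "useType":
        body = expand(types[elem.get("typeName")], types)
        if elem.get("varName"):
            body.set("varName", elem.get("varName"))
        return body
    clone = ET.Element(elem.tag, dict(elem.attrib))
    clone.extend([expand(child, types) for child in elem])
    return clone

root = ET.fromstring(DOC)
# Map each type name to the single element inside its defineType.
types = {d.get("typeName"): d[0] for d in root.find("definitions")}
header = expand(root.find("struct"), types)
print(ET.tostring(header, encoding="unicode"))
```

The definitions themselves are left untouched, so the same `FourCC` type can be expanded any number of times under different variable names.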
Linking to binary data
• Reference the binary data file
<definitions>
  <defineType typeName="Header">… …</defineType>
  <defineType typeName="Format_Chunk">… …</defineType>
  <defineType typeName="Data_Chunk">… …</defineType>
</definitions>
<dataset src="myfile.wav">
  <useType typeName="Header" />
  <useType typeName="Format_Chunk" />
  <useType typeName="Data_Chunk" />
</dataset>
A BinX document
<binx byteOrder="bigEndian">            <!-- root element -->
  <definitions>                         <!-- data class section -->
    <defineType typeName="myTyp">       <!-- abstract data type -->
      <arrayFixed>
        <character-8/>
        <dim indexTo="9"/>
      </arrayFixed>
    </defineType>
  </definitions>
  <dataset src="myfile.bin">            <!-- data instance section -->
    <useType typeName="myTyp"/>
    <integer-32 varName="X" />
  </dataset>
</binx>
DataBinX
DataBinX = BinX with Data
BinX description of the binary file:
<dataset src="myfile.bin">
  <struct>
    <short-16 />
    <long-64 />
    <double-64 />
  </struct>
  <arrayFixed>
    <integer-32 />
    <dim count="2" />
  </arrayFixed>
</dataset>
Equivalent DataBinX document, with the values inline:
<dataset>
  <struct>
    <short-16>100</short-16>
    <long-64>1000</long-64>
    <double-64>5.257</double-64>
  </struct>
  <arrayFixed>
    <dim>
      <integer-32>1</integer-32>
    </dim>
    <dim>
      <integer-32>2</integer-32>
    </dim>
  </arrayFixed>
</dataset>
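The step from a BinX description plus binary data to DataBinX can be sketched in a few lines of Python. This is not the DataBinX utility: the `fill` function and the `FORMATS` table are hypothetical, only the four primitive types shown on the slide are handled, big-endian byte order is assumed, and the sample bytes stand in for `myfile.bin`.

```python
import io
import struct
import xml.etree.ElementTree as ET

# struct format characters for the primitive tags used on the slide (assumed mapping)
FORMATS = {"short-16": "h", "integer-32": "i", "long-64": "q", "double-64": "d"}

BINX = """
<dataset src="myfile.bin">
  <struct>
    <short-16 />
    <long-64 />
    <double-64 />
  </struct>
  <arrayFixed>
    <integer-32 />
    <dim count="2" />
  </arrayFixed>
</dataset>
"""

def fill(desc, stream, order=">"):
    """Copy a description element, reading its values from the stream."""
    out = ET.Element(desc.tag)
    if desc.tag in FORMATS:
        fmt = order + FORMATS[desc.tag]
        (value,) = struct.unpack(fmt, stream.read(struct.calcsize(fmt)))
        out.text = str(value)
    elif desc.tag == "arrayFixed":
        element_type = next(c for c in desc if c.tag != "dim")
        # One <dim> wrapper per array element, as in the DataBinX sample.
        for _ in range(int(desc.find("dim").get("count"))):
            ET.SubElement(out, "dim").append(fill(element_type, stream, order))
    else:  # struct or dataset: read the children in document order
        for child in desc:
            out.append(fill(child, stream, order))
    return out

# Stand-in for myfile.bin: the values from the slide, big-endian, no padding.
binary = struct.pack(">hqd", 100, 1000, 5.257) + struct.pack(">2i", 1, 2)
databinx = fill(ET.fromstring(BINX), io.BytesIO(binary))
print(ET.tostring(databinx, encoding="unicode"))
```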
The BinX Library
Core library
Utilities
Applications
BinX Components
• The library has core functionality to support generic utilities and applications
• BinX core functionality
  – Parse/Gen BinX doc
  – Read/write binary data
  – Parse/Gen DataBinX
• Generic tools
  – DataBinX pack/unpack
  – Extractor, Viewer
  – BinX editor
• Applications
  – Domain-specific
[Diagram: layers of the BinX Library: Core, Utilities, Applications.]
BinX application models
• Data catalogue model
• Data manipulation model
• Data query model
• Data service model
• Data transportation model
Data catalogue model
• Primary storage
  – Binary data files
• Metadata
  – Syntactic annotation
  – Semantic annotation
• Classification
  – Domain specific
• Cross-reference
  – XLink
[Diagram: a hierarchy of BinX documents numbered 1, 1.1, 1.2, 1.2.1, 1.2.2 and 1.2.3, running from abstract METADATA at the top to detailed descriptions of BINARY data files at the bottom.]
Data manipulation model
• Extraction
  – Subset of a dataset
• Combination
  – Merge several datasets
• Transformation
  – Conversion of data types
  – Change of sequence order
  – Transposition of array dimensions
• Transparency
  – Automatic change of byte order
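Two of the operations listed above, change of byte order and transposition of array dimensions, can be shown on a small array. This is a sketch using Python's `struct` module on a hypothetical 2-by-3 array stored row by row; it does not use the BinX library.

```python
import struct

# A hypothetical 2 x 3 array of 32-bit integers stored big-endian, row by row.
rows, cols = 2, 3
big = struct.pack(">6i", 1, 2, 3, 4, 5, 6)

values = struct.unpack(f">{rows * cols}i", big)

# Change of byte order: same values, little-endian encoding.
little = struct.pack(f"<{rows * cols}i", *values)

# Transposition of array dimensions: element (r, c) moves to (c, r),
# so the result is a 3 x 2 array stored row by row.
transposed = [values[r * cols + c] for c in range(cols) for r in range(rows)]

print(big[:4].hex(), little[:4].hex(), transposed)
```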
Data query model
• In-dataset query
  – XPath against virtual XML
• Cross-dataset query
  – Link into multiple datasets
• Defining result format
  – XQuery-based return
• Output interface
  – SAX events
[Diagram: BinX documents and XLink references point to binary data sources; a utility built on the BinX library evaluates an XPath to select a fragment, transforms it with XQuery, and delivers SAX events in DataBinX, VOTable or a custom format to the corresponding applications (APP).]
Data service model
• Publishing logical datasets in BinX
[Diagram: a client reaches BinX-described datasets over the Grid: a dataset from multiple data sources (including a DB), a dataset from several binary files, and a dataset from one binary file.]
Data transportation model
DataBinX as interlingua
[Diagram: send and receive pipelines. Components shown: XML document, XSLT, DataBinX, BinX utility, BinX schema + BinX binary, ZIP tool, ZIP (MIME) package. Labels: Send, Receive.]
Application in Astronomy
Case Study: Data Conversion Between FITS and VOTable
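Application in astronomy
• FITS and VOTable conversion
[Diagram: a FITS file (SIMPLE = T …… END, followed by binary data) and a VOTable document (<?xml version=. <VOTABLE> …… </VOTABLE>) are converted into each other by the DataBinX utility, which is built on the BinX library core.]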
FITS file
Primary HDU
  Header (card columns 0 to 79)
    SIMPLE  = T / file does conform to FITS standard
    BITPIX  = 8 / number of bits per data pixel
    NAXIS   = 1 / number of data axes
    ……
    END
  Data
    3D 4A 14 0F 1C FE 25 04 … …
Extension
  Header
    XTENSION= 'BINTABLE' / binary table extension
    BITPIX  = 8 / 8-bit bytes
    NAXIS   = 2 / 2-dimensional binary table
    ……
    END
  Data
    7B 3E 40 2C 16 70 E7 6F … …
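The header layout above is regular enough to parse by position: each header line is an 80-character card with the keyword in the first eight columns, followed by `= `, the value, and an optional comment after `/`. The sketch below builds and parses such cards in plain Python. It is a simplification for illustration: the `card` and `parse_header` helpers are hypothetical, string values that contain `/` are not handled, and real FITS headers are additionally padded to blocks of 2880 bytes.

```python
def card(keyword, value, comment):
    """Build one 80-character header card with the value right-justified to column 30."""
    return f"{keyword:<8}= {value:>20} / {comment}".ljust(80)

def parse_header(text):
    """Split a header into 80-character cards and return {keyword: value} up to END."""
    fields = {}
    for start in range(0, len(text), 80):
        line = text[start:start + 80]
        keyword = line[:8].strip()
        if keyword == "END":
            break
        if line[8:10] == "= ":
            # Everything after the first "/" is treated as a comment (simplification).
            fields[keyword] = line[10:].split("/", 1)[0].strip()
    return fields

# The primary header cards shown on the slide.
header = (
    card("SIMPLE", "T", "file does conform to FITS standard")
    + card("BITPIX", "8", "number of bits per data pixel")
    + card("NAXIS", "1", "number of data axes")
    + "END".ljust(80)
)

fields = parse_header(header)
print(fields)
```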
VOTable
<VOTABLE>
<RESOURCE>
<PARAM name="Obs" value="Bob"/>
<TABLE name="Stars">
<FIELD name="Star-name" datatype="char" arraysize="10" />
<FIELD name="RA" datatype="float" />
<FIELD name="Dec" datatype="float" />
<FIELD name="Counts" datatype="int" arraysize="2x3x*" />
<DATA>
<TABLEDATA>
<TR>
<TD>Procyon</TD><TD>114.827</TD><TD>5.227</TD>
<TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
</TR>
</TABLEDATA>
</DATA>
</TABLE>
</RESOURCE>
</VOTABLE>
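To make the table structure concrete, the sample above can be read with Python's `xml.etree.ElementTree`, pairing each `TD` with the `FIELD` that declares it. This is a sketch over the slide's simplified, namespace-free VOTable, not a general VOTable reader. The `arraysize="2x3x*"` declaration fixes the first two dimensions and leaves the last one open, so its length is derived from the number of values in the cell.

```python
import math
import xml.etree.ElementTree as ET

VOTABLE = """
<VOTABLE>
  <RESOURCE>
    <PARAM name="Obs" value="Bob"/>
    <TABLE name="Stars">
      <FIELD name="Star-name" datatype="char" arraysize="10" />
      <FIELD name="RA" datatype="float" />
      <FIELD name="Dec" datatype="float" />
      <FIELD name="Counts" datatype="int" arraysize="2x3x*" />
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>Procyon</TD><TD>114.827</TD><TD>5.227</TD>
            <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

table = ET.fromstring(VOTABLE).find("RESOURCE/TABLE")
fields = table.findall("FIELD")
names = [f.get("name") for f in fields]

# One dict per table row, keyed by field name.
rows = [
    dict(zip(names, [td.text for td in tr.findall("TD")]))
    for tr in table.findall("DATA/TABLEDATA/TR")
]

counts = [int(v) for v in rows[0]["Counts"].split()]

# "2x3x*": multiply the fixed dimensions, then derive the open one.
arraysize = fields[3].get("arraysize")
fixed = math.prod(int(d) for d in arraysize.split("x")[:-1])
last_dim = len(counts) // fixed

print(rows[0]["Star-name"], float(rows[0]["RA"]), float(rows[0]["Dec"]), last_dim)
```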
FITS → DataBinX → VOTable
• FITS to VOTable conversion
[Diagram: pipeline components: FITS, Preprocessor, BinX schema, DataBinX utility, DataBinX, XSLT transformer with an XSLT stylesheet, VOTable.]
VOTable → DataBinX → FITS
• VOTable to FITS conversion
[Diagram: pipeline components: VOTable, XSLT transformer with an XSLT stylesheet, DataBinX, DataBinX utility, BinX schema, binary data, FITS header, Postprocessor, FITS.]
BinX Software
Software library in C++
Documentation
Utilities and Samples
Future releases
• XPath-based data query
• DFDL support
• Output through SAX events
• Output as XQuery return
• Database interfacing
• Java wrapper for utilities
Support
• Information and software download:
  – http://www.edikt.org/binx
• Questions:
  – [email protected]
• Requirements and suggestions:
  – [email protected], [email protected]