BinX – A Tool for Binary File Access

e-Science Data Information and Knowledge Transformation
eDIKT project team
Ted Wen
Robert Carroll
Agenda
• About the BinX project
• Introduction to the BinX language
• Introduction to the BinX library
• Example application
• Overview of the BinX API
• Discussion
www.edikt.org
The problem
• Most scientific data are stored in binary files
• Binary data file formats are not all standardized
• Binary data files are platform-dependent
• XML is useful for representing metadata
• Scientific datasets can be too large to store in XML
What is BinX?
• Binary in XML
  – Annotation language
    • Using XML
    • Descriptive
    • Low-level
  – Software components
    • BinX library
    • Generic utilities
    • API
How and Why BinX is used
[Diagram: binary data read directly by a special application program, contrasted with binary data plus a BinX description (<dataset> …… </dataset>) read by several application programs through the BinX library]
The BinX Language
Annotating a binary data stream
Mark up data types
Mark up sequences
Mark up arrays
Complex structures
Data elements
• Primitive data elements
  – Byte, character, integer, real
• Complex data elements
  – Arrays, struct, union
• User-defined data elements
Primitive Data Types
• Character
  – <character-8>
  – <string> (fixed length, variable length and delimited)
• Integer
  – <byte-8>
  – <short-16>, <unsignedShort-16>
  – <integer-32>, <unsignedInteger-32>
  – <long-64>, <unsignedLong-64>
• Real
  – <float-32>
  – <double-64>
  – <quadruple-128>
Primitive Data Types
• Mark up data types
   Bytes in file   Markup
1. FF 7F           <short-16 byteOrder="littleEndian">32767</short-16>
2. 7F FF FF FF     <integer-32 byteOrder="bigEndian">2147483647</integer-32>
3. 00 00 C8 42     <float-32 byteOrder="littleEndian">100.0</float-32>
4. 42 C8 00 00     <float-32 byteOrder="bigEndian">100.0</float-32>
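The four byte sequences above can be checked by hand. The following is a minimal C++ sketch of byte-order-aware decoding; it does not use the BinX library, and the names `ByteOrder`, `decodeUnsigned`, `decodeShort16`, `decodeInteger32` and `decodeFloat32` are hypothetical helpers introduced only for illustration. It assumes that `float` is a 32-bit IEEE 754 value on the host platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class ByteOrder { LittleEndian, BigEndian };

// Assemble an unsigned integer of `size` bytes (1 to 8) from raw bytes,
// independent of the byte order of the machine running the code.
std::uint64_t decodeUnsigned(const unsigned char* bytes, std::size_t size,
                             ByteOrder order) {
    assert(size >= 1 && size <= 8);
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < size; ++i) {
        // Visit the bytes from most significant to least significant.
        const std::size_t index =
            (order == ByteOrder::BigEndian) ? i : size - 1 - i;
        value = (value << 8) | bytes[index];
    }
    return value;
}

std::int16_t decodeShort16(const unsigned char* bytes, ByteOrder order) {
    return static_cast<std::int16_t>(decodeUnsigned(bytes, 2, order));
}

std::int32_t decodeInteger32(const unsigned char* bytes, ByteOrder order) {
    return static_cast<std::int32_t>(decodeUnsigned(bytes, 4, order));
}

// Assumption: float is a 32-bit IEEE 754 value on this platform.
float decodeFloat32(const unsigned char* bytes, ByteOrder order) {
    static_assert(sizeof(float) == 4, "float must be 32 bits");
    const std::uint32_t bits =
        static_cast<std::uint32_t>(decodeUnsigned(bytes, 4, order));
    float result;
    std::memcpy(&result, &bits, sizeof result);  // reinterpret the bit pattern
    return result;
}
```

Calling `decodeShort16` on the bytes `FF 7F` with `ByteOrder::LittleEndian` yields 32767, and both float sequences decode to 100.0 when the matching byte order is passed, which is the information the `byteOrder` attribute carries in the markup.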
Abstract “struct” types
• Mark up a sequence
Screen descriptor in GIF:
  Screen width: unsigned short
  Screen height: unsigned short
  Packed field: byte
  Background colour index: byte
  Pixel aspect ratio: byte
<struct>
  <unsignedShort-16 />
  <unsignedShort-16 />
  <byte-8 />
  <byte-8 />
  <byte-8 />
</struct>
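To make the correspondence between the markup and the bytes concrete, here is a minimal C++ sketch that decodes the same five fields from a 7-byte buffer. It does not use the BinX library; `ScreenDescriptor`, `readU16LittleEndian` and `parseScreenDescriptor` are hypothetical names chosen for this illustration. GIF stores multi-byte integers least significant byte first, so the two unsigned shorts are read as little-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Field-for-field counterpart of the <struct> above (names are illustrative).
struct ScreenDescriptor {
    std::uint16_t width;
    std::uint16_t height;
    std::uint8_t packed;
    std::uint8_t backgroundColourIndex;
    std::uint8_t pixelAspectRatio;
};

// Size of the descriptor in the file: 2 + 2 + 1 + 1 + 1 bytes.
constexpr std::size_t kScreenDescriptorSize = 7;

// Read a 16-bit unsigned integer stored least significant byte first.
inline std::uint16_t readU16LittleEndian(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Decode field by field rather than casting the buffer to the struct:
// the compiler may pad ScreenDescriptor, so sizeof need not equal 7.
ScreenDescriptor parseScreenDescriptor(const unsigned char* bytes,
                                       std::size_t length) {
    assert(length >= kScreenDescriptorSize);
    ScreenDescriptor d;
    d.width = readU16LittleEndian(bytes);
    d.height = readU16LittleEndian(bytes + 2);
    d.packed = bytes[4];
    d.backgroundColourIndex = bytes[5];
    d.pixelAspectRatio = bytes[6];
    return d;
}
```

For example, the bytes `80 02 E0 01 F7 00 00` decode to a width of 640 and a height of 480. A BinX description plays the role of the hand-written offsets here: the field order and sizes come from the XML instead of being hard-coded.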
Abstract “array” types
• Mark up an array
A 2-dimensional array containing 10-by-100 32-bit integers
<arrayFixed>
  <integer-32 />
  <dim indexTo="99">
    <dim indexTo="9" />
  </dim>
</arrayFixed>
Embedded abstract types
• Complex structures
<struct>
  <short-16 />
  <arrayFixed>
    <byte-8 />
    <dim indexTo="7" />
  </arrayFixed>
  <struct>
    <integer-32 />
    <float-32 />
    <double-64 />
  </struct>
</struct>
User-defined metadata
• Label the data types and structures
<struct varName="Data Sample">
  <short-16 varName="ID" />
  <arrayFixed varName="List of 10 complex numbers">
    <struct varName="Complex">
      <float-32 varName="Real" />
      <float-32 varName="Imaginary" />
    </struct>
    <dim indexTo="9" />
  </arrayFixed>
</struct>
Reusable type definitions
• Define macros for reuse
<definitions>
  <defineType typeName="FourCC">
    <arrayFixed>
      <character-8 />
      <dim count="4" />
    </arrayFixed>
  </defineType>
</definitions>
<struct varName="Wave_Header">
  <useType typeName="FourCC" varName="Keyword" />
  <integer-32 varName="Chunk_Size" />
</struct>
Linking to binary data
• Reference the binary data file
<definitions>
  <defineType typeName="Header">… …</defineType>
  <defineType typeName="Format_Chunk">… …</defineType>
  <defineType typeName="Data_Chunk">… …</defineType>
</definitions>
<dataset src="myfile.wav">
  <useType typeName="Header" />
  <useType typeName="Format_Chunk" />
  <useType typeName="Data_Chunk" />
</dataset>
The BinX document
<?xml version="1.0"?>
<binx xmlns="http://www.edikt.org/binx">
  <dataset src="binary.bin" byteOrder="littleEndian">
    <short-16/>
    <integer-32/>
    <double-64/>
  </dataset>
</binx>
A BinX document
<binx byteOrder="bigEndian">          <!-- root element -->
  <definitions>                       <!-- data class section -->
    <defineType typeName="myTyp">     <!-- abstract data type -->
      <arrayFixed>
        <character-8/>
        <dim indexTo="9"/>
      </arrayFixed>
    </defineType>
  </definitions>
  <dataset src="myfile.bin">          <!-- data instance section -->
    <useType typeName="myTyp"/>
    <integer-32 varName="X" />
  </dataset>
</binx>
DataBinX
DataBinX = BinX with Data
BinX document:
<dataset src="myfile.bin">
  <struct>
    <short-16 />
    <long-64 />
    <double-64 />
  </struct>
  <arrayFixed>
    <integer-32 />
    <dim count="2" />
  </arrayFixed>
</dataset>
Corresponding DataBinX document:
<dataset>
  <struct>
    <short-16>100</short-16>
    <long-64>1000</long-64>
    <double-64>5.257</double-64>
  </struct>
  <arrayFixed>
    <dim>
      <integer-32>1</integer-32>
    </dim>
    <dim>
      <integer-32>2</integer-32>
    </dim>
  </arrayFixed>
</dataset>
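The step from a BinX description plus binary values to DataBinX amounts to writing each decoded value as the text content of an element named after its type. The sketch below shows that idea for the struct above in plain C++. It is not the BinX library's generator; `element` and `structToDataBinX` are hypothetical names, and the output is written on one line without indentation.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Wrap a value in an element named after its BinX type,
// e.g. element("short-16", 100) gives <short-16>100</short-16>.
template <typename T>
std::string element(const std::string& typeName, const T& value) {
    assert(!typeName.empty());
    std::ostringstream out;
    out << '<' << typeName << '>' << value << "</" << typeName << '>';
    return out.str();
}

// Emit the <struct> part of the DataBinX example for three decoded values.
// Doubles use the stream's default formatting (6 significant digits).
std::string structToDataBinX(std::int16_t s, std::int64_t l, double d) {
    std::ostringstream out;
    out << "<struct>"
        << element("short-16", s)
        << element("long-64", l)
        << element("double-64", d)
        << "</struct>";
    return out.str();
}
```

With the values 100, 1000 and 5.257 this produces the same struct element as the DataBinX document above. A real converter would need a higher output precision for doubles so that values survive a round trip back to binary.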
The BinX Library
Core library
Utilities
Applications
Output from the library
• DataBinX: combined data and BinX document
• SchemaBinX
• Binary data stream
DataBinX = SchemaBinX + Binary data
BinX Components
• The library has core functionality to support generic utilities and applications
[Diagram: layers of the BinX library]
  Core: BinX core functionality (parse/generate BinX documents, read/write binary data, parse/generate DataBinX)
  Utilities: generic tools (DataBinX pack/unpack, extractor)
  Applications: domain-specific
BinX application models
• Data manipulation model
• Data transportation model
• Data service model
• Data query model
• Data catalogue model
Data manipulation model
• Extraction
  – Subset of a dataset
• Combination
  – Merge several datasets
• Transformation
  – Conversion of data types
  – Change of sequence order
  – Transposition of array dimensions
• Transparency
  – Automatic change of byte order
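Two of the operations listed above, the change of byte order and the transposition of array dimensions, can be sketched in a few lines of standard C++. This is an illustration of the operations themselves, not of how the BinX library implements them; `swapByteOrder` and `transpose` are hypothetical names, and the array is assumed to be stored row by row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Reverse the bytes of each fixed-size element in place, e.g. to turn
// big-endian 32-bit values into little-endian ones (and vice versa).
void swapByteOrder(std::vector<unsigned char>& data, std::size_t elementSize) {
    assert(elementSize > 0 && data.size() % elementSize == 0);
    for (std::size_t start = 0; start < data.size(); start += elementSize) {
        std::size_t lo = start;
        std::size_t hi = start + elementSize - 1;
        while (lo < hi) {
            std::swap(data[lo++], data[hi--]);
        }
    }
}

// Transpose a rows-by-cols array stored row by row into a
// cols-by-rows array, also stored row by row.
std::vector<std::int32_t> transpose(const std::vector<std::int32_t>& in,
                                    std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    std::vector<std::int32_t> out(in.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            out[c * rows + r] = in[r * cols + c];
        }
    }
    return out;
}
```

Applying `swapByteOrder` with an element size of 4 to the bytes `42 C8 00 00` gives `00 00 C8 42`, the little-endian form of the same 32-bit float. Because the element size comes from the data description, a tool that knows the BinX types can apply such a swap without the application having to ask for it.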
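Data transportation model
DataBinX as interlingua
[Diagram: BinX schema + binary data are combined by the BinX utility into DataBinX; XSLT maps between DataBinX and an XML document; a ZIP tool (MIME) packs the data for sending, and on the receiving side a ZIP tool, XSLT and the BinX utility are used again]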
Data service model
• Publishing logical datasets in BinX
[Diagram: a client accesses BinX-described datasets over the Grid: a dataset from multiple data sources (including a database), a dataset from several binary files, and a dataset from one binary file]
Data query model
• Create DataBinX
  – From binary and BinX
• Query DataBinX
  – Use XPath
• Create new DataBinX
  – Results from query
• Parse DataBinX
  – Create new binary and BinX
[Diagram: BinX + binary → DataBinX → XPath → new DataBinX → BinX + binary]
Data catalogue model
• Primary storage: binary data files
• Metadata: syntactic annotation, semantic annotation
• Classification: domain specific
• Cross-reference: XLink
[Diagram: hierarchy of BinX documents (BinX 1; BinX 1.1 and 1.2; BinX 1.2.1, 1.2.2 and 1.2.3) running from abstract metadata at the top to detailed binary files at the bottom]
Application in Astronomy
Case Study
Data Conversion
Between FITS and VOTable
Application in astronomy
• FITS and VOTable conversion
[Diagram: a FITS file (SIMPLE = T …… END, followed by binary data) and a VOTable document (<VOTABLE> …… </VOTABLE>) linked by the DataBinX utility, which is built on the BinX library core]
FITS file
Primary HDU
  Header (columns 0 to 79):
    SIMPLE  =                    T / file does conform to FITS standard
    BITPIX  =                    8 / number of bits per data pixel
    NAXIS   =                    1 / number of data axes
    ……
    END
  Data:
    3D 4A 14 0F 1C FE 25 04 … …
Extension
  Header:
    XTENSION= 'BINTABLE'           / binary table extension
    BITPIX  =                    8 / 8-bit bytes
    NAXIS   =                    2 / 2-dimensional binary table
    ……
    END
  Data:
    7B 3E 40 2C 16 70 E7 6F … …
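Each FITS header record is an 80-character card: the keyword occupies the first 8 characters and, when a value is present, the next two characters are `= `. The following C++ sketch splits one card into keyword, value and comment. It is a simplified illustration, not part of the BinX library or of any FITS library; `FitsCard`, `trim` and `parseCard` are hypothetical names, and a `/` inside a quoted string value is not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct FitsCard {
    std::string keyword;  // first 8 characters, blanks removed
    std::string value;    // text between "= " and the first '/', trimmed
    std::string comment;  // text after the first '/', trimmed
};

// Remove leading and trailing blanks.
static std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Simplification: a '/' inside a quoted string value is treated
// as the start of the comment.
FitsCard parseCard(const std::string& card) {
    assert(card.size() == 80);
    FitsCard result;
    result.keyword = trim(card.substr(0, 8));
    // Cards without the "= " value indicator (such as END) carry no value.
    if (card.compare(8, 2, "= ") != 0) return result;
    const std::string rest = card.substr(10);
    const std::size_t slash = rest.find('/');
    result.value = trim(rest.substr(0, slash));
    if (slash != std::string::npos) {
        result.comment = trim(rest.substr(slash + 1));
    }
    return result;
}
```

Given the `BITPIX` card from the primary header, padded with blanks to 80 characters, `parseCard` returns the keyword `BITPIX`, the value `8` and the comment `number of bits per data pixel`. The fixed record length is what makes the header describable as an array of fixed-length strings.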
VOTable
<VOTABLE>
  <RESOURCE>
    <PARAM name="Obs" value="Bob"/>
    <TABLE name="Stars">
      <FIELD name="Star-name" datatype="char" arraysize="10" />
      <FIELD name="RA" datatype="float" />
      <FIELD name="Dec" datatype="float" />
      <FIELD name="Counts" datatype="int" arraysize="2x3x*" />
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>Procyon</TD><TD>114.827</TD><TD>5.227</TD>
            <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
FITS → DataBinX → VOTable
• FITS to VOTable conversion
[Diagram: conversion pipeline with the components FITS, preprocessor, schema BinX, DataBinX utility, DataBinX, XSLT transformer (with an XSLT stylesheet) and VOTable]
VOTable → DataBinX → FITS
• VOTable to FITS conversion
[Diagram: conversion pipeline with the components VOTable, XSLT transformer (with an XSLT stylesheet), DataBinX, DataBinX utility, schema BinX, binary data, FITS header, postprocessor and FITS]
Support
• Information and software download:
  – http://www.edikt.org/binx
• Questions:
  – [email protected]
• Requirements and suggestions:
  – [email protected], [email protected]
BinX API
Parsing a BinX document
BxBinxFile* pReader = new BxBinxFile();
if (pReader->parse("mybinx.xml"))
{
    BxDataset* pDataset = pReader->getDataset();
}
Reading a BinX document
• Get an array object
BxArrayFixed* pArray = pDataset->getArray(0);           // by index
// or, by name:
// BxArrayFixed* pArray = pDataset->getArray("fixed");
• Get a struct from the array
BxDataset* pStruct = pArray->get(0, 0);
Reading a BinX document
• Get the data value
BxFloat32* pReal = pStruct->getFloat("Real");
float real = pReal->getFloat();
Creating a BinX document
• Create an object to write out the document
BxBinxFileWriter* pWriter = new BxBinxFileWriter();
• Create a new dataset (in-memory BinX document)
BxDataset* pData = new BxDataset();
BxShort16* i16 = new BxShort16(100);
pData->addDataObject(i16);
Creating a BinX document
• Create a new binary file
BxBinaryFile* pbf = new BxBinaryFile();
• Create a link to the BinX document
pbf->setDatasetPointer(pData);
• Save the BinX document
pWriter->setBinaryFilePtr(pbf);
pWriter->save("TestDataset.xml");
Merge binary data
// Open the BinX documents that describe the two binary files
BxBinxFileReader* pFile1 = new BxBinxFileReader("file1.xml");
BxBinxFileReader* pFile2 = new BxBinxFileReader("file2.xml");
BxDataset* pDataset1 = pFile1->getDataset();
BxDataset* pDataset2 = pFile2->getDataset();
// Take the first array of each dataset and its next data object
BxArray* pArray1 = pDataset1->getArray(0);
BxArray* pArray2 = pDataset2->getArray(0);
BxDataObject* pData1 = pArray1->getNext();
BxDataObject* pData2 = pArray2->getNext();
// Write both objects, one after the other, to a single binary file
FILE* fo = fopen("output.dat", "wb");
pData1->toStreamBinary(fo);
pData2->toStreamBinary(fo);
fclose(fo);
Summary
• One BinX document can describe many binary files
• BinX documents can be generated from code
• Easy-to-use interfaces
• Flexible