Introduction to the BinX Library

eDIKT project team
Ted Wen [email protected]

Robert Carroll [email protected]

Agenda

- About the BinX project
- A brief introduction to the BinX language
- Introduction to the BinX library
- Advanced API to the BinX library
- Use cases and requirements
  - Dr Bob Mann
  - Dr Chris Maynard
- Discussion

About the BinX project

The problem

- XML is useful for representing metadata
- Scientific datasets can be too large to store in XML
- Most scientific data are held in binary files
- Binary data file formats are not all standardized
- Binary data files are platform-dependent

BinX – a solution

- Initially designed for the Grid environment
- Annotates the data schema of any binary file
- Data elements are marked up in XML
- Describes three levels of features in a binary file:
  - Underlying physical representation (byte order)
  - Primitive data types (integer, float)
  - Structure of the dataset (array, table)
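Here is a minimal C++ sketch of the lowest level: it shows why the byte order has to be declared. This is an illustration only; `ByteOrder` and `readUInt32` are hypothetical names, not the BinX library API. The same four bytes decode to different integers under a big-endian and a little-endian reading.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical enum mirroring a declared byte order such as byteOrder="bigEndian".
enum class ByteOrder { BigEndian, LittleEndian };

// Assemble a 32-bit unsigned integer from four raw bytes in the declared order.
// Building the value with shifts keeps the result independent of the host's
// own byte order.
std::uint32_t readUInt32(const std::array<std::uint8_t, 4>& b, ByteOrder order) {
    if (order == ByteOrder::BigEndian) {
        // Most significant byte comes first in the file.
        return (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16) |
               (std::uint32_t(b[2]) << 8) | std::uint32_t(b[3]);
    }
    // Little-endian: most significant byte comes last in the file.
    return (std::uint32_t(b[3]) << 24) | (std::uint32_t(b[2]) << 16) |
           (std::uint32_t(b[1]) << 8) | std::uint32_t(b[0]);
}
```

For the bytes 00 00 00 64, the big-endian reading is 100 and the little-endian reading is 0x64000000.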

The BinX project at eDIKT

- Implement a software library for BinX
- Develop a series of tools based on the library
- Use C++ for performance
- Write portable code for different platforms
- Make the library robust and easy to use

Development status

- Requirements gathering from July 2002
- Development started in October 2002
- Prototype finished in December 2002
- Alpha version completed in April 2003
- Beta version to be released in June 2003

The deliverables

- The BinX library
  - Compiled code for different platforms
  - Source code under an Open Source license
- Documentation
  - User’s guide
  - Developer’s guide
- Utilities and examples

The BinX Language

What is BinX?

- The Binary XML Description Language
- A language for annotating binary data files
- It describes data types, data structures and attributes such as byte order
- A BinX document is an XML file containing metadata about a binary data file

A BinX document

A BinX document has three parts:
- Root element: <dataset byteOrder="bigEndian">
- Data class section (abstract data types): <definitions>
- Data instance section: <file src="myfile.bin">

Data elements

- Primitive data elements
  - Byte, character, integer, real
- Complex data elements
  - Arrays, struct, union
- User-defined data elements

Primitive data types

- Bit
- Character
- Integer
- Real

Complex data types

- Arrays
  - Repetitive collection of any data element
  - Multidimensional
  - Three types of arrays:
    - Fixed-length array
    - Variable-length array
    - Streamed array
- Struct
  - A sequence of data elements
- Union
  - One of a group of possible data elements, selected by a discriminant
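In C++, a union chosen by a discriminant maps naturally onto `std::variant`. For illustration only, the sketch below assumes a one-byte discriminant in which 0 selects a big-endian 16-bit integer and 1 selects an 8-bit character. In BinX the cases are declared in the BinX document, and `UnionValue` and `readUnion` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical union: discriminant 0 selects a big-endian 16-bit integer,
// discriminant 1 selects a single 8-bit character.
using UnionValue = std::variant<std::uint16_t, char>;

// Read the discriminant at `pos`, then decode only the member it selects.
// `pos` is advanced past every byte consumed.
UnionValue readUnion(const std::vector<std::uint8_t>& data, std::size_t& pos) {
    std::uint8_t discriminant = data.at(pos++);
    switch (discriminant) {
        case 0: {
            std::uint16_t v = static_cast<std::uint16_t>((data.at(pos) << 8) | data.at(pos + 1));
            pos += 2;
            return v;
        }
        case 1:
            return static_cast<char>(data.at(pos++));
        default:
            throw std::runtime_error("unknown discriminant");
    }
}
```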

Arrays

- Fixed-length array
- Streamed array
- Variable-length array
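The three array kinds differ only in where the element count comes from. This hedged sketch makes three simplifying assumptions:
- Elements are single bytes.
- A variable-length array is prefixed by a one-byte count stored in the file.
- A streamed array runs to the end of the data.

The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Fixed-length array: the element count comes from the schema, not the file.
Bytes readFixed(const Bytes& data, std::size_t& pos, std::size_t count) {
    if (pos + count > data.size()) throw std::out_of_range("truncated fixed array");
    auto first = data.begin() + static_cast<std::ptrdiff_t>(pos);
    Bytes out(first, first + static_cast<std::ptrdiff_t>(count));
    pos += count;
    return out;
}

// Variable-length array: assumed here to be prefixed by a one-byte count
// stored in the data file itself.
Bytes readVariable(const Bytes& data, std::size_t& pos) {
    std::size_t count = data.at(pos++);
    return readFixed(data, pos, count);
}

// Streamed array: elements continue until the end of the data.
Bytes readStreamed(const Bytes& data, std::size_t& pos) {
    if (pos > data.size()) throw std::out_of_range("position past end of data");
    Bytes out(data.begin() + static_cast<std::ptrdiff_t>(pos), data.end());
    pos = data.size();
    return out;
}
```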

Struct

Union

User-defined data type

Data elements as instances

Reference defined elements

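A user-defined type is declared once in the definitions section, and data elements reference it by name. That relationship can be sketched as a simple registry. `TypeDef` and `Definitions` are hypothetical illustrations, not BinX syntax or library classes. Here a type is reduced to a list of member sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of a user-defined struct: member sizes in bytes.
struct TypeDef {
    std::vector<std::size_t> memberSizes;
    std::size_t size() const {
        std::size_t total = 0;
        for (std::size_t s : memberSizes) total += s;
        return total;
    }
};

// Stands in for the definitions section: named types declared once.
class Definitions {
public:
    void define(const std::string& name, TypeDef def) { types_[name] = std::move(def); }

    // A data element that references a type by name resolves it here;
    // an undefined name is an error.
    const TypeDef& resolve(const std::string& name) const {
        auto it = types_.find(name);
        if (it == types_.end()) throw std::out_of_range("undefined type: " + name);
        return it->second;
    }

private:
    std::map<std::string, TypeDef> types_;
};
```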

The BinX Library

Alpha version

Fundamental requirements

- Access to data elements in binary files via BinX
  - Parse the BinX document
  - Build in-memory data structures
  - Read data values from the binary file
- Automatic conversion
  - Byte ordering
  - Padding
- Producing BinX documents and binary data
  - Generate a BinX document for data structures
  - Save assigned data values into binary files
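Automatic conversion of byte order and padding can be sketched as follows. The sketch assumes a hypothetical record layout: a 1-byte flag, 3 padding bytes, then a 32-bit integer whose byte order is declared in the BinX document. The names are illustrative, not the library's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical declared byte order of the data file.
enum class ByteOrder { BigEndian, LittleEndian };

// Hypothetical record layout: a 1-byte flag, 3 padding bytes, then a 32-bit integer.
struct Record {
    std::uint8_t flag;
    std::uint32_t value;
};

// Decode n (<= 4) bytes starting at pos as an unsigned integer in the declared
// byte order; the result is a native value whatever the host's byte order is.
std::uint32_t readUInt(const std::vector<std::uint8_t>& d, std::size_t pos,
                       std::size_t n, ByteOrder order) {
    assert(n <= 4);
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Big-endian: most significant byte first; little-endian: last.
        std::size_t k = (order == ByteOrder::BigEndian) ? i : n - 1 - i;
        v = (v << 8) | d.at(pos + k);
    }
    return v;
}

// Read one record, skipping the padding bytes instead of returning them.
Record readRecord(const std::vector<std::uint8_t>& d, std::size_t& pos, ByteOrder order) {
    Record r{};
    r.flag = d.at(pos);
    pos += 1 + 3;  // flag byte plus the declared padding
    r.value = readUInt(d, pos, 4, order);
    pos += 4;
    return r;
}
```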

General use cases

- Data conversion (byte order)
- Data extraction (sub-dataset)
- Data combination (two arrays into one)
- Data presentation (browsing, pure XML)
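Data extraction can be as simple as copying one field out of every fixed-size record. In this hypothetical sketch, the record size, field offset and field size are passed in directly. In practice they would be derived from the BinX document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Copy one field out of every fixed-size record, producing a smaller dataset.
// Hypothetical interface: layout values are passed in rather than read from a schema.
Bytes extractField(const Bytes& table, std::size_t recordSize,
                   std::size_t fieldOffset, std::size_t fieldSize) {
    assert(fieldOffset + fieldSize <= recordSize);
    if (recordSize == 0 || table.size() % recordSize != 0)
        throw std::invalid_argument("table is not a whole number of records");
    Bytes out;
    for (std::size_t rec = 0; rec < table.size(); rec += recordSize) {
        auto first = table.begin() + static_cast<std::ptrdiff_t>(rec + fieldOffset);
        out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(fieldSize));
    }
    return out;
}
```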

BinX Components

The library has core functionality to support generic utilities and applications. It is organized in three layers:
- BinX library core: core BinX functionality (parse the BinX document, read binary data)
- Utilities: generic tools (data conversion, extraction, packing/unpacking)
- Applications: domain-specific

The BinX library core

- Input: SchemaBinX, binary data file
- Output: DataBinX, in-memory dataset
- The in-memory data structure loads values on demand
[Diagram: binary data file → BinX library → in-memory data structure]
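Loading values on demand can be sketched as an array that calls a loader the first time an element is requested and then caches the result. `LazyArray` is a hypothetical name, and the loader stands in for a read from the binary file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical lazily loaded array: a value is fetched from the underlying
// source only on first access, then served from the cache.
class LazyArray {
public:
    using Loader = std::function<std::int32_t(std::size_t)>;

    LazyArray(std::size_t size, Loader load) : cache_(size), load_(std::move(load)) {}

    std::int32_t at(std::size_t i) {
        std::optional<std::int32_t>& slot = cache_.at(i);
        if (!slot) slot = load_(i);  // first access: read from the file
        return *slot;
    }

    // Number of elements actually read so far.
    std::size_t loadedCount() const {
        std::size_t n = 0;
        for (const auto& s : cache_) n += s.has_value();
        return n;
    }

private:
    std::vector<std::optional<std::int32_t>> cache_;
    Loader load_;
};
```

Accessing element 10 twice triggers a single load, and the other elements are never read.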

The BinX Utilities

- DataBinX generator
- DataBinX splitter
- SchemaBinX creator
- Binary file indexer

DataBinX generator

- Puts binary data inside XML
- For browsing, web service responses, query result sets
[Diagram: binary data file → BinX library → DataBinX]
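A toy version of the generator shows the idea of putting binary values inside XML. The element names `dataset` and `integer-32` are placeholders chosen for this sketch. They are not necessarily the tags the DataBinX generator emits.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Toy DataBinX-style generator: each value becomes one XML element.
// The tag names are placeholders, not confirmed BinX output.
std::string toDataXml(const std::vector<std::int32_t>& values) {
    std::ostringstream xml;
    xml << "<dataset>";
    for (std::int32_t v : values) xml << "<integer-32>" << v << "</integer-32>";
    xml << "</dataset>";
    return xml.str();
}
```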

DataBinX splitter

- The reverse of the DataBinX generator
- Generates binary files for testing and transportation
- Cross-platform (byte order)
[Diagram: DataBinX → BinX library → binary data file]
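Going the other way, a toy splitter can pull values back out of such XML and write them as big-endian binary. This sketch assumes a placeholder `integer-32` tag. It uses plain string searching rather than a real XML parser, so it is only an illustration of the data flow. Negative values come out in two's complement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy DataBinX splitter: read the text of each <integer-32> element and write
// it as a 4-byte big-endian integer. A real implementation needs an XML parser.
std::vector<std::uint8_t> toBigEndianBinary(const std::string& xml) {
    const std::string open = "<integer-32>", close = "</integer-32>";
    std::vector<std::uint8_t> out;
    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        pos += open.size();
        std::size_t end = xml.find(close, pos);
        if (end == std::string::npos) break;  // unterminated element: stop
        // Converting to uint32_t gives the two's complement bit pattern.
        auto v = static_cast<std::uint32_t>(std::stol(xml.substr(pos, end - pos)));
        for (int shift = 24; shift >= 0; shift -= 8)  // most significant byte first
            out.push_back(static_cast<std::uint8_t>(v >> shift));
        pos = end + close.size();
    }
    return out;
}
```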

SchemaBinX creator

- GUI and Web-based utilities
- Build a BinX document interactively
- Create a BinX document based on another one

Binary file indexer

- Generates indices for binary data files
- Such indices can be used for fast data access
[Diagram: binary data file → BinX library → index table mapping elements X and Y to offsets 0000 and 0004]
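An index of this kind can be as simple as a table of byte offsets. The sketch below is hypothetical. Given the element names and sizes of a layout, it records where each element starts, so a reader can seek directly to it instead of scanning the file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy indexer: map each element name to the byte offset where it starts,
// given the layout as (name, size in bytes) pairs in file order.
std::map<std::string, std::size_t>
buildIndex(const std::vector<std::pair<std::string, std::size_t>>& layout) {
    std::map<std::string, std::size_t> index;
    std::size_t offset = 0;
    for (const auto& [name, size] : layout) {
        index[name] = offset;
        offset += size;
    }
    return index;
}
```

With 4-byte elements X and Y, X starts at offset 0 and Y at offset 4.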

Applications for astronomy

- FITS and VOTable conversion
[Diagram: FITS file (header SIMPLE = T … END, binary data) → BinX library core → DataBinX utility]

FITS → DataBinX → VOTable

- FITS to VOTable conversion
[Diagram: FITS → preprocessor → SchemaBinX; FITS + SchemaBinX → DataBinX utility → DataBinX → XSLT transformer (with XSLT stylesheet) → VOTable]

VOTable → DataBinX → FITS

- VOTable to FITS conversion
[Diagram: VOTable → XSLT transformer (with XSLT stylesheet) → DataBinX → DataBinX utility → SchemaBinX + binary data → post-processor → FITS; a preprocessor supplies the FITS header]

FITS-VOTable experiment

- Sample FITS file
  - A data table of 82 rows × 20 fields
  - File size: 37 KB
- DataBinX generated by the DataBinX utility
  - Time spent: 268 ms
  - DataBinX document size: 1.2 MB
- VOTable transformed by MSXML
  - Time spent: about 1 second
  - VOTable document size: 51 KB
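A rough calculation from these figures shows how much the XML markup inflates the data compared with the binary FITS file. It takes 1 MB ≈ 1000 KB, so the ratios are approximate:

```latex
\begin{aligned}
\text{cells} &= 82 \times 20 = 1640\\
\frac{\text{DataBinX size}}{\text{FITS size}} &\approx \frac{1200\ \text{KB}}{37\ \text{KB}} \approx 32\\
\frac{\text{DataBinX size}}{\text{cells}} &\approx \frac{1.2 \times 10^{6}\ \text{bytes}}{1640} \approx 730\ \text{bytes per cell}\\
\frac{\text{VOTable size}}{\text{FITS size}} &\approx \frac{51\ \text{KB}}{37\ \text{KB}} \approx 1.4
\end{aligned}
```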

Possible future releases

- DataBinX parsing
- Utilities (GUI BinX editor)
- XPath-based data query
- DFDL support
- Preserving special tags
  - For comments and application-specific tags
- Text file support

Features or issues to consider

- Converting floating-point numbers
  - 80-bit, 96-bit and 128-bit floating point
- Array manipulation (slice, section)
- SAX-based XML document parsing
  - Use cases in place of DOM parsing
  - Built into the library or as an add-on component?
- Database support
  - Annotating database tables?
  - Querying database tables through BinX?
- Java version of the library
  - Keeping exactly the same features as the C++ version?
- Supporting XQuery
  - Querying binary data files with XQuery on BinX
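For the array manipulation item, here is a sketch of taking a section of a 2-D, row-major array; a slice is then a section with one dimension of extent 1. The function `section` and its parameters are hypothetical, not a planned BinX API.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Extract rows [r0, r0+nr) x columns [c0, c0+nc) from a row-major 2-D array
// stored as a flat vector with `cols` columns.
template <typename T>
std::vector<T> section(const std::vector<T>& a, std::size_t cols,
                       std::size_t r0, std::size_t nr,
                       std::size_t c0, std::size_t nc) {
    if (cols == 0 || a.size() % cols != 0) throw std::invalid_argument("bad shape");
    const std::size_t rows = a.size() / cols;
    if (r0 + nr > rows || c0 + nc > cols) throw std::out_of_range("section outside array");
    std::vector<T> out;
    out.reserve(nr * nc);
    for (std::size_t r = r0; r < r0 + nr; ++r)
        for (std::size_t c = c0; c < c0 + nc; ++c)
            out.push_back(a[r * cols + c]);  // row-major element (r, c)
    return out;
}
```

For a 3 × 4 array holding 0 to 11, rows 1 to 2 and columns 1 to 2 give {5, 6, 9, 10}.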

Support

- For usage problems:
  - http://www.edikt.org/binx (coming soon)
  - [email protected]
- For requirements and suggestions:
  - [email protected]
  - [email protected]