HDF Update
Mike Folk
The HDF Group
HDF and HDF-EOS Workshop X
November 29, 2006
Outline
• Organizational info
• HDF Software Update
• Other Activities of Interest
Nov. 29, 2006
HDF Workshop X, Landover MD
Organizational info
“The HDF Group” = “THG”
Founded Dec. 2006
Went solo July 15, 2006
Non-profit
THG mission
To support the vast community of HDF
users and to ensure the sustainable
development of HDF technologies and
the ongoing accessibility of HDF-stored
data.
The HDF Team
Frank Baker
Christian Chilan
Peter Cao
Vailin Choi
Mike Folk
Anne Jennings
Barbara Jones
Quincey Koziol
James Laird
Raymond Lu
John Mainzer
Matthew Needham
Pedro Nunes
Tammi O’Neill
Elena Pourmal
Binh-minh Ribler
Randy Ribler
Rishi Sinha
Kent Yang
And all those wonderful folks out there
who contribute ideas, requests, bug
reports, code, and support.
HDF Software Update
HDF4 update
Platforms to be dropped
• Operating systems
• HPUX 11.00
• Crays SV1 and TS IEEE
• AIX 5.1 and 5.2
• SGI IRIX64-6.5
• Linux 2.4
• Solaris 2.7, 2.8, 2.9
• Windows 2000
• MAC OSX 10.3
• Compilers
• GNU C compilers older than 3.4 (Linux)
• Intel 8.*
• PGI V. 5.*, 6.0
Platforms to be added
• Systems
• MAC OSX 10.4 (Intel)
• Solaris 2.* on Intel
• Cray XT3
• Windows 64-bit (?)
• Linux 2.6
• HPUX 11.23
• IBM Power 5
• Compilers
• g95
• PGI V. 6.1
• Intel 9.*
New features
• Configuration
• Switched to use F77_FUNC macro for better
Fortran support (no hard-coded compilers
anymore!)
• Support for shared libraries
• Library
• No hard-coded limit on number of opened files
• New APIs to control number of files opened by
application
• Fortran support for SZIP compression
Bug fixes
• Tools
• Many improvements to the hdp, hrepack,
hdiff and hdfimport utilities based on users’
feedback
• Library
• Data corruption bug for several opened
unlimited dimension SDSs
• Better handling of SDSs with duplicated
names in SDgetdimscale and more
HDF5 update
No new releases!
• Focus on HDF5 release 1.8
• HDF5-1.8.0 Alpha 5 release is available from:
hdf.ncsa.uiuc.edu/HDF5/release/alpha/obtain518.html
Platforms to be dropped
• Operating systems
• HPUX 11.00
• MAC OS 10.3
• AIX 5.1 and 5.2
• SGI IRIX64-6.5
• Linux 2.4
• Solaris 2.8 and 2.9
• Compilers
• GNU C compilers older than 3.4 (Linux)
• Intel 8.*
• PGI V. 5.*, 6.0
• MPICH 1.2.5
http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html
Platforms to be added
• Systems
• Alpha Open VMS
• MAC OSX 10.4 (Intel)
• Solaris 2.* on Intel (?)
• Cray XT3
• Windows 64-bit (32-bit binaries)
• Linux 2.6
• BG/L
• Compilers
• g95
• PGI V. 6.1
• Intel 9.*
• MPICH 1.2.7
• MPICH2
New Features
in HDF5 1.8
HDF5 1.8 new library features
• Datatype and dataspace features
• Serialized dataspaces and datatypes
• Ability to create data type from text description
• Integer to float conversions during I/O
• Revised exception handling during type conversion
• Compact storage for N-bit data types
• Offset+size storage filter, saving space
• “Null” dataspace – datasets with no elements
• Data transformation filter
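
The offset+size idea can be illustrated with a toy example. The sketch below is not the HDF5 filter and does not reproduce its on-disk layout; the function names are hypothetical, and it assumes a non-empty list of integers. It stores the minimum value once and packs each value's distance from that minimum into the fewest bits that fit.

```python
def pack_offset_size(values):
    """Pack a non-empty integer list as (offset, bits per value, payload bytes)."""
    offset = min(values)                       # stored once for the whole block
    deltas = [v - offset for v in values]      # small non-negative numbers
    bits = max(1, max(deltas).bit_length())    # width needed for the largest delta
    packed = 0
    for i, d in enumerate(deltas):
        packed |= d << (i * bits)              # each delta gets its own bit field
    nbytes = (len(values) * bits + 7) // 8     # round up to whole bytes
    return offset, bits, packed.to_bytes(nbytes, "little")


def unpack_offset_size(offset, bits, payload, count):
    """Invert pack_offset_size."""
    packed = int.from_bytes(payload, "little")
    mask = (1 << bits) - 1
    return [offset + ((packed >> (i * bits)) & mask) for i in range(count)]


readings = [1000, 1003, 1001, 1007]
offset, bits, payload = pack_offset_size(readings)
assert unpack_offset_size(offset, bits, payload, len(readings)) == readings
print(offset, bits, len(payload))  # 1000 3 2
```

In this example four values that would occupy 16 bytes as 32-bit integers fit in a 2-byte payload, plus the stored offset and bit width.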
HDF5 1.8 – new library features
• Group revisions
• Creation order access
• Compact groups – small groups take less space
• Large group storage improvements
• Intermediate group creation
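
Two of the group features above, creation order access and intermediate group creation, can be pictured with a toy in-memory model. This is a sketch only: the Group class, its methods, and the group names are hypothetical and are not part of the HDF5 API.

```python
class Group:
    """Toy in-memory group: link name -> child Group, kept in creation order."""

    def __init__(self):
        self.links = {}  # Python dicts preserve insertion (creation) order

    def create_group(self, path, create_intermediate=False):
        """Create the group at a '/'-separated path relative to this group."""
        *parents, leaf = [p for p in path.split("/") if p]
        node = self
        for name in parents:
            if name not in node.links:
                if not create_intermediate:
                    raise KeyError(f"missing intermediate group: {name}")
                node.links[name] = Group()  # create the missing parent on demand
            node = node.links[name]
        if leaf in node.links:
            raise ValueError(f"group already exists: {leaf}")
        node.links[leaf] = Group()
        return node.links[leaf]

    def names(self, by="name"):
        """List link names alphabetically or in creation order."""
        return sorted(self.links) if by == "name" else list(self.links)


root = Group()
root.create_group("run2")
root.create_group("run1/sensors/temp", create_intermediate=True)
print(root.names(by="name"))            # ['run1', 'run2']
print(root.names(by="creation_order"))  # ['run2', 'run1']
```

Without create_intermediate=True the second call would fail, because neither "run1" nor "sensors" exists yet.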
HDF5 1.8 – new library features
• Link improvements
• External links -- can refer to objects in another file
• User defined links – apps create own kinds of
links
• Attribute improvements
• Storage improvements for large numbers of attributes
• Iterate or look up by creation order
HDF5 1.8 – new library features
• Support for Unicode UTF-8 character set
• Shared header info – duplicate header info
shared, possibly saving space
• Metadata cache improvements – faster I/O on
files with many objects
• Data transformation filter
• Stackable Virtual File Drivers
• Better UNIX/Linux portability
HDF5 1.8 – new APIs
• New extendible error-handling API
• New APIs to copy objects between files fast
• Dimension scale model and API
• “HDFpacket” – API to read/write packets efficiently
HDF5 1.8 – backward and
forward compatibility
HDF5 1.8 vs. 1.6.5
• Differences between 1.8 and 1.6.5
• Some file format changes
• Several new routines added
• Old APIs deprecated -- removed in later release
• Consequences
• Applications that require 1.8 format changes will write
objects that the 1.6.5 library cannot read
• To exploit 1.8 changes, apps need to be rewritten
Principle of
“Maximum file format compatibility”
Unless instructed otherwise, the HDF5 library will
write objects using the earliest version of the format
possible for describing the information.
Assures forward compatibility with the older
versions whenever possible – objects in new
files can be read with old libraries if those
objects are “known” to the old libraries.
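
The principle can be pictured with a small model. This is a sketch under stated assumptions, not the library's implementation: the feature names and version numbers below are hypothetical and do not correspond to real HDF5 format versions.

```python
# Hypothetical feature table: earliest format version able to describe each feature.
# Names and version numbers are illustrative only.
MIN_VERSION = {
    "basic_dataset": 1,
    "creation_order": 2,
    "external_link": 2,
}


def version_to_write(features_used):
    """Earliest format version that can describe every feature the object uses."""
    return max(MIN_VERSION[f] for f in features_used)


def readable_by(library_max_version, features_used):
    """An older library can read the object if it knows the version written."""
    return version_to_write(features_used) <= library_max_version


OLD_LIBRARY = 1  # hypothetical: understands only version 1 objects

print(version_to_write({"basic_dataset"}))                           # 1
print(readable_by(OLD_LIBRARY, {"basic_dataset"}))                   # True
print(readable_by(OLD_LIBRARY, {"basic_dataset", "external_link"}))  # False
```

An object that uses only old features is written in the old form even by a new library, so an old library can still read it. Only objects that need a new feature become unreadable to the old library.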
Command line tools
New features for old tools
• h5dump
• Dump data in binary format
• h5diff
• Compare dataset regions
• Parallel h5diff (ph5diff)
• Compare two files in MPI parallel environment
• h5repack
• Efficient data copy using H5Gcopy()
• Able to handle big datasets
New HDF5 Tools
• h5copy
• Copies a group, dataset or named datatype from
one location to another location
• Copies within a file or across files
• h5check
• Verifies an HDF5 file against the defined HDF5
File Format Specification
• h5stat
• Reports statistics about a file and objects in a file
HDF Java Products
HDFView changes
• Quality improvements for HDF-java package
• Full documentation of hdf-java object package
• Test suite for hdf-java object package
• Support 64-bit Java on Linux and Solaris
• Many new features, including
• Change font size easily
• Grab and move image
• Create new table (compound dataset) from template
• Filter out fill value for image creation
• -geometry option for very high resolution displays
Future work for Java
• Update HDF5 JNI APIs for HDF5 1.8 release
• Release HDFView 2.4 with bug fixes/new
features with HDF5 1.8 release
• New GUI features dealing with table, image
and animation
• Writing capability for HDF5-SRB model
Website Development for
HDF-EOS Tools &
Information Center
Website for HDF-EOS Tools
• THG now manages HDF-EOS web site
• Registered domain names: hdfeos.net/.org/.com
• Re-implemented major topic areas
• Re-designed interface
• Registered Google search
• Will continue maintenance
• Phase two
• Host mailing list
• Support simple forum features
Other Activities of
Interest
Performance R&D
HDF5 - PnetCDF performance comparison
[Figure: Flash I/O Benchmark (checkpoint files) on Power 5. Bandwidth in MB/s (0 to 2500) versus number of processors (10 to 310) for PnetCDF, HDF5 collective, and HDF5 independent.]
I/O performance of PnetCDF is comparable with
parallel HDF5 when the libraries are used in similar
manners.
PnetCDF4 - PnetCDF comparison
[Figure: Bandwidth in MB/s (0 to 160) versus number of processors (0 to 144) for PnetCDF collective and NetCDF4 collective.]
I/O performance of parallel NetCDF4 is comparable
with PnetCDF, about 15% slower on average for
the output of the ROMS history file.
Collective I/O improvements
• HDF5 supports collective IO for non-regular
selections
• Collective IO for chunked storage is not trivial.
• Non-regular selection performance optimizations:
• Added IO options to achieve good collective IO
performance
• Added APIs for applications to participate in the
optimization process
• See the poster
DOE Labs
Sandia National Laboratory
Lawrence Livermore National Laboratory
DOE ASC* and Others
• Support HDF5 on major systems at Sandia &
Lawrence Livermore National Laboratories
• R&D efforts underway
• File recovery after a crash
• Very fast write speed – goal is 300 MB/sec
• Read-while-writing capability
• Java library and HDFView improvements
* Advanced Scientific Computing project
Flight test
Flight test – collect, then process
Boeing HDF5 for flight test data
• Boeing 787 active archive
• 10 TB per flight-test day
• Must handle raw, real-time data
• High speed ingest, by “packet”
• Post-processing, by “time-history”
• Boeing High Level APIs
• HDFpacket – released with HDF5 1.8
• HDFtime_history – new, open version likely
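
The difference between ingest "by packet" and post-processing "by time-history" can be sketched as a regrouping of records. The example below is illustrative only: it does not use the HDFpacket or HDFtime_history APIs, and the packet layout and parameter names are hypothetical.

```python
from collections import defaultdict

# Hypothetical packets as they might arrive during ingest:
# (timestamp in seconds, {parameter name: value}).
packets = [
    (0.0, {"airspeed": 250.0, "altitude": 10000.0}),
    (0.5, {"airspeed": 251.5}),
    (1.0, {"airspeed": 253.0, "altitude": 10040.0}),
]


def to_time_histories(packets):
    """Regroup packet records into one (time, value) series per parameter."""
    histories = defaultdict(list)
    for timestamp, samples in packets:
        for name, value in samples.items():
            histories[name].append((timestamp, value))
    return dict(histories)


histories = to_time_histories(packets)
print(histories["altitude"])  # [(0.0, 10000.0), (1.0, 10040.0)]
```

Writing in arrival order keeps ingest simple and fast, while analysis usually wants every sample of one parameter over time, which is the regrouped form.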
Product data
STEP
Bioinformatics
Managing genomic data
C# HDF5 API
for Agilent
Agilent C# project
• Why?
• Heavy use of C# at Agilent
• Compatibility with Matlab
• Other interest in HDF5 at Agilent
• What?
• Prototype API in C# for Windows XP
• Basic functions to create, open, close, read, write
• Limited datatypes, no partial I/O
• When?
• March 2007
HDF5 Software
Tools & Applications
Fortran C++ Java C#
C API
HDF I/O Library
HDF File
NetCDF 4
NetCDF 4 project
• Enhanced NetCDF-4 Interface to HDF5
• Combine features of netCDF and HDF5
• Take advantage of their separate strengths
• Collaboration between NCSA, THG, Unidata
• Currently in Alpha Release
• Waiting for beta release
NetCDF-4 Architecture
[Diagram components: netCDF-3 applications, netCDF-4 applications, HDF5 applications; netCDF-3 interface, netCDF-4 library, HDF5 library; netCDF files, netCDF-4 HDF5 files, HDF5 files.]
• Supports access to netCDF files and HDF5 files
created through netCDF-4 interface
Archival formats
• Proposal to NOAA Scientific Data
Stewardship program
• Will investigate use of OAIS “Archive
Information Package” standard with HDF5
• PI: Ruth Duerr (NSIDC) and Kent Yang
OAIS: Open Archival Information System
Asymmetries between
collecting and accessing data
• Huge streams of data collected …
• To be accessed in little bits…
Challenge – efficient remote access
• How do we efficiently find and access data
from distributed repositories, when the data
are big and complex?
• Storage Resource Broker (SRB)
• Efficient access to HDF5 objects in repository
• OPeNDAP
• Powerful protocol for remote querying and
subsetting of scientific data
Example – Storage resource broker
• Storage Resource Broker – repository for
heterogeneous data collections
• Simplifies storage, query and access to massive
amounts of scientific data
• Has data in HDF5, netCDF, other formats
Normal SRB configuration
[Diagram components: client with HDF5; HDF5 file (whole file or a sequence of bytes); SRB server; MCAT.]
OPeNDAP-HDF5 project
• OPeNDAP
• Powerful protocol for remote querying and
subsetting of scientific data
• Replaces direct file access with remote query and
access
• Widely used in Earth Sciences
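
Remote subsetting can be pictured as a request URL that names a variable and the index ranges wanted. The sketch below assumes DAP2-style constraint expressions of the form [start:stride:stop] with an inclusive stop. The helper function, server address, file name, and variable name are all hypothetical, and real servers or clients may require the brackets to be percent-encoded.

```python
def dap2_subset_url(dataset_url, variable, slices, response="ascii"):
    """Build a DAP2-style request URL with a hyperslab constraint.

    slices is a list of (start, stride, stop) tuples, one per dimension;
    stop is treated as inclusive.
    """
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


# Hypothetical server, file, and variable names.
url = dap2_subset_url(
    "http://example.org/opendap/data/sample.h5",
    "temperature",
    [(0, 1, 9), (0, 2, 18)],
)
print(url)
# http://example.org/opendap/data/sample.h5.ascii?temperature[0:1:9][0:2:18]
```

Only the requested slab of the variable is returned, so the client never has to download or open the whole file.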
OPeNDAP – HDF5 Project
• A NASA ROSES NRA project
• Tasks
• HDF5-DAP2 server (now a prototype)
• HDF5-DAP4 server
• DAP4 to HDF5 conversion utility
• Investigate integrated DAP-aware HDF5 library
SQL Server and HDF5
with Microsoft
SQL Server and HDF5
• Microsoft “dream environment for scientists”
• Combine data management, computing
• SQL Server 2005 solution
• Combine RDBMS with scientific analysis tools,
together in one integrated system.
• HDF5 & other formats manage scientific objects
HDF5 in SQL server
[Diagram components: Visualization; Libraries (MATLAB,…); Web Services (XML, REST, RSS); OLAP and Data Mining; Reporting; .NET Languages with Language Integrated Query; Entity Framework (EDM, eSQL, O-R mapping); HDF5 EDM model; SQL Server with HDF5 TVFs, HDF5 Index, HDF5 type, and HDF5 FS blob; HDF5 files.]
Thank you all
and
Thank you NASA!
Acknowledgement
This report is based upon work supported in part by a
Cooperative Agreement with NASA under NASA
NNG05GC60A. Any opinions, findings, and
conclusions or recommendations expressed in this
material are those of the author(s) and do not
necessarily reflect the views of the National
Aeronautics and Space Administration.
Questions/comments?
Information Sources
• HDF website
http://hdfgroup.org/
• HDF5 Information Center
http://hdfgroup.org/HDF5/
• HDF Helpdesk
[email protected]
• HDF users mailing list
[email protected]
coming soon: [email protected]