HDF Update
Mike Folk
The HDF Group
HDF and HDF-EOS Workshop X
November 29, 2006

Outline
• Organizational info
• HDF Software Update
• Other Activities of Interest

Nov. 29, 2006 HDF Workshop X, Landover MD

Organizational info

"The HDF Group" = "THG"
• Founded Dec. 2006
• Went solo July 15, 2006
• Non-profit

THG mission
To support the vast community of HDF users and to ensure the sustainable development of HDF technologies and the ongoing accessibility of HDF-stored data.

The HDF Team
Frank Baker, Christian Chilan, Peter Cao, Vailin Choi, Mike Folk, Anne Jennings, Barbara Jones, Quincey Koziol, James Laird, Raymond Lu, John Mainzer, Matthew Needham, Pedro Nunes, Tammi O'Neill, Elena Pourmal, Binh-minh Ribler, Randy Ribler, Rishi Sinha, Kent Yang
And all those wonderful folks out there who contribute ideas, requests, bug reports, code, and support.

HDF Software Update

HDF4 update

Platforms to be dropped
• Operating systems
  • HPUX 11.00
  • Crays SV1 and TS IEEE
  • AIX 5.1 and 5.2
  • SGI IRIX64-6.5
  • Linux 2.4
  • Solaris 2.7, 2.8, 2.9
  • Windows 2000
  • MAC OSX 10.3
• Compilers
  • GNU C compilers older than 3.4 (Linux)
  • Intel 8.*
  • PGI V. 5.*, 6.0

Platforms to be added
• Systems
  • MAC OSX 10.4 (Intel)
  • Solaris 2.* on Intel
  • Cray XT3
  • Windows 64-bit (?)
  • Linux 2.6
  • HPUX 11.23
  • IBM Power 5
• Compilers
  • g95
  • PGI V. 6.1
  • Intel 9.*

New features
• Configuration
  • Switched to the F77_FUNC macro for better Fortran support (no more hard-coded compilers!)
  • Support for shared libraries
• Library
  • No hard-coded limit on the number of open files
  • New APIs to control the number of files opened by an application
  • Fortran support for SZIP compression
Bug fixes
• Tools
  • Many improvements to the hdp, hrepack, hdiff and hdfimport utilities based on users' feedback
• Library
  • Data corruption bug when several SDSs with unlimited dimensions were open
  • Better handling of SDSs with duplicated names in SDgetdimscale, and more

HDF5 update

No new releases!
• Focus on HDF5 release 1.8
• HDF5-1.8.0 Alpha 5 release is available from: hdf.ncsa.uiuc.edu/HDF5/release/alpha/obtain518.html

Platforms to be dropped
• Operating systems
  • HPUX 11.00
  • MAC OS 10.3
  • AIX 5.1 and 5.2
  • SGI IRIX64-6.5
  • Linux 2.4
  • Solaris 2.8 and 2.9
• Compilers
  • GNU C compilers older than 3.4 (Linux)
  • Intel 8.*
  • PGI V. 5.*, 6.0
  • MPICH 1.2.5
http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html

Platforms to be added
• Systems
  • Alpha Open VMS
  • MAC OSX 10.4 (Intel)
  • Solaris 2.* on Intel (?)
  • Cray XT3
  • Windows 64-bit (32-bit binaries)
  • Linux 2.6
  • BG/L
• Compilers
  • g95
  • PGI V. 6.1
  • Intel 9.*
  • MPICH 1.2.7
  • MPICH2

New Features in HDF5 1.8

HDF5 1.8 – new library features
• Datatype and dataspace features
  • Serialized dataspaces and datatypes
  • Ability to create a datatype from a text description
  • Integer-to-float conversions during I/O
  • Revised exception handling during type conversion
  • Compact storage for N-bit datatypes
  • Offset+size storage filter, saving space
  • "Null" dataspace – datasets with no elements
  • Data transformation filter

HDF5 1.8 – new library features
• Group revisions
  • Creation order access
  • Compact groups – small groups take less space
  • Large group storage improvements
  • Intermediate group creation
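The "Compact groups" and "Creation order access" items above describe groups that keep a small set of links compactly, switch to a larger indexed structure as they grow, and can still be iterated in the order links were created. The Python sketch below models only that switching rule. The class, its methods and the thresholds (max_compact=8, min_dense=6) are hypothetical illustrations, not the HDF5 API or its on-disk format. The gap between the two thresholds is a design assumption: it keeps a group whose size hovers near the limit from flipping storage on every insert or delete.

```python
class Group:
    """Toy model of a group that switches between compact and dense link storage.

    Hypothetical illustration only: the thresholds are assumptions, and
    'dense' is just a label here, not HDF5's real indexed structures.
    """

    def __init__(self, max_compact=8, min_dense=6):
        if min_dense > max_compact:
            raise ValueError("min_dense must not exceed max_compact")
        self.max_compact = max_compact
        self.min_dense = min_dense
        self.links = {}           # name -> target; dicts keep insertion (creation) order
        self.storage = "compact"

    def _update_storage(self):
        n = len(self.links)
        if self.storage == "compact" and n > self.max_compact:
            self.storage = "dense"      # grew past the compact limit
        elif self.storage == "dense" and n < self.min_dense:
            self.storage = "compact"    # shrank below the lower threshold (hysteresis)

    def add_link(self, name, target):
        if name in self.links:
            raise KeyError(f"link {name!r} already exists")
        self.links[name] = target
        self._update_storage()

    def remove_link(self, name):
        del self.links[name]            # raises KeyError if the link does not exist
        self._update_storage()

    def names_by_creation_order(self):
        return list(self.links)

    def names_by_name(self):
        return sorted(self.links)
```

With the default thresholds, a ninth link moves the group to dense storage, and it only returns to compact storage once it drops to five links.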
HDF5 1.8 – new library features
• Link improvements
  • External links – can refer to objects in another file
  • User-defined links – applications create their own kinds of links
• Attribute improvements
  • Storage improvements for large numbers of attributes
  • Iterate or look up by creation order

HDF5 1.8 – new library features
• Support for Unicode UTF-8 character set
• Shared header info – duplicate header info shared, possibly saving space
• Metadata cache improvements – faster I/O on files with many objects
• Data transformation filter
• Stackable Virtual File Drivers
• Better UNIX/Linux portability

HDF5 1.8 – new APIs
• New extendible error-handling API
• New APIs to copy objects between files quickly
• Dimension scale model and API
• "HDFpacket" – API to read/write packets efficiently

HDF5 1.8 – backward and forward compatibility

HDF5 1.8 vs. 1.6.5
• Differences between 1.8 and 1.6.5
  • Some file format changes
  • Several new routines added
  • Old APIs deprecated (to be removed in a later release)
• Consequences
  • Applications that use the 1.8 format changes will write objects that the 1.6.5 library cannot read
  • To exploit the 1.8 changes, applications need to be rewritten

Principle of "Maximum file format compatibility"
Unless instructed otherwise, the HDF5 library will write objects using the earliest version of the format that can describe the information. This assures forward compatibility with older versions whenever possible: objects in new files can be read with old libraries if those objects are "known" to the old libraries.
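The maximum-compatibility principle above amounts to a per-object version choice: write the earliest format version that can describe everything the object uses, unless the application asks otherwise. The sketch below shows only that decision logic. The feature names and version numbers in EARLIEST_VERSION are hypothetical placeholders, not HDF5's real format-message versions, and use_latest stands in for whatever "instructed otherwise" setting an application would use.

```python
# Hypothetical table: earliest format version able to encode each feature.
# Names and numbers are illustrative, not HDF5's actual version numbers.
EARLIEST_VERSION = {
    "basic_group": 0,
    "basic_dataset": 0,
    "creation_order": 1,
    "external_link": 1,
    "utf8_names": 1,
}


def choose_format_version(features, use_latest=False, latest=1):
    """Return the format version to write for an object that uses `features`.

    Follows the 'maximum compatibility' rule: pick the earliest version that
    can describe every feature, unless the caller explicitly asks for the latest.
    """
    unknown = set(features) - set(EARLIEST_VERSION)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    if use_latest:
        return latest
    # An object with no special features falls back to the oldest version.
    return max((EARLIEST_VERSION[f] for f in features), default=0)


def readable_by(version, reader_max_version):
    """An older reader can read an object only if it knows that version."""
    return version <= reader_max_version
```

Under this rule an object using only long-established features is written at version 0 and stays readable by an older library (reader_max_version=0), while an object using, say, external links gets a version that the older library rejects, which is the trade-off the "HDF5 1.8 vs. 1.6.5" slide describes.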
Command line tools

New features for old tools
• h5dump
  • Dump data in binary format
• h5diff
  • Compare dataset regions
• Parallel h5diff (ph5diff)
  • Compare two files in an MPI parallel environment
• h5repack
  • Efficient data copy using H5Gcopy()
  • Able to handle big datasets

New HDF5 Tools
• h5copy
  • Copies a group, dataset or named datatype from one location to another
  • Copies within a file or across files
• h5check
  • Verifies an HDF5 file against the defined HDF5 File Format Specification
• h5stat
  • Reports statistics about a file and the objects in it

HDF Java Products

HDFView changes
• Quality improvements for the HDF-Java package
• Full documentation of the HDF-Java object package
• Test suite for the HDF-Java object package
• Support for 64-bit Java on Linux and Solaris
• Many new features, including
  • Change font size easily
  • Grab and move image
  • Create new table (compound dataset) from template
  • Filter out fill value for image creation
  • -geometry option for very high resolution displays

Future work for Java
• Update the HDF5 JNI APIs for the HDF5 1.8 release
• Release HDFView 2.4 with bug fixes and new features with the HDF5 1.8 release
• New GUI features dealing with tables, images and animation
• Writing capability for the HDF5-SRB model

Website Development for HDF-EOS Tools & Information Center

Website for HDF-EOS Tools
• THG now manages the HDF-EOS web site
  • Registered domain names: hdfeos.net/.org/.com
  • Re-implemented major topic areas
  • Re-designed interface
  • Registered Google search
• Will continue maintenance
• Phase two
  • Host mailing list
  • Support simple forum features

Website for HDF-EOS Tools
[Screenshot of the website]
Other Activities of Interest

Performance R&D

HDF5 - PnetCDF performance comparison
Flash I/O Benchmark (Checkpoint files)
[Chart: bandwidth (MB/s) vs. number of processors; series: PnetCDF, HDF5 collective, HDF5 independent; uP: Power 5]
I/O performance of PnetCDF is comparable with parallel HDF5 when the libraries are used in similar ways.

NetCDF4 - PnetCDF comparison
[Chart: bandwidth (MB/s) vs. number of processors; series: PnetCDF collective, NetCDF4 collective]
I/O performance of parallel NetCDF4 is comparable with PnetCDF, about 15% slower on average, for the output of the ROMS history file.

Collective I/O improvements
• HDF5 supports collective I/O for non-regular selections
• Collective I/O for chunked storage is not trivial
• Non-regular selection performance optimizations:
  • Added I/O options to achieve good collective I/O performance
  • Added APIs for applications to participate in the optimization process
• See the poster

DOE Labs: Sandia National Laboratories, Lawrence Livermore National Laboratory

DOE ASC* and Others
• Support HDF5 on major systems at Sandia & Lawrence Livermore National Laboratories
• R&D efforts underway
  • File recovery after a crash
  • Very fast write speed – goal is 300 MB/sec
  • Read-while-writing capability
  • Java library and HDFView improvements
* Advanced Simulation and Computing program

Flight test

Flight test – collect, then process
Boeing

HDF5 for flight test data
• Boeing 787 active archive
  • 10 TB per flight-test day
  • Must handle raw, real-time data
  • High-speed ingest, by "packet"
  • Post-processing, by "time-history"
• Boeing high-level APIs
  • HDFpacket – released with HDF5 1.8
  • HDFtime_history – new, open version likely

Product data: STEP

Bioinformatics: managing genomic data

C# HDF5 API for Agilent

Agilent C# project
• Why?
  • Heavy use of C# at Agilent
  • Compatibility with Matlab
  • Other interest in HDF5 at Agilent
• What?
  • Prototype API in C# for Windows XP
  • Basic functions to create, open, close, read, write
  • Limited datatypes, no partial I/O
• When?
  • March 2007

HDF5 Software
[Diagram layers: Tools & Applications; Fortran, C++, Java, C# and C API; HDF I/O Library; HDF File]

NetCDF 4

NetCDF 4 project
• Enhanced NetCDF-4 interface to HDF5
  • Combine features of netCDF and HDF5
  • Take advantage of their separate strengths
• Collaboration between NCSA, THG and Unidata
• Currently in alpha release
  • Waiting for beta release

NetCDF-4 Architecture
[Diagram components: netCDF-3 applications, netCDF-4 applications and HDF5 applications; netCDF-3 interface, netCDF-4 library and HDF5 library; netCDF files and HDF5 files]
• Supports access to netCDF files and to HDF5 files created through the netCDF-4 interface

Archival formats
• Proposal to the NOAA Scientific Data Stewardship program
• Will investigate use of the OAIS "Archival Information Package" standard with HDF5
• PIs: Ruth Duerr (NSIDC) and Kent Yang
OAIS: Open Archival Information System
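The NetCDF-4 architecture slide above describes one library that reads both classic netCDF files and HDF5 files. One plausible way such a library can decide which path to take is to check the file signature. The sketch below assumes the classic netCDF signatures ("CDF" followed by version byte 1 for the classic format or 2 for the 64-bit offset format) and the 8-byte HDF5 format signature, which the HDF5 file format specification allows at offset 0 or at 512, 1024, 2048, ... bytes when a user block precedes the superblock. The helper detect_format and its max_search limit are hypothetical; the real netCDF-4 library's detection logic is more involved.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def detect_format(path, max_search=1 << 20):
    """Guess whether `path` is a classic netCDF file or an HDF5 file.

    Hypothetical helper: checks the netCDF classic / 64-bit-offset magic at
    the start of the file, then searches for the HDF5 signature at offsets
    0, 512, 1024, 2048, ... up to `max_search` bytes.
    """
    with open(path, "rb") as f:
        head = f.read(4)
        if len(head) == 4 and head[:3] == b"CDF" and head[3] in (1, 2):
            return "netcdf-classic" if head[3] == 1 else "netcdf-64bit-offset"
        offset = 0
        while offset <= max_search:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return "hdf5"
            # Candidate superblock locations: 0, then powers of two from 512.
            offset = 512 if offset == 0 else offset * 2
    return "unknown"
```

A file written through the netCDF-4 interface is itself an HDF5 file, so this check reports it as "hdf5"; telling it apart from a plain HDF5 file would require looking inside the file.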
Asymmetries between collecting and accessing data
• Huge streams of data collected …
• To be accessed in little bits…

Challenge – efficient remote access
• How do we efficiently find and access data from distributed repositories, when the data are big and complex?
• Storage Resource Broker (SRB)
  • Efficient access to HDF5 objects in a repository
• OPeNDAP
  • Powerful protocol for remote querying and subsetting of scientific data

Example – Storage Resource Broker
• Storage Resource Broker – repository for heterogeneous data collections
• Simplifies storage, query and access to massive amounts of scientific data
• Has data in HDF5, netCDF and other formats

Normal SRB configuration
[Diagram components: client (HDF5), SRB server, MCAT, HDF5 file; transfer unit: whole file or a sequence of bytes]

OPeNDAP-HDF5 project
• OPeNDAP
  • Powerful protocol for remote querying and subsetting of scientific data
  • Replaces direct file access with remote query and access
  • Widely used in Earth Sciences

OPeNDAP – HDF5 Project
• A NASA ROSES NRA project
• Tasks
  • HDF5-DAP2 server (now a prototype)
  • HDF5-DAP4 server
  • DAP4 to HDF5 conversion utility
  • Investigate an integrated DAP-aware HDF5 library

SQL Server and HDF5 with Microsoft

SQL Server and HDF5
• Microsoft "dream environment for scientists"
  • Combine data management and computing
• SQL Server 2005 solution
  • Combine RDBMS with scientific analysis tools, together in one integrated system
  • HDF5 & other formats manage scientific objects
HDF5 in SQL Server
[Diagram components: visualization libraries (MATLAB,…); web services (XML, REST, RSS); OLAP and data mining; reporting; .NET languages with Language Integrated Query; Entity Framework (EDM, eSQL, O-R mapping) with an HDF5 EDM model; SQL Server with HDF5 TVFs, HDF5 index and HDF5 type; HDF5 files; HDF5 FS blob]

Thank you all and Thank you NASA!

Acknowledgement
This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.

Questions/comments?

Information Sources
• HDF website http://hdfgroup.org/
• HDF5 Information Center http://hdfgroup.org/HDF5/
• HDF Helpdesk [email protected]
• HDF users mailing list [email protected]
  coming soon: [email protected]