Central Topic - HDF-EOS

HDF Update
Mike Folk
The HDF Group
HDF and HDF-EOS Workshop XII
Aurora, Colorado
October 16, 2008
Topics
What's up with The HDF Group?
Library Update
Tools update
HDF Java Products
Library development in the works
Other activities
What’s up with The HDF Group?
Announcement!
NASA Commits $3.1M to
The HDF Group to
Support Earth System
Science
NASA Commits …
• “The HDF Group has received a 3-year contract from
NASA to provide ongoing development and support for
the HDF technologies used by NASA’s Earth Observing
System.
• The project continues the relationship that was first
established in 1994, when HDF was selected as the
standard format for the EOS Data and Information
System (EOSDIS).
• Since that time, over 4 petabytes of mission data and
derived data products have been stored in HDF4 and
HDF5, with an estimated 1.6 million users.
• Under the new contract, The HDF Group will
support NASA’s EOS program in five critical
areas:
 Provide user support to EOS data providers and
data consumers
 Perform software development and quality
assurance
 Assure long-term access to HDF data
 Integrate with complementary technologies and
applications
 Advise follow-on earth systems projects
What is
The HDF Group
And why does it exist?
History of The HDF Group
• 18 Years at University of Illinois National Center
for Supercomputing Applications
• Spun off from the University in July 2006
• Non-profit
• 20+ scientific, technology, professional staff
• Intellectual property:
 The HDF Group owns HDF4 and HDF5
 HDF formats and libraries to remain open
 BSD-type license
The HDF Group Mission
To ensure long-term
accessibility of HDF data
through sustainable
development and support of
HDF technologies.
Goals
• Maintain, evolve HDF for sponsors and
communities that depend on it
• Provide consulting, training, tuning,
development, research
• Sustain the group for long term to assure data
access over time
The HDF Group Services
• Helpdesk and Mailing Lists
 Available to all users as a first level of support
• Standard Support
 Rapid issue resolution support
• Consulting
 Needs assessment, troubleshooting, design reviews, etc.
• Enterprise Support
 Coordinating HDF activities across departments
• Special Projects
 Adapting customer applications to HDF
 New features and tools, with changes normally incorporated into open
source product
 Research and Development
• Training
 Tutorials and hands-on practical experience
Members of the HDF support community
• NASA
• Sandia National Laboratory (2)
• University of Illinois/NCSA
• A leading U.S. aerospace company
• NOAA Science Data Stewardship
• New projects and partners
 A major product lifecycle management company
 A bioinformatics software company
 Engineering Research and Development Center –
Topographic Engineering Center
 NPOESS
 ITT VIS
Initiatives and areas of increased interest
• Bioinformatics
• High performance computing (HPC)
• Microsoft products (HPC, .NET, others)
• Database integration
• Improving concurrency
• Performance and storage efficiency
• Improving high level language support
Library Update
Basic Library Releases
HDF4
Overview of basic library releases
• HDF5 1.8: 1.8.0 (Feb 08), 1.8.1 (May 08), 1.8.2 (Nov 08)
• HDF5 1.6: 1.6.7 (Feb 08), 1.6.8 (Nov 08)
• HDF4: 4.2r3 (Feb 08), 4.2r4 (Nov 08)
• H4-H5 Conversion Software: 2.0 (May 08)
HDF5 1.8.0 (Feb 08)
• Major release with file format changes and
features.
• File format changes affect backward/forward
compatibility with previous releases.
• See “New Features in Release 1.8.0 and Format
Compatibility Considerations”
http://hdfgroup.org/HDF5/doc/ADGuide/CompatFormat180.html
HDF5 1.8 minor releases
• 1.8.1 (May 08)
 A minor release with bug fixes
 Provided full 1.8 support for Fortran applications
 Enhanced tools with 1.8.0 features
• HDF5 1.8.2 coming Nov 08
 Minor bug fixes
 Tool enhancements
HDF5 1.6 minor releases
• 1.6.7 (Feb 08)
 Modification to address Aura issue
• 1.6.8 coming Nov 08
 Minor bug fixes
Future HDF5 releases (highlights)
• Release HDF5 1.10.0
 Performance improvements
 Some new features
 Support for Fortran 2003 features
 Target date November 2009
• When to drop support for 1.6.*?
HDF4 minor releases
• 4.2r3 (Feb 08)
 Improved support for apps using HDF4 and NetCDF3
 Improved support for data sets and coordinate
variables with the same name
• Release 4.2r4 coming Nov 08
 Minor bug fixes, tool enhancements
 Support for C shared libraries
 Support for 32-bit version on Mac Intel
• http://hdfgroup.org/products/hdf4/
H4-H5 Conversion Software 2.0 (May 08)
• Re-built with HDF5 1.8.1 and HDF 4.2r3.
• Conversion tool h4toh5 enhanced
 Converts HDF-EOS2 files to HDF5 files
 Makes HDF5 files readable by NetCDF4
http://hdfgroup.org/h4toh5/
HDF-EOS library
HDF-EOS2 and HDF-EOS5
• Auto configuration for HDF-EOS2 and HDF-EOS5
 Compile and test libraries with automatic
configuration tools
 Thank you, Abe!
• Testing of EOS2 and EOS5
 Test daily with HDF4 and HDF5 development code
 Periodically test on EOS-critical platforms
• EOS website support
Tools update
h5check 1.0 (March 2008)
• A validation tool to verify whether an HDF5 file
is encoded according to the HDF5 File Format
Specification.
• To ensure format integrity and long-term
compatibility between versions of the HDF5
library.
• By default, the file is verified against 1.8.x.
Can also verify against 1.6.x.
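To give a feel for what format validation involves, here is a minimal sketch of only the very first check a validator might make: locating the 8-byte HDF5 format signature. The file format specification allows the signature at byte offset 0, or at 512, 1024, 2048 and so on (doubling), which leaves room for a user block. This is not how h5check is implemented, and the function name `find_hdf5_signature` and the sample byte strings are made up for illustration.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte signature from the HDF5 file format


def find_hdf5_signature(data: bytes):
    """Return the byte offset of the HDF5 signature, or None if absent.

    Only offsets 0, 512, 1024, 2048, ... are probed, matching where the
    format allows the superblock to start.
    """
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= len(data):
        if data[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None


# Hypothetical in-memory "files" for demonstration only.
plain = HDF5_SIGNATURE + b"\x00" * 100
with_userblock = b"U" * 512 + HDF5_SIGNATURE + b"\x00" * 100
not_hdf5 = b"not an HDF5 file" * 100
```

A real validator goes far beyond this, walking the superblock and every metadata structure it references.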
Major Improvements for Existing Tools
• Improved handling of large datasets by h5diff,
h5repack, hdiff, and hrepack
• Other added capabilities
 h5import: to import strings
 h5diff: to deal with NaN values
 h5dump: to dump objects in requested order
 h5repack:
• To apply multiple filters to all objects
• To add a userblock
• To align datasets in file at byte offsets that support
efficient access
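Why NaN values need special treatment in a comparison tool can be shown with a short sketch. Under IEEE 754, NaN never compares equal to anything, including itself, so a naive element-wise comparison reports a difference wherever both inputs hold NaN. The helper names below are hypothetical, and the slide does not say how h5diff actually handles NaN; this only illustrates the underlying issue.

```python
import math


def values_differ(a: float, b: float, nan_equal: bool = True) -> bool:
    """Report whether two floating-point values should count as different.

    With nan_equal=True, two NaNs are treated as matching; a plain `a != b`
    would flag them, because NaN never compares equal to anything.
    """
    if nan_equal and math.isnan(a) and math.isnan(b):
        return False
    return a != b


def count_differences(xs, ys, nan_equal: bool = True) -> int:
    """Count element-wise differences between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    return sum(values_differ(x, y, nan_equal) for x, y in zip(xs, ys))


nan = float("nan")
left = [1.0, nan, 3.0]
right = [1.0, nan, 4.0]
naive = count_differences(left, right, nan_equal=False)     # NaN pair is counted
nan_aware = count_differences(left, right, nan_equal=True)  # NaN pair is skipped
```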
In the works: h52jpeg
• Converts datasets in an HDF5 file to a JPEG image.
• Prototype available, if you are interested.
Please send us your
comments and requests
regarding the HDF4 and
HDF5 library and tools
HDF Java Products
HDF Java
• HDF-Java 2.5 release
 Beta 1 Release Feb 08
 Full release planned for Dec. 2008
• HDF5 JNI updated for HDF5 1.8.x with 1.6 flag
• Binary for 32-bit Linux and 64-bit Solaris
• Also added daily testing for hdf-java
products
Also in the pipeline
• Full Java Support for HDF5 1.8.x
 Add and test new functions in Java wrapper
 Implement and test new functions in C JNI
 Use new functions in HDF-Java objects
• Add many new features
• Improve performance
• Revise HDFView User’s Guide
Library development in the works
Surviving a System Failure
Surviving a System Failure in HDF5
• Problem:
 In the event of an application or system crash, data
in HDF5 files are susceptible to corruption
 Corruption can occur if structural metadata is being
written when the crash occurs
• Initial Objective:
 Guarantee an HDF5 file with consistent metadata
can be reconstructed in the event of a crash
 No guarantee on state of raw data – contains
whatever data made it to disk prior to crash
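The objective above matches the general idea of write-ahead journaling, which can be sketched in a few lines. This is a generic toy, not the HDF5 design: the slides do not describe the journal format or the recovery algorithm, and the names `log_begin`, `log_commit` and `recover`, as well as the JSON entries, are assumptions made for illustration. Intended metadata updates are logged before they are applied, and recovery replays only transactions that were marked complete.

```python
import json


def log_begin(journal, txn_id, updates):
    """Record intended metadata updates before they are applied."""
    journal.append(json.dumps({"txn": txn_id, "updates": updates}))


def log_commit(journal, txn_id):
    """Mark a transaction as complete."""
    journal.append(json.dumps({"txn": txn_id, "commit": True}))


def recover(journal):
    """Rebuild consistent metadata by replaying only committed transactions."""
    pending, metadata = {}, {}
    for line in journal:
        entry = json.loads(line)
        if entry.get("commit"):
            metadata.update(pending.pop(entry["txn"], {}))
        else:
            pending[entry["txn"]] = entry["updates"]
    return metadata


journal = []  # stands in for a companion journal file
log_begin(journal, 1, {"/group_a": "addr 100"})
log_commit(journal, 1)
log_begin(journal, 2, {"/group_b": "addr 200"})
# Simulated crash: transaction 2 never commits, so it is ignored on recovery.
recovered = recover(journal)
```

As in the stated objective, the sketch says nothing about raw data; only the metadata view is made consistent.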
HDF5 Metadata Journaling Recovery
[Diagram: the application crashes, leaving a corrupted HDF5 file and a companion journal file; the h5recover tool combines them into a restored HDF5 file.]
Faster HDF5 Data Appends
Fast Data Appends
• Problem: Metadata operations limit the rate at
which HDF5 can append data to datasets.
• Solution: new data structure for indexing chunks:
 Allows constant time extend, shrink and lookup of
chunks in datasets with single unlimited dimension
 Number of metadata I/O operations to append to dataset
is independent of the number of chunks
 Also allows single-writer/multiple-reader access
• Details at:
http://hdfgroup.uiuc.edu/RFC/HDF5/ReviseChunks/
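The constant-time property can be illustrated with a toy index for a one-dimensional dataset that grows along its single unlimited dimension. Here a flat, growable Python list stands in for the new data structure; the slides do not describe the actual structure, so the class name, the chunk size and the file addresses below are all made up. The point is that the chunk number follows directly from integer division, with no tree to search or rebalance.

```python
class AppendOnlyChunkIndex:
    """Toy chunk index for a 1-D dataset with a single unlimited dimension."""

    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size
        self.chunk_addresses = []  # position i holds the file address of chunk i

    def append_chunk(self, address: int) -> None:
        # Amortized constant time: the cost does not grow with the chunk count.
        self.chunk_addresses.append(address)

    def shrink(self, n_chunks: int) -> None:
        # Keep only the first n_chunks chunks.
        del self.chunk_addresses[n_chunks:]

    def lookup(self, element_index: int):
        """Return (chunk address, offset within chunk) for an element."""
        chunk_number, offset = divmod(element_index, self.chunk_size)
        return self.chunk_addresses[chunk_number], offset


index = AppendOnlyChunkIndex(chunk_size=1000)
for n in range(5):
    index.append_chunk(address=4096 + n * 8000)  # made-up addresses
address, offset = index.lookup(2345)  # element 2345 lives in chunk 2
```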
HDF Performance
Framework
A framework for performance
regression testing
HDF Performance Framework
• A tool for
 Testing on multiple platforms
 Testing different versions
 Long term regression testing
 Assistance in debugging
• New for 1.8:
 API and format versioning
 Improved reporting interfaces
• Future related work
 Quality monitoring of the software, such as code
coverage, memory usage
Other library work
Library Features
• Improved external link support
 External link: link to HDF5 object in another file
 Can more easily specify path lookup of external
files
 Adding external link support for h5ls and h5dump
• Time datatype improvements
 Expand time type to support native formats better
 Adapt tools to display them properly
• Port to OpenVMS (limited support)
Improving performance
• Faster file free-space management while file open
 Many transactions can create many holes
 Free space management recovers unused space
 Up to 38x improvement in experiments
• Direct I/O: file I/O goes directly between
application and storage, bypassing operating
system read and write caches
• Disabling automatic metadata cache flushing
 In experiments, direct I/O combined with metadata
cache disabling improved I/O speed by about 2x.
Other activities
Remote access
Three “remote access” projects
• HDF5-OPeNDAP handler
 See talk by Kent Yang: “HDF5 OPeNDAP project
update and demo”
• HDF5-iRODS integration
 See Peter Cao’s talk Thursday: “HDF5 iRODS”
• Accessing HDF5 through SSHFS-FUSE
Accessing HDF5 through SSHFS-FUSE
• Access to files on remote NFS system limited
• Combining FUSE (Filesystem in Userspace) with SSHFS
(Secure Shell File System)
 FUSE provides application with local view of remote file system
• Another way to mount remote file system
 SSHFS allows the local file system to access parts of a remote file.
• e.g., “read” operation on the remote filesystem can be served
through SSH
• Subsetting can be efficiently done with SSHFS
• Extract a dataset (5 MB) from a 96 MB HDF5 file
 Download whole file + subset locally: 9.85 seconds
 Subset with SSHFS: 0.47 seconds
• Technical report in the works
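The timing figures above work out as follows. The four input numbers are taken from the slide; the variable names are illustrative only.

```python
whole_file_seconds = 9.85  # download whole file, then subset locally
sshfs_seconds = 0.47       # subset directly through SSHFS
dataset_mb = 5             # size of the extracted dataset
file_mb = 96               # size of the full HDF5 file

speedup = whole_file_seconds / sshfs_seconds
fraction_needed = dataset_mb / file_mb

print(f"speedup: {speedup:.1f}x")
print(f"fraction of file needed: {fraction_needed:.1%}")
```

In this single measurement the SSHFS route is roughly 21 times faster, while the dataset is only about 5% of the file, which is consistent with most of the saving coming from not transferring the rest.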
HDF4 Layout Map Project
• Problem
 Long-term readability of HDF data dependent on
long-term availability of HDF software
• Proposed solution
 Create a map of the layout of data objects in an
HDF file, allowing a simple reader to be written to
access the data
• See today’s talk by Folk and Duerr: “Ensuring
Long Term Access to Remotely Sensed HDF4
Data with Layout Maps.”
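The proposed solution can be sketched as a reader that needs nothing but a map entry and the standard library. The map fields below (`offset`, `count`, `format`) and the stand-in file are hypothetical; the real layout map format is not described on this slide, and actual HDF4 objects can be chunked, compressed or split across the file, which this toy ignores.

```python
import os
import struct
import tempfile

# Hypothetical layout map entry for one data object: where its bytes live
# and how to decode them.
layout_map = {
    "name": "temperature",
    "offset": 16,    # byte offset of the raw data within the file
    "count": 4,      # number of values
    "format": ">h",  # big-endian 16-bit signed integers (struct notation)
}


def read_with_map(path, entry):
    """Read one data object using only its map entry, no HDF library."""
    item_size = struct.calcsize(entry["format"])
    with open(path, "rb") as f:
        f.seek(entry["offset"])
        raw = f.read(item_size * entry["count"])
    # Build e.g. ">4h" from ">h" so all values decode in one call.
    fmt = f"{entry['format'][0]}{entry['count']}{entry['format'][1:]}"
    return list(struct.unpack(fmt, raw))


# Build a stand-in file: 16 bytes of opaque header, then the data.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 16 + struct.pack(">4h", 271, 272, 273, 274))
values = read_with_map(tmp.name, layout_map)
os.remove(tmp.name)
```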
HDF and .NET Framework
• Prototype .NET wrappers for HDF5 1.8.0
 Based on subset of HDF5 C routines
• Released in March, 2008
• Unsupported
 Considerable interest, but currently no funding to
support or maintain
 Use hdf-forum email list for questions
netCDF-4
Released June 2008!!
Investigation of HDF
Support in Some Open
Source Software Packages
Five open source packages
• PyHDF
 Python interface to HDF4
 http://pysclint.sourceforge.net/pyhdf/
• Geospatial Data Abstraction Library (GDAL)
 Translator library for Raster Geospatial Data Formats
 Supports about 100 file formats
 http://gdal.org/
• NCAR Command Language (NCL)
 Interpreted Language for Data Analysis and Visualization
 http://ncl.ucar.edu/
• Grid Analysis and Display System (GrADS)
 Interpreted Language for Data Analysis and Visualization
 http://iges.org/grads/
• GNU Data Language (GDL)
 Interpreted Language for Data Analysis and Visualization
 http://gnudatalanguage.sourceforge.net/
Evaluation criteria
• Formats
 HDF4, HDF5, netCDF
 Objects supported in each language
• Installation
 Availability of binaries
 Other requirements
• Adequacy of documentation
• Technical report available soon.
Windows Virtualization
Motivation: high cost of
maintaining many different
Windows configurations
Maintenance & Testing with VMware
• Multiple virtual machines run in parallel
• Only relevant software installed
• Each represents a supported configuration
• Run nightly tests of HDF4, HDF5
• Each is powered on, tested, cleaned
automatically
• Technical report available soon.
HDF5 Data Transform Pilot Study
• Tools for Flight Test Data
• Framework to define and apply transformations
to data being read
• Transformations specified in Python
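One way to picture transformations applied to data as it is read is the sketch below. It is not the pilot study's framework, which is not described here: the `TransformingReader` class, the conversion factors and the sample values are all invented for illustration, and a plain list stands in for data stored in an HDF5 dataset.

```python
from typing import Callable, Iterable, List

Transform = Callable[[float], float]


class TransformingReader:
    """Wrap stored values and apply registered transforms on every read."""

    def __init__(self, stored_values: Iterable[float]):
        self._stored = list(stored_values)  # stands in for an HDF5 dataset
        self._transforms: List[Transform] = []

    def add_transform(self, func: Transform) -> None:
        self._transforms.append(func)

    def read(self) -> List[float]:
        out = []
        for value in self._stored:
            for func in self._transforms:  # applied in registration order
                value = func(value)
            out.append(value)
        return out


# Made-up example: raw sensor counts -> volts -> degrees Celsius.
reader = TransformingReader([0, 512, 1024])
reader.add_transform(lambda counts: counts * 5.0 / 1024)  # counts to volts
reader.add_transform(lambda volts: volts * 20.0 - 10.0)   # volts to deg C
celsius = reader.read()
```

The stored values are never modified; each read produces a transformed copy, so the same data can be viewed through different transformation chains.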
Science Data Stewardship
• Goal: migrate data to a single standards-based archive
format.
• Approach: investigate how to store NASA ECS data and
metadata in HDF5 Archival Information Packages (AIP).
• See talk by Yang, Duerr et al: “Using HDF5 Archive
Information Package to preserve HDF-EOS2 data”
Thank You All
and
Thank You NASA!
Acknowledgements
This report is based upon work supported in part
by a Cooperative Agreement with the National
Aeronautics and Space Administration (NASA)
under NASA Awards NNX06AC83A and
NNX08AO77A.
Any opinions, findings, and conclusions or
recommendations expressed in this material are
those of the author(s) and do not necessarily
reflect the views of the National Aeronautics and
Space Administration.
Questions/comments?