HDF Update
Mike Folk
National Center for Supercomputing Applications
HDF and HDF-EOS Workshop IX
December 1, 2005
-1-
HDF
Outline
• Organizational info
• HDF Software Update
• Other Activities of Interest
-2-
Organizational info
-3-
The HDF Team
Frank Baker
Christian Chilan
Peter Cao
Vailin Choi
Mike Folk
Fang Guo
Anne Jennings
Barbara Jones
Quincey Koziol
James Laird
Raymond Lu
John Mainzer
Pedro Nunes
Elena Pourmal
Binh-minh Ribler
Eric Shapiro
Rishi Sinha
Arash Termehchy
Kent Yang
And all those wonderful folks out there who
contribute ideas, requests, bug reports, code, and
support.
-4-
The HDF Group is Moving
-7-
“The HDF Group” = “THG”
-8-
THG
• Why spin off from U of Illinois?
– Creating a sustainable organization
– We do more than R&D
• THG already exists
-9-
How will THG be different from the NCSA HDF Group?
• Business model
• Location
• Staff
• THG – NCSA – UIUC relations
• Effect on NASA and other affiliations
• Intellectual property
- 10 -
HDF Software Update
- 11 -
Major software milestones since Oct. 2004
(Timeline figure: months from Nov 2004 to Nov 2005)
• HDF Java 2.1
• HDF Web browser plug-in
• HDF 4.2r1
• HDF5 1.6.4
• HDF4-to-HDF5 conversion tools 1.2
• HDF Java 2.2
• HDF5 1.6.5
- 12 -
Release highlights
- 13 -
HDF 4.2r1 – February 2005
• Szip compression fixed
• Windows
– hdiff and hrepack added
– Config, build, testing procedures improved
• h4fc utility fixed
- 14 -
HDF 4.2r1 – new compilers and platforms
• Mac OS X
• Linux 2.4
– Fortran IBM xlf v. 8.1
– Absoft f95 v. 8.2
– Absoft Fortran f95 v. 9.0
– PGI C and Fortran
– Intel C and Fortran
• AMD Opteron
• Cray TS IEEE
- 15 -
HDF5-1.6.4 – March 2005
• High-Level (HL) library
– Some new C APIs added
– Fortran APIs added
– HL library now built and installed by default
• Library built and tested with SZIP 2.0.
• Many changes to improve library performance
– Especially for variable length types and metadata cache
• H5jam – a new utility
– Allows a text file to be added to the "user block" at the
beginning of an HDF5 file
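As a rough illustration of the user-block idea, the sketch below works out how much space a text file would need at the front of an HDF5 file. It relies on the HDF5 rule that a user block is either 0 bytes or a power of two of at least 512 bytes. The function names `userblock_size_for` and `pad_to_userblock` are hypothetical and are not part of h5jam.

```python
def userblock_size_for(payload_len: int) -> int:
    """Smallest valid user block size (in bytes) that can hold payload_len bytes."""
    if payload_len < 0:
        raise ValueError("payload length must be non-negative")
    if payload_len == 0:
        return 0  # no user block needed
    size = 512  # smallest non-zero user block size
    while size < payload_len:
        size *= 2  # sizes grow in powers of two
    return size


def pad_to_userblock(payload: bytes) -> bytes:
    """Pad the payload with zero bytes up to the chosen user block size."""
    size = userblock_size_for(len(payload))
    return payload + b"\0" * (size - len(payload))
```

The padded bytes would sit in front of the HDF5 data, which starts at the user block boundary.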
- 16 -
Platforms to be dropped in future releases
• Operating systems
– Solaris 2.8
– HPUX B.11.00
– Crays T3E and T90
– Linux RH 7.* and 8.*
– Windows 2000
• Compilers
– We use the latest versions of vendors' compilers as they become available and drop the previous ones
- 17 -
Platforms to be added
• Systems
– Solaris 2.10
– Cray X1
– Cray XT3
– NEC SX6
– HP 64-bit (HPUX 11.23)
– Mac OS 10.4
• Compilers
– gcc 4.*
– HDF5 Fortran: Lahey, NAG, G95
– MPI-2
- 18 -
Coming next: Major release HDF5 1.8
• Windows MPICH support: prototype
• Integer to float conversions
– Will support integer to float conversions during I/O
– http://hdf.ncsa.uiuc.edu/RFC/dtype_conv_overflow/Overflow.html
• New error-handling API
• Dimension scales
– Similar to dimension scales in HDF4
– http://hdf.ncsa.uiuc.edu/RFC/H5DimScales/H5dimscale_Specification_1_0-5.pdf
- 19 -
• N-bit compression filter
– Compact storage for user-defined datatypes.
– http://hdf.ncsa.uiuc.edu/RFC/NBitPacking/NBitPacking.html
• Offset+size storage filter
– Performs a scale and/or offset operation on each data value,
truncating the resulting value to a lesser number of bits before
storing it.
– http://hdf.ncsa.uiuc.edu/RFC/ScaleOffsetCompress/ScaleOffsetCompress.html
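The scale-and-offset idea can be sketched in a few lines. This is only an illustration of the principle, not the filter's actual algorithm or API: values are optionally scaled by a power of ten, the minimum is subtracted as an offset, and the remaining residuals need far fewer bits than the original values. The names `scale_offset_encode`, `scale_offset_decode`, and `decimals` are assumptions made for this example.

```python
def scale_offset_encode(values, decimals=0):
    """Return (offset, bits, residuals) for a non-empty list of numbers.

    decimals is the number of decimal digits to keep (0 for integer data).
    """
    factor = 10 ** decimals
    scaled = [round(v * factor) for v in values]  # scale, then round to integers
    offset = min(scaled)                          # the offset is the smallest value
    residuals = [s - offset for s in scaled]      # small non-negative integers
    bits = max(residuals).bit_length()            # bits needed per stored value
    return offset, bits, residuals


def scale_offset_decode(offset, residuals, decimals=0):
    """Invert scale_offset_encode (exact for integers, rounded for floats)."""
    if decimals == 0:
        return [r + offset for r in residuals]
    factor = 10 ** decimals
    return [(r + offset) / factor for r in residuals]
```

For example, the integers 1000, 1003, 1001, 1007 reduce to an offset of 1000 plus residuals that fit in 3 bits each.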
- 20 -
• Group revisions
– Option to access objects according to creation order
– Improved performance for groups containing a large
number of objects.
– http://hdf.ncsa.uiuc.edu/RFC/ReviseGroups/
• Improved metadata cache
– New metadata cache improves performance and
memory usage in the library.
– Apps that access files with a large number of objects
should see significant performance improvement and
should use less memory.
- 21 -
• Data transformation filter
– Performs data transformation during I/O operations.
– Transform expressed by algebraic formula (e.g. a*x + b)
– http://hdf.ncsa.uiuc.edu/HDF5/doc_dev_snapshot/H5_dev/html/RM_H5P.html#Property-SetDataTransform
• Ph5diff – parallel h5diff
– Compares two files in an MPI parallel environment.
– Compares multiple datasets simultaneously.
– http://hdf.ncsa.uiuc.edu/RFC/PH5DIFF/
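To make the data transformation bullet concrete, here is a minimal sketch of applying a formula of the form a*x + b as values pass through a write call. It imitates the concept only; the class `TransformingStore` is hypothetical and does not correspond to any HDF5 call.

```python
class TransformingStore:
    """Toy stand-in for a dataset that applies a*x + b while data is written."""

    def __init__(self, a, b):
        self.a = a
        self.b = b
        self.stored = []

    def write(self, buffer):
        # The transform happens during the I/O step; the caller's buffer is untouched.
        self.stored = [self.a * x + self.b for x in buffer]


store = TransformingStore(a=2, b=3)
memory_buffer = [1, 2, 3]
store.write(memory_buffer)
```

After the call, `store.stored` holds the transformed values while `memory_buffer` still holds the originals.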
- 22 -
• HDFpacket API
– Data collected in “packets”
– “Horizontal” view, per time step
– Efficient access to fixed- and variable-length records
– http://hdf.ncsa.uiuc.edu/RFC/HDF5Packet/Tech_reprt_HDF5Packet.pdf
• Possible: HDFtime_history API
– Archival, viewing, analysis
– “Vertical” view, per parameter
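The difference between the “horizontal” and “vertical” views can be sketched with plain Python data. This is an assumption-laden illustration, not either API: each packet is modeled as a dict of parameter values for one time step, and the hypothetical function `to_time_history` regroups them per parameter.

```python
def to_time_history(packets):
    """Regroup per-time-step packets into per-parameter histories.

    packets: list of dicts, one per time step (the "horizontal" view).
    Returns a dict mapping each parameter name to a list of (step, value)
    pairs (the "vertical" view). Packets may carry different parameters.
    """
    history = {}
    for step, packet in enumerate(packets):
        for name, value in packet.items():
            history.setdefault(name, []).append((step, value))
    return history


packets = [
    {"altitude": 100, "speed": 250},
    {"altitude": 110},  # a shorter, variable-length record
    {"altitude": 120, "speed": 260},
]
history = to_time_history(packets)
```

Ingest naturally produces the packet form, while archival analysis tends to ask for one parameter over all time steps, which is why the two views are kept distinct.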
- 23 -
SZIP integration with HDF4 and HDF5
• Development and integration completed
– Includes libraries and tools
• SZIP documentation web page
– http://hdf.ncsa.uiuc.edu/doc_resource/SZIP/
– Examples and performance studies for HDF5
- 24 -
Parallel I/O and chunking
• Collective I/O – key to improving
performance for parallel HDF5
• Current versions only allow collective I/O
for regular selection in contiguous storage
• Expanding use of collective I/O in HDF5
– For regular selection in chunked storage
– For irregular selection for both chunked and
contiguous storage
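One reason chunked storage is harder than contiguous storage is that a single regular selection can be split across several chunks. The sketch below lists the chunks that a rectangular selection touches. It is an illustration under simple assumptions (no stride, every count at least 1), and the function name `chunks_touched` is hypothetical rather than anything in the library.

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return the chunk indices overlapped by a rectangular selection.

    start, count, chunk_shape: one integer per dimension.
    """
    ranges = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch            # chunk holding the first selected element
        last = (s + c - 1) // ch   # chunk holding the last selected element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))
```

A 4x4 selection starting at (2, 3) in a dataset with 4x4 chunks spans four chunks, so the I/O for that one selection has to be coordinated across all four.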
- 25 -
Java and other tools
- 26 -
Tools development
• HDF4
– hrepack and hdiff performance improved
• H4 to H5 Conversion Tools
– Updated to HDF4.2r1, HDF5-1.6.4
• H5jam
– New tools to add/remove user block in front of file
• H5dump
– Faster for files with large numbers of objects
– Can dump contents of the boot block
– Can dump dataset filters, storage layout, fill value
• Parallel h5diff
– Enables h5diff to run in parallel
- 27 -
HDF Java Products
- 28 -
HDFView changes
• Support for Storage Resource Broker (SRB)
– HDF5 object level access to remote files
• Display HDF5 compound datatypes with arrays
• Create/display HDF5 named datatypes
• Create links in HDF5
• Improve ability to manipulate palette
• Select row/column for xy plot in the table view
- 29 -
New Functions in Java API
• Request an individual object without
loading entire structure of file
• Send client request to SRB server and
receive result from server
• Create HDF5 indexing table
• Query for HDF5 datasets
- 30 -
HDF Web-browser Plug-in
• Extends browser to display HDF4/5 files
• A “lite” version of HDFView
• Analogous to PDF reader
• Fewer browsing features
• No editing features
• Windows only
- 31 -
- 32 -
HDF Web-browser Plug-in
• Not an applet
– It is downloaded and installed once
– An applet is downloaded with each invocation
• http://hdf.ncsa.uiuc.edu/plugins/
- 33 -
HDF-EOS module for HDFView
• Developed by HDF-EOS team
• Optional module for HDF-EOS files
– Reads, displays HDF-EOS grid, swath, etc.
– (Generic modules show native HDF5 objects)
• Tested with HDFView 2.3
• To do -- get permission to release with
HDFView
- 34 -
- 35 -
Future work for Java
• Add OPeNDAP client support to HDFView
– Seamlessly retrieve data from any OPeNDAP server
• Support HDF5 Dimension Scales
– Recognize geospatial coordinates
• Support for HDF5 Indexing
– Create indexing table and query HDF5 datasets
• H5Gen
– Generate HDF5 file from XML file
- 36 -
Other Activities of Interest
- 37 -
DOE/ASC*
“ASC provides the integrating simulation and modeling
capabilities and technologies needed …for future design
assessment and certification of nuclear weapons and
their components”
• Massively parallel computing and I/O
• Complex data models and big data
• HDF5 a standard format for ASC apps
* “Advanced Simulation and Computing Program”
- 38 -
Boeing
- 39 -
Boeing
HDF5 for flight test data
• Commercial (Boeing 787) and military planes
• 787 active archive
– HDFtime_history
– 10 TB per flight-test day
– Also post-testing data
• Must handle raw, real-time data
– Variable-length datatypes/records
– High speed ingest
– HDFpacket API
- 40 -
Boeing High Level APIs
• HDFpacket (see above)
• HDFtime_history
– Structured records for archive, analysis, viewing
– “Vertical” view, per parameter
- 41 -
Object encryption to support access control
• For Boeing
• Investigated the role of encryption in
developing access control
• Developed a prototype, now being tested
- 42 -
Indexing
- 43 -
Projection Indexes in HDF5
• Standardize indexing in HDF5
• Make indexes portable
• Just a prototype
• See Rishi Sinha’s talk
- 44 -
Product model data
- 45 -
Product data exchange – STEP
• STEP is an ISO data transfer standard.
• Defines characteristics of product throughout its life cycle.
• Widely used in design and manufacturing.
• Uses EXPRESS data modeling language to describe data.
- 46 -
STEP Limitations
• Currently text-based format
• Requires all the objects to be in memory
• Apps starting to produce very large data
volumes
• EU looking for a binary equivalent for
STEP
- 47 -
HDF5 as binary format for STEP
• EU identified HDF5 as best candidate
• Prototype in the works
– EXPRESS-to-HDF5 mappings
– Convert sample data collections
• Workshop at U of Illinois next week.
• National Archives also funding HDF study.
- 48 -
Bioinformatics
- 49 -
DNA sequencing workflows
• Diverse formats, some proprietary
• Highly redundant data
• Repeated file processing
• Disconnected programs
• In-core processing models
• Lack of persistence
- 50 -
Multiple Levels of Information
(Figure labels: SNP Score; Contig Summaries; Discrepancies; Contig Qualities; Coverage Depth; Trace; Reads; Aligned bases; Read quality; Contig; Percent match)
- 51 -
HDF5 as binary format for bioinformatics
- 52 -
netCDF and OPeNDAP
- 54 -
netCDF-HDF Project
• Enhanced NetCDF-4 Interface to HDF5
– Combine features of netCDF and HDF5
– Take advantage of their separate strengths
• Collaboration between NCSA and Unidata
• Currently in Alpha Release
- 55 -
New OPeNDAP HDF5 Project
• Four parts
– Bring existing prototype into conformance with the
DAP2 NASA/ESE RFC
– Develop a DAP4 server for HDF5
– Develop server-side utilities to convert DAP4 data
responses to an HDF5 file
– Investigate an integrated DAP-aware HDF5 library that
could provide seamless access to both local and remote
data
• Funded by NASA ROSES “Advancing Collaborative
Connections for Earth-Sun System Science”
- 56 -
Archival formats
- 57 -
Archival formats
• Ruth Duerr (NSIDC) initiated investigations
• How to preserve the content & performance
features of complex scientific data formats
• At the same time provide the requisite
simplicity needed for long term archival
storage.
• Ruth will speak about this
- 58 -
Hydroinformatics
- 62 -
Hydroinformatics
• HDF5 as exchange format for
hydroinformatics data
– Groundswell of interest lately
– Sometimes in combination with netCDF 4
– Talk to Mike Folk
- 63 -
“Hydroinformatics”
- 64 -
Thank You
- 65 -
Questions/comments?
- 66 -
Information Sources
• HDF website
– http://hdf.ncsa.uiuc.edu/
• HDF5 Information Center
– http://hdf.ncsa.uiuc.edu/HDF5/
• HDF Helpdesk
– [email protected]
• HDF users mailing list
– [email protected]
- 67 -