WMO ET-ADRS Hierarchical Data Format (HDF) Manuel Fuentes (ECMWF) Erdem Erdi (Turkish State Meteorological Service) ET-ADRS: 23-25 April 2008 WMO.
WMO ET-ADRS
Hierarchical Data Format (HDF)
Manuel Fuentes
(ECMWF)
Erdem Erdi
(Turkish State Meteorological Service)
ET-ADRS: 23-25 April 2008
Outline
Brief introduction to HDF
SWOT Analysis
Practical examples
Hierarchical Data Format: HDF
HDF is a file format
HDF files are self-describing
HDF technologies at present include two data management
formats (HDF4 and HDF5) and libraries, a modular data
browser/editor, associated tools and utilities, and a conversion
library
Both HDF4 and HDF5 were designed as general scientific
formats, adaptable to virtually any scientific or engineering
application, and have also been used successfully in non-technical areas
HDF5 is particularly good at dealing with data where complexity
and scalability are important
Features provided by HDF5 technology
Unlimited size, extensibility, and portability
General data model
Unlimited variety of datatypes
Flexible, efficient I/O
Flexible data storage
Data transformation and complex subsetting
HDF5: Unlimited size, extensibility, and portability
HDF5 does not limit the size of files or the size or number of
objects in a file.
The HDF5 format and library are extensible and designed to
evolve gracefully to satisfy new demands.
HDF5 functionality and data are portable across virtually all
computing platforms, and the library is distributed with C, C++, Java, and
Fortran90 programming interfaces.
HDF5: General data model
The HDF5 data model supports complex data relationships and
dependencies through its grouping and linking mechanisms.
HDF5 accommodates many common types of metadata and
arbitrary user-defined metadata.
HDF5: Unlimited variety of datatypes
HDF5 supports a rich set of pre-defined datatypes as well as the
creation of an unlimited variety of complex user-defined
datatypes.
Datatype definitions can be shared among objects in an HDF file,
providing a powerful and efficient mechanism for describing data.
Datatype definitions include information such as byte order
(endianness), size, and floating point representation, to fully describe
how the data is stored, ensuring portability to other platforms.
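To illustrate why a recorded byte order matters for portability, here is a minimal sketch using Python's standard struct module. It does not use the HDF5 library; the value and variable names are illustrative assumptions.

```python
import struct

value = 1013.25  # e.g. a pressure in hPa (illustrative value)

# The same 4-byte IEEE 754 float written with two different byte orders.
little = struct.pack("<f", value)
big = struct.pack(">f", value)

# The two encodings hold the same bytes in reverse order.
assert little == big[::-1]

# Reading with the recorded byte order recovers the value on any platform.
decoded_ok = struct.unpack(">f", big)[0]

# Reading with the wrong byte order silently yields a different number.
decoded_wrong = struct.unpack("<f", big)[0]

print(decoded_ok, decoded_wrong)
```

A self-describing file avoids the second case because the byte order travels with the data instead of being assumed by the reader.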
Data model and datatypes
HDF5: Flexible, efficient I/O
HDF5, through its virtual file layer, offers extremely flexible
storage and data transfer capabilities. Standard (POSIX), parallel,
and network I/O file drivers are provided with HDF5.
Application developers can write additional file drivers to
implement customized data storage or transport capabilities.
The parallel I/O driver for HDF5 reduces access times on parallel
systems by reading/writing multiple data streams simultaneously.
HDF5: Flexible data storage
HDF5 employs various compression, extensibility, and chunking
strategies to improve access, management, and storage
efficiency.
HDF5 provides for external storage of raw data, allowing raw data
to be shared among HDF5 files and/or applications, and often
saving disk space.
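The idea behind combining chunking with compression can be sketched as follows. This is only an illustration with Python's standard zlib and struct modules, not the HDF5 implementation; the chunk size, the synthetic data and the function names are assumptions made for the example.

```python
import struct
import zlib

CHUNK = 256  # values per chunk (illustrative choice)

# A smooth synthetic series of 1024 values that fit in 16-bit integers.
values = [i // 4 for i in range(1024)]

def pack_chunk(chunk):
    """Serialise a list of ints as little-endian int16 and deflate it."""
    raw = struct.pack("<%dh" % len(chunk), *chunk)
    return zlib.compress(raw)

# Each chunk is compressed independently of the others.
chunks = [pack_chunk(values[i:i + CHUNK]) for i in range(0, len(values), CHUNK)]

def read_value(index):
    """Fetch one value by inflating only the chunk that contains it."""
    raw = zlib.decompress(chunks[index // CHUNK])
    return struct.unpack_from("<h", raw, 2 * (index % CHUNK))[0]

raw_size = 2 * len(values)
stored_size = sum(len(c) for c in chunks)
print(raw_size, stored_size, read_value(700))
```

Because each chunk is compressed on its own, a reader that needs one value inflates a single chunk instead of the whole dataset.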
HDF5: Data transformation and complex subsetting
HDF5 enables datatype and spatial transformation during I/O
operations.
HDF5 data I/O functions can operate on selected subsets of the
data, reducing transferred data volume and improving access
speed.
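HDF5 calls regular selections of this kind hyperslabs, described per dimension by a start, a stride and a count. The sketch below mimics that selection on an in-memory list that stands in for a dataset on disk; the function name and the fixed block size of 1 are assumptions of the example, not the library's API.

```python
def select(grid, start, stride, count):
    """Return a regular subset of a 2-D grid (block size of 1 assumed)."""
    rows = [start[0] + i * stride[0] for i in range(count[0])]
    cols = [start[1] + j * stride[1] for j in range(count[1])]
    return [[grid[r][c] for c in cols] for r in rows]

# A 6 x 8 grid whose cell value encodes its (row, column) position.
grid = [[10 * r + c for c in range(8)] for r in range(6)]

# Every second row and every third column, starting at row 1, column 2.
subset = select(grid, start=(1, 2), stride=(2, 3), count=(2, 2))
print(subset)  # [[12, 15], [32, 35]]
```

Here 4 of the 48 cells are touched; with a real file only the selected elements would need to be transferred.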
Governance: The HDF Group
The mission of The HDF Project is
to develop, promote, deploy and support open and free
technologies that facilitate scientific data exchange, access,
analysis, archiving and discovery
to ensure long-term availability and support for HDF
technologies, and by extension, long-term accessibility of
data stored using HDF technologies
The HDF Group currently includes 15 full-time staff members and
3 to 5 students. The group’s annual budget is $2.1 million, which
is mostly provided by the government sector
Copyright
http://hdf.ncsa.uiuc.edu/HDF5/doc/Copyright.html
HDF5 (Hierarchical Data Format 5) Software Library and Utilities
with Copyright 2006-2008 by The HDF Group (THG).
NCSA HDF5 (Hierarchical Data Format 5) Software Library and
Utilities with Copyright 1998-2006 by the Board of Trustees of the
University of Illinois.
All rights reserved.
Copyright (cont.)
Redistribution and use in source and binary forms, with or without modification, are
permitted for any purpose (including commercial purposes) provided that the following
conditions are met:
Redistributions of source code must retain the above copyright notice, this list of
conditions, and the following disclaimer
Redistributions in binary form must reproduce the above copyright notice (which is on the
previous slide), this list of conditions, and the following disclaimer in the documentation
and/or materials provided with the distribution
In addition, redistributions of modified forms of the source or binary code must carry
prominent notices stating that the original code was changed and the date of the change
All publications or advertising materials mentioning features or use of this software are
asked, but not required, to acknowledge that it was developed by The HDF Group and by the
National Center for Supercomputing Applications at the University of Illinois at
Urbana-Champaign and credit the contributors
Neither the name of The HDF Group, the name of the University, nor the name of any
Contributor may be used to endorse or promote products derived from this software without
specific prior written permission from THG, the University, or the Contributor, respectively
DISCLAIMER: THIS SOFTWARE IS PROVIDED BY THE HDF GROUP (THG) AND THE
CONTRIBUTORS "AS IS" WITH NO WARRANTY OF ANY KIND, EITHER EXPRESSED OR
IMPLIED. In no event shall THG or the Contributors be liable for any damages suffered by the
users arising out of the use of this software, even if advised of the possibility of such
damage
SWOT Analysis: Criteria
Ability to present information pertinent to WMO Programmes
Ability to encode textual information, such as warnings
Ability for usage in operational data exchanges
Ability for usage in transmission of information to users outside
NMHSs
Ability for usage in storage systems by NMHSs, centres or other
users
Compliance and status with regard to existing standards
Inter-operability, translation back and forward to other DRSs
Ability to envelop other objects
Available and widespread support (skills and technology)
SWOT : Present information pertinent to WMO
Ability/suitability to present information pertinent to WMO
Programmes and Member needs including weather, climate,
water, atmospheric constituents, oceanography, aviation and
other related environmental information
Any 2-D or 3-D data from meteorology, hydrology or similar sciences
can be handled
Used by satellite applications in meteorology due to suitability for
large and complex data.
There are few tools for non-satellite meteorological data
It’s not clear how to handle millions of bulletins:
Group bulletins into a big file
1 bulletin per file (minimum HDF5 file size: 2 Kbyte)
SWOT : Present information for pictorial display
Ability/suitability to present information for pictorial display
HDF can store and present graphical data in 2 or 3
dimensions, in both raster and vector form. Tools can display
the information graphically
SWOT : Encode textual information
Ability/suitability to encode textual information, such as warnings
HDF can store textual information of any length
Suitable for storing metadata
SWOT : Encode Metadata
SWOT : Usage in operational data exchanges
Ability/suitability for usage in operational data exchanges (real
time or otherwise) between NMHSs and centres. Including
information regarding existing usage especially with regard to
extent of use
EUMETSAT dissemination (EUMETCast) supports HDF as a
delivery format
There is no naming convention for satellite data (only Terra
and Aqua share a naming convention, because the same team
developed both satellites). Otherwise each satellite has a
different naming convention and order of elements in the file
There are 2 attempts to standardize satellite data in HDF:
• KNMI-HDF5: Special library for encoding
• HDF-EOS: Terra, Aqua and petabytes more
SWOT : Usage in operational data exchanges
A significant portion of the operational satellite-based
meteorological data and products is distributed to the
meteorological community in the HDF format, in near-real time or
non-real time (archive).
Some examples are
EUMETSAT SAF Products (NWC SAF, LAND SAF)
EUMETSAT EPS data
NASA EOS Data and products
SWOT : Transmission of information to users outside
NMHS
Ability/suitability for usage in transmission of information to
users outside NMHSs or centres. Including information regarding
existing usage especially with regard to extent of use
HDF is widely used in scientific communities:
• Universities, Research labs
• Space agencies (like NASA and EUMETSAT)
HDF is mainly used for satellite data.
Use of HDF across a variety of disciplines and user groups
• Encourages development of tools
• Makes it easy to use outside NMHSs
Software publicly available with supported tools
SWOT : Usage in storage systems
Ability/suitability for usage in storage systems by NMHSs, centres
or other users. Including information regarding existing usage
especially with regard to extent of use
Parallel I/O
Machine independent
Compression
The HDF Group is committed to ensuring the long-term
accessibility of HDF-stored data
EUMETSAT does not store data in HDF, but converts from raw data
NASA archives the Earth Observing System data in HDF
“Grouping” of data at archiving may impose restrictions on
how data can be retrieved
SWOT : Standards
Compliance and status with regard to existing standards.
Are they open standards? Which body oversees them? Is there
any proprietary nature to them? Are they flexible enough to
accommodate our current and foreseen needs? How are they
updated, and is it a straightforward process?
HDF format is very suitable for GIS (as it can handle both data
and metadata in the same file). However, it is not widely used
for GIS because of the lack of a convention (schema)
It is governed by The HDF Group
Compression: SZIP method is proprietary, ZLIB is open
HDF licence seems flexible
The library is updated regularly in a straightforward manner
SWOT : Interoperability
How suitable is the DRS to the WIS and to developing the
appropriate metadata? Is existing documentation good? How
much variance is there in current implementations? Are the
existing flavours inter-operable?
HDF can meet the requirements regarding metadata required for
the WIS
Documentation is good with lots of examples
There are 2 implementations, HDF4 and HDF5:
Tools exist to convert from HDF4 to HDF5
No direct inter-operability between the 2 implementations
SWOT : Conversion to/from other DRS
What are the issues for translating back and forward to other
DRSs?
Translation could be lossless when using the same encoding
method (no standard method exists)
Encoding: Offset and scale factor. Tools are not aware of the
encoding
Compression can be used instead of encoding in order to
avoid larger files
Compression is transparent for users
Native data types are 1, 2, 4, 8 bytes
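The offset and scale factor encoding mentioned above can be sketched as follows. This is a generic packing example in plain Python, not an HDF5 feature; the function names, the 16-bit target width and the sample values are assumptions made for illustration.

```python
def encode(values, offset, scale):
    """Pack floats into unsigned 16-bit integers: n = round((v - offset) / scale)."""
    packed = []
    for v in values:
        n = round((v - offset) / scale)
        if not 0 <= n <= 0xFFFF:
            raise ValueError("value %r does not fit in 16 bits" % v)
        packed.append(n)
    return packed

def decode(packed, offset, scale):
    """Recover approximate floats: v = offset + n * scale."""
    return [offset + n * scale for n in packed]

# Temperatures in kelvin, stored to a precision of 0.01 K (illustrative values).
temps = [271.15, 288.37, 301.0]
offset, scale = 200.0, 0.01

packed = encode(temps, offset, scale)
restored = decode(packed, offset, scale)

# The round trip is accurate to within half of one scale step.
max_error = max(abs(a - b) for a, b in zip(temps, restored))
print(packed, max_error)
```

A reader needs the same offset and scale to recover the values, which is why a generic tool that is unaware of the encoding would show only the packed integers.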
SWOT : Envelope objects
Can they be used to envelop objects or act as a pseudo-carrier
for other data formats?
HDF can handle/envelop any kind of data format, either binary
or ASCII
HDF can handle BLOBs (stream of bytes)
SWOT: Support, skills and technology
Available and widespread support for the DRS (skills and
technology)
The HDF Group’s commitment to:
• Support HDF
• Ensure long-term accessibility of the data
Established user community
SWOT Summary: Strengths
HDF5 can store and present 2-D or 3-D data (gridded fields),
together with metadata
Rich set of predefined datatypes and data relationships
High performance features:
Parallel I/O
Unlimited dimensions
Compression
Unlimited size and amount of data
I/O functions can operate on subsets of data
Open data format and free software (libraries and tools)
Operational services (EUMETCast) support HDF
SWOT Summary: Weaknesses
HDF is a file format, as opposed to a message/bulletin format (like
GRIB or BUFR)
There is no convention:
Names to use
Order in which to store elements
May not handle point observation data well
There aren’t many tools for meteorological data
Comparison with NetCDF:
HDF5’s general data model makes writing data more difficult
than NetCDF
HDF5 will be the storage format for NetCDF4
SWOT Summary: Opportunities
Using HDF5 may improve inter-operability with other disciplines
Using HDF5 may improve usability of meteorological data outside
NMHSs
Software publicly available, with numerous (general) tools and
programming languages
SWOT Summary: Threats
The HDF format is developed and maintained by a single group
(The HDF Group). Any problem with funding could jeopardise the
existence of the format or its support
Meteorology would be a very small community compared to other
users of HDF. Requirements of the meteorological community
may not be so important for the HDF community
Practical Examples
Data received at ECMWF, converted to BUFR, then used by the
forecasting system:
HDF4
Microwave brightness temperature from the Tropical Rainfall
Measuring Mission (TRMM)
Rainfall from the Tropical Rainfall Measuring Mission (TRMM)
HDF5
METOP GOME-2 total column ozone data
Aura OMI ozone data
Each data stream has its own HDF-to-BUFR conversion tool
Practical Examples
EUMETSAT
HDF-EOS
HDF-KNMI
We haven’t found any examples of field observations (SYNOP,
METAR, radiosondes, ...)
Practical Example: MODIS swath data from channel 3
Practical Example: 3-D data
Schwarzschild metric (spatial components only)
Practical Example: MSG total precipitable water
Practical Example: SAF cloudtype product
Conclusion
HDF5 is going to be the base for NetCDF4. It makes more sense to
focus on NetCDF4 than HDF5
Support for subsetting
Parallel I/O
Unlimited dimensions
Compression
Removal of current limitations on file size
HDF is a file format as opposed to GRIB/BUFR which are message
(or bulletin) formats
HDF might not be suitable for operational exchange of
meteorological data between NMHSs, but it may be suitable for presenting
meteorological information to other users/disciplines