NetCDF-4 Interoperability with HDF4 and HDF5 Ed Hartnett Unidata, 8/4/9

Purpose of Interoperability Features: World Conquest
• The purpose of the interoperability features is to allow users to run netCDF programs on non-netCDF data archives.
• NetCDF-Java can read many data formats; the idea is to bring some of this functionality to the C/Fortran/C++ libraries.
Warning and Request
• HDF4 and HDF5 interoperability features are still being tested. They are not ready for operational use yet.
• The interoperability features are available in the netCDF daily snapshot release.
• Please use them and send feedback to: [email protected]
Overview
• HDF4 Interoperability
  – What is HDF4 and why bother with it?
  – Reading HDF4 files with netCDF.
  – Limitations and request for help.
• HDF5 Interoperability
  – What is HDF5 and why bother with it?
  – Reading HDF5 files with netCDF.
  – Limitations.
What is HDF4?
• The original HDF format, superseded by HDF5.
• HDF4 has built-in 32-bit limits that make it unattractive for new data sets. It is still actively supported by The HDF Group, but no new features are added.
• Get more info about HDF4 at: http://www.hdfgroup.org/products/hdf4
Why Read HDF4?
Some important data sets are distributed in HDF4,
for example the Aqua/Terra satellite data.
HDF4 Background
• HDF4 has several different APIs. The one of greatest interest to netCDF users is the SD (Scientific Data) API.
• The SD API is (intentionally) very similar to the netCDF classic data model.
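Because the SD model is so close to the classic netCDF model, much of the translation comes down to mapping HDF4 number types onto netCDF types. The sketch below shows one plausible mapping; it is an illustration, not netCDF's actual HDF4 reading code. The `sd_type` enum is a hypothetical stand-in for HDF4's `DFNT_*` constants (its numeric values do not match HDF4's), while the `SK_NC_*` constants carry the same numeric values as the `nc_type` codes in netcdf.h.

```c
/* Hypothetical stand-ins for HDF4's DFNT_* number types. Only the names
 * mirror HDF4; the numeric values here are arbitrary. */
typedef enum {
    SD_CHAR8, SD_UCHAR8, SD_INT8, SD_UINT8,
    SD_INT16, SD_UINT16, SD_INT32, SD_UINT32,
    SD_FLOAT32, SD_FLOAT64
} sd_type;

/* netCDF external type codes, with the same values as nc_type in netcdf.h. */
enum {
    SK_NC_NAT = 0, SK_NC_BYTE = 1, SK_NC_CHAR = 2, SK_NC_SHORT = 3,
    SK_NC_INT = 4, SK_NC_FLOAT = 5, SK_NC_DOUBLE = 6, SK_NC_UBYTE = 7,
    SK_NC_USHORT = 8, SK_NC_UINT = 9
};

/* Map an SD number type to the netCDF type a reader could report.
 * Returns SK_NC_NAT ("not a type") for anything unrecognized. */
int sd_to_nc_type(sd_type t)
{
    switch (t) {
    case SD_CHAR8:   return SK_NC_CHAR;
    case SD_UCHAR8:  return SK_NC_UBYTE;  /* assumption: read as unsigned bytes */
    case SD_INT8:    return SK_NC_BYTE;
    case SD_UINT8:   return SK_NC_UBYTE;
    case SD_INT16:   return SK_NC_SHORT;
    case SD_UINT16:  return SK_NC_USHORT;
    case SD_INT32:   return SK_NC_INT;
    case SD_UINT32:  return SK_NC_UINT;
    case SD_FLOAT32: return SK_NC_FLOAT;
    case SD_FLOAT64: return SK_NC_DOUBLE;
    }
    return SK_NC_NAT;
}
```

The unsigned types (`NC_UBYTE`, `NC_USHORT`, `NC_UINT`) exist only in the netCDF-4 data model; the classic model has no unsigned integer types. Treating unsigned 8-bit characters as unsigned bytes rather than text is a choice made by this sketch.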
Confusing: HDF4 Includes NetCDF v2 API
• HDF4 ships with its own implementation of the netCDF v2 API, which writes SD data files.
• This must be turned off at HDF4 install time if netCDF and HDF4 are to be linked into the same application.
• There is no easy way to use both HDF4's netCDF API and netCDF's HDF4 read capability in the same program.
Reading HDF4 SD Files
• Starting with version 4.1, netCDF will be able to read HDF4 files created with the “Scientific Dataset” (SD) API.
• This is read-only: NetCDF can't write HDF4!
• The intention is to make netCDF software work automatically with important HDF4 scientific data collections.
Building NetCDF to Read HDF4
• This is only available for those who also build netCDF with HDF5.
• HDF4, HDF5, zlib, and other compression libraries must exist before netCDF is built.
• Build like this:
./configure --with-hdf5=/home/ed --enable-hdf4
Compiling with HDF4
• Include the netCDF header file as usual.
• Include the locations of the netCDF, HDF5, and HDF4 include directories:
-I/loc/of/netcdf/include -I/loc/of/hdf5/include -I/loc/of/hdf4/include
Linking with HDF4
The HDF4 and HDF5 libraries (and associated libraries) are needed and must be linked into all netCDF applications. The locations of the lib directories must also be provided:
-L/loc/of/netcdf/lib -L/loc/of/hdf5/lib -L/loc/of/hdf4/lib
-lnetcdf -lmfhdf -ldf -ljpeg -lhdf5_hl -lhdf5 -lz
Use nc-config to Help with Compile Flags
• The nc-config utility is provided to help with compiler flags:
$ ./nc-config --cflags
-I/usr/local/include
$ ./nc-config --libs
-L/usr/local/lib -lnetcdf -L/machine/local/lib -lhdf5_hl -lhdf5 -lz -lm -lhdf4
$ ./nc-config --flibs
-L/usr/local/lib -lnetcdf -L/machine/local/lib -lhdf5_hl -lhdf5 -lz -lm -lhdf4
Implementation Notes
• You don't need to identify the file as HDF4 when opening it with netCDF, but you do have to open it read-only.
• The HDF4 SD API provides named, shared dimensions, which fit easily into the netCDF model.
• The HDF4 SD API uses other HDF4 APIs (like vgroups) to store metadata. This can be confusing when using the HDF4 data dumping tool hdp.
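A file does not have to be labeled as HDF4 because its format can be recognized from its first few bytes. Below is a minimal sketch of that kind of magic-number check; `sniff_format` and `sniff_file` are hypothetical names, and this is not the netCDF library's own detection code. It checks the netCDF classic (`CDF\001`), 64-bit offset (`CDF\002`), HDF5 (`\211HDF\r\n\032\n`), and HDF4 (`0x0e031301`) signatures. Real HDF5 files may also carry their signature at offset 512, 1024, 2048, and so on when a user block is present; this sketch only looks at offset 0.

```c
#include <stdio.h>
#include <string.h>

/* Possible results of sniffing a file's first bytes. */
typedef enum { FMT_UNKNOWN, FMT_CLASSIC, FMT_64BIT, FMT_HDF5, FMT_HDF4 } file_format;

/* Classify a buffer holding the first len bytes of a file. */
file_format sniff_format(const unsigned char *buf, size_t len)
{
    static const unsigned char hdf5_sig[8] = {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};
    static const unsigned char hdf4_sig[4] = {0x0e, 0x03, 0x13, 0x01};

    if (len >= 8 && memcmp(buf, hdf5_sig, 8) == 0)
        return FMT_HDF5;
    if (len >= 4 && memcmp(buf, hdf4_sig, 4) == 0)
        return FMT_HDF4;
    if (len >= 4 && memcmp(buf, "CDF", 3) == 0) {
        if (buf[3] == 1) return FMT_CLASSIC;  /* netCDF classic */
        if (buf[3] == 2) return FMT_64BIT;    /* netCDF 64-bit offset */
    }
    return FMT_UNKNOWN;
}

/* Read up to the first 8 bytes of a file and classify them. */
file_format sniff_file(const char *path)
{
    unsigned char buf[8];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return FMT_UNKNOWN;
    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return sniff_format(buf, n);
}
```

Checking the longer HDF5 signature first is only a habit here; the four signatures do not overlap, so the order does not change the result.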
C Code to Read HDF4 SD File
/* Create a file with one SDS, containing our phony data.
 * SDstart and SDcreate return FAIL (-1) on error; the other SD calls
 * return SUCCEED (0) or FAIL. */
if ((sd_id = SDstart(FILE_NAME, DFACC_CREATE)) == FAIL) ERR;
if ((sds_id = SDcreate(sd_id, PRES_NAME, DFNT_INT32,
                       DIMS_2, dim_size)) == FAIL) ERR;
if (SDwritedata(sds_id, start, NULL, edge, (void *)data_out)) ERR;
if (SDendaccess(sds_id)) ERR;
if (SDend(sd_id)) ERR;
/* Now open with netCDF (read-only, as HDF4 files require) and check
 * the contents. */
if (nc_open(FILE_NAME, NC_NOWRITE, &ncid)) ERR;
if (nc_inq(ncid, &ndims_in, &nvars_in, &natts_in,
           &unlimdim_in)) ERR;
...
ncdump and HDF4 SD Files
• With HDF4 reading enabled, ncdump works on HDF4 files.
• Sample MODIS file:
../ncdump/ncdump -h MOD29.A2000055.0005.005.2006267200024.hdf
netcdf MOD29.A2000055.0005.005.2006267200024 {
dimensions:
Coarse_swath_lines_5km\:MOD_Swath_Sea_Ice = 406 ;
Coarse_swath_pixels_5km\:MOD_Swath_Sea_Ice = 271 ;
Along_swath_lines_1km\:MOD_Swath_Sea_Ice = 2030 ;
Cross_swath_pixels_1km\:MOD_Swath_Sea_Ice = 1354 ;
variables:
float Latitude(Coarse_swath_lines_5km\:MOD_Swath_Sea_Ice, Coarse_swath_pixels_5km\:MOD_Swath_Sea_Ice) ;
Latitude:long_name = "Coarse 5 km resolution latitude" ;
Latitude:units = "degrees" ;
...
HDF-EOS Not Understood
• Many HDF4 data sets of interest follow the HDF-EOS metadata standard.
• Stored as a long text string in global attributes, the HDF-EOS metadata looks messy:
// global attributes:
:HDFEOSVersion = "HDFEOS_V2.9" ;
:StructMetadata.0 =
"GROUP=SwathStructure\n\tGROUP=SWATH_1\n\t\tSwathName=\"MOD_Swath_Sea_Ice\"\n\t\tGROUP=Dimension\n\t\t\tOBJECT=Dimension_1\n\t\t\t\tDimensionName=\"Coarse_swath_lines_5km\"\n\t\t\t\tSize=406\n\t\t\tEND_OBJECT=Dimension_1\n\t\t\tOBJECT=Dimension_2\n\t\t\t\tDimensionName=\"Coarse_swath_pixels_5km\"\n\t\t\t\tSize=271\n\t\t\t...
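To show what a reader has to do before this metadata becomes useful, here is a minimal sketch that extracts `DimensionName`/`Size` pairs from a `StructMetadata.0` string. It assumes the `KEY=value` layout visible in the dump above (with real newlines and tabs, not the escaped `\n` and `\t` that ncdump prints) and that each `DimensionName` is followed by its own `Size`; everything else is ignored. `parse_dims` and `eos_dim` are hypothetical names. A real tool would use the HDF-EOS toolkit instead of hand parsing.

```c
#include <stdlib.h>
#include <string.h>

/* One dimension found in HDF-EOS structural metadata (hypothetical type). */
typedef struct {
    char name[64];
    long size;
} eos_dim;

/* Scan metadata text for DimensionName="..." followed by Size=N and store
 * up to max_dims pairs in dims. Returns the number of pairs stored. */
int parse_dims(const char *meta, eos_dim *dims, int max_dims)
{
    const char *key = "DimensionName=\"";
    const char *p = meta;
    int n = 0;

    while (n < max_dims && (p = strstr(p, key)) != NULL) {
        const char *start = p + strlen(key);
        const char *end = strchr(start, '"');   /* closing quote of the name */
        const char *sz;
        size_t len;

        if (end == NULL)
            break;
        len = (size_t)(end - start);
        if (len >= sizeof dims[n].name)
            len = sizeof dims[n].name - 1;      /* truncate overly long names */
        memcpy(dims[n].name, start, len);
        dims[n].name[len] = '\0';

        sz = strstr(end, "Size=");              /* assumed to belong to this name */
        if (sz == NULL)
            break;
        dims[n].size = strtol(sz + strlen("Size="), NULL, 10);
        n++;
        p = sz;                                 /* continue after this Size */
    }
    return n;
}
```

Run on the excerpt above, this would yield `Coarse_swath_lines_5km` with size 406 and `Coarse_swath_pixels_5km` with size 271, the same lengths ncdump reports for the corresponding dimensions.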
HDF4 Read Testing
• Tested in libsrc4/tst_interops2.c, which creates some HDF4 files with the SD API and then reads them with netCDF.
• If --enable-hdf4-file-tests is used with netCDF configure, some Aura/Terra satellite data files are downloaded from the Unidata FTP site, then read by libsrc4/tst_interops3.c.
HDF4 Interoperability Limitations
• File must be opened read-only.
• Only HDF4 SD data files are currently understood.
• This feature cannot be used at the same time as HDF4's netCDF v2 API, because HDF4 reuses the netCDF v2 API function names. So you must use --disable-netcdf when building HDF4. (It might also work to use --disable-v2 for the netCDF build.)
Future HDF4 Work
• More tests.
• Support for HDF4 image types.
• Test support for compressed data.
• Add some support for HDF-EOS metadata in the libcf library, using the HDF-EOS toolkit.
Request for User Help – What Data to Read?
• Please send me pointers to scientifically important HDF4 datasets.
• The intention is not to read arbitrary HDF4 data, just datasets of wide scientific interest.
Contribute Code to Write HDF4?
• Some programmers use the netCDF v2 API to write HDF4 files.
• It would not be too hard to write the glue code that lets the netCDF library's v2 API produce HDF4 output.
• The next step would be to allow netCDF v3/v4 API code to write HDF4 files.
• Writing HDF4 seems like a low priority to our users. I would be happy to help any user who would like to undertake this task.
What is HDF5?
• HDF5 is an extremely general data storage format with many advanced features: on-the-fly compression, parallel I/O, a rich data model, etc.
• Starting with netCDF-4.0, netCDF has been able to use HDF5 as a storage layer, exposing some of the advanced features.
• But until version 4.1, only HDF5 files created with netCDF-4 could be understood by netCDF-4.
Why Read HDF5 Files?
• Many important datasets are available in HDF5 format, including data from the Aqua satellite.
Rules for Reading HDF5 Files
• NetCDF-4.1 provides read-only access to existing HDF5 files if they do not violate some rules:
  – The file must not use a circular group structure.
  – The HDF5 reference type (and some other obscure types) is not understood.
  – Write access is still only possible with netCDF-4/HDF5 files.
HDF5 Version 1.8 Background
• In version 1.8, HDF5 introduced “dimension scales” as a way of supporting shared dimensions.
• Also in version 1.8, HDF5 introduced ordering by creation, rather than ordering alphabetically.
• But most data providers don't use these features; they still use HDF5 1.6.
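The two orderings matter because they determine the order in which a reader iterating over a group encounters its objects. The sketch below uses hypothetical dataset names and creation indexes to show the two orders side by side with plain `qsort`; it does not call HDF5. In HDF5 1.8 itself, the choice is made by passing an index type such as `H5_INDEX_NAME` or `H5_INDEX_CRT_ORDER` to iteration functions like `H5Literate`, and creation order is only available if it was tracked when the group was created (via `H5Pset_link_creation_order`).

```c
#include <stdlib.h>
#include <string.h>

/* A dataset name plus the order in which it was created (hypothetical). */
typedef struct {
    const char *name;
    int created;   /* 0 for the first object created, 1 for the next, ... */
} obj;

static int by_name(const void *a, const void *b)
{
    return strcmp(((const obj *)a)->name, ((const obj *)b)->name);
}

static int by_creation(const void *a, const void *b)
{
    int x = ((const obj *)a)->created, y = ((const obj *)b)->created;
    return (x > y) - (x < y);
}

/* Sort objs in place: alphabetically (as HDF5 1.6 iterates) or by
 * creation order (available in HDF5 1.8 when tracked). */
void order_objects(obj *objs, size_t n, int use_creation_order)
{
    qsort(objs, n, sizeof *objs, use_creation_order ? by_creation : by_name);
}
```

For datasets created as `temp`, `pres`, `lat`, alphabetical order gives `lat`, `pres`, `temp`, while creation order preserves `temp`, `pres`, `lat`.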
NetCDF-4.1 Relaxes Some Restrictions for HDF5 Files
• Before netCDF-4.1, HDF5 files had to use creation ordering and dimension scales in order to be understood by netCDF-4.
• Starting with netCDF-4.1, read-only access is possible to HDF5 files with alphabetical ordering and no dimension scales (for example, files created by HDF5 1.6).
• An HDF5 file may have dimension scales for all dimensions or for none, but not for just some of them.
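The all-or-none rule for dimension scales can be expressed as a simple check. In the sketch below, each dataset records which of its dimensions have an attached scale; gathering that information from HDF5 (for example with `H5DSget_num_scales` from the HDF5 high-level library) is left out, and the `ds_info` type is hypothetical. Applying the rule across the whole file, rather than per dataset, is this sketch's reading of the slide.

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-dataset summary (hypothetical): rank and, for each dimension,
 * whether an HDF5 dimension scale is attached to it. */
typedef struct {
    int ndims;
    bool has_scale[8];
} ds_info;

/* Return true if the file satisfies the all-or-none rule: either every
 * dimension of every dataset has a scale, or no dimension does. */
bool dimscales_all_or_none(const ds_info *ds, size_t nds)
{
    bool seen_scaled = false, seen_unscaled = false;

    for (size_t i = 0; i < nds; i++)
        for (int d = 0; d < ds[i].ndims; d++) {
            if (ds[i].has_scale[d])
                seen_scaled = true;
            else
                seen_unscaled = true;
        }
    return !(seen_scaled && seen_unscaled);
}
```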
HDF5 C Code to Write HDF5 File
/* Create file. */
if ((fileid = H5Fcreate(FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT,
                        H5P_DEFAULT)) < 0) ERR;
/* Create the space for the dataset. */
dims[0] = LAT_LEN;
dims[1] = LON_LEN;
if ((pres_spaceid = H5Screate_simple(DIMS_2, dims, dims)) < 0) ERR;
/* Create a variable. It will not have dimension scales.
 * This is the HDF5 1.6-style, five-argument H5Dcreate; with HDF5 1.8
 * or later, compile with -DH5_USE_16_API or call H5Dcreate1 instead. */
if ((pres_datasetid = H5Dcreate(fileid, PRES_NAME, H5T_NATIVE_FLOAT,
                                pres_spaceid, H5P_DEFAULT)) < 0) ERR;
/* Close the dataset, dataspace, and file. */
if (H5Dclose(pres_datasetid) < 0 || H5Sclose(pres_spaceid) < 0 ||
    H5Fclose(fileid) < 0) ERR;
NetCDF C Code to Read HDF5 File
/* Read the data with netCDF. */
if (nc_open(FILE_NAME, NC_NOWRITE, &ncid)) ERR;
if (nc_inq(ncid, &ndims_in, &nvars_in, &natts_in,
&unlimdim_in)) ERR;
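/* Expect two dimensions, one variable, no global attributes, and no
 * unlimited dimension, even though the file has no dimension scales. */
if (ndims_in != 2 || nvars_in != 1 || natts_in != 0 ||
    unlimdim_in != -1) ERR;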
if (nc_close(ncid)) ERR;
Future Plans for HDF5 Interoperability
• More testing.
• Proper handling of reference types. This will probably require an extension of the netCDF APIs.
• Better handling of strange group structures, if this proves necessary to read important data.
Summary
• With the 4.1 release, the netCDF C/Fortran/C++ libraries allow read-only access to some existing HDF4 and HDF5 data archives.
• The intention is not to develop a completely general translation, but instead to focus on datasets of significance to the Earth science community.
• Write capability is quite possible, but we don't plan to provide it because demand for it is low.