netCDF - Unidata


NetCDF
Ed Hartnett
Unidata/UCAR
[email protected]
Unidata
• Unidata - helps universities acquire,
display, and analyze Earth-system data.
• UCAR – University Corporation for
Atmospheric Research - a nonprofit
consortium of 66 universities.
SDSC Presentation, July 2005
• Intro to NetCDF Classic
• Intro to NetCDF-4
What is NetCDF?
• A conceptual data model for scientific
data.
• A set of APIs in C, F77, F90, Java, etc. to
create and manipulate data files.
• Some portable binary formats.
• Useful for storing arrays of data and
accompanying metadata.
History of NetCDF
• 1988: netCDF developed at Unidata
• 1991: netCDF 2.0 released
• 1996: netCDF 3.0 released
• 2004: netCDF 3.6.0 released
• 2005: netCDF 4.0 beta released
Getting netCDF
• Download latest release from the netCDF
web page:
http://www.unidata.ucar.edu/content/software/netcdf
• Builds and installs on most platforms with
no configuration necessary.
• For a list of platforms on which netCDF
versions have been built, and the output of
building and testing netCDF, see the web site.
NetCDF Portability
• NetCDF is tested on a wide variety of
platforms, including Linux, AIX, SunOS,
MacOS, IRIX, OSF1, Cygwin, and
Windows.
• We test with native compilers when we
can get them.
• 64-bit builds are supported with some
configuration effort.
What Comes with NetCDF
• NetCDF comes with 4 language APIs: C,
C++, Fortran 77, and Fortran 90.
• Tools ncgen and ncdump.
• Tests.
• Documentation.
NetCDF Java API
• The netCDF Java API is entirely separate
from the C API.
• You don’t need to install the C API for the
Java API to work.
• The Java API includes additional features,
such as remote access and more
advanced coordinate systems.
Tools to work with NetCDF Data
• The netCDF core library provides basic data
access.
• ncgen and ncdump provide some helpful
command line functionality.
• Many additional tools are available, see:
http://www.unidata.ucar.edu/packages/netcdf/software.html
CDL – Common Data Language
• Grammar defined for displaying
information about netCDF files.
• Can be used to create files without
programming.
• Can be used (via ncgen) to generate a C or
Fortran program that creates the file.
• Used by ncgen/ncdump utilities.
Example of CDL
netcdf foo { // example netCDF specification in CDL
dimensions:
    lat = 10, lon = 5, time = unlimited;
variables:
    int lat(lat), lon(lon), time(time);
    float z(time,lat,lon), t(time,lat,lon);
    double p(time,lat,lon);
    int rh(time,lat,lon);
    lat:units = "degrees_north";
    lon:units = "degrees_east";
data:
    lat = 0, 10, 20, 30, 40, 50, 60, 70, 80, 90;
    lon = -140, -118, -96, -84, -52;
}
Software Architecture of NetCDF-3
[Diagram components: V2 C API, V3 C API, C++ API, F77 API, F90 API, ncgen, ncdump, and the V2 C, V3 C, and F77 tests.]
• Fortran, C++ and V2 APIs are all built on the
C API.
• Other language APIs (Perl, Python, MATLAB,
etc.) also use the C API.
NetCDF Documentation
• Unidata distributes a NetCDF Users Guide
which describes the data model in detail.
• A language-specific guide is provided for
C, C++, Fortran 77, and Fortran 90 users.
• All documentation can be found at:
http://my.unidata.ucar.edu/content/software/netcdf/docs
NetCDF Jargon
• “Variable” – a multi-dimensional array of
data, of any of 6 types (char, byte, short,
int, float, or double).
• “Dimension” – information about an axis:
its name and length.
• “Attribute” – a 1D array of metadata.
More NetCDF Jargon
• “Coordinate Variable” – a 1D variable with
the same name as a dimension, which
stores the coordinate value (e.g., the
latitude) for each index along that dimension.
• “Unlimited Dimension” – a dimension
which has no maximum size. Data can
always be extended along the unlimited
dimension.
The NetCDF Classic Data Model
• The netCDF Classic Data Model contains
dimensions, variables, and attributes.
• At most one dimension may be unlimited.
• The Classic Data Model is embodied by
netCDF versions 1 through 3.6.0.
• NetCDF is moving towards a new, richer
data model: the Common Data Model.
NetCDF Example
• Suppose a user wants to store
temperature and pressure values on a 2D
latitude/longitude grid.
• In addition to the data, the user wants to
store information about the lat/lon grid.
• The user may have additional data to
store, for example the units of the data
values.
NetCDF Model Example
• Dimensions: latitude, longitude
• Variables: temperature, pressure
• Attributes: units of C for temperature, mb for pressure
• Coordinate Variables: latitude, longitude
Important NetCDF Functions
• nc_create and nc_open to create and open files.
• nc_enddef to leave define mode, nc_close to close files.
• nc_def_dim, nc_def_var, nc_put_att_*, to define
dimensions, variables, and attributes.
• nc_inq, nc_inq_var, nc_inq_dim, nc_get_att_* to
learn about dims, vars, and atts.
• nc_put_vara_*, nc_get_vara_* to write and read
data.
C Functions to Define Metadata
#include <netcdf.h>

/* FILE_NAME, LAT_NAME, LON_NAME, PRES_NAME, TEMP_NAME, LAT_LEN,
   LON_LEN, and NDIMS (2) are assumed to be #defined elsewhere. */
int ncid, lat_dimid, lon_dimid, pres_varid, temp_varid, retval;
int dimids[NDIMS];

/* Create the file. */
if ((retval = nc_create(FILE_NAME, NC_CLOBBER, &ncid)))
   return retval;
/* Define the dimensions. */
if ((retval = nc_def_dim(ncid, LAT_NAME, LAT_LEN, &lat_dimid)))
   return retval;
if ((retval = nc_def_dim(ncid, LON_NAME, LON_LEN, &lon_dimid)))
   return retval;
/* Define the variables. */
dimids[0] = lat_dimid;
dimids[1] = lon_dimid;
if ((retval = nc_def_var(ncid, PRES_NAME, NC_FLOAT, NDIMS, dimids, &pres_varid)))
   return retval;
if ((retval = nc_def_var(ncid, TEMP_NAME, NC_FLOAT, NDIMS, dimids, &temp_varid)))
   return retval;
/* End define mode. */
if ((retval = nc_enddef(ncid)))
   return retval;
C Functions to Write Data
/* Write the data. */
if ((retval = nc_put_var_float(ncid, pres_varid, pres_out)))
   return retval;
if ((retval = nc_put_var_float(ncid, temp_varid, temp_out)))
   return retval;
/* Close the file. */
if ((retval = nc_close(ncid)))
   return retval;
C Example – Getting Data
/* Open the file. */
if ((retval = nc_open(FILE_NAME, NC_NOWRITE, &ncid)))
   return retval;
/* Read the data. Variable IDs 0 and 1 are pressure and temperature,
   the order in which they were defined. */
if ((retval = nc_get_var_float(ncid, 0, pres_in)))
   return retval;
if ((retval = nc_get_var_float(ncid, 1, temp_in)))
   return retval;
/* Do something useful with the data… */
/* Close the file. */
if ((retval = nc_close(ncid)))
   return retval;
Data Reading and Writing Functions
• There are 5 ways to read/write data of each
type.
• var1 – reads/writes a single value.
• var – reads/writes entire variable at once.
• vara – reads/writes an array subset.
• vars – reads/writes a strided (subsampled) array subset.
• varm – reads/writes a mapped array, whose layout in
memory may differ from its layout in the file.
• Ex.: nc_put_vars_short writes a strided subset of shorts.
Attributes
• Attributes are 1-D arrays of any of the 6
netCDF types.
• Read/write them with functions like:
nc_get_att_float and nc_put_att_int.
• Attributes may be attached to a variable,
or may be global to the file.
NetCDF File Formats
• Starting with 3.6.0, netCDF supports two binary
data formats.
• NetCDF Classic Format is the format that has
been in use for netCDF files from the beginning.
• NetCDF 64-bit Offset Format was introduced in
3.6.0 and allows much larger files.
• Use classic format unless you need the large
files.
NetCDF-3 Summary
• NetCDF is a software library and some
binary data formats, useful for scientific
data, developed at Unidata.
• NetCDF organizes data into variables, with
dimensions and attributes.
• NetCDF has proven to be reliable, simple
to use, and very popular.
Why Add to NetCDF-3?
• Increasingly complex data sets call for
greater organization.
• Size limits, unthinkably huge in 1988, are
routinely reached in 2005.
• Parallel I/O is required for advanced Earth
science applications.
• Interoperability with HDF5.
NetCDF-4
• NetCDF-4 aims to provide the netCDF API
as a front end for HDF5.
• Funded by NASA, executed at Unidata
and NCSA.
• Includes reliable netCDF-3 code, and is
fully backward compatible.
NetCDF-4 Organizations
• Unidata/UCAR
• NCSA – The National Center for
Supercomputing Applications
University of Illinois at Urbana-Champaign
• NASA – NetCDF-4 was funded by NASA
award number AIST-02-0071.
New Features of NetCDF-4
• Multiple unlimited dimensions.
• Groups to organize data.
• New types, including compound types and
variable length arrays.
• Parallel I/O.
The Common Data Model
• NetCDF-4, scheduled for beta release this
summer, will conform to the Common
Data Model.
• Developed by John Caron at Unidata, with
the cooperation of HDF, OPeNDAP,
netCDF, and other software teams, CDM
unites different models into a common
framework.
• CDM is a superset of the NetCDF Classic
Data Model.
The NetCDF-4 Data Model
• NetCDF-4 implements the Common Data Model.
• Adds groups; each group can contain variables,
attributes, dimensions, and other groups.
• Dimensions are scoped so that variables in
different groups can share dimensions.
• Compound types allow users to define new
types, composed of other atomic or user-defined
types.
• New integer and string types.
Software Architecture of NetCDF-4
[Diagram components: V2 C API, V3 C API, V4 C API, C++ API, F77 API, F90 API, ncgen, ncdump, the V2 C, V3 C, and F77 tests, and HDF5.]
NetCDF-4 Release Status
• Latest alpha release includes all netCDF-4
features – depends on latest HDF5
development snapshot.
• Beta release – due out in August, replaces
artificial netCDF-4 constructs, and
depends on a yet-to-be-released version
of HDF5.
• Promotion from beta to full release will
happen sometime in 2006.
Building NetCDF-4
• NetCDF-4 requires that HDF5 version
1.8.3 be installed. This is not released yet.
• The latest HDF5 development release
works with the latest netCDF alpha
release.
• To build netCDF-4, specify --enable-netcdf4 when running configure.
When to Use NetCDF-4 Format
• The new netCDF-4 features (groups, new
types, parallel I/O) are only available for
netCDF-4 format files.
• When you need HDF5 files.
• When portability is less important, until
netCDF-4 becomes widespread.
Versions and Formats
• 1988: netCDF developed by Glenn Davis
• 1991: netCDF 2.0 released
• 1996: netCDF 3.0 released
• 2004: netCDF 3.6.0 released
• 2005: netCDF 4.0 beta released
• Formats: Classic Format (all versions),
64-Bit Offset Format (3.6.0 and later),
NetCDF-4 Format (4.0 and later)
NetCDF-4 Feature Review
• Multiple unlimited dimensions.
• How to use groups.
• Using compound types.
• Other new types.
• Variable length arrays.
• Parallel I/O.
• HDF5 Interoperability.
Multiple Unlimited Dimensions
• Unlimited dimensions are automatically
expanded as new data are written.
• NetCDF-4 allows multiple unlimited
dimensions.
Working with Groups
• Define a group, then use it as a container for
the classic data model.
• Groups can be used to organize sets of data.
An Example of Groups
[Diagram: groups Model_Run_1, Model_Run_2, and Model_Run_1a, each holding history, lat, lon, and the variables rh and temp, each with units.]
New Functions to Use Groups
• Open/create returns ncid of root group.
• Create a new group with nc_def_grp.
nc_def_grp(int parent_ncid, char *name, int *new_ncid);
• Learn about groups with nc_inq_grps.
nc_inq_grps(int ncid, int *numgrps, int *ncids);
C Example Using Groups
/* ERR, the name constants, start, count, and data_out are
   assumed to be defined elsewhere. */
if (nc_create(FILE_NAME, NC_NETCDF4, &ncid)) ERR;
if (nc_def_grp(ncid, DYNASTY, &tudor_id)) ERR;
if (nc_def_dim(tudor_id, DIM1_NAME, NC_UNLIMITED, &dimid)) ERR;
if (nc_def_grp(tudor_id, HENRY_VII, &henry_vii_id)) ERR;
/* The variable in the child group uses its parent's dimension. */
if (nc_def_var(henry_vii_id, VAR1_NAME, NC_INT, 1, &dimid, &varid)) ERR;
if (nc_put_vara_int(henry_vii_id, varid, start, count, data_out)) ERR;
if (nc_close(ncid)) ERR;
Create Complex Types
• Like C structs, compound types can be
assembled into a user defined type.
• Compound types can be nested – that is,
they can contain other compound types.
• New functions are needed to create new
types.
• Untyped (void *) functions, such as nc_put_var
and nc_get_var, are used to read/write complex types.
C Example of Compound Types
/* Create a file with a compound type. Write a little data. */
struct s1 {              /* one record: two int fields */
   int i1;
   int i2;
};
struct s1 data[DIM_LEN]; /* filled in elsewhere */
int ncid, dimid, varid;
nc_type typeid;

if (nc_create(FILE_NAME, NC_NETCDF4, &ncid)) ERR;
if (nc_def_compound(ncid, sizeof(struct s1), SVC_REC, &typeid)) ERR;
/* HOFFSET gives each field's byte offset within the struct. */
if (nc_insert_compound(ncid, typeid, BATTLES_WITH_KLINGONS,
                       HOFFSET(struct s1, i1), NC_INT)) ERR;
if (nc_insert_compound(ncid, typeid, DATES_WITH_ALIENS,
                       HOFFSET(struct s1, i2), NC_INT)) ERR;
if (nc_def_dim(ncid, STARDATE, DIM_LEN, &dimid)) ERR;
if (nc_def_var(ncid, SERVICE_RECORD, typeid, 1, &dimid, &varid)) ERR;
if (nc_put_var(ncid, varid, data)) ERR;
if (nc_close(ncid)) ERR;
New Ints, Opaque, String Types
• Opaque types are bit-blobs of fixed size.
• String types allow multi-dimensional arrays
of strings.
• New integer types: UBYTE, USHORT,
UINT, UINT64, INT64.
Variable Length Arrays
• Variable length arrays allow the efficient
storage of arrays of variable size.
• For example: an array of soundings with
different numbers of elements.
Parallel I/O with NetCDF-4
• Must use configure option --enable-parallel when
building netCDF.
• Depends on HDF5 parallel features, which
require MPI.
• Must create or open file with nc_create_par or
nc_open_par.
• All metadata operations are collective.
• Adding a new record is collective.
• Variable reads/writes are independent by
default, but can be changed to do collective
operations.
HDF5 Interoperability
• NetCDF-4 can interoperate with HDF5 with a
SUBSET of HDF5 features.
• Will not work with HDF5 files that have looping
groups, references, and types not found in
netCDF-4.
• HDF5 file must use new dimension scale API to
store shared dimension info.
• If an HDF5 file follows the Common Data Model,
netCDF-4 can interoperate on the same files.
Future Plans for NetCDF
• NetCDF 4.0 release in 2006.
• Beta for next major version of netCDF in
Summer, 2006.
• Full compatibility with Common Data Model.
• Remote access, including remote subsetting of
data.
• XML-based representation of netCDF metadata.
• Full Fortran 90 support, but limited F77 support.
For Further Information
• netCDF mailing list:
[email protected]
• email Ed: [email protected]
• netCDF web site: www.unidata.ucar.edu