HDF
HDF/HDF-EOS Workshop III
Sept. 14-16, 1999
Mike Folk, HDF Group
http://hdf.ncsa.uiuc.edu/
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
NCSA/Univ of Illinois at Urbana-Champaign
HDF
1
Topics
I. Overview
II. NCSA HDF Activities
III. HDF5
IV. HDF4 vs. HDF5
I. HDF Overview
HDF Mission
To develop, promote, deploy, and support
open and free technologies that facilitate
scientific data storage, exchange, access,
analysis and discovery.
What is HDF?
• Scientific data file format & supporting software
• For images, arrays, tables, other structures
• Features
– Portability across architectures
• I/O library
• Files
– Efficient I/O
– Efficient storage
Why use HDF?
• Manage data
• Share data
• Use software that understands HDF
• Improve I/O performance
• Improve storage efficiency
• Use an open standard
An HDF File: A Collection of Scientific Data Objects
HDF file containing four 3-D arrays
Mixing HDF Objects in One File
Figure: one HDF file holding a group, two 3-D arrays, two raster images, a palette, and a table.
Table:
Lat   Lon   Temp
----  ----  ----
12    23    3.1
15    24    4.2
17    21    3.6
16    35    5.7
HDF Software
Software layers, top to bottom:
• General applications
– Utilities and applications for manipulating, viewing, and analyzing data.
• HDF I/O library
– Application programming interfaces: high-level, object-specific APIs.
– Low-level interface: low-level API for I/O to files, etc.
• HDF file
– File or other data source.
HDF Applications Software
• Free software
– NCSA HDF library and utilities
– Other software
• Commercial/other software that “understands”
– all of HDF (Noesys, IDL, HDF Explorer)
– certain HDF objects (MATLAB, WebWinds)
– certain HDF applications (SHARP, WIM)
• http://hdf.ncsa.uiuc.edu/tools.html
What platforms does HDF run on?
• Sun: Solaris
• SGI: Indy, Power Challenge, Origin, Cray C90, YMP, T3E
• HP9000, HP-Convex Exemplar
• IBM: RS6000, SP2
• DEC: Alpha/Digital UNIX, OpenVMS
• VAX: OpenVMS
• Intel: Solaris x86, Linux, FreeBSD, Windows NT/98
• PowerPC: Mac-OS
A Sampling of HDF Users
• NCSA-affiliated science teams: visualization, data exchange, fast I/O, ...
• Mathworks, Fortner Software, Research Systems Inc., etc.: format supported by vendors of visualization and data analysis software
• Boeing: space-time change detection in images
• Distributed Oceanographic Data System (DODS): remote access to earth science data
• Army Research Lab: network distributed global memory
• Center for Analysis & Prediction of Storms: fast parallel I/O, portability, multi-resolution grids
• TRAPPIST (Euro consortium): exchange, analysis & visualization of non-destructive testing data
Major User #1: EOSDIS
• ESDIS Project
– open standard exchange format and I/O library for EOSDIS
– EOS applications
• HDF requirements
– Earth science data types (HDF-EOS, etc.)
– User support for scientists, data producers, etc.
– Library and file structure improvements
– HDF tools, utilities, access software
– Software maintenance and QA
Major User #2: ASCI
• ASCI Data Models and Formats (DMF) Group
– open standard exchange format and I/O library for ASCI
– DOE tri-lab ASCI applications
• HDF requirements
– Large datasets (> a terabyte)
– ASCI data types, especially meshes
– Good performance in massively parallel environments
– Primarily HDF5
II. NCSA HDF Activities
Java applications
• HDF APIs
– Basis for tools that access HDF
• HDF Viewers
– HDF browser/visualizer
• HDF4 Data Server Prototype
– Lessons learned about remote access to
Remote Data Access
• The SDB: Web-based Server-side Data
Browser
• Java for remote access
• WP-ESIP: DODS project
• Computational Grids (Globus/GASS)
HDF Standardization
• To share files, users must organize them similarly.
• HDF user groups create standard profiles
– Ways to organize data in HDF files.
– Metadata
– API
• Examples: HDF-EOS, ASCI DMF
HDF-EOS software layers
Figure (layered diagram). Labels: HDF-EOS applications; general applications; HDF-EOS profiles; HDF-EOS API; application programming interfaces; low-level interface; HDF file.
“HDF Configuration Record” (HCR)
• To simplify the tasks of defining, comparing,
and producing HDF-EOS files
• Formal (ODL) descriptions of HDF-EOS
objects
HCR of Swath
/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT = SWATH
  NAME = SCAN1
  OBJECT = Dimension
    NAME = GeoTrack
    Size = 1200
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = GeoCrossTrack
    Size = 205
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = DataX
    Size = 2410
  END_OBJECT = Dimension
END_OBJECT = SWATH
END
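The nested OBJECT / END_OBJECT structure of the listing above can be read with a small stack-based parser. What follows is a minimal Python sketch, not one of the HCR utilities: `parse_hcr` is a hypothetical helper, the input is an abridged copy of the Swath listing, and it assumes one `KEY = VALUE` statement per line with comments on lines of their own.

```python
def parse_hcr(text):
    """Parse a simplified HCR/ODL listing into nested dicts (illustrative only)."""
    root = {"type": "ROOT", "attrs": {}, "children": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("/*"):
            continue  # skip blank lines and whole-line comments
        if line == "END":
            break  # end of the listing
        key, _, value = (part.strip() for part in line.partition("="))
        if key == "OBJECT":
            # Open a new object nested inside the current one.
            node = {"type": value, "attrs": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif key == "END_OBJECT":
            closed = stack.pop()
            if closed["type"] != value:
                raise ValueError(f"mismatched END_OBJECT = {value}")
        else:
            # Any other statement is an attribute of the current object.
            stack[-1]["attrs"][key] = value
    if len(stack) != 1:
        raise ValueError("unclosed OBJECT")
    return root


SWATH_HCR = """\
/* Project XYZ */
OBJECT = SWATH
NAME = SCAN1
OBJECT = Dimension
NAME = GeoTrack
Size = 1200
END_OBJECT = Dimension
OBJECT = Dimension
NAME = GeoCrossTrack
Size = 205
END_OBJECT = Dimension
END_OBJECT = SWATH
END
"""

swath = parse_hcr(SWATH_HCR)["children"][0]
dims = {d["attrs"]["NAME"]: int(d["attrs"]["Size"]) for d in swath["children"]}
print(swath["attrs"]["NAME"], dims)
```

A tree like this is the kind of intermediate form a converter or comparison tool could work from; real ODL has more syntax than this sketch handles.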
HCR
• HCR Utilities:
– Converters between HCR and HDF-EOS
– Edit HCR and HDF-EOS
– Compare HCR with HDF-EOS file
• Current projects:
– Extend HCR converters to all of HDF4
– Similar work with HDF5
– XML too
III. HDF5
Why HDF5?
• HDF shortcomings exposed by EOSDIS, ASCI and others...
– Limits on object & file size (<2GB)
– Limited number of objects (<20K)
– Rigid data models
– I/O performance
– Aging software infrastructure (code entropy)
• …new Demands...
– Bigger, faster machines and storage systems
• massive parallelism, parallel file systems
• teraflop speeds, terabyte storage
– Greater complexity
• complex data structures
• complex subsetting
– More emphasis on remote & distributed access
• … and ASCI Requirements
– Compatibility with vector bundle model
– Compatibility with MPI-IO
– Ability to transform data between memory & storage
– Parallel file systems: PIOFS, HPSS, etc.
New HDF5 Features
• More scalable
– Larger arrays and files
– More objects
• Improved data model
– New datatypes
– Single comprehensive dataset object
• Improved software
– More flexible, robust library
– More flexible API
– More I/O options
HDF5 data model
• Two primary objects
• Dataset
– multidimensional array of elements
– rich variety of datatypes
• group
– directory-like structure
– contains datasets, groups, other objects
Dataset components
• multidimensional array
• header with metadata
– datatype
– dataspace
– attributes
– storage properties
Simple datatypes
• The usual scalars: integer & float
• user-defined scalars (e.g. 13-bit integers)
• variable length (e.g. strings)
• pointers to objects or regions of datasets
• enumeration
• opaque
Compound datatypes
• User-defined
• Comparable to C structs
• Members can be simple or compound types
• Members can be multidimensional
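To make the C-struct comparison concrete, here is a minimal Python sketch that lays out a record with the standard `struct` module. It does not use the HDF5 library; the field names (`lat`, `lon`, `temp`) and their sizes are assumptions chosen for illustration.

```python
import struct

# Hypothetical record layout: two 16-bit integers and one 32-bit float,
# little-endian with no padding ("<" disables native alignment).
RECORD = struct.Struct("<hhf")
FIELDS = ("lat", "lon", "temp")


def pack_records(rows):
    """Serialize a sequence of (lat, lon, temp) tuples into one byte string."""
    return b"".join(RECORD.pack(*row) for row in rows)


def unpack_records(buf):
    """Decode the byte string back into a list of dicts keyed by field name."""
    return [dict(zip(FIELDS, values)) for values in RECORD.iter_unpack(buf)]


rows = [(12, 23, 3.5), (15, 24, 4.25)]
buf = pack_records(rows)
records = unpack_records(buf)
print(RECORD.size, len(buf), records[1]["temp"])
```

Each record occupies a fixed number of bytes, so an array of records is a regular array whose element type happens to have named members.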
Data Spaces
• How data are organized to form a dataset
– rank
– dimensions
• Subsetting during I/O operations
– What subset of data is to be moved
– In-memory organization of data
– In-file organization of data
HDF5 dataset: array of records
Figure: a 5 x 3 dataset in which each element is a record.
Datatype: record with int8, int4, int16 and float32 fields
Dimensionality: 5 x 3
Dataspaces
Reading Dataset into Memory from File
Figure: a Read from File into Memory. Labels: 2D array of integers; 3D array of floats.
Selection: Examples of mappings between file selections
and memory selections.
(a) A hyperslab from a 2D array to the corner of a smaller 2D array
(b) A regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
(c) A sequence of points from a 2D array to a sequence of points in a 3D array.
(d) Union of slabs in file to union of slabs in memory. No. of elements must be equal.
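Case (a) can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the HDF5 API: `select_hyperslab` and `transfer` are hypothetical helpers, nested lists stand in for the file and memory arrays, and the parameter names `start`, `stride`, and `count` are borrowed from common hyperslab vocabulary.

```python
def select_hyperslab(start, stride, count):
    """Return the (row, col) coordinates of a regular 2D selection, row-major."""
    return [
        (start[0] + i * stride[0], start[1] + j * stride[1])
        for i in range(count[0])
        for j in range(count[1])
    ]


def transfer(src, src_coords, dst, dst_coords):
    """Copy selected elements; both selections must have the same size."""
    if len(src_coords) != len(dst_coords):
        raise ValueError("file and memory selections differ in size")
    for (sr, sc), (dr, dc) in zip(src_coords, dst_coords):
        dst[dr][dc] = src[sr][sc]


# "File": a 6 x 8 array whose element value encodes its position.
file_array = [[10 * r + c for c in range(8)] for r in range(6)]
# "Memory": a smaller 4 x 4 array, initially zero.
memory = [[0] * 4 for _ in range(4)]

# Case (a): a 2 x 3 hyperslab starting at (1, 2) goes to the top-left corner.
file_sel = select_hyperslab(start=(1, 2), stride=(1, 1), count=(2, 3))
mem_sel = select_hyperslab(start=(0, 0), stride=(1, 1), count=(2, 3))
transfer(file_array, file_sel, memory, mem_sel)
print(memory[0], memory[1])
```

Because the two selections are just ordered coordinate lists, the same `transfer` also covers point sequences as in case (c); only the equal-size rule is enforced.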
Attributes
• Named pieces of data
• Stored in a dataset or group header
• Operations are scaled-down versions of the
dataset operations
– Not extendible
– No compression
– No partial I/O
Property list
• Properties of objects or operations
• Describe how to create, store, access and
transfer data
Some Properties
• chunked
– Better subsetting access time; extendable
• compressed
– Improves storage efficiency, transmission speed
• extendable
– Datasets can be extended in any direction
• split file
– Metadata in one file, raw data in another.
– Figure: for dataset “Fred”, File A holds the metadata for Fred and File B holds the data for Fred.
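The subsetting benefit of chunked storage comes down to simple index arithmetic, sketched below in plain Python. The dataset size, chunk size, and helper names are all assumptions made for illustration; none of this is HDF5 library code.

```python
from math import ceil


def chunk_of(coord, chunk_shape):
    """Return (chunk index, offset inside that chunk) for one element."""
    index = tuple(c // s for c, s in zip(coord, chunk_shape))
    offset = tuple(c % s for c, s in zip(coord, chunk_shape))
    return index, offset


def chunks_touched(start, count, chunk_shape):
    """Return the set of chunk indices that a rectangular subset overlaps."""
    ranges = [
        range(s // ch, (s + n - 1) // ch + 1)
        for s, n, ch in zip(start, count, chunk_shape)
    ]
    touched = {()}
    for r in ranges:
        # Extend every partial index with each chunk number along this axis.
        touched = {idx + (i,) for idx in touched for i in r}
    return touched


dims = (1000, 1000)        # hypothetical dataset size
chunk_shape = (100, 100)   # hypothetical chunk size
total = ceil(dims[0] / chunk_shape[0]) * ceil(dims[1] / chunk_shape[1])

# A tall, narrow subset: all 1000 rows, columns 250..299.
needed = chunks_touched(start=(0, 250), count=(1000, 50), chunk_shape=chunk_shape)
print(chunk_of((250, 730), chunk_shape), len(needed), total)
```

With these assumed sizes the column subset overlaps 10 of the 100 chunks, whereas in a row-major contiguous layout the same subset would be scattered across every row of the array.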
Dataset components
Figure: a dataset consists of metadata and data. The metadata shown:
• Dataspace: Rank=2; Dim_1=5, Dim_2=4, Dim_3=2
• Datatype: int16
• Attributes: time = 32.4, pressure = 987, temp = 56
• Storage properties: chunked; compressed
Groups
• Structures for organizing the file
• Like Vgroups in HDF4
• Like directories in hierarchical file system
• Every file starts with a root group
• Groups have attributes
Groups
• A mechanism for collections of
related objects
• Every file starts with a
root group
• Can have attributes
• Like directories in Unix, but a graph, rather than a tree
Groups
Groups and members of groups can be shared
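Sharing is what makes the group structure a graph rather than a tree: one object can be reached through more than one link. The following plain-Python sketch models that idea only; the `Group` class, `resolve` function, and link names are hypothetical and unrelated to the HDF5 API.

```python
class Group:
    """A container of named links; members may be groups or dataset stand-ins."""

    def __init__(self):
        self.links = {}

    def link(self, name, obj):
        """Add a named link to obj and return obj for convenience."""
        self.links[name] = obj
        return obj


def resolve(root, path):
    """Follow a slash-separated path of link names starting at the root group."""
    obj = root
    for name in path.strip("/").split("/"):
        if name:
            obj = obj.links[name]
    return obj


root = Group()
a = root.link("a", Group())
b = root.link("b", Group())
temps = a.link("temps", [3.1, 4.2, 3.6])   # a plain list stands in for a dataset
b.link("shared_temps", temps)              # second link to the same object

print(resolve(root, "/a/temps") is resolve(root, "/b/shared_temps"))
```

Both paths resolve to the same object, so a change made through one path is visible through the other.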
Mounting
Figure: File A and File B, each with its own root group, joined by a mount operation (“mount!”).
Reading & writing with HDF5
• Set properties
• Describe the data
– datatypes
– rank and dimensions
– mapping between file and memory
• Read/write
Files needn’t be files - Virtual File Layer
VFL: A public API for writing I/O drivers
Figure (layers, top to bottom):
• “File” handle (hid_t)
• VFL: Virtual File I/O Layer
• I/O drivers: stdio, mpio, memory, network
• “Storage”: files, memory, network
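The driver idea can be illustrated with two interchangeable storage back ends behind one small read/write interface. This is a conceptual Python sketch under that assumption; the class and function names are hypothetical and do not correspond to the actual VFL API.

```python
import os
import tempfile


class MemoryDriver:
    """Keeps the whole "file" in a bytearray."""

    def __init__(self):
        self.buf = bytearray()

    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self.buf):
            # Grow the buffer with zero bytes up to the end of the write.
            self.buf.extend(b"\0" * (end - len(self.buf)))
        self.buf[offset:end] = data

    def read(self, offset, size):
        return bytes(self.buf[offset:offset + size])


class FileDriver:
    """Stores bytes in an ordinary file on disk."""

    def __init__(self, path):
        self.f = open(path, "w+b")

    def write(self, offset, data):
        self.f.seek(offset)
        self.f.write(data)

    def read(self, offset, size):
        self.f.seek(offset)
        return self.f.read(size)

    def close(self):
        self.f.close()


def store_and_fetch(driver):
    """Library-side code: identical for every driver."""
    driver.write(8, b"HDF")
    return driver.read(8, 3)


mem_result = store_and_fetch(MemoryDriver())
with tempfile.TemporaryDirectory() as tmp:
    disk = FileDriver(os.path.join(tmp, "demo.bin"))
    disk_result = store_and_fetch(disk)
    disk.close()
print(mem_result, disk_result)
```

The calling code never learns where the bytes live, which is the sense in which "files needn't be files".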
HDF5 tools
• Current
– hdf5ls - lists contents of HDF5 file
– h5dumper - higher level view
– hdf5hdf4 converter
• Future
– HDF5 converters for ascii, binary, GIFF, etc
– Convert HDF4 to HDF5
– Java tools - VisAD, etc.
– File/code generation from DDL description
– Talking to vendors
Other HDF5 activities
• Performance tuning
• Object model
• Fortran and C++ API
• Thread-safe HDF5
IV. HDF4 vs. HDF5
HDF4 vs. HDF5
• HDF4
– Original format and library
– Compatible with all earlier versions
– 6 primary objects
• multidim array of scalars
• raster image, palette
• table
• annotation
• group
– Biggest current user: Earth Observing System Data and Info System (EOSDIS)
• HDF5 - successor to HDF4
– New format and library
– Not compatible with earlier versions
– 2 primary objects
• multidim. array of records
• group
– Biggest current user: Accelerated Strategic Computing Initiative (ASCI)
HDF4 object types can be derived from
HDF5 datasets and groups
Figure: HDF4 objects and the HDF5 objects they can be derived from.
• HDF5 group
– HDF4 Vgroup
• HDF5 dataset
– HDF4 Vdata: 1-dim array of records
– HDF4 SDS: n-dim array of scalars
– HDF4 8-bit raster
– HDF4 24-bit raster: 2-dim array of multi-component scalars
Example table of records shown in the figure:
lat   lon   temp
12    23    3.1
15    24    4.2
17    21    3.6
23    35    7.2
25    31    6.3
Example array of scalars shown in the figure:
03 04 43 43 43
-3 72 44 50 34
45 77 34 23 57
45 67 87 00 45
Example text shown in the figure: “March 15, 1990. Simulation with k=10.0, beta=1.22e3. Calculate the magnitude ...”
Status of HDF4 vs. HDF5
• HDF4 is still an EOS standard
• HDF5 likely also
• HDF4 maintenance
– Maintained as long as EOS needs it
– Minimal new features
• New applications: use HDF5 if possible!
– New features, performance improvements, etc.
HDF Information
• HDF Information Center
– http://hdf.ncsa.uiuc.edu/
• HDF Help email address
– [email protected]
• HDF users mailing list
– [email protected]