Introduction to HDF5 Data and Programming Models - HDF-EOS

Introduction to HDF5
HDF & HDF-EOS Workshop XII
October 15, 2008
Topics Covered
- Introduce HDF5
- Describe HDF5 Data and Programming Models
- Walk Through Example Code
For More Information …
All workshop slides will be available from:
http://hdfeos.org/workshops/ws12/workshop_twelve.php
What is HDF5?
HDF = Hierarchical Data Format
• Data model, library and file format for managing
data
• Tools for accessing data in the HDF5 format
Brief History of HDF
1987
At NCSA (University of Illinois), a task force formed to create an
architecture-independent format and library:
AEHOO (All Encompassing Hierarchical Object Oriented format)
Became HDF
Early 1990’s
NASA adopted HDF for Earth Observing System project
1996
DOE’s ASC (Advanced Simulation and Computing) Project began
collaborating with the HDF group (NCSA) to create “Big HDF”
(The increase in computing power of DOE systems at LLNL, LANL and
Sandia National Labs required bigger, more complex data files.)
“Big HDF” became HDF5.
1998
HDF5 was released with support from National Labs, NASA, NCSA
2006
The HDF Group spun off from University of Illinois as non-profit
corporation
Why HDF5?
In one sentence ...
Answering big questions …
Matter and the universe
Life and nature
Weather and climate
[Figure: Total Column Ozone (Dobson), panels for August 24, 2001 and August 24, 2002]
… involves big data …
… varied data …
Thanks to Mark Miller, LLNL
… and complex relationships …
[Figure: related data objects: SNP Score, Contig Summaries, Discrepancies,
Contig Qualities, Coverage Depth, Trace, Reads, Aligned bases, Read quality,
Contig, Percent match]
… on big computers …
… and small computers …
How do we…
• Describe our data?
• Read it? Store it? Find it? Share it? Mine it?
• Move it into, out of, and between computers and
repositories?
• Achieve storage and I/O efficiency?
• Give applications and tools easy access to our data?
Solution: HDF5!
• Can store all kinds of data in a variety of ways
• Runs on most systems
• Lots of tools to access data
• Emphasis on standards (HDF-EOS, CGNS)
• Library and format emphasis on I/O efficiency and
storage
Structure of HDF5 Library
Applications
Object API (C, F90, C++, Java)
Library internals
Virtual file I/O
File or other “storage”
HDF Tools
- HDFView and Java Products
- Command-line utilities (h5dump, h5ls, h5cc,
h5diff, h5repack)
HDF5 Applications & Domains
Applications: simulation, visualization, remote sensing…
Examples: Thermonuclear simulations, product modeling, data mining tools,
visualization tools, climate models
Communities: HDF-EOS, CGNS, ASC
HDF5 Data Model & API
Virtual File Layer (I/O Drivers) and the storage each driver uses (HDF5 format):
- Stdio: File
- Split Files: Split metadata and raw data files
- MPI I/O: File on parallel file system
- Custom: User-defined device (?)
Lots of Layers in HDF5!
“Ogres are like onions.”
Shrek  HDF5 Monster??
Just like Shrek, once you get to
know HDF5 you will really like it!!
The HDF5 Format
An HDF5 file is a container…
…into which you can put your data objects.
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
HDF5 Structures for Organizing Objects
[Figure: “/” (root) and a group “foo” organizing these objects:]
• 3-D array
• 2-D array
• Table
• Raster images
• palette
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6
HDF5 Data Model
Primary Objects
• Groups
• Datasets
Additional ways to organize and annotate data
• Attributes
• Storage and access properties
Everything else is built from these parts.
HDF5 Dataset
Metadata
• Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype: Integer
• Attributes: Time = 32.4, Pressure = 987, Temp = 56
• Storage info: Chunked, Compressed
Data
Dataspaces
Two roles:
• Dataspace contains spatial info about a dataset
stored in a file
• Rank and dimensions
• Permanent part of dataset definition
Example: Rank = 2, Dimensions = 4x6
• Partial I/O: Dataspace describes application’s data
buffer and data elements participating in I/O
Example: Rank = 1, Dimension = 10
Write – from memory to disk
[Figure: data written from a buffer in memory to a dataset on disk]
Partial I/O
Move just part of a dataset
(a) Memory to disk: slab from a 2D array to the
corner of a smaller 2D array
(b) Disk to memory: regular series of blocks from a
2D array to a contiguous sequence
at a certain offset in a 1D array
The number of elements selected in each must be the same.
Datatypes (array elements)
• Datatype – how to interpret a data element
• Permanent part of the dataset definition
• Two classes: atomic and compound
Datatypes
• HDF5 atomic types include:
integer & float
user-definable (e.g., 13-bit integer)
variable length types (e.g., strings)
references to objects/dataset regions
enumeration - names mapped to integers
• HDF5 compound types
Comparable to C structs (“records”)
Members can be atomic or compound types
HDF5 dataset: array of records
Dimensionality: 5 x 3
Datatype: Record with members
• int8
• int4
• int16
• 2x3x2 array of float32
Properties
• Properties are characteristics of HDF5 objects
that can be modified
• Default properties handle most needs
• By changing properties, you can take advantage of the
more powerful features in HDF5
Special Storage Properties
• chunked: Better subsetting access time; extensible
• compressed: Improves storage efficiency, transmission speed
• extensible: Arrays can be extended in any direction
• split file: Metadata in one file, raw data in another
Example: for Dataset “Fred”, File A holds the metadata for Fred and
File B holds the data for Fred
Attributes (optional)
• Attribute – data of the form “name = value”,
attached to an object
• Operations similar to dataset operations, but …
Not extensible
No compression or partial I/O
• Can be overwritten, deleted, added during the
“life” of a dataset
HDF5 Dataset (again)
Metadata
• Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype: Integer
• Attributes: Time = 32.4, Pressure = 987, Temp = 56
• Storage info: Chunked, Compressed
Data
Groups
• A mechanism for organizing collections
• Every file starts with a root group
• Similar to UNIX directories
• Can have attributes
[Figure: tree rooted at “/” with nodes A, B, C, k, l, m]
Path to HDF5 Object in a File
/ (root)
/x
/foo
/foo/temp
/foo/bar/temp
[Figure: tree rooted at “/” with members x and foo; foo contains temp and bar; bar contains temp]
Shared Objects
Paths shown:
/A/P
/B/R
/C/P
[Figure: root group “/” with groups A, B, C and members named P, R, P; a shared object can be reached by more than one path]
Questions So Far?
Useful Tools For New Users
h5dump:
Tool to “dump” or display contents of HDF5 files
h5cc, h5c++, h5fc:
Scripts to compile applications
HDFView:
Java browser to view HDF4 and HDF5 files
H5dump Command-line Utility To View HDF5 File
h5dump [--header] [-a <names>] [-d <names>] [-g <names>]
       [-l <names>] [-t <names>] [-p] <file>
--header     Display header only; no data is displayed.
-a <names>   Display the specified attribute(s).
-d <names>   Display the specified dataset(s).
-g <names>   Display the specified group(s) and all the members.
-l <names>   Display the value(s) of the specified soft link(s).
-t <names>   Display the specified named datatype(s).
-p           Display properties.
<names> is one or more appropriate object names.
Example of h5dump Output
HDF5 "dset.h5" {
GROUP "/" {
   DATASET "dset" {
      DATATYPE { H5T_STD_I32BE }
      DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
      DATA {
         1, 2, 3, 4, 5, 6,
         7, 8, 9, 10, 11, 12,
         13, 14, 15, 16, 17, 18,
         19, 20, 21, 22, 23, 24
      }
   }
}
}
[Figure: root group “/” containing dataset ‘dset’]
HDF5 Compile Scripts
• h5cc – HDF5 C compiler command
• h5fc – HDF5 F90 compiler command
• h5c++ – HDF5 C++ compiler command
To compile:
% h5cc h5prog.c
% h5fc h5prog.f90
Compile option: -show
-show: displays the compiler commands and options
without executing them
% h5cc -show Sample_c.c
gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API
-DNDEBUG -I/home/packages/szip/static/encoder/Linux2.6-gcc/include
-D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64
-D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O
-fomit-frame-pointer -finline-functions -c Sample_c.c
gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions
-L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o
-L/home/packages/hdf5_1.6.6/Linux_2.6/lib
/home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a
/home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a
-lsz -lz -lm -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib
Browsing HDF5 Files with HDFView
HDFView
[Screenshot: HDFView showing the structure of the file and the contents of a dataset]
HDFView File Menu
Simple HDF5 File in HDFView
Right-click and select “Open” with mouse
Right-click and select “Show Properties” with mouse
Simple HDF5 File in HDFView
HDF-EOS5 File in HDFView
Right-click and select “Open As” with mouse
What you can’t see with slides:
- Picture displayed instantly
- File size is 906,229,176
Introduction to
HDF5 Programming Model
and APIs
Operations Supported by the API
• Create objects (groups, datasets, attributes, complex data
types, …)
• Assign storage and I/O properties to objects
• Perform complex subsetting during read/write
• Use variety of I/O “devices” (parallel, remote, etc.)
• Transform data during I/O
• Make inquiries on file and object structure, content,
properties
General Programming Paradigm
• Properties of object are optionally defined
Creation properties
Access property lists
• Object is opened or created
• Object is accessed, possibly many times
• Object is closed
Order of Operations
• An order is imposed on operations by argument
dependencies
For Example:
A file must be opened before a dataset
because the dataset open call requires a file handle
as an argument.
• Objects can be closed in any order.
The General HDF5 API
• Currently C, Fortran 90, Java, and C++ bindings.
• C routines begin with prefix H5?
? is a character corresponding to the type of object
the function acts on
Example Functions:
H5D : Dataset interface e.g., H5Dread
H5F : File interface e.g., H5Fopen
H5S : dataSpace interface e.g., H5Sclose
HDF5 Defined Types
For portability, the HDF5 library has its own defined
types:
hid_t: object identifiers (native integer)
hsize_t: size used for dimensions (unsigned long or
unsigned long long)
hssize_t: for specifying coordinates and sometimes for
dimensions (signed long or signed long long)
herr_t: function return value
hvl_t: variable length datatype
For C, include hdf5.h in your HDF5 application.
The HDF5 API
• For flexibility, the API is extensive
300+ functions
[Figure: Victorinox Swiss Army CyberTool 34]
• This can be daunting… but there is hope
A few functions can do a lot
Start simple
Build up knowledge as more features are needed
Basic Functions
H5Fcreate (H5Fopen)    create (open) File
H5Screate_simple       create dataSpace
H5Dcreate (H5Dopen)    create (open) Dataset
H5Dread, H5Dwrite      access Dataset
H5Dclose               close Dataset
H5Sclose               close dataSpace
H5Fclose               close File
Other Common Functions
DataSpaces:
H5Sselect_hyperslab (Partial I/O)
H5Sselect_elements (Partial I/O)
Groups:
H5Gcreate, H5Gopen, H5Gclose
Attributes:
H5Acreate, H5Aopen_name,
H5Aclose, H5Aread, H5Awrite
Property lists:
H5Pcreate, H5Pclose
H5Pset_chunk, H5Pset_deflate
High Level APIs
• Included along with the HDF5 library
• Simplify steps for creating, writing, and reading
objects
• Do not entirely ‘wrap’ HDF5 library
Example HDF5 Code
Steps to Create a File
1. Decide on special properties the file should have
• Creation properties, like size of user block
• Access properties, such as metadata cache size
• Use default properties (H5P_DEFAULT)
2. Create property lists, if necessary
3. Create the file
4. Close the file and the property lists, as needed
Code: Create a File
hid_t  file_id;
herr_t status;

file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);
status = H5Fclose (file_id);

[Figure: the new file contains only “/” (root)]
Note: Return codes not checked for errors in code samples.
Dataset Components
Metadata
• Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype: Integer
• Attributes: Time = 32.4, Pressure = 987, Temp = 56
• Storage info: Chunked, Compressed
Data
Steps to Create a Dataset
1. Define dataset characteristics
• Dataspace - 4x6
• Datatype – integer
• Properties if needed, or use H5P_DEFAULT
2. Decide where to put it
• Obtain location ID:
- Group ID puts it in a Group
- File ID puts it in Root Group
3. Create dataset in file
4. Close everything
[Figure: dataset A under “/” (root)]
HDF5 Pre-defined Datatype Identifiers
HDF5 defines* a set of Datatype Identifiers per HDF5
session.
For example:
C Type | HDF5 File Type                 | HDF5 Memory Type
-------|--------------------------------|------------------
int    | H5T_STD_I32BE, H5T_STD_I32LE   | H5T_NATIVE_INT
float  | H5T_IEEE_F32BE, H5T_IEEE_F32LE | H5T_NATIVE_FLOAT
double | H5T_IEEE_F64BE, H5T_IEEE_F64LE | H5T_NATIVE_DOUBLE
* Value of datatype is NOT fixed
Pre-defined File Datatype Identifiers
Examples:
H5T_IEEE_F64LE  Eight-byte, little-endian, IEEE floating-point
H5T_STD_I32LE   Four-byte, little-endian, signed two's complement integer
Name parts: H5T_<Architecture*>_<Programming Type>
NOTE: What you see in the file. Name is the same everywhere and
explicitly defines a datatype.
*STD= “An architecture with a semi-standard type like 2’s complement integer, unsigned integer…”
Pre-defined Native Datatypes
Examples of predefined native types in C:
H5T_NATIVE_INT    (int)
H5T_NATIVE_FLOAT  (float)
H5T_NATIVE_UINT   (unsigned int)
H5T_NATIVE_LONG   (long)
H5T_NATIVE_CHAR   (char)
NOTE: Memory types.
Different for each machine.
Used for reading/writing.
Dataset Creation Property List
Dataset creation property list: information on how to
organize data in storage.
Layouts shown:
• H5P_DEFAULT: contiguous
• Chunked
• Chunked & compressed
Code: Create a Dataset
hid_t   file_id, dataset_id, dataspace_id;
hsize_t dims[2];
herr_t  status;

file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);

/* Create a dataspace: rank 2, current dims 4 x 6 */
dims[0] = 4;
dims[1] = 6;
dataspace_id = H5Screate_simple (2, dims, NULL);

/* Create a dataset. Arguments: location, pathname, datatype,
   dataspace, property list (default).
   This is the HDF5 1.6 signature; HDF5 1.8 and later default to the
   seven-argument H5Dcreate2. */
dataset_id = H5Dcreate (file_id, "A", H5T_STD_I32BE,
                        dataspace_id, H5P_DEFAULT);

/* Terminate access to dataset, dataspace, file */
status = H5Dclose (dataset_id);
status = H5Sclose (dataspace_id);
status = H5Fclose (file_id);
Example Code - H5Dwrite
status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL,
                   H5S_ALL, H5P_DEFAULT, dset_data);
• dataset_id: Dataset Identifier from H5Dcreate or H5Dopen
• H5T_NATIVE_INT: Memory Datatype
Example Code – H5Dwrite
status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                   H5P_DEFAULT, dset_data);
• First H5S_ALL: Memory Dataspace
• Second H5S_ALL: File Dataspace
• H5P_DEFAULT: Data Transfer Property List (MPI I/O, Transformations, …)
H5S_ALL selects entire dataspace
Partial I/O
Memory Dataspace
H5S_ALL
File Dataspace (disk)
H5S_ALL
Get a Dataspace:
H5Screate_simple
H5Dget_space
Modify Dataspace:
H5Sselect_hyperslab
H5Sselect_elements
Example Code – H5Dread
status = H5Dread (dataset_id, H5T_NATIVE_INT,
H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_rdata);
High Level APIs: HDF5 Lite (H5LT)
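#include "H5LT.h"
…
file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC,
                     H5P_DEFAULT, H5P_DEFAULT);
/* The datatype argument describes the memory buffer as well as the
   file type, so a native type is used for a buffer of C ints. */
status = H5LTmake_dataset (file_id, "A", 2, dims,
                           H5T_NATIVE_INT, data);
status = H5Fclose (file_id);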
High Level APIs
• HDF5 Lite
• HDF5 Image
• HDF5 Table
• HDF5 Dimension Scales
• HDF5 Packet Table
Example: Create a Group
[Figure: file.h5 with “/” (root) containing A, a 4x6 array of integers, and the group B]
Steps to Create a Group
1. Decide where to put it – “root group”
• Obtain location ID
2. Decide name – “B”
3. Create group in file
4. (Eventually) close the group.
Code: Create a Group
hid_t  file_id, group_id;
herr_t status;
...
/* Open "file.h5" */
file_id = H5Fopen ("file.h5", H5F_ACC_RDWR,
                   H5P_DEFAULT);
/* Create group "/B" in file. The last argument is a size hint for the
   number of bytes to store names of objects; 0 = default.
   This is the HDF5 1.6 signature. */
group_id = H5Gcreate (file_id, "B", 0);
/* Close group and file. */
status = H5Gclose (group_id);
status = H5Fclose (file_id);
Thank you!
This work was supported by the Cooperative Agreement with the
National Aeronautics and Space Administration (NASA) under NASA
grant NNX06AC83A and NNX08A077A. Any opinions, findings,
conclusions or recommendations expressed in this material are those of
the author(s) and do not necessarily reflect the views of NASA.