Transcript slides

A PLFS Plugin for HDF5 for Improved I/O Performance and Analysis
Kshitij Mehta (1), John Bent (2), Aaron Torres (3), Gary Grider (3), Edgar Gabriel (1)
(1) University of Houston, Texas
(2) EMC Corp.
(3) Los Alamos National Lab
DISCS 2012
Talk Outline
● Background
  – HDF5
  – PLFS
● Plugin
  – Goals and Design
● Semantic Analysis
● Experiments and Results
● Conclusion
HDF5 – An Overview
● Hierarchical Data Format
● Data model, file format, and API
● A tool for managing complex data
● Widely used in industry and academia
● The user specifies data objects and the logical relationships between them
● HDF5 handles the data structures, memory management, metadata creation, and file I/O
HDF5 – An Overview (II)
● Parallel HDF5
  – Built with an MPI library
  – File create, dataset create, group create, etc. are collective calls
● The user can select POSIX I/O, or parallel I/O using MPI-IO (individual/collective) – see the sketch below
● Files are portable between sequential HDF5 and PHDF5 access
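As a concrete illustration of these calls, here is a minimal PHDF5 write sketch: file and dataset creation are collective, and the transfer property list chooses independent or collective MPI-IO. The file name, dataset name, and sizes are illustrative and not taken from the talk; an MPI-enabled HDF5 build (typically compiled with h5pcc) is assumed.

/* Minimal PHDF5 sketch: collective file/dataset create, per-process write. */
#include <mpi.h>
#include <hdf5.h>

#define NELEM 256   /* elements written per process (illustrative) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* File access property list: route HDF5 I/O through MPI-IO. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

    /* File and dataset creation are collective calls in PHDF5. */
    hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    hsize_t dims[1] = { (hsize_t)nprocs * NELEM };
    hid_t filespace = H5Screate_simple(1, dims, NULL);
    hid_t dset = H5Dcreate2(file, "D1", H5T_NATIVE_INT, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Each process selects its own contiguous block of the shared dataset. */
    hsize_t start[1] = { (hsize_t)rank * NELEM };
    hsize_t count[1] = { NELEM };
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(1, count, NULL);

    /* Transfer property list: independent or collective MPI-IO. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);  /* or H5FD_MPIO_COLLECTIVE */

    int buf[NELEM];
    for (int i = 0; i < NELEM; i++) buf[i] = rank;
    H5Dwrite(dset, H5T_NATIVE_INT, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}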
HDF5 – An Overview (III)
[Figure: PEs accessing an HDF5 file – a single .h5 file holds the metadata and the groups containing datasets D1, D2, D3]
HDF5 – An Overview (IV)
● File is the top-level object, a collection of objects
● Dataset is a multi-dimensional array
  – Dataspace
    ● Number of dimensions
    ● Size of each dimension
  – Datatype
    ● Native (int, float, etc.)
    ● Compound (~struct)
● Group is a collection of objects (groups, datasets, attributes)
● Attributes are used to annotate user data
● Hyperslab selection
  – Specify offset, stride in the dataspace
  – e.g. write a selected hyperslab from a matrix in memory to a selected hyperslab in a dataset in the file (see the sketch below)
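The hyperslab bullet can be made concrete with a small serial sketch: a block of an in-memory matrix is selected and written into a region of a dataset in the file. The file name, dataset name, and dimensions are illustrative only.

/* Hyperslab sketch (serial HDF5): write a 4x4 block of an 8x8 in-memory
 * matrix into the middle of a 16x16 dataset. */
#include <hdf5.h>

int main(void)
{
    hid_t file = H5Fcreate("hyperslab.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Dataspace of the dataset in the file: 16 x 16 doubles. */
    hsize_t fdims[2] = {16, 16};
    hid_t filespace = H5Screate_simple(2, fdims, NULL);
    hid_t dset = H5Dcreate2(file, "D1", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* In-memory matrix: 8 x 8; select a 4 x 4 block starting at (2,2). */
    double buf[8][8];
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            buf[i][j] = i * 8 + j;

    hsize_t mdims[2]  = {8, 8};
    hid_t memspace = H5Screate_simple(2, mdims, NULL);
    hsize_t mstart[2] = {2, 2}, mcount[2] = {4, 4};
    H5Sselect_hyperslab(memspace, H5S_SELECT_SET, mstart, NULL, mcount, NULL);

    /* Destination in the file: a 4 x 4 region starting at (6,6). */
    hsize_t fstart[2] = {6, 6}, fcount[2] = {4, 4};
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, fstart, NULL, fcount, NULL);

    /* Only the selected elements move between memory and file. */
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, H5P_DEFAULT, buf);

    H5Sclose(memspace); H5Sclose(filespace); H5Dclose(dset); H5Fclose(file);
    return 0;
}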
HDF5 Virtual Object Layer (VOL)
● Recently introduced by the HDF Group
● A new abstraction layer below the public API that intercepts API calls
● Forwards the calls to an object plugin
● Allows third-party plugin development
● Data can be stored in any format
  – netCDF, HDF4, etc.
[Figure: HDF5 public API → VOL → object plugins, which can store data as netCDF or as a native .h5 file]
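To make the "object plugin" idea concrete, the following is a purely hypothetical sketch of a callback table that a VOL-style layer could dispatch to. It is not the actual HDF5 VOL interface; every name in it is invented for illustration.

/* Hypothetical sketch of the object-plugin idea: the VOL intercepts HDF5 API
 * calls and forwards them through a table of callbacks supplied by a plugin.
 * This is NOT the real HDF5 VOL interface. */
#include <stddef.h>

typedef struct object_plugin {
    const char *name;                                   /* e.g. "plfs", "netcdf" */
    void *(*file_create)(const char *path, unsigned flags);
    void *(*dataset_create)(void *file, const char *name,
                            int type_id, int space_id);
    int   (*dataset_write)(void *dset, const void *buf, size_t nbytes);
    int   (*file_close)(void *file);
} object_plugin_t;

/* The layer keeps the active plugin and forwards calls such as file create. */
static const object_plugin_t *active_plugin;

void *vol_file_create(const char *path, unsigned flags)
{
    return active_plugin->file_create(path, flags);    /* dispatch to plugin */
}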
Opportunities in HDF5
• Preserve semantic information about HDF5 objects
  • A single .h5 file is a black box
  • Enables post-processing on individual HDF5 objects
• Improve I/O performance on certain file systems
  • N-1 access often results in sub-optimal I/O performance on file systems like Lustre
PLFS
• Parallel Log-structured File System developed at LANL, CMU, and EMC
• Middleware positioned between the application and the underlying file system
• Transforms an N-1 access pattern into N-N
• Processes write to separate files; sufficient metadata is maintained to re-create the original shared file
• Demonstrated benefits on many parallel file systems
Goals of the new plugin
• Store data in a new format, different from the native single-file format
  • Preserves semantic information
  • Enables additional analysis and optimizations
• Use PLFS to read/write data objects
  • Tackles the performance problem due to N-1 access
Plugin Design
• Implements the various object functions
• Provides a raw mapping of HDF5 objects to the underlying file system (see the sketch below)
  • The HDF5 file and groups are stored as directories
  • Datasets are stored as PLFS files
  • Attributes are stored as PLFS files named dataset_name.attr_name
• Uses PLFS API calls in the plugin
• PLFS xattrs store the dataset metadata (datatype, dataspace, ...)
  • Xattrs provide key-value style access
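A rough sketch of that on-disk mapping, assuming an HDF5 file "File" with a group "Group" and a dataset "D1" carrying a "units" attribute. Plain POSIX calls (mkdir, open/write, setxattr) stand in for the PLFS API calls and PLFS xattrs that the actual plugin uses; paths and key names are illustrative, and setxattr() makes this Linux-specific.

/* Sketch of the mapping: file/groups -> directories, datasets and attributes
 * -> files, dataset metadata -> extended attributes. */
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/xattr.h>
#include <unistd.h>

int main(void)
{
    /* HDF5 file "File" containing group "Group" -> nested directories. */
    mkdir("File", 0755);
    mkdir("File/Group", 0755);

    /* Dataset "D1" inside the group -> one file holding the raw data. */
    double data[100] = {0};
    int fd = open("File/Group/D1", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    write(fd, data, sizeof(data));

    /* Dataset metadata (datatype, dataspace, ...) -> key-value xattrs. */
    const char *dtype = "H5T_NATIVE_DOUBLE";
    const char *space = "rank=1;dims=100";
    setxattr("File/Group/D1", "user.h5.datatype", dtype, strlen(dtype), 0);
    setxattr("File/Group/D1", "user.h5.dataspace", space, strlen(space), 0);
    close(fd);

    /* Attribute "units" on D1 -> a file named dataset_name.attr_name. */
    const char *units = "meters";
    fd = open("File/Group/D1.units", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    write(fd, units, strlen(units));
    close(fd);
    return 0;
}

The resulting directory tree mirrors the logical HDF5 hierarchy, which is exactly the property the next slide relies on.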
PLFS Plugin
● Relative paths describe the relationships between objects
● The user still sees the same API
[Figure: the logical HDF5 hierarchy (File, groups, datasets D1, D2, D3) maps onto a directory tree (File/ and Group/ directories, with D1, D2, D3 as PLFS files)]
Semantic Analysis (I)
• Active Analysis
  • The application can provide a data parser function (a hypothetical sketch follows this slide)
  • PLFS applies the function to the streaming data
  • The function outputs key-value pairs which can be embedded in extensible metadata
  • e.g. recording the height of the largest wave in the ocean data within each physical file
  • Quick retrieval of the largest wave, since only the extensible metadata needs to be searched
  • Extensible metadata can be stored on burst buffers for faster retrieval
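A hypothetical example of such a parser function for the ocean-wave case. The callback signature and the emit_kv() helper are assumptions made for illustration, not part of the actual PLFS interface.

/* Hypothetical data-parser callback for active analysis: PLFS would call a
 * function like this on each buffer as it streams through a write, and the
 * emitted key-value pairs would land in the extensible metadata (e.g. on a
 * burst buffer). */
#include <stdio.h>
#include <stddef.h>

/* Assumed helper: record one key-value pair in the extensible metadata. */
void emit_kv(const char *key, double value)
{
    printf("%s=%f\n", key, value);   /* stand-in for the real metadata store */
}

/* Parser: track the largest wave height seen in this shard of ocean data. */
void wave_parser(const double *buf, size_t nelems)
{
    double max_height = 0.0;
    for (size_t i = 0; i < nelems; i++)
        if (buf[i] > max_height)
            max_height = buf[i];
    emit_kv("max_wave_height", max_height);
}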
Active Analysis (II)
[Figure: PEs stream data through PLFS to the file system; the parser functions run on the stream and their output is stored on the burst buffer]
Semantic Analysis (II)
• Semantic Restructuring
  • Allows re-organizing data into a new set of PLFS shards (see the sketch below)
  • e.g. assume an ocean model stored row-wise
    • Column-wise access is expensive
  • An analysis routine can ask for a "column-wise reordering"
  • PLFS knows what that means, since it knows the structure
  • Avoids the application having to restructure the data itself by calculating a huge list of logical offsets
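A minimal sketch of what such a reordering amounts to for a row-major Hours x Days array. In the plugin this reshuffling would happen inside PLFS, which already knows the dataspace, rather than in the application; the sizes and names here are illustrative.

/* Rewrite a row-major HOURS x DAYS array in column-major order so that
 * later column-wise (per-hour) reads become contiguous. */
#include <stdlib.h>

#define HOURS 24
#define DAYS  31

/* Return a newly allocated column-major copy of a row-major array. */
double *to_column_major(const double *row_major)
{
    double *col_major = malloc(sizeof(double) * HOURS * DAYS);
    for (int h = 0; h < HOURS; h++)
        for (int d = 0; d < DAYS; d++)
            col_major[d * HOURS + h] = row_major[h * DAYS + d];
    return col_major;
}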
Semantic Restructuring (II)
[Figure: HDF5 datasets being restructured into new PLFS shards. Caption: "Re-order wave lengths recorded in October 2012 in column-major (Hour x Day)"]
Experiments and Results
● Lustre file system, 12 OSTs, 1 MB stripe size
● HDF5 performance tool "h5perf"
● Multiple processes write data to multiple datasets in a file
● Bandwidth values presented are the average of 3 runs
● 1, 2, 4, 8, 32, 64 PEs
  – 4 PEs/node max
● 10 datasets, minimum total data size 64 GB
● Comparing MPI-IO on Lustre, the plugin, and AD_PLFS (the PLFS MPI-IO driver)
● Individual (non-collective) I/O tests
Write Contiguous
• Aligned transfer size of 1 MB
• In almost all cases the plugin is better than MPI-IO; AD_PLFS shows the best performance
Write Interleaved
• Unaligned transfer size of 1 MB + 10 bytes
• Plugin performance > MPI-IO
Read Performance
• Contiguous reads (1 MB) and interleaved reads (1 MB + 10 bytes)
• Similar trend as for writes
• MPI-IO < Plugin < AD_PLFS
Conclusion
● A new plugin for HDF5 was developed using the PLFS API
● The new output format allows for semantic analysis
● Using PLFS improves I/O performance
● Tests show the plugin performs better than MPI-IO in most cases; AD_PLFS shows the best performance
● Future work: use AD_PLFS API calls in the plugin instead of native PLFS API calls, and provide collective I/O in the plugin
Thank You
Acknowledgements:
• Quincey Koziol, Mohamad Chaarawi – HDF group
• University of Dresden for access to Lustre FS
Questions
• Why not use AD_PLFS on the default .h5 file?
  • Changing the output format allows for semantic analysis
  • Provides more object-based storage (DOE Fast Forward proposal – EMC, Intel, and the HDF Group are working towards an object stack)