Ensuring Long Term Access to
Remotely Sensed HDF4 Data
with Layout Maps
Mike Folks, The HDF Group
Ruth Duerr, NSIDC
1
Background and basic
concept
2
I’m Plastic Man!
HDF4 is
EXTENSIBLE
FLEXIBLE
SELF-DESCRIBING
3
But
There’s a cost…
4
Complexity!
5
6
7
How do we save HDF users
from having to deal with all of
the complexity under the
hood?
8
Through the HDF software
libraries, either by using the
HDF APIs directly or by using
HDF tools that depend on the
HDF libraries.
But what about the future…
9
• There is a risk in depending solely on the HDF
libraries to access HDF-formatted data over the
long term.
• It is possible, especially in the distant future, that
the libraries may not be available.
10
Really smart people and software?
Maybe future
data users and
their computers
will be so smart
that the HDF4
format will be a
piece of cake.
11
Maybe not.
12
We need an “easy” button
13
“If only we could read HDF data with an
independent program that does not rely on
the HDF API…
A possible approach [would be to] extend
hdfls to print a hierarchical map of a data file,
[and] write ncdump/hdp-like utilities to find,
assemble and write out SDSes and vdatas.”
“Leveraging HDF Utilities”
Christopher Lynnes
HDF Workshop X.
14
The project
15
HDF4 mapping
• Problem
  - The complex internal byte layout of HDF files
    requires one to use the API to access HDF data.
  - This makes long-term readability of HDF data
    dependent on long-term allocation of resources to
    support HDF software.
• Proposed solution
  - Create a map of the layout of data objects in an
    HDF file, allowing a simple reader to be written to
    access the data.
16
HDF4 mapping project activities
1. Assess and categorize HDF4 data held by NASA
  - To determine what types of objects to map.
  - To get an idea of the magnitude of the project.
2. Develop prototype for proof of concept
  - Develop markup-language based layout
    specification.
  - Develop tool to produce layout for an HDF4 file.
  - Develop and test two independent tools to read
    HDF4 data based solely on the map files.
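The "tool to produce layout" step can be pictured with a small Python sketch. It is only an illustration: the prototype tool (hmap) is linked with the HDF4 library to discover where each object lives, whereas this sketch assumes the offsets and sizes are already known and handed in as plain dictionaries. The element and attribute names follow the example map fragment shown later in this deck; the function names `build_map` and `q` and the dictionary keys are hypothetical.

```python
import math
import xml.etree.ElementTree as ET

MAP_NS = "http://www.hdfgroup.org/HDF4/HDF4Map"
ET.register_namespace("hdf4", MAP_NS)  # serialize with the hdf4: prefix


def q(tag):
    """Return the namespace-qualified form of a map element name."""
    return f"{{{MAP_NS}}}{tag}"


def build_map(objects):
    """Build a map document for big-endian integer arrays stored in one block each.

    `objects` is a list of dicts with the hypothetical keys
    name, dims, item_size, offset and nbytes.
    """
    root = ET.Element(q("HDFMap"))
    group = ET.SubElement(root, q("RootGroup"))
    for obj in objects:
        # Refuse to write a map whose byte count contradicts the array shape.
        if obj["nbytes"] != math.prod(obj["dims"]) * obj["item_size"]:
            raise ValueError(f"inconsistent size for {obj['name']}")
        sds = ET.SubElement(group, q("SDS"), objName=obj["name"], objPath="/")
        ET.SubElement(sds, q("Datatype"), dtypeClass="INT",
                      dtypeSize=str(obj["item_size"]), byteOrder="BE")
        space = ET.SubElement(sds, q("Dataspace"), ndims=str(len(obj["dims"])))
        space.text = " ".join(str(d) for d in obj["dims"])
        block = ET.SubElement(sds, q("Datablock"), nblocks="1")
        ET.SubElement(block, q("BlockOffset")).text = str(obj["offset"])
        ET.SubElement(block, q("BlockNbytes")).text = str(obj["nbytes"])
    return ET.tostring(root, encoding="unicode")


# Values taken from the example map fragment in this deck.
xml_text = build_map([
    {"name": "data1", "dims": (10, 100), "item_size": 4,
     "offset": 2502, "nbytes": 4000},
])
print(xml_text)
```

The size check is a design choice for the sketch: a map is only useful for long-term access if the recorded byte count agrees with the recorded shape and element size, so an inconsistent entry is rejected rather than written.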
17
Project activities (continued)
3. Assess results and plan next steps
  - Present results and options for proceeding to the
    community.
  - Assess the likely usefulness of this approach, as
    well as any desirable modifications.
  - Evaluate the effort required for a full solution that
    best meets community needs.
  - Submit a proposal for the work needed to provide
    a full solution.
18
1. Assess and categorize
19
How many NASA HDF4 products?
Data Center    HDF4 Products
ASF              0
GES-DISC       236
GHRC            54
ASDC            63
LP-DAAC         67
NSIDC           47
ORNL-DAAC        2
PO.DAAC         22
SDAC             0
MrDC            95
Total          586
20
Data characteristics
Product Characteristics Examined
• Product Identification
  - Product Name
  - Data Level
  - Archive Location
  - Product Version
  - Whether the product was multi-file
• HDF Version
• For HDF-EOS products
  - HDF-EOS version
  - For point data
    • Number of point data sets
    • Maximum number of levels
  - For swath data
    • Number of swaths
    • Maximum number of dimensions
    • Organized by time, space, both, or other
    • Whether dimension maps were used
  - For gridded data
    • Number of grids
    • Max number of dimensions in a grid
    • Number of projections used
    • Whether any grids were indexed
• For raster data
  - Number of 8-bit rasters
  - Number of 24-bit rasters
  - Number of general rasters
  - Whether any rasters had attributes
  - Whether any rasters were compressed
  - Whether any rasters were chunked
  - Whether there were any palettes
• For SDS data
  - Number of SDSs
  - Maximum number of dimensions
  - Did any SDS have attributes
  - Was any SDS annotated
  - Were dimension scales used
  - Was compression used and if so what kind
  - Was chunking used
• For Vdata
  - Number of Vdata structures
  - Did any Vdata have attributes
  - Did any Vdata fields have attributes
  - Was compression used and if so what kind
  - Was chunking used
21
Other results
• Slightly more than half of the HDF4 products are in HDF-EOS 2
format
• Grids are the most common HDF-EOS data structures in use
• No products use a combination of grid, swath, and point data
structures
22
2. Prototype and proof of
concept
23
HDF4 mapping prototype workflow
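• Input: HDF4 file “H4.hdf”
• hmap, linked with the HDF4 library, reads the HDF4 file
  and writes the HDF4 mapping file (XML document)
  “H4.hdf.map.xml”
• The mapping file holds groups, data objects, structural and
  application metadata, and the locations of object data
• Reader 1 and Reader 2 (a C program and a Perl script)
  use the mapping file to get the object data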
24
Proof-of-concept results
• The HDF Group created prototype map
generation software and a draft map
specification
• Map generator was tested on a wide variety of
data products
• GES-DISC and NSIDC independently wrote
software that uses maps to read data files in
NSIDC’s and GES-DISC’s archives
• Summary - the concept is feasible!
25
Example map fragment
<?xml version="1.0" encoding="utf-8"?>
<hdf4:HDFMap xmlns:hdf4="http://www.hdfgroup.org/HDF4/HDF4Map">
  <hdf4:RootGroup>
    <hdf4:SDS objName="data1" objPath="/" objID="xid-DFTAG_NDG-2">
      <hdf4:Attribute name="data range" ntDesc="32-bit signed integer">
        0 255
      </hdf4:Attribute>
      <hdf4:Datatype dtypeClass="INT" dtypeSize="4" byteOrder="BE" />
      <hdf4:Dataspace ndims="2">
        10 100
      </hdf4:Dataspace>
      <hdf4:Datablock nblocks="1">
        <hdf4:BlockOffset>
          2502
        </hdf4:BlockOffset>
        <hdf4:BlockNbytes>
          4000
        </hdf4:BlockNbytes>
      </hdf4:Datablock>
    </hdf4:SDS>
  </hdf4:RootGroup>
</hdf4:HDFMap>
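To show how a map like this could drive a reader that does not use the HDF library, here is a minimal Python sketch. It is not one of the project's readers (those were a C program and a Perl script). It assumes integer data only, and it assumes that when a dataset has several blocks, the BlockOffset and BlockNbytes elements appear in matching order; the draft map specification may define this differently. The function name `read_sds` is hypothetical, and the "file" below is synthetic bytes laid out to match the fragment, not a real HDF4 file.

```python
import struct
import xml.etree.ElementTree as ET

NS = {"hdf4": "http://www.hdfgroup.org/HDF4/HDF4Map"}
INT_CODES = {1: "b", 2: "h", 4: "i", 8: "q"}  # struct codes by integer size

# Trimmed copy of the example fragment (the Attribute element is omitted).
MAP_XML = """
<hdf4:HDFMap xmlns:hdf4="http://www.hdfgroup.org/HDF4/HDF4Map">
  <hdf4:RootGroup>
    <hdf4:SDS objName="data1" objPath="/" objID="xid-DFTAG_NDG-2">
      <hdf4:Datatype dtypeClass="INT" dtypeSize="4" byteOrder="BE" />
      <hdf4:Dataspace ndims="2">10 100</hdf4:Dataspace>
      <hdf4:Datablock nblocks="1">
        <hdf4:BlockOffset>2502</hdf4:BlockOffset>
        <hdf4:BlockNbytes>4000</hdf4:BlockNbytes>
      </hdf4:Datablock>
    </hdf4:SDS>
  </hdf4:RootGroup>
</hdf4:HDFMap>
"""


def read_sds(map_xml, file_bytes, name):
    """Return (dims, flat_values) for the SDS `name`, using only the map."""
    root = ET.fromstring(map_xml)
    sds = next((s for s in root.iterfind(".//hdf4:SDS", NS)
                if s.get("objName") == name), None)
    if sds is None:
        raise KeyError(name)

    dtype = sds.find("hdf4:Datatype", NS)
    if dtype.get("dtypeClass") != "INT":
        raise NotImplementedError("this sketch handles integer data only")
    size = int(dtype.get("dtypeSize"))
    order = ">" if dtype.get("byteOrder") == "BE" else "<"
    dims = [int(d) for d in sds.find("hdf4:Dataspace", NS).text.split()]

    # Assumption: offsets and lengths pair up in document order.
    offsets = [int(e.text) for e in sds.iterfind(".//hdf4:BlockOffset", NS)]
    lengths = [int(e.text) for e in sds.iterfind(".//hdf4:BlockNbytes", NS)]
    raw = b"".join(file_bytes[o:o + n] for o, n in zip(offsets, lengths))

    count = 1
    for d in dims:
        count *= d
    if len(raw) != count * size:
        raise ValueError("map and file disagree on the data size")
    return dims, struct.unpack(f"{order}{count}{INT_CODES[size]}", raw)


# Synthetic stand-in for H4.hdf: 2502 filler bytes, then 1000 big-endian int32s.
values = [i % 256 for i in range(1000)]
fake_file = bytes(2502) + struct.pack(">1000i", *values) + b"trailing bytes"

dims, flat = read_sds(MAP_XML, fake_file, "data1")
print(dims, flat[:3], max(flat))
```

The numbers in the fragment are self-consistent, which is what makes such a reader possible: 10 × 100 elements of 4 bytes each is exactly the 4000 bytes recorded in BlockNbytes.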
26
Next steps
27
Effort for full implementation
• Finalize map file XML specification
  - Compatibility with existing standards: NcML, XFDU,
    PREMIS, ESML, DFDL
• Implement production-quality mapping tool and API
• Possibly do a similar assessment for HDF5 maps.
28
Implementation Processes
• Generate maps for existing archives
  - GES-DISC approach: append the map XML to the XML
    files already kept for each file in their archive
  - NSIDC non-ECS data implementation: add an XML file
    for each data file in the same directory
  - ROM to add the capability to NASA ECS systems is in
    process
  - Other NASA systems TBD
• Generate maps for new data
  - Add map generation as a step in the ingest process
    using a standalone tool
  - Request product generation systems to use new API
    calls that generate maps
29
How you can help
• Consider what it might take to implement this for
  your archive
• Review the materials on the wiki and elsewhere, and comment heavily!
  - Wiki page added to NASA’s ESDC wiki
  - Project page at The HDF Group website:
    http://www.hdfgroup.org/projects/hdf4mapping/
30
Thank you.
This report is based upon work supported in part
by a Cooperative Agreement with the National
Aeronautics and Space Administration (NASA)
under NASA Award NNX06AC83A.
Any opinions, findings, and conclusions or
recommendations expressed in this material are
those of the author(s) and do not necessarily
reflect the views of the National Aeronautics and
Space Administration.
31