A Consistent, Standards-Based Format for NPOESS Data Products

HDF Lessons from NPOESS & Future Opportunities
Alan M. Goldberg <[email protected]>
HDF Workshop IX, December 2005
NOTICE
This technical data was produced for the U.S. Government under
Contract No. 50-SPNA-9-00010, and is subject to the Rights in
Technical Data - General clause at FAR 52.227-14 (JUN 1987)
Approved for public release, distribution unlimited
© 2005 The MITRE Corporation. All rights reserved
Outline

NPOESS Process & Successes

NPOESS Path Forward

Convergence with other developments

Fundamentals

Future
Requirements for data products

Deal with complexity
– Large data granules (order of Gb)
– Intrinsically complex data (advanced sensors produce new challenges)
– Multi-platform, multi-sensor, long-duration data production
– Many data processing levels and product types

Satisfy operational, archival, and field terminal users
– Multiple users with heritage traditions
NPOESS products delivered at multiple levels
[Figure: data flow diagram. Labels recoverable from the slide:
– Environmental source components
– Space segment: sensors (flux manipulation, detection, filtration, A/D conversion, cal. source), packetization, compression, aux. sensor data, CCSDS (mux, code, frame) & encrypt, data store, comm xmitter, other subsystems
– C3S: comm receiver, comm processing, delivered raw
– IDPS: RDR production, SDR production, EDR production
– Product levels: RDR level, TDR level, SDR level, EDR level]
Sensor product types
Swath-oriented multispectral imagery
– VIIRS – cross-track whiskbroom
– CMIS – conical scan
– Imagery EDRs – resampled on uniform grid
Slit spectra
– OMPS SDRs – cross-track spectra, limb spectra
Image-array Fourier spectra
– CrIS SDR
Point spectra
– SESS energetic particle sensor SDR
Directional lists
– Active fires
3-d swath-oriented grid
– Vertical profile EDRs
2-d map grid
– Seasonal land products
Abstract byte structures
– RDRs
Abstract bit structures
– Encapsulated ancillary data
Bit planes
– Quality flags
Associated arrays (w/ stride?)
– Geolocation
NPOESS product design development
Requirements
- Multi-platform, multi-sensor, long-duration data production
- Many data processing levels and product types
- Satisfy operational, archival, and field terminal users
Constraints
- Processing architecture and optimization
- Heritage designs
- Contractor style and practices
- Budget and schedule
Intentions
- Use simple, robust standards
- Use best practices and experience from previous operational and EOS missions
- Provide robust metadata
- Maximize commonality among products
- Forward-looking, not backward-looking, standardization
Resources
- HDF5
- FGDC
- CF conventions
- Expectation of tools by others
Design Process
- Experience
- Trades & analyses
Result
Essentials of the resulting design
• The design is the manner in which we combine, limit, and extend the available resources into an NPOESS implementation
• Granule is the fundamental unit of tracking, processing, and access
• Structural hierarchy from Files to Granules to Arrays
• Files & granules contain both data & metadata
– Collection metadata (“quasi-static”) retained separate from granule (“dynamic”) metadata
• Profiles in XML which describe the granule contents
• Sample data sets have been delivered by Raytheon
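The File, Granule, Array hierarchy on this slide can be pictured with a small in-memory model. This is a minimal sketch, not the NPOESS format or an HDF5 API: the class names `ProductFile`, `Granule`, and `DataArray`, the metadata keys, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataArray:
    """A named array; values are kept as a flat list for simplicity."""
    name: str
    values: List[float]


@dataclass
class Granule:
    """Fundamental unit of tracking, processing, and access."""
    granule_id: str
    metadata: Dict[str, str]          # per-granule ("dynamic") metadata
    arrays: List[DataArray] = field(default_factory=list)


@dataclass
class ProductFile:
    """A file holds file-level metadata plus one or more granules."""
    metadata: Dict[str, str]          # file-level metadata
    granules: List[Granule] = field(default_factory=list)

    def arrays_by_granule(self, granule_id: str) -> List[DataArray]:
        """Address arrays through a single granule."""
        for g in self.granules:
            if g.granule_id == granule_id:
                return g.arrays
        raise KeyError(granule_id)

    def arrays_by_file(self, name: str) -> List[DataArray]:
        """Address every array with a given name across the whole file."""
        return [a for g in self.granules for a in g.arrays if a.name == name]


# Hypothetical sample content.
product = ProductFile(
    metadata={"collection": "EXAMPLE-SDR"},
    granules=[
        Granule("G001", {"start_time": "t0"}, [DataArray("radiance", [1.0, 2.0])]),
        Granule("G002", {"start_time": "t1"}, [DataArray("radiance", [3.0])]),
    ],
)
print(len(product.arrays_by_file("radiance")))
```

The two lookup methods mirror the idea that arrays can be reached either through one granule or across the file as a whole.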
Resulting design
Advantages
– Flexible; extensible; allows compression
– Accessed by API, not format
– Arrays can be addressed either by granule or by file
– Potentially self-documenting
– Handles abstract data types and large files
– BLOBs (e.g., raw data, external files) can be wrapped
Disadvantages
– Inconsistent with heritage operational formats (GRIB, BUFR)
– Limited tools
[Figure: a File contains File Metadata and several Granules; each Granule contains Granule Metadata and Arrays.]
NPOESS future evolution
• Moving toward the ideal
• NPOESS has time to evolve during NPP
– Gain experience with data sets, metadata, operational users, direct users, CLASS ingest
• Use the flexibility of HDF
– Attributes can be added to fill out metadata needs
• Use additional HDF data features (e.g., bit planes)
• Use more complete self-documentation
• Harmonize with community data description conventions
• Develop more user tools
• Possible benefit from netCDF – HDF convergence
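The bit-plane idea mentioned on this slide (several quality flags stored as individual bits of one word per pixel) can be sketched without any HDF API. The flag names and bit positions in `FLAG_BITS` are assumptions made up for the example, not NPOESS definitions.

```python
from typing import Dict, List

# Hypothetical flag layout: bit position for each quality flag.
FLAG_BITS: Dict[str, int] = {"cloudy": 0, "saturated": 1, "bad_geolocation": 2}


def pack_flags(flags: Dict[str, bool]) -> int:
    """Pack named boolean flags into one integer, one bit per flag."""
    word = 0
    for name, is_set in flags.items():
        if is_set:
            word |= 1 << FLAG_BITS[name]
    return word


def bit_plane(words: List[int], name: str) -> List[int]:
    """Extract a single flag (one bit plane) from a list of packed words."""
    bit = FLAG_BITS[name]
    return [(w >> bit) & 1 for w in words]


pixels = [
    pack_flags({"cloudy": True, "saturated": False, "bad_geolocation": False}),
    pack_flags({"cloudy": False, "saturated": True, "bad_geolocation": True}),
    pack_flags({"cloudy": True, "saturated": True, "bad_geolocation": False}),
]
print(pixels)                       # packed words, one per pixel
print(bit_plane(pixels, "cloudy"))  # one bit per pixel for a single flag
```

Packing keeps the flags aligned with the pixel they describe, while a reader interested in one flag only needs that flag's plane.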
Lessons & Way Forward
Observations from development to date

Avoid the temptation to use heritage approaches without reconsideration, but …
Novel concepts need to be tested
Data concepts, profiles, templates, or best practices should be defined before coding begins
Use broad, basic standards to the greatest possible extent
– FGDC has flexible definitions, if carefully thought through
Define terms in context; clarity and precision as appropriate
Past attempts to predefine data organizations (e.g., HDF-EOS ‘swath’ or HDF4 ‘palette’) have offered limited flexibility. Keep to simple standards which can be built upon and described well.
Lesson: be humble
It is a great service to future programs if we capture lessons and evolve the standards
How do we get true estimates of the life-cycle savings for good design?
Thoughts on future features for Earth remote sensing products

Need to more fully integrate product components with HDF features

Formalize the organization of metadata items which establish the data structure
– Need mechanism to associate arrays by their independent variables

Formalize the organization of metadata items which establish the data meaning
– XML is a potential mechanism – can it be well integrated?
– Work needed to understand the advantages and disadvantages
– Climate and Forecast (CF) sets a benchmark

Need a mechanism to encapsulate files in native format
– Case in which HDF is only used to provide consistent access

Need more investment in testing before committing to a design
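As one way to picture XML as a mechanism for describing data structure and meaning, the sketch below reads a small profile and checks a granule's array names against it. The profile schema (the `granuleProfile` and `array` elements and their attributes) is invented for illustration and is not the NPOESS profile format.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile; element and attribute names are invented for illustration.
PROFILE_XML = """
<granuleProfile product="EXAMPLE-SDR">
  <array name="radiance" rank="2" units="W m-2 sr-1"/>
  <array name="quality" rank="2" units="none"/>
</granuleProfile>
"""


def load_profile(xml_text):
    """Return {array name: {"rank": int, "units": str}} from the profile."""
    root = ET.fromstring(xml_text)
    return {
        el.get("name"): {"rank": int(el.get("rank")), "units": el.get("units")}
        for el in root.findall("array")
    }


def missing_arrays(profile, granule_array_names):
    """List profile arrays that a granule does not provide."""
    return sorted(set(profile) - set(granule_array_names))


profile = load_profile(PROFILE_XML)
print(missing_arrays(profile, ["radiance"]))
```

A check of this kind is also a simple form of the testing the slide asks for before committing to a design.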
Primary and Associated Arrays
[Figure: an n-dimensional dependent variable (entity) array organized by index attributes. Labels recoverable from the slide:
– Primary array, e.g., flux, brightness, counts, NDVI
– Associated array(s), e.g., QC, error bars
– 1-dimensional attribute variables (associated independent variables): primary, e.g., UTC time or angle; additional, e.g., IET time, angle, or pressure height
– Multi-dimensional attribute variables: 2-dimensional independent variable array(s), e.g., lat/lon, XYZ, sun alt/az, sat alt/az, or land mask]
Key concept: Index Attributes organize the primary dependent variables, or entities. The same Index Attributes may be used to organize associated independent variables. Associated independent variables may be used singly (almost always), in pairs (frequently), or in larger combinations.
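The key concept can be illustrated with plain nested lists: one set of index values addresses the primary array, its associated arrays, and the independent variables. This is a sketch under assumptions; the axis names `scan` and `pixel`, the variable names, and all values are hypothetical.

```python
from typing import Dict, List, Sequence


def shape_of(nested: Sequence) -> List[int]:
    """Return the shape of a regular, non-empty nested list."""
    shape = []
    while isinstance(nested, (list, tuple)):
        shape.append(len(nested))
        nested = nested[0]
    return shape


# Index attributes (axis names) in order; hypothetical names.
index_attributes = ["scan", "pixel"]

primary = [[280.1, 281.4, 279.9], [282.0, 283.3, 281.7]]   # e.g., brightness
associated = {"qc": [[0, 0, 1], [0, 0, 0]]}                # same indices as primary
independent_1d = {"scan": {"utc_time": [0.0, 1.8]}}        # one value per scan
independent_2d = {"lat": [[10.0, 10.1, 10.2], [10.0, 10.1, 10.2]],
                  "lon": [[20.0, 20.0, 20.0], [20.1, 20.1, 20.1]]}


def check_consistency() -> bool:
    """Verify every associated variable matches the axes it is indexed by."""
    dims = dict(zip(index_attributes, shape_of(primary)))
    ok = all(shape_of(a) == shape_of(primary) for a in associated.values())
    ok = ok and all(shape_of(v) == shape_of(primary) for v in independent_2d.values())
    for axis, variables in independent_1d.items():
        ok = ok and all(len(v) == dims[axis] for v in variables.values())
    return ok


def lookup(scan: int, pixel: int) -> Dict[str, float]:
    """Use one pair of index values to read the entity and its associates."""
    return {
        "value": primary[scan][pixel],
        "qc": associated["qc"][scan][pixel],
        "utc_time": independent_1d["scan"]["utc_time"][scan],
        "lat": independent_2d["lat"][scan][pixel],
        "lon": independent_2d["lon"][scan][pixel],
    }


print(check_consistency())
print(lookup(1, 2))
```

Here `utc_time` is used singly along one axis, while `lat` and `lon` are used as a pair over both axes.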
Issues going forward - style

Issues with assuring access understanding
– How will applications know which metadata is present?
– Need to define a core set with a default approach

Issues with users
– How to make providers and users comfortable with this or any standard
– How to communicate the value of: best practices; careful & flexible design; consistency; beauty of simplicity
– Ease of use as well as ease of creation

Issues with policy
– Helping to meet the letter and intent of the Information Quality Act

Capturing data product design best practices
– Flexibility vs. consistency vs. ease-of-use for a purpose
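One possible reading of "a core set with a default approach" is sketched below: an application resolves metadata against a declared core set, fills defaults where they exist, and reports required items that are absent. The keys and defaults in `CORE_METADATA` are assumptions for illustration, not a proposed standard.

```python
# Hypothetical core metadata set: key -> default (None means "required, no default").
CORE_METADATA = {
    "title": None,
    "start_time": None,
    "units": "unknown",
    "fill_value": -999.0,
}


def resolve_metadata(provided):
    """Return (metadata with defaults applied, list of missing required keys)."""
    resolved = dict(provided)
    missing = []
    for key, default in CORE_METADATA.items():
        if key in resolved:
            continue
        if default is None:
            missing.append(key)
        else:
            resolved[key] = default
    return resolved, missing


meta, missing = resolve_metadata({"title": "Example granule", "units": "K"})
print(meta["fill_value"], missing)
```

With a declared core set, an application does not have to guess which metadata is present: it either finds the item, receives the default, or learns that a required item is missing.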
Issues going forward - features

Issues with tools
– Tools are needed to create, validate, and exploit the data sets
– Understand structure and semantics

Issues with collections
– How to implement file and collection metadata, with appropriate pointers forward and backward
– How to implement quasi-static collection metadata

Issues with HDF
– Processing efficiency (I/O) of compression, of compaction
– Repeated (fixed, not predetermined) metadata items with the same <tag> not handled
– Archival format
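The repeated-tag issue can be illustrated outside of HDF. The sketch below groups XML metadata items by tag so that repeats are kept as a list, then shows one possible flattening into uniquely named attributes. The element names and the numbering scheme are hypothetical, and this is only one workaround among several (a single list-valued attribute would be another).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical metadata fragment in which <inputFile> repeats a variable number of times.
METADATA_XML = """
<granuleMetadata>
  <granuleId>G001</granuleId>
  <inputFile>a.dat</inputFile>
  <inputFile>b.dat</inputFile>
  <inputFile>c.dat</inputFile>
</granuleMetadata>
"""


def collect_items(xml_text):
    """Group element text by tag so repeated tags become lists, not overwrites."""
    items = defaultdict(list)
    for el in ET.fromstring(xml_text):
        items[el.tag].append(el.text)
    return dict(items)


def as_indexed_attributes(items):
    """One possible flattening: give each repeat its own numbered attribute name."""
    flat = {}
    for tag, values in items.items():
        if len(values) == 1:
            flat[tag] = values[0]
        else:
            for i, value in enumerate(values):
                flat[f"{tag}_{i}"] = value
    return flat


items = collect_items(METADATA_XML)
print(as_indexed_attributes(items))
```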
Possible routes: Should there be an HDF-GEO?

Specify a profile for the use of HDF in Earth science applications:

Generalized point (list), swath (sensor coordinates), grid (georeferenced), abstract (raw), and encapsulated (native) profiles.

Generalized approach to associating georeferencing information with observed information.

Generalized approach to incorporating associated variables with the mission data

Generalized approach to ‘stride’

Preferred core metadata to assure human and machine readability

Identification metadata in UserBlock

Map appropriate metadata items from HDF native features (e.g., array rank and axis sizes)

Preferred approach to data object associations: arrays-of-structs or structs-of-arrays?

Design guidelines or strict standardization?
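The arrays-of-structs versus structs-of-arrays question can be made concrete with a small sketch. It uses plain Python lists and dictionaries rather than HDF compound datatypes, and the field names and values are hypothetical.

```python
# Arrays-of-structs: one record per observation.
aos = [
    {"lat": 10.0, "lon": 20.0, "radiance": 1.5},
    {"lat": 10.1, "lon": 20.1, "radiance": 1.7},
    {"lat": 10.2, "lon": 20.2, "radiance": 1.6},
]


def to_soa(records):
    """Convert arrays-of-structs into structs-of-arrays (one list per field)."""
    return {key: [rec[key] for rec in records] for key in records[0]}


def to_aos(columns):
    """Convert structs-of-arrays back into arrays-of-structs."""
    keys = list(columns)
    return [dict(zip(keys, row)) for row in zip(*(columns[k] for k in keys))]


soa = to_soa(aos)
print(soa["radiance"])   # whole-field access is direct in structs-of-arrays
print(to_aos(soa)[1])    # whole-record access is direct in arrays-of-structs
```

The two layouts carry the same information; the choice affects which access pattern is direct, which is why a profile would need to state a preference.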
Questions? Discussion?