Strategies for Adding EML Support to the GCE Data Toolbox for Matlab
Wade Sheldon
Georgia Coastal Ecosystems LTER
(WWW: gce-lter.marsci.uga.edu/lter)
Background
Needed universal solution for processing
tabular data sets (majority of IM work)
Goals:
Import from various data sources
Standardize units, date formats, attribute names
Assign metadata descriptors
Validate/QAQC
Generate statistical summaries, plots, maps
Export to various data/metadata formats
Support sub-setting & queries, super-setting (unions/joins)
Support automation of all steps
Automatically capture metadata throughout interactive processing
Background
Developed Matlab data structure specification for storing
data table tightly coupled with metadata
Developed ‘Toolbox’ (function library) for working with
data structures
Many roles in GCE IS:
Primary tool for acquisition, QAQC of data from monitoring
network, PI submissions
Data/metadata packaging (linked to RDMS)
Data distribution (flexible formats)
New Role: Automated harvesting/processing/QC/web posting of
remote data stores (USGS, NOAA) and post-processing of CSI
arrays downloaded via modem
Began public distribution of toolbox in 2002 (primarily for
end-user analysis of GCE data)
Toolbox Metadata Standard
Full implementation of FLED (+ user-extensible content)
Attribute-level metadata managed with data
General documentation descriptors stored in simple
array format (Category, Field, Value) – designed for
pre-formatted metadata, but parseable/updateable
Simple user-editable style definition tables
used to produce formatted ASCII metadata
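
The (Category, Field, Value) layout and the style-table idea can be illustrated with a minimal sketch. This is written in Python purely for illustration and is not the toolbox's Matlab code; the function names (`lookup`, `update`, `format_ascii`), the style-table layout, and all sample values are assumptions made for this example.

```python
# Hypothetical metadata rows in (Category, Field, Value) form.
# The values are placeholders, not taken from any GCE data set.
metadata = [
    ("Dataset", "Title", "Example tabular data set"),
    ("Dataset", "Abstract", "Placeholder abstract text"),
    ("Site", "Name", "Example site"),
]

# Hypothetical style table: which fields to print, in what order,
# and the label to print in front of each value.
style = [
    ("Dataset", "Title", "Title"),
    ("Site", "Name", "Site"),
]


def lookup(rows, category, field):
    """Return the value stored for (category, field), or '' if absent."""
    for cat, fld, val in rows:
        if cat == category and fld == field:
            return val
    return ""


def update(rows, category, field, value):
    """Return a new list with the matching row replaced in place,
    or with the new row appended if no match exists."""
    new_row = (category, field, value)
    out = []
    found = False
    for row in rows:
        if row[:2] == (category, field):
            out.append(new_row)
            found = True
        else:
            out.append(row)
    if not found:
        out.append(new_row)
    return out


def format_ascii(rows, style_table):
    """Render the fields selected by the style table as 'Label: value' lines."""
    return "\n".join(
        f"{label}: {lookup(rows, cat, fld)}" for cat, fld, label in style_table
    )


header = format_ascii(metadata, style)
print(header)
```

Because the rows stay a flat list, the same structure can hold pre-formatted text yet still be parsed and updated field by field, which is the property the slide describes.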
EML Differences
Higher granularity
Hierarchical structure (vs flatter 3-tier)
Different delineation of semantic/numerical
attribute descriptors (much overlap, but different
philosophy)
New unit dictionary requirements for validation
contrary to units/unit conversion conventions (at
odds with non-IM end-user focus of toolbox)
XML-based (requires extra steps for presentation)
Strategy
Short term: develop XSLT to convert EML
(primarily dataset, entity, attribute) to ASCII
headers for importing metadata along with data
Medium term: switch to EML-oriented metadata
schema (e.g. use similar arrays, but support direct
EML schema mapping by using XPath syntax for
category/field info)
Long term: add support for direct caching of EML
docs, include native XML routines for syncing
metadata during processing (requires more users
to adopt the latest Matlab version, R13)
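
The mapping idea behind the short- and medium-term steps can be sketched as follows. This is a rough illustration in Python, not XSLT and not toolbox code: it uses path lookups from the standard library in place of a stylesheet. The XML fragment is a simplified, EML-like stand-in (real EML documents are namespaced and nest units more deeply), and `PATH_MAP`, `ATTRIBUTE_PATH`, `eml_to_rows`, and `attribute_headers` are hypothetical names introduced for this example.

```python
import xml.etree.ElementTree as ET

# Simplified, EML-like fragment (assumption: no namespaces, flattened units).
EML_LIKE = """<eml>
  <dataset>
    <title>Example tabular data set</title>
    <dataTable>
      <entityName>example_table</entityName>
      <attributeList>
        <attribute><attributeName>Date</attributeName><unit>none</unit></attribute>
        <attribute><attributeName>Salinity</attributeName><unit>PSU</unit></attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml>"""

# Hypothetical mapping from (Category, Field) to a path relative to the root.
PATH_MAP = [
    ("Dataset", "Title", "dataset/title"),
    ("Entity", "Name", "dataset/dataTable/entityName"),
]
ATTRIBUTE_PATH = "dataset/dataTable/attributeList/attribute"


def eml_to_rows(xml_text):
    """Fill (Category, Field, Value) rows by resolving each mapped path."""
    root = ET.fromstring(xml_text)
    return [
        (cat, fld, root.findtext(path, default=""))
        for cat, fld, path in PATH_MAP
    ]


def attribute_headers(xml_text):
    """Build ASCII header lines listing column names and units."""
    root = ET.fromstring(xml_text)
    attrs = root.findall(ATTRIBUTE_PATH)
    names = [a.findtext("attributeName", default="") for a in attrs]
    units = [a.findtext("unit", default="") for a in attrs]
    return "name: " + ", ".join(names), "units: " + ", ".join(units)


rows = eml_to_rows(EML_LIKE)
name_line, units_line = attribute_headers(EML_LIKE)
print(rows)
print(name_line)
print(units_line)
```

Keeping the path next to each category/field pair means the flat array can be filled from, or written back to, a hierarchical document without changing the array format itself.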
Significance
Allow IM community to take full advantage of these
tools/capabilities for their own site’s data with minimal remastering (EML + ASCII/Matlab table)
Allow LTER IM community to showcase research-oriented, metadata-driven tools to bolster support for EML
efforts immediately
If full EML support is achieved, could become a useful
mechanism for automatically producing EML-documented/validated data sets (datalogging -> harvest ->
process -> QC -> dataset+EML -> validation)