Strategies for Adding EML Support to the GCE Data Toolbox for Matlab
Wade Sheldon
Georgia Coastal Ecosystems LTER
(WWW: gce-lter.marsci.uga.edu/lter)
Background
- Needed universal solution for processing tabular data sets (majority of IM work)
- Goals:
  - Import from various data sources
  - Standardize units, date formats, attribute names
  - Assign metadata descriptors
  - Validate/QAQC
  - Generate statistical summaries, plots, maps
  - Export to various data/metadata formats
  - Support sub-setting & queries, super-setting (unions/joins)
  - Support automation of all steps
  - Automatically capture metadata throughout interactive processing
Background
- Developed Matlab data structure specification for storing data table tightly coupled with metadata
- Developed ‘Toolbox’ (function library) for working with data structures
- Many roles in GCE IS:
  - Primary tool for acquisition, QAQC of data from monitoring network, PI submissions
  - Data/metadata packaging (linked to RDMS)
  - Data distribution (flexible formats)
  - New Role: Automated harvesting/processing/QC/web posting of remote data stores (USGS, NOAA) and post-processing of CSI arrays downloaded via modem
  - Began public distribution of toolbox in 2002 (primarily for end-user analysis of GCE data)
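
As a rough illustration of a data table "tightly coupled with metadata", the sketch below keeps attribute descriptors next to the columns and logs each processing step. It is written in Python rather than Matlab and is not the toolbox's actual structure specification: the class names, field names, and sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Attribute:
    """Attribute-level metadata for one column (field names are hypothetical)."""
    name: str
    units: str
    description: str


@dataclass
class DataSet:
    """A table whose columns travel together with their metadata."""
    attributes: List[Attribute]
    columns: List[List[Any]]  # one list of values per attribute
    history: List[str] = field(default_factory=list)

    def subset(self, names: List[str]) -> "DataSet":
        """Return a new DataSet with only the named columns, in the order given."""
        index = {a.name: i for i, a in enumerate(self.attributes)}
        keep = [index[n] for n in names]  # raises KeyError for an unknown name
        return DataSet(
            attributes=[self.attributes[i] for i in keep],
            columns=[list(self.columns[i]) for i in keep],
            # The processing step is recorded alongside the data it produced.
            history=self.history + [f"subset columns: {', '.join(names)}"],
        )


# Made-up sample values, for illustration only.
ds = DataSet(
    attributes=[
        Attribute("Date", "YYYY-MM-DD", "Observation date"),
        Attribute("Salinity", "PSU", "Water salinity"),
        Attribute("Temp", "degC", "Water temperature"),
    ],
    columns=[
        ["2002-06-01", "2002-06-02"],
        [28.1, 27.9],
        [26.4, 26.8],
    ],
)

sub = ds.subset(["Date", "Temp"])
print([a.name for a in sub.attributes], sub.history)
```

Because the subset operation rebuilds the attribute list and the columns together, the metadata cannot drift out of step with the table, and the history entry is one simple way to capture metadata automatically during processing.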
Toolbox Metadata Standard
- Full implementation of FLED (+ user-extensible content)
  - Attribute-level metadata managed with data
  - General documentation descriptors stored in simple array format (Category, Field, Value) – designed for pre-formatted metadata, but parseable/updateable
- Simple user-editable style definition tables used to produce formatted ASCII metadata
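
The (Category, Field, Value) array and the style tables can be pictured with a minimal Python sketch. The categories, fields, values, headings, and templates below are invented for illustration and do not reproduce the toolbox's real metadata content or style definitions.

```python
# Documentation descriptors as (Category, Field, Value) rows.
# All categories, fields, and values here are made up for illustration.
metadata = [
    ("Dataset", "Title", "Example salinity survey"),
    ("Dataset", "Abstract", "Placeholder abstract text"),
    ("Site", "Name", "Example marsh site"),
]

# A hypothetical style table: one entry per category giving a heading
# line and a template applied to every field in that category.
style = {
    "Dataset": ("I. Data Set Descriptors", "   {field}: {value}"),
    "Site": ("II. Research Site", "   {field}: {value}"),
}


def format_metadata(rows, style):
    """Render rows as ASCII text, grouped by category in first-seen order."""
    grouped = {}
    for category, fld, value in rows:
        grouped.setdefault(category, []).append((fld, value))
    lines = []
    for category, fields in grouped.items():
        heading, template = style[category]
        lines.append(heading)
        lines.extend(template.format(field=f, value=v) for f, v in fields)
    return "\n".join(lines)


def update_value(rows, category, fld, new_value):
    """Return a copy of rows with one (category, field) value replaced."""
    return [
        (c, f, new_value if (c, f) == (category, fld) else v)
        for c, f, v in rows
    ]


updated = update_value(metadata, "Site", "Name", "Renamed site")
text = format_metadata(updated, style)
print(text)
```

Keeping the content rows separate from the style table means the same descriptors can be re-rendered in another layout by editing only the style entries, which is the property the slide attributes to user-editable style definitions.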
EML Differences
- Higher granularity
- Hierarchical structure (vs flatter 3-tier)
- Different delineation of semantic/numerical attribute descriptors (much overlap, but different philosophy)
- New unit dictionary requirements for validation contrary to units/unit conversion conventions (at odds with non-IM end-user focus of toolbox)
- XML-based (requires extra steps for presentation)
Strategy
- Short term: develop XSLT to convert EML (primarily dataset, entity, attribute) to ASCII headers for importing metadata along with data
- Medium term: switch to EML-oriented metadata schema (e.g. use similar arrays, but support direct EML schema mapping by using XPath syntax for category/field info)
- Long term: add support for direct caching of EML docs, include native XML routines for syncing metadata during processing (requires that more users adopt latest Matlab version - R13)
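
The medium-term idea of keying metadata rows by an XPath-like path can be sketched as follows. This is an assumption-laden Python illustration, not the toolbox's design: the paths imitate EML element names but are not validated against the EML schema, the root tag is simplified to a bare `eml`, and the values are placeholders.

```python
import xml.etree.ElementTree as ET

# Metadata rows keyed by a slash-separated path instead of Category/Field.
# Paths and values are illustrative only and are not schema-checked.
rows = [
    ("dataset/title", "Example salinity survey"),
    ("dataset/creator/individualName/surName", "Example"),
    ("dataset/dataTable/entityName", "example_table"),
]


def rows_to_xml(rows, root_tag="eml"):
    """Build an element tree by creating each path step on demand."""
    root = ET.Element(root_tag)
    for path, value in rows:
        node = root
        for step in path.split("/"):
            child = node.find(step)  # reuse an existing child if present
            if child is None:
                child = ET.SubElement(node, step)
            node = child
        node.text = value
    return root


def xml_to_rows(root):
    """Flatten leaf elements back into (path, value) rows."""
    found = []

    def walk(node, prefix):
        children = list(node)
        if not children:
            found.append((prefix, node.text or ""))
        for child in children:
            walk(child, f"{prefix}/{child.tag}")

    for child in root:
        walk(child, child.tag)
    return found


doc = rows_to_xml(rows)
print(doc.findtext("dataset/title"))
print(xml_to_rows(doc) == rows)
```

The flat rows keep the familiar array layout while the path carries the hierarchy, so the same table can be written out as XML and read back. A real mapping would also need to handle repeated sibling elements (for example several creators), namespaces, and attributes, which this sketch ignores.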
Significance
- Allow IM community to take full advantage of these tools/capabilities for their own site’s data with minimal remastering (EML + ASCII/Matlab table)
- Allow LTER IM community to showcase research-oriented, metadata-driven tools to bolster support for EML efforts immediately
- If full EML support is achieved, could become a useful mechanism for automatically producing EML-documented/validated data sets (datalogging -> harvest -> process -> QC -> dataset+EML -> validation)