Using Specimen Data in Scientific Workflow Environments to Connect to Metadata Archive and Discovery Services in Environmental Biology
CJ Grady, J.H. Beach, A. Stewart, J. Cavner
University of Kansas Biodiversity Institute
Geospatial Metadata
• Describes
– What it is
– What it looks like
– Who assembled it
– When it was collected
– Etc.
1960 - 1990
EML
• Ecological Metadata Language
– XML Schema
– Open Source
– Community Driven
– Describes ecological data
• Occurrence Data
• Climate Layers
• Species Ranges
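As a rough illustration of what an EML description of occurrence data might look like, here is a minimal sketch using only the Python standard library. The namespace URI assumes EML 2.1.1, and the package id, title, creator, and abstract are hypothetical placeholders. The output is not schema-valid EML (for example, it omits the required contact element); it only shows the general shape.

```python
import xml.etree.ElementTree as ET

# Namespace URI for EML 2.1.1; the version is an assumption for this sketch.
EML_NS = "eml://ecoinformatics.org/eml-2.1.1"


def build_eml(package_id, title, creator_surname, abstract):
    """Build a minimal, illustrative (not schema-valid) EML document."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(
        f"{{{EML_NS}}}eml",
        {"packageId": package_id, "system": "example"},
    )
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "surName").text = creator_surname
    abstract_el = ET.SubElement(dataset, "abstract")
    ET.SubElement(abstract_el, "para").text = abstract
    return root


# All values below are hypothetical placeholders.
doc = build_eml(
    package_id="example.1.1",
    title="Example occurrence data set",
    creator_surname="Example",
    abstract="Occurrence points for an example species.",
)
print(ET.tostring(doc, encoding="unicode"))
```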
Narratives
• Transformation of metadata into a story that is
appropriate for the intended audience
• The same metadata can be used to create
narratives for:
– Scientists
– Undergrads
– K-12 students
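One way to picture the narrative idea is a set of audience-specific templates filled from a single metadata record. This is only a sketch: the field names, template wording, and values are hypothetical and are not taken from the Lifemapper services.

```python
# Hypothetical metadata record; field names and values are illustrative only.
metadata = {
    "species": "Example species",
    "n_points": 120,
    "algorithm": "example algorithm",
}

# One template per audience, all filled from the same metadata.
TEMPLATES = {
    "scientist": (
        "A distribution model for {species} was fitted with {algorithm} "
        "using {n_points} occurrence points."
    ),
    "undergrad": (
        "Researchers used {n_points} recorded sightings of {species} "
        "to predict where else it could live."
    ),
    "k12": "Scientists made a map of places where {species} might be found.",
}


def narrate(meta, audience):
    """Render the narrative for one audience from the shared metadata."""
    return TEMPLATES[audience].format(**meta)


for audience in TEMPLATES:
    print(f"{audience}: {narrate(metadata, audience)}")
```

The metadata stays the same for every audience; only the template changes.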
Narrative Example
• DataONE
– Distributed system for:
• Queries
• Data replication
– Initially supports EML
Study of Experiment Reproducibility
Ellison, Aaron. 2010. Repeatability and transparency in ecological research.
Ecology 91: 2536-2539.
There is a Solution!
Process Metadata
• Data about the process used
• Descriptive and prescriptive
• Documents process used to generate data /
metadata
– Quality control
– SDM experiments
– RAD experiments
Capabilities
• Reproducibility
– Actions are documented
• Transparency
– Experiments can be evaluated and validated
• Publishing
– Metadata can be published along with results
What we have done
• EML for all of our Species Distribution
Modeling services
• Simple process metadata
– Documents how an experiment is run through our
cluster, including which software versions are used
– Also describes which web services would be called
to execute the experiment again
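A simple process metadata record of this kind could be sketched as follows. The element names, software names, versions, and URLs are all hypothetical; this is not the actual Lifemapper schema. It only shows the idea of recording each step, the software version behind it, and the web service that would run it again.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ProcessStep:
    description: str
    software: str
    version: str
    service_url: str  # web service that would be called to repeat this step


def to_xml(steps):
    """Serialize ordered process steps into an illustrative XML record."""
    root = ET.Element("processMetadata")
    for order, step in enumerate(steps, start=1):
        el = ET.SubElement(root, "step", {"order": str(order)})
        ET.SubElement(el, "description").text = step.description
        ET.SubElement(el, "software", {"version": step.version}).text = step.software
        ET.SubElement(el, "service").text = step.service_url
    return root


# Hypothetical steps, software names, versions, and URLs.
steps = [
    ProcessStep("Build model", "example-modeler", "1.0",
                "http://example.org/services/models"),
    ProcessStep("Project model", "example-projector", "2.3",
                "http://example.org/services/projections"),
]
record = to_xml(steps)
print(ET.tostring(record, encoding="unicode"))
```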
What we have done
• Clients
– Python library
– VisTrails
– QGIS
What we are doing
• Publishing EML to a repository
• Client extensions
• Extending process metadata
– HTTP message
– XPath
Process Metadata Extensions
• HTTP Message
– Documents any web resource call over HTTP
• XPath processing
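The two extensions might fit together roughly as below: one element records an HTTP call, and an XPath expression pulls a value out of the XML response. This is a sketch under stated assumptions. The element names, URL, and canned response are hypothetical, no network request is made, and the standard library's ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET


def describe_http_call(method, url, headers=None, body=None):
    """Record an HTTP call as an illustrative XML element (nothing is sent)."""
    msg = ET.Element("httpMessage", {"method": method, "url": url})
    for name, value in (headers or {}).items():
        ET.SubElement(msg, "header", {"name": name}).text = value
    if body is not None:
        ET.SubElement(msg, "body").text = body
    return msg


def apply_xpath(xml_text, path):
    """Return the text of every element matching path in an XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(path)]


# Hypothetical call and canned response standing in for a real service.
msg = describe_http_call(
    "GET",
    "http://example.org/services/experiments/1",
    headers={"Accept": "application/xml"},
)
response = "<experiment><model><status>complete</status></model></experiment>"
statuses = apply_xpath(response, "./model/status")

print(ET.tostring(msg, encoding="unicode"))
print(statuses)
```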
What we will do next
• Use standard APIs to communicate with
DataONE
• Continue to search for standard process
metadata and include it whenever possible
• Contribute process metadata extensions back
to the community
• Add conditional analysis elements
to the schema (JSON, etc.)
Reproducibility
• Simple process metadata
• EML process metadata extensions
• Lifemapper client EML reader
Transparency
• EML for all service objects
• Descriptive process metadata
Publishing Aid
• Client access to public data / metadata
catalogs
• Publish buttons
Lessons Learned
• Had success starting with narrow, specific
process steps and generalizing them
– Calls to our web services expanded to any HTTP
call
• Easy to get carried away with all of the
possibilities
Lifemapper funded by:
U.S. National Science Foundation
NSF EPSCoR 0553722
NSF EPSCoR 0919443
EHR/DRL 0918590
BIO/DBI 0851290
OCI/CI-TEAM 0753336
http://www.lifemapper.org