North American initiatives
in Ecoinformatics:
VegBank and SEEK
Robert K. Peet
and
The Ecological Society of America Vegetation Panel
The SEEK development team
Case Studies
Mean Species Richness
                      Upland          Riparian
                      (1090 plots)    (121 plots)
Native                31.12           55.66
Exotic                0.20            7.98
Plots with exotics    268             110
Kruskal-Wallis:
Native Richness: Chisq = 353.2, df = 1, P < 0.0001
Exotic Richness: Chisq = 127.7, df = 1, P < 0.0001
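The reported P values can be checked by hand, because a chi-square statistic with one degree of freedom has a closed-form upper tail. A minimal sketch, using only the two statistics quoted above; the function name is illustrative and is not part of the original analysis:

```python
import math


def chi2_upper_tail_df1(chisq: float) -> float:
    """Upper-tail probability for a chi-square statistic with df = 1.

    With one degree of freedom the statistic is a squared standard normal,
    so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chisq / 2.0))


# Statistics as reported on the slide.
for label, chisq in [("Native Richness", 353.2), ("Exotic Richness", 127.7)]:
    p = chi2_upper_tail_df1(chisq)
    print(f"{label}: Chisq = {chisq}, df = 1, P < 0.0001 is {p < 0.0001}")
```

Both tail probabilities fall far below the 0.0001 threshold quoted on the slide.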
Type A: “plots that keep on giving”
• Unusual: supply-side driven, influenced little by competition
• Little spatial structure; disturbance important
Mountain riparian communities
[Figure: Log Richness (0.0 to 2.0) plotted against Log Area (m2) (-2 to 3), with curves labeled A, B, C, and D. High/Low annotations mark two gradients: "(↑ species pool)" and "(↓ competitive exclusion)".]
Traditional Community Ecology
The questions:
• How are communities structured?
• How do taxa interact?
The solutions:
• Simple observations.
• Simple experiments.
The scale:
• Stand or landscape.
Major data types
• Site data: climate, soils, topography, etc.
• Taxon attribute data: identification,
phylogeny, distribution, life-history,
functional attributes, etc.
• Occurrence data: attributes of
individuals (e.g., size, age, growth rate) and
taxa (e.g., cover, biomass) that co-occur at
a site.
Ecoinformatics?
Massive plot data have the potential to create new
disciplines and allow critical syntheses.
• Theoretical community ecology. Which taxa occur
together, and where, and following what rules?
• Remote sensing. What is really on the ground?
• Monitoring. What changes are really taking
place in the vegetation?
• Restoration. What should be our restoration targets?
• Vegetation & species modeling. Where should
we expect species & communities to occur after
environmental changes?
Conclusions?
• Standard data structures
• Standards for data exchange
• Public data archives (functions for
deposit, discovery, withdrawal,
citation, annotation)
• Standards for data archiving
• Standards for reference to
taxonomic data
• Standard software tools
Background
The ESA Vegetation Classification
Panel was established in 1993 with a
mandate to support the emerging
U.S. Vegetation Classification.
I am pleased to acknowledge the
support and cooperation of:
Ecological Society of America
National Center for Ecological
Analysis and Synthesis
Federal Geographic Data Committee
National Biological Information Infrastructure
Gap Analysis Program
National Science Foundation
Guidelines for Vegetation
Classification
The ESA Vegetation Panel and its partners have
collaborated to develop guidelines for the
floristic levels of the classification covering:
• Requirements for vegetation field plots.
• Documentation & description of floristic
types.
• Submission & peer review of proposed types.
• Management, citation, & archiving of
vegetation data.
Overview of online resources
• vegbank.org: Stores plots and makes them publicly accessible.
• TBA: Allows people to change and update NVC and plants.
• natureserve.org: Stores current communities in the NVC.
• plants.usda.gov: Stores current plant taxonomy.
VegBank
• The ESA Vegetation Panel is developing a
public archive for vegetation plots known as
VegBank (http://vegbank.org).
• VegBank is expected to function for
vegetation plot data in a manner analogous to
GenBank.
• Primary data will be deposited for reference,
novel synthesis, and reanalysis.
• The database architecture is generalizable to
most types of species co-occurrence data.
http://www.vegbank.org
VegBank data are open access
All data placed in VegBank are available to the
public at no charge (unless the plot contributor
places restrictions to protect location
information for rare and endangered species
or private lands).
Key data can be viewed by a simple web link.
The following link shows information for two
VegBank plots:
http://vegbank.org/get/std/observation/5153,5906
http://vegbank.org/get/std/observation/'VB.Ob.5153.YOSE98M19'
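The two links above suggest a simple pattern: numeric observation identifiers joined by commas, or a single accession code wrapped in single quotes. A minimal sketch that only builds such URL strings; the function names are hypothetical, the pattern is inferred from these two examples alone, and nothing here confirms that the service still answers at these addresses:

```python
BASE_URL = "http://vegbank.org/get/std/observation/"


def url_for_ids(*observation_ids: int) -> str:
    """Build a link for one or more numeric observation identifiers."""
    if not observation_ids:
        raise ValueError("at least one observation id is required")
    return BASE_URL + ",".join(str(i) for i in observation_ids)


def url_for_accession(code: str) -> str:
    """Build a link for a single accession code, quoted as on the slide."""
    return f"{BASE_URL}'{code}'"


print(url_for_ids(5153, 5906))
print(url_for_accession("VB.Ob.5153.YOSE98M19"))
```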
Biodiversity data structure
• Locality
• Observation/Collection Event: Plot/Inventory databases
• Object or specimen: Specimen databases
• BioTaxon: Taxonomic databases
• SynTaxon: Community type databases
Core elements of VegBank
• Project
• Plot
• Plot Observation
• Taxon / Individual Observation
• Taxon Interpretation
• Plot Interpretation
Taxon/community interpretation
• Multiple concepts can be linked
simultaneously
• Degree of fit for each can be
indicated
• Subsequent interpretations
supported.
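The interpretation model described above can be sketched as a small set of record types. This is an illustration only, not the VegBank schema: the class names, field names, fit vocabulary, party names, plot identifier, and dates are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class TaxonInterpretation:
    concept: str   # name plus reference, e.g. "Carya ovata sec. Gleason 1952"
    fit: str       # degree of fit; this vocabulary is hypothetical
    party: str     # who made the interpretation
    made_on: date


@dataclass
class TaxonObservation:
    name_as_recorded: str
    interpretations: List[TaxonInterpretation] = field(default_factory=list)

    def latest(self) -> List[TaxonInterpretation]:
        """Return every interpretation made on the most recent date."""
        if not self.interpretations:
            return []
        newest = max(i.made_on for i in self.interpretations)
        return [i for i in self.interpretations if i.made_on == newest]


@dataclass
class PlotObservation:
    plot_id: str
    observed_on: date
    taxa: List[TaxonObservation] = field(default_factory=list)


hickory = TaxonObservation("Carya ovata")
# Original interpretation by the plot author (hypothetical).
hickory.interpretations.append(TaxonInterpretation(
    "Carya ovata sec. Gleason 1952", "good", "plot author", date(1998, 6, 1)))
# A later reinterpretation links two concepts at once, each with its own fit.
hickory.interpretations.append(TaxonInterpretation(
    "Carya ovata sec. Radford et al. 1968", "good", "reviewer", date(2005, 3, 1)))
hickory.interpretations.append(TaxonInterpretation(
    "Carya carolinae-septentrionalis sec. Radford et al. 1968", "poor",
    "reviewer", date(2005, 3, 1)))

plot = PlotObservation("PLOT-001", date(1998, 6, 1), [hickory])
for interp in plot.taxa[0].latest():
    print(interp.concept, "|", interp.fit)
```

Earlier interpretations stay in the list, so the original reading of the plot remains recoverable after a reinterpretation.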
VegBank Interface Tools
• Desktop client (VegBranch) for data
preparation and local use.
• Flexible XML data import supporting
VegBranch (& TurboVeg) formats.
• Flexible data export.
• Easy web access to central archive.
VegBranch can be used for converting
legacy data, entering data, and maintaining
a local plot database.
Challenges
• Data ownership, intellectual property rights, & confidentiality
• Multiple classifications of organisms and communities
• Multiple plot types (relevés & Hubbell plots)
• Data entry & submission tools
• Perfect archiving
• Plot and taxon interpretation
The Taxonomic database
challenge:
Standardizing organisms and communities
The problem:
Integration of data potentially representing
different times, places, investigators and
taxonomic standards.
The traditional solution:
A standard list of organisms / communities.
Standardized taxon lists fail
to allow dataset integration
The reasons include:
• The user cannot reconstruct the database as
viewed at an arbitrary time in the past,
• Taxonomic concepts are not defined (just lists),
• Multiple party perspectives on taxonomic
concepts and names cannot be supported or
reconciled.
This is the single largest impediment to large-scale
synthesis in ecology.
High-elevation fir trees of
western North America
Distribution         AZ NM                            CO WY MT AB eBC                   wBC WA OR
USDA - ITIS          Abies lasiocarpa var. arizonica  Abies lasiocarpa var. lasiocarpa  Abies lasiocarpa var. lasiocarpa
Flora North America  Abies bifolia                    Abies bifolia                     Abies lasiocarpa
Three concepts of
shagbark hickory
Splitting one species into two illustrates the
ambiguity often associated with scientific names.
If you encounter the name “Carya ovata (Miller) K.
Koch” in a database, you cannot be sure which of
two meanings applies.
• Carya ovata (Miller) K. Koch sec. Gleason 1952
• Carya carolinae-sept. (Ashe) Engler & Graebner sec. Radford et al. 1968
• Carya ovata (Miller) K. Koch sec. Radford et al. 1968
Six shagbark hickory assertions
Possible taxonomic synonyms are listed together
Names
Carya ovata
Carya carolinae-septentrionalis
Carya ovata v. ovata
Carya ovata v. australis
References
Gleason 1952 Britton & Brown
Radford et al. 1968 Flora Carolinas
Stone 1997 Flora North America
Taxon concepts
(One shagbark)
C. ovata sec. Gleason ’52
C. ovata sec. FNA ’97
(Southern shagbark)
C. carolinae-s. sec. Radford ’68
C. ovata v. australis sec. FNA ’97
(Northern shagbark)
C. ovata sec. Radford ’68
C. ovata (v. ovata) sec. FNA ’97
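The six assertions above can be written as name-plus-reference pairs and grouped by name to show where a bare name is ambiguous. A minimal sketch using only the names, references, and groupings listed on this slide; the record type and function name are illustrative:

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Assertion(NamedTuple):
    name: str             # scientific name as published
    reference: str        # the "sec." reference that defines the concept
    circumscription: str  # which group of trees the concept covers


ASSERTIONS: List[Assertion] = [
    Assertion("Carya ovata", "Gleason 1952", "one shagbark"),
    Assertion("Carya ovata", "Stone 1997 (Flora North America)", "one shagbark"),
    Assertion("Carya carolinae-septentrionalis", "Radford et al. 1968", "southern shagbark"),
    Assertion("Carya ovata v. australis", "Stone 1997 (Flora North America)", "southern shagbark"),
    Assertion("Carya ovata", "Radford et al. 1968", "northern shagbark"),
    Assertion("Carya ovata v. ovata", "Stone 1997 (Flora North America)", "northern shagbark"),
]


def circumscriptions_by_name(assertions: List[Assertion]) -> Dict[str, Set[str]]:
    """Collect the distinct circumscriptions attached to each bare name."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for a in assertions:
        grouped[a.name].add(a.circumscription)
    return dict(grouped)


ambiguous = {
    name: groups
    for name, groups in circumscriptions_by_name(ASSERTIONS).items()
    if len(groups) > 1
}
for name, groups in ambiguous.items():
    print(name, "->", sorted(groups))
```

Only the bare name "Carya ovata" maps to more than one circumscription, which is why a name without its reference cannot identify the concept.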
Party Perspective
The Party Perspective on a Concept includes:
• Status – Standard, Nonstandard, Undetermined.
• Correlation with other concepts – Equal, Greater, Lesser, Overlap, Undetermined.
• Start & Stop dates.
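Start and stop dates on each perspective are what make it possible to reconstruct a party's view as of an arbitrary past date. A minimal sketch under stated assumptions: the status and correlation vocabularies come from the slide, while the class layout, the party name, and all dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Status(Enum):
    STANDARD = "Standard"
    NONSTANDARD = "Nonstandard"
    UNDETERMINED = "Undetermined"


class Correlation(Enum):
    EQUAL = "Equal"
    GREATER = "Greater"
    LESSER = "Lesser"
    OVERLAP = "Overlap"
    UNDETERMINED = "Undetermined"


@dataclass(frozen=True)
class Perspective:
    party: str
    concept: str
    status: Status
    start: date
    stop: Optional[date] = None           # None means still in force
    correlation: Optional[Correlation] = None
    other_concept: Optional[str] = None   # concept the correlation refers to

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.stop is None or day < self.stop)


def view_on(perspectives: List[Perspective], party: str, day: date) -> List[Perspective]:
    """Return the party's perspectives in force on the given day."""
    return [p for p in perspectives if p.party == party and p.active_on(day)]


PARTY = "Hypothetical Flora Committee"
GLEASON = "Carya ovata sec. Gleason 1952"
RADFORD = "Carya ovata sec. Radford et al. 1968"
history = [
    Perspective(PARTY, GLEASON, Status.STANDARD, date(1990, 1, 1), date(2000, 1, 1)),
    Perspective(PARTY, GLEASON, Status.NONSTANDARD, date(2000, 1, 1)),
    Perspective(PARTY, RADFORD, Status.STANDARD, date(2000, 1, 1),
                correlation=Correlation.LESSER, other_concept=GLEASON),
]

for day in (date(1995, 7, 1), date(2005, 7, 1)):
    print(day, [(p.concept, p.status.value) for p in view_on(history, PARTY, day)])
```

Closing a perspective with a stop date, rather than overwriting it, keeps the earlier view queryable.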
VegBank is populating USDA
concepts & relationships
• Reference list:
– USDA PLANTS / ITIS
– 1999, 2005
• Standard treatments
– Flora North America (8 volumes)
– Isely – Legumes
– Rollins – Brassicaceae
– Selected treatments
Best practices
• When reporting identity of organisms, provide not
only the full scientific name of each kind of
organism, but also the reference that formed the
basis of the taxonomic concept.
• Reference high-quality sources for taxon concepts,
such as major compendia that provide their own
defined concepts.
• Comprehensive checklists typically lack true
taxonomic circumscriptions, but might be considered
to contain taxonomic concepts sufficient for
documenting organism identity.
• Identifications should include linkage to at least
one concept, but in some cases should be linked to
multiple concepts.
NatureServe provides access to the NVC
and supporting documentation
http://www.natureserve.org/explorer
Simple searches allow information on
communities to be located.
Key descriptive data are available online,
but the classification process is not yet
open to the full scientific community.
Coming soon – direct links to views of
typal and occurrence plots in VegBank
The ESA Panel and VegBank staff are developing
an open peer-review system to allow anyone to
contribute proposed revisions for the NVC.
The results of the peer-review process will be
published in an online journal linked to VegBank.
Concluding remarks
• Much of what we are doing with the US
National Vegetation Classification is common
to the vegetation classification enterprise
worldwide, but much is also novel.
• Public plot archives, initially driven by the
classification enterprise, have the potential
to radically change the development of
ecology and biodiversity management in
general.
Highlights from the
Science Environment for
Ecological Knowledge
(SEEK)
What is SEEK?
Science Environment for Ecological Knowledge
Multidisciplinary project to create:
Scientific-workflow system (Kepler)
– Design, reuse, and execute scientific analyses
Distributed data network (EcoGrid)
– Environmental, ecological, and systematics data
Knowledge Representation (KR) & Semantic Mediation
– Discover, integrate, and compose hard-to-relate data
and services via ontologies
Taxonomic concept services
– Resolve taxon ambiguities
Collaborators (the SEEK team)
• NCEAS, UNM, SDSC/UCSD, U Kansas
• Vermont, Napier, ASU, UNC
Kepler: Scientific Workflows
• Model the way scientists work with their data now
– Mentally coordinate export and import of data among software
systems
• Workflows emphasize data flow
• Metadata-driven data ingestion
• Output generation includes creating appropriate
metadata
[Workflow diagram callouts: Query EcoGrid to find data. Archive output to EcoGrid with workflow metadata.]
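The data-flow idea can be illustrated with a toy pipeline in which each step passes data forward and records itself in the accompanying metadata. This sketch does not use Kepler or its API; every name and value in it is hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Token:
    """Data moving between steps, together with the metadata describing it."""
    data: List[float]
    metadata: Dict[str, object] = field(default_factory=dict)


def _with_step(tok: Token, step_name: str) -> Dict[str, object]:
    """Copy the metadata and append the step name to the recorded history."""
    steps = list(tok.metadata.get("steps", []))
    return dict(tok.metadata, steps=steps + [step_name])


def log10_step(tok: Token) -> Token:
    return Token([math.log10(x) for x in tok.data], _with_step(tok, "log10"))


def mean_step(tok: Token) -> Token:
    return Token([sum(tok.data) / len(tok.data)], _with_step(tok, "mean"))


def run_workflow(source: Token, steps: List[Callable[[Token], Token]]) -> Token:
    tok = source
    for step in steps:
        tok = step(tok)
    return tok


# Made-up input values, standing in for data found by a query.
source = Token([10.0, 100.0, 1000.0], {"source": "hypothetical richness values"})
result = run_workflow(source, [log10_step, mean_step])
print(result.data, result.metadata["steps"])
```

The output carries the list of steps that produced it, which is the kind of workflow metadata that would accompany an archived result.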
SEEK EcoGrid
• Goal: allow diverse environmental data systems to
interoperate
– Hides complexity of underlying systems using lightweight
interfaces
– Integrate diverse data networks from ecology, biodiversity, and
environmental sciences
• Data systems
– Any system can implement these interfaces
– Prototyping using:
• Metacat, DiGIR, etc.
• Supports multiple metadata
standards
– EML, Darwin Core as foci
EcoGrid client interactions
• Modes of interaction
– Client-server
– Fully distributed
– Peer-to-peer
• EcoGrid Registry
– Node discovery
– Service discovery
• Aggregation services
– Centralized access
– Reliability
– Data preservation
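One way to picture a lightweight interface plus a registry is a single small protocol that any data system can implement, with a registry that discovers nodes and fans a query out to them. This is a sketch only, not the EcoGrid interface: the class names, method names, node names, and records are all hypothetical.

```python
from typing import Dict, List, Protocol


class GridNode(Protocol):
    """The one interface every participating data system implements."""
    name: str

    def query(self, keyword: str) -> List[str]:
        ...


class InMemoryNode:
    """Stand-in for a real data system; holds identifier -> title records."""

    def __init__(self, name: str, records: Dict[str, str]) -> None:
        self.name = name
        self._records = records

    def query(self, keyword: str) -> List[str]:
        kw = keyword.lower()
        return sorted(i for i, title in self._records.items() if kw in title.lower())


class Registry:
    def __init__(self) -> None:
        self._nodes: Dict[str, GridNode] = {}

    def register(self, node: GridNode) -> None:
        self._nodes[node.name] = node

    def discover(self) -> List[str]:
        """Node discovery: list the names of registered nodes."""
        return sorted(self._nodes)

    def query_all(self, keyword: str) -> Dict[str, List[str]]:
        """Aggregation: send one query to every node and collect the hits."""
        return {name: node.query(keyword) for name, node in sorted(self._nodes.items())}


registry = Registry()
registry.register(InMemoryNode("metadata-node", {
    "ds.1": "Riparian vegetation plots", "ds.2": "Stream chemistry"}))
registry.register(InMemoryNode("specimen-node", {
    "sp.9": "Herbarium specimens, riparian flora"}))

print(registry.discover())
print(registry.query_all("riparian"))
```

Because the client sees only the shared interface, the underlying systems can differ freely in how they store and index their records.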
Knowledge Representation
Current Ontologies
– Ecological Concepts, Models, Networks
– Measurements
– Properties
– Statistical Analyses
– Time and Space
– Taxonomic Identifiers
– Units
– Symbiosis
Recent Developments
– Biodiversity (measured traits, computation of traits)
– Descriptive Terminology for Plant Communities
– Ontology documentation
…
Data Procurement
Ontologies
[Figure: ontology diagram. Classes shown include Observation, ObservableItem, EcoProperty, EcoEntity, Abundance, SpatialContext, GeoSpatialRegion, GeoCoordPoint, LatLonPoint, UTMPoint, TaxonID, SciName, Parasite, Host, InquilinismParasite, and InquilinismHost. Labeled links include context, item, property, value, location, region, isa, parasiteOf, and inquilinismOf. Attributes include genus, species, latDeg, lonDeg, UTMx, UTMy, and zone, with xsd:string, xsd:float, and xsd:int datatypes. Legend entries: prop, role, class. SWDB also appears as a label.]
Aug 29, 2004
• Expressed in OWL, shown here graphically
• This is a “simple” OWL ontology (in terms of formulas)
SEEK AHM 2004
November, 2004
Results …
“Find all datasets that contain
abundance measurements of
‘Manica bradleyi’ inter-ant
parasites observed within
California”
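A query like this one depends on the ontology to recognize that a more specific relation still satisfies a request for a general one. A toy sketch of that idea: the sub-relation link from inquilinismOf to parasiteOf is an assumption loosely based on the diagram labels, and the data set annotations are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed sub-relation links (child -> parent); not taken from the SEEK ontologies.
SUBRELATION_OF: Dict[str, Optional[str]] = {
    "inquilinismOf": "parasiteOf",
    "parasiteOf": None,
}


def satisfies(relation: str, requested: str) -> bool:
    """True if `relation` equals `requested` or specializes it."""
    current: Optional[str] = relation
    while current is not None:
        if current == requested:
            return True
        current = SUBRELATION_OF.get(current)
    return False


@dataclass(frozen=True)
class Annotation:
    dataset: str
    measured_property: str
    relation: str
    host: str
    region: str


def find_datasets(annotations: List[Annotation], measured_property: str,
                  relation: str, host: str, region: str) -> List[str]:
    return sorted({
        a.dataset for a in annotations
        if a.measured_property == measured_property
        and satisfies(a.relation, relation)
        and a.host == host
        and a.region == region
    })


# Hypothetical annotations for three data sets.
ANNOTATIONS = [
    Annotation("ds-A", "Abundance", "inquilinismOf", "Manica bradleyi", "California"),
    Annotation("ds-B", "Abundance", "parasiteOf", "Manica bradleyi", "Nevada"),
    Annotation("ds-C", "Biomass", "parasiteOf", "Manica bradleyi", "California"),
]

print(find_datasets(ANNOTATIONS, "Abundance", "parasiteOf", "Manica bradleyi", "California"))
```

A plain keyword match on "parasiteOf" would miss ds-A; walking the sub-relation chain is what lets it match.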
SEEK High-Level Approach
[Diagram: flow of a taxon-based search for data sets]
• Input: user’s taxonomic concept + quality measure.
• Semantic Mediation System: concept matching/expansion/…, producing weighted concepts.
• Output: return list of data sets.
• Name/Concept Repository: uses the Taxonomy transfer schema (TML); fed by taxonomic concept providers:
– Concept Provider 1, e.g. Fishbase
– Concept Provider 2, e.g. ITIS
– Concept Provider 3, e.g. Prometheus
• EML repository: Ecological metadata language (EML) documents containing the collector’s taxonomic concept(s) as taxon coverage; fed by ecological data set providers (ecological data sets).
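The approach above, in which a user's concept is expanded into weighted related concepts and then matched against taxon coverage in metadata, can be sketched in a few lines. Everything here is illustrative: the weights, the threshold standing in for the user's quality measure, and the data set names are hypothetical, and this is not the SEEK mediation algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical match weights between concepts, in the range 0 to 1.
RELATED: Dict[str, Dict[str, float]] = {
    "Carya ovata sec. Gleason 1952": {
        "Carya ovata sec. Radford et al. 1968": 0.6,
        "Carya carolinae-septentrionalis sec. Radford et al. 1968": 0.4,
    },
}

# Hypothetical metadata index: data set -> concepts in its taxon coverage.
COVERAGE: Dict[str, List[str]] = {
    "plots-1952": ["Carya ovata sec. Gleason 1952"],
    "plots-north": ["Carya ovata sec. Radford et al. 1968"],
    "plots-south": ["Carya carolinae-septentrionalis sec. Radford et al. 1968"],
    "plots-oak": ["Quercus alba sec. Radford et al. 1968"],
}


def expand(concept: str, min_weight: float) -> Dict[str, float]:
    """Return the concept itself (weight 1.0) plus related concepts above the threshold."""
    weights = {concept: 1.0}
    for other, weight in RELATED.get(concept, {}).items():
        if weight >= min_weight:
            weights[other] = weight
    return weights


def rank_datasets(concept: str, min_weight: float) -> List[Tuple[str, float]]:
    """Score each data set by its best-matching concept; best matches first."""
    weights = expand(concept, min_weight)
    scored = []
    for dataset, concepts in COVERAGE.items():
        best = max((weights[c] for c in concepts if c in weights), default=0.0)
        if best > 0.0:
            scored.append((dataset, best))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


print(rank_datasets("Carya ovata sec. Gleason 1952", min_weight=0.5))
```

Raising the threshold narrows the result toward exact concept matches; lowering it trades precision for recall.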
Acknowledgements
This material is based upon work supported by:
The National Science Foundation under Grant Numbers 9980154,
9904777, 0131178, 9905838, 0129792, and 0225676.
Collaborators: NCEAS (UC Santa Barbara), University of New Mexico
(Long Term Ecological Research Network Office), San Diego
Supercomputer Center, University of Kansas (Center for Biodiversity
Research), University of Vermont, University of North Carolina, Napier
University, Arizona State University, UC Davis
The National Center for Ecological Analysis and Synthesis, a Center
funded by NSF (Grant Number 0072909), the University of California,
and the UC Santa Barbara campus.
The Andrew W. Mellon Foundation.
Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON