GLOBAL BIODIVERSITY INFORMATION FACILITY Developing taxonomic names services to enhance findability David Remsen ECAT Programme Officer October 22, 2007 WWW.GBIF.ORG.
GLOBAL
BIODIVERSITY
INFORMATION
FACILITY
Developing taxonomic names services
to enhance findability
David Remsen
ECAT Programme Officer
October 22, 2007
WWW.GBIF.ORG
Presentation
Overview of GBIF and data portal
Informatics challenges relating to taxon
data
What we are doing about it
Wider implications of our efforts
GBIF mission
…to make the world’s biodiversity
data freely and universally available
via the Internet
What is biodiversity?
GBIF follows the broadly outlined CBD
recognition of levels of biological diversity:
• Molecules / genes
• Species
• Ecosystems / ecology
New GBIF data portal
http://data.gbif.org/
GBIF Data Types
Core data types on GBIF network
• Taxon names
• Taxon occurrence information
   • specimen records from natural history collections
   • observational records
Fields used in indexing records
• Mandatory: Scientific name, Institutional code, Collection code, Catalogue number
• Highly desirable: Geospatial location, Collection date, Higher taxon info, Date last modified
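As a rough illustration of how the indexing fields might be checked before a record is accepted, the sketch below flags records that lack a mandatory field. It assumes the slide's two columns pair up as Mandatory (scientific name, institutional code, collection code, catalogue number) and Highly desirable (the rest). The key names and the `missing_fields` helper are hypothetical and are not part of any GBIF schema.

```python
# Hypothetical field keys; the grouping is an assumed reading of the slide.
MANDATORY_FIELDS = (
    "scientific_name",
    "institutional_code",
    "collection_code",
    "catalogue_number",
)
DESIRABLE_FIELDS = (
    "geospatial_location",
    "collection_date",
    "higher_taxon_info",
    "date_last_modified",
)


def missing_fields(record, fields=MANDATORY_FIELDS):
    """Return the names of the given fields that are absent or blank in record."""
    return [f for f in fields if not str(record.get(f) or "").strip()]


# Made-up example record, for illustration only.
record = {
    "scientific_name": "Papilio machaon",
    "institutional_code": "XYZ",
    "collection_code": "ENT",
    "catalogue_number": "",
}
```

Here `missing_fields(record)` reports the blank catalogue number, and `missing_fields(record, DESIRABLE_FIELDS)` lists the optional fields the record could still supply.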
GBIF network today
[Diagram: Users, Registry, Portal, Index, Mirror, and provider Databases reached via DiGIR and TAPIR]
Example user queries:
• List species recorded in Costa Rica
• Find all occurrences of Papilio machaon
• Find occurrences of Primates from Madagascar
• Find type specimen for Coffea odorata
• Find occurrences from Antananarivo Province
http://data.gbif.org/
Species: Achillea millefolium
Kingdom: Animalia
Country: Madagascar
Dataset: Continuous Plankton
Recorder Database
Maps - occurrence density
Actions
http://data.gbif.org/
Occurrence download
Names and type specimens
Images
GBIF Data Portal Web Services
• occurrence record data: http://data.gbif.org/ws/rest/occurrence
• occurrence density data: http://data.gbif.org/ws/rest/density
• taxon data: http://data.gbif.org/ws/rest/taxon
• dataset metadata: http://data.gbif.org/ws/rest/resource
• data provider metadata: http://data.gbif.org/ws/rest/provider
• data network metadata: http://data.gbif.org/ws/rest/network
Web services
http://data.gbif.org/ws/rest/occurrence/list/?taxonConceptKey=14724348&format=darwin
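The request URL above can be assembled programmatically. The sketch below only builds the URL string from the two query parameters shown on the slide (`taxonConceptKey` and `format`) and sends nothing over the network. The function name is hypothetical, and the endpoint is the one given in this 2007 talk, so it may no longer be served.

```python
from urllib.parse import urlencode

# Base path as shown on the slide (2007 data portal).
BASE = "http://data.gbif.org/ws/rest"


def occurrence_list_url(taxon_concept_key, fmt="darwin"):
    """Build an occurrence-list URL for one taxon concept key."""
    query = urlencode({"taxonConceptKey": taxon_concept_key, "format": fmt})
    return f"{BASE}/occurrence/list/?{query}"


url = occurrence_list_url(14724348)
```

Keeping URL construction separate from the HTTP call makes it easy to embed the same query in another site, as the next slide suggests.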
Embedding in other sites
iSpecies
Portal Summary
Hundreds of Institutional providers
Thousands of Resources
Millions of Records
Collective & Integrated Access
Wide Taxonomic, Temporal and Geographic Scope
Free and Open Access to all
Go forth and integrate!
Parallel: GenBank
Informatics challenges relating
to taxon data
The Makings of a problem
Everything I Just Said
Meets
The names problem in biology
All accumulated information of a species is tied to a scientific
name, a name that serves as a link between what has been
learned in the past and what we today add to the body of
knowledge.
- Grimaldi & Engel, 2005, Evolution of the Insects
Nature of the problem?
Access to data is via limited points of entry
Biology has a “names problem.”
This names problem impacts these data
access entry points.
Exacerbated by:
Wide taxonomic, temporal scope
Federated origins of data
Limited points of entry: search
Limited points of entry: browse
Breakdown of the names problem
Synonymy
A single concept may reference multiple names
Equivalent
Inclusive
Homography (Homonymy)
A single name may refer to multiple concepts
Definition
A single name may refer to multiple KINDS of
concepts
“Name” refers to:
• A lexical “concept”: a set of character strings
• A nomenclatural concept: a Code-regulated fact
• A taxonomic concept: a hypothesis or opinion
[Diagram labels: Synonyms, Homonyms]
All of these are important to distinguish
Synonyms: Equivalence
Lexical/Orthographic
Informed by: Nomenclators, Taxonomies, Algorithm
Nomenclatural
Informed by: Nomenclators, Monographs (with
interpretation)
Taxonomic
Informed by: Monographs, Floras, Faunas, derived
checklists
Different classes of equivalence are addressed by different
resources
Lexical synonym: A single concept may reference
multiple names
ILPIN Gerardia paupercula (Gray) Britt. var borealis (Pennell) Deam
IPNI Gerardia paupercula var borealis (Pennell) Deam
MOBOT Gerardia paupercula Britt. var borealis Deam
Informed by: Nomenclators, Taxonomies, Algorithm
Identifies the preferred lexigraphy of the name
Automates the grouping of lexical variation
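One way to automate the grouping of lexical variation is to reduce each name string to a canonical form without authorship and then group on that form. The sketch below is a deliberately naive illustration, not a full scientific-name parser: it assumes the first two tokens are genus and species epithet, and the set of rank markers it recognises is an assumption.

```python
# Assumed, incomplete set of infraspecific rank markers.
RANK_MARKERS = {"var", "subsp", "ssp"}


def canonical_name(name_string):
    """Strip authorship, keeping genus, species and one infraspecific epithet."""
    tokens = name_string.split()
    if len(tokens) < 2:
        return name_string
    parts = tokens[:2]
    # Scan up to the second-to-last token so a following epithet always exists.
    for i, tok in enumerate(tokens[2:-1], start=2):
        marker = tok.rstrip(".").lower()
        if marker in RANK_MARKERS:
            parts += [marker + ".", tokens[i + 1]]
            break
    return " ".join(parts)


variants = [
    "Gerardia paupercula (Gray) Britt. var borealis (Pennell) Deam",
    "Gerardia paupercula var borealis (Pennell) Deam",
    "Gerardia paupercula Britt. var borealis Deam",
]
groups = {canonical_name(v) for v in variants}
```

All three strings from the slide collapse to the single key `Gerardia paupercula var. borealis`. Author abbreviations that resemble rank markers would defeat this heuristic, which is one reason several grouping algorithms are worth comparing.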
Orthographic synonym: A single concept may
reference multiple names
Loligo pealeii
Loligo pealei
Loligo pealii
Loligo plei
Informed by: Nomenclators, Taxonomies, Algorithm
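Spelling variants like these can be grouped algorithmically by edit distance. The sketch below uses a plain Levenshtein distance and an assumed threshold of one edit. The reference spelling and the threshold are illustrative choices, not values from the talk.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def close_variants(names, reference, max_distance=1):
    """Return the names within max_distance edits of the reference spelling."""
    return [n for n in names if edit_distance(n, reference) <= max_distance]


names = ["Loligo pealeii", "Loligo pealei", "Loligo pealii", "Loligo plei"]
group = close_variants(names, "Loligo pealeii")
```

With a threshold of one, the first three spellings group together and `Loligo plei` (three edits away) falls outside. Whether such a name belongs in the group is a question for a nomenclator, not for the algorithm alone.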
Vernacular synonym: A single concept may
reference multiple names
Nomenclatural synonym: A single concept may
reference multiple names
ILPIN Gerardia paupercula (Gray) Britt. var borealis (Pennell) Deam
IPNI Gerardia paupercula var borealis (Pennell) Deam
MOBOT Gerardia paupercula Britt. var borealis Deam
MOBOT Agalinis paupercula (Gray) Britton var. borealis Pennell (Zenkert 1934)
OHIO DNR Agalinis paupercula (Gray) Britt. var. borealis Pennell
IPNI Agalinis paupercula Britton var. borealis Pennell
ITIS Agalinis paupercula var. borealis Pennell
Informed by: Nomenclators and generally NOT by taxonomy
Taxonomic synonym: A single concept may
reference multiple names (or it may not)
Informed by: Taxonomic Sources
Synthesized synonymy: A bit of everything
Informed by: Algorithm, Nomenclators,Taxonomic Sources
Another example
Aedes calopus | Stegomyia aegypti | Culex aegypti
Synonymy: Inclusive
Classifications
Catalogue of Life Integrated Classification
Annotated Checklist of the Neuroptera - Mansell 2006
NCBI Taxonomy
Cladograms, Phylograms
Regional lists
Phylogenetic representations
Cetacea of the Hebrides
Flora of China
Thematic Lists
2006 IUCN RedList of Threatened and Endangered Species
WoRMS/OBIS Marine Taxa
100 of the World’s Worst Invasive Alien Species (in GISIN)
Implications for data retrieval
Frost 2005 AMNH
• Notopthalmus viridescens
• Triturus viridescens
• Notopthalmus viridescens
• Notophthalmus viridescens
• Notophthalma viridescens
• Diemyctylus viridescens
• Triton viridescens
• Molge viridescens
• Diemyctylus minatus viridescens
• Triturus viridescens dorsalis
• Diemyctylus viridescens dorsalis
• Notophthalmus viridescens dorsalis
• … 24 others
Dolbe 2004
• Notopthalmus viridescens viridescens
• Triturus viridescens
• Notopthalmus viridescens
• Notophthalmus viridescens
• Notophthalma viridescens
• Diemyctylus viridescens
• Triton viridescens
• Molge viridescens
• Notophthalmus viridescens dorsalis
• Triturus viridescens dorsalis
• Diemyctylus viridescens dorsalis
• Notophthalmus viridescens louisianensis
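The retrieval consequence of the two lists can be sketched as query expansion: a search on one name is widened to every name a given source treats as the same taxon, so the result set depends on which source is consulted. The data below is a small subset of the names in the two lists, and the `expand_query` helper is hypothetical.

```python
# Subset of each source's synonym list, copied from the two lists above.
SYNONYMY = {
    "Frost 2005": {
        "Notophthalmus viridescens",
        "Triturus viridescens",
        "Triton viridescens",
        "Molge viridescens",
        "Diemyctylus minatus viridescens",
    },
    "Dolbe 2004": {
        "Notophthalmus viridescens",
        "Triturus viridescens",
        "Triton viridescens",
        "Molge viridescens",
        "Notophthalmus viridescens louisianensis",
    },
}


def expand_query(name, synonymy=SYNONYMY):
    """Map each source that knows the name to the sorted names to search for."""
    return {
        source: sorted(names)
        for source, names in synonymy.items()
        if name in names
    }


expanded = expand_query("Triton viridescens")
```

A search for `Triton viridescens` expands differently under each source: only the Frost subset adds `Diemyctylus minatus viridescens`, and only the Dolbe subset adds `Notophthalmus viridescens louisianensis`.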
Breakdown of the names problem
Homography (Homonymy)
A single name may refer to multiple concepts
Homonyms & Disambiguation
Homographs
Virginia (the state) & Virginia Baird & Girard 1853 (the genus)
Tumor (cancer) & Tumor Huang in Huang Dawei 1990
Informed by: Algorithms/Lexicons (word sense disambiguation)
Homonym
Agathis montana (the conifer) & Agathis montana (the wasp)
Wagneria Meladze 1967 & Wagneria Heilprin 1887 & 12 other Wagneria
Informed by: Nomenclators and Taxonomists
Nomenclators establish the factual basis of homonyms and partial
disambiguation method
Taxonomy provides a disambiguation method
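Taxonomy as a disambiguation method can be sketched with the Agathis montana example: the bare name matches two usages, and a higher-taxon hint narrows it to one. The record structure and the `resolve` helper are hypothetical, and the kingdom values are the ordinary placements of conifers and wasps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameUsage:
    name: str
    kingdom: str
    note: str


# The cross-kingdom homonym pair from the slide.
USAGES = [
    NameUsage("Agathis montana", "Plantae", "conifer"),
    NameUsage("Agathis montana", "Animalia", "wasp"),
]


def resolve(name, kingdom=None, usages=USAGES):
    """Return usages matching the name, narrowed by kingdom when one is given."""
    matches = [u for u in usages if u.name == name]
    if kingdom is not None:
        matches = [u for u in matches if u.kingdom == kingdom]
    return matches


ambiguous = resolve("Agathis montana")
wasp_only = resolve("Agathis montana", kingdom="Animalia")
```

Without a hint the lookup returns both usages, which is the signal that the caller must supply context. A kingdom is enough here, but homonyms within one kingdom would need a finer hint such as family or authorship.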
Taxon Concept (Polysemes)
Gorilla gorilla Wilson and Reeder 1992 vs Gorilla gorilla Groves 2003
Informed by: Taxonomic opinion via monographs, floras, faunas, derived lists
Take home message
The names problem is inherent to all taxon data
We need a Global Taxonomic Resource
Needs to treat all names
Support multiple taxonomic opinion
Depends on many different source data
The informatics whole is greater than the sum of its content parts
Can only work in a federated environment
Requires communal
• data exchange standards
• communications protocols
What GBIF is doing about the
names problem
Current GBIF Taxonomic Infrastructure
(ECAT)
Catalogue of Life
International Plant Names Index (IPNI)
Index Fungorum
Is not enough
EXPAND to Global Taxonomic Infrastructure
Mobilize wide array of “checklist resources”
Promote the use of nomenclatural GUIDs in
all taxonomic checklists
Enable synthesis of resources
Enable informatics web services
Address Synonymy
Wider access to, and explicit classing of synonyms
Access to multiple lexical grouping algorithms
Access to, and support of, development of
nomenclators
Promote the use of nomenclatural GUIDs in all
taxonomic checklists
More Taxonomic, Regional, Thematic checklists
Comprehensive Vernacular Names catalogue
Address Homography, Polysemy
Rapid cataloguing of homography
Access to multiple lexical grouping algorithms
Catalogue and classify all genera
Development of multiple disambiguation methods
Standardized representation of taxon concepts
Development of taxon concept comparators
Explicit assertions of concept relations
All Genus Index
Wider Implications of our
efforts
GBIF and Phyloinformatics
As a consumer of taxonomic data resources
As a consumer of name services
As a provider of taxonomic metadata
Increased interoperability
How to contact GBIF:
Web site: www.gbif.org
Data portal: www.gbif.net
GBIF Secretariat
Universitetsparken 15
2100 Copenhagen
Denmark
E-mail: [email protected]
Phone: +45 3532 1470
Fax: +45 3532 1480
GBIF Secretariat building, supported by a grant from
the Aage V. Jensens Fonde