GLOBAL BIODIVERSITY INFORMATION FACILITY Developing taxonomic names services to enhance findability David Remsen ECAT Programme Officer October 22, 2007 WWW.GBIF.ORG.


GLOBAL
BIODIVERSITY
INFORMATION
FACILITY
Developing taxonomic names services
to enhance findability
David Remsen
ECAT Programme Officer
October 22, 2007
WWW.GBIF.ORG
Presentation
• Overview of GBIF and data portal
• Informatics challenges relating to taxon data
• What we are doing about it
• Wider implications of our efforts

GBIF mission
…to make the world’s biodiversity
data freely and universally available
via the Internet
What is biodiversity?
GBIF follows the broadly outlined CBD
recognition of levels of biological diversity:
• Molecules / genes
• Species
• Ecosystems / ecology
New GBIF data portal
http://data.gbif.org/
GBIF Data Types
Core data types on GBIF network
• Taxon names
• Taxon occurrence information
  - specimen records from natural history collections
  - observational records
Fields used in indexing records
• Mandatory
  - Scientific name
  - Institutional code
  - Collection code
  - Catalogue number
• Highly desirable
  - Geospatial location
  - Collection date
  - Higher taxon info
  - Date last modified
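A minimal sketch of how an indexer might check a record against these fields. The snake_case field names and the sample record are hypothetical; the split into mandatory and highly desirable follows the slide's two columns.

```python
# Hypothetical field names; the grouping mirrors the slide's two columns.
MANDATORY = ("scientific_name", "institution_code", "collection_code", "catalogue_number")
HIGHLY_DESIRABLE = ("geospatial_location", "collection_date", "higher_taxon", "date_last_modified")


def check_record(record: dict) -> dict:
    """Report which mandatory and highly desirable fields are missing or empty."""

    def missing(fields):
        # A field counts as missing when absent, None, or blank.
        return [f for f in fields if not str(record.get(f) or "").strip()]

    missing_mandatory = missing(MANDATORY)
    return {
        "indexable": not missing_mandatory,
        "missing_mandatory": missing_mandatory,
        "missing_desirable": missing(HIGHLY_DESIRABLE),
    }


# Made-up record for illustration only.
example = {
    "scientific_name": "Papilio machaon",
    "institution_code": "XYZ",
    "collection_code": "ENT",
    "catalogue_number": "12345",
    "collection_date": "",
}
report = check_record(example)
```

Here the record can be indexed because all four mandatory fields are present, while every highly desirable field is reported as missing.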
GBIF network today
[Diagram. Components labelled: Users, Registry, Portal, Index, Mirror, Databases. Provider protocols labelled: DiGIR, TAPIR.]
Example user queries:
• List species recorded in Costa Rica
• Find all occurrences of Papilio machaon
• Find occurrences of Primates from Madagascar
• Find type specimen for Coffea odorata
• Find occurrences from Antananarivo Province
http://data.gbif.org/
Species: Achillea millefolium
Kingdom: Animalia
Country: Madagascar
Dataset: Continuous Plankton Recorder Database
Maps - occurrence density
Actions
http://data.gbif.org/
Occurrence download
Names and type specimens
Images
GBIF Data Portal Web Services
• http://data.gbif.org/ws/rest/occurrence - occurrence record data
• http://data.gbif.org/ws/rest/density - occurrence density data
• http://data.gbif.org/ws/rest/taxon - taxon data
• http://data.gbif.org/ws/rest/resource - dataset metadata
• http://data.gbif.org/ws/rest/provider - data provider metadata
• http://data.gbif.org/ws/rest/network - data network metadata
Web services
http://data.gbif.org/ws/rest/occurrence/list/?taxonConceptKey=14724348&format=darwin
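A small sketch of how a client could assemble the request URL shown above. Only the URL pattern and the two parameters come from the slide; the function name is hypothetical, and the sketch builds the URL without sending a request, since the slides do not say whether the endpoint is still served.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base path taken from the web service URLs on the slides.
BASE = "http://data.gbif.org/ws/rest"


def occurrence_list_url(taxon_concept_key: int, fmt: str = "darwin") -> str:
    """Build an occurrence list URL for one taxon concept key (hypothetical helper)."""
    query = urlencode({"taxonConceptKey": taxon_concept_key, "format": fmt})
    return f"{BASE}/occurrence/list/?{query}"


url = occurrence_list_url(14724348)
# Parse the query string back out to confirm both parameters survived encoding.
params = parse_qs(urlsplit(url).query)
```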
Embedding in other sites
iSpecies
Portal Summary

Hundreds of Institutional providers

Thousands of Resources

Millions of Records

Collective & Integrated Access

Wide Taxonomic, Temporal and Geographic Scope

Free and Open Access to all

Go forth and integrate!
Parallel: GenBank
Informatics challenges relating
to taxon data
The Makings of a problem
Everything I Just Said
Meets
The names problem in biology
All accumulated information of a species is tied to a scientific
name, a name that serves as a link between what has been
learned in the past and what we today add to the body of
knowledge.
- Grimaldi & Engel, 2005, Evolution of the Insects
Nature of the problem?
• Access to data is via limited points of entry
• Biology has a “names problem.”
• This names problem impacts these data access entry points.
• Exacerbated by:
  - Wide taxonomic, temporal scope
  - Federated origins of data
Limited points of entry: search
Limited points of entry: browse
Breakdown of the names problem
• Synonymy
  - A single concept may reference multiple names
    - Equivalent
    - Inclusive
• Homography (Homonymy)
  - A single name may refer to multiple concepts
• Definition
  - A single name may refer to multiple KINDS of concepts
“Name” refers to:
• A lexical “concept”: a set of character strings
• A nomenclatural concept: a Code-regulated fact
• A taxonomic concept: a hypothesis or opinion
(Side labels on the slide: Synonyms, Homonyms)
All of these are important to distinguish
Synonyms: Equivalence
• Lexical/Orthographic
  - Informed by: Nomenclators, Taxonomies, Algorithm
• Nomenclatural
  - Informed by: Nomenclators, Monographs (with interpretation)
• Taxonomic
  - Informed by: Monographs, Floras, Faunas, derived checklists
Different classes of equivalence are addressed by different resources
Lexical synonym: A single concept may reference
multiple names
ILPIN Gerardia paupercula (Gray) Britt. var borealis (Pennell) Deam
IPNI Gerardia paupercula var borealis (Pennell) Deam
MOBOT Gerardia paupercula Britt. var borealis Deam
Informed by: Nomenclators, Taxonomies, Algorithm
Identifies the preferred lexigraphy of the name
Automates the grouping of lexical variation
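One way the algorithmic grouping of lexical variation could work, as a rough sketch: strip authorship and normalise rank markers so that the three strings above collapse to one key. The heuristic (keep the genus, lowercase epithets and rank markers; treat everything else as authorship) is an assumption for illustration, not the method used by any of the named sources, and it would mishandle lowercase author particles such as "ex" or "de".

```python
import re
from collections import defaultdict

# Rank markers recognised by this sketch (assumed, not exhaustive).
RANK_MARKERS = {"var", "subsp", "ssp"}


def canonical_form(name: str) -> str:
    """Reduce a name string to genus, epithets and rank markers."""
    tokens = name.split()
    kept = [tokens[0]]  # first token is taken to be the genus
    for token in tokens[1:]:
        bare = token.rstrip(".")
        if bare in RANK_MARKERS:
            kept.append(bare + ".")  # normalise "var" and "var." alike
        elif re.fullmatch(r"[a-z-]+", token):
            kept.append(token)  # lowercase word: treated as an epithet
        # anything else (capitalised, parenthesised, abbreviated) is dropped as authorship
    return " ".join(kept)


def group_lexical_variants(sourced_names):
    """Map each canonical form to the sources that supplied a variant of it."""
    groups = defaultdict(list)
    for source, name in sourced_names:
        groups[canonical_form(name)].append(source)
    return dict(groups)


names = [
    ("ILPIN", "Gerardia paupercula (Gray) Britt. var borealis (Pennell) Deam"),
    ("IPNI", "Gerardia paupercula var borealis (Pennell) Deam"),
    ("MOBOT", "Gerardia paupercula Britt. var borealis Deam"),
]
groups = group_lexical_variants(names)
```

All three source strings fall under the single key "Gerardia paupercula var. borealis".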
Orthographic synonym: A single concept may
reference multiple names
Loligo pealeii
Loligo pealei
Loligo pealii
Loligo plei
Informed by: Nomenclators, Taxonomies, Algorithm
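The algorithmic part of orthographic grouping can be sketched with plain edit distance. This is one possible technique, chosen for illustration; the threshold of 1 is an arbitrary assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


reference = "Loligo pealeii"
variants = ["Loligo pealei", "Loligo pealii", "Loligo plei"]
distances = {v: edit_distance(reference, v) for v in variants}
# Keep only variants within the assumed threshold of one edit.
close_matches = [v for v, d in distances.items() if d <= 1]
```

With this threshold the first two spellings are grouped with the reference and Loligo plei (distance 3) is not. Whether such a string belongs in the group is a question for a nomenclator or taxonomy, not for the algorithm.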
Vernacular synonym: A single concept may
reference multiple names
Nomenclatural synonym: A single concept may
reference multiple names
ILPIN Gerardia paupercula (Gray) Britt. var borealis (Pennell) Deam
IPNI Gerardia paupercula var borealis (Pennell) Deam
MOBOT Gerardia paupercula Britt. var borealis Deam
MOBOT Agalinis paupercula (Gray) Britton var. borealis Pennell (Zenkert 1934)
OHIO DNR Agalinis paupercula (Gray) Britt. var. borealis Pennell
IPNI Agalinis paupercula Britton var. borealis Pennell
ITIS Agalinis paupercula var. borealis Pennell
Informed by: Nomenclators and generally NOT by taxonomy
Taxonomic synonym: A single concept may
reference multiple names (or it may not)
Informed by: Taxonomic Sources
Synthesized synonymy: A bit of everything
Informed by: Algorithm, Nomenclators,Taxonomic Sources
Another example
Aedes calopus | Stegomyia aegypti | Culex aegypti
Synonymy: Inclusive
• Classifications
  - Catalogue of Life Integrated Classification
  - Annotated Checklist of the Neuroptera - Mansell 2006
  - NCBI Taxonomy
• Cladograms, Phylograms
  - Phylogenetic representations
• Regional lists
  - Cetacea of the Hebrides
  - Flora of China
• Thematic Lists
  - 2006 IUCN RedList of Threatened and Endangered Species
  - WoRMS/OBIS Marine Taxa
  - 100 of the World’s Worst Invasive Alien Species (in GISIN)
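Inclusive synonymy can be pictured as expanding a higher taxon into every name it contains, which is what a query such as "Primates from Madagascar" needs. A minimal sketch follows; the four-node classification is an illustrative fragment, not data from any of the resources listed.

```python
# Illustrative parent -> children fragment of a classification.
CHILDREN = {
    "Primates": ["Hominidae"],
    "Hominidae": ["Gorilla"],
    "Gorilla": ["Gorilla gorilla"],
}


def included_names(taxon: str, children: dict) -> list:
    """Return the taxon plus every name nested beneath it, depth first."""
    names = [taxon]
    for child in children.get(taxon, []):
        names.extend(included_names(child, children))
    return names


primate_names = included_names("Primates", CHILDREN)
```

A search on "Primates" would then also match records filed under any of the returned names.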
Implications for data retrieval
Frost 2005 AMNH
• Notopthalmus viridescens
• Triturus viridescens
• Notopthalmus viridescens
• Notophthalmus viridescens
• Notophthalma viridescens
• Diemyctylus viridescens
• Triton viridescens
• Molge viridescens
• Diemyctylus minatus viridescens
• Triturus viridescens dorsalis
• Diemyctylus viridescens dorsalis
• Notophthalmus viridescens dorsalis
• … 24 others
Dolbe 2004
• Notopthalmus viridescens viridescens
• Triturus viridescens
• Notopthalmus viridescens
• Notophthalmus viridescens
• Notophthalma viridescens
• Diemyctylus viridescens
• Triton viridescens
• Molge viridescens
• Notophthalmus viridescens dorsalis
• Triturus viridescens dorsalis
• Diemyctylus viridescens dorsalis
• Notophthalmus viridescens louisianensis
Breakdown of the names problem
Homography (Homonymy)
A single name may refer to multiple concepts
Homonyms & Disambiguation
• Homographs
  - Virginia (the state) & Virginia Baird & Girard 1853 (the genus)
  - Tumor (cancer) & Tumor Huang in Huang Dawei 1990
  - Informed by: Algorithms/Lexicons (word sense disambiguation)
• Homonym
  - Agathis montana (the conifer) & Agathis montana (the wasp)
  - Wagneria Meladze 1967 & Wagneria Heilprin 1887 & 12 other Wagneria
  - Informed by: Nomenclators and Taxonomists
    - Nomenclators establish the factual basis of homonyms and partial disambiguation method
    - Taxonomy provides a disambiguation method
• Taxon Concept (Polysemes)
  - Gorilla gorilla Wilson and Reeder 1992 vs Gorilla gorilla Groves 2003
  - Informed by: Taxonomic opinion via monographs, floras, faunas, derived lists
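A sketch of taxonomy as a disambiguation method: when a record carries higher taxon information, the kingdom can select among homonyms. The catalogue structure and function name are hypothetical; the two senses of Agathis montana are the ones named on the slide.

```python
# Hypothetical homonym catalogue keyed by name string.
HOMONYMS = {
    "Agathis montana": [
        {"kingdom": "Plantae", "sense": "conifer"},
        {"kingdom": "Animalia", "sense": "wasp"},
    ],
}


def disambiguate(name: str, kingdom=None, catalogue=HOMONYMS) -> list:
    """Return candidate senses for a name, narrowed by kingdom when one is given."""
    candidates = catalogue.get(name, [])
    if kingdom is not None:
        candidates = [c for c in candidates if c["kingdom"] == kingdom]
    return candidates


ambiguous = disambiguate("Agathis montana")
resolved = disambiguate("Agathis montana", kingdom="Animalia")
```

Without higher taxon context the name stays ambiguous; with it, a single sense remains.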
Take home message
• The names problem is inherent to all taxon data
• We need a Global Taxonomic Resource
  - Needs to treat all names
  - Support multiple taxonomic opinion
  - Depends on many different source data
• The Informatics sum is more than the content parts
• Can only work in a federated environment
  - Requires communal data exchange standards and communications protocols
What GBIF is doing about the names problem
Current GBIF Taxonomic Infrastructure (ECAT)
• Catalogue of Life
• International Plant Names Index (IPNI)
• Index Fungorum
Is not enough
EXPAND to Global Taxonomic Infrastructure
• Mobilize wide array of “checklist resources”
• Promote the use of nomenclatural GUIDs in all taxonomic checklists
• Enable synthesis of resources
• Enable informatics web services
Address Synonymy
Wider access to, and explicit classing of synonyms

Access to multiple lexical grouping algorithms

Access to, and support of, development of nomenclators

Promote the use of nomenclatural GUIDs in all taxonomic checklists

More Taxonomic, Regional, Thematic checklists

Comprehensive Vernacular Names catalogue
Address Homography, Polysemy
Rapid cataloguing of homography

Access to multiple lexical grouping algorithms

Catalogue and classify all genera

Development of multiple disambiguation methods

Standardized representation of taxon concepts

Development of taxon concept comparators

Explicit assertions of concept relations
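What an explicit assertion of a concept relation might look like as data, in a minimal sketch. The class names, the relation vocabulary and the relation chosen in the example are all assumptions for illustration; in particular, the "includes" statement below is a placeholder, not a taxonomic claim made in this presentation.

```python
from dataclasses import dataclass

# Assumed relation vocabulary for this sketch.
RELATIONS = {"congruent", "includes", "included_in", "overlaps", "excludes"}


@dataclass(frozen=True)
class TaxonConcept:
    name: str
    according_to: str  # the source whose opinion defines the concept

    def label(self) -> str:
        return f"{self.name} sec. {self.according_to}"


@dataclass(frozen=True)
class ConceptRelation:
    subject: TaxonConcept
    relation: str
    obj: TaxonConcept


def assert_relation(subject: TaxonConcept, relation: str, obj: TaxonConcept) -> ConceptRelation:
    """Create a relation assertion, rejecting relation terms outside the vocabulary."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    return ConceptRelation(subject, relation, obj)


older = TaxonConcept("Gorilla gorilla", "Wilson and Reeder 1992")
newer = TaxonConcept("Gorilla gorilla", "Groves 2003")
# Placeholder relation, used only to show the shape of an assertion.
assertion = assert_relation(older, "includes", newer)
```

The same name string appears in both concepts; only the "according to" part and the asserted relation distinguish them.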
All Genus Index
Wider Implications of our efforts
GBIF and Phyloinformatics
• As a consumer of taxonomic data resources
• As a consumer of name services
• As a provider of taxonomic metadata
• Increased interoperability

How to contact GBIF:
Web site: www.gbif.org
Data portal: www.gbif.net
GBIF Secretariat
Universitetsparken 15
2100 Copenhagen
Denmark
E-mail: [email protected]
Phone: +45 3532 1470
Fax: +45 3532 1480
GBIF Secretariat building, supported by a grant from the Aage V. Jensens Fonde