GLOBAL BIODIVERSITY INFORMATION FACILITY
Developing taxonomic names services to enhance findability
David Remsen, ECAT Programme Officer
October 22, 2007
WWW.GBIF.ORG

Presentation
• Overview of GBIF and data portal
• Informatics challenges relating to taxon data
• What we are doing about it
• Wider implications of our efforts

GBIF mission
…to make the world’s biodiversity data freely and universally available via the Internet

What is biodiversity?
GBIF follows the broadly outlined CBD recognition of levels of biological diversity:
• Molecules / genes
• Species
• Ecosystems / ecology

New GBIF data portal
http://data.gbif.org/

GBIF Data Types
Core data types on GBIF network
• Taxon names
• Taxon occurrence information
  • specimen records from natural history collections
  • observational records

Fields used in indexing records
Mandatory            Highly desirable
Scientific name      Geospatial location
Institutional code   Collection date
Collection code      Higher taxon info
Catalogue number     Date last modified

GBIF network today
[Diagram: Users, Portal, Registry, Index, Mirror, and provider Databases connected via DiGIR and TAPIR]
Example queries:
• List species recorded in Costa Rica
• Find all occurrences of Papilio machaon
• Find occurrences of Primates from Madagascar
• Find type specimen for Coffea odorata
• Find occurrences from Antananarivo Province

http://data.gbif.org/
• Species: Achillea millefolium
• Kingdom: Animalia
• Country: Madagascar
• Dataset: Continuous Plankton Recorder Database
• Maps - occurrence density

Actions (http://data.gbif.org/)
• Occurrence download
• Names and type specimens
• Images

GBIF Data Portal Web Services
occurrence record data     http://data.gbif.org/ws/rest/occurrence
occurrence density data    http://data.gbif.org/ws/rest/density
taxon data                 http://data.gbif.org/ws/rest/taxon
dataset metadata           http://data.gbif.org/ws/rest/resource
data provider metadata     http://data.gbif.org/ws/rest/provider
data network metadata      http://data.gbif.org/ws/rest/network
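
A request to one of the portal web services is a plain HTTP GET with query parameters. The sketch below only assembles such a URL; it does not contact the server. The helper name `build_request_url` is hypothetical, and the `list` action and the `taxonConceptKey` and `format` parameters are taken from the single example request in this deck. Whether the other services accept the same action or parameters is an assumption.

```python
from urllib.parse import urlencode

# Base address and service names as listed on the web services slide.
BASE_URL = "http://data.gbif.org/ws/rest"
SERVICES = {"occurrence", "density", "taxon", "resource", "provider", "network"}


def build_request_url(service, action, **params):
    """Assemble a request URL for one portal service (hypothetical helper)."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    url = f"{BASE_URL}/{service}/{action}/"
    query = urlencode(params)  # keeps the keyword order given by the caller
    return f"{url}?{query}" if query else url


# Reproduce the example request from the deck: occurrences for one taxon
# concept, returned in Darwin Core format.
print(build_request_url("occurrence", "list",
                        taxonConceptKey=14724348, format="darwin"))
```

Keeping the service names in one set makes a mistyped service fail early instead of producing a URL that only fails once it is requested.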
Example request:
http://data.gbif.org/ws/rest/occurrence/list/?taxonConceptKey=14724348&format=darwin

Embedding in other sites
• iSpecies

Portal Summary
• Hundreds of Institutional providers
• Thousands of Resources
• Millions of Records
• Collective & Integrated Access
• Wide Taxonomic, Temporal and Geographic Scope
• Free and Open Access to all
• Go forth and integrate!
• Parallel: GenBank

Informatics challenges relating to taxon data

The Makings of a problem
Everything I Just Said Meets The names problem in biology

"All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge."
- Grimaldi & Engel, 2005, Evolution of the Insects

Nature of the problem?
• Access to data is via limited points of entry
• Biology has a “names problem.”
• This names problem impacts these data access entry points.
• Exacerbated by:
  • Wide taxonomic, temporal scope
  • Federated origins of data

Limited points of entry: search

Limited points of entry: browse

Breakdown of the names problem
• Synonymy: A single concept may reference multiple names
  • Equivalent
  • Inclusive
• Homography (Homonymy): A single name may refer to multiple concepts
• Definition: A single name may refer to multiple KINDS of concepts

“Name” refers to:
• A lexical “concept”: a set of character strings
• A nomenclatural concept: a Code-regulated fact
• A taxonomic concept: a hypothesis or opinion
Synonyms and homonyms arise at each of these levels.
All of these are important to distinguish

Synonyms: Equivalence
• Lexical/Orthographic. Informed by: Nomenclators, Taxonomies, Algorithm
• Nomenclatural. Informed by: Nomenclators, Monographs (with interpretation)
• Taxonomic. Informed by: Monographs, Floras, Faunas, derived checklists
Different classes of equivalence are addressed by different resources

Lexical synonym: A single concept may reference multiple names
ILPIN      Gerardia paupercula (Gray) Britt. var borealis (Pennell) Deam
IPNI       Gerardia paupercula var borealis (Pennell) Deam
MOBOT      Gerardia paupercula Britt. var borealis Deam
Informed by: Nomenclators, Taxonomies, Algorithm
• Identifies the preferred lexigraphy of the name
• Automates the grouping of lexical variation

Orthographic synonym: A single concept may reference multiple names
• Loligo pealeii
• Loligo pealei
• Loligo pealii
• Loligo plei
Informed by: Nomenclators, Taxonomies, Algorithm

Vernacular synonym: A single concept may reference multiple names

Nomenclatural synonym: A single concept may reference multiple names
ILPIN      Gerardia paupercula (Gray) Britt. var borealis (Pennell) Deam
IPNI       Gerardia paupercula var borealis (Pennell) Deam
MOBOT      Gerardia paupercula Britt. var borealis Deam
MOBOT      Agalinis paupercula (Gray) Britton var. borealis Pennell (Zenkert 1934)
OHIO DNR   Agalinis paupercula (Gray) Britt. var. borealis Pennell
IPNI       Agalinis paupercula Britton var. borealis Pennell
ITIS       Agalinis paupercula var. borealis Pennell
Informed by: Nomenclators and generally NOT by taxonomy

Taxonomic synonym: A single concept may reference multiple names (or it may not)
Informed by: Taxonomic Sources

Synthesized synonymy: A bit of everything
Informed by: Algorithm, Nomenclators, Taxonomic Sources

Another example
Aedes calopus | Stegomyia Aegypti | Culex aegypti

Synonymy: Inclusive
• Classifications
  • Catalogue of Life Integrated Classification
  • Annotated Checklist of the Neuroptera - Mansell 2006
  • NCBI Taxonomy
• Phylogenetic representations
  • Cladograms, Phylograms
• Regional lists
  • Cetacea of the Hebrides
  • Flora of China
• Thematic Lists
  • 2006 IUCN RedList of Threatened and Endangered Species
  • WoRMS/OBIS Marine Taxa
  • 100 of the World’s Worst Invasive Alien Species (in GISIN)

Implications for data retrieval
Frost 2005 AMNH
• Notopthalmus viridescens
• Triturus viridescens
• Notopthalmus viridescens
• Notophthalmus viridescens
• Notophthalma viridescens
• Diemyctylus viridescens
• Triton viridescens
• Molge viridescens
• Diemyctylus minatus viridescens
• Triturus viridescens dorsalis
• Diemyctylus viridescens dorsalis
• Notophthalmus viridescens dorsalis
• … 24 others
Dolbe 2004
• Notopthalmus viridescens viridescens
• Triturus viridescens
• Notopthalmus viridescens
• Notophthalmus viridescens
• Notophthalma viridescens
• Diemyctylus viridescens
• Triton viridescens
• Molge viridescens
• Notophthalmus viridescens dorsalis
• Triturus viridescens dorsalis
• Diemyctylus viridescens dorsalis
• Notophthalmus viridescens louisianensis

Breakdown of the names problem
• Homography (Homonymy): A single name may refer to multiple concepts

Homonyms & Disambiguation
Homographs
• Virginia (the state) & Virginia Baird & Girard 1853 (the genus)
• Tumor (cancer) & Tumor Huang in Huang Dawei 1990
Informed by: Algorithms/Lexicons (word sense disambiguation)
Homonym
• Agathis montana (the conifer) & Agathis montana (the wasp)
• Wagneria Meladze 1967 & Wagneria Heilprin 1887 & 12 other Wagneria
Informed by: Nomenclators and Taxonomists
• Nomenclators establish the factual basis of homonyms and a partial disambiguation method
• Taxonomy provides a disambiguation method

Taxon Concept (Polysemes)
Gorilla gorilla Wilson and Reeder 1992 vs Gorilla gorilla Groves 2003
Informed by: Taxonomic opinion via monographs, floras, faunas, derived lists

Take home message
• The names problem is inherent to all taxon data
• We need a Global Taxonomic Resource
  • Needs to treat all names
  • Needs to support multiple taxonomic opinions
  • Depends on many different source data
  • The informatics sum is more than the content parts
  • Can only work in a federated environment
  • Requires communal exchange data standards and communications protocols

What GBIF is doing about the names problem

Current GBIF Taxonomic Infrastructure (ECAT)
• Catalogue of Life
• International Plant Names Index (IPNI)
• Index Fungorum
Is not enough

EXPAND to Global Taxonomic Infrastructure
• Mobilize a wide array of “checklist resources”
• Promote the use of nomenclatural GUIDs in all taxonomic checklists
• Enable synthesis of resources
• Enable informatics web services

Address Synonymy
• Wider access to, and explicit classing of, synonyms
• Access to multiple lexical grouping algorithms
• Access to, and support of, development of nomenclators
• Promote the use of nomenclatural GUIDs in all taxonomic checklists
• More Taxonomic, Regional, Thematic checklists
• Comprehensive Vernacular Names catalogue

Address Homography, Polysemy
• Rapid cataloguing of homography
  • Access to multiple lexical grouping algorithms
  • Catalogue and classify all genera
• Development of multiple disambiguation methods
• Standardized representation of taxon concepts
• Development of taxon concept comparators
• Explicit assertions of concept relations

All Genus Index

Wider Implications of our efforts
GBIF and Phyloinformatics
• As a consumer of taxonomic data resources
• As a consumer of name services
• As a provider of taxonomic metadata
• Increased interoperability

How to contact GBIF:
Web site: www.gbif.org
Data portal: www.gbif.net
GBIF Secretariat
Universitetsparken 15
2100 Copenhagen
Denmark
E-mail: [email protected]
Phone: +45 3532 1470
Fax: +45 3532 1480

GBIF Secretariat building, supported by a grant from the Aage V. Jensens Fonde
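
Note on lexical grouping: the "Lexical synonym" slide says an algorithm can automate the grouping of lexical variation. The sketch below is one minimal way to do that, not the method GBIF uses. It reduces each name string to a key by dropping authorship, then groups strings that share a key. The function names `canonical_key` and `group_lexical_variants` are hypothetical, and the rule "keep the genus plus every all-lowercase word" is a simplifying assumption that would mishandle lowercase author connectives such as "ex" or "in". The input strings are the seven names from the "Nomenclatural synonym" slide.

```python
from collections import defaultdict


def canonical_key(name):
    """Reduce a name string to genus + lowercase words (epithets, rank marker).

    Assumption: authorship tokens are capitalised, parenthesised, or numeric,
    so any token that is not purely lowercase letters is dropped.
    """
    tokens = name.split()
    if not tokens:
        raise ValueError("empty name string")
    kept = [tokens[0]]  # genus is kept as written
    for token in tokens[1:]:
        word = token.rstrip(".")  # treat "var." and "var" alike
        if word.isalpha() and word.islower():
            kept.append(word)
    return " ".join(kept)


def group_lexical_variants(records):
    """Group (source, name) pairs that share the same canonical key."""
    groups = defaultdict(list)
    for source, name in records:
        groups[canonical_key(name)].append((source, name))
    return dict(groups)


records = [
    ("ILPIN", "Gerardia paupercula (Gray) Britt. var borealis (Pennell) Deam"),
    ("IPNI", "Gerardia paupercula var borealis (Pennell) Deam"),
    ("MOBOT", "Gerardia paupercula Britt. var borealis Deam"),
    ("MOBOT", "Agalinis paupercula (Gray) Britton var. borealis Pennell (Zenkert 1934)"),
    ("OHIO DNR", "Agalinis paupercula (Gray) Britt. var. borealis Pennell"),
    ("IPNI", "Agalinis paupercula Britton var. borealis Pennell"),
    ("ITIS", "Agalinis paupercula var. borealis Pennell"),
]

groups = group_lexical_variants(records)
for key, members in groups.items():
    print(f"{key}: {len(members)} variants")
```

The seven strings fall into two lexical groups, one under Gerardia and one under Agalinis. Joining those two groups is a nomenclatural step that string handling alone cannot make, which is why that slide names nomenclators as the informing resource.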