GBIF Status and Information Infrastructure


WWW.GBIF.ORG
GLOBAL BIODIVERSITY INFORMATION FACILITY
GBIF network status, current developments, and plans for 2004
Hannu Saarenmaa, Donald Hobern, Larry Speers, Per Bjørn & Giorgos Ksouris
TDWG 2003, Oeiras, 25-26 October 2003

GBIF objective
• GBIF is an international collaborative megascience project based on a multilateral agreement (MoU) between countries, economies and international organisations, dedicated to establishing a distributed information infrastructure containing primary biodiversity data,
• with initial focus on species- and specimen-level data, with links to molecular, genetic and ecosystems levels,
• in order to make the world's scientific biodiversity information freely and universally available to all.

GBIF needs TDWG
• GBIF builds on standards developed and reviewed by the global biodiversity informatics community, i.e., the TDWG.
• GBIF is building global information infrastructure for biodiversity. That requires choosing some standards for implementation.

GBIF is a global integrator

GBIF DiGIR Architecture
[Figure: architecture diagram. Components labelled on the slide: Registry (UDDI) with Institutions, Providers and Services; Index with Metadata and Accounting; Portal with Request Marshaller, Cache and Query Engine; User; Name provider (Synonyms) with Provider Services; Data provider with Provider Services, serving Resources with Metadata; other. Flows labelled: provider query, metadata and name query, available providers, publish availability, metadata response, metadata and statistics, full data query, full data response. Protocols labelled: DiGIR, SOAP, HTTP.]

UDDI – Universal Description, Discovery and Integration of Web Services
[Figure: mapping of institutions and collections onto UDDI structures. Labels on the slide: Institution; Collection; businessEntity; businessService; bindingTemplate; InstanceDetails; categoryBag; identifierBag; keyedReference; BioCASE; digir.php.]
tModels:
• Schemas, protocols, interchange specifications
• GEO Code
• Biological taxonomies
• GBIF Participant IDs
• Collection, institution codes
• Thematic network IDs (MaNIS,…)

What's in the Registry?
• Technical Models, a.k.a. "tModels": Standards bodies and developers register information about their technical models, including data standards, specifications, taxonomies, etc.
• White Pages, Yellow Pages, Green Pages: Businesses register public information about themselves and the services they offer

How does the GBIF UDDI registry work?
1) GBIF Secretariat and other developers create and populate the registry with descriptions of standards (tModels)
2) Museums and other data providers install data provider packages which are automatically registered
3) GBIF Participant is notified of new provider in their domain, for endorsement as a GBIF data provider
4) A global index queries the registry, caches metadata, and creates a unique identifier for each record (and name)
5) Specialised portals and search engines can be built to query the registry and the index
6) Scientists, decisionmakers, and others use portals to build data sets for analysis and synthesis
[Figure: the GBIF UDDI Registry at the centre, holding Provider Registrations and Services Registrations.]

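The registration steps above can be sketched as a toy, in-memory model. This is only an illustration under stated assumptions: the class names, the `museum-a` and `participant-x` values, and the `provider:key` identifier layout are hypothetical, and nothing here calls a real UDDI or DiGIR interface.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    name: str
    participant: str      # GBIF Participant whose domain the provider belongs to
    keys: list            # local record keys served by this provider
    endorsed: bool = False


@dataclass
class Registry:
    tmodels: set = field(default_factory=set)
    providers: dict = field(default_factory=dict)
    notices: list = field(default_factory=list)

    def add_tmodel(self, name: str) -> None:
        """Step 1: describe a standard in the registry."""
        self.tmodels.add(name)

    def register(self, provider: Provider) -> None:
        """Steps 2 and 3: register a provider and notify its Participant."""
        self.providers[provider.name] = provider
        self.notices.append((provider.participant, provider.name))

    def endorse(self, name: str) -> None:
        """Step 3: the Participant endorses the provider."""
        self.providers[name].endorsed = True


def build_index(registry: Registry) -> dict:
    """Step 4: cache metadata and give every record a unique identifier."""
    index = {}
    for provider in registry.providers.values():
        for key in provider.keys:
            # Hypothetical identifier layout, chosen only for this sketch.
            index[f"{provider.name}:{key}"] = {
                "provider": provider.name,
                "endorsed": provider.endorsed,
            }
    return index


registry = Registry()
registry.add_tmodel("DiGIR")
registry.register(Provider("museum-a", "participant-x", ["1", "2"]))
registry.endorse("museum-a")
index = build_index(registry)
print(sorted(index))  # ['museum-a:1', 'museum-a:2']
```

Steps 5 and 6 would then be portals and search engines reading from `registry` and `index`; they are left out because the slide does not describe their interfaces.
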
Shall we register this new DiGIR provider in GBIF UDDI Registry
(http://registry.gbif.net)?
Registration is necessary so that users, search engines, and portals can
locate your provider. By registering with GBIF you agree that your defined
DiGIR-Provider-resources can be queried by applications using the DiGIR
protocol, within the limitations that you can set up later and modify as
needed. Furthermore, you allow GBIF to publicly serve metadata describing
your data.
The GBIF Participant Node in your domain will be informed of your
registration.

• Supported by [email protected]
• Turn-key package
• Based on PHP and DiGIR project code
• Available now for Linux and Windows
• Registration with GBIF UDDI registry

Data Repository Tool
• Upload and manage datasets in document format such as spreadsheet and XML
• Parses the data into embedded MySQL database that becomes available to the public as a DiGIR resource
• Revoke release (data is deleted from database)
• Enable data custodians to manage and publish their own data
• Make available a simple data warehouse tool for those who want to host datasets for the community

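A minimal sketch of the upload, publish and revoke cycle described above. It makes several assumptions: Python's built-in `sqlite3` stands in for the embedded MySQL database, the upload is CSV text, and the table layout, column names, function names and sample rows are all invented for illustration. Exposing the table as a DiGIR resource is not shown.

```python
import csv
import io
import sqlite3

# Hypothetical table layout; the slide does not describe the real schema.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS records "
    "(dataset TEXT, catalog_number TEXT, scientific_name TEXT)"
)


def publish(conn, dataset, csv_text):
    """Parse an uploaded spreadsheet (CSV text) and load it into the database."""
    conn.execute(SCHEMA)
    rows = [
        (dataset, row["catalog_number"], row["scientific_name"])
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def revoke(conn, dataset):
    """Revoke a release: delete the dataset's rows from the database."""
    deleted = conn.execute(
        "DELETE FROM records WHERE dataset = ?", (dataset,)
    ).rowcount
    conn.commit()
    return deleted


conn = sqlite3.connect(":memory:")
# Invented sample upload with two records.
upload = (
    "catalog_number,scientific_name\n"
    "44622,Papilio machaon\n"
    "44623,Vanessa atalanta\n"
)
print(publish(conn, "demo", upload))  # 2
print(revoke(conn, "demo"))           # 2
```
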
What is my institution code and collection code?
• The most common question at helpdesk and training now.
• Why should each record have a globally unique ID?
  - To trace downloaded data back to the original source
  - Support data usage accounting services
  - To allow for updates/corrections
  - To create a URL to view the data
• How: LSID/URN with 4+ elements
  - Format network:institution:collection:key
  - Example gbif:helsinki.fi:Lepi-SPS:44622
• Issues
  - Global lists of institution and collection codes for UDDI?
  - Work with Index Herbariorum & others to standardise codes and related information exchange

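The four-element format above can be handled with a few lines of code. The sketch below is illustrative only: `RecordId` and `parse_record_id` are hypothetical names, and the handling of extra colons inside the key is an assumption, since the slide only says "4+ elements".

```python
from typing import NamedTuple


class RecordId(NamedTuple):
    network: str
    institution: str
    collection: str
    key: str

    def __str__(self) -> str:
        # Rebuild the network:institution:collection:key form.
        return ":".join(self)


def parse_record_id(text: str) -> RecordId:
    """Split 'network:institution:collection:key' into its four elements."""
    # Split on the first three colons only, so a key that itself contains
    # a colon stays whole (an assumption; the slide does not cover this case).
    parts = text.split(":", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(
            f"expected network:institution:collection:key, got {text!r}"
        )
    return RecordId(*parts)


rid = parse_record_id("gbif:helsinki.fi:Lepi-SPS:44622")
print(rid.institution, rid.collection, rid.key)  # helsinki.fi Lepi-SPS 44622
print(str(rid))  # gbif:helsinki.fi:Lepi-SPS:44622
```

Because the institution and collection codes are separate fields, downloaded records can be grouped by source, which is what tracing and usage accounting need.
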
GBIF node responsibilities
[Figure: numbered responsibilities for three tiers: GBIF (Registry, Index, and Portal), Participant Node (with Portal), and Data Node. Items named on the slide: Coordination; Network; Registry; Standards; Tools; Metadata; Data; Consolidated Data; Identify Data Nodes; Endorse and quality assure data nodes; National Language Interfaces; Encourage participation; Manage registration of Data Nodes.]

Participants are currently working on their network topologies
[Figure: GBIF Registry, GBIF Index and GBIF Portal shown with Participant Portals A, B and C. Topology labels on the slide: Decentralised; Centralised; Data Warehouse (twice).]

GBIF network status
• NODES committee set its goal to have a DiGIR network up and running by end of 2003
• GBIF UDDI registry is up and running
• Seven regional workshops and training events
• Two DiGIR provider implementations available
• Think of it as The One Global Marketplace of biodiversity data and services
• Register your data now
• Global index Q4/2003
• Portal to browse and search data Q4/2003, toolkit Q1/2004
[email protected]

GBIF Development Status
Presentation by Donald Hobern
GBIF Program Officer for Data Access and Database Interoperability

GBIF Aggregated Data Portal
[Figure: portal architecture diagram. Components labelled on the slide: HTML (Forms) Interface; User Feedback Service; Presentation Service; Biodiversity Access Framework (XML Services); Session Manager; Index Manager; Search Engine; Taxonomic Name Service; Registry Manager; Data Connection Framework. Data stores labelled: Service Metadata; Index Data; Biodiversity Data; Name Data; Access and Feedback Data; Geographic Service Data.]

Key Standards and Elements
• XML data exchange based on Providers, Services and Biodiversity Data Records
• UDDI registry for technical (access) metadata
• Descriptive metadata retrieved through service interfaces
• Specimen/observation exchange using DiGIR-Darwin Core or BioCASE-ABCD
• Taxonomic name data from Catalogue of Life (annual checklist for first release, moving to service-based approach as possible)
• Java (and JSP) components being developed centrally for GBIF Portal
• Current portal development using Tomcat, Xerces, Log4J, MySQL
• Components to be packaged for reuse as appropriate

Implementation Plan for GBIF Network
[Figure: timeline chart running from August 2003 to February 2004, with a legend distinguishing GBIF Secretariat activities from GBIF Participant activities. Activities named on the chart:]
• Software Packages for Data Providers available
• Registry of Data Nodes established
• Training Courses in use of Software Packages
• GBIF Participants establish Data Nodes
• Registration tools tested
• GBIF Participants register Data Nodes
• Data Indexing tools tested
• GBIF Portal offers access to Biodiversity Data
• Usage Reporting tools tested
• GBIF Portal reports data usage to Data Nodes

DADI Work Programme 2004
• Support development of DiGIR/BioCASE tools (seeking to merge protocols)
• Encourage inclusion of native DiGIR/BioCASE support in key collection/observation databasing packages
• Investigate special requirements for observational data sets
• Work with TDWG Taxonomic Names Subgroup on standards for exchange of taxonomic name data
• Work to integrate nomenclators into names architecture
• Development of metadata standard and tools
• Work with TDWG Spatial Data Subgroup to plan (and start developing) GIS infrastructure for biodiversity data
• Investigate long-term integration of ecological and genetic data levels
• Investigate long-term integration of Structured Descriptive Data
• Investigate long-term models for Digital Biodiversity Literature

DIGIT and ECAT in the GBIF work programmes 2003 and 2004
Presentation by Larry Speers

DIGIT - Goals
• To facilitate access to data associated with the specimens in the world's natural history collections
• To identify efficient and cost effective ways to organize and accelerate the specimen digitization process
• To facilitate the sharing of specimen data with users in the developing world
• To facilitate the advancement of biodiversity science through improved access to primary species occurrence data

DIGIT 2003 Work Program
Purpose: Use a request for proposals (RFP) to award seed money to partially fund different categories of digitisation projects, including some new projects and some projects aimed at improving existing databases and bringing them on-line.
Time Period: Up to 18 months from date of award

DIGIT 2003 Work Program Results
• 139 Pre-proposals received
• Full Proposals requested from 39 Pre-proposal applicants
  - 2 decided not to submit
  - 1 submitted late and not included in the final review
• 36 Full Proposals Reviewed
• A total of $1,628,770 (US) requested
• Proposals included projects for imaging of type specimens, digitising specimen labels, improving data quality, digitisation tool development

DIGIT Proposal Review Criteria
• Proposals were evaluated for scientific excellence.
• In addition, they were evaluated on how well they supported the GBIF philosophy:
  - Potential for the earliest possible access to large data sets
  - Potential for networking and building increased 'Natural History Collections Community' collaboration
  - Potential for testing and documenting digitisation 'Best Practices'
  - Emphasis on data-sharing with countries of origin
  - Potential for international collaboration
  - Potential for leveraging additional long term funding to support the specimen digitisation process
  - Components for training and capacity building

DIGIT 2003 Work Program Results
• Review committee recommendation
  - If funds available, fund 33 of the 36 proposals
  - Funding was available to fund the top 17 of the 33 recommended proposals
• Taxonomic coverage of top 17 proposals
  - 1 tool development project
  - 5 vascular plant collection digitisation projects
  - 5 zoological collection digitisation projects
  - 4 projects involving digitisation of both zoological and botanical collections
  - 1 project each in mycology and bryophyte collections digitisation
• Abstracts of funded projects are available on GBIF Website (www.gbif.org)

DIGIT 2003 Work Program Results
• DIGIT seed money grants will leverage approximately $2.8 million (US) in investment in Natural History Collections digitisation activities
• DIGIT seed money awards should, by the end of 2004:
  - Make more than 1 million new specimen records available for internet access
  - Make more than 70,000 records of type specimens including images available for internet access
  - Geo-reference, quality check and make available 800,000 existing specimen records

ECAT Proposal Review Criteria
• Proposals were evaluated for scientific excellence.
• In addition, they were evaluated on how well they supported the GBIF philosophy:
  - Likelihood to produce results within a limited timeframe, and potential for the earliest possible access to large data sets
  - Feasibility
  - Cost-effectiveness -- low cost per name
  - Collaboration among institutions and/or organisations
  - Potential for networking and building collaborative networks of taxon specialists, with potential for training and capacity-building
  - Linkage with existing projects (including projects with a DIGIT aspect)

ECAT RFP Results
• 67 Pre-proposals
• Requested Full-Proposals from 32 applicants
• 32 Full-Proposals Reviewed
• A total of $1,375,095 (US) requested
• Proposals included digitising of printed catalogues, GSD development at various stages, checklist writing, and programming of "wrappers"

ECAT RFP Results
• Review Committee recommendation
  - If funds available, fund 28 of the 32 proposals
  - Funding was available to fund the top 12 of the 28 recommended proposals
• Taxonomic coverage of top 12 proposals
  - 2 vascular plant GSD projects
  - 2 projects developing nomenclators for animals and fungi
  - 6 projects building GSDs of insect groups, e.g. Weevils, Flies
  - 1 regional mollusc GSD
  - 1 to develop tool for producing "wrappers"
• Abstracts of funded projects available on GBIF website

ECAT RFP Results
• ECAT seed money grants will leverage approximately $7.25 million (US) investments in nomenclators, GSDs and networking activities
• ECAT seed money awards should by the end of 2004 result in
  - Addition of 701,000 names to existing GSDs and nomenclators
  - Addition of 366,850 species to existing and newly emerging GSDs
  - all available through GBIF Network
  - very conservative estimate

DIGIT - ECAT 2004 - Request for Proposals
• Call for proposals will be announced on GBIF web site in late 2003
• Process and criteria currently being developed with input from DIGIT and ECAT Science Sub-Committees
• Expect to fund approximately 9 projects for DIGIT and 6 projects for ECAT
• Maximum award $50K US
• Process and criteria expected to be similar to 2003 seed money RFP

Guidelines
• GBIF's global perspective gives it a unique role.
• As a general principle, GBIF should support projects that contribute to the development of a global biodiversity information infrastructure.
• In particular, it should fund the development of datasets, networks or tools that are needed for the global effort but are difficult for local or regional funding agencies to support.