GBIF4scienceMeetings

GLOBAL BIODIVERSITY INFORMATION FACILITY
GBIF efforts in digitizing and mobilising primary biodiversity data
Vishwas Chavan and Nicholas King
February 12, 2008
[email protected]
WWW.GBIF.ORG
GBIF’s Mission
…to make the world’s biodiversity data freely
and universally available via the Internet
What is biodiversity?
GBIF follows the broadly outlined CBD recognition of
levels of biological diversity:
• Molecules / genes
• Species
• Ecosystems / ecology
Who needs primary biodiversity data?
• Scientists, experts, consultants
• Government officials at all levels
• Farmers, foresters, indigenous communities
• Education at all levels
• NGOs and the general public
These needs are highly varied, but can be met by open access to the same datasets
The same data can be analysed differently for different uses
But this needs easy access to (digitised) data
Screen shot: 26 Oct 2007
As of end 2007, GBIF facilitates access to 142 million primary data records
GBIF Data Portal: Dispelling Myths!
Distributed, decentralised data discovery and access through a network of heterogeneous and multicultural partners is possible!




• Searches
    - Taxonomic
    - Geographic (by country or bounding-box)
    - By dataset
• Taxonomic browse navigation using choice of classification
• Integration of data: DiGIR-Darwin Core & BioCASe-ABCD (new versions), TAPIR, tab-delimited, TCS, SDD
• Search and download by one to many species, geography, dataset (or combination)
• Web services
Countries are listed alphabetically on the left-hand side, with the number of national records shown on the right-hand side.
Here we can see that more than 3.2 million records are available for South Africa (2.8 million with coordinates), referring to nearly 41,500 species
Example of a country
summary page. This map
provides an overview of the
density of records currently
available.
Sample of records available for
South Africa at September 2007.
The GBIF portal offers a range of
options for further use of the
data…
It is also possible to get the full list of
organisations providing data collected in a
specific country or region
In this case 68 collections from
all over the world are making
available data for South Africa
through GBIF – a good exemplar
of data repatriation activities
promoted and facilitated by
GBIF
South African institutions are also
providing data relevant to other
countries and regions in the
world, as demonstrated in this
example from the Shark
Collection at the Iziko South
African Museum
The GBIF data portal also allows for
more detailed views of regions,
datasets, taxonomic groups, etc.
Here it is possible to see nearly
100 000 records from the Linefish
dataset collected in 1989 by the
Marine and Coastal Management
(MCM) at the Department of
Environmental Affairs and Tourism
in South Africa
Exporting data from
the GBIF data portal to
other applications
such as Google Earth is
a matter of a click!
Coverage for Africa
• >5m records currently for Africa
• >1m from EU country institutions
• Estimated >100m not yet digitised
Within Google Earth overlays it is also
possible to go down to the level of
individual primary records, getting back
to the original data provider
With the filter functionality it is possible to
perform complex queries on the data.
In this example we are looking for all records of Lepidoptera (butterflies and moths) collected or observed in South Africa from 1950 to 2000.
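A combined filter of this kind can be sketched in a few lines of Python. This is only an illustration of the logic (taxon, country and an inclusive year range), not the portal's implementation; the `Occurrence` class, its field names and the sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Occurrence:
    # Hypothetical, simplified record structure for illustration.
    scientific_name: str
    order: str
    country: str
    year: Optional[int]


def filter_occurrences(records, order, country, year_from, year_to):
    """Keep records matching order and country within the inclusive year range."""
    return [
        r for r in records
        if r.order == order
        and r.country == country
        and r.year is not None          # undated records cannot match a year filter
        and year_from <= r.year <= year_to
    ]


# Hypothetical sample records, not taken from the portal.
records = [
    Occurrence("Papilio demodocus", "Lepidoptera", "South Africa", 1975),
    Occurrence("Papilio demodocus", "Lepidoptera", "South Africa", 2005),
    Occurrence("Danaus chrysippus", "Lepidoptera", "Kenya", 1980),
    Occurrence("Protea repens", "Proteales", "South Africa", 1960),
    Occurrence("Danaus chrysippus", "Lepidoptera", "South Africa", None),
]

matches = filter_occurrences(records, "Lepidoptera", "South Africa", 1950, 2000)
```

Only the first sample record satisfies all three conditions; the others fail on year, country, taxon or a missing date.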
Range changes due to Climate Change
Proteaceae in the Cape Floral Kingdom
Leucospermum tomentosum: range centres in 10 year time slices

                        Present   20%      40%      60%      80%      100%
Distances moved (km)    0         25.3     20.0     17.2     46.4     17.4
Average altitude (m)    88.57     113.83   137.93   194.85   269.91   296.06
Average latitude (°S)   33.21     33.43    33.59    33.72    33.98    34.09
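The overall shift implied by the table can be worked out directly from its values. The sketch below assumes that each "distance moved" is the displacement between consecutive columns, so that summing them gives the total path length; the slide does not state this explicitly.

```python
# Values copied from the Leucospermum tomentosum table.
steps = ["Present", "20%", "40%", "60%", "80%", "100%"]
distance_km = [0, 25.3, 20.0, 17.2, 46.4, 17.4]
altitude_m = [88.57, 113.83, 137.93, 194.85, 269.91, 296.06]
latitude_s = [33.21, 33.43, 33.59, 33.72, 33.98, 34.09]

# Assumption: distances are per-step displacements, so their sum is the path length.
total_distance_km = round(sum(distance_km), 1)

# Net change between the first and last columns.
altitude_gain_m = round(altitude_m[-1] - altitude_m[0], 2)
latitude_shift_deg = round(latitude_s[-1] - latitude_s[0], 2)

print(f"Total distance moved: {total_distance_km} km")
print(f"Altitude gain: {altitude_gain_m} m")
print(f"Southward shift: {latitude_shift_deg} degrees")
```

Under that assumption the range centre moves about 126 km in total, climbing roughly 207 m and shifting 0.88° further south.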
But this is just a beginning...
We need to cover far more than we can imagine, and much faster than we think.
Biological Data Domain - challenges

Sub-domain: Molecular Sequence & Gene/Genome Data
    Digital Status: 95% digital
    Data Status: Persistent digital, universally accessible data stores
    Greatest Informatics Problems: Data migration, cleansing, vouchering, taxonomy (gene & species)

Sub-domain: Species- & Specimen Data
    Digital Status: <5% digital
    Data Status: Persistent physical data stores, accessible with difficulty
    Greatest Informatics Problems: Digitisation, migration of legacy data, indexing

Sub-domain: Ecological & Ecosystem Data
    Digital Status: 80% ? digital
    Data Status: Persistent digital and physical data stores, moderately accessible
    Greatest Informatics Problems: Migration of legacy data, metadata generation, taxonomy (species)
Primary Biodiversity Data
• Both biodiversity and biodiversity data are unevenly distributed around the world
[Figure: Biodiversity versus Data in the Developed World and the Developing World]
• Digital Divide
• Content Divide
• Lingual Divide
• Knowledge Divide
Emerging catastrophe...
Primary Biodiversity Data
[Diagram: NAMES alongside Biological Collections, Observations / Monitoring, and Multimedia Resources]
Growth rate of GBIF data sharing
[Figure: Growth in Data Sharing, Oct 2003 - Oct 2007. Axes: Data Providers (0 to 250) and Data Records in millions (0 to 160); series: Providers, Records.]
1 Billion Records by 2008 – We need to expedite!
[Figure: Goal for Growth in Occurrence Data* by End 2008. Axes: Data Providers (0 to 1800) and Data Records in millions (0 to 1,000); series: Providers, Records; time axis from Oct-03 to Dec-08.]
• Many specimens remain to have their data digitised
• Many records are already digital... but are not yet being shared
* data useful in analyses that contribute to sustainable management of biodiversity
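To give a sense of scale for this goal, the short calculation below compares the 142 million records reported for the end of 2007 with the 1 billion target for the end of 2008. The compound monthly rate assumes steady growth over 12 months, which is an illustrative simplification and not a GBIF projection.

```python
current_millions = 142.0   # records accessible at end of 2007 (from the slides)
target_millions = 1000.0   # 1 billion records targeted by end of 2008

# How many times larger the index would need to become.
growth_factor = target_millions / current_millions

# Records still to be mobilised, in millions.
additional_millions = target_millions - current_millions

# Assumption: steady compound growth over 12 months.
monthly_rate = growth_factor ** (1 / 12) - 1

print(f"Growth factor needed: {growth_factor:.2f}x")
print(f"Additional records needed: {additional_millions:.0f} million")
print(f"Implied compound monthly growth: {monthly_rate:.1%}")
```

The index would have to grow roughly sevenfold in one year, compared with the growth to 142 million achieved between 2003 and 2007.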
GBIF is all about our shared vision and partnership
• 28 Voting Country Participants
• 15 Associate Country Participants
• 35 International Organisations and Economies
GBIF Working Principles
• Collaboration and sharing, not compilation
• Ownership of data (specimens or names) remains entirely with providers
• Standardised schemata for data sharing; software free to providers
• Worldwide network of collaborating institutions that share data (data providers)
• GBIF’s Participants’ Nodes promote and coordinate activities of data providers
GBIF Working Principles
• Procedures for interoperability and data integration
• Web services (mostly for machines, but for people too)
• Global registry for advertisement of shared data
• Vision and coordination
• GBIF has a unique global mandate in both Informatics and Content
• GBIF is a multi-purpose, open-ended cyberinfrastructure that helps biologists serve biodiversity and society in new ways
GBIF Strategic Areas 2007 – 2011
• Informatics
    - Data portal powerful and friendly
    - Consolidated infrastructure and standards
    - Tools and support for Nodes and providers
• Content
    - Data quantity and richness in priority areas
    - Data integration and discovery
    - Documented data quality
• Participation
    - Nodes' expertise shared across the network
    - Guidance on setting up and maintaining Nodes
Data: Fitness for Use
• In a database, the data have no actual quality or
value; they only have potential value. That value is
realized only when someone uses the data to do
something useful (English 1999).
• The quality of data cannot be assessed
independently of the uses of that data (Strong et al.
1997).
• Data are of high quality if they are fit for their
intended use in operations, decision-making, and
planning (Juran 1964).
Data standards / protocols used by GBIF
• Darwin Core (TDWG data standard)
    - Simple XML data model to represent taxon occurrence records (only core attributes)
    - Extensions to handle e.g. curation details, geospatial data, microbial specimens
• ABCD - Access to Biological Collection Data (TDWG data standard)
    - More complex XML data model to represent collection or observation data
    - Detailed document structure including features for different communities
• Taxon Concept Schema (TDWG data standard)
    - XML data model for exchange of nomenclatural/taxonomic data
    - Will be supported in new GBIF data portal
• Tab-delimited links to species information
    - Lists of scientific names, URLs and key words
    - Will be supported in order to establish links to external resources from the new GBIF data portal
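As an illustration of the tab-delimited option, the sketch below reads a list of scientific names, URLs and key words with Python's standard `csv` module. The column order, the semicolon separator for key words and the `example.org` URLs are assumptions made for this example; the slides do not specify the actual file layout.

```python
import csv
import io

# Hypothetical sample; the column order (name, URL, key words) is an assumption.
SAMPLE = (
    "Leucospermum tomentosum\thttp://example.org/species/1\tProteaceae;fynbos\n"
    "Papilio demodocus\thttp://example.org/species/2\tLepidoptera\n"
)


def read_species_links(text):
    """Parse tab-delimited rows into a dict keyed by scientific name."""
    links = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        name, url = row[0].strip(), row[1].strip()
        # Key words are optional; split on ';' when present (assumed separator).
        keywords = row[2].split(";") if len(row) > 2 and row[2] else []
        links[name] = {"url": url, "keywords": keywords}
    return links


links = read_species_links(SAMPLE)
```

A portal could use such a mapping to attach external links to each scientific name it indexes.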
Data standards / protocols used by GBIF
• DiGIR / BioCASe / TAPIR (TDWG access protocols)
    - XML protocols for searching remote data resources
    - Suitable for use with a wide range of different data models
    - TAPIR (latest version) supports flexible views and simple URLs
• SPICE protocol (Species 2000 access protocol)
    - Web service interfaces for exploring taxonomic data (hierarchies, synonymy, common names)
    - Will be supported for connecting data resources to new GBIF data portal
• LSIDs – Life Science Identifiers (TDWG-adopted GUID mechanism)
    - Globally unique identifiers to simplify tracking data records
    - Include protocol for resolving data for any LSID
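An LSID is a URN of the form `urn:lsid:authority:namespace:object`, with an optional revision as a sixth component. The minimal sketch below only splits and checks that syntax. It does not implement the resolution protocol, and the `example.org` identifier is hypothetical.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not a valid LSID: {text!r}")
    # The 'urn:lsid' prefix is treated as case-insensitive here.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not a valid LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty LSID component in: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier, for illustration only.
lsid = parse_lsid("urn:lsid:example.org:names:12345")
```

Keeping the authority and namespace as separate fields makes it easy to group records by the institution that issued the identifier.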
Examples of resources provided free by GBIF
GBIF Training Manual 1: Digitisation of Natural History Collections
CONTENTS
• Introduction
• The Uses of Primary Species Occurrence Data
• Initiating a Natural History Collection Digitisation Project
• Principles of Data Quality
• Principles and Methods of Data Cleaning
• BioGeomancer Guide to Best Practices for Georeferencing
• Guide to Best Practices for Generalizing
• Glossary and Acronym Expansion
To be released by end February 2007.
Observational Data Task Force
• The volume of observational data is unprecedented
• Over 60% of GBIF-mediated data is observational
• Observational Data Task Group
    - Advise GBIF on mobilisation of observational data
    - Criteria for an observational data sharing infrastructure
    - Metadata schema for observational data
    - Protocols / standards for observational data exchange / sharing
    - Best practices guide for observational data management
    - Encourage participation of potential data providers
• Report by September 2008
Enhanced support for data providers
Broader range of supported import formats and protocols
• Occurrence data
    - Darwin Core (original v1.2, MaNIS, OBIS, new v2.0 with extensions)
    - ABCD (v1.20, v2.06)
• Taxonomic data
    - Catalogue of Life CD-ROM (moving to dynamic checklist)
    - Nomenclators via tab-delimited lists of LSIDs (work under way)
    - Data from ECAT projects (models and tools under way)
• Other resources
    - Discussions under way with other resources (GenBank, BOLD, ARKive)
    - General support for handling XML and tab-delimited formats
Enhanced support for data providers
• Validation and annotation of data during indexing
    - Presence of required fields
    - Consistency between country name and coordinates
    - Reports for data providers
• Clear separation between “raw” and “processed” index data
    - Scientific name string versus interpreted taxon
    - Country name string versus interpreted country
• “Home page” for each data resource
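Two of the checks named above, presence of required fields and consistency between country name and coordinates, can be sketched as follows. The field names, the list of required fields and the rough bounding box for South Africa are all assumptions for illustration; they are not GBIF's actual rules. The raw record is left untouched and issues are reported separately, in the spirit of keeping "raw" and "processed" data apart.

```python
# Hypothetical set of required fields.
REQUIRED_FIELDS = ("scientific_name", "country", "latitude", "longitude")

# Rough illustrative bounding boxes (min_lat, max_lat, min_lon, max_lon);
# approximate mainland extent only, not authoritative.
COUNTRY_BOXES = {
    "South Africa": (-35.0, -22.0, 16.0, 33.0),
}


def validate(record):
    """Return a list of issue strings; the raw record itself is not modified."""
    issues = [f"missing {f}" for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    box = COUNTRY_BOXES.get(record.get("country"))
    lat, lon = record.get("latitude"), record.get("longitude")
    # Only compare when the country is known and both coordinates are numeric.
    if box and isinstance(lat, (int, float)) and isinstance(lon, (int, float)):
        min_lat, max_lat, min_lon, max_lon = box
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            issues.append("coordinates outside country bounding box")
    return issues


# Hypothetical records; the longitude is invented for the example.
good = {"scientific_name": "Leucospermum tomentosum", "country": "South Africa",
        "latitude": -33.21, "longitude": 18.3}
sign_error = dict(good, latitude=33.21)   # southern latitude entered as northern
incomplete = {"scientific_name": "", "country": "South Africa",
              "latitude": None, "longitude": None}

report = {name: validate(rec) for name, rec in
          [("good", good), ("sign_error", sign_error), ("incomplete", incomplete)]}
```

A per-provider report could then be assembled from these issue lists, with a latitude sign error being a typical case that the country check catches.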
Training, Capacity Building, Mentoring

Training programs on how to share data

Training on Ecological Niche Modeling

Mentoring to developing countries

Help Desk services
Call for Action!
With GBIF’s decentralised approach of NBIFs, RBIFs, and ThBIFs, Africa has lots to contribute...
at individual, institutional, national, regional and global level!
How to contact GBIF:
Web site: www.gbif.org
Data portal: www.gbif.net
GBIF Secretariat
Universitetsparken 15
2100 Copenhagen
Denmark
E-mail: [email protected]
Phone: +45 3532 1470
Fax: +45 3532 1480
GBIF Secretariat building, supported by a grant from
the Aage V. Jensens Fonde
Merci beaucoup / Thank you
Questions?