Evolving concepts in the architecture of OBIS, the Ocean Biogeographic Information System
Tony Rees, CSIRO Marine Research
Phoebe Zhang, Rutgers University, N.J.
29 November 2004
OBIS Basics...
OBIS – the Ocean Biogeographic Information System
 Single access point for distribution records for marine species from multiple
sources over the internet, with onward access to analytical tools and maps
(including correlations with environmental data etc.) – ultimately to be a 3-d
and 4-d atlas of marine species distributions
 Designated role as the data and information management component of the
Census of Marine Life (CoML operational lifetime: 2000–2010)
OBIS (brief!) history
 Vision developed during a series of workshops, 1999–2000
 8 initial data providers funded to develop content for OBIS, 2000–2001
 Initial version of OBIS Portal went “live” in January 2002 – based at Rutgers
University, N.J. (www.iobis.org)
 Additional technical development and expanding content, 2002 – ongoing
Focus of this talk...
 “Behind the scenes” look at OBIS architecture, and how the OBIS Portal has
evolved over the past 2+ years – focus on features supporting user searches.
OBIS Version 1 (Jan 2002–Feb 2004)
[Architecture diagram: www users 1, 2, 3 (etc.) connect to the OBIS Portal, which connects to data providers 1, 2, 3 (etc.), each behind a custom database wrapper. Flow: 1: submit search request; 2: retrieve matching data; 3 (optional): pass to 3rd party tools for mapping, etc. (C-squares mapper, Mapping tool 2, Mapping tool 3).]
Advantages, disadvantages of this approach
Advantages
 Technically simple to implement; portal simply relays queries and processes / integrates results
 No custodianship issues – data remain with providers
Disadvantages
 System performance reliant on factors outside portal’s control – provider/s
may be slow, off line, etc. at query time
 No prior knowledge of what species are worth querying on, how much data
will be returned, etc.
 Have to wait for all data to be returned before passing to mappers, etc.;
species handled serially (one at a time)
 Spatial searches slow (have to parse millions of point data records)
 No facility to search by taxonomic group, search for “near matches”, etc.
(such facilities not supported at provider end)
Analogy with internet search engines
(e.g. Google ® etc.)
... Would be impossibly slow to search 8 billion web pages in real time
to service a user’s initial request
- Google, etc., construct locally held indexes (e.g. sorted by term etc.)
and search these in the first instance – can provide very fast results
- Indexes are constructed by continuously crawling the web for new or
updated content (note: may be a currency issue here)
- Also, Google constructs a local data cache of all content – remains
accessible even if original provider is off line.
--------------------------------
Equivalents for OBIS: a name index and a spatial index – to support
name / category, and spatial searching; also a locally held data cache.
- Cache is built by crawling the remote data providers (and refreshed at
intervals)
- Name and spatial indexes are built by parsing the Cache content.
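The relationship between the Cache and the two indexes can be sketched as follows. This is a minimal illustration under stated assumptions, not the OBIS implementation: the record layout, the sample records and the function name `build_indexes` are all hypothetical, and grid cells are labelled by simple integer indices rather than real c-squares codes.

```python
from collections import defaultdict
import math

# Hypothetical cached point records: (species name, latitude, longitude).
cache = [
    ("Balaenoptera acutorostrata", 59.7, -9.8),
    ("Balaenoptera acutorostrata", 59.6, -9.9),
    ("Balaenoptera acutorostrata", 52.1, -3.4),
    ("Lutjanus campechanus", 27.3, -90.2),
]

def build_indexes(records, cell_deg=0.5):
    """Parse cached records once to build a name index and a spatial index."""
    name_index = defaultdict(int)      # species -> number of records
    spatial_index = defaultdict(set)   # species -> set of occupied grid cells
    for species, lat, lon in records:
        name_index[species] += 1
        # Label each cell by the integer indices of its south-west corner.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        spatial_index[species].add(cell)
    return dict(name_index), dict(spatial_index)

name_index, spatial_index = build_indexes(cache)
```

Because the spatial index stores sets of cells, the two records that fall in the same 0.5-degree cell collapse to a single entry, which is where the later efficiency gains come from.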
The OBIS Index
The OBIS Index – actually a small relational database
 Main table “obis_species” has 1 row for every species name in OBIS (currently c. 110,000), i.e. substantially fewer rows than the overall number of records in the system (currently 5m+ and rising)
 “obis_groups” table has the custom taxonomic hierarchy currently used in OBIS
 “obis_distributions” table has the spatial index
The Index is a higher level concept than the GBIF index (which is a data item level listing); it is intended for rapid, concise taxon-level information to be returned prior to any actual data extraction.
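A rough sketch of how such an index database might be laid out, using SQLite from Python. Only the three table names come from the slide; every column name, the group code `M01`, the `|` separator and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table names follow the slide; every column name is an assumption.
    CREATE TABLE obis_groups (
        group_code TEXT PRIMARY KEY,
        group_name TEXT NOT NULL
    );
    CREATE TABLE obis_species (
        tax_id       INTEGER PRIMARY KEY,
        species_name TEXT NOT NULL,
        group_code   TEXT REFERENCES obis_groups(group_code),
        record_count INTEGER NOT NULL
    );
    CREATE TABLE obis_distributions (
        tax_id   INTEGER REFERENCES obis_species(tax_id),
        csquares TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO obis_groups VALUES ('M01', 'whales')")
conn.execute(
    "INSERT INTO obis_species VALUES (1, 'Balaenoptera acutorostrata', 'M01', 1700)")
conn.execute("INSERT INTO obis_distributions VALUES (1, '7500:499:4|7500:499:3')")

# Taxon-level information comes back without touching any point data.
row = conn.execute("""
    SELECT s.species_name, g.group_name, s.record_count
    FROM obis_species AS s JOIN obis_groups AS g USING (group_code)
    WHERE s.species_name LIKE 'Balaenoptera%'
""").fetchone()
```

The query touches one row per species rather than one row per data record, which is the point of keeping the index separate from the data.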
Fragment of the “OBIS groups”
(taxonomic categories) table
 Intention is to provide popular / recognisable groupings (not necessarily
equivalent to strict phylum, order, family treatment)
 Hierarchical coding allows simple interrogation at any level of the hierarchy.
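One way hierarchical coding can support interrogation at any level is prefix matching, sketched below. The codes, group names and species allocations are invented for illustration (the actual OBIS codes are not shown in this transcript), and `species_in_group` is a hypothetical helper.

```python
# Hypothetical group codes: each level appends two digits to its parent's code.
groups = {
    "10": "fishes",
    "1001": "bony fishes",
    "1002": "sharks and rays",
    "20": "marine mammals",
    "2001": "whales",
    "2002": "seals and sea lions",
}

# Hypothetical species-to-group allocation (always to the most specific group).
species_group = {
    "Lutjanus campechanus": "1001",
    "Carcharodon carcharias": "1002",
    "Balaenoptera acutorostrata": "2001",
}

def species_in_group(code):
    """Return species whose group code starts with `code`, at any level."""
    return sorted(name for name, g in species_group.items() if g.startswith(code))
```

Under this scheme `species_in_group("10")` returns every fish regardless of subgroup, while `species_in_group("2001")` returns only the whales.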
Underpinning the spatial index ...
allocation to global 0.5° x 0.5° grid squares, using “c-squares” hierarchical notation...
[Map figure: the area from 50° N to 60° N and 010° W to 000° W, showing nested squares labelled 7500, 7500:4, 7500:499 and 7500:499:4 (10°, 5°, 1° and 0.5° squares respectively).]
 0.5° x 0.5° squares (shown in red) are current units of spatial indexing for OBIS
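The encoding of a position into the nested codes shown on this slide can be sketched as below. This is an illustrative reading of the c-squares notation rather than the code used by OBIS; the function name `csquare` and its `resolution` parameter are assumptions.

```python
def csquare(lat, lon, resolution=0.5):
    """Return the c-squares code for a point at 10, 5, 1 or 0.5 degree resolution."""
    if resolution not in (10, 5, 1, 0.5):
        raise ValueError("resolution must be 10, 5, 1 or 0.5")
    # Global quadrant digit: 1 = NE, 3 = SE, 5 = SW, 7 = NW.
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7
    else:
        quadrant = 3 if lon >= 0 else 5
    # Work with absolute values, nudging the poles and date line inside the grid.
    alat = min(abs(lat), 89.999999)
    alon = min(abs(lon), 179.999999)
    code = f"{quadrant}{int(alat // 10)}{int(alon // 10):02d}"
    if resolution == 10:
        return code
    rlat, rlon = alat % 10, alon % 10          # position within the 10-degree square
    code += f":{2 * int(rlat // 5) + int(rlon // 5) + 1}"
    if resolution == 5:
        return code
    code += f"{int(rlat)}{int(rlon)}"          # units digits of latitude, longitude
    if resolution == 1:
        return code
    flat, flon = rlat % 1, rlon % 1            # position within the 1-degree square
    return code + f":{2 * int(flat // 0.5) + int(flon // 0.5) + 1}"
```

For a point at 59.7° N, 9.8° W the four resolutions give `7500`, `7500:4`, `7500:499` and `7500:499:4`, matching the labels on the map.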
Fragment of the spatial index...
 Index “knows” all the squares within which any species has data records
(out of global set of 259,200)
 Now simple to retrieve either:
- all species for a given square (at any level of the hierarchy)
- all squares for a given species
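The figure of 259,200 squares and the two retrievals can be illustrated with a small sketch. The species : square pairs and both helper functions are hypothetical; prefix matching on the code is one simple way to query "at any level of the hierarchy".

```python
# Number of 0.5-degree squares covering the globe: 720 columns x 360 rows.
TOTAL_SQUARES = int(360 / 0.5) * int(180 / 0.5)

# Hypothetical index content: (species, 0.5-degree c-square) pairs.
pairs = [
    ("Balaenoptera acutorostrata", "7500:499:4"),
    ("Balaenoptera acutorostrata", "7500:360:1"),
    ("Lutjanus campechanus", "7209:370:1"),
]

def species_in_square(square):
    """All species with data in `square`, which may be a code at any level."""
    return sorted({sp for sp, code in pairs if code.startswith(square)})

def squares_for_species(species):
    """All indexed squares for one species."""
    return sorted(code for sp, code in pairs if sp == species)
```

Querying with `7500` covers the whole 10-degree square, while `7500:4` or `7500:499` narrows the same lookup to a 5-degree or 1-degree square without any coordinate arithmetic.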
Spatial index also supports high level mapping direct from the index
 “Quick maps” use the CSIRO Marine Research c-squares mapper, accessed via a web call
 Again, efficiencies here in only sending the list of squares to be mapped rather than all the data points (may execute e.g. 5–50 times faster)
 Effectively a “data preview” function, prior to getting the data for more sophisticated mapping / analysis if desired
 Also functions as a clickable GUI to query individual data points in a region.
Other information held in the name index...
 Full list of taxon names from data providers, plus additional “names without
data” from the Catalogue of Life compilation (to allow matching on names
even when no data yet available, status assessment, etc.)
 Metadata on every species (how many records, which data providers, what
date range), for user display prior to “get OBIS data” request
 Taxonomic group allocation for all taxa, plus common names as available
 “Near match” versions of all names to support “fuzzy” search option
 Hiding / screening of species deemed “non-marine” (according to preformatted list/s), and of some “junk” data supplied by data providers (e.g. “unknown species A” etc.)
 Preliminary reconciliation of junior synonyms / known misspellings, based on
Cat. of Life information as available
 Onward links to Cat. of Life where appropriate, for further taxonomic
information, full synonymy, etc.
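Storing a “near match” version of each name means the fuzzy comparison is done once, at index-building time, rather than per query. The sketch below shows the general idea only: the normalisation rules in `near_match_key` are invented for illustration and are not the rules used by OBIS.

```python
import re

def near_match_key(name):
    """Reduce a name to a crude key so that common spelling variants collide."""
    key = name.lower()
    key = key.replace("ae", "e").replace("oe", "e").replace("y", "i")
    key = re.sub(r"(.)\1+", r"\1", key)      # collapse doubled letters
    return key

names = ["Balaenoptera acutorostrata", "Lutjanus campechanus"]
near_index = {near_match_key(n): n for n in names}   # built once, ahead of any query

def fuzzy_lookup(query):
    """Return the stored name whose near-match key equals that of the query."""
    return near_index.get(near_match_key(query))
```

A misspelt query such as `Balenoptera acutorostratta` reduces to the same key as the stored name, so the lookup is a single dictionary access.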
Putting it all together...
 OBIS home page – includes clickable map for spatial searching
– currently set to 10° x 10° squares (could be finer as more data become available)
Full scientific name / category search page
 Examples: all fishes beginning with “a”, or “all whales”, or all species of “Lutjanus”...
 Common name, browse category searches also available
Example search result for “all whales”
(initial portion of page)
 Note, all results presented with metadata, “Quick maps”, etc... (all from index content)
 “Get OBIS data” link initiates a data extraction from the Cache (e.g. 1 – 40,000+ points)
 Also included: names without data (from Cat. of Life), plus “near matches” if applicable
Revised architecture: OBIS Version 2 (March 2004–current)
[Architecture diagram: data providers 1, 2, 3 (etc.) feed the OBIS Cache via provider crawling; the OBIS Index is produced from the Cache by index building, with additional input from the Cat. of Life. www users 1, 2, 3 (etc.) connect to the OBIS Portal. Flow: 1: submit search request (“Stage 1” query); 2: retrieve matching names + metadata from the OBIS Index; 3a (optional): “Quick map” for any species, via the C-squares mapper; 3b: “get OBIS data” for relevant species, send to mappers, etc. (C-squares mapper, Mapping tool 2, Mapping tool 3).]
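The two-stage flow in the Version 2 architecture can be sketched in a few lines. The dictionaries stand in for the OBIS Index and OBIS Cache, and the function names `stage1_search` and `get_obis_data` and all sample values are hypothetical.

```python
# Hypothetical in-memory stand-ins for the OBIS Index and OBIS Cache.
index = {
    "Balaenoptera acutorostrata": {"records": 3, "providers": ["provider 1", "provider 2"]},
}
cache = {
    "Balaenoptera acutorostrata": [(59.7, -9.8), (59.6, -9.9), (52.1, -3.4)],
}

def stage1_search(prefix):
    """Steps 1-2: match names in the Index and return metadata only."""
    return {name: meta for name, meta in index.items() if name.startswith(prefix)}

def get_obis_data(name):
    """Step 3b: extract the point records for one species from the Cache."""
    return cache.get(name, [])
```

The user sees the Stage 1 result first and triggers the (potentially large) extraction only for species of interest; no remote provider is contacted at query time.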
Comparison between versions...
 New version supports far better user experience – improved performance, many new features, enhanced browse / preview functionality, etc.
 Data cache and “metadata layer” (index) mean that most / all queries have effectively been run in advance, saving search time
 However, new system is technically more challenging to design / install / maintain, with consequent resourcing implications
 Also, move away from live point-of-origin queries (for performance reasons) means that index, cache, etc. must be continually updated (to avoid currency issues).
Next steps...
 More data, data providers every month (scalability, performance
monitoring issues)
 Completion / bedding down / more automation of current functionality
(most is fairly new and still undergoing a degree of real-world testing)
 Desirable to support gazetteer, polygon-based spatial searching (how?)
 May wish to represent more than simple presence/absence data –
biomass / numbers, tracking / tagging, effort, etc. – does this affect
options for the index?
 Inclusion of some data screening / flagging at point (=Cache) level
(currently, operates only at taxon, = index level)
 Plus ++ ?? (system and concepts are still to a degree an evolving
entity).
Contacts
Dr Tony Rees
Manager, Divisional Data Centre
CSIRO Marine Research, Australia
Phone: +61 3 6232 5318
Email: [email protected]
www.marine.csiro.au/datacentre
Dr Phoebe Zhang
OBIS Portal Manager
IMCS, Rutgers, the State University of
New Jersey, USA
Phone: +1 732 932 6555
Email: [email protected]
www.iobis.org
Thank You
Additional information:
OBIS web site – www.iobis.org
C-squares web site – www.marine.csiro.au/csquares
Supplementary slides
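c-squares notation...
7500:499:4
 Square “minimum” corner is at 59.5° N, 009.5° W, according to the following formula:
 7500:499 is at 59° N, 009° W (latitude 59 from the digits 5 and 9, longitude 009 from the digits 00 and 9; the “4” after the colon is the 5° intermediate quadrant)
 Next (final) digit “4” indicates next intermediate quadrant, here within the 1° square
 Initial digit “7” indicates NW global quadrant.
Global quadrants: 7 = NW, 1 = NE, 5 = SW, 3 = SE
Intermediate quadrants, as laid out on the map within each global quadrant:
  NW (7):  4 3    NE (1):  3 4
           2 1             1 2
  SW (5):  2 1    SE (3):  1 2
           4 3             3 4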
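The decoding described on this slide can be written out as a short function. This is a sketch based on the notation as explained here, with `csquare_min_corner` a hypothetical name; it returns signed decimal degrees (north and east positive) for the corner nearest the equator and prime meridian.

```python
def csquare_min_corner(code):
    """Return (lat, lon) of a c-square's "minimum" corner in signed degrees."""
    parts = code.split(":")
    ten = parts[0]
    quadrant = int(ten[0])                 # 1 = NE, 3 = SE, 5 = SW, 7 = NW
    lat = 10.0 * int(ten[1])               # tens of degrees latitude
    lon = 10.0 * int(ten[2:4])             # tens of degrees longitude
    size = 10.0                            # side length of the current square
    for part in parts[1:]:
        q = int(part[0]) - 1               # intermediate quadrant as 0..3
        if len(part) == 3:                 # quadrant + units digits: a 1-degree square
            lat += int(part[1])
            lon += int(part[2])
            size = 1.0
        else:                              # lone quadrant digit: halve the square
            size /= 2
            lat += size * (q // 2)         # quadrants 3 and 4 are the high-latitude half
            lon += size * (q % 2)          # quadrants 2 and 4 are the high-longitude half
    lat_sign = 1 if quadrant in (1, 7) else -1
    lon_sign = 1 if quadrant in (1, 3) else -1
    return lat_sign * lat, lon_sign * lon
```

Applied to `7500:499:4` this gives 59.5, -9.5, i.e. 59.5° N, 009.5° W as stated on the slide.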
A “different” / rapid approach to
spatial searching (programmatically speaking)
 Normal spatial query would be (example for a 10-degree square):
    select distinct tax_id        -- or species code
    from point_records            -- hypothetical table name; not given on the slide
    where lat between 50 and 60 and long between -10 and 0
 Equivalent c-squares spatial query:
    select distinct tax_id        -- or species code
    from obis_distributions       -- assumed here to be the spatial index table
    where csquares like '%7500%'
...runs faster, and makes use of efficiencies where multiple records occur in a single square (duplicates are eliminated during the index building process)
 Could also operate with a single, long (skinny) table of species : c-square pairs if advantageous (indexed on both columns)
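The “skinny” table variant can be tried out in SQLite as sketched below. Table and column names (`points`, `species_csquares`, `lon`, `csquare`) and the sample rows are assumptions for illustration; with one code per row, a prefix match (`'7500%'`) is enough in place of `'%7500%'`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical point table and "skinny" species : c-square pair table.
    CREATE TABLE points (tax_id INTEGER, lat REAL, lon REAL);
    CREATE TABLE species_csquares (tax_id INTEGER, csquare TEXT,
                                   PRIMARY KEY (tax_id, csquare));
    CREATE INDEX idx_csquare ON species_csquares (csquare);
""")
conn.executemany("INSERT INTO points VALUES (?, ?, ?)", [
    (1, 59.7, -9.8), (1, 59.6, -9.9),   # two records in the same 0.5-degree square
    (2, 27.3, -90.2),
])
conn.executemany("INSERT INTO species_csquares VALUES (?, ?)", [
    (1, "7500:499:4"),                  # duplicates already collapsed to one pair
    (2, "7209:370:1"),
])

# Coordinate-range query against the point data.
by_coords = conn.execute(
    "SELECT DISTINCT tax_id FROM points "
    "WHERE lat BETWEEN 50 AND 60 AND lon BETWEEN -10 AND 0").fetchall()
# Equivalent query against the species : c-square pairs.
by_square = conn.execute(
    "SELECT DISTINCT tax_id FROM species_csquares "
    "WHERE csquare LIKE '7500%'").fetchall()
```

Both queries return the same species, but the second scans one row per species : square pair instead of one row per data record.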
Can now do mapping (“Quick maps”) direct from the index – no need to get the data (in the first instance)
 Example: for minke whale, Balaenoptera acutorostrata – 1,700 records in the
system, in 400 squares ... codes for the latter: