Biodiversity Information Standards: are we going wrong, or just not quite right?
Jim Croft
Australian National Herbarium
Centre for Plant Biodiversity Research
Australian National Botanic Gardens
Parks Australia
Taxonomy Research and Information Network
Department of the Environment, Water, Heritage and the Arts
TDWG IN AUSTRALIA
TDWG in Australia
[Map of Australia; panels: INSTITUTIONS – Queensland, INSTITUTIONS – Northern Territory; locations marked: Darwin, Perth, Townsville, Maroochydore, Brisbane, Lismore, Armidale, Orange, Gosford, Canberra, Sydney, Alice Springs, Adelaide, Melbourne, Hobart, Launceston, Devonport]
• Australian National Insect Collection (CSIRO)
• Australian National Herbarium (CSIRO)
• Australian National Wildlife Collection (CSIRO)
• GAUBA Herbarium
• Australian Biological Resources Study
Australian examples
• Australian Plant Name Index
– Australian Plant Census
• Australian Fauna Directory
• Australia’s Virtual Herbarium
• Online Zoological Catalogue of Australian Museums
• Flora of Australia On-line
• Atlas of Living Australia
• Identify Life
• Taxonomy Research and Information Network
HISCOM
• Herbarium Information Systems Committee
– Representatives at TDWG 2008
– Ben Richardson, Alex Chapman (PERTH)
– Bill Barker (AD)
– Alison Vaughan (MEL)
– Karen Wilson (NSW)
– Donna Lewis (DNA)
– Jerry Cooper (CHR, NZ)
– Helen Thompson (ABRS)
– Greg Whitbread, Jim Croft (CANB)
– The crucible of biodiversity informatics creativity
TDWG principle # 0
• A good idea has a thousand fathers
• A bad one is a bastard
TDWG: making anarchy
chaos the standard
TDWG principle # VI-a
“Before the beginning of great
brilliance, there must be chaos.
Before a brilliant person begins
something great, they must look
foolish in the crowd.”
- I Ching
TDWG: the art of herding
cats
TDWG: changing
standards, or making
change the standard?
TDWG: Standardizing
stuff...
or
stuffing standards?
Outline
• What is TDWG?
• TDWG and ‘Standards’
• Where TDWG Standards are needed
• Some TDWG projects
• TDWG Standards compliance
• Tensions for TDWG
• Future
WHAT IS TDWG?
TDWG Mission
• Develop, adopt and promote standards
and guidelines for the recording and
exchange of data about organisms
• Promote the use of standards through
the most appropriate and effective
means and
• Act as a forum for discussion through
holding meetings and through
publications
Who are we?
‘TDWG is us’
Who are we?
• Intersection of specimens, taxonomy,
knowledge, information management
• Biologists, taxonomists, computer
scientists
– Each with an interest in the other’s
domains
– Each with something to offer each
other’s domains
Who are we?
• If TDWG did not exist, we would have
to invent it
• Successful
– Enduring
– Popular
– Moderately well recognized
When are we?
• Phases of TDWG
– Phase 0 (1985)
• Seemed like a good idea at the time
– Phase 1 (first decade)
• Data dictionaries, data models
– Phase 2 (second decade)
• E-R models, DIGIR, DwC, XML, etc.
– Phase 3 (nowish)
• Schemas, ontologies, RDF
– Phase 4 (?)
• ?
Why are we?
• Collaboration and sharing are essential
– Taxonomy has become too big
– Too diverse
– Too complex
– No one person can do it all
– A ‘complete’ treatment requires collaboration
– Collaboration requires consistency, standards
Biodiversity
Tower of Babel
Why are we?
• Untangle the ‘biodiversity Babel’
• Develop common communication
• Harness efficiency of collaboration
• Economic pressures to reduce
duplication
Why are we?
• Science of information meets science
of information technology
• Take advantage of new technology
• Taxonomy needs to be seen to be
evolving
• “Business as usual is not an option”
Why are we?
• An annual excuse to meet in warm
places when it is cold elsewhere?
Where do we fit?
[Figure (source: xkcd.com), labelled: taxonomists, TDWG, informaticists, computerists]
Where have we come from?
• Frustrated taxonomists
– Looking for a better way
– Largely self taught
• Bored computer scientists
– Looking for excitement, challenge
• Misfits and visionaries
– In search of a ‘Brave New World’
• Egomaniacs
– In search of glory, fame, power, riches
What are we now?
• Frustrated taxonomists
– Looking for a better way
– Largely self taught
• Bored computer scientists
– Looking for excitement, challenge
• Misfits and visionaries
– In search of a ‘Brave New World’
• Egomaniacs
– In search of glory, fame, power, riches
Where are we going?
?
Where are we going?
• Did we go wrong?
– Where did we go wrong?
– Why did we go wrong?
• Lost the plot?
– Regain credibility?
• Our community?
• Our funders?
• Ourselves?
Where are we going?
• Perceptions of TDWG?
– First decade
• Taxonomists organizing their domain
• Content focused
• Understandable by taxonomists
– Second decade
• Taxonomists reaching limitations
• Engaging technologists
• Protocol and systems focussed
• Opaque to taxonomists
– Third decade?
Where are we going?
• Perceptions of TDWG?
– First decade
• Content
• Data dictionaries
• Lists, vocabularies
– Second decade
• Protocols
• Formats, structure
• Applications
– Third decade?
• Ontologies?
• Semantics?
Where are we going?
• What should TDWG be about?
– The data?
– The technology?
– The applications?
– The community?
TDWG Impediments
• Resources, funds
• Time
• Impetus, will, drive
• Complexity, domain knowledge
• Conservatism
• Rivalry
• Intellectual property, revenue advantage
THE TDWG VISION
A vision for TDWG
• Our domain in biodiversity?
– Taxonomy?
– Systematics?
– Collections?
– Biodiversity?
– Publications?
– Knowledge Management?
– Knowledge discovery?
– All of the above?
A vision for TDWG
• Our Community?
– Herbaria and museums?
– Researchers?
– Government and policy?
– Conservation agencies? NGOs?
– Natural resource management?
– Education?
– Public?
– All of the above?
A vision for TDWG
• Our questions?
– What is it? How can I find out?
– What does it look like?
– Where does it occur?
– Was it still there? When?
– What occurs there with it?
– What might occur there with it?
– What is it related to?
– Who says so?
– How? Why?
– All of the above?
A vision for TDWG
• Our Products?
– Data content standards?
– Data storage standards?
– Data communications protocols?
– Data management applications?
– Data management infrastructure?
– Data visualization applications?
– Data analysis applications?
– All of the above?
Knowledge pyramid
Wisdom
Knowledge
Information
Data
Samples
The Real World
TDWG AND STANDARDS
What is a standard?
• In common English:
– A flag
– An upright pole or beam
– A backing for currency
– American automobile
– A bush on a long stalk
– An ideal to be judged against
– Model of authority or excellence
– A basis for comparison
– 1,980 board feet of wood
– A newspaper
– An established norm
What is a standard?
• Rarely implies:
– Requirement
– Obligation
– Compulsion
– Compliance
– ‘The law’
• But not so ‘technical standards’
– Specify behaviour
– Mandate behaviour
What is a standard?
• “an explicit set of requirements to
be satisfied by a material, product,
or service”
- (ASTM International)
TDWG STANDARDS
TDWG Standards categories
• Technical specification (TS) (3)
– Protocol, service, procedure, format
• Applicability statement (AS) (1 draft)
– How a tech. spec. might be applied
• Best current practice (BCP) (0)
– A description of good behaviour
• Data standard (DS) (0)
– Content or controlled vocabularies
TDWG Standards status
• Current standard
– (3)
• Current 2005 Standard
– (3?)
• Draft Standard
– (3)
• Prior Standard
– (7 tech specs; 6 data standards)
• Retired Standard
– (0)
THE STANDARDS PROCESS
ISO Standards process
• ISO standards are:
– Consensus
– Industry wide
– Voluntary
ISO Standards process
• 0 preliminary
– Study period underway
• 1 proposal
– New project under consideration
• 2 preparatory
– Working draft(s) under consideration
• 3 committee
– Committee draft(s) under consideration
• 4 approval
– Final draft standard under consideration
• 5 publication
– Standard prepared for publication
TDWG Standards process
• TDWG standards are:
– Consensus
– Community wide (+/-)
– Voluntary
TDWG Standards Process
TDWG STANDARDS PRESENT
TDWG standards – present
• ABCD
– Access to Biological Collections Data
• SDD
– Structured Descriptive Data
• TCS
– Taxon Concept Schema
Not bad for 22 years' work...
TDWG STANDARDS PAST
TDWG standards - past
• ‘Prior Standards’
• Technical Specs (protocol stds):
– HISPID 3 (now on v.5)
– POSS (Plant Occurrence and Status)
– Economic Botany Data Collection Std
– Plant Names in Botanical Databases
– XDF – language for definition and exchange
– ITF – Botanic Gardens Records
– DELTA
TDWG standards - past
• ‘Prior Standards’
• Data standards (Content stds)
– Authors of Plant Names
– World Geographic Scheme for Plant Distributions
– Botanico-Periodicum-Huntianum
– Index Herbariorum
– Floristic Regions of the World
– TL2 – Taxonomic Literature and suppl.
TDWG STANDARDS FUTURE
TDWG standards – future
• ‘Draft standards’
– Real soon now
• Standards documentation spec.
– The standard way to do standards
• LSID Applicability Statement
– How to do LSIDs
• NCD
– Natural Collections Description
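The LSID Applicability Statement mentioned above concerns how Life Science Identifiers are used in practice. As a rough sketch of the identifier's shape only, assuming the general form `urn:lsid:authority:namespace:object[:revision]` from the LSID specification: the helper `parse_lsid`, the `Lsid` type and the `example.org` identifier are hypothetical illustrations, not part of any TDWG document or library.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Components of a Life Science Identifier (hypothetical container)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into its parts, or raise ValueError."""
    parts = text.split(":")
    # Expect "urn", "lsid", authority, namespace, object and an optional revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The "urn:lsid" prefix is matched case-insensitively here.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


if __name__ == "__main__":
    # Made-up identifier, for illustration only.
    print(parse_lsid("urn:lsid:example.org:names:12345"))
```

The sketch only checks the shape of the string; resolving an LSID to its data or metadata is a separate concern that it does not attempt.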
TDWG standards – future
• Watch this space?
• Observation data
– Occurrence without specimens?
– Ecological metadata language
• Phylogenetics data
– Phylogeny repositories
– Trees of life
– PhyloCode
TDWG standards – future
• Watch this space?
• SPM – Species Profile Model
– Online Journals; On-line Floras
– Interactive Keys
• Images and multimedia
• Ethnobotany ontology
TDWG standards – future
• How are we going to manage this?
– Activities straddle many standards
– Potential for duplication, conflict
• Technical Architecture Group
– Ontologies
– Vocabularies
– Conflict identification, resolution
– Evaluation, advice, recommendations
WHERE TDWG STANDARDS ARE NEEDED
Where are TDWG standards
needed?
• Nomenclature
• Taxonomy
• Bibliographic
• Specimens
• Identification
• Description
• Images
• Multimedia
• Occurrence
• Spatial
• Observation
• Molecular
• Phylogeny
• People
• Institutions
• etc.
Where are TDWG standards
needed?
• The problem:
• TDWG activities have been activity
and discipline based
– ABCD as an example
• Names, taxa, specimens, places, people, etc.
• Need to look at data from an
ontological perspective
– Data based
• Not activity based
TDWG – the 3-legged stool
• (definition of ‘stool’?)
• GUIDs
• Ontologies
• Exchange protocols
TDWG – the 3-legged stool
• Management cliche
• Planning
• Money
• Management
---
• Production
• Marketing
• Administration
---
• etc
TDWG – the 3-legged stool
TDWG STANDARDS COMPLIANCE
TDWG standards compliance
• Pretty poor
– Within institutions / projects
– Between institutions / projects
• Partial compliance is not compliance
• Enhancement is not compliance
• Extension is not compliance
TDWG standards compliance
• Why not?
– Too complicated?
– Inappropriate?
– Deficient?
– Too costly to implement?
– Conservatism?
– Apathy?
– Individual arrogance?
– Institutional arrogance?
TDWG standards compliance
• Need for stability
• TDWG has a reputation
– Pursuing the ‘bleeding edge’
– “Keeping up with the Joneses”
– Introducing new recommendations before
old ones settled
– Frustrating users
• Especially smaller institutions
TDWG standards compliance
• Total cost of ownership
– Ultra technical solutions
• Rare specialist skills
• Expensive contractors
– Maintenance costs
– Upgrade costs
– Migration costs
– Users get stuck
TDWG standards compliance
• What can be done?
– Rationalization of standards?
– More control of standards process?
– Seek ‘appropriate technology’?
• Not necessarily the best
– Seek cheaper solutions?
– Focus on the ontologies, not activities?
– Apply institutional pressure?
– Institutional mentorship and support?
THE TENSIONS FOR TDWG
Tensions in TDWG
• Taxonomy / technology
• Innovation / stability
• Innovation / conservatism
• Names / taxonomy
• Names / specimens
• Names / names
• Authority / credit
• Ownership / responsibility
• Data / metadata
Why not?
• Why not web 2.0 / 3.0?
• Why not annotations?
• Why not Wikipedia?
• Why not microformatting?
Disconnects
• Free access / ownership
– Licensing, attribution, IP, credit
• Taxonomy / specimens
– The big lie
• Concepts / names
– Another big lie
• Linking taxa through basionyms
– Another big lie
• Data / metadata
• Distributed systems vs cache
Metadata
• So-called ‘data about data’
• “One man’s data is another’s
metadata”
• Not a good or inspiring look
• Need a common and agreed
understanding in TDWG domain
Metadata
• Problem of LSID byte persistence
– Applies to data
– Does not apply to metadata
– Redefine data as metadata?
– Sophistry?
– Distorting our ontologies?
• Need to sort this out
• Need to communicate the result
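The data/metadata distinction has practical weight because, under the LSID specification, the bytes returned as data for an identifier are expected never to change, while the metadata may be revised. A minimal sketch of that rule, assuming a toy in-memory store: the class `LsidStore`, its method names and the sample identifier are hypothetical, not drawn from any TDWG or LSID library.

```python
import hashlib


class LsidStore:
    """Toy store: data is write-once per LSID, metadata stays mutable."""

    def __init__(self):
        self._data = {}      # lsid -> bytes
        self._digests = {}   # lsid -> SHA-256 hex digest of the stored bytes
        self._metadata = {}  # lsid -> dict of metadata fields

    def put_data(self, lsid, data):
        """Store data once; reject any later attempt to change the bytes."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._digests.get(lsid)
        if existing is not None and existing != digest:
            raise ValueError(f"data for {lsid} is byte-persistent and cannot change")
        self._data[lsid] = data
        self._digests[lsid] = digest

    def put_metadata(self, lsid, **fields):
        """Metadata carries no persistence guarantee, so updates are allowed."""
        self._metadata.setdefault(lsid, {}).update(fields)

    def get_data(self, lsid):
        return self._data[lsid]

    def get_metadata(self, lsid):
        return dict(self._metadata.get(lsid, {}))


if __name__ == "__main__":
    store = LsidStore()
    lsid = "urn:lsid:example.org:specimens:42"  # made-up identifier
    store.put_data(lsid, b"original label transcription")
    store.put_metadata(lsid, determination="first identification")
    store.put_metadata(lsid, determination="revised identification")  # allowed
    print(store.get_metadata(lsid))
```

Under this rule anything that needs to change, such as a revised identification, has to live on the metadata side or be issued under a new identifier, which is the pressure behind "Redefine data as metadata?".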
Metadata
Yesterday upon the stair
Metadata wasn't there
It wasn't there again today
How I wish it would go away
The 3 big lies
• Names and specimens
– That there is some real connection
between specimens bearing the same name
– That distribution maps of specimens
bearing the same name are meaningful
– That identifications bearing the same
name represent the same taxon
– The ‘taxon concept problem’
– Concept not explicit
The 3 big lies
• Names and concepts
– That names somehow imply an unambiguous
taxon concept
– That a taxon concept can be inferred
from a name
– An assumption
– The ‘taxon concept problem’
– Concept not explicit
The 3 big lies
• Names and types
– That if we are talking about names based
on the same type they are the same taxon
concept
– That lists of names and synonyms based
on the same type can be automatically
merged
– The ‘taxon concept problem’
– Concept not explicit
The 3 big lies
• What can we do?
– Taxon reporting not unambiguous
– Our results are at best indicative
• Users assume or infer concepts
– Perhaps biggest problem in taxonomy and
biodiversity informatics
– Be absolutely rigorous in talking about
names and named concepts
– Educate taxonomists
– Educate clients
• Limitations of data, applications
• Implications of using data, limitations
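One way to be rigorous about names and named concepts is to record each identification as a name plus the treatment it follows ("name sec. reference") instead of a bare name. The sketch below contrasts grouping by name alone with grouping by explicit concept. The types `TaxonConcept` and `Identification`, both grouping functions and the placeholder name "Aus bus" with its references are hypothetical illustrations, not a TDWG schema.

```python
from collections import defaultdict
from typing import NamedTuple


class TaxonConcept(NamedTuple):
    name: str          # the scientific name string
    according_to: str  # the treatment that circumscribes the taxon ("sec.")


class Identification(NamedTuple):
    specimen: str
    concept: TaxonConcept


def group_by_name(identifications):
    """Group specimens by bare name: the implicit-concept shortcut."""
    groups = defaultdict(list)
    for ident in identifications:
        groups[ident.concept.name].append(ident.specimen)
    return dict(groups)


def group_by_concept(identifications):
    """Group specimens by (name, according_to), keeping the concept explicit."""
    groups = defaultdict(list)
    for ident in identifications:
        groups[ident.concept].append(ident.specimen)
    return dict(groups)


if __name__ == "__main__":
    records = [
        Identification("specimen-1", TaxonConcept("Aus bus", "Author A 1990")),
        Identification("specimen-2", TaxonConcept("Aus bus", "Author B 2005")),
        Identification("specimen-3", TaxonConcept("Aus bus", "Author A 1990")),
    ]
    print(group_by_name(records))
    print(group_by_concept(records))
```

Grouping by name lumps all three specimens together, whereas grouping by concept keeps the two circumscriptions apart; a distribution map drawn from the first grouping silently assumes they are the same taxon.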
TDWG value for money
• Are we worth it?
– This meeting cost c. $1,000,000
• Airfares, accommodation, salaries, etc.
– What did we accomplish?
• Tangibles?
• Intangibles?
– What have we produced so far?
• 3 standards, several +/- standards
• Compliance?
• A ‘state of mind’?
TDWG value for money
• Can we do it better?
– Can we do it cheaper, faster?
• Use the wiki/listserv better
– Accomplish more?
• New standards
• Better standards
– Produce more?
• New standards?
• Retire standards?
• Rationalize standards?
WHERE TO FROM HERE
Where to from here?
• Tools at our disposal
– TDWG Executive
– Technical Architecture Group
– TDWG working groups
– On-line forums, lists
– Web and Wiki
– On-line Journal
Where to from here?
• Increase TDWG Profile
– ‘Market penetration’
– Greater implementation, compliance
– Attention to smaller institutions
• ‘the long tail’
– Multilingual standards
– Strengthen partnerships, collaboration
• GBIF, EoL, etc.
• National initiatives
Where to from here?
• TAG
– Coordination of standards
– Ontologies
– Resolve metadata issues
– Retire or deprecate standards
• ‘Us’
– Participation
– Implementation
– Compliance
Where to from here?
xkcd.com
TDWG – a glass half full
• TDWG has a lot to do
• But it has accomplished a lot
• Without the foundation of TDWG there
could be:
– No AVH
– No ALA
– No GBIF
– No EoL
– No [name your biodiversity acronym]
TDWG – a glass half full
• TDWG has strong participant support
– c. 200 participants in TDWG 2008
• Key institutional engagement
– International
– National
– Regional
– Local
• Increasing demand for products
– Global change, habitat depletion, etc.
TDWG Mission
• Develop, adopt and promote standards
and guidelines for the recording and
exchange of data about organisms
• Promote the use of standards through
the most appropriate and effective
means and
• Act as a forum for discussion through
holding meetings and through
publications
TDWG?