On June 22, 1799, in Paris, everything changed: International System of Units


On June 22, 1799, in Paris,
everything changed
3
International System of Units
4
Multiple kinds of data in multiple
kinds of silos
Lab / pathology data
EHR data
Clinical trial data
Patient histories
Medical imaging
Microarray data
Model organism data
Flow cytometry
Mass spec
Genotype / SNP data
5
How to find data?
How to find other people’s data?
How to reason with data when you find it?
How to work out what data does not yet
exist?
6
How to solve the problem of making
the data we find queryable and reusable by others?
Part of the solution must involve:
standardized terminologies and
coding schemes
7
But there are multiple kinds of
standardization for biomedical data, and
they do not work well together
Terminologies (SNOMED, UMLS)
CDEs (Clinical research)
Information Exchange Standards (HL7 RIM)
LIMS (LOINC)
MGED standards for microarray data, etc.
top-down grid frameworks (caBIG)
8
most successful, thus far: UMLS
Unified Medical Language System
collection of separate terminologies built by trained
experts
massively useful for information retrieval and
information integration
UMLS Metathesaurus: a system of post hoc
mappings between overlapping source vocabularies
developed according to different and sometimes
conflicting standards
9
for UMLS
local usage respected
regimentation frowned upon
cross-framework consistency not important
no concern to establish consistency with basic
science
different grades of formal rigor, different degrees of
completeness, different update policies, capricious
policies for empirical testing
10
A good solution to the silo problem
must be:
• modular
• incremental
• bottom-up
• evidence-based
• revisable
• incorporate a strategy for motivating
potential developers and users
11
ontologies = standardized labels
designed for use in annotations
to make the data cognitively
accessible to human beings
and algorithmically accessible
to computers
12
ontologies = high quality controlled
structured vocabularies for the
annotation (description) of data
13
Ramirez et al.
Linking of Digital Images to Phylogenetic Data Matrices Using a
Morphological Ontology
Syst. Biol. 56(2):283–294, 2007
ontologies used in curation of literature
what cellular component?
what molecular function?
what biological process?
15
Ontologies
help integrate complex
representations of reality
help human beings find things in
complex representations of reality
help computers reason with complex
representations of reality
16
The Gene Ontology
Ontologies facilitate grouping of annotations
brain: 20
hindbrain: 15
rhombomere: 10
Query brain without ontology: 20
Query brain with ontology: 45
but they succeed in this only if there is
one consensus ontology for each domain
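The grouping effect in the brain example can be sketched in a few lines. This is a minimal illustration, assuming the slide's figures read as 20 annotations to brain, 15 to hindbrain and 10 to rhombomere, and assuming a toy part_of hierarchy (rhombomere part_of hindbrain part_of brain). The names PART_OF, ANNOTATIONS and the helper functions are hypothetical and not taken from any real ontology library.

```python
from collections import defaultdict

# Assumed toy hierarchy: child -> parent, read as "child part_of parent".
PART_OF = {"hindbrain": "brain", "rhombomere": "hindbrain"}

# Assumed annotation counts per term, as read from the slide.
ANNOTATIONS = {"brain": 20, "hindbrain": 15, "rhombomere": 10}


def descendants(term, part_of):
    """Return the term itself plus every term that is transitively part of it."""
    children = defaultdict(list)
    for child, parent in part_of.items():
        children[parent].append(child)
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


def count_without_ontology(term):
    """Plain label match: only annotations made directly to the term."""
    return ANNOTATIONS.get(term, 0)


def count_with_ontology(term):
    """Ontology-aware query: include annotations to all parts of the term."""
    return sum(ANNOTATIONS.get(t, 0) for t in descendants(term, PART_OF))


print(count_without_ontology("brain"))  # 20
print(count_with_ontology("brain"))     # 45
```

The second query only returns the larger total because all three annotation sets point into the same hierarchy, which is the point of the consensus requirement.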
18
19
20
People are extending the GO
methodology to other domains of
biology and of clinical and
translational medicine?
21
The standard engineering methodology
• It is easier to write useful software if one works
with a simplified model
• (“…we can’t know what reality is like in any
case; we only have our concepts…”)
• This looks like a useful model to me
• (One week goes by:) This other thing looks like
a useful model to him
• Data in Pittsburgh does not interoperate with
data in Vancouver
• Science is siloed
an analogue of the UMLS problem
proliferation of tiny ontologies by different
groups with urgent annotation needs
23
the solution
establish common rules governing best
practices for creating ontologies in
coordinated fashion, with an evidence-based pathway to incremental
improvement
25
First step (2001)
a shared portal for (so far) 58 ontologies
(low regimentation)
http://obo.sourceforge.net  NCBO BioPortal
26
27
OBO builds on the principles
successfully implemented by the GO
recognizing that ontologies need to
be developed in tandem
28
The methodology of cross-products
compound terms in ontologies to be defined
as cross-products of simpler terms:
E.g., elevated blood glucose is a cross-product of
PATO: increased concentration with FMA: blood and
ChEBI: glucose.
= factoring out of ontologies into discipline-specific modules (orthogonality)
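A cross-product definition of this kind can be sketched as a small data structure. This is only an illustration under stated assumptions: the field names quality, location and substance, and the wording of the generated definition, are hypothetical and are not the Foundry's formal syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    ontology: str  # source module, e.g. "PATO", "FMA", "ChEBI"
    label: str


@dataclass(frozen=True)
class CrossProduct:
    """A compound term defined from simpler terms in separate modules."""
    label: str
    quality: Term    # hypothetical slot name
    location: Term   # hypothetical slot name
    substance: Term  # hypothetical slot name

    def modules(self):
        """Return the set of ontologies the compound term draws on."""
        return {self.quality.ontology, self.location.ontology,
                self.substance.ontology}

    def definition(self):
        """Compose a textual definition from the component labels."""
        return (f"{self.quality.label} of {self.substance.label} "
                f"in {self.location.label}")


elevated_blood_glucose = CrossProduct(
    label="elevated blood glucose",
    quality=Term("PATO", "increased concentration"),
    location=Term("FMA", "blood"),
    substance=Term("ChEBI", "glucose"),
)

print(elevated_blood_glucose.definition())
# increased concentration of glucose in blood
```

Because the compound term stores references rather than copies, a revision to any component term is inherited by every compound built from it.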
29
The methodology of cross-products
enforcing use of common relations in linking terms
drawn from Foundry ontologies serves
• to ensure that the ontologies are maintained and
revised in tandem
• logically defined relations serve to bind terms in
different ontologies together to create a network
30
Third step (2006)
The OBO Foundry
http://obofoundry.org/
31
RELATION TO TIME (columns) / GRANULARITY (rows: organ and organism; cell and cellular component; molecule)
INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Cell (CL); Cellular Component (FMA, GO); Molecule (ChEBI, SO, RnaO, PrO)
DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Cellular Function (GO); Molecular Function (GO)
OCCURRENT: Biological Process (GO); Molecular Process (GO)
Building out from the original GO
32
RELATION TO TIME (columns) / GRANULARITY (rows: organ and organism; cell and cellular component; molecule)
INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Cell (CL); Cellular Component (FMA, GO); Molecule (ChEBI, SO, RnaO, PrO)
DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Cellular Function (GO); Molecular Function (GO)
OCCURRENT: Organism-Level Process (GO); Cellular Process (GO); Molecular Process (GO)
initial OBO Foundry coverage
33
CRITERIA
• openness
• common formal language
• collaborative development
• evidence-based maintenance
• identifiers
• versioning
• textual and formal definitions
34
Orthogonality = modularity
• one ontology for each domain
• no need for mappings (which are in
any case too expensive, too fragile,
too difficult to keep up-to-date as
mapped ontologies change)
• everyone knows where to look to
find out how to annotate each kind
of data
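The "one ontology for each domain" rule lends itself to a mechanical check. The following is a rough sketch, assuming a hypothetical registry that maps each ontology name to the single domain it claims. The registry contents and the name LocalCellOnt are made up for illustration.

```python
from collections import defaultdict


def find_overlaps(registry):
    """Return each domain claimed by more than one ontology, with its claimants."""
    by_domain = defaultdict(list)
    for ontology, domain in registry.items():
        by_domain[domain].append(ontology)
    return {domain: sorted(names)
            for domain, names in by_domain.items() if len(names) > 1}


# Hypothetical registry: ontology -> domain it covers.
registry = {
    "CL": "cell",
    "PATO": "phenotypic quality",
    "NCBI Taxonomy": "organism",
}
print(find_overlaps(registry))  # {}

# A second group publishes its own cell ontology: orthogonality is violated.
registry["LocalCellOnt"] = "cell"
print(find_overlaps(registry))  # {'cell': ['CL', 'LocalCellOnt']}
```

An empty result means annotators have exactly one place to look per domain, and no mappings between competing ontologies need to be maintained.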
35
CRITERIA
• COMMON ARCHITECTURE: The ontology uses
relations which are unambiguously defined
following the pattern of definitions laid down in
the Basic Formal Ontology (BFO)
36
OBO Foundry
provides guidelines (traffic laws) to
new groups of ontology developers in
ways which can counteract current
dispersion of effort
RELATION TO TIME (columns) / GRANULARITY (rows: organ and organism; cell and cellular component; molecule)
INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Cell (CL); Cellular Component (FMA, GO); Molecule (ChEBI, SO, RnaO, PrO)
DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Cellular Function (GO); Molecular Function (GO)
OCCURRENT: Biological Process (GO); Molecular Process (GO)
Building out from the original GO
38
RELATION TO TIME (columns) / GRANULARITY (rows: organ and organism; cell and cellular component; molecule)
INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Cell (CL); Cellular Component (FMA, GO); Molecule (ChEBI, SO, RnaO, PrO)
DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Cellular Function (GO); Molecular Function (GO)
OCCURRENT: Organism-Level Process (GO); Cellular Process (GO); Molecular Process (GO)
39
Basic Formal Ontology
continuant
  independent continuant
    cellular component
  dependent continuant
    molecular function
occurrent
  biological processes
BFO: The Very Top
continuant
  independent continuant
  dependent continuant
    quality
    function
    role
    disposition
occurrent
function
- of liver: to store glycogen
- of birth canal: to enable transport
- of eye: to see
- of mitochondrion: to produce ATP
not optional; reflection of physical
makeup of bearer
role
optional:
exists because the bearer is in
some special natural, social, or
institutional set of circumstances in
which the bearer does not have to
be
role
- bearers can have more than one role
person as student and staff member
- roles often form systems of mutual
dependence
husband / wife
first in queue / last in queue
doctor / patient
host / pathogen
role
of some chemical compound: to serve as
analyte in an experiment
of a dose of penicillin in this human child:
to treat a disease
of this bacterium in a primary host: to cause
infection
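The contrast between function (not optional, fixed by the bearer's physical makeup) and role (optional, dependent on circumstances) can be sketched as a data model. This is an illustrative assumption about how one might encode the distinction, not BFO's formal axiomatization. The class name Bearer and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Bearer:
    """An independent continuant that bears functions and roles."""
    name: str
    # Functions reflect physical makeup, so they are modelled as an
    # immutable set fixed when the bearer is created.
    functions: frozenset = frozenset()
    # Roles are optional and circumstance-dependent, so they can come and go.
    roles: set = field(default_factory=set)

    def take_on(self, role):
        """Enter a circumstance in which the bearer plays this role."""
        self.roles.add(role)

    def drop(self, role):
        """Leave that circumstance; the bearer itself is unchanged."""
        self.roles.discard(role)


liver = Bearer("liver", functions=frozenset({"to store glycogen"}))

person = Bearer("person")
person.take_on("student")
person.take_on("staff member")  # a bearer can have more than one role
person.drop("student")

print(sorted(person.roles))  # ['staff member']
```

Dropping a role leaves the bearer intact, whereas the function set offers no method for removal, mirroring the "optional" versus "not optional" wording on the slides.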
A good solution to the silo problem must
be:
• modular
• incremental
• bottom-up
• evidence-based
• revisable
• incorporate a strategy for motivating
potential developers and users
46
Because the ontologies in the
Foundry
are built as orthogonal modules which form an
incrementally evolving network
• scientists are motivated to commit to
developing ontologies because they will need in
their own work ontologies that fit into this
network
• users are motivated by the assurance that the
ontologies they turn to are maintained by
experts
47
More benefits of orthogonality
• helps those new to ontology to find what they
need
• to find models of good practice
• ensures mutual consistency of ontologies
(trivially)
• and thereby ensures additivity of annotations
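Additivity of annotations can be illustrated with a short sketch: when two groups annotate with identifiers from the same ontology, their annotation sets combine by simple union, with no mapping step. The site names, item names and the identifiers ONT:0001 and ONT:0002 are placeholders invented for this example.

```python
def merge_annotations(*sources):
    """Union annotation sets from several sources that share one ontology."""
    merged = {}
    for source in sources:
        for item, terms in source.items():
            merged.setdefault(item, set()).update(terms)
    return merged


# Hypothetical annotations: annotated item -> set of ontology term identifiers.
pittsburgh = {"gene_a": {"ONT:0001"}}
vancouver = {"gene_a": {"ONT:0002"}, "gene_b": {"ONT:0001"}}

combined = merge_annotations(pittsburgh, vancouver)
print(sorted(combined["gene_a"]))  # ['ONT:0001', 'ONT:0002']
print(sorted(combined["gene_b"]))  # ['ONT:0001']
```

If the two sites had used different ontologies for the same domain, the union would be meaningless until a mapping between their identifiers was built and maintained.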
48
More benefits of orthogonality
• it rules out the sorts of simplification and
partiality which may be acceptable under
more pluralistic regimes
• thereby brings an obligation on the part of
ontology developers to commit to scientific
accuracy and domain-completeness
49