Distributed Common Ground System – Army
(DCGS-A)
The Role of Ontology in the Era
of Big (Military) Data
Barry Smith
Director
National Center for Ontological Research
Distributed Development of a
Shared Semantic Resource (SSR)
in support of US Army’s Distributed Common
Ground System Standard Cloud (DSC) initiative
with thanks to: Tanya Malyuta, Ron Rudnicki
Background materials: http://x.co/yYxN
Making data (re-)usable through
common controlled vocabularies
• Allow multiple databases to be treated as if
they were a single data source by eliminating
terminological redundancy in the ways data are
described
– not ‘Person’, and ‘Human’, and ‘Human Being’, and
‘Pn’, and ‘HB’, but simply: person
• Allow development and use of common tools
and techniques, common training, single
validation of data, focused around
– semantic technology
– coordinated ontology development and use
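A minimal sketch of the "simply: person" idea, assuming a hypothetical synonym table (`SYNONYMS`) and two made-up source tables. None of these names come from DCGS-A; they only illustrate how one preferred term lets separate sources be read as one.

```python
# Hypothetical controlled vocabulary: each preferred term lists the
# local labels that different databases use for the same type.
SYNONYMS = {
    "person": ["Person", "Human", "Human Being", "Pn", "HB"],
}

# Invert the table so any local label can be looked up directly.
LABEL_TO_TERM = {
    label.lower(): term
    for term, labels in SYNONYMS.items()
    for label in labels
}


def normalize(label: str) -> str:
    """Return the preferred term for a local label, or the label unchanged."""
    return LABEL_TO_TERM.get(label.strip().lower(), label)


# Two made-up sources that describe the same kind of thing differently.
source_a = [{"type": "Human", "id": 1}]
source_b = [{"type": "HB", "id": 2}, {"type": "Pn", "id": 3}]

# After normalization both sources can be queried as a single data source.
merged = [{**row, "type": normalize(row["type"])} for row in source_a + source_b]
print(merged)
```

Labels that are not in the table pass through unchanged, so gaps in the vocabulary stay visible instead of being silently dropped.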
Ontology =def.
• controlled vocabulary organized as a graph
• nodes in the graph are terms representing types
in reality
• each node is associated with a definition and
synonyms
• edges in the graph represent well-defined
relations between these types
• the graph is structured hierarchically via subtype
relations
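The definition above can be sketched as a small data structure. This is only an illustration: the class names `Term` and `Ontology` and the relation label `is_a` are assumptions of the sketch, and the example terms are borrowed from the vehicle definitions used later in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A node: a term representing a type, with definition and synonyms."""
    name: str
    definition: str
    synonyms: list[str] = field(default_factory=list)


class Ontology:
    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}
        # Edges are stored as (subject, relation, object) triples.
        self.edges: set[tuple[str, str, str]] = set()

    def add_term(self, term: Term) -> None:
        self.terms[term.name] = term

    def relate(self, subject: str, relation: str, obj: str) -> None:
        if subject not in self.terms or obj not in self.terms:
            raise KeyError("both ends of an edge must be terms in the graph")
        self.edges.add((subject, relation, obj))

    def ancestors(self, name: str) -> list[str]:
        """Follow is_a edges upward to list every supertype of a term."""
        found: list[str] = []
        frontier = [name]
        while frontier:
            current = frontier.pop()
            for s, rel, o in sorted(self.edges):
                if s == current and rel == "is_a" and o not in found:
                    found.append(o)
                    frontier.append(o)
        return found


onto = Ontology()
onto.add_term(Term("vehicle", "an object used for transporting people or goods"))
onto.add_term(Term("tractor", "a vehicle that is used for towing"))
onto.add_term(Term("wheeled tractor", "a tractor that has a wheeled platform"))
onto.relate("tractor", "is_a", "vehicle")
onto.relate("wheeled tractor", "is_a", "tractor")
print(onto.ancestors("wheeled tractor"))
```

The subtype hierarchy falls out of the `is_a` edges; other well-defined relations would be stored as further triples in the same edge set.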
Ontologies
• computer-tractable representations of types
in specific areas of reality
• divided into more and less general
– upper = organizing ontologies, provide common
architecture and thus promote interoperability
– lower = domain ontologies, provide grounding in
reality
• reflecting top-down and bottom-up strategy
Success story in biomedicine
Goal: integration of biological and clinical data
– across different species
– across levels of granularity (organ,
organism, cell, molecule)
– across different perspectives (physical,
biological, clinical)
– within and across domains (growth, aging,
environment, genetic disease, toxicity …)
The Open Biomedical Ontologies (OBO) Foundry
Ontologies arranged by RELATION TO TIME (columns) and GRANULARITY (rows)
GRANULARITY: organ and organism; cell and cellular component; molecule
• CONTINUANT, INDEPENDENT
– Organism (NCBI Taxonomy)
– Anatomical Entity (FMA, CARO)
– Cell (CL)
– Cellular Component (FMA, GO)
– Molecule (ChEBI, SO, RnaO, PrO)
• CONTINUANT, DEPENDENT
– Organ Function (FMP, CPRO)
– Phenotypic Quality (PaTO)
– Cellular Function (GO)
– Molecular Function (GO)
• OCCURRENT
– Biological Process (GO)
– Molecular Process (GO)
Population-level ontologies
Ontologies arranged by RELATION TO TIME (columns) and GRANULARITY (rows)
GRANULARITY: complex of organisms; organ and organism; cell and cellular component; molecule
• CONTINUANT, INDEPENDENT
– Family, Community, Population
– Organism (NCBI Taxonomy)
– Anatomical Entity (FMA, CARO)
– Cell (CL)
– Cellular Component (FMA, GO)
– Molecule (ChEBI, SO, RnaO, PrO)
• CONTINUANT, DEPENDENT
– Population Phenotype
– Organ Function (FMP, CPRO)
– Phenotypic Quality (PaTO)
– Cellular Function (GO)
– Molecular Function (GO)
• OCCURRENT
– Population Process
– Biological Process (GO)
– Molecular Process (GO)
Environment Ontology
Ontologies arranged by RELATION TO TIME (columns) and GRANULARITY (rows)
GRANULARITY: organ and organism; cell and cellular component; molecule
• CONTINUANT, INDEPENDENT
– Organism (NCBI Taxonomy)
– Anatomical Entity (FMA, CARO)
– Cell (CL)
– Cellular Component (FMA, GO)
– Molecule (ChEBI, SO, RnaO, PrO)
– Environment Ontology
• CONTINUANT, DEPENDENT
– Organ Function (FMP, CPRO)
– Phenotypic Quality (PaTO)
– Cellular Function (GO)
– Molecular Function (GO)
• OCCURRENT
– Biological Process (GO)
– Molecular Process (GO)
Rationale of OBO Foundry coverage
Ontologies arranged by RELATION TO TIME (columns) and GRANULARITY (rows)
GRANULARITY: organ and organism; cell and cellular component; molecule
• CONTINUANT, INDEPENDENT
– Organism (NCBI Taxonomy)
– Anatomical Entity (FMA, CARO)
– Cell (CL)
– Cellular Component (FMA, GO)
– Molecule (ChEBI, SO, RNAO, PRO)
• CONTINUANT, DEPENDENT
– Organ Function (FMP, CPRO)
– Phenotypic Quality (PaTO)
– Cellular Function (GO)
– Molecular Function (GO)
• OCCURRENT
– Organism-Level Process (GO)
– Cellular Process (GO)
– Molecular Process (GO)
OBO Foundry approach extended into
other domains
• NIF Standard: Neuroscience Information Framework
• ISF Ontologies: Integrated Semantic Framework
• OGMS and Extensions: Ontology for General Medical Science
• IDO Consortium: Infectious Disease Ontology
• cROP: Common Reference Ontologies for Plants
Modular organization + Extension strategy
top level
• Basic Formal Ontology (BFO)
domain level
• Anatomy Ontology (FMA*, CARO)
• Cell Ontology (CL)
• Cellular Component Ontology (FMA*, GO*)
• Environment Ontology (EnvO)
• Subcellular Anatomy Ontology (SAO)
• Sequence Ontology (SO*)
• Protein Ontology (PRO*)
• Infectious Disease Ontology (IDO*)
• Phenotypic Quality Ontology (PaTO)
• Biological Process Ontology (GO*)
• Molecular Function (GO*)
~100 ontologies using BFO
US Army Biometrics Ontology
Brucella Ontology (IDO-BRU)
eagle-i and VIVO (NCRR)
Financial Report Ontology (to support SEC through XBRL)
IDO Infectious Disease Ontology (NIAID)
Malaria Ontology (IDO-MAL)
Nanoparticle Ontology (NPO)
Ontology for Risks Against Patient Safety (RAPS/REMINE)
Parasite Experiment Ontology (PEO)
Subcellular Anatomy Ontology (SAO)
Vaccine Ontology (VO)
…
Basic Formal Ontology
BFO:Entity
• BFO:Continuant
– BFO:Independent Continuant
– BFO:Dependent Continuant (includes BFO:Disposition)
• BFO:Occurrent
– BFO:Process
Tuesday, July 21, 2015
Basic Formal Ontology
and Mental Functioning Ontology (MFO)
• BFO terms: BFO:Entity; BFO:Continuant; BFO:Independent Continuant; BFO:Dependent Continuant; BFO:Occurrent; BFO:Process; BFO:Disposition; BFO:Quality
• MFO terms: Organism; Mental Functioning Related Anatomical Structure; Bodily Process; Mental Process; Cognitive Representation; Affective Representation; Behaviour inducing state
Emotion Ontology extends MFO
• BFO terms: BFO:Entity; BFO:Continuant; BFO:Independent Continuant; BFO:Dependent Continuant; BFO:Occurrent; BFO:Process; BFO:Disposition
• MFO terms: Organism; Mental Process; Cognitive Representation; Bodily Process; Affective Representation
• MFO-EM terms: Physiological Response to Emotion Process; Emotional Action Tendencies; Appraisal Process; Appraisal; Emotional Behavioural Process; Subjective Emotional Feeling; Emotion Occurrent
• Relations shown: inheres_in; is_output_of; has_part; agent_of
Sample from Emotion Ontology: Types of Feeling
The problem of joint / coalition operations
• Intelligence
• Fire Support
• Targeting
• Maneuver & Blue Force Tracking
• Air Operations
• Civil-Military Operations
• Logistics
US DoD Civil Affairs strategy for non-classified
information sharing
Ontologies / semantic technology
can help to solve this problem
• Intelligence
• Fire Support
• Targeting
• Maneuver & Blue Force Tracking
• Air Operations
• Civil-Military Operations
• Logistics
But if each community produces its own ontology,
this will merely create new, semantic silos
• Intelligence
• Fire Support
• Targeting
• Maneuver & Blue Force Tracking
• Air Operations
• Civil-Military Operations
• Logistics
What we are doing to avoid the
problem of semantic silos
Distributed Development of a Shared
Semantic Resource
Pilot testing to demonstrate feasibility
creating the analog of this in the military domain
top level
• Basic Formal Ontology (BFO)
domain level
• Anatomy Ontology (FMA*, CARO)
• Cell Ontology (CL)
• Cellular Component Ontology (FMA*, GO*)
• Environment Ontology (EnvO)
• Subcellular Anatomy Ontology (SAO)
• Sequence Ontology (SO*)
• Protein Ontology (PRO*)
• Infectious Disease Ontology (IDO*)
• Phenotypic Quality Ontology (PaTO)
• Biological Process Ontology (GO*)
• Molecular Function (GO*)
Semantic Enhancement
Annotation (tagging) of source data models using
terms from coordinated ontologies
– data remain in their original state (are treated at arm's
length)
– tagged using interoperable ontologies created in tandem
– can be as complete as needed, lossless, long-lasting
because flexible and responsive
– big bang for buck – measurable benefit even from first
small investments
Coordination through shared governance and
training
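A minimal sketch of annotation at arm's length, assuming a made-up source data model and a made-up annotation table. The table and column names are hypothetical and the two ontology terms are placeholders. The point is that the tags live beside the data, not inside it.

```python
# Hypothetical source data model: table and column names as they exist today.
source_model = {
    "veh_tbl": ["veh_id", "veh_kind", "loc"],
    "pers": ["pn_id", "nm"],
}

# Hypothetical annotations, kept separately from the data so the sources
# stay in their original state.
annotations = {
    ("veh_tbl", "veh_kind"): "vehicle",
    ("pers", "pn_id"): "person",
}


def columns_for(term: str) -> list[tuple[str, str]]:
    """Find every (table, column) tagged with the given ontology term."""
    return sorted(key for key, tagged in annotations.items() if tagged == term)


def coverage() -> float:
    """Fraction of source columns that carry an annotation so far."""
    total = sum(len(cols) for cols in source_model.values())
    return len(annotations) / total


print(columns_for("vehicle"))
print(coverage())
```

Because annotation is incremental, even partial coverage already supports queries by ontology term, which is one way to read the "benefit even from first small investments" claim.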
Main challenge: Will it scale?
The problem of scalability turns on
• the ability to accommodate ever increasing
volumes and types of data and numbers of
users
• the ability to preserve coordination (consistency,
non-redundancy) as ever more domains
become involved
• the ability to respond in agile fashion to ever
changing bodies of source data
Strategy for agile ontology creation
• Identify or create carefully validated general
purpose plug-and-play reference ontology
modules for principal domains
• Develop a method whereby these reference
ontologies can be extended very easily to cope
with specific, local data through creation of
application ontologies
Reference Ontology
vehicle =def. an object used for transporting people or goods
tractor =def. a vehicle that is used for towing
crane =def. a vehicle that is used for lifting and moving heavy objects
vehicle platform =def. means of providing mobility to a vehicle
wheeled platform =def. a vehicle platform that provides mobility through the use of wheels
tracked platform =def. a vehicle platform that provides mobility through the use of continuous tracks
Application Ontology
artillery vehicle =def. vehicle designed for the transport of one or more artillery weapons
wheeled tractor =def. a tractor that has a wheeled platform
Russian wheeled tractor type T33 =def. a wheeled tractor of type T33 manufactured in Russia
Ukrainian wheeled tractor type T33 =def. a wheeled tractor of type T33 manufactured in Ukraine
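The extension pattern on this slide can be sketched as follows. The definitions are taken from the slide; the `(parent, differentia)` layout and the helper names `check_parents` and `lineage` are assumptions made for illustration only.

```python
# Reference ontology terms and definitions, taken from the slide above.
reference = {
    "vehicle": "an object used for transporting people or goods",
    "tractor": "a vehicle that is used for towing",
    "wheeled platform": (
        "a vehicle platform that provides mobility through the use of wheels"
    ),
}

# Application terms: each names its parent term and what it adds.
# The (parent, differentia) layout is an assumption of this sketch.
application = {
    "wheeled tractor": ("tractor", "has a wheeled platform"),
    "Russian wheeled tractor type T33": (
        "wheeled tractor",
        "is of type T33 and manufactured in Russia",
    ),
}


def check_parents(ref, app):
    """Return application terms whose parent is defined nowhere."""
    known = set(ref) | set(app)
    return [term for term, (parent, _) in app.items() if parent not in known]


def lineage(term, app):
    """Walk from an application term up to the first reference term."""
    chain = [term]
    while chain[-1] in app:
        chain.append(app[chain[-1]][0])
    return chain


print(check_parents(reference, application))
print(lineage("Russian wheeled tractor type T33", application))
```

Each local term is anchored in a reference term, so adding a new local type means adding one entry rather than revising the reference module.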
Basic Formal Ontology (BFO)
• Extended Relation Ontology
• Agent Ontology
• Artifact Ontology
• Event Ontology
• Geospatial Ontology
• Information Entity Ontology
• Quality Ontology
• Time Ontology
http://milportal.org
An example of agile application
ontology development:
The Bioweapons Ontology (BWO)
Kinds of chemical and biological
weapons
• Chemical
– Nerve agents (sarin gas)
– Blister agents (mustard gas)
– Blood agents (cyanide gas)
• Biological
– Infectious agents – BWO(I)
– Toxic agents (botulinum toxin, ricin) – BWO(T)
We focus here on BWO(I)
Infectious agents
– Bacterial (anthrax, bubonic plague, tularemia, brucellosis, cholera …)
– Viral (Ebola, Marburg …)
Examples of ontology terms
BFO category: IDO term; StaphIDO term
• Independent Continuant: Infectious disorder; Staph. aureus disorder
• Dependent Continuant: Infectious disease; MRSA
• Dependent Continuant: Protective resistance; Methicillin resistance
• Occurrent: Infectious disease course; MRSA course
Infectious Disease Ontology (IDO)
with thanks to Lindsay Cowell (University of Texas SW
Medical Center) and Albert Goldfain (Blue Highway, Inc.)
IDO Core (Reference Ontology)
• General terms in the ID domain.
IDO Extensions (Application Ontologies)
• Disease-, host-, pathogen-specific.
• Developed by subject matter experts.
The hub-and-spokes strategy ensures that logical
content of IDO Core is automatically inherited by
the IDO Extensions
IDO Core
• Contains general terms in the ID domain:
– E.g., ‘colonization’, ‘pathogen’, ‘infection’
• A contract between IDO extension ontologies
and the datasets that use them.
• Intended to represent information along
several dimensions:
– biological scale (gene, cell, organ, organism, population)
– discipline (clinical, immunological, microbiological)
– organisms involved (host, pathogen, and vector types)
IDO Extensions
IDO – Brucellosis
IDO – Dengue Fever
IDO – Influenza
IDO – Malaria
IDO – Staphylococcus Aureus Bacteremia
IDO – Vector Surveillance and Management
IDO – Plant
VO – Vaccine Ontology
BWO(I) – Bioweapons Ontology (Infectious Agents)
How IDO evolves: the case of Staph. aureus
• HUB and SPOKES: Domain ontologies
• SEMI-LATTICE: By subject matter experts in different communities of interest.
• Ontologies shown: IDOCore; IDOMAL; IDOFLU; IDOHIV; IDOStrep; IDOSa; IDOMRSa; IDOHumanSa; IDORatSa; IDORatStrep; IDOHumanStrep; IDOHumanBacterial; IDOAntibioticResistant
BWO:disease by infectious agent
= def. a disease that is the consequence of the presence of
pathogenic microbial agents, including pathogenic viruses,
pathogenic bacteria, fungi, protozoa, multicellular parasites,
and aberrant proteins known as prions
Strategy used to build BWO(I)
with thanks to Lindsay Cowell and Oliver He (Michigan)
1. Start with a glossary such as:
http://www.emedicinehealth.com/biological_warfare/
2. Select the corresponding terms needed to describe
bioweapons from IDO Core and related ontologies
such as the ChEBI chemistry ontology
3. All ontology terms keep their original definitions
and IDs.
4. The result is a spreadsheet
5. Where glossary terms have no ontology
equivalent (flagged "no corresponding ontology term"),
create BWO ontology terms and definitions as needed
6. Use the Ontofox tool to create the first version of
the BWO(I) application ontology
(http://ontofox.hegroup.org/)
7. Use BWO(I) in annotations, and where gaps are
identified create extension terms, for instance
– weaponized brucella
– aerosol anthrax
– smallpox incubation period
This establishes a virtuous cycle between ontology
development and use in annotations
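The glossary-to-spreadsheet part of this strategy might look roughly like the sketch below. It is illustrative only: the glossary entries, the lookup table and every identifier are placeholders (not real IDO, ChEBI or BWO IDs), and the sketch does not call Ontofox, which would take over from the resulting spreadsheet.

```python
import csv
import io

# Hypothetical stand-ins: a few glossary entries and a lookup of existing
# ontology terms. The IDs are placeholders, not real IDO or ChEBI identifiers.
glossary = ["pathogen", "infection", "weaponized brucella"]
existing_terms = {
    "pathogen": ("IDO", "IDO:PLACEHOLDER_1"),
    "infection": ("IDO", "IDO:PLACEHOLDER_2"),
}


def build_rows(entries, lookup):
    """Reuse existing terms with their original IDs; mint BWO terms for gaps."""
    rows = []
    minted = 0
    for entry in entries:
        if entry in lookup:
            source, term_id = lookup[entry]
        else:
            minted += 1
            source, term_id = "BWO", f"BWO:NEW_{minted}"
        rows.append({"glossary_term": entry, "source": source, "id": term_id})
    return rows


def to_spreadsheet(rows):
    """Serialize the rows as CSV text, standing in for the spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["glossary_term", "source", "id"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = build_rows(glossary, existing_terms)
print(to_spreadsheet(rows))
```

Rerunning the same function after annotation has exposed new gaps gives the virtuous cycle a concrete form: new glossary entries either resolve to reused terms or show up as newly minted BWO rows.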
Potential uses of BWO
– semantic enhancement of bioweapons
intelligence data
– results will be automatically interoperable with
relevant bioinformatics and public health IT tools
for dealing with infections, epidemics, vaccines,
forensics, …
– to annotate research literature and research data
on bioweapons
– to create computable definitions to substitute for
definitions in free text glossaries
Why do people think they need lexicons?
• Training
• Compiling lessons learned
• Compiling results of testing, e.g. of proposed new
doctrine
• Collective inferencing
• Official reporting
• Doctrinal development
• Standard operating procedures
• Sharing of data
• People need to (ensure that they) understand
each other