Joanne S. Luciano's Slide


Ontologies in the BioMedical Domain
Joanne S. Luciano, PhD
Predictive Medicine, Inc., Belmont, MA (predmed.com)
Rensselaer Polytechnic Institute, Troy, NY (twc.rpi.edu)
Semantic Computing in Healthcare Industry
September 16-18, 2013
Hyatt Regency Irvine, Irvine, CA, USA
PRESENTER
Joanne S. Luciano
BS, MS Computer Science
PhD, Cognitive and Neural Systems (Computational Neuroscience)
Enabling Health and Wellbeing through Knowledge Technology
Wang Labs
Harvard Medical School
MITRE
Lotus Development
Predictive Medicine, Inc.
Rensselaer Polytechnic Institute
Email:
[email protected]
[email protected]
Interests
Flying planes, climbing rocks, traveling, balancing rocks, art & music, tbd
Community
BioPathways Consortium, BioPAX, W3C HCLSIG
Overview
1. Introduction (10 minutes)
   1. Scope: BioMed Domain, not Ontology Building
   2. Overview
   3. Background
      1. Health Care, Life Science, and People
      2. Reference or Application
      3. Function (Use Case)
2. Examples: 3 reference, 1 application, Best Practices (40 minutes)
   1. UMLS – High level across biomedicine (5)
   2. BioPAX – Mid level – biological pathways (10)
   3. Gene Ontology (“GO”) – Gene annotation (5)
   4. Influenza Ontology (5)
   5. Best Practices (10)
      1. Process: Start with a use case, develop a prototype
      2. Details: Converting data to RDF to support Linked Open Data
      3. Standards: BioMedical ontology best practices (BioPortal, BFO, SIO)
      4. Get Involved!
3. Conclusion (5 minutes)
   1. Recap, where to go for more information
Introduction
Scope
• BioMed Domain
Overview
• What will be presented
Background
1. Domain: Health Care, Life Science, and People
2. What are you building: Reference vs. Application
3. Why: Function (Use Case)
Scope
Health Care & Life Science
The Open Biological and Biomedical Ontologies
http://www.obofoundry.org
Goal: a suite of orthogonal interoperable reference ontologies
Barry Smith
U Buffalo, NCBO
From: Nat Biotechnol. 2007 November; 25(11): 1251.
doi: 10.1038/nbt1346
Scope
Ontology Uses
• Knowledge Management
• Annotate data (such as genomes)
• Access information (search, find, and access)
• Map and relate across ontologies
• Data integration and exchange
• Model dynamic cellular processes
• Identify drug interactions
• Decision support
  • SafetyCodes
  • Diabetic Care
  • Lab Alerts
(Bodenreider YBMI 2008)
http://themindwobbles.wordpress.com/2009/05/04/olivier-bodenreider-nlm-bestpractices-pitfalls-and-positives-cbo-2009/
What will be presented
3 Reference Ontology Examples
• UMLS – High level across biomedicine
• BioPAX – Mid level – biological pathways
• Gene Ontology (“GO”) – Gene annotation
2 Application Ontology Examples
• Influenza Ontology
• Translational Medicine Ontology
Background
1. Domain: Health Care, Life Science, and People
   1. Times have changed
   2. Data Driven Medicine
   3. Health Care Singularity
2. What are you building: Reference vs. Application
   1. Ontology Spectrum
   2. Reference vs. Application Ontology
3. Why: Function (Use Case)
   1. Link, Aggregate, Search, Integrate, etc.
Times have changed
• Aging population (end-of-life care is costly)
• More people with chronic illnesses
• The end of the blockbuster era
• Personalized Medicine (the right treatment for the right patient at the right time)
  • Need lower-cost drug development
  • Improved patient response to treatment (Evidence Based)
• Web and Mobile
  • The technology itself (ubiquitous)
  • Patients increasingly engaging
Photo: http://www.flickr.com/photos/sepblog/4014143391/
Data Driven Medicine:
Shifts in thinking and practice:
Data, Not Programs
Sharing, Not Hoarding
Personal, Not Population
Ontology Spectrum
[Figure: existing formalisms arranged along a spectrum from weak semantics to strong semantics]
Reuse of terminological resources for efficient ontological engineering in Life Sciences, by Jimeno-Yepes, Antonio; Jiménez-Ruiz, Ernesto; Berlanga-Llavori, Rafael; Rebholz-Schuhmann, Dietrich. Journal: BMC Bioinformatics Vol. 10 Issue Suppl 10. DOI: 10.1186/1471-2105-10-S10-S4
http://www.mkbergman.com/wpcontent/themes/ai3v2/images/2007Posts/070501d_SemanticSpe
Application vs. Reference Ontology
• Reference Ontology
  • Intended as an authoritative source
  • True to the limits of what is known
  • Used by others
• Application Ontology
  • Built to support a particular application (use case)
  • Reuses terms rather than defining them
  • Provides a skeleton structure to support the application
  • Terms it does define refine or create concepts, either directly or through new classes based on inference
http://www.nlm.nih.gov/research/umls/presentations/2004-medinfo_tut.pdf
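The reference/application distinction can be sketched in a few lines: the application ontology points at reference-ontology terms by identifier instead of redefining them, and adds one class whose membership is inferred from data. All identifiers (REF:0001 etc.) and the defined class are hypothetical; a real application ontology would express this in OWL and let a reasoner perform the classification.

```python
# Hypothetical reference-ontology terms (id -> label); the ids are
# placeholders, not identifiers from any real ontology.
REFERENCE_TERMS = {
    "REF:0001": "diabetes mellitus",
    "REF:0002": "hypertension",
    "REF:0003": "metformin",
}


def in_defined_class(record: dict) -> bool:
    """Hypothetical application-level class: patients diagnosed with
    REF:0001 who also take REF:0003. The rule only references
    reference-ontology ids; it does not redefine them."""
    return "REF:0001" in record["diagnoses"] and "REF:0003" in record["medications"]


def classify(records: list[dict]) -> list[str]:
    """Infer which records belong to the defined class from asserted data."""
    return [r["id"] for r in records if in_defined_class(r)]


def labels(term_ids) -> list[str]:
    """Look up human-readable labels in the reference ontology."""
    return [REFERENCE_TERMS[t] for t in sorted(term_ids)]


patients = [
    {"id": "p1", "diagnoses": {"REF:0001"}, "medications": {"REF:0003"}},
    {"id": "p2", "diagnoses": {"REF:0002"}, "medications": set()},
]
print(classify(patients))                # ['p1']
print(labels(patients[0]["diagnoses"]))  # ['diabetes mellitus']
```

The design choice to illustrate is reuse by reference: if the reference ontology updates a label, the application ontology picks it up without change.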
What is UMLS?
The UMLS, or Unified Medical Language System, is a set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems.
You can use the UMLS to enhance or develop
applications, such as electronic health records,
classification tools, dictionaries and language translators.
http://www.nlm.nih.gov/research/umls/presentations/2004-medinfo_tut.pdf
http://www.nlm.nih.gov/research/umls/quickstart.html
Unified Medical Language System In Use
Use UMLS to link health information, medical terms, drug names, and billing codes across different computer systems.
Some examples:
• Linking terms and codes between doctor, pharmacy, and insurance company
• Patient care coordination among several departments within a hospital
• SNOMED, ICD-9, LOINC, RxNorm in the clinical setting (covered in the next part of the tutorial)
The UMLS has many other uses, including search engine retrieval, data mining, public health statistics reporting, and terminology research.
http://www.nlm.nih.gov/research/umls/presentations/2004-medinfo_tut.pdf
Unified Medical Language System
Knowledge Sources
The UMLS has three tools, called the Knowledge Sources:
• Metathesaurus: Terms and codes from many vocabularies, including CPT®, ICD-10-CM, LOINC®, MeSH®, RxNorm, and SNOMED CT®
• Semantic Network: Broad categories (semantic types) and their relationships (semantic relations)
• SPECIALIST Lexicon and Lexical Tools: Natural language processing tools
They can be accessed separately or in any combination according to your needs.
Unified Medical Language System
Metathesaurus
NLM uses the Semantic Network and Lexical Tools to produce the Metathesaurus.
Metathesaurus production involves:
• Processing the terms and codes using the Lexical Tools
• Grouping synonymous terms into concepts
• Categorizing concepts by semantic types from the Semantic Network
• Incorporating relationships and attributes provided by vocabularies
• Releasing the data in a common format
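The "grouping synonymous terms into concepts" step can be illustrated with a toy normalizer. This is not NLM's Lexical Tools; it is a much cruder stand-in (lowercase, remove punctuation, strip a plural "s", sort the words) used only to show how differently written strings from several vocabularies can collapse onto one concept. Vocabulary names and concept ids are hypothetical.

```python
import re
from collections import defaultdict


def normalize(term: str) -> str:
    """Crude stand-in for lexical normalization: lowercase, drop
    punctuation, strip a trailing plural 's' from longer words, and sort
    the words so word order does not matter."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    words = [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]
    return " ".join(sorted(words))


def group_into_concepts(terms):
    """Group (source_vocabulary, term) pairs whose normalized forms match.
    Concept ids C1, C2, ... are hypothetical, not real UMLS CUIs."""
    by_norm = defaultdict(list)
    for source, term in terms:
        by_norm[normalize(term)].append((source, term))
    return {f"C{i}": members for i, members in enumerate(by_norm.values(), start=1)}


terms = [
    ("VOCAB_A", "Heart Attack"),
    ("VOCAB_B", "attack, heart"),
    ("VOCAB_C", "Heart attacks"),
    ("VOCAB_A", "Hypertension"),
]
for concept_id, members in group_into_concepts(terms).items():
    print(concept_id, members)
```

On the sample list it puts the three heart-attack strings into one concept and leaves Hypertension on its own. String normalization alone cannot recognize true synonyms such as "heart attack" and "myocardial infarction"; that needs knowledge beyond the strings, such as synonymy asserted by the source vocabularies.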
Unified Medical Language System
Access to the UMLS
The UMLS Terminology Services (UTS) provides three ways to access the UMLS:
• Web Browsers: You can search the data through UTS applications:
  • Metathesaurus Browser - Retrieve UMLS concept information, including CUIs, semantic types, and synonymous terms.
  • Semantic Network Browser - View the names, definitions, and hierarchical structure of the Semantic Network.
• Local Installation: Download, customize, and load the data into your database system, or browse your data using the MetamorphoSys RRF browser.
• Web Services APIs: You can use NLM’s application programming interfaces (APIs) to query the UMLS data within your own application.
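For the Local Installation route, the Metathesaurus is distributed as pipe-delimited Rich Release Format (RRF) files. A minimal sketch that reads concept-name rows from MRCONSO.RRF and collects, for each concept (CUI), the codes it has in each source vocabulary, which is the basis of linking terms and codes between doctor, pharmacy, and insurance company. The column list follows the UMLS Reference Manual as we understand it; the sample rows are hypothetical placeholders, not real UMLS content.

```python
from collections import defaultdict

# Column order of MRCONSO.RRF (concept names and sources), per the UMLS
# Reference Manual. Each row is '|'-delimited and ends with a trailing '|'.
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_mrconso_line(line: str) -> dict:
    """Turn one RRF row into a column-name -> value dict."""
    fields = line.rstrip("\n").split("|")[: len(MRCONSO_COLUMNS)]
    return dict(zip(MRCONSO_COLUMNS, fields))


def codes_by_concept(lines, sources=None):
    """Map each CUI to the (SAB, CODE, STR) of its atoms, optionally
    keeping only the given source vocabularies (SAB values)."""
    index = defaultdict(list)
    for line in lines:
        row = parse_mrconso_line(line)
        if sources is None or row["SAB"] in sources:
            index[row["CUI"]].append((row["SAB"], row["CODE"], row["STR"]))
    return index


# Hypothetical rows: every identifier and string here is a placeholder.
SAMPLE = [
    "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||X1||VOCAB_A|PT|X1|Example condition|0|N||",
    "C0000001|ENG|S|L0000002|PF|S0000002|Y|A0000002||Y22||VOCAB_B|PT|Y22|Example condition, synonym|0|N||",
    "C0000002|ENG|P|L0000003|PF|S0000003|Y|A0000003||X2||VOCAB_A|PT|X2|Another condition|0|N||",
]
index = codes_by_concept(SAMPLE)
print(index["C0000001"])
```

With a licensed local copy, the same function can read the real file by passing it an open file handle for MRCONSO.RRF (its path depends on your installation) and, for example, sources={"SNOMEDCT_US", "RXNORM"} to keep only SNOMED CT and RxNorm atoms.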
Unified Medical Language System
License Required
• Request a license (FREE)
• Sign up for a UMLS Terminology Services (UTS) account.
• UMLS licenses are issued only to individuals
• NLM is a member of the IHTSDO (owner of SNOMED CT), and there is no charge for SNOMED CT use in the United States and other member countries. Some uses of the UMLS may require additional agreements with individual terminology vendors.
The UTS account allows you to browse, download, and query the UMLS.
BioPAX
Biological PAthway eXchange
An abstract data model for biological pathway integration
Initiative arose from the community
Biological Pathways of the Cell
BioPAX
What’s a pathway? Depends on who you ask!
Metabolic Pathways (BioPAX Level 1)
A series of chemical reactions, catalyzed by enzymes and connected so that the products of one are the reactants of the next
e.g. Conversion, Transport
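The metabolic-pathway definition (the products of one reaction are the reactants of the next) is enough to assemble a pathway from a flat list of reactions. A minimal sketch using the first three steps of glycolysis; currency metabolites such as ATP and ADP are left out, as is usual when linking reactions this way, because they would connect otherwise unrelated steps.

```python
from collections import deque

# The first three steps of glycolysis as (reactants, products).
# Simplified: cofactors such as ATP and ADP are left out.
REACTIONS = {
    "hexokinase": ({"glucose"}, {"glucose-6-phosphate"}),
    "phosphoglucose isomerase": ({"glucose-6-phosphate"}, {"fructose-6-phosphate"}),
    "phosphofructokinase": ({"fructose-6-phosphate"}, {"fructose-1,6-bisphosphate"}),
}


def next_steps(reactions):
    """Link reaction a -> reaction b whenever a product of a is a reactant of b."""
    return {
        a: sorted(b for b, (reactants_b, _) in reactions.items()
                  if b != a and products_a & reactants_b)
        for a, (_, products_a) in reactions.items()
    }


def pathway_from(start, reactions):
    """Breadth-first walk over the reaction links, starting at one reaction."""
    links = next_steps(reactions)
    order, seen, queue = [], {start}, deque([start])
    while queue:
        reaction = queue.popleft()
        order.append(reaction)
        for nxt in links[reaction]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


print(pathway_from("hexokinase", REACTIONS))
```

Starting from hexokinase, the walk returns the three reactions in glycolytic order; the seen set keeps cyclic pathways from looping forever.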
Biological Pathways of the Cell
BioPAX
Molecular Interaction Networks (BioPAX Level 2)
Cells are complex systems whose physiology is governed by an intricate network of molecular interactions (MIs), of which a relevant subset are protein–protein interactions (PPIs).
All molecular interactions are fundamentally electrostatic in nature and can be described by Coulomb’s Law. It governs the interactions of:
• Electrons to nuclei in atoms
• Atoms to atoms in molecules
• Molecules to molecules in liquids and solids
Interactions that Bind
One classification:
• Short-range repulsive interactions
• Electrostatics (charge-charge interactions)
• Dipolar interactions
• Fluctuating dipole
• Hydrogen bonding interactions
• Solvent, counter ion, and entropic effects
• Water and the hydrophobic effect
http://ww2.chemistry.gatech.edu/~lw26/structure/molecular_interactions/mol_int.html
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1751541/?tool=pubmed
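The Coulomb's Law equation on this slide did not survive extraction. For reference, the standard textbook form for two point charges q_1 and q_2 separated by a distance r in a medium of relative permittivity ε_r is given below as force F and potential energy U (ε_0 is the vacuum permittivity); the slide may have shown only one of these forms.

```latex
F = \frac{1}{4\pi\varepsilon_0\varepsilon_r}\,\frac{q_1 q_2}{r^2},
\qquad
U = \frac{1}{4\pi\varepsilon_0\varepsilon_r}\,\frac{q_1 q_2}{r}
```

The ε_r term is why water, with a relative permittivity of roughly 80, strongly screens charge-charge interactions in the cell, which is one reason solvent effects appear in the classification of binding interactions.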
Biological Pathways of the Cell
BioPAX
Signaling Pathways (BioPAX Level 3)
Signaling molecules trigger cellular responses. They bind to the cell surface, causing a cascade of activation reactions:
A activates B activates C….
Adapted from Cell Signalling Biology - Michael J. Berridge - www.cellsignallingbiology.org - 2012
and http://www.hartnell.edu/tutorials/biology/signaltransduction.html
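The "A activates B activates C" cascade can be modeled as a directed graph in which an edge X -> Y means "X activates Y". A minimal sketch, with hypothetical molecule names, that lists everything switched on once a signal binds; it ignores timing, amplification, and inhibition.

```python
def propagate(cascade: dict[str, list[str]], signal: str) -> list[str]:
    """Return the molecules activated, in order, starting from `signal`,
    following 'X activates Y' edges depth-first."""
    activated, stack = [], [signal]
    while stack:
        node = stack.pop()
        if node in activated:
            continue  # already on; also stops feedback loops from cycling
        activated.append(node)
        # Push targets in reverse so they are visited in listed order.
        stack.extend(reversed(cascade.get(node, [])))
    return activated


# Hypothetical linear cascade mirroring "A activates B activates C".
CASCADE = {"ligand": ["receptor"], "receptor": ["A"], "A": ["B"], "B": ["C"]}
print(propagate(CASCADE, "ligand"))
```

For the linear example the result is ligand, receptor, A, B, C; branching cascades are handled the same way because each molecule may list several targets.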
Biological Pathways of the Cell
BioPAX
Gene Regulation
Gene regulation is the modulation of any of the stages of gene expression that control:
• which genes are switched on and off
• when, how long, and how much
Gene regulation may occur in the following stages of gene expression:
• Transcription
• Post-transcriptional modification
• RNA transport
• Translation
• mRNA degradation
• Post-translational modifications
http://en.wikipedia.org/wiki/Regulation_of_gene_expression
http://www.biology-online.org/dictionary/Gene_regulation
Biological Pathways of the Cell
BioPAX
What’s a pathway? Depends on who you ask!
• Metabolic Pathways: BioPAX Level 1
• Molecular Interaction Networks: BioPAX Level 2
• Signaling Pathways: BioPAX Level 3
• Gene Regulation: BioPAX Level 4
BioPAX Biochemical Reaction
[Figure: the OWL schema alongside instances (individuals, i.e. data) for the reaction catalyzed by phosphoglucose isomerase, EC 5.3.1.9]
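The split between OWL schema and instance data can be sketched by writing one reaction as RDF. The class and property names (BiochemicalReaction, SmallMolecule, left, right, eCNumber, displayName) are assumed from the BioPAX Level 3 vocabulary; the ex: namespace and instance ids are hypothetical. The example is the reaction catalyzed by phosphoglucose isomerase (EC 5.3.1.9), glucose-6-phosphate to fructose-6-phosphate. In full BioPAX Level 3 the enzyme itself would be modeled as the controller of a separate Catalysis, which this sketch omits.

```python
# Schema terms come from the BioPAX Level 3 OWL namespace; the ex:
# namespace and all instance ids are hypothetical placeholders.
BP = "http://www.biopax.org/release/biopax-level3.owl#"
EX = "http://example.org/pathway#"


def reaction_to_turtle(rid, name, ec_number, left, right):
    """Serialize one BiochemicalReaction instance and its small-molecule
    participants as Turtle. left/right are lists of (id, display name).
    Names are assumed not to contain double quotes (no escaping here)."""
    out = [f"@prefix bp: <{BP}> .", f"@prefix ex: <{EX}> .", ""]
    for mol_id, mol_name in left + right:
        out.append(f'ex:{mol_id} a bp:SmallMolecule ; bp:displayName "{mol_name}" .')
    out.append(f"ex:{rid} a bp:BiochemicalReaction ;")
    out.append(f'    bp:displayName "{name}" ;')
    out.append(f'    bp:eCNumber "{ec_number}" ;')
    out.append("    bp:left " + ", ".join(f"ex:{m}" for m, _ in left) + " ;")
    out.append("    bp:right " + ", ".join(f"ex:{m}" for m, _ in right) + " .")
    return "\n".join(out)


print(reaction_to_turtle(
    "rxn1", "glucose-6-phosphate isomerization", "5.3.1.9",
    left=[("g6p", "glucose-6-phosphate")],
    right=[("f6p", "fructose-6-phosphate")],
))
```

Hand-building Turtle keeps the sketch dependency-free; for real data conversion an RDF library with proper literal escaping is the safer choice.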
BioPAX Ontology
A set of:
• interactions
• parts
• how the parts are known to interact
Level 1 v1.0 (July 7th, 2004)
BioPAX - Simplify
>200 DBs and tools
[Figure: connections among databases, applications, and users before BioPAX and with BioPAX]
A common “computable semantic” enables scientific discovery
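One back-of-the-envelope way to quantify the before/with picture, assuming every one of D data sources must be readable by every one of A applications: point-to-point integration needs one converter per pair (D·A), while a shared exchange format needs one exporter per source plus one importer per application (D + A). This is an illustrative assumption, not a figure from the slide; A = 50 below is arbitrary.

```python
def converters_needed(databases: int, applications: int) -> tuple[int, int]:
    """Return (point_to_point, via_common_format) converter counts.

    Point-to-point: one converter per database/application pair.
    Common format: one exporter per database plus one importer per application.
    """
    return databases * applications, databases + applications


# 200 sources (the slide's ">200 DBs") and a hypothetical 50 applications.
print(converters_needed(200, 50))
```

For 200 sources and 50 applications this gives 10,000 pairwise converters versus 250 with a common format; the saving grows with both counts, though for a single source and a single application the shared format is no cheaper.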
Gene Ontology
Standard representation of gene and gene product attributes across species and databases
Structured and controlled vocabularies
Organized as 3 independent ontologies:
• Molecular Function
• Biological Process
• Cellular Component
Gene Ontology
http://bib.oxfordjournals.org/content/early/2011/02/16/bib.bbr002.full
Use of the Gene Ontology
Annotation of genomes to enable the analysis of the
genome through the annotation terms.
Why annotate?
•Assess quality of assembly
•Characterize assembly
•Identify genes/suites of genes which are of a priori interest.
•Identify genes/suites of genes which have been experimentally
determined to be of interest (i.e., significantly differentially
expressed).
•Gene enrichment analysis (comparison of a set of genes of
interest to a null set).
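Gene enrichment analysis is often done with a one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test); the slide does not name a specific test, so this is one common choice. The sketch computes the probability of seeing at least k annotated genes in a set of interest of size n drawn from a background of N genes, K of which carry the GO term. The numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background (null) set
    K: background genes annotated to the GO term
    n: genes in the set of interest (e.g. differentially expressed)
    k: genes of interest that carry the annotation
    """
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible overlaps contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 20 background genes, 5 annotated to the term,
# 4 genes of interest of which 3 carry the annotation.
print(enrichment_p_value(k=3, n=4, K=5, N=20))
```

With these illustrative numbers the p-value is 155/4845, about 0.032. When many GO terms are tested at once, the p-values should be corrected for multiple testing (for example with the Benjamini-Hochberg procedure).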
Gene Ontology
Evidence Codes Decision Tree
Manually assigned evidence codes fall into four general categories: experimental, computational analysis, author statements, and curatorial statements.
Adapted from: http://people.oregonstate.edu/~knausb/rna_seq/annot.pdf
Rhee, S.Y, Wood, V., Dolinski, K. and Draghici, S. 2008. Use and misuse of the gene ontology annotations. Nature Reviews Genetics 9:509-515.
See also: http://www.geneontology.org/GO.evidence.shtml
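The four manual categories correspond to specific GO evidence codes. The grouping below follows the GO evidence code documentation as we understand it; the newer high-throughput codes (HTP, HDA, and others) are omitted, so treat it as a sketch and check the current list before relying on it. It also shows one practical use: restricting an analysis to annotations backed by experimental evidence. Gene and term names are hypothetical.

```python
# Manually assigned GO evidence codes grouped into the four categories
# above. IEA (Inferred from Electronic Annotation) is listed separately
# because it is not manually assigned. High-throughput codes are omitted.
EVIDENCE_CATEGORIES = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational analysis": {"ISS", "ISO", "ISA", "ISM", "IGC",
                               "IBA", "IBD", "IKR", "IRD", "RCA"},
    "author statement": {"TAS", "NAS"},
    "curatorial statement": {"IC", "ND"},
    "electronic (not manual)": {"IEA"},
}


def category_of(code: str) -> str:
    """Return the category for an evidence code (case-insensitive)."""
    for category, codes in EVIDENCE_CATEGORIES.items():
        if code.upper() in codes:
            return category
    raise KeyError(f"unknown evidence code: {code}")


def filter_annotations(annotations, allowed):
    """Keep (gene, term, evidence_code) tuples whose code falls in one of
    the allowed categories."""
    return [a for a in annotations if category_of(a[2]) in allowed]


# Hypothetical annotations.
annotations = [("geneA", "term1", "IDA"), ("geneB", "term1", "IEA")]
print(filter_annotations(annotations, {"experimental"}))
```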
Sequence Ontology
Sequence Ontology (SO): ‘terms and relationships used to describe the features and attributes of biological sequence.’ (E.g., binding_site, exon, etc.)
[Figure: top levels of the SO term hierarchy]
sequence_attribute
  feature_attribute
  polymer_attribute
  sequence_location
  variant_quality
sequence_collection
  contig_collection
  genome
  peptide_collection
  variant_collection
sequence_feature
  junction
  region
  sequence_alteration
sequence_variant
  functional_variant
  structural_variant
SO Obsolete
http://www.sequenceontology.org/
Terms
Gene Ontology (GO)
Directed Acyclic Graph (DAG)
‘standardizing the representation of gene and gene
product attributes across species and databases’
Structured and controlled vocabularies
[1] Rhee, S.Y, Wood, V., Dolinski, K. and Draghici, S. 2008. Use and misuse of the gene ontology annotations. Nature
Reviews Genetics 9:509-515.
[2] http://people.oregonstate.edu/~knausb/rna_seq/annot.pdf
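Because GO is a directed acyclic graph, a term can have more than one parent, and an annotation to a specific term implies annotation to all of its ancestors (the "true path" rule). A minimal sketch of that propagation, assuming a single untyped parent relation (real GO edges are typed, e.g. is_a, part_of, regulates) and hypothetical term and gene names:

```python
def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """All terms reachable by following parent edges upward. A term may
    have several parents, which is why GO is a DAG rather than a tree."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def propagate_annotations(direct: dict[str, set[str]], parents) -> dict[str, set[str]]:
    """Extend each gene's direct annotations with every ancestor term."""
    return {gene: set(terms).union(*(ancestors(t, parents) for t in terms))
            for gene, terms in direct.items()}


# Hypothetical fragment: term_D has two parents that share an ancestor.
PARENTS = {
    "term_D": ["term_B", "term_C"],
    "term_B": ["term_A"],
    "term_C": ["term_A"],
    "term_A": ["root"],
}
print(sorted(propagate_annotations({"geneX": {"term_D"}}, PARENTS)["geneX"]))
```

The seen set ensures a shared ancestor such as term_A is added only once, even though it is reachable along two paths.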
Gene Ontology
http://people.oregonstate.edu/~knausb/rna_seq/annot.pdf
Personalized Medicine
Components
• Understand disease heterogeneity
• Comprehend disease progression
• Determine genetic and environmental contributors
• Create treatments against relevant targets
  • Drugs against relevant targets (molecular structures)
  • Yoga against stress
  • Exercise against obesity
  • Elimination against food intolerance or allergy
• Develop markers to predict response
• Identify concrete endpoints to measure response
Individuals, Not Populations
Quickly retrieve pharmacogenomic markers of patients when needed.
No central storage of data is necessary, giving patients full control over their personal health information.
http://safety-code.org/
Photo: http://www.flickr.com/photos/sepblog/4014143391/
Best Practices
Semantic Web Methodology & Technology Development Process
Fox, Peter & McGuinness, Deborah 2008
http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
BioPortal
http://bioportal.bioontology.org/
Provides access to commonly used biomedical ontologies and to tools for working with them. BioPortal allows you to:
• Browse
  • the library of ontologies
  • mappings between terms in different ontologies
  • a selection of projects that use BioPortal resources
• Search
  • biomedical resources for a term
  • for a term across multiple ontologies
• Receive recommendations
  • on which ontologies are most relevant for a corpus
• Annotate text
  • with terms from ontologies
All information available through the BioPortal Web site is also available through the NCBO Web service REST API. Please see the REST API documentation for more information.
http://www.bioontology.org/wiki/index.php/NCBO_REST_services
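A minimal sketch of building a term-search request for the REST API. The base URL (https://data.bioontology.org), the /search path, and the q, ontologies, and apikey parameters are assumptions based on the current NCBO documentation rather than the 2013 wiki page above; an API key from a free BioPortal account is required.

```python
from urllib.parse import urlencode

# Assumed base URL of the NCBO BioPortal REST API; verify against the
# current REST API documentation before use.
BIOPORTAL_API = "https://data.bioontology.org"


def search_url(term: str, api_key: str, ontologies=None) -> str:
    """Build a term-search URL, optionally limited to a list of ontology
    acronyms (e.g. ["GO"]). Only builds the URL; sends nothing."""
    params = {"q": term, "apikey": api_key}
    if ontologies:
        params["ontologies"] = ",".join(ontologies)
    return f"{BIOPORTAL_API}/search?{urlencode(params)}"


print(search_url("influenza", "YOUR_API_KEY", ["GO"]))
```

Fetching the URL with urllib.request.urlopen and decoding it with json.load returns the results as JSON; check the response structure against the REST API documentation before depending on specific fields.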
Healthcare Singularity
and the age of Semantic Medicine
It took 2,300 years after the first report of angina for the condition to be commonly taught in medical curricula; modern discoveries are being disseminated at an increasingly rapid pace.
Focusing on the last 150 years, the trend still appears to be linear, approaching the axis (near-zero delay between discovery and practice) around 2025.
http://research.microsoft.com/enus/collaboration/fourthparadigm/4th_paradigm_book_part2_gillam.pdf
Conclusion
Tutorial sources
• BioPortal
• W3C HCLSIG
Consortia to join
• W3C HCLSIG
• OpenPHACTS
• Identifiers.org
• Pistoia Alliance
• BioPAX (check for new name)
Backup Slides
HL-7 and RIM
HL-7 and RIM: http://www.w3.org/2013/HCLStutorials/RIM/#%286%29
• RDF RIM Tutorial, Eric Prud'hommeaux, <[email protected]>
• Goal: a basic understanding of how data written in HL7's RIM can be expressed in RDF.
• It is not a substitute for HL7's documentation, but instead the author's notion of a quick way to familiarize oneself with the concepts and terms used in the RIM, and of how the graph structure of RDF is a natural way to represent this data.
Copyright © 2013 W3C ® (MIT, ERCIM, Keio, Beihang) Usage policies apply.
Translational Medicine Ontology
The Translational Medicine Ontology and Knowledge Base: driving personalized medicine by bridging the gap between bench and bedside
Luciano et al. Journal of Biomedical Semantics 2011, 2(Suppl 2):S1
http://www.jbiomedsem.com/content/2/S2/S1
Translational Medicine Ontology
Overview of selected types, subtypes (overlap) and existential restrictions (arrows) in the Translational Medicine Ontology.
Translational Medicine Knowledge Base
Translational Medicine Ontology with mappings to ontologies and terminologies listed in the NCBO BioPortal.
The TMO provides a global schema for Indivo-based electronic health records (EHRs) and can be used with formalized criteria for Alzheimer’s Disease.
The TMO maps types from Linking Open Data sources.
Research to Practice Timeline
(earlier work: 10 years in Software Research & Development and Product Development)
[Figure: timeline from 1993 to 2013. Labeled milestones include the World Congress on Neural Networks (July 11-15, 1993, Portland, Oregon) with the SIG on Mental Function and Dysfunction (Sam Levin); thesis proposal approved; PhD; US Patent No. 6,063,028 awarded; ISMB 1997 and PSB 1998 presentations; a Depression Research Workshop; Neural Modeling of Cognitive and Brain Disorders; McLean Hospital; collaboration with Greg Siegle (U Pitt); patents offered at the Ocean Tomo auction (Chicago, IL) and sold to Advanced Biological Laboratories (Belgium); US Patent No. 6,317,73 awarded; BioPAX, EMPWR, BioDASH, EPOS, and Nightingale; Center for Proactive Depression Treatment; Linked Data and a W3C HCLS poster; Master's theses at Rensselaer (RPI), including Yuezhang Xiao's; "Actively SEEKING FUNDING".]