Semantic Web for Health Care and Biomedical Informatics


Semantic Web for Health Care and
Biomedical Informatics
Keynote at
NSF Biomed Web Workshop, December 4-5, 2007
Amit P. Sheth
Thanks Pablo Mendes, Satya Sahoo and Kno.e.sis team;
Collaborators at Athens Heart Center (Dr. Agrawal), NLM (Olivier
Bodenreider), CCRC, UGA (Will York), CCHMC (Bruce Aronow)
Knowledge Enabled Information and Services Science
Outline
• Semantic Web – very brief intro
• Scenarios to demonstrate the applications
and benefit of semantic web technologies
– Health care
– Biomedical Research
Biomedical Informatics...
...needs a connection
Figure: Biomedical Informatics links Medical Informatics and Bioinformatics.
• Medical Informatics: etiology, pathogenesis, clinical findings, diagnosis, prognosis, treatment (sources: PubMed, ClinicalTrials.gov)
• Bioinformatics: genome, transcriptome, proteome, metabolome, physiome, ...ome (sources: GenBank, UniProt)
• Biomedical Informatics: hypothesis validation, experiment design, predictions, personalized medicine
Semantic Web research aims at providing this connection!
More advanced capabilities for search, integration, analysis, linking to new insights and discoveries!
Evolution of the Web (1997 to 2007)
• Web of pages: text, manually created links, extensive navigation
• Web of databases: dynamically generated pages, web query interfaces
• Web of services: data = service = data, mashups, ubiquitous computing
• Web of people: social networks, user-created content (GeneRIF, Connotea)
• Web as an oracle / assistant / partner: “ask the Web”, using semantics to leverage text + data + services + people
Semantic Web Enablers and Techniques
• Ontology: Agreement with Common Vocabulary &
Domain Knowledge; Schema + Knowledge base
• Semantic Annotation (metadata extraction): Manual,
Semi-automatic (automatic with human verification),
Automatic
• Reasoning/computation: semantics enabled search,
integration, complex queries, analysis (paths, subgraph),
pattern finding, mining, hypothesis validation, discovery,
visualization
Maturing capabilities and ongoing research
• Text mining: entity recognition, relationship extraction
• Integrating text, experimental data, curated data, and multimedia data
• Clinical and scientific workflows with semantic web services
• Hypothesis-driven retrieval of scientific literature; undiscovered public knowledge
Metadata and Ontology: Primary Semantic Web enablers
Figure: spectrum from shallow semantics to deep semantics
Characteristics of Semantic Web
Semantic Web: XML, RDF & Ontology
• Self Describing
• Easy to Understand
• Machine & Human Readable
• Issued by a Trusted Authority
• Convertible
• Can be Secured
Adapted from William Ruh (CISCO)
Many ontologies exist
Open Biomedical Ontologies, http://obo.sourceforge.net/
Drug Ontology Hierarchy
(showing is-a relationships)
Figure: class hierarchy rooted at owl:thing. Classes shown include property, interaction_property, formulary_property, prescription_drug_property, indication_property, formulary, indication, monograph_ix_class, cpnum_group, non_drug_reactant, prescription_drug, prescription_drug_brand_name, prescription_drug_generic, brandname_individual, brandname_undeclared, brandname_composite, generic_individual, generic_composite, interaction, interaction_with_prescription_drug, interaction_with_non_drug_reactant, interaction_with_monograph_ix_class.
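The is-a links in a hierarchy like this are what let a system treat a brand-name prescription drug as a prescription drug during search or rule checking. A minimal Python sketch of that subsumption test follows; the parent links are assumptions inferred from the class names (the figure's exact edges are not recoverable here), and `ancestors` and `subsumes` are hypothetical helpers, not part of any ontology toolkit.

```python
# Hypothetical is-a links, inferred from the class names in the figure;
# the actual Drug Ontology may organize these classes differently.
IS_A = {
    "prescription_drug": "owl:thing",
    "prescription_drug_brand_name": "prescription_drug",
    "prescription_drug_generic": "prescription_drug",
    "interaction": "owl:thing",
    "interaction_with_prescription_drug": "interaction",
    "interaction_with_non_drug_reactant": "interaction",
    "interaction_with_monograph_ix_class": "interaction",
}


def ancestors(cls):
    """Return the chain of superclasses of cls, nearest first."""
    chain = []
    while cls in IS_A:
        cls = IS_A[cls]
        chain.append(cls)
    return chain


def subsumes(general, specific):
    """True if specific is general itself or a transitive subclass of it."""
    return general == specific or general in ancestors(specific)


print(ancestors("prescription_drug_brand_name"))  # ['prescription_drug', 'owl:thing']
print(subsumes("prescription_drug", "prescription_drug_generic"))  # True
print(subsumes("interaction", "prescription_drug_generic"))  # False
```

A reasoner over the real OWL ontology performs the same kind of transitive check, but over all declared subclass axioms rather than a single-parent dictionary.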
N-Glycosylation metabolic pathway
Figure nodes: N-glycan_alpha_man_4 and N-glycan_beta_GlcNAc_9; enzymes: GNT-I (attaches GlcNAc at position 2) and GNT-V, N-acetyl-glucosaminyl_transferase_V (attaches GlcNAc at position 6).
UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2
UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021
Opportunity: exploiting clinical and biomedical data (text and binary)
• Scientific literature: PubMed (300 documents published online each day)
• Health information services: Elsevier iConsult
• Public datasets: NCBI genome and protein DBs (new sequences daily)
• User-contributed content (informal): GeneRIFs
• Clinical data: personal health history
• Laboratory data: lab tests, RT-PCR, mass spec
Search, browsing, complex query, integration, workflow, analysis, hypothesis validation, decision support.
Scenario 1:
• Status: In use today
• Where: Athens Heart Center
• What: Use of semantic Web technologies
for clinical decision support
Operational since January 2006
Active Semantic Electronic Medical Records (ASEMR)
Goals:
• Increase efficiency with decision support
• formulary, billing, reimbursement
• real time chart completion
• automated linking with billing
• Reduce Errors, Improve Patient Satisfaction & Reporting
• drug interactions, allergy, insurance
• Improve Profitability
Technologies:
• Ontologies, semantic annotations & rules
• Service Oriented Architecture
Thanks to Dr. Agrawal, Dr. Wingate, and others. ISWC 2006 paper
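To make the decision-support goals above concrete (drug interactions, allergy, insurance/formulary), here is a minimal rule-based sketch in Python. It is not the ASEMR implementation: the knowledge base, the drug, ingredient, and plan names, and the `check_order` helper are hypothetical placeholders, and ASEMR itself evaluates rules over ontologies and semantic annotations rather than Python dictionaries.

```python
from dataclasses import dataclass

# Hypothetical knowledge base; drug, ingredient, and plan names are
# placeholders, not clinical data.
INTERACTING_PAIRS = {frozenset({"drug_a", "drug_b"})}
INGREDIENTS = {"drug_a": {"ingredient_x"}, "drug_b": set(), "drug_c": {"ingredient_y"}}
FORMULARY = {"plan_1": {"drug_a", "drug_c"}}


@dataclass
class Patient:
    current_drugs: set
    allergies: set  # ingredients the patient is allergic to
    insurance_plan: str


def check_order(patient, new_drug):
    """Apply simple interaction, allergy, and formulary rules to a new order."""
    warnings = []
    for drug in sorted(patient.current_drugs):
        if frozenset({drug, new_drug}) in INTERACTING_PAIRS:
            warnings.append(f"interaction: {new_drug} with {drug}")
    for ingredient in sorted(INGREDIENTS.get(new_drug, set()) & patient.allergies):
        warnings.append(f"allergy: {new_drug} contains {ingredient}")
    if new_drug not in FORMULARY.get(patient.insurance_plan, set()):
        warnings.append(f"formulary: {new_drug} not covered by {patient.insurance_plan}")
    return warnings


patient = Patient(current_drugs={"drug_b"}, allergies={"ingredient_x"}, insurance_plan="plan_1")
print(check_order(patient, "drug_a"))
# ['interaction: drug_a with drug_b', 'allergy: drug_a contains ingredient_x']
```

Keeping each check as a separate rule mirrors the slide's split between safety goals (interactions, allergy) and financial ones (formulary, insurance).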
Demonstration
ASEMR Efficiency
Figure: number of charts per month, split into Same Day and Back Log (y-axis: Charts; x-axis: Month/Year).
• Chart Completion before the preliminary deployment: Mar 04 to Jul 05
• Chart Completion after the preliminary deployment: Sept 05 to Mar 06
Scenario 2:
• Status: Demonstration
• Where: W3C Health Care and Life
Sciences (HCLS) interest group
• What: Using semantic web to aggregate
and query data about Alzheimer’s
• http://www.w3.org/2001/sw/hcls/
Scenario 2: Scientific Data Sets for Alzheimer’s
SPARQL Query spanning multiple sources
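What a query spanning multiple sources does can be illustrated without a SPARQL engine: merge the triples from each source into one graph, match each triple pattern in turn, and join the variable bindings. The Python sketch below is a toy evaluator for such basic graph patterns, not a SPARQL implementation; the two sources and every identifier in them are hypothetical.

```python
# Two hypothetical sources exported as (subject, predicate, object) triples;
# all identifiers are placeholders.
GENE_SOURCE = [
    ("gene:G1", "associated_with", "disease:D1"),
    ("gene:G2", "associated_with", "disease:D1"),
]
PATHWAY_SOURCE = [
    ("gene:G1", "in_pathway", "pathway:P1"),
    ("gene:G3", "in_pathway", "pathway:P2"),
]


def match(pattern, triples, binding):
    """Yield each extension of binding that makes pattern match a triple."""
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable binds once; later uses must agree with it.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate


def query(patterns, triples):
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, triples, b)]
    return bindings


merged = GENE_SOURCE + PATHWAY_SOURCE
print(query([("?g", "associated_with", "disease:D1"), ("?g", "in_pathway", "?p")], merged))
# [{'?g': 'gene:G1', '?p': 'pathway:P1'}]
```

Neither source alone answers the query: against GENE_SOURCE only, the same patterns return an empty list, because the pathway triple lives in the other source.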
Scenario 3
• Status: Completed research
• Where: NIH
• What: Understanding the genetic basis of
nicotine dependence. Integrate gene and
pathway information and show how three
complex biological queries can be answered by
the integrated knowledge base.
• How: Semantic Web technologies (especially
RDF, OWL, and SPARQL) support information
integration and make it easy to create semantic
mashups (semantically integrated resources).
Motivation
• NIDA study on nicotine dependency
• List of candidate genes in humans
• Analysis objectives include:
o Find interactions between genes
o Identification of active genes, i.e., genes that participate in the maximum number of pathways
o Identification of genes based on anatomical
locations
• Requires integration of genome and biological
pathway information
Genome and pathway information integration
Figure: sources and the identifiers that link them.
• Reactome, KEGG, HumanCyc: pathway, protein, pmid
• Entrez Gene: GO ID (links to GeneOntology), HomoloGene ID (links to HomoloGene)
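Once pathway memberships from the sources are integrated, identifying active genes (those in the maximum number of pathways) reduces to a union followed by a count. This Python sketch assumes gene and pathway identifiers have already been mapped to a shared vocabulary, which is the hard part the RDF integration addresses; the records and helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (gene, pathway) pairs as each source might export them; assumes
# gene and pathway identifiers were already mapped to a shared vocabulary.
REACTOME = [("G1", "apoptosis"), ("G2", "apoptosis")]
KEGG = [("G1", "apoptosis"), ("G1", "calcium_signaling")]
HUMANCYC = [("G1", "lipid_metabolism"), ("G3", "lipid_metabolism")]


def integrate(*sources):
    """Union pathway memberships per gene, collapsing cross-source duplicates."""
    pathways = defaultdict(set)
    for source in sources:
        for gene, pathway in source:
            pathways[gene].add(pathway)
    return pathways


def most_active_genes(pathways):
    """Genes that participate in the maximum number of distinct pathways."""
    top = max(len(members) for members in pathways.values())
    return sorted(gene for gene, members in pathways.items() if len(members) == top)


merged = integrate(REACTOME, KEGG, HUMANCYC)
print(most_active_genes(merged))  # ['G1']
```

Using sets means a pathway reported by two sources (apoptosis for G1 here) is counted once.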
JBI
Entrez Knowledge Model (EKoM) and BioPAX ontology
Deductive Reasoning
Protein-Protein Interaction
RULE: given that two genes interact with each other (here, they share a common pathway) and a certain number of parameters are met, we can assert that the gene products also interact with each other
IF (x have_common_pathway y) AND (x rdf:type gene) AND (y rdf:type gene) AND
(x has_product m) AND (y has_product n) AND (m rdf:type gene_product)
AND (n rdf:type gene_product) THEN (m interacts_with n)
Figure: gene1 have_common_pathway gene2; each gene has_product a gene_product; each gene_product is associated_with a database_identifier (1 and 2); the two gene products interacts_with each other.
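The rule can be run by forward chaining over the triples. The Python sketch below performs one pass and returns only the newly inferred triples. It encodes just the IF/THEN conditions shown; the additional parameters mentioned in the rule are not modeled, and the gene and product names are hypothetical.

```python
def apply_interaction_rule(triples):
    """Infer (m, interacts_with, n) for products of genes sharing a pathway."""
    facts = set(triples)

    def has_type(node, cls):
        return (node, "rdf:type", cls) in facts

    def products(gene):
        return [o for s, p, o in facts if s == gene and p == "has_product"]

    inferred = set()
    for x, p, y in facts:
        if p != "have_common_pathway":
            continue
        if not (has_type(x, "gene") and has_type(y, "gene")):
            continue
        for m in products(x):
            for n in products(y):
                if has_type(m, "gene_product") and has_type(n, "gene_product"):
                    inferred.add((m, "interacts_with", n))
    return inferred - facts  # only triples that were not already asserted


facts = [
    ("gene1", "rdf:type", "gene"),
    ("gene2", "rdf:type", "gene"),
    ("gene1", "have_common_pathway", "gene2"),
    ("gene1", "has_product", "product1"),
    ("gene2", "has_product", "product2"),
    ("product1", "rdf:type", "gene_product"),
    ("product2", "rdf:type", "gene_product"),
]
print(apply_interaction_rule(facts))  # {('product1', 'interacts_with', 'product2')}
```

A rule engine would repeat such passes until no new triples appear; one pass suffices here because the rule's conclusion never feeds its own premises.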
Scenario 4
• Status: Completed research
• Where: NIH
• What: queries across integrated data
sources
– Enriching data with ontologies for integration,
querying, and automation
– Ontologies beyond vocabularies: the power of
relationships
Use data to test a hypothesis
Link between glycosyltransferase activity and congenital muscular dystrophy?
Figure: a gene record links to its gene name, interactions, sequence, GO annotations (e.g., glycosyltransferase), PubMed citations, and OMIM entries (e.g., congenital muscular dystrophy).
Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
In a Web pages world…
(GeneID: 9215) has_associated_disease Congenital muscular dystrophy, type 1D
(GeneID: 9215) has_molecular_function Acetylglucosaminyltransferase activity
Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
With the semantically enhanced data
SELECT DISTINCT ?t ?g ?d {
  ?t is_a GO:0016757 .
  ?g has_molecular_function ?t .
  ?g has_associated_phenotype ?b2 .
  ?b2 has_textual_description ?d .
  FILTER regex(?d, "muscular dystrophy", "i") .
  FILTER regex(?d, "congenital", "i")
}
Figure: the isa path from GO:0008375 (acetylglucosaminyltransferase) up to GO:0016757 (glycosyltransferase), with intermediate terms GO:0008194 and GO:0016758; LARGE (EG:9215) has_molecular_function GO:0008375 and has_associated_phenotype MIM:608840 (Muscular dystrophy, congenital, type 1D).
From the MedInfo paper.
Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
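The query depends on ?t is_a GO:0016757 matching terms several levels below glycosyltransferase, which requires following is_a transitively. This Python sketch evaluates the same pattern over a handful of triples taken from the figure; the two is_a edges are a simplified, assumed path (the figure also shows GO:0008194), and the helpers are hypothetical, not an RDF library API.

```python
import re

# Triples reconstructed from the figure; the is_a edges are a simplified,
# assumed path from GO:0008375 up to GO:0016757.
TRIPLES = {
    ("GO:0008375", "is_a", "GO:0016758"),
    ("GO:0016758", "is_a", "GO:0016757"),
    ("EG:9215", "has_molecular_function", "GO:0008375"),
    ("EG:9215", "has_associated_phenotype", "MIM:608840"),
    ("MIM:608840", "has_textual_description", "Muscular dystrophy, congenital, type 1D"),
}


def subjects(predicate, obj):
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def objects(subject, predicate):
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def descendants(term):
    """term plus every term reachable by following is_a edges downward."""
    found, frontier = {term}, [term]
    while frontier:
        for child in subjects("is_a", frontier.pop()):
            if child not in found:
                found.add(child)
                frontier.append(child)
    return found


def run_query():
    rows = set()
    for t in descendants("GO:0016757"):  # ?t is_a GO:0016757, transitively
        for g in subjects("has_molecular_function", t):
            for b2 in objects(g, "has_associated_phenotype"):
                for d in objects(b2, "has_textual_description"):
                    # Case-insensitive filters, as with regex(..., "i")
                    if re.search("muscular dystrophy", d, re.I) and re.search("congenital", d, re.I):
                        rows.add((t, g, d))
    return rows


print(run_query())
# {('GO:0008375', 'EG:9215', 'Muscular dystrophy, congenital, type 1D')}
```

With only direct is_a matching, this data would return nothing, since GO:0008375 is two steps below GO:0016757.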
Scenario 5
• Status: Research prototype and in progress
• Workflow with Semantic Annotation of Experimental Data already in use
• Where: UGA
• What:
– Knowledge driven query formulation
– Semantic Problem Solving Environment (PSE)
for Trypanosoma cruzi (Chagas Disease)
Knowledge driven query formulation
Complex queries can also include:
- on-the-fly Web services execution to retrieve additional data
- inference rules to make implicit knowledge explicit
T. cruzi PSE Query Interface
Figure 4: Semantic annotation of MS scientific data
N-Glycosylation Process (NGP)
Cell Culture → extract → Glycoprotein Fraction → proteolysis → Glycopeptides Fraction (1) → Separation technique I → Glycopeptides Fraction (n) → PNGase → Peptide Fraction (n) → Separation technique II → Peptide Fraction (n*m) → Mass spectrometry → ms data, ms/ms data
Data analysis steps shown: data reduction (ms peaklist, ms/ms peaklist), binning (N-dimensional array), peptide identification (peptide list), glycopeptide identification and quantification, data correlation, signal integration.
Semantic Web Process to incorporate provenance
Each step is carried out by an agent with an input (I) and an output (O):
• Biological Sample → Analysis by MS/MS → Raw Data
• Raw Data → Raw Data to Standard Format → Standard Format Data
• Standard Format Data → Data Preprocess → Filtered Data
• Filtered Data → DB Search (Mascot/Sequest) → Search Results
• Search Results → Results Postprocess (ProValt) → Final Output → Storage
Also shown: Semantic Annotation Applications, Biological Information.
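The process above records, for each agent, what it consumed and what it produced. A minimal Python sketch of that bookkeeping follows. The step functions are hypothetical stand-ins (the real process invokes tools such as Mascot/Sequest and ProValt), the abundance cutoff of 1.0 is an assumption, and ProPreO concepts are reduced to plain string labels.

```python
from dataclasses import dataclass


@dataclass
class ProvenanceRecord:
    agent: str
    input_label: str
    output_label: str


def run_process(data, steps, input_label="raw_data"):
    """Run (agent, output_label, function) steps in order, logging provenance."""
    log = []
    for agent, output_label, step in steps:
        data = step(data)
        log.append(ProvenanceRecord(agent, input_label, output_label))
        input_label = output_label
    return data, log


# Hypothetical stand-ins for the agents; the real steps call tools such as
# Mascot/Sequest and ProValt, which are not modeled here.
STEPS = [
    ("raw_to_standard_format", "standard_format_data",
     lambda lines: [tuple(float(x) for x in line.split()) for line in lines]),
    ("data_preprocess", "filtered_data",
     lambda peaks: [p for p in peaks if p[1] >= 1.0]),  # assumed abundance cutoff
    ("db_search", "search_results",
     lambda peaks: {"peaks_searched": len(peaks)}),
    ("results_postprocess", "final_output",
     lambda results: {**results, "validated": results["peaks_searched"] > 0}),
]

raw = ["580.2985 0.3592", "779.4759 38.4939", "1660.7776 476.5043"]
output, log = run_process(raw, STEPS)
print(output)  # {'peaks_searched': 2, 'validated': True}
for record in log:
    print(record.agent, record.input_label, "->", record.output_label)
```

Because every record names its input and output, the final output can be traced back step by step to the raw data.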
ProPreO: Ontology-mediated provenance
Mass Spectrometry (MS) Data: ms/ms peaklist data
Ion            m/z         abundance   charge
parent ion     830.9570    194.9604    2
fragment ion   580.2985    0.3592
fragment ion   688.3214    0.2526
fragment ion   779.4759    38.4939
fragment ion   784.3607    21.7736
fragment ion   1543.7476   1.3822
fragment ion   1544.7595   2.9977
fragment ion   1562.8113   37.4790
fragment ion   1660.7776   476.5043
ProPreO: Ontology-mediated provenance
<ms-ms_peak_list>
  <parameter
    instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer"
    mode="ms-ms"/>
  <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
  <fragment_ion m-z="580.2985" abundance="0.3592"/>
  <fragment_ion m-z="688.3214" abundance="0.2526"/>
  <fragment_ion m-z="779.4759" abundance="38.4939"/>
  <fragment_ion m-z="784.3607" abundance="21.7736"/>
  <fragment_ion m-z="1543.7476" abundance="1.3822"/>
  <fragment_ion m-z="1544.7595" abundance="2.9977"/>
  <fragment_ion m-z="1562.8113" abundance="37.4790"/>
  <fragment_ion m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>
Semantically Annotated MS Data: the element and attribute names are ontological concepts.
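Because the annotated peak list is ordinary XML, standard tooling can read it. This Python sketch uses the standard library's xml.etree.ElementTree on the peak list above, copied into a string; `parse_peak_list` is a hypothetical helper that reads tag and attribute names only and does not resolve them against ProPreO.

```python
import xml.etree.ElementTree as ET

PEAK_LIST = """<ms-ms_peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms-ms"/>
  <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
  <fragment_ion m-z="580.2985" abundance="0.3592"/>
  <fragment_ion m-z="688.3214" abundance="0.2526"/>
  <fragment_ion m-z="779.4759" abundance="38.4939"/>
  <fragment_ion m-z="784.3607" abundance="21.7736"/>
  <fragment_ion m-z="1543.7476" abundance="1.3822"/>
  <fragment_ion m-z="1544.7595" abundance="2.9977"/>
  <fragment_ion m-z="1562.8113" abundance="37.4790"/>
  <fragment_ion m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>"""


def parse_peak_list(xml_text):
    """Read instrument, parent ion, and fragment ions from an annotated peak list."""
    root = ET.fromstring(xml_text)
    parameter = root.find("parameter")
    parent = root.find("parent_ion")
    fragments = [
        (float(ion.get("m-z")), float(ion.get("abundance")))
        for ion in root.findall("fragment_ion")
    ]
    return {
        "instrument": parameter.get("instrument"),
        "mode": parameter.get("mode"),
        "parent": (float(parent.get("m-z")), float(parent.get("abundance")), int(parent.get("z"))),
        "fragments": fragments,
    }


peaks = parse_peak_list(PEAK_LIST)
print(peaks["parent"])  # (830.957, 194.9604, 2)
print(max(peaks["fragments"], key=lambda f: f[1]))  # (1660.7776, 476.5043)
```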
Scenario 6
• Status: Research in progress
• Where: Athens Heart Center and Cincinnati
Children’s Hospital Medical Center
• What: scientific literature mining
– Dealing with unstructured information
– Extracting knowledge from text
– Complex entity recognition
– Relationship extraction
Heart Failure Clinical Pathway
Figure: ontology fragment with the concepts Disease and Angiotensin Receptor Blocker (ARB) and the relationship causes.
A Framework for Schema-Driven Relationship Discovery from Unstructured Text, Ramakrishnan, et al., ISWC 2006, LNCS 4273, pp. 583-596
Contextual delivery of information
• Two technical challenges
– Text mining
– Workflow adaptation
Extracting the Relationship
Diabetes mellitus adversely affects the outcomes in patients with myocardial infarction (MI), due in part to the exacerbation of left
ventricular (LV) remodeling. Although angiotensin II type 1 receptor blocker (ARB) has been demonstrated to be effective in the
treatment of heart failure, information about the potential benefits of ARB on advanced LV failure associated with diabetes is lacking.
To induce diabetes, male mice were injected intraperitoneally with streptozotocin (200 mg/kg). At 2 weeks, anterior MI was created by
ligating the left coronary artery. These animals received treatment with olmesartan (0.1 mg/kg/day; n = 50) or vehicle (n = 51) for 4
weeks. Diabetes worsened the survival and exaggerated echocardiographic LV dilatation and dysfunction in MI. Treatment of diabetic
MI mice with olmesartan significantly improved the survival rate (42% versus 27%, P < 0.05) without affecting blood glucose, arterial
blood pressure, or infarct size. It also attenuated LV dysfunction in diabetic MI. Likewise, olmesartan attenuated myocyte hypertrophy,
interstitial fibrosis, and the number of apoptotic cells in the noninfarcted LV from diabetic MI. Post-MI LV remodeling and failure in
diabetes were ameliorated by ARB, providing further evidence that angiotensin II plays a pivotal role in the exacerbated heart failure
after diabetic MI.
ARB
causes
heart failure
Angiotensin II type 1 receptor blocker attenuates exacerbated left ventricular remodeling and failure in diabetes-associated myocardial infarction. Matsusaka H, et al.
Problem – Extracting relationships between MeSH terms from PubMed
Figure: in the UMLS Semantic Network, Biologically active substance and Lipid are linked to Disease or Syndrome by relationships such as complicates, affects, and causes. In MeSH, Fish Oils is an instance_of Lipid and Raynaud’s Disease is an instance_of Disease or Syndrome; the relationship between them is unknown (???????). PubMed document counts shown: 9284, 4733, and 5 documents.
Background knowledge used
• UMLS – A high-level schema of the biomedical domain
– 136 classes and 49 relationships
– Synonyms of all relationships, using variant lookup (tools from NLM)
– 49 relationships + their synonyms = ~350 mostly verbs
– e.g., T147: effect, induce, etiology, cause, effecting, induced
• MeSH
– 22,000+ topics organized as a forest of 16 trees
– Used to query PubMed
• PubMed
– Over 16 million abstracts
– Abstracts annotated with one or more MeSH terms
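The synonym list is what turns a verb in a sentence into a UMLS relationship. The Python sketch below links consecutive MeSH-term mentions whenever a listed synonym appears between them, a much cruder approach than parsing the sentence. The synonym set holds only the T147 forms listed above, the trailing-s rule is a simplistic stand-in for lemmatization, and `find_relationships` is hypothetical.

```python
import re

# Surface forms listed on the slide for T147; a real system would cover
# all ~350 relationship names and synonyms.
RELATION_SYNONYMS = {"T147": {"effect", "induce", "etiology", "cause", "effecting", "induced"}}


def normalize(token):
    """Crude stand-in for lemmatization: drop a trailing 's' (induces -> induce)."""
    return token[:-1] if token.endswith("s") else token


def find_relationships(sentence, terms):
    """Link consecutive entity mentions through any relationship synonym between them."""
    text = sentence.lower()
    mentions = sorted((text.find(t.lower()), t) for t in terms if t.lower() in text)
    triples = []
    for (start, left), (end, right) in zip(mentions, mentions[1:]):
        between = re.findall(r"[a-z]+", text[start + len(left):end])
        for relation, forms in RELATION_SYNONYMS.items():
            if any(tok in forms or normalize(tok) in forms for tok in between):
                triples.append((left, relation, right))
    return triples


sentence = ("An excessive endogenous or exogenous stimulation by estrogen "
            "induces adenomatous hyperplasia of the endometrium.")
print(find_relationships(sentence, ["estrogen", "adenomatous hyperplasia", "endometrium"]))
# [('estrogen', 'T147', 'adenomatous hyperplasia')]
```

The second pair (adenomatous hyperplasia, endometrium) yields nothing because only "of the" separates them; recognizing that as a composite entity is what the parse-tree analysis is for.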
Method – Parse Sentences in PubMed
SS-Tagger (University of Tokyo)
SS-Parser (University of Tokyo)
• Entities (MeSH terms) in sentences occur in modified forms
– “adenomatous” modifies “hyperplasia”
– “An excessive endogenous or exogenous” modifies “stimulation”
• Entities can also occur as composites of 2 or more other entities
– “adenomatous hyperplasia” and “endometrium” occur as “adenomatous hyperplasia of the endometrium”
Parse of “An excessive endogenous or exogenous stimulation by estrogen induces adenomatous hyperplasia of the endometrium”:
(TOP (S
  (NP (NP (DT An) (JJ excessive)
          (ADJP (JJ endogenous) (CC or) (JJ exogenous))
          (NN stimulation))
      (PP (IN by) (NP (NN estrogen))))
  (VP (VBZ induces)
      (NP (NP (JJ adenomatous) (NN hyperplasia))
          (PP (IN of) (NP (DT the) (NN endometrium)))))))
Method – Identify entities and Relationships in Parse Tree
Figure: parse tree of “An excessive endogenous or exogenous stimulation by estrogen induces adenomatous hyperplasia of the endometrium”, highlighting modifiers (e.g., “adenomatous”), modified entities (e.g., “hyperplasia”), and composite entities (e.g., “adenomatous hyperplasia of the endometrium”).
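Given a bracketed parse like the one for the estrogen sentence, entities and the relationship can be read off the tree. This Python sketch assumes the parse has already been produced (the slides use SS-Tagger and SS-Parser for that step) and applies a hypothetical heuristic: the subject is the NP under S, the relationship is the VBZ verb, the object is the NP under VP, and modifiers are the non-determiner words before the head noun (unlike the slide, which counts “An” as part of the modifier).

```python
import re

TREE = ("(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) "
        "(JJ exogenous)) (NN stimulation)) (PP (IN by) (NP (NN estrogen)))) "
        "(VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia)) "
        "(PP (IN of) (NP (DT the) (NN endometrium)))))))")


def parse(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos + 1]  # tokens[pos] is "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def words(tree):
    """All leaf words under a node, left to right."""
    return [w for c in tree[1] for w in ([c] if isinstance(c, str) else words(c))]


def child(tree, label):
    """First child subtree with the given label."""
    return next(c for c in tree[1] if not isinstance(c, str) and c[0] == label)


def split_np(np):
    """Split a simple NP into (modifiers, head noun); determiners are skipped."""
    modifiers = [w for c in np[1] if c[0] not in ("DT", "NN") for w in words(c)]
    return modifiers, words(child(np, "NN"))[0]


sentence = child(parse(TREE), "S")
subject, vp = child(sentence, "NP"), child(sentence, "VP")
obj = child(vp, "NP")
print(" ".join(words(subject)), "|", words(child(vp, "VBZ"))[0], "|", " ".join(words(obj)))
print(split_np(child(subject, "NP")))  # (['excessive', 'endogenous', 'or', 'exogenous'], 'stimulation')
print(split_np(child(obj, "NP")))  # (['adenomatous'], 'hyperplasia')
```

The subject and object NPs each contain a PP, which is how composite entities such as “adenomatous hyperplasia of the endometrium” show up in the tree.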
• What can we do with the extracted
knowledge?
• Semantic browser demo
Evaluating hypotheses
Figure: a complex query over extracted relationships among Migraine, Magnesium, Stress, Patient, and Calcium Channel Blockers (relationships: affects, inhibit, isa), shown alongside the keyword query Migraine[MH] + Magnesium[MH] in PubMed; supporting document sets are retrieved.
Workflow Adaptation: Why and How
• Volatile nature of execution environments
– May have an impact on multiple activities/tasks in the workflow
• HF Pathway
– New information about diseases and drugs becomes available
– Affects treatment plans, drug-drug interactions
• Need to incorporate the new knowledge into execution
– Capture the constraints and relationships between different tasks/activities
Workflow Adaptation Why?
New knowledge about
treatment found during
the execution of the pathway
New knowledge about drugs,
drug-drug interactions
Workflow Adaptation: How
• Decision-theoretic approaches
– Markov Decision Processes
• Given the state S of the workflow when an
event E occurs
– What is the optimal path to a goal state G
– Greedy approaches rely on local optimization
• Need to choose actions based on optimality across
the entire horizon, not just the current best action
– Model the horizon and use MDP to find the
best path to a goal state
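The MDP formulation above can be illustrated with value iteration. In this Python sketch the states, actions, probabilities, and rewards are all hypothetical; the point is that both prescriptions look identical to a greedy step (immediate reward -1 each), while valuing the whole horizon prefers the action with the better chance of reaching the goal state.

```python
# Hypothetical adaptation point in a heart-failure pathway after an event E
# (e.g. new drug-interaction knowledge). transitions[state][action] lists
# (probability, next_state, reward); all numbers are illustrative.
TRANSITIONS = {
    "assess": {
        "prescribe_a": [(1.0, "on_drug_a", -1.0)],
        "prescribe_b": [(1.0, "on_drug_b", -1.0)],
    },
    "on_drug_a": {"continue": [(0.5, "stable", 10.0), (0.5, "assess", -5.0)]},
    "on_drug_b": {"continue": [(0.9, "stable", 10.0), (0.1, "assess", -5.0)]},
    "stable": {},  # goal state G: no actions, value stays 0
}


def q_value(values, outcomes, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-9):
    """Compute state values and the optimal policy over the whole horizon."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:
                continue
            best = max(q_value(values, outs, gamma) for outs in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    policy = {
        state: max(actions, key=lambda a: q_value(values, actions[a], gamma))
        for state, actions in transitions.items()
        if actions
    }
    return values, policy


values, policy = value_iteration(TRANSITIONS)
print(policy["assess"])  # prescribe_b
```

When event E changes the knowledge (say, new interaction data lowers drug B's success probability), updating TRANSITIONS and rerunning value iteration yields the adapted policy.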
Conclusion
• Semantic Web technologies can help with:
– Fusion of data: semi-structured, structured, experimental, literature, multimedia
– Analysis and mining of data, extraction, annotation, capturing provenance of data through annotation, workflows with SWS
– Querying of data at different levels of granularity, complex queries, knowledge-driven query interface
– Inference across data sets
Take home points
• Shift of paradigm: from browsing to
querying
• Machine understanding:
– extracting knowledge from text
– Inference, software interoperation
• Semantic-enabled interfaces towards
hypothesis validation
References
1. A. Sheth, S. Agrawal, J. Lathem, N. Oldham, H. Wingate, P. Yadav, and K. Gallagher, "Active Semantic Electronic Medical Record," Intl Semantic Web Conference, 2006.
2. Satya Sahoo, Olivier Bodenreider, Kelly Zeng, and Amit Sheth, "An Experiment in Integrating Large Biomedical Knowledge Resources with RDF: Application to Associating Genotype and Phenotype Information," WWW2007 HCLS Workshop, May 2007.
3. Satya S. Sahoo, Kelly Zeng, Olivier Bodenreider, and Amit Sheth, "From Glycosyltransferase to Congenital Muscular Dystrophy: Integrating Knowledge from NCBI Entrez Gene and the Gene Ontology," Amsterdam: IOS, August 2007, PMID: 17911917, pp. 1260-4.
4. Satya S. Sahoo, Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner, and Amit P. Sheth, "An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence," submitted, 2007.
5. Cartic Ramakrishnan, Krzysztof J. Kochut, and Amit Sheth, "A Framework for Schema-Driven Relationship Discovery from Unstructured Text," Intl Semantic Web Conference, 2006, pp. 583-596.
6. Satya S. Sahoo, Christopher Thomas, Amit Sheth, William S. York, and Samir Tartir, "Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies," 15th International World Wide Web Conference (WWW2006), Edinburgh, Scotland, May 23-26, 2006.
Demos at: http://knoesis.wright.edu/library/demos/