Text-based Discovery in
Biomedicine
The Architecture of the DAD-system
Marc Weeber1,2, Henny Klein1,
Alan R. Aronson2, Jim G. Mork2,
Lolkje T. W. de Jong - van den Berg1,
Rein Vos1,3
1 Department of Social Pharmacy and Pharmacoepidemiology, Groningen University Institute for Drug Exploration, The Netherlands
2 Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD
3 Health Ethics and Philosophy, Faculty of Health Sciences, University of Maastricht, The Netherlands
Social Pharmacy and
Pharmacoepidemiology
Lister Hill National Center for
Biomedical Communications
Introduction
• Goal:
Finding new biomedical knowledge through the
combination of existing knowledge as represented
in the medical literature
• Motivation:
Avoiding reinvention of the wheel; reuse of
specific knowledge outside its original domain of
discovery
Swanson
[Diagram: A - B - C; the direct A - C link is the unknown (?)]
• AB: Raynaud’s disease is characterized by high
blood viscosity and high platelet aggregation
• BC: Fish oil is known to reduce blood viscosity and
platelet aggregation
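The A-B-C reasoning on this slide can be sketched in a few lines. This is a minimal illustration, not the DAD-system's code: the function name `abc_candidates` and the toy term pairs are assumptions, and "co-occurrence" here just means two terms reported together.

```python
from collections import defaultdict

def abc_candidates(cooccurrences, a):
    """Return {c: [linking b terms]} for terms c that share an intermediate
    term b with `a` but never co-occur with `a` directly."""
    neighbours = defaultdict(set)
    for x, y in cooccurrences:
        neighbours[x].add(y)
        neighbours[y].add(x)

    direct = neighbours[a]              # the B terms (AB links)
    candidates = defaultdict(set)
    for b in direct:
        for c in neighbours[b]:         # BC links
            if c != a and c not in direct:
                candidates[c].add(b)
    return {c: sorted(bs) for c, bs in candidates.items()}

# Toy literature (illustrative only): pairs of terms reported together.
pairs = [
    ("raynaud's disease", "blood viscosity"),
    ("raynaud's disease", "platelet aggregation"),
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
]
links = abc_candidates(pairs, "raynaud's disease")
print(links)
```

Fish oil is proposed as a C term because it shares both B terms with Raynaud's disease and has no direct link to it in the toy data.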
Vos and Rikken
• Drugs instead of diet factors
• Intermediate (B) terms are adverse drug
reactions
• Drug – Adverse drug reactions – Disease:
The DAD-system
• Vos (1991) Drugs looking for diseases
Existing Techniques
• Swanson & Smalheiser:
  • Single words / multi-word terms
  • MEDLINE titles
  • No statistics
• Gordon & Lindsay:
  • Single words / multi-word terms
  • Information Retrieval statistics
  • Replication of Swanson’s discoveries
New Techniques
• Use of UMLS concepts
• PubMed
• MetaMap: mapping free text (MEDLINE titles
and abstracts) to concepts
• Interactive web interface
Two-step Approach
• Open discovery, generating a hypothesis
[Diagram: A - ? - ?]
• Closed discovery, testing a hypothesis
[Diagram: A - ? - C]
Why UMLS Concepts?
• Use of only biomedically relevant information
• Useful transition from single words to multi-word
terms
• Semantic information (semantic types) for
filtering (e.g. select only Disease or
Syndrome)
DAD-system
[Diagram: Metathesaurus, Specialist Lexicon, Semantic Network, KS, PubMed, MetaMap]
DAD-system
[Diagram: Metathesaurus, Specialist Lexicon, Semantic Network, KS, PubMed, MetaMap, MySQL Database; processing steps: Query, Txt2Con, Filter, Select, Show]
Open Discovery (A)
•Query (user input):
raynaud’s disease
Open Discovery (A)
•Mapping text to concept through MetaMap:
Raynaud's Disease [Disease or Syndrome]
Open Discovery (A)
•Synonym lookup:
Raynaud's syndrome
Raynaud's disease / phenomenon
•Variant generation:
e.g. syndrome / syndromes
Open Discovery (A)
•PubMed query:
raynaud OR raynauds
•Processing: query in titles and abstracts
•Result: 1,246 MEDLINE citations
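How the synonyms and variants become a single PubMed query string can be sketched as below. This is an assumption-laden illustration: the slides do not say how the system normalises terms, `build_or_query` is a hypothetical helper, and nothing here contacts PubMed.

```python
def build_or_query(terms):
    """Join search terms with OR, lowercased, apostrophes dropped,
    duplicates removed while keeping first-seen order."""
    seen = []
    for term in terms:
        cleaned = term.lower().replace("'", "").replace("\u2019", "")
        cleaned = " ".join(cleaned.split())   # collapse stray whitespace
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return " OR ".join(seen)

# Head words of the synonyms and variants for concept A.
query = build_or_query(["Raynaud", "Raynaud's", "raynaud"])
print(query)
```

The same helper would produce the multi-term B-concept query used later in the open discovery walk-through when given the selected concept names.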
Open Discovery (A)
• Text-to-concept mapping of all citations
• Only sentences mentioning Raynaud’s disease are used
• Result: 1,278 UMLS concepts
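Sentence-level collection of concepts can be sketched as follows. The tiny `TOY_LEXICON` is a hypothetical stand-in for MetaMap (simple phrase lookup, nothing like the real mapping), and the abstracts are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical phrase -> concept table standing in for MetaMap.
TOY_LEXICON = {
    "raynaud's disease": "Raynaud's Disease",
    "blood viscosity": "Blood Viscosity",
    "platelet aggregation": "Platelet Aggregation",
}

def map_sentence(sentence):
    """Return the toy concepts whose phrase occurs in the sentence."""
    text = sentence.lower()
    return {concept for phrase, concept in TOY_LEXICON.items() if phrase in text}

def cooccurring_concepts(abstracts, target):
    """Count, per concept, the sentences it shares with `target`."""
    counts = Counter()
    for abstract in abstracts:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", abstract):
            concepts = map_sentence(sentence)
            if target in concepts:
                counts.update(concepts - {target})
    return counts

abstracts = [
    "Raynaud's disease patients showed raised blood viscosity. "
    "Platelet aggregation was measured in controls.",
    "Blood viscosity and platelet aggregation were both high in Raynaud's disease.",
]
counts = cooccurring_concepts(abstracts, "Raynaud's Disease")
print(counts)
```

Only sentences that contain the target concept contribute, which is why the second sentence of the first abstract is ignored.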
Open Discovery (A)
• Select functional/physiological concepts
• Semantic types in filter:
Body Location or Region
Biologic Function
Cell Function
Phenomenon or Process
Physiologic Function
Tissue
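The semantic-type filter amounts to keeping concepts that carry at least one allowed type. A minimal sketch, assuming each concept arrives as a (name, semantic types) pair; the types assigned to the sample concepts are illustrative assumptions, not UMLS lookups.

```python
# Semantic types listed on this slide.
FUNCTIONAL_TYPES = {
    "Body Location or Region",
    "Biologic Function",
    "Cell Function",
    "Phenomenon or Process",
    "Physiologic Function",
    "Tissue",
}

def filter_by_semantic_type(concepts, allowed_types):
    """Keep concept names that have at least one allowed semantic type."""
    return [name for name, types in concepts if allowed_types & set(types)]

# Illustrative input; the type assignments are assumptions.
concepts = [
    ("Blood Viscosity", ["Physiologic Function"]),
    ("Nifedipine", ["Pharmacologic Substance"]),
    ("Platelet Aggregation", ["Cell Function"]),
]
kept = filter_by_semantic_type(concepts, FUNCTIONAL_TYPES)
print(kept)
```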
Open Discovery (A, B)
•Result: 57 Concepts
•Frequency range: 1-18
Open Discovery (A, B)
• Selected B-concepts:
Plasma Viscosity Level
Blood Viscosity
Platelet Adhesiveness
Platelet Aggregation
Effects, Blood Coagulation
Open Discovery (A, B)
• Variants:
plasma, plasmas
viscosity, viscous
aggregation, aggregations, aggregating
coagulation, coagulating
Open Discovery (A, B)
•PubMed query:
blood coagulation OR blood viscosity
OR plasma viscosity OR platelet
adhesiveness OR platelet aggregation
•Result: 10,611 MEDLINE citations
Open Discovery (A, B)
• Concepts in sentences with B-concepts: 7,702
• Of these, concepts not in Raynaud sentences: 6,747
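Dropping concepts already seen with A is a set difference; the slide's figures imply that 7,702 - 6,747 = 955 concepts also occurred in Raynaud sentences. A minimal sketch with invented concept names (`novel_concepts` is a hypothetical helper):

```python
def novel_concepts(b_sentence_concepts, a_sentence_concepts):
    """Concepts found with B-concepts that never appeared in sentences with A."""
    return set(b_sentence_concepts) - set(a_sentence_concepts)

# Toy data: what was found near the B-concepts, and near concept A.
near_b = {"Fish Oil", "Nifedipine", "Cod Liver Oil", "Aspirin"}
near_a = {"Nifedipine", "Aspirin", "Blood Viscosity"}
new_for_a = novel_concepts(near_b, near_a)
print(sorted(new_for_a))
```

Only the remaining concepts can yield a hypothesis that is new with respect to the literature on A.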
Open Discovery (A, B)
• Filter for dietary related concepts
• Semantic types in filter:
Vitamin
Lipid
Element, Ion, or Isotope
Open Discovery (A, B)
•Result: 206 Concepts
•Rank order on relations
•Fish oil related C-concepts:
Eicosapentaenoic Acid
Fish Oil
Fatty Acids, Omega 3
MAXEPA
Omega-3 Polyunsaturated Fatty Acid
Cod Liver Oil
Salmon Oil
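One plausible reading of "rank order on relations" is to rank each C-concept by the number of distinct B-concepts it co-occurs with. The slides do not define the ranking, so this scoring rule, the helper name `rank_by_b_links`, and the sample pairs are all assumptions.

```python
from collections import defaultdict

def rank_by_b_links(b_c_pairs):
    """Rank C-concepts by how many distinct B-concepts they co-occur with.
    Ties are broken alphabetically so the order is deterministic."""
    links = defaultdict(set)
    for b, c in b_c_pairs:
        links[c].add(b)
    return sorted(((c, len(bs)) for c, bs in links.items()),
                  key=lambda item: (-item[1], item[0]))

# Toy (B-concept, C-concept) sentence co-occurrences.
pairs = [
    ("Blood Viscosity", "Fish Oil"),
    ("Platelet Aggregation", "Fish Oil"),
    ("Blood Viscosity", "Fish Oil"),          # repeated pair counts once
    ("Platelet Aggregation", "Cod Liver Oil"),
]
ranking = rank_by_b_links(pairs)
print(ranking)
```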
Closed Discovery
A: Raynaud’s Disease
C:
Eicosapentaenoic Acid
Fish Oil
Fatty Acids, Omega 3
MAXEPA
Omega-3 Polyunsaturated Fatty Acid
Cod Liver Oil
Salmon Oil
Closed Discovery
A: 1,246 citations, 1,278 concepts
C: 463 citations, 1,795 concepts
479 common concepts
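In closed discovery the candidate B-concepts are those found on both sides, which is a set intersection. A minimal sketch with invented concept sets (`common_concepts` is a hypothetical helper, not part of the system):

```python
def common_concepts(a_concepts, c_concepts):
    """Concepts that occur both in the A literature and in the C literature."""
    return sorted(set(a_concepts) & set(c_concepts))

# Toy concept sets extracted from the A and C citation sets.
from_a = {"Blood Viscosity", "Platelet Aggregation", "Nifedipine"}
from_c = {"Blood Viscosity", "Platelet Aggregation", "Triglycerides"}
shared = common_concepts(from_a, from_c)
print(shared)
```

With the real data this step yields the 479 common concepts, which are then narrowed by semantic type.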
Closed Discovery
• Functional / Physiological filter applied to the concepts common to A and C: 45 B-concepts
Closed Discovery
[Diagram: A - B - C]
• Known concepts:
Plasma viscosity level
Blood Viscosity
Platelet Adhesiveness
Platelet Aggregation
Effects, Blood Coagulation
• New concepts:
Vasodilatation
Veins, Capillaries
Dinoprostone
Fibrinolysis
Deformability
Rheology
Juxtaposition
Success / Failure
+ Simulation of Raynaud’s disease – fish oil
and migraine – magnesium
+ Discovery of new therapeutic applications for
thalidomide
- Ambiguous mapping (Mg = milligram / magnesium)
- Association defined only by co-occurrence
Future
• Better semantic analysis:
increase(A,B) and decrease(B,C)
• Better user interface
• More databases
e.g. finding genetic bases for diseases
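What directed relations would add over plain co-occurrence can be sketched as below: a substance is only proposed when it moves a B-concept in the opposite direction to the disease. The data layout, the helper `opposing_links`, and "Substance X" are assumptions for illustration; the two fish oil effects follow the Swanson example earlier in the deck.

```python
OPPOSITE = {"increase": "decrease", "decrease": "increase"}

def opposing_links(disease_effects, substance_effects):
    """Return {substance: [b, ...]} where the substance shifts b in the
    direction opposite to the disease's effect on b."""
    result = {}
    for substance, effects in substance_effects.items():
        shared = sorted(
            b for b, direction in effects.items()
            if direction in OPPOSITE and disease_effects.get(b) == OPPOSITE[direction]
        )
        if shared:
            result[substance] = shared
    return result

# Disease side: Raynaud's disease goes with high viscosity and aggregation.
disease = {"Blood Viscosity": "increase", "Platelet Aggregation": "increase"}
# Substance side: "Substance X" is hypothetical and acts in the same direction.
substances = {
    "Fish Oil": {"Blood Viscosity": "decrease", "Platelet Aggregation": "decrease"},
    "Substance X": {"Blood Viscosity": "increase"},
}
candidates = opposing_links(disease, substances)
print(candidates)
```

Under pure co-occurrence both substances would be linked to the disease; the direction check keeps only the one that counteracts it.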