ISMB 2002 Fifth Annual Bio-Ontologies Meeting, August 8, 2002
Experiences in visualizing and navigating biomedical ontologies and knowledge bases
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA
Introduction 1
Biomedical knowledge
  Terminologies (names)
  Ontologies (objects)
  Knowledge bases (facts)
Common features
  Terms / Concepts
  Inter-concept relationships
    Hierarchical
    Associative
Introduction 2
Challenges
  Volume of information
    10^4 - 10^6 concepts
    10^5 - 10^7 relationships
  Orientation
  Mapping to concepts
  Visualizing concept spaces
  Navigating concept spaces
[Figure labels: term, knowledge]
Introduction 3
SemNav
  UMLS browser
  Entry point: biomedical term
  Display related concepts
  Display properties of interconcept relationships
  Allow navigation among concepts
GenNav
  GO browser
  Entry point: GO term or gene product name/symbol
  Display related GO terms and gene products
  Display properties of term/term and term/gene product relationships
  Allow navigation between GO terms and gene products
Outline
Background
  Unified Medical Language System (UMLS)
  Gene Ontology
Overview of the browsers
  SemNav
  GenNav
Common features
Differences
UMLS and GO
Unified Medical Language System
Developed at NLM since 1990
13th edition in 2002
Integrates some 60 terminological resources
  Clinical vocabularies (including specialties)
  Core terminologies (anatomy, drugs, med. devices)
  Administrative terminologies, standards
Integration
  Synonymous terms are clustered in a concept
  Hierarchies (trees) are combined in a graph structure
Terminology integration Terms
Duchenne muscular dystrophy
Duchenne’s muscular dystrophy
Duchenne de Boulogne muscular dystrophy
Duchenne type progressive muscular dystrophy
pseudohypertrophic muscular dystrophy
X-linked recessive muscular dystrophy
severe generalized familial muscular dystrophy
Source vocabularies shown: MeSH, SNOMED, CTV3, Jablonski, CRISP, DxPlain, MedDRA, LOINC, COSTAR
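The clustering of synonymous terms into a single concept can be sketched as a simple lookup table. This is a minimal illustration, not the Metathesaurus data model: the identifier `CONCEPT_DMD` is made up (it is not a real UMLS concept identifier), and the function names are hypothetical.

```python
# Terms taken from the slide; "CONCEPT_DMD" is a made-up identifier,
# not a real UMLS concept identifier.
SYNONYMS = {
    "CONCEPT_DMD": [
        "Duchenne muscular dystrophy",
        "Duchenne de Boulogne muscular dystrophy",
        "pseudohypertrophic muscular dystrophy",
        "X-linked recessive muscular dystrophy",
    ],
}


def build_term_index(synonyms):
    """Invert concept -> terms into a case-insensitive term -> concept lookup."""
    index = {}
    for concept_id, terms in synonyms.items():
        for term in terms:
            index[term.lower()] = concept_id
    return index


TERM_INDEX = build_term_index(SYNONYMS)


def concept_for(term):
    """Return the concept a term belongs to, or None if the term is unknown."""
    return TERM_INDEX.get(term.lower())


print(concept_for("Pseudohypertrophic Muscular Dystrophy"))
```

Any of the listed synonyms resolves to the same concept, which is the property the integration step relies on.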
Terminology integration Relationships
[Figure: hierarchical relationships among the following concepts, as contributed by each source]
Concepts: Adrenal Gland Diseases; Adrenal Cortex Diseases; Hypoadrenalism; Adrenal Gland Hypofunction; Adrenal cortical hypofunction; Addison’s Disease
Sources: SNOMED, MeSH, AOD, Read Codes, UMLS
UMLS
Two-level structure
  Semantic Network
    134 Semantic Types (STs)
    54 types of relationships among STs
  Metathesaurus
    800,000 concepts
    ~10 M inter-concept relationships
  Link = categorization
[Figure: a Semantic Type in the Semantic Network linked by categorization to a Concept in the Metathesaurus]
[Figure: Semantic Types in the Semantic Network and the Metathesaurus concepts they categorize]
Semantic Types: Fully Formed Anatomical Structure; Anatomical Structure; Embryonic Structure; Body Part, Organ or Organ Component; Disease or Syndrome; Pharmacologic Substance; Population Group
Concepts: Esophagus; Left Phrenic Nerve; Heart Valves; Mediastinum; Heart; Fetal Heart; Saccular Viscus; Angina Pectoris; Cardiotonic Agents; Tissue Donors
Numeric labels in the figure: 12, 9, 4, 31, 97, 225, 22
Gene Ontology
Developed by the GO Consortium
Several components
  Ontology (~11,000 concepts)
    Molecular functions
    Cellular components
    Biological processes
  Gene products (~125,000)
  Associations between gene products and GO concepts (~357,000)
SemNav
MeSH Browser
SemNav Visualization options
SemNav Relationships
[Figure: relationships between two concepts and between their Semantic Types]
Semantic Types: Biologically Active Substance; Amino Acid, Peptide or Protein; Disease or Syndrome
Concepts: Dystrophin; Muscular Dystrophy, Duchenne
Numeric label in the figure: 55
GenNav
Material and Methods
Common features and differences
Mapping query terms
Mapping terms to concepts
  Matching criteria (exact, approximate)
  Normalization techniques
    work well on clinical terms
    less applicable to gene names
Query disambiguation
  With semantic type in SemNav
  With species in GenNav
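The contrast between clinical terms and gene names can be illustrated with a toy normalizer. This is a simplified sketch, not the normalization used by SemNav, GenNav, or the UMLS lexical tools: the stop-word list, the function name `normalize`, and the symbols `GENE-1` / `GENE1` are assumptions made for illustration.

```python
import re

# Assumed, deliberately tiny stop-word list (illustrative only).
STOP_WORDS = {"of", "the", "and", "or", "with", "in"}


def normalize(term):
    """Lowercase, drop possessives and punctuation, remove stop words, sort words."""
    term = term.lower()
    term = re.sub(r"['’]s\b", "", term)        # Duchenne's -> duchenne
    term = re.sub(r"[^a-z0-9\s]", " ", term)   # punctuation becomes whitespace
    words = [w for w in term.split() if w not in STOP_WORDS]
    return " ".join(sorted(words))


# Word-order and possessive variants of a clinical term collapse to one key.
print(normalize("Duchenne's muscular dystrophy") == normalize("Muscular Dystrophy, Duchenne"))

# Made-up gene symbols: splitting on the hyphen yields two tokens for one
# spelling and a single token for the other, so the two keys differ.
print(normalize("GENE-1") == normalize("GENE1"))
```

Under these assumptions the clinical variants match, while the two spellings of the made-up gene symbol do not, which is one way to see why such techniques need adaptation for gene names.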
Visualization
Graph vs. Trees (Forest)
  Multiple inheritance is better visualized by graphs than by trees
  Off-the-shelf, freely available graph visualization packages exist (GraphViz)
Need to reduce complexity
  Transitive reduction on complex graphs
  Feature selection
    e.g., a given vocabulary in SemNav
    e.g., a given species in GenNav
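Transitive reduction removes every edge that is implied by a longer path, which leaves fewer edges to draw without changing which nodes are reachable. The following is a minimal sketch for a directed acyclic graph stored as an adjacency dictionary; the node labels and function names are hypothetical, and nothing here is taken from the browsers' actual code.

```python
def reachable(graph, start):
    """Return every node reachable from start by following one or more edges."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def transitive_reduction(graph):
    """Drop each edge u -> v when v is also reachable through another child of u.

    Assumes the input is a directed acyclic graph.
    """
    reduced = {}
    for u, children in graph.items():
        children = set(children)
        redundant = set()
        for child in children:
            redundant |= reachable(graph, child) & children
        reduced[u] = sorted(children - redundant)
    return reduced


# Hypothetical hierarchy: A -> C is implied by A -> B -> C.
EXAMPLE = {"A": ["B", "C"], "B": ["C"], "C": []}
print(transitive_reduction(EXAMPLE))  # {'A': ['B'], 'B': ['C'], 'C': []}
```

This version recomputes reachability for every child, which is acceptable for a sketch but would likely need memoization on graphs of the size mentioned in the introduction.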
Navigation
Tool for exploration
Navigation among concepts (SemNav and GenNav)
Navigation between two poles (gene products and GO concepts in GenNav)
Self-contained (SemNav) or open to external resources (GenNav)
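Navigation between two poles can be pictured as a pair of indexes built from the same association list, one per direction. This is a sketch under stated assumptions: the association pairs, the placeholder names `product_x`, `term_a`, and so on, and the function names are all hypothetical and do not reflect GenNav's implementation or real GO annotations.

```python
from collections import defaultdict

# Hypothetical (gene product, GO term) associations; placeholder names only.
ASSOCIATIONS = [
    ("product_x", "term_a"),
    ("product_x", "term_b"),
    ("product_y", "term_b"),
]


def build_indexes(associations):
    """Index the associations in both directions."""
    terms_by_product = defaultdict(set)
    products_by_term = defaultdict(set)
    for product, term in associations:
        terms_by_product[product].add(term)
        products_by_term[term].add(product)
    return terms_by_product, products_by_term


TERMS_BY_PRODUCT, PRODUCTS_BY_TERM = build_indexes(ASSOCIATIONS)


def co_annotated(product):
    """Go from a product to its terms and back to the other products on those terms."""
    related = set()
    for term in TERMS_BY_PRODUCT.get(product, ()):
        related |= PRODUCTS_BY_TERM[term]
    related.discard(product)
    return related


print(co_annotated("product_x"))
```

Each hop (product to terms, term to products) is a single dictionary lookup, so a browser could alternate between the two poles as often as the user likes.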
Conclusions
Most of the lessons learned while developing SemNav (for browsing general biomedical knowledge) were applicable to GenNav (for browsing molecular biology knowledge)
The lexical techniques suitable for mapping text to clinical terminologies require adaptation to the specificity of molecular biology terminologies
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA
Contact:
[email protected]
SemNav: http://umlsks.nlm.nih.gov * ► Resources ► Semantic Navigator (* free UMLS registration required)
GenNav: http://etbsun2.nlm.nih.gov:8000/perl/gennav.pl