ISMB 2002, Fifth Annual Bio-Ontologies Meeting
August 8, 2002

Experiences in visualizing and navigating biomedical ontologies and knowledge bases

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA

Introduction 1

- Biomedical knowledge
  - Terminologies (names)
  - Ontologies (objects)
  - Knowledge bases (facts)
- Common features
  - Terms / Concepts
  - Inter-concept relationships
    - Hierarchical
    - Associative

Introduction 2

- Challenges
  - Volume of information
    - 10^4 - 10^6 concepts
    - 10^5 - 10^7 relationships
  - Orientation
    - Mapping to concepts
    - Visualizing concept spaces
    - Navigating concept spaces

[Figure labels: knowledge, term]

Introduction 3

- SemNav
  - UMLS browser
    - Entry point: biomedical term
    - Display related concepts
    - Display properties of inter-concept relationships
    - Allow navigation among concepts
- GenNav
  - GO browser
    - Entry point: GO term or gene product name/symbol
    - Display related GO terms and gene products
    - Display properties of term/term and term/gene product relationships
    - Allow navigation between GO terms and gene products

Outline

- Background
  - Unified Medical Language System (UMLS)
  - Gene Ontology
- Overview of the browsers
  - SemNav
  - GenNav
- Common features
- Differences

UMLS and GO

Unified Medical Language System

- Developed at NLM since 1990
  - 13th edition in 2002
- Integrates some 60 terminological resources
  - Clinical vocabularies (including specialties)
  - Core terminologies (anatomy, drugs, medical devices)
  - Administrative terminologies, standards
- Integration
  - Synonymous terms are clustered in a concept
  - Hierarchies (trees) are combined in a graph structure

Terminology integration: Terms

- Duchenne muscular dystrophy
- Duchenne’s muscular dystrophy
- Duchenne de Boulogne muscular dystrophy
- Duchenne type progressive muscular dystrophy
- pseudohypertrophic muscular dystrophy
- X-linked recessive muscular dystrophy
- severe generalized familial muscular dystrophy

Source vocabularies contributing these terms: MeSH, SNOMED, CTV3, Jablonski, CRISP, DxPlain, MedDRA, LOINC, COSTAR
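The clustering of synonymous terms into a single concept can be sketched as a lookup from each term string to a shared concept identifier. This is a minimal illustration only, not the UMLS build process: the identifier `C_DMD` and the function names are hypothetical, and the synonym list is supplied by hand rather than discovered automatically.

```python
from collections import defaultdict

# Hypothetical concept identifier; the slides do not show real UMLS identifiers.
CONCEPT_ID = "C_DMD"

# Synonymous terms listed on the slide, contributed by different vocabularies.
SYNONYMS = [
    "Duchenne muscular dystrophy",
    "Duchenne’s muscular dystrophy",
    "Duchenne de Boulogne muscular dystrophy",
    "Duchenne type progressive muscular dystrophy",
    "pseudohypertrophic muscular dystrophy",
    "X-linked recessive muscular dystrophy",
    "severe generalized familial muscular dystrophy",
]


def build_index(clusters):
    """Map every term (case-insensitively) to the concept that groups it."""
    term_to_concept = {}
    concept_to_terms = defaultdict(list)
    for concept_id, terms in clusters.items():
        for term in terms:
            term_to_concept[term.lower()] = concept_id
            concept_to_terms[concept_id].append(term)
    return term_to_concept, dict(concept_to_terms)


term_to_concept, concept_to_terms = build_index({CONCEPT_ID: SYNONYMS})


def lookup(term):
    """Return the concept identifier for a term, or None if the term is unknown."""
    return term_to_concept.get(term.lower())
```

Any of the seven strings then resolves to the same concept, which is the property a browser needs when a term is used as an entry point.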

Terminology integration: Relationships

[Figure: hierarchical relationships among the concepts below, as represented in SNOMED, MeSH, AOD, Read Codes and the UMLS]

- Adrenal Gland Diseases
- Adrenal Cortex Diseases
- Hypoadrenalism
- Adrenal Gland Hypofunction
- Adrenal cortical hypofunction
- Addison’s Disease
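Combining several tree hierarchies into one graph can be sketched as a union of parent-child edges, after which a concept may have more than one parent. The concept names below come from the slide, but the transcript does not preserve which source asserts which link, so the edges and the source labels `source_A` and `source_B` are assumptions made purely for illustration.

```python
from collections import defaultdict

# Illustrative parent -> child edges per source hierarchy (assumed, not from the slide).
SOURCE_TREES = {
    "source_A": [
        ("Adrenal Gland Diseases", "Adrenal Cortex Diseases"),
        ("Adrenal Cortex Diseases", "Addison's Disease"),
    ],
    "source_B": [
        ("Adrenal Gland Diseases", "Adrenal Gland Hypofunction"),
        ("Adrenal Gland Hypofunction", "Addison's Disease"),
    ],
}


def merge_hierarchies(source_trees):
    """Union the edges of several trees into one graph, keeping provenance."""
    parents = defaultdict(set)     # child -> set of parents
    provenance = defaultdict(set)  # (parent, child) -> sources asserting the edge
    for source, edges in source_trees.items():
        for parent, child in edges:
            parents[child].add(parent)
            provenance[(parent, child)].add(source)
    return parents, provenance


parents, provenance = merge_hierarchies(SOURCE_TREES)

# Concepts with several parents are where the merged structure stops being a tree.
multi_parent = sorted(c for c, ps in parents.items() if len(ps) > 1)
```

Keeping the provenance of each edge is one way to support later filtering by vocabulary.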

UMLS

- Two-level structure
  - Semantic Network
    - 134 Semantic Types (STs)
    - 54 types of relationships among STs
  - Metathesaurus
    - 800,000 concepts
    - ~10 M inter-concept relationships
  - Link = categorization

[Figure: each Concept in the Metathesaurus is linked by categorization to a Semantic Type in the Semantic Network]

[Figure: Metathesaurus concepts categorized by Semantic Types of the Semantic Network]

Semantic Types: Anatomical Structure; Embryonic Structure; Fully Formed Anatomical Structure; Body Part, Organ or Organ Component; Disease or Syndrome; Pharmacologic Substance; Population Group

Concepts: Esophagus; Left Phrenic Nerve; Heart Valves; Mediastinum; Heart; Fetal Heart; Saccular Viscus; Angina Pectoris; Cardiotonic Agents; Tissue Donors

Numeric labels on the figure: 12, 9, 4, 31, 97, 225, 22

Gene Ontology

- Developed by the GO Consortium
- Several components
  - Ontology (~11,000 concepts)
    - Molecular functions
    - Cellular components
    - Biological processes
  - Gene products (~125,000)
  - Associations between gene products and GO concepts (~357,000)

SemNav

MeSH Browser

SemNav Visualization options


SemNav Relationships

[Figure: relationships between the concepts Dystrophin and Muscular Dystrophy, Duchenne, and between their Semantic Types]

Semantic Types: Biologically Active Substance; Amino Acid, Peptide or Protein; Disease or Syndrome

Concepts: Dystrophin; Muscular Dystrophy, Duchenne

Numeric label on the figure: 55

GenNav

Material and Methods

Common features and differences

Mapping query terms

- Mapping terms to concepts
  - Matching criteria (exact, approximate)
  - Normalization techniques
    - work well on clinical terms
    - less applicable to gene names
- Query disambiguation
  - With semantic type in SemNav
  - With species in GenNav
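Exact matching followed by an approximate, normalization-based fallback can be sketched as below. The normalization here (lowercasing, dropping possessives and punctuation, sorting words) is a simplified stand-in chosen for illustration; it is not the normalization actually used by the browsers, it does no inflection handling, and the identifier `C_DMD` is hypothetical.

```python
import re


def normalize(term):
    """Lowercase, drop possessives and punctuation, and sort the words."""
    text = term.lower()
    text = re.sub(r"[’']s\b", "", text)       # "duchenne's" -> "duchenne"
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # remaining punctuation -> space
    return " ".join(sorted(text.split()))


def build_index(term_to_concept):
    """Build one lookup table per matching criterion."""
    return {
        "exact": dict(term_to_concept),
        "normalized": {normalize(t): c for t, c in term_to_concept.items()},
    }


def match(query, index):
    """Try an exact match first, then fall back to the normalized form."""
    if query in index["exact"]:
        return index["exact"][query], "exact"
    key = normalize(query)
    if key in index["normalized"]:
        return index["normalized"][key], "approximate"
    return None, "none"


index = build_index({"Duchenne muscular dystrophy": "C_DMD"})  # hypothetical ID
```

With this index, `match("Muscular Dystrophy, Duchenne", index)` falls through to the normalized table, because word order and punctuation are discarded before comparison.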

Visualization

- Graph vs. Trees (Forest)
  - Multiple inheritance is better visualized by graphs than by trees
  - Off-the-shelf, freely available graph visualization packages exist (GraphViz)
- Need to reduce complexity
  - Transitive reduction on complex graphs
  - Feature selection
    - e.g., a given vocabulary in SemNav
    - e.g., a given species in GenNav
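Transitive reduction removes every edge that is already implied by a longer path, which lowers the number of edges drawn without changing reachability. The sketch below assumes the hierarchy is a directed acyclic graph given as (parent, child) pairs; the node names are placeholders, and this is not presented as the implementation used in SemNav or GenNav.

```python
def transitive_reduction(edges):
    """Return the edges of a DAG that are not implied by a longer path."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
        children.setdefault(child, set())

    def reachable(start, target, skip_edge):
        """Depth-first search from start to target that ignores skip_edge."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in children[node]:
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == target:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    # An edge is redundant when its endpoint can be reached without using it.
    return {(u, v) for u, v in edges if not reachable(u, v, (u, v))}


# Placeholder graph: A -> C is implied by A -> B -> C and is dropped.
reduced = transitive_reduction([("A", "B"), ("B", "C"), ("A", "C")])
```

For graphs rendered with Graphviz, the `tred` filter distributed with that package performs the same operation on a DOT file; whether the browsers relied on it is not stated in the slides.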

Navigation

- Tool for exploration
  - Navigation among concepts (SemNav and GenNav)
  - Navigation between two poles (gene products and GO concepts in GenNav)
  - Self-contained (SemNav) or open to external resources (GenNav)
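Navigation between two poles can be sketched as a pair of indexes built from the same association list, so that a user can move from a gene product to its GO terms and back. The class name, method names, and the association data below are all hypothetical placeholders; no real GO annotations are used.

```python
from collections import defaultdict

# Illustrative (gene product, GO term) associations; placeholder names only.
ASSOCIATIONS = [
    ("product_1", "term_x"),
    ("product_1", "term_y"),
    ("product_2", "term_y"),
]


class TwoPoleIndex:
    """Bidirectional index over gene product / GO term associations."""

    def __init__(self, associations):
        self.terms_of = defaultdict(set)     # gene product -> GO terms
        self.products_of = defaultdict(set)  # GO term -> gene products
        for product, term in associations:
            self.terms_of[product].add(term)
            self.products_of[term].add(product)

    def neighbors_of_product(self, product):
        """Gene products that share at least one GO term with the given one."""
        related = set()
        for term in self.terms_of.get(product, ()):
            related |= self.products_of[term]
        related.discard(product)
        return related


two_pole_index = TwoPoleIndex(ASSOCIATIONS)
```

Hopping from `product_1` to `term_y` and on to `product_2` is the kind of two-step move the slide describes as navigation between the two poles.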

Conclusions

- Most of the lessons learned while developing SemNav (for browsing general biomedical knowledge) were applicable to GenNav (for browsing molecular biology knowledge)
- The lexical techniques suitable for mapping text to clinical terminologies require adaptation to the specificity of molecular biology terminologies

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA

Contact:

[email protected]

SemNav: http://umlsks.nlm.nih.gov * ► Resources ► Semantic Navigator (* free UMLS registration required)

GenNav: http://etbsun2.nlm.nih.gov:8000/perl/gennav.pl