Clinical Bioinformatics for combatting cancer: the place


ECOR
European Centre for Ontological Research
Application of Ontology in Cancer Bioinformatics
Dr. Werner Ceusters, MD
Executive Director
European Centre for Ontological Research
Saarland University
Saarbrücken, Germany
11th World Conference on Medical Informatics
San Francisco, 7-11 September 2004
• 759 papers
• 48 contain the word “bioinformatics”
• 124 contain “cancer”
• 1 contains “cancer bioinformatics”
• But: about 50 deal with cancer bioinformatics
• 89 contain “ontology”
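The gap between 1 exact-phrase match and about 50 relevant papers is typical of plain keyword counting. A minimal Python sketch, assuming case-insensitive substring matching over a few made-up abstracts (the `abstracts` list and the `contains`/`count` helpers are hypothetical, not MEDINFO 2004 data):

```python
# Hypothetical toy abstracts; not taken from the MEDINFO 2004 proceedings.
abstracts = [
    "A gene expression classifier for oral cancer using microarray data.",
    "Cancer bioinformatics: integrating tumour registries with sequence databases.",
    "Bioinformatics pipelines for protein structure prediction.",
    "Nursing information systems for cancer wards.",
    "Text mining of bioinformatics literature on breast cancer risk factors.",
]


def contains(text: str, phrase: str) -> bool:
    """Case-insensitive substring test, the kind of match behind simple word counts."""
    return phrase.lower() in text.lower()


def count(phrase: str) -> int:
    """Number of abstracts containing the phrase at least once."""
    return sum(contains(a, phrase) for a in abstracts)


exact = count("cancer bioinformatics")  # only the exact two-word phrase
both = sum(contains(a, "cancer") and contains(a, "bioinformatics") for a in abstracts)
print(exact, both)  # prints: 1 2
```

Requiring both words finds 2 abstracts instead of 1, yet the first abstract, clearly a cancer bioinformatics paper, is still missed because it never uses the word "bioinformatics".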
Ontology related Cancer Bioinformatics at MEDINFO 2004
• A Log Likelihood Predictor for Genomic Classification of Oral Cancer using Principle Component Analysis for Feature Selection
• Methods for Multi-Category Cancer Diagnosis from Gene Expression Data: A Comprehensive Evaluation to Inform Decision Support System Development
• A Text Mining Approach to Enable Detection of Candidate Risk Factors
• Cancer-related Complementary and Alternative Medicine Online: Factors Affecting Information Retrieval (by patients)
• Development of the ICNP based cancer nursing information system
• NCI Thesaurus: Using Science-Based Terminology to Integrate Cancer Research Results
• Extraction of Diagnosis Related Terminological Info from Discharge Summary
• Automated Clinical Annotation of Tissue Bank Specimens
• Mining OMIM for Insight into Complex Diseases
• A new parameter enhancing breast cancer detection in computer aided diagnosis of X-ray mammograms
• Tools for the Performance of Clinical Trials Research
• Formal Representation of Medical Goals for Medical Guidelines
Goals of Cancer Bioinformatics
• To integrate molecular, biological and
clinical knowledge about cancer with
analytic methods from bioinformatics.
• The ultimate aim is to create
comprehensive prognostic and predictive
models as aids to diagnosis, treatment and
the design of new therapeutics.
Task descriptions
• Sequence similarity searching
– Nucleic acid vs nucleic acid: 28
– Protein vs protein: 39
– Translated nucleic acid vs protein: 6
– Unspecified sequence type: 29
– Search for non-coding DNA: 9
• Functional motif searching: 35
• Sequence retrieval: 27
• Multiple sequence alignment: 21
• Restriction mapping: 19
• Secondary and tertiary structure prediction: 14
• Other DNA analysis including translation: 14
• Primer design: 12
• ORF analysis: 11
• Literature searching: 10
• Phylogenetic analysis: 9
• Protein analysis: 10
• Sequence assembly: 8
• Location of expression: 7
• Miscellaneous: 7
Stevens R, Goble C, Baker P, Brass A. A Classification of Tasks in Bioinformatics. Bioinformatics 2001;17(2):180-188.
Three major challenges
• Analyse massive amounts of data:
– e.g. high-throughput technologies based upon cDNA or
oligonucleotide microarrays for analysis of gene
expression, analysis of sequence polymorphisms and
mutations, and sequencing
• Appropriately link clinical histories to molecular or
other biomarker data generated by genomic and
proteomic technologies.
• Develop user-friendly computer-based
platforms
– that can be accessed and utilized by the average
researcher for searching, retrieval, manipulation, and
analysis of information from large-scale datasets
Words of Wisdom
• “Ontology” is too often not taken seriously, and
only a few people understand that. But there is
hope:
– The promise of Web Services, augmented with the
Semantic Web, is to provide THE major solution for
integration, the largest IT cost / sector, at $ 500
BN/year. The Web Services and Semantic Web trends
are heading for a major failure (i.e., the most recent
Silver Bullet). In reality, Web Services, as a technology,
is in its infancy. ... There is no technical solution (i.e., no
basis) other than fantasy for the rest of the Web
Services story. Analyst claims of maturity and adoption
(...) are already false. ... Verizon must understand it so
as not to invest too heavily in technologies that will fail
or that will not produce a reasonable ROI.
Dr. Michael L. Brodie, Chief Scientist, Verizon IT
Outline of this presentation
• Look at some popular views, statements, claims,
systems, beliefs, ... about “ontology”, and indicate
where and how they fail to do justice to what
ontology is actually about;
• Explain the basics of the principled approach that
we use and give examples of practical
applications;
• Some comments on the future of ontology in
Buffalo and the US.
System and data integration approaches
1. Data Warehousing:
Data from various data sources are converted, merged and stored in a
centralized DBMS. Example: Integrated Genomic Database
2. Hyperlinking approaches:
Links are set up between related information and data sources.
Examples: SRS, Entrez (NCBI)
3. Standardization:
Efforts which address the need for a common metadata model for various
application domains.
4. Integration systems:
Systems that can gather and integrate information from multiple sources.
Some of these systems have a mediator-wrapper architecture; others are
language-based systems like Bio-Kleisli.
5. Federated Database:
Cooperating, yet autonomous, databases map their individual schemas to
a single global schema. Operations are performed against the federated
schema.
Steve Brady
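The federated-database and mediator-wrapper approaches above can be sketched in a few lines of Python. Everything here is hypothetical: two toy sources with different local field names, a `GlobalGene` record standing in for the global schema, and per-source mappings applied by a wrapper before a mediator evaluates the query.

```python
from dataclasses import dataclass

# Two hypothetical autonomous sources, each keeping its own local schema.
source_a = [{"gene_sym": "TP53", "chrom": "17"}, {"gene_sym": "BRCA1", "chrom": "17"}]
source_b = [{"symbol": "BRCA2", "chromosome_name": "13"}]


@dataclass(frozen=True)
class GlobalGene:
    """Record type of the hypothetical global (federated) schema."""
    symbol: str
    chromosome: str


# Per-source mappings from local field names to global field names.
MAPPINGS = {
    "A": {"gene_sym": "symbol", "chrom": "chromosome"},
    "B": {"symbol": "symbol", "chromosome_name": "chromosome"},
}
SOURCES = {"A": source_a, "B": source_b}


def wrap(source_id):
    """Wrapper: translate one source's local rows into global-schema records."""
    mapping = MAPPINGS[source_id]
    for row in SOURCES[source_id]:
        yield GlobalGene(**{mapping[field]: value for field, value in row.items()})


def query(predicate):
    """Mediator: run a query against the global schema over all sources."""
    return [rec for sid in SOURCES for rec in wrap(sid) if predicate(rec)]


print([g.symbol for g in query(lambda g: g.chromosome == "17")])  # ['TP53', 'BRCA1']
```

The mappings align field names only; nothing in them checks that both sources use "gene" or "chromosome" for the same kind of entity.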
Data integration approaches
at least, the beginnings of ...
• Protein interaction databases
• Small molecule databases
• Genome databases
• Pathway databases
• Protein databases
• Enzyme databases
Gene Ontology
GO deals with basic ontological notions very haphazardly
• GO’s three main term-hierarchies are:
• component, function and process
• But GO confuses functions with structures,
and also with executions of functions
• and has no clear account of the relation
between functions and processes
A flavour of ontology
<!-- ****************************************************************
Description of a location in a lipid bilayer membrane
Field description for BIND-membrane
– not-specified = somewhere in membrane
– outer-surface = on the outer surface of the membrane
– within = within the bilayer
– inner-surface = on the inner surface of the membrane
– lumen = in the lumen that the membrane surrounds
*************************************************************** -->
<!ELEMENT BIND-membrane %ENUM; >
<!ATTLIST BIND-membrane value ( not-specified | outer-surface | within | inner-surface | lumen ) #REQUIRED >
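The declaration fixes which strings may appear; it says nothing about what a membrane surface or a lumen is. A minimal Python sketch of checking an instance element against the five values listed in the field description, using the standard-library `xml.etree.ElementTree` (which does not itself validate against DTDs, so the enumeration is checked by hand). The helper name `membrane_location` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The five values listed in the BIND-membrane field description.
BIND_MEMBRANE_VALUES = frozenset(
    {"not-specified", "outer-surface", "within", "inner-surface", "lumen"}
)


def membrane_location(xml_text: str) -> str:
    """Return the value of a BIND-membrane element, rejecting undeclared values.

    ElementTree does not validate against the DTD, so the enumeration is checked by hand.
    """
    elem = ET.fromstring(xml_text)
    if elem.tag != "BIND-membrane":
        raise ValueError(f"unexpected element {elem.tag!r}")
    value = elem.get("value")  # None if the attribute is missing
    if value not in BIND_MEMBRANE_VALUES:
        raise ValueError(f"{value!r} is not a declared BIND-membrane value")
    return value


print(membrane_location('<BIND-membrane value="lumen"/>'))  # lumen
```

A value such as "cytoplasm" is rejected only because it is not in the list, not because anything in the DTD knows what cytoplasm is.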
Mereo-topology
[Figure: hierarchy of mereotopological relations, including has-overlapping-region, has-partial-spatial-overlap, is-spatial-part-of, is-proper-spatial-part-of, has-spatial-part, has-proper-spatial-part, is-tangential-spatial-part-of, is-non-tangential-spatial-part-of, has-tangential-spatial-part, has-non-tangential-spatial-part, is-spatial-equivalent-of, has-discrete-region, has-spatial-point-reference, has-connecting-region, has-disconnected-region, has-external-connecting-region, is-partly-inside-convex-hull-of, is-inside-convex-hull-of, is-outside-convex-hull-of, is-geo-inside-of, is-topo-inside-of]
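To make distinctions such as "proper part", "tangential part" and "disconnected" concrete, here is a small Python sketch that classifies the relation between two regions. It simplifies regions to closed 1-D intervals and uses RCC-8-style relation names (from the Region Connection Calculus), so it illustrates the kind of distinctions drawn in the figure rather than reproducing its relation set; `Region` and `relation` are hypothetical names.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A closed 1-D interval [start, end], start < end, standing in for a spatial region."""
    start: float
    end: float


def relation(x: Region, y: Region) -> str:
    """Classify the mereotopological relation of x to y, in the style of RCC-8."""
    if x == y:
        return "equal"
    if x.end < y.start or y.end < x.start:
        return "disconnected"
    if x.end == y.start or y.end == x.start:
        return "externally-connected"  # regions share only a boundary point
    if y.start <= x.start and x.end <= y.end:
        tangential = x.start == y.start or x.end == y.end
        return "tangential-proper-part-of" if tangential else "non-tangential-proper-part-of"
    if x.start <= y.start and y.end <= x.end:
        tangential = x.start == y.start or x.end == y.end
        return "has-tangential-proper-part" if tangential else "has-non-tangential-proper-part"
    return "partially-overlaps"


print(relation(Region(2, 3), Region(0, 5)))  # non-tangential-proper-part-of
print(relation(Region(0, 5), Region(5, 10)))  # externally-connected
```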
caCORE: The NCICB Cancer Informatics Infrastructure Backbone
• cancer Bioinformatics Infrastructure Objects (caBIO):
biomedical objects to facilitate the communication and
integration of information from the various initiatives
supported by the NCICB
• cancer Data Standards Repository (caDSR):
metadata used for cancer research
• NCI Enterprise Vocabulary Services (EVS):
standard vocabularies for a variety of
settings in the life sciences
caBIO architecture
Connectivity at programming interface level, NOT content
CoMeDIAS (France)
GenesTrace™: Biological Knowledge Discovery via Structured Terminology
But ....
Talking to each other
does not mean
Understanding each other
Pray your computer isn’t Irish ...
X: “Hallo stranger, you appear to be traveling?”
Y: “Yes, I always travel when on a journey.”
X: “And pray, what might your name be?”
Y: “It might be Sam Patch, but it isn't.”
X: “Have you been long in these parts?”
Y: “Never longer than at present, 5 feet 9.”
X: “Do you get anything new?”
Y: “Yes, I bought a new whetstone this morning.”
Copyright © 1996 Electronic Historical Publications
Cancer Data Standards Repository (caDSR)
• One of the problems confronting the biomedical
data management community is the panoply of
ways that similar or identical concepts are
described.
• Amen !?
• But it would be more appropriate to say:
– THE problem confronting the biomedical data
management community is that concepts are
described.
Triadic models of meaning:
The Semiotic/Semantic triangle
• Reference: Concept / Sense / Model / View
• Sign: Language / Term / Symbol
• Referent: Reality / Object
“Ontology”
• In Information Science:
– “An ontology is a description (like a formal
specification of a program) of the concepts and
relationships that can exist for an agent or a
community of agents.”
• In Philosophy:
– “Ontology is the science of what is, of the kinds
and structures of objects, properties, events,
processes and relations in every area of reality.”
[Figure labels: concept, definition, term, referent]
Why are concepts not enough?
• Why must our theory also address the
referents in reality?
– Because referents are observable fixed
points in relation to which we can work out
how the concepts used by different
communities relate to each other;
– Because only by looking at referents can
we establish the degree to which concepts
are good for their purpose.
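The idea that referents act as fixed points can be sketched in Python by comparing the sets of particulars to which two communities apply their concepts. The community names, concept labels and tumour identifiers below are all hypothetical, and a finite set of observed referents only approximates a concept's full extension.

```python
# Hypothetical referents: the particular tumours each community applies a concept to.
REFERENTS = {
    ("pathology", "carcinoma"): {"tumour-1", "tumour-2", "tumour-3"},
    ("registry", "malignant epithelial neoplasm"): {"tumour-1", "tumour-2", "tumour-3"},
    ("registry", "lung cancer"): {"tumour-2", "tumour-4"},
    ("clinic", "squamous cell carcinoma"): {"tumour-1"},
}


def compare(c1, c2):
    """Relate two concepts through the referents they are used for."""
    a, b = REFERENTS[c1], REFERENTS[c2]
    if a == b:
        return "same extension"
    if a < b:  # proper subset
        return "narrower"
    if a > b:  # proper superset
        return "broader"
    return "overlapping" if a & b else "disjoint"


print(compare(("registry", "lung cancer"), ("pathology", "carcinoma")))  # overlapping
```

Two labels from different communities can thus be related without first agreeing on their definitions, by looking at what they are applied to.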
NCI Enterprise Vocabulary Services environment
NCI Thesaurus
• a biomedical thesaurus created
specifically to meet the needs of the
NCI
• semantically modeled cancer-related
terminology built using description
logic
Why description logics are not enough
SNOMED-RT (2000)
SNOMED-CT (2003)
Underspecification
[Figure labels: new-1, new-2]
Use of description logics does not guarantee correct representations!
It’s not just a problem in healthcare
Ontologies for Legal Information Serving and Knowledge Management
Joost Breuker, Abdullatif Elhag, Emil Petkov and Radboud Winkels
Ontology versus Description Logics
• In the Description Logic world
– terms and definitions come first,
– the job is to validate them and reason with
them
• In the realist ontology world
– robust ontology (with all its reasoning power)
comes first
– and terms and term-hierarchies must be
subjected to the constraints of ontological
coherence
Search for “cancer”
NCI Thesaurus Root concepts
[Figure: NCI Thesaurus root concepts, including “Anatomic Structure, System, or Substance”, annotated with the questions:]
• Does the NCI not know to which category gene product belongs?
• Is any item not classified under “Anatomic Structure, System, or Substance” subsumed by it? If yes, why? If no, why are drugs and chemicals not subsumed by it?
Conceptual entity
• Definition: none
• Semantic type:
– Conceptual entity
– Classification
• Subconcepts:
– Action:
• definition: action; a thing done
– And:
• Definition: an article which expresses the relation of
connection or addition, used to conjoin a word with a word, ...
– Classification
• Definition: the grouping of things into classes or categories
Definition of “cancer gene”
NCI Thesaurus architecture
[Figure: Breast neoplasm ISA Disease; Breast neoplasm Disease-has-associated-anatomy Breast. Annotations: “Kinds” (Findings-And-Disorders-Kind, Anatomy-Kind) restrict the domain and range of associative relationships; “Associative relationships” providing “differentiae”; “Formal subsumption” or “inheritance”. Callout: What diseases have a diameter of over 3 cm?]
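The architecture on this slide, in which kinds constrain the domain and range of associative roles and ISA links supply inheritance, can be sketched in Python. The dictionaries are a hypothetical miniature built from the slide's labels, not the NCI Thesaurus's actual data model; the `Disease-has-diameter` role and the `Quantity-Kind` kind are invented for illustration.

```python
# Hypothetical miniature modelled on the slide's labels; not NCI Thesaurus data.
KIND = {
    "Disease": "Findings-And-Disorders-Kind",
    "Breast neoplasm": "Findings-And-Disorders-Kind",
    "Breast": "Anatomy-Kind",
    "3 cm": "Quantity-Kind",  # hypothetical kind, for illustration only
}
ISA = {"Breast neoplasm": "Disease"}  # formal subsumption ("inheritance")
ROLES = {  # associative roles with (domain kind, range kind) restrictions
    "Disease-has-associated-anatomy": ("Findings-And-Disorders-Kind", "Anatomy-Kind"),
    "Disease-has-diameter": ("Findings-And-Disorders-Kind", "Quantity-Kind"),  # hypothetical
}


def well_typed(subject, role, filler):
    """Accept a C-rel-C triple if subject and filler fall in the role's domain and range kinds."""
    domain, range_ = ROLES[role]
    return KIND[subject] == domain and KIND[filler] == range_


def ancestors(concept):
    """Follow ISA links upwards to collect everything the concept inherits from."""
    chain = []
    while concept in ISA:
        concept = ISA[concept]
        chain.append(concept)
    return chain


print(well_typed("Breast neoplasm", "Disease-has-associated-anatomy", "Breast"))  # True
print(well_typed("Disease", "Disease-has-diameter", "3 cm"))  # True
```

The kind-level check accepts the hypothetical `Disease-has-diameter` triple just as readily as the anatomy one, which is the kind of question the slide's callout raises.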
Problems with C-rel-C
• Ad hoc readings of statements of the type C1-relationship-C2
– Human has-part head // Human has-part finger
– California is-part-of United States // California isa name
– labial vein isa vein of head // labial vein isa vulval vein
• Concepts do not necessarily correspond to something that
exists, existed or will exist
– Sorcerer, unicorn, leprechaun, ...
• Definitions set the conditions under which terms may be
used, and may not be abused as conditions an entity must
satisfy to be what it is
• Language can make strings of words look as if they were
terms
– “Middle lobe of left lung”
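One way to expose the ad hoc readings is to check a C-rel-C statement against instance data under two different readings. A minimal Python sketch, assuming hypothetical instance data in which each particular human is listed with the types of parts it has (a simplification: real instance data would record particular parts, not types):

```python
# Hypothetical instance data: for each particular human, the types of parts it has.
HUMANS = {
    "human-1": {"head", "finger"},
    "human-2": {"head", "finger"},
    "human-3": {"head"},  # e.g. a person born without fingers
}


def all_some(part_type):
    """Universal reading: every human has some part of this type."""
    return all(part_type in parts for parts in HUMANS.values())


def some_some(part_type):
    """Existential reading: at least one human has some part of this type."""
    return any(part_type in parts for parts in HUMANS.values())


for part in ("head", "finger"):
    print(part, all_some(part), some_some(part))
# head True True
# finger False True
```

Under the universal reading "Human has-part head" holds but "Human has-part finger" does not, although the two look identical as C-rel-C triples.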
NCI Metathesaurus
• based on NLM's Unified Medical Language
System Metathesaurus supplemented with
additional cancer-centric vocabulary
• a database of many biomedical
terminologies, mapped where possible to
NCI Thesaurus terms and shared
conceptual meanings
NCI and Partner Data Sources
• SAGE Data (CGAP) - NCI and Duke University SAGE
experiment data
• Expression Measurements (NCICB GEDP) - Probe sets
• Sequence Trace Files (GAI) - EST traces and full-length
mRNA clone traces
• Genetic Annotation Initiative (GAI) - SNPs
• Sequence Verified Clones (as of caBIO version 2.0)
(NCICB internal pre-processed) - Human and mouse
sequence-verified clone information
• Cancer Clinical Trials (NCI CTEP and PDQ) - Trials and
drug agent information
• CMAP Annotation Data (CMAP) - Drug targets, anomalies
• Cancer Vocabulary (NCI) - Cancer related terminology
and concepts
External Data Sources
• Unigene (NCBI) - Human and mouse genes, sequences,
map locations, clones, proteins and protein homologs
• Homologene (NCBI) - Human and mouse gene homologs
• LocusLink (NCBI) - Genes, gene ontologies, gene
aliases, taxons
• RefSeq (NCBI) - Reference sequences
• EST Data (NCICB) - Tissue-specific expression level ESTs
• cDNA library information (NCICB) - cDNA libraries for
disease and tissue
• Human Genome via UCSC DAS server (UCSC) - Genomic sequences, annotations, and map coordinates
• BioCarta (BioCarta) - Pathways
• Gene Ontology - Hierarchy of gene functions
Metathesaurus traps
UMLS example
IFOMIS:
Institute for Formal Ontology and Medical Information Science
The Institute for Formal Ontology and Medical
Information Science was founded in April 2002 as
part of the Faculty of Medicine of the University of
Leipzig with a grant from the Alexander von
Humboldt Foundation. It comprises an
interdisciplinary research group with members from
Philosophy, Computer and Information Science,
Logic, Medicine, and Medical Informatics. IFOMIS
has established itself as a center of theoretically grounded
research in both formal and applied ontology. Its goal
is to develop a formal ontology that will be applied
and tested in the domain of medical and biomedical
information science.
In August 2004 IFOMIS moved its base of operations
from Leipzig to Saarland University in Saarbrücken.
IFOMIS
Universität des Saarlandes
Postfach 151150
D-66041 Saarbrücken
Germany
Secretariat
Tel.: +49 (0)681-302-64770
Fax: +49 (0)681-302-64772
IFOMIS’s long-term goal
• Build a robust high-level BFO-MedO
framework
• THE WORLD’S FIRST INDUSTRIAL-STRENGTH PHILOSOPHY
• which can serve as the basis for an
ontologically coherent unification of medical
knowledge and terminology
IFOMIS’s research in Formal Ontology
• Formal treatment of universals, individuals,
endurants, perdurants, scales, functions,
collections, ...
• Universals / Concepts
• Mereology and topology
• Vagueness and granularity
• Applicability to domain ontologies,
terminologies, ...
Reference Ontology
• a theory of a domain of entities in the
world
• based on realizing the goals of maximal
expressiveness and adequacy to reality
• sacrificing computational tractability for
the sake of representational adequacy
Basic Ontological Notions
• Identity
– How are instances of a class distinguished
from each other?
• Unity
– How are all the parts of an instance isolated?
• Essence
– Can a property change over time?
• Dependence
– Can an entity exist without some others?
(Simplified) Logic of classes
• primitive:
– entities: particulars versus universals
– relation inst such that:
• all classes are universals; all instances are
particulars
• some universals are not classes, hence have no
instances: pet, adult, physician
• some particulars are not instances; e.g. some
mereological sums
• subsumption defined in terms of instances:
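The transcript does not preserve the slide's definition itself. One instance-based formulation commonly used in this realist line of work is: A is_a B holds if, at every time t, every particular that instantiates A at t also instantiates B at t. The Python sketch below implements that formulation as an assumption, over a hypothetical `INST` relation of (particular, class, time) triples.

```python
# Hypothetical inst relation: (particular, class, time) triples.
INST = {
    ("cell-1", "epithelial cell", 1), ("cell-1", "cell", 1),
    ("cell-1", "epithelial cell", 2), ("cell-1", "cell", 2),
    ("cell-2", "cell", 1),
}


def instances(cls, t):
    """Particulars that instantiate cls at time t."""
    return {x for (x, c, time) in INST if c == cls and time == t}


def is_a(a, b):
    """A is_a B iff, at every recorded time, every instance of A is an instance of B."""
    times = {time for (_, _, time) in INST}
    return all(instances(a, t) <= instances(b, t) for t in times)


print(is_a("epithelial cell", "cell"), is_a("cell", "epithelial cell"))  # True False
```

Universals without instances (pet, adult, physician on the slide) and particulars that are not instances are deliberately left out of this sketch.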
Basic Formal Ontology
Basic Formal Ontology consists of a series
of sub-ontologies (most properly conceived as
a series of perspectives on reality), the most
important of which are:
– SnapBFO, a series of snapshot ontologies (O_ti),
indexed by times: continuants
– SpanBFO, a single videoscopic ontology (O_v):
occurrents.
Each O_ti is an inventory of all entities existing
at a time. O_v is an inventory (processory) of all
processes unfolding through time.
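The two perspectives can be pictured as two kinds of inventory. In this minimal Python sketch (all identifiers hypothetical), SNAP is a dictionary of snapshot inventories keyed by time, each listing the continuants existing at that time, and SPAN is a single inventory of processes with the interval over which each unfolds; discrete integer time points are a simplifying assumption.

```python
# Hypothetical SNAP inventories: continuants existing at each (discrete) time.
SNAPSHOTS = {
    1: {"patient-1", "tumour-1"},
    2: {"patient-1", "tumour-1"},
    3: {"patient-1"},  # tumour-1 no longer exists at time 3
}
# Hypothetical SPAN inventory: each process with the interval over which it unfolds.
PROCESSES = {"surgery-1": (2, 3), "chemotherapy-1": (3, 5)}


def existing_at(t):
    """SNAP view: the inventory of continuants indexed by time t."""
    return SNAPSHOTS.get(t, set())


def unfolding_at(t):
    """SPAN view: processes whose interval includes time t."""
    return {p for p, (start, end) in PROCESSES.items() if start <= t <= end}


print(sorted(existing_at(3)), sorted(unfolding_at(3)))
# ['patient-1'] ['chemotherapy-1', 'surgery-1']
```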
Occurrents and continuants
Picture by Vladimir Brajic
Levels of granularity in biomedical ontology
Granularity level | Continuants | Occurrents
Population | Environment | Screening
Person | Race, age, disease, symptom | ADL, working, treatment, prevention
Organ | Liver, lung, organ part, sign | Heart beat, digestion, surgery
Tissue | Elasticity, Turgor, Strength | Resorption, protection
Cell | Bone cell, Alveolar cell, Cell size, bacterium | Phagocytosis, Cell growth, Repair, hormone production
Subcellular | Cell membrane, Protein, DNA, Oncogene, Proto-oncogene, Virus, oncogenic molecule | Transcription, Splicing, Mutation, Gene regulation
Missed subsumption detection in SNOMED-CT
Missing: ISA neoplasm of heart
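A subsumption such as the missing one can be detected mechanically when concepts carry formal definitions. The Python sketch below is a simplified structural subsumption test for definitions of the form "genus and role some filler", in the spirit of (but much weaker than) the description-logic classifiers used for SNOMED CT. The concept names, the `finding-site` role and the stated ISA links are hypothetical examples, not SNOMED CT content.

```python
# Hypothetical stated ISA links (not SNOMED CT content).
ISA = {
    "myxoma": {"neoplasm"},
    "left atrium structure": {"heart structure"},
}
# Hypothetical definitions: (genus, {(role, filler), ...}).
DEFS = {
    "neoplasm of heart": ("neoplasm", {("finding-site", "heart structure")}),
    "left atrial myxoma": ("myxoma", {("finding-site", "left atrium structure")}),
}


def below(a, b):
    """True if a equals b or reaches b by following stated ISA links."""
    return a == b or any(below(parent, b) for parent in ISA.get(a, ()))


def is_subsumed(c, d):
    """Structural test: c is subsumed by d if c's genus is below d's genus and
    each (role, filler) of d is matched by a (role, filler) of c with the
    same role and a filler below d's filler."""
    genus_c, restrictions_c = DEFS[c]
    genus_d, restrictions_d = DEFS[d]
    if not below(genus_c, genus_d):
        return False
    return all(
        any(role_c == role_d and below(filler_c, filler_d)
            for role_c, filler_c in restrictions_c)
        for role_d, filler_d in restrictions_d
    )


print(is_subsumed("left atrial myxoma", "neoplasm of heart"))  # True
```

Neither definition states the ISA link to "neoplasm of heart"; it follows from the genus and the finding site.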
Correction of MGED’s ontology upper part
[Figure: fragment of the upper part of the MGED Ontology, linking MGEDOntology, MGEDCoreOntology, BioMaterialPackage, BioMaterialCharacteristics, OrganismPart, DiseaseLocation, Cancer Site, Primary site and Metastatic site through SubClassOf and InstanceOf relations and the property has_cancer_site (has-class one-of).]
Text shown in the figure:
• “The MGED Ontology is a top level container for the MGEDCoreOntology and the MGEDExtendedOntology. The MGED ontology describes microarray experiments and is split into the MGEDCoreOntology, which supports MAGE-OM v1.0 and is organized consistently with MAGE, and the MGEDExtendedOntology, which expands MAGE v1.0 and contains concepts and relationships which are not included in MAGE.”
• Metastatic site: “the organism part in which additional tumors are identified remote from the primary site”
• “Anatomical location(s) of disease.”
Text mining and classification
[Figure: a Generalised Possession pattern (Has-possessor, Has-possessed, Is-possessor-of, IS-A links) relating Human, Patient and Cancer patient to Healthcare phenomenon, Having a healthcare phenomenon, Has-Healthcare-phenomenon, Malignant neoplasm and lung carcinoma, applied to the sentence:]
Mr. Smith has a pulmonary carcinoma
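The classification in this figure can be approximated, very roughly, in Python: split the sentence into a possessor and a possessed phenomenon, normalise the term through a synonym table, climb the ISA hierarchy, and assign the possessor every class licensed along the way. The regular expression, the synonym table and the possessor rules are hypothetical stand-ins for real natural-language processing and a real ontology.

```python
import re

# Hypothetical lexicon and hierarchy echoing the figure's labels.
SYNONYMS = {"pulmonary carcinoma": "lung carcinoma"}
ISA = {
    "lung carcinoma": "malignant neoplasm",
    "malignant neoplasm": "healthcare phenomenon",
}
# Hypothetical rules: whoever possesses an instance of the key is classified under the value.
POSSESSOR_CLASS = {
    "malignant neoplasm": "cancer patient",
    "healthcare phenomenon": "patient",
}
# Crude pattern for sentences of the form "X has a Y" / "X has an Y".
PATTERN = re.compile(r"^(?P<possessor>.+?) has an? (?P<phenomenon>.+?)\.?$")


def self_and_ancestors(concept):
    """The concept followed by everything above it in the ISA hierarchy."""
    chain = [concept]
    while concept in ISA:
        concept = ISA[concept]
        chain.append(concept)
    return chain


def classify_possessor(sentence):
    """Return (possessor, classes) for 'X has a Y' sentences, else None."""
    match = PATTERN.match(sentence)
    if match is None:
        return None
    term = match.group("phenomenon").lower()
    concept = SYNONYMS.get(term, term)  # normalise synonyms to a preferred term
    classes = [POSSESSOR_CLASS[c] for c in self_and_ancestors(concept) if c in POSSESSOR_CLASS]
    return match.group("possessor"), classes


print(classify_possessor("Mr. Smith has a pulmonary carcinoma"))
# ('Mr. Smith', ['cancer patient', 'patient'])
```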
The near future: International Cancer Ontology Project
• Healthcare Informatics call in the EU's 6th Framework Programme (FP6)
• Applying realist ontology to:
– Connect relevant databases for combatting
cancer,
• covering all levels of granularity (from molecules to
entire patients) at a deep semantic level
• independent of the data format (text, structured,
coded, ...)
Knowledge discovery and use
Towards US-based “X”CORs
• BCOR: Buffalo Centre for Ontological
Research
• NCOR: National Centre for Ontological
Research
– Involving Stanford
• Introducing realist ontology (as a sound
analytical philosophical discipline) to
improve ontologies (as representations).