Biological Ontologies - Bowling Green State University

Biological Ontologies
Neocles Leontis
April 20, 2005
What Is An Ontology?
• An ontology is an explicit description of a domain
of knowledge:
– Concepts -- Entities and Relations
– Properties and attributes of Entities and Relations
– Constraints on properties and attributes
– Individuals (“Instances”)
• An ontology defines:
– a common vocabulary
– a shared understanding of the domain of knowledge
– Commitments on how to use the vocabulary
What Is “Ontology Engineering”?
Ontology Engineering: Defining terms in the
domain and relations among them
– Defining concepts in the domain (Classes)
– Arranging the concepts in a hierarchy
(Subclass-Superclass hierarchy)
– Defining which attributes and properties
classes can have (slots) and the constraints on their
values (facets)
– Defining individuals and filling in slot values
(instantiation)
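The four activities above can be sketched in a few lines of Python. This is a minimal illustration only, not Protégé's data model: the names `Slot`, `OntClass`, and `make_instance`, and the wine values, are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class Slot:
    """An attribute a class can have, with facets constraining its values."""
    name: str
    value_type: type                            # facet: required value type
    allowed: Optional[Tuple[Any, ...]] = None   # facet: allowed values, if restricted


@dataclass
class OntClass:
    """A concept in the domain, placed in a subclass-superclass hierarchy."""
    name: str
    parent: Optional["OntClass"] = None
    slots: Dict[str, Slot] = field(default_factory=dict)

    def all_slots(self) -> Dict[str, Slot]:
        """Slots defined on this class plus those inherited from superclasses."""
        inherited = self.parent.all_slots() if self.parent else {}
        return {**inherited, **self.slots}


def make_instance(cls: OntClass, **values: Any) -> Dict[str, Any]:
    """Instantiate a class by filling in slot values, checking each facet."""
    slots = cls.all_slots()
    for key, value in values.items():
        if key not in slots:
            raise ValueError(f"{cls.name} has no slot {key!r}")
        slot = slots[key]
        if not isinstance(value, slot.value_type):
            raise TypeError(f"{key} must be of type {slot.value_type.__name__}")
        if slot.allowed is not None and value not in slot.allowed:
            raise ValueError(f"{key} must be one of {slot.allowed}")
    return {"class": cls.name, **values}


# Hypothetical example: classes, a hierarchy, one slot with a facet, one instance.
wine = OntClass("Wine", slots={"color": Slot("color", str, ("red", "white", "rosé"))})
red_wine = OntClass("RedWine", parent=wine)
glass = make_instance(red_wine, color="red")
```

`RedWine` defines no slots of its own; it inherits `color` from `Wine`, and a value outside the facet (for example `color="blue"`) is rejected at instantiation.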
Why Develop an Ontology?
• To share common understanding of
the structure of information
– among people
– among software agents
• To enable reuse of domain knowledge
– to avoid “re-inventing the wheel”
– to introduce standards to allow
interoperability between ontologies
More Reasons…
• To make domain assumptions explicit
– easier to change domain assumptions
– easier to understand and update legacy
data
• To separate domain knowledge from
the operational knowledge
– re-use domain and operational knowledge
separately
An Ontology Is Often Just the
Beginning
[Diagram: ontologies at the center, linked to databases, knowledge bases, and problem-solving methods; arrow labels: "Declare structure" and "Provide domain description"]
Ontology-Development
Process
In logical order:
determine scope → consider reuse → enumerate terms → define classes → define properties → define constraints → create instances
In reality - an iterative process:
[Diagram: the same seven steps revisited repeatedly and out of order]
Protégé
• Graphical ontology-development tool
• Supports a rich knowledge model
• Open-source and freely available
(http://protege.stanford.edu)
Authoring Program (Protégé 2000)
• Enforces the implementation of foundational
principles and definitional desiderata
• Frame-based architecture compatible with the OKBC
(Open Knowledge Base Connectivity) protocol
• Frames are used to represent anatomical concepts
• Frames allow for distinguishing between class and
instance
• Protégé allows for selective inheritance of
attributes
• Protégé enhances specificity and expressivity of
attributes by assigning them their own attributes.
Determine Domain and Scope
• What is the domain that the ontology will
cover?
• Who is going to use the ontology?
• For what are they (we) going to use the
ontology?
• For what types of questions should the
information in the ontology provide answers
(competency questions)?
Answers to these questions may change during the
lifecycle
RNA Ontology Scope:
DOMAIN
– RNA Sequences (1D) -- Coding and Non-Coding
– RNA 2D structures
– RNA 3D structures
– Alignments of homologous RNA sequences
– Relationships between alignments and 3D structures
RNA Ontology Scope:
WHO?
– Molecular biologists & biochemists
– Structural biologists
– Evolutionary biologists
– Nanotechnologists
RNA Ontology Scope:
WHAT?
– How to improve prediction of RNA 3D structure
– How to improve sequence alignments of
homologous RNAs
– How to identify and annotate RNA genes in genomes
– How are RNA 3D structure and evolution
coupled?
– How is RNA evolution coupled to biological
evolution?
Consider Reuse
• Why reuse other ontologies?
– to save the effort
– to interact with the tools that use other
ontologies
– to use ontologies that have been
validated through use in applications
Enumerate Important Terms
• What are the terms (entities) we
need to talk about?
• What are the properties and
attributes of these entities?
• What are the relationships between
entities?
Define Classes and the Class
Hierarchy
• A class is a concept in the domain
– a class of wines
– a class of wineries
– a class of red wines
• A class is a collection of elements with similar
properties
• Instances of classes
– a glass of California wine you’ll have for lunch
Class Hierarchy
Class Inheritance
• Classes usually constitute a taxonomic hierarchy (a
subclass-superclass hierarchy)
• A class hierarchy is usually an IS-A hierarchy:
an instance of a subclass is an instance of
a superclass
• If you think of a class as a set of elements, a
subclass is a subset
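The IS-A and subset readings above can be checked with a small sketch. The hierarchy and instance names below are hypothetical, chosen to match the wine examples; `is_a` and `instances_of` are illustrative helpers, not part of any ontology tool.

```python
from typing import Dict, Optional, Set

# Hypothetical hierarchy: class -> direct superclass (None for the root).
SUPERCLASS: Dict[str, Optional[str]] = {
    "Wine": None,
    "RedWine": "Wine",
    "WhiteWine": "Wine",
}

# Hypothetical instances, each asserted under its most specific class.
DIRECT_INSTANCES: Dict[str, Set[str]] = {
    "Wine": set(),
    "RedWine": {"glass_of_california_red"},
    "WhiteWine": {"glass_of_chardonnay"},
}


def is_a(cls: str, ancestor: str) -> bool:
    """True if cls is ancestor itself or a (transitive) subclass of it."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = SUPERCLASS[current]
    return False


def instances_of(cls: str) -> Set[str]:
    """All instances of cls, including the instances of its subclasses."""
    result: Set[str] = set()
    for name, members in DIRECT_INSTANCES.items():
        if is_a(name, cls):
            result |= members
    return result
```

Under this sketch, `instances_of("RedWine")` is always a subset of `instances_of("Wine")`, which is the set-based reading of the subclass relation.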
FMA -- High Level Scheme
• FMA = (AT, ASA, ATA, Mk)
– AT = Anatomy taxonomy (assigns anatomical
entities as class concepts)
– ASA = Anatomical Structural Abstraction -- includes structural relationships among entities
of the AT
– ATA = Anatomical Transformation Abstraction
-- relationships that describe morphological &
physical transformations of anatomical entities
– Mk = Metaknowledge -- principles and sets of
rules
ASA -- High Level Scheme
• ASA = (Dt, PPt, Bn, Pn, SAn)
– Dt = Dimensional taxonomy
– PPt = Physical Properties taxonomy
– Bn = Boundary network
– Pn = Partonomy network
– SAn = Spatial Association network
Boundary Network (Bn)
• Specification of boundaries is critical for
segmentation of images and volumetric
datasets
• Definition: Boundary = Non-material
physical anatomical entity of two
or fewer dimensions that delimits
anatomical entities that are of
one higher dimension than the
bounding entity
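The dimensional part of this definition can be written as a simple check. This is a sketch under stated assumptions: `AnatomicalEntity` and `can_bound` are names introduced here, the two entities are illustrative, and the "non-material" requirement is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnatomicalEntity:
    name: str
    dimension: int  # 0 = point, 1 = line, 2 = surface, 3 = volume


def can_bound(boundary: AnatomicalEntity, bounded: AnatomicalEntity) -> bool:
    """Check only the dimensional constraints of the boundary definition.

    A boundary has two or fewer dimensions and delimits an entity of
    exactly one higher dimension than itself.
    """
    return boundary.dimension <= 2 and bounded.dimension == boundary.dimension + 1


# Illustrative entities (assumed pairing, for demonstration only).
surface = AnatomicalEntity("Internal surface of stomach", 2)
cavity = AnatomicalEntity("Cavity of stomach", 3)
```

Here `can_bound(surface, cavity)` holds, while the reverse direction, or a surface bounding another surface, does not.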
Boundary Network (Bn)
Inverse Relationships:
-bounded by- / -bounds-
Real vs. Virtual Boundaries: Real boundaries
correspond to the surface of an anatomical entity and designate
discontinuities between constitutional
parts of anatomical entities
Partonomy Network (Pn)
Inverse Relationships:
-has part- / -part of-
Rule of Dimensional
Consistency
Distinguishes between boundary and partonomy
relationships.
Parthood relations -- only allowed for entities of the
same dimension
Ex: Cavity of stomach (3D) -has part- Cavity of pyloric antrum (3D)
Ex: Internal surface of stomach (2D) -has part- Internal surface of pyloric antrum (2D)
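A minimal sketch of how a partonomy network might enforce this rule, using the two examples above. The class `PartonomyNetwork` and its methods are hypothetical names, not FMA or Protégé APIs; the inverse direction is exposed here as `wholes_of`.

```python
from typing import Dict, List, Set, Tuple

# Dimensions of the entities named in the two examples above.
DIMENSION: Dict[str, int] = {
    "Cavity of stomach": 3,
    "Cavity of pyloric antrum": 3,
    "Internal surface of stomach": 2,
    "Internal surface of pyloric antrum": 2,
}


class PartonomyNetwork:
    """Stores -has part- links and derives the inverse direction on demand."""

    def __init__(self) -> None:
        self._has_part: Set[Tuple[str, str]] = set()

    def add_has_part(self, whole: str, part: str) -> None:
        # Rule of dimensional consistency: whole and part share a dimension.
        if DIMENSION[whole] != DIMENSION[part]:
            raise ValueError(
                f"{whole} ({DIMENSION[whole]}D) cannot have part "
                f"{part} ({DIMENSION[part]}D)"
            )
        self._has_part.add((whole, part))

    def parts_of(self, whole: str) -> List[str]:
        return sorted(p for w, p in self._has_part if w == whole)

    def wholes_of(self, part: str) -> List[str]:
        # Inverse relationship, read off the same stored links.
        return sorted(w for w, p in self._has_part if p == part)


network = PartonomyNetwork()
network.add_has_part("Cavity of stomach", "Cavity of pyloric antrum")
network.add_has_part("Internal surface of stomach", "Internal surface of pyloric antrum")
```

Storing each link once and deriving the inverse keeps the two directions from drifting apart; a 3D-to-2D link such as cavity to surface is rejected and would belong in the boundary network instead.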
What to Reuse?
• Ontology libraries
– DAML ontology library (www.daml.org/ontologies)
– Ontolingua ontology library
(www.ksl.stanford.edu/software/ontolingua/)
– Protégé ontology library
(protege.stanford.edu/plugins.html)
• Upper ontologies
– IEEE Standard Upper Ontology (suo.ieee.org)
– Cyc (www.cyc.com)
RNA Ontology Consortium
• To share common understanding of the
structure of information
– among people
– among software agents
• To enable reuse of domain knowledge
– to avoid “re-inventing the wheel”
– to introduce standards to allow interoperability
What to Reuse? (II)
• General ontologies
– DMOZ (www.dmoz.org)
– WordNet (www.cogsci.princeton.edu/~wn/)
• Domain-specific ontologies
– UMLS Semantic Net
– GO (Gene Ontology) (www.geneontology.org)
– FMA (Foundational Model of Anatomy)
Foundational Model of
Anatomy
http://sig.biostr.washington.edu/projects/fm/index.html
• Reference ontology for biomedical
informatics
• Representation of Anatomical Entities and
Relationships
• Symbolic modeling of the structure of the
human body at the highest level of granularity
• Evolving Resource for knowledge-based
applications requiring anatomical information
FMA: Modeling Challenges
• Representing complex structural relations
• Representing different levels of granularity
• Developing a model that is scalable to a very
large number of concepts
• Using consistent organizational principles