GO and OBO: - Gene Ontology


GO and OBO:
an introduction
• What is the Gene Ontology?
• What is OBO?
• OBO-Edit demo & practical
Jane Lomax EMBL-EBI
Gene Ontology
• Built for a very specific purpose:
“annotation of genes and proteins in
genomic and protein databases”
• Applicable to all species
Evolution of GO
• Original GO created in 2000
• Three databases involved:
– FlyBase (Drosophila)
– MGI (Mouse)
– SGD (S. cerevisiae)
• Used immediately
Evolution of GO
• Later databases:
– TAIR (Arabidopsis)
– TIGR (microbes including prokaryotes)
– SWISS-PROT (several thousand species inc. human)
– PSU (P. falciparum)
• Recent additions
– ZFIN (zebrafish)
– PAMGO (plant pathogens)
Evolution of GO
• GO development traditionally
annotation-driven
– development directed by use
• Terms added as new species annotated
• Terms added on an as-needed basis
Evolution of GO
• Developed by an international
consortium of biologists and computer
scientists
– members from individual databases
– central office at EBI
• Development involves collaboration with
domain experts from different biological
fields
– also formal ontologists
Evolution of GO
• Resulted in ‘organic’ structure, little
formality
• Ontological formality added
subsequently
– philosophical and logical
Growth of GO
[Figure: GO term history 2001 - 2007. Number of terms (0 to 30,000) plotted against date (Jan 2001 to Jan 2007), with series for defined terms, undefined terms and obsolete terms.]
How does GO work?
What information might we want to
capture about a gene product?
• What does the gene product do?
• Where and when does it act?
• Why does it perform these activities?
GO structure
• GO terms divided into three parts:
– cellular component
– molecular function
– biological process
Cellular Component
• where a gene product acts
Cellular Component
• Enzyme complexes in the component
ontology refer to places, not activities.
Molecular Function
• activities or “jobs” of a gene product
glucose-6-phosphate isomerase activity
Molecular Function
insulin binding
insulin receptor activity
Molecular Function
drug transporter activity
Molecular Function
• A gene product may have several
functions; a function term refers to
a single reaction or activity, not a
gene product.
• Sets of functions make up a
biological process.
Biological Process
a commonly recognized series of
events
cell division
Biological Process
transcription
Biological Process
regulation of gluconeogenesis
Biological Process
limb development
Biological Process
courtship behavior
Ontology Structure
• Terms are linked by two relationships
– is-a
– part-of


Ontology Structure
[Diagram: the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relationships]
Ontology Structure
• Ontologies are structured as a
hierarchical directed acyclic graph
(DAG)
• Terms can have more than one
parent and zero, one or more
children
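The multiple-parent idea can be sketched in a few lines of Python. This is a minimal illustration, not how GO itself is stored: the edge list below is an assumption built from the component terms used in these slides, and the helper names build_parents and ancestors are hypothetical.

```python
from collections import defaultdict

# Assumed edges for illustration only: (child, relationship, parent).
EDGES = [
    ("membrane", "part_of", "cell"),
    ("chloroplast", "part_of", "cell"),
    ("chloroplast membrane", "is_a", "membrane"),
    ("chloroplast membrane", "part_of", "chloroplast"),
]


def build_parents(edges):
    """Map each child term to the set of (relationship, parent) pairs."""
    parents = defaultdict(set)
    for child, rel, parent in edges:
        parents[child].add((rel, parent))
    return parents


def ancestors(term, parents):
    """Collect every term reachable by following parent links upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _rel, parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


parents = build_parents(EDGES)
# "chloroplast membrane" has two direct parents, which a strict tree would not allow.
print(sorted(ancestors("chloroplast membrane", parents)))
# prints ['cell', 'chloroplast', 'membrane']
```

Because a term can be reached by more than one path, the traversal tracks visited terms so that shared ancestors such as "cell" are only counted once.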
Ontology Structure
[Diagram: the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane drawn as a Directed Acyclic Graph (DAG) - multiple parentage allowed]
Open Biomedical Ontologies (OBO)
• GO is a member of OBO
• An umbrella project for grouping
different ontologies in
biological/medical field
– a repository for ontologies with defined
set of standards
• Available from a single source:
http://obo.sourceforge.net/
Why do we need OBO?
• GO covers small area of biology:
– molecular function of a protein
– biological function of a protein
– cellular location of a protein
Why do we need OBO?
• Lots of other aspects that also need
to be captured, e.g.:
– phenotype
– anatomy
– genomic
– taxonomy
Why do we need OBO?
• Many groups develop their own ontologies
– e.g. plant ontology, anatomies for specific organisms
• No standardisation of ontologies with respect to:
– format
– scope
– relationships
• No way of knowing whether such ontologies
already exist
• No mechanism of distribution for other groups
Why do we need OBO?
• Creating ontologies takes a lot of
work
– Makes sense to reuse existing
ontologies where possible
• Improves data integration where
small set of ontologies used
• Allows ontologies to be made
available from a single place
Why do we need OBO?
• Ultimate aim: a complete set of
integrated ontologies completely
covering the biomedical domain
OBO requirements
To be part of OBO, ontologies must:
• Be open, can be used by all without any
constraint
OBO requirements: open
• Ontologies can be used by anyone
without any constraints, except:
– original authors are acknowledged
– cannot be edited and then released
under same name
OBO requirements
To be part of OBO, ontologies must:
• Be open, can be used by all without any
constraint
• Be in a common shared syntax
OBO requirements: syntax
• Usually the OBO format, same as primary
GO format
– and adaptions of OBO format
• Also accept OWL (Web Ontology
Language) format
• Allows the same tools to be applied,
facilitating shared software
implementations
Anatomy of an OBO term
id: GO:0006094    [unique ID]
name: gluconeogenesis    [term name]
namespace: process    [ontology]
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html]    [definition]
exact_synonym: glucose biosynthesis    [synonym]
xref_analog: MetaCyc:GLUCONEO-PWY    [database ref]
is_a: GO:0006006    [parentage]
is_a: GO:0006092    [parentage]
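Since each line of the term is a "tag: value" pair, reading one into a program is straightforward. The sketch below is a simplified illustration, not a full OBO parser: it assumes the stanza looks exactly like the slide (no quoting, escapes, comments or trailing modifiers), and parse_stanza is a hypothetical helper name.

```python
from collections import defaultdict

# The term from the slide, with the explanatory labels left out.
STANZA = """\
id: GO:0006094
name: gluconeogenesis
namespace: process
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html]
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092
"""


def parse_stanza(text):
    """Return a dict mapping each tag to the list of its values, in order."""
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ": " only, so values containing colons stay whole.
        tag, sep, value = line.partition(": ")
        if not sep:
            continue  # not a tag-value line; skipped in this simplified sketch
        fields[tag].append(value)
    return dict(fields)


term = parse_stanza(STANZA)
print(term["id"][0], term["name"][0])
print(term["is_a"])
```

Values are kept as lists because some tags, such as is_a, can legitimately appear more than once in a single term.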
OBO requirements
To be part of OBO, ontologies must:
• Be open, can be used by all without any
constraint
• Be in a common shared syntax
• Not overlap with other ontologies in OBO
OBO requirements:
overlapping
• Ontologies can (and should)
overlap partially, but large overlap
should be avoided
• Idea is that terms from different
ontologies can be combined to
form new terms
• Striving for accepted standards
rather than competition
OBO requirements
To be part of OBO, ontologies must:
• Be open, can be used by all without any
constraint
• Be in a common shared syntax
• Not overlap with other ontologies in OBO
• Share a unique identifier space
OBO requirements: id
space
• So, for example, the GO identifier is
“GO”:
– No other OBO ontology could use this
id space
• Prevents problems where multiple
ontologies are used together
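One way to picture the id-space rule is a check that no prefix is claimed by two ontologies. The sketch below is illustrative only: apart from the GO identifiers taken from these slides, the ontology names, the "ANAT" prefix and the identifiers are made up, and split_id and find_id_space_clashes are hypothetical helper names.

```python
def split_id(term_id):
    """Split 'GO:0006094' into its id space ('GO') and local part ('0006094')."""
    prefix, sep, local = term_id.partition(":")
    if not sep or not prefix or not local:
        raise ValueError(f"not a prefixed identifier: {term_id!r}")
    return prefix, local


def find_id_space_clashes(ontologies):
    """Return the id spaces used by more than one ontology.

    `ontologies` maps an ontology name to an iterable of its term ids.
    """
    owners = {}
    for name, ids in ontologies.items():
        for term_id in ids:
            prefix, _local = split_id(term_id)
            owners.setdefault(prefix, set()).add(name)
    return {prefix: sorted(names) for prefix, names in owners.items() if len(names) > 1}


# Hypothetical input: only the two GO ids in the first entry come from the slides.
ONTOLOGIES = {
    "gene_ontology": ["GO:0006094", "GO:0006006"],
    "hypothetical_anatomy": ["ANAT:0000001"],
    "hypothetical_clone": ["GO:9999999"],
}

print(find_id_space_clashes(ONTOLOGIES))
# prints {'GO': ['gene_ontology', 'hypothetical_clone']}
```

With unique id spaces, an identifier on its own is enough to tell which ontology a term came from when several are loaded together.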
OBO requirements
To be part of OBO, ontologies must:
• Be open, can be used by all without any
constraint
• Be in a common shared syntax
• Not overlap with other ontologies in OBO
• Share a unique identifier space
• Include text definitions of their terms
OBO requirements
• In addition, OBO includes ontology
of relationships
– all ontologies should use these
definitions of relationships
• For example
– part_of
– develops_from
– regulates
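A shared set of relationships makes it possible to check an ontology mechanically. The following is a rough sketch under stated assumptions: the allowed set contains only the relationships named in these slides (with is_a added from the earlier structure slides), not the full OBO relationship ontology, and the edges and the unknown_relations helper are hypothetical.

```python
# Assumed set: the relationship names mentioned in these slides only.
SHARED_RELATIONS = {"is_a", "part_of", "develops_from", "regulates"}


def unknown_relations(edges, allowed=SHARED_RELATIONS):
    """Return the relationship names used in `edges` that are not in `allowed`."""
    return sorted({rel for _child, rel, _parent in edges if rel not in allowed})


# Hypothetical edges: (child, relationship, parent).
EDGES = [
    ("chloroplast membrane", "part_of", "chloroplast"),
    ("regulation of gluconeogenesis", "regulates", "gluconeogenesis"),
    ("term A", "made_up_relation", "term B"),
]

print(unknown_relations(EDGES))
# prints ['made_up_relation']
```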
What’s available
• demo:
http://obo.sourceforge.net/
Editing ontologies
• GO is edited using OBO-Edit
– stand-alone Java application
– available for all platforms
– browse, create or edit any ontology in
OBO format
OBO-Edit demo
• Browsing ontologies
– loading ontologies (including loading multiple ontologies)
– graph viewer
– reasoner/single relationship views
– searching/filtering/rendering
– help
• Creating/editing ontologies
– creating a new ontology
– adding terms
– copying/moving/deleting terms
– adding definitions, dbxrefs etc
– verification plugin
– saving ontologies