Transcript Document

Part II GO-Vocabulary
of Genome
S. cerevisiae
D. melanogaster
C. elegans
Cells that normally survive: CED-9 ON; CED-3 and CED-4 OFF
Cells that normally die: CED-9 OFF; CED-3 and CED-4 ON
M. musculus
Comparison of sequences
from 4 organisms
MCM3
MCM2
CDC46/MCM5
CDC47/MCM7
CDC54/MCM4
MCM6
These proteins form a hexamer in the species that have been examined
The Gene Ontologies
A Common Language for Annotation of
Genes from
Yeast, Flies and Mice
…and Plants and Worms
…and Humans
…and anything else!
Gene Ontology - 1998
FlyBase
Drosophila
Cambridge, EBI, Harvard
Berkeley & Bloomington.
SGD
Saccharomyces
Stanford.
MGI
Mus
Jackson Labs., Bar Harbor.
Gene Ontology - now
• Fruitfly - FlyBase
• Budding yeast - Saccharomyces Genome Database (SGD)
• Mouse - Mouse Genome Database (MGD & GXD)
• Rat - Rat Genome Database (RGD)
• Weed - The Arabidopsis Information Resource (TAIR)
• Worm - WormBase
• Dictyostelium discoideum - dictyBase
• InterPro/UniProt at EBI - InterPro
• Fission yeast - Pombase
• Human - UniProt, Ensembl, NCBI, Incyte, Celera, Compugen
• Parasites - Plasmodium, Trypanosoma, Leishmania - GeneDB Sanger
• Microbes - Vibrio, Shewanella, B. anthracis, … - TIGR
• Grasses - rice & maize - Gramene database
• Zebrafish - ZFIN
.........
To provide
structured controlled vocabularies
for the
representation of biological knowledge
in
biological databases.
• Be open source
• Use open standards
• Make data & code available
without constraint
• Involve your community
Outline
• Introduction to the Gene Ontologies (GO)
• Annotations to GO terms
• GO Tools
• Applications of GO
Gene Ontology Objectives
• GO represents concepts used to classify
specific parts of our biological knowledge:
– Biological Process
– Molecular Function
– Cellular Component
• GO develops a common language applicable
to any organism
• GO terms can be used to annotate gene
products from any species, allowing
comparison of information across species
GO: Three ontologies
What does it do?
Molecular Function
What processes is it
involved in?
Biological Process
Where does it act?
Cellular Component
gene product
Example:
Gene Product = hammer
Function (what)             Process (why)
Drive nail (into wood)      Carpentry
Drive stake (into soil)     Gardening
Smash roach                 Pest Control
Clown’s juggling object     Entertainment
Biological Examples
Biological Process
Molecular Function
Cellular Component
The 3 Gene Ontologies
• Molecular Function = elemental activity/task
– the tasks performed by individual gene products; examples are
carbohydrate binding and ATPase activity
• Biological Process = biological goal or
objective
– broad biological goals, such as mitosis or purine metabolism, that are
accomplished by ordered assemblies of molecular functions
• Cellular Component = location or complex
– subcellular structures, locations, and macromolecular complexes; examples
include nucleus, telomere, and RNA polymerase II holoenzyme
Molecular Function
• A single reaction or activity, not a gene
product
• A gene product may have several functions
• Sets of functions make up a biological
process
Molecular Function
Carbonate dehydratase activity
Biological Process
Gluconeogenesis
Cellular Component
• where a gene product acts
Mitochondrial membrane
What’s in a GO term?
term: gluconeogenesis
id: GO:0006094
definition: The formation of glucose from
noncarbohydrate precursors, such as
pyruvate, amino acids and glycerol.
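A minimal sketch of the three fields above as a record. The class name `GOTerm`, its field names, and the "GO: plus seven digits" check are assumptions made for illustration, not part of any GO software.

```python
import re
from dataclasses import dataclass

# Assumed identifier shape: "GO:" followed by seven digits, as in GO:0006094.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class GOTerm:
    """Hypothetical record holding the term, id and definition from the slide."""
    name: str
    go_id: str
    definition: str

    def __post_init__(self):
        # Reject identifiers that do not match the assumed pattern.
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"not a GO identifier: {self.go_id!r}")


gluconeogenesis = GOTerm(
    name="gluconeogenesis",
    go_id="GO:0006094",
    definition=(
        "The formation of glucose from noncarbohydrate precursors, "
        "such as pyruvate, amino acids and glycerol."
    ),
)
print(gluconeogenesis.go_id, gluconeogenesis.name)
```

The identifier, not the name, is what stays stable when wording changes, which is why the sketch validates only the id.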
Content of GO
Molecular Function: 7,309 terms
Biological Process: 10,041 terms
Cellular Component: 1,629 terms
Total: 18,975 terms
Definitions: 94.9 %
Obsolete terms: 992
As of October 2005
What’s in a name?
• Glucose synthesis
• Glucose biosynthesis
• Glucose formation
• Glucose anabolism
• Gluconeogenesis
• All refer to the process of making glucose
from simpler components
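One way to picture this is a lookup that sends every name to a single term id. The sketch below assumes all five names resolve to GO:0006094 (the gluconeogenesis id given earlier); the `resolve` helper and its normalisation rule are hypothetical.

```python
from typing import Optional

# Names from the slide, all assumed to point at the same term id.
SYNONYMS = {
    "glucose synthesis": "GO:0006094",
    "glucose biosynthesis": "GO:0006094",
    "glucose formation": "GO:0006094",
    "glucose anabolism": "GO:0006094",
    "gluconeogenesis": "GO:0006094",
}


def resolve(name: str) -> Optional[str]:
    """Return the term id for a name, ignoring case and extra spaces."""
    key = " ".join(name.lower().split())
    return SYNONYMS.get(key)


print(resolve("Glucose  Anabolism"))
print(resolve("glycolysis"))
```

Two databases that use different wording still end up comparable, because both annotations carry the same id.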
tree
directed acyclic graph
Parent-Child Relationships
A child is a subset of a parent’s elements
The cell component term Nucleus has 5 children:
• Nucleoplasm
• Nuclear envelope
• Nucleolus
• Chromosome
• Perinuclear space
Ontology Relationships
Directed Acyclic Graph
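A small sketch of why a directed acyclic graph is used instead of a tree: a child may have more than one parent. The five children of nucleus come from the slide; the "cell" root and the second parent "endomembrane system" are assumptions added only to show a term with two parents.

```python
# child -> set of parents. A tree would allow only one parent per child.
PARENTS = {
    "nucleoplasm": {"nucleus"},
    "nuclear envelope": {"nucleus", "endomembrane system"},  # second parent is illustrative
    "nucleolus": {"nucleus"},
    "chromosome": {"nucleus"},
    "perinuclear space": {"nucleus"},
    "nucleus": {"cell"},              # assumed root, for illustration
    "endomembrane system": {"cell"},  # assumed, for illustration
    "cell": set(),
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def children(term, parents=PARENTS):
    """List the direct children of a term, sorted by name."""
    return sorted(child for child, ps in parents.items() if term in ps)


print(children("nucleus"))
print(sorted(ancestors("nuclear envelope")))
```

Because a child is a subset of its parent, a gene product annotated to a child term is implicitly annotated to every ancestor returned by `ancestors`.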
Evidence Codes for GO Annotations
http://www.geneontology.org/doc/GO.evidence.html
IEA   Inferred from Electronic Annotation
ISS   Inferred from Sequence or Structural Similarity
IEP   Inferred from Expression Pattern
IMP   Inferred from Mutant Phenotype
IGI   Inferred from Genetic Interaction
IPI   Inferred from Physical Interaction
IDA   Inferred from Direct Assay
RCA   Inferred from Reviewed Computational Analysis
TAS   Traceable Author Statement
NAS   Non-traceable Author Statement
IC    Inferred by Curator
ND    No biological Data available
IEA
Inferred from Electronic Annotation
• Sequence Similarity (BLAST)
• Automatic transfer from mappings
(InterPro2GO, EC2GO etc.)
-> Not manually reviewed
ISS
Inferred from Sequence or Structural
Similarity
• Sequence similarity
• Recognized domains
• Structural similarity
-> Use of ‘with’ column recommended
IEP
Inferred from Expression Pattern
• Transcript levels (Northerns, microarrays)
• Protein levels (Western blots)
-> Timing or localization of expression
-> Biological process annotations
IMP
Inferred from Mutant Phenotype
• Gene mutation/knockout
• Overexpression/ectopic expression
• Anti-sense experiments
• RNAi experiments
• Specific protein inhibitors
IGI
Inferred from Genetic Interaction
• Suppressors, synthetic lethals…
• Functional complementation
• Rescue experiments
-> Use of ‘with’ column recommended
IPI
Inferred from Physical Interaction
• 2-hybrid interactions
• Co-purification
• Co-immunoprecipitation
• Ion/complex/protein binding experiments
-> Use of ‘with’ column recommended
IDA
Inferred from Direct Assay
• Enzyme assays
• In vitro reconstitution (e.g. transcription)
• Immunofluorescence (for cell. comp.)
• Cell fractionation (for cell. comp.)
• Physical interaction/binding assay
RCA
Inferred from Reviewed Computational
Analysis
• Non-sequence-based computational methods
• Genome-wide analyses (e.g. 2-hybrid)
• Combinations of large-scale experiments
TAS
Traceable Author Statement
• Support from review article
• Textbook ‘common knowledge’
-> Data that can be ‘traced’ back
NAS
Non-traceable Author Statement
• Database entries that don't cite a paper
-> Data that cannot be ‘traced’ back
IC
Inferred by Curator
• Not supported by any direct evidence
• Inferred from other GO annotations
-> GO term in ‘with/from’ column required
ND
No biological Data available
Curator found no information supporting any annotation
• molecular function unknown GO:0005554
• biological process unknown GO:0000004
• cellular component unknown GO:0008372
Term Hierarchy
TAS/IDA
IMP/IGI/IPI
ISS/IEP
NAS
IEA
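The ranking above can be sketched as a lookup that picks the best-supported code for a gene product. The tiers are copied from the slide; placing codes the slide does not rank (RCA, IC, ND) after all ranked ones is an assumption, and `strongest` is a hypothetical helper.

```python
# Tiers as listed on the slide, strongest first.
TIERS = [
    ("TAS", "IDA"),
    ("IMP", "IGI", "IPI"),
    ("ISS", "IEP"),
    ("NAS",),
    ("IEA",),
]
RANK = {code: tier for tier, codes in enumerate(TIERS) for code in codes}

# Assumption: codes absent from the slide's ranking sort after all ranked codes.
UNRANKED = len(TIERS)


def strongest(codes):
    """Return the evidence code in the highest tier (first one wins on ties)."""
    return min(codes, key=lambda code: RANK.get(code, UNRANKED))


print(strongest(["IEA", "ISS", "IMP"]))
print(strongest(["NAS", "RCA"]))
```

A filter such as "drop everything in the IEA tier" follows the same pattern: compare `RANK[code]` against a chosen cutoff.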
Annotation summaries
Meloidogyne incognita: McCarter et al. 2003
Annotation of gene products
with GO terms
Mitochondrial P450
Cellular component:
mitochondrial inner membrane
GO:0005743
Biological process:
Electron transport
GO:0006118
substrate + O2 = CO2 + H2O product
Molecular function:
monooxygenase activity
GO:0004497
Other gene products annotated to
monooxygenase activity (GO:0004497)
- monooxygenase, DBH-like 1 (mouse)
- prostaglandin I2 (prostacyclin) synthase (mouse)
- flavin-containing monooxygenase (yeast)
- ferulate-5-hydroxylase 1 (Arabidopsis)
Unknown vs. Unannotated
• “Unknown” is used when the curator has
determined that there is no existing literature
to support an annotation.
– Biological process unknown GO:0000004
– Molecular function unknown GO:0005554
– Cellular component unknown GO:0008372
• NOT the same as having no annotation at all
– No annotation means that no one has looked yet
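The distinction can be made explicit in code. This is a sketch only: the three "unknown" ids are the ones listed above, while the `curation_status` helper and its three labels are hypothetical. It also collapses the three ontologies into one answer, whereas a real gene product can be "unknown" in one ontology and annotated in another.

```python
# The three "unknown" terms listed on the slide.
UNKNOWN_TERMS = {
    "GO:0000004": "biological process unknown",
    "GO:0005554": "molecular function unknown",
    "GO:0008372": "cellular component unknown",
}


def curation_status(go_ids):
    """Classify a gene product by the GO ids it has been annotated to."""
    if not go_ids:
        return "unannotated"  # no one has looked yet
    if all(go_id in UNKNOWN_TERMS for go_id in go_ids):
        return "unknown"      # a curator looked and found no literature
    return "annotated"


print(curation_status([]))
print(curation_status(["GO:0005554"]))
print(curation_status(["GO:0005554", "GO:0005743"]))
```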
Annotation of a genome
• GO annotations are always a work in progress
• Part of normal curation process
– More specific information
– Better evidence code
• Replace obsolete terms
• “Last reviewed” date
How to access the Gene Ontology
and its annotations
1. Downloads
• Ontologies
• Annotations : Gene association files
• Ontologies and Annotations
2. Web-based access
• AmiGO
(http://www.godatabase.org)
• QuickGO
(http://www.ebi.ac.uk/ego)
among others…
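A downloaded gene association file can be read with a few lines of code. The sketch below assumes the tab-delimited, 15-column layout with comment lines starting with "!"; the column names are descriptive labels chosen here, and the sample records (database "EX", symbol "p450x", reference "EX_REF:1") are hypothetical, reusing only the three GO ids from the Mitochondrial P450 example.

```python
import csv
import io

# Descriptive labels for the 15 assumed columns of a gene association file.
COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence", "with_from", "aspect", "db_object_name",
    "synonym", "db_object_type", "taxon", "date", "assigned_by",
]


def make_line(go_id, evidence, aspect):
    """Build one hypothetical record; every non-GO value is a placeholder."""
    return "\t".join([
        "EX", "EX:0001", "p450x", "", go_id, "EX_REF:1", evidence, "",
        aspect, "", "", "protein", "taxon:0", "20051001", "EX",
    ])


SAMPLE = "\n".join([
    "! hypothetical records for illustration only",
    make_line("GO:0004497", "IDA", "F"),
    make_line("GO:0006118", "TAS", "P"),
    make_line("GO:0005743", "IDA", "C"),
])


def read_associations(handle):
    """Yield one dict per annotation line, skipping '!' comments and blanks."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("!"):
            continue
        yield dict(zip(COLUMNS, row))


records = list(read_associations(io.StringIO(SAMPLE)))
for record in records:
    print(record["db_object_symbol"], record["aspect"], record["go_id"], record["evidence"])
```

For a real download, pass an open file handle instead of the `io.StringIO` wrapper.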
Gene Ontology:
…analysis of high-throughput data according to GO
MicroArray data analysis
time
Defense response
Immune response
Response to stimulus
Toll regulated genes
JAK-STAT regulated genes
Puparial adhesion
Molting cycle
hemocyanin
Amino acid catabolism
Lipid metabolism
Peptidase activity
Protein catabolism
Immune response
Immune response
Toll regulated genes
attacked control
Selected gene tree: pearson (lw n3d); branch color classification: Set_LW_n3d_5p_...
Bregje Wertheim at the Centre for Evolutionary Genomics,
Department of Biology, UCL and Eugene Schuster Group, EBI.
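Labelling a cluster of co-expressed genes by its GO terms can be as simple as counting annotations. This is a sketch under stated assumptions: the gene names and their annotations are invented placeholders, not data from the figure, and `summarize` is a hypothetical helper that does plain counting with no statistical test.

```python
from collections import Counter

# Hypothetical gene -> biological process annotations (placeholders only).
ANNOTATIONS = {
    "gene_a": ["immune response", "defense response"],
    "gene_b": ["immune response"],
    "gene_c": ["lipid metabolism"],
    "gene_d": ["response to stimulus", "immune response"],
}


def summarize(cluster, annotations=ANNOTATIONS):
    """Count how many genes in the cluster carry each process term."""
    counts = Counter()
    for gene in cluster:
        # set() so a term listed twice for one gene is counted once
        counts.update(set(annotations.get(gene, ())))
    return counts.most_common()


summary = summarize(["gene_a", "gene_b", "gene_d"])
print(summary[0])
```

The most frequent term gives the cluster its label; deciding whether that frequency is more than chance would need a separate enrichment test.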
Copy of Copy of C5_RMA (Defa...
all genes (14010)
Ontologies
• Developmental Stage
• Molecular
• Disease
• Metabolic Pathway
• Phenotype
• Anatomy
• Physiology