Transcript 2005-06_AnnotCamp_IntroGO_panel1
The Gene Ontologies
A Common Language for Annotation of Genes from Yeast, Flies and Mice …and Plants and Worms …and Humans …and anything else!
Gene Ontology Objectives
• GO represents concepts used to classify specific parts of our biological knowledge:
  – Biological Process
  – Molecular Function
  – Cellular Component
• GO develops a common language applicable to any organism
• GO terms can be used to annotate gene products from any species, allowing comparison of information across species
Expansion of Sequence Info
Entering the Genome Sequencing Era
Eukaryotic Genome Sequences
Organism                        Year   Genome Size (Mb)   # Genes
Yeast (S. cerevisiae)           1996   12                 6,000
Worm (C. elegans)               1998   97                 19,100
Fly (D. melanogaster)           2000   120                13,600
Plant (A. thaliana)             2001   125                25,500
Human (H. sapiens, 1st Draft)   2001   ~3000              ~35,000
Baldauf et al. Science 290 (2000): 972
Comparison of sequences from 4 organisms
MCM3 MCM2 CDC46/MCM5 CDC47/MCM7 CDC54/MCM4 MCM6
These proteins form a hexamer in the species that have been examined
http://www.geneontology.org/
Outline of Topics • Introduction to the Gene Ontologies (GO) • Annotations to GO terms • GO Tools • Applications of GO
What is an Ontology? (from OED)
1721 BAILEY, Ontology, an Account of being in the Abstract.
1733 (title) A Brief Scheme of Ontology or the Science of Being in General.
a1832 BENTHAM Fragm. Ontol. Wks. 1843 VIII. 195 The field of ontology, or as it may otherwise be termed, the field of supremely abstract entities, is a yet untrodden labyrinth.
1884 BOSANQUET tr. Lotze's Metaph. 22 Ontology..as a doctrine of the being and relations of all reality, had precedence given to it over Cosmology and Psychology, the two branches of enquiry which follow the reality into its opposite distinctive forms.
Srinija Srinivasan, Chief Ontologist, Yahoo!
The ontology. Dividing human knowledge into a clean set of categories is a lot like trying to figure out where to find that suspenseful black comedy at your corner video store. Questions inevitably come up, like are Movies part of Art or Entertainment? (Yahoo! lists them under the latter.) -Wired Magazine, May 1996
The 3 Gene Ontologies
• Molecular Function = elemental activity/task
  – the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
• Biological Process = biological goal or objective
  – broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
• Cellular Component = location or complex
  – subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
Example: Gene Product = hammer
Function (what)            Process (why)
Drive nail (into wood)     Carpentry
Drive stake (into soil)    Gardening
Smash roach                Pest Control
Clown’s juggling object    Entertainment
Biological Examples
Terms, Definitions, IDs
term: MAPKKK cascade (mating sensu Saccharomyces)
goid: GO:0007244
definition: OBSOLETE. MAPKKK cascade involved in transduction of mating pheromone signal, as described in Saccharomyces.
definition_reference: PMID:9561267
comment: This term was made obsolete because it is a gene product specific term. To update annotations, use the biological process term 'signal transduction during conjugation with cellular fusion ; GO:0000750'.
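The term record above can be held in a small data structure. What follows is a minimal sketch in Python, not the GO Consortium's own tooling: the class name `GOTerm` and the helper `suggested_replacement` are hypothetical, and the sketch assumes that an obsolete term is flagged by a definition starting with "OBSOLETE" and that the replacement ID is the first GO ID mentioned in the comment, as on this slide.

```python
import re
from dataclasses import dataclass
from typing import Optional

# GO IDs on the slide have the form "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")


@dataclass
class GOTerm:
    """Hypothetical container for the fields shown on the slide."""
    goid: str
    name: str
    definition: str
    definition_reference: str = ""
    comment: str = ""

    @property
    def is_obsolete(self) -> bool:
        # Assumption: obsolete terms are marked at the start of the definition.
        return self.definition.startswith("OBSOLETE")

    def suggested_replacement(self) -> Optional[str]:
        # Assumption: the first GO ID in the comment is the suggested replacement.
        match = GO_ID.search(self.comment)
        return match.group(0) if match else None


term = GOTerm(
    goid="GO:0007244",
    name="MAPKKK cascade (mating sensu Saccharomyces)",
    definition=("OBSOLETE. MAPKKK cascade involved in transduction of "
                "mating pheromone signal, as described in Saccharomyces."),
    definition_reference="PMID:9561267",
    comment=("This term was made obsolete because it is a gene product "
             "specific term. To update annotations, use the biological "
             "process term 'signal transduction during conjugation with "
             "cellular fusion ; GO:0000750'."),
)

if term.is_obsolete:
    print(term.goid, "is obsolete; consider", term.suggested_replacement())
```

A curator would still read the comment before re-annotating; the helper only pulls out the ID for convenience.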
Directed Cyclic Graph
Figure 4.1. Life cycles of heterothallic and homothallic strains of S. cerevisiae. Heterothallic strains can be stably maintained as diploids and haploids, whereas homothallic strains are stable only as diploids, because the transient haploid cells switch their mating type, and mate.
An Introduction to the Genetics and Molecular Biology of the Yeast Saccharomyces cerevisiae
Fred Sherman 2000; Modified from: F. Sherman, Yeast genetics. The Encyclopedia of Molecular Biology and Molecular Medicine, pp. 302-325, Vol. 6. Edited by R. A. Meyers, VCH Pub., Weinheim, Germany, 1997.
Parent-Child Relationships
A child is a subset of a parent’s elements
Nucleus
  Nucleoplasm
  Nuclear envelope
  Nucleolus
  Chromosome
  Perinuclear space
The cell component term Nucleus has 5 children
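The parent-child relationship on this slide can be written down as a simple mapping. This is a minimal sketch in Python: the variable names `children` and `parents_of` are hypothetical, and only the Nucleus example from the slide is encoded.

```python
from collections import defaultdict

# Parent term -> list of child terms, as listed on the slide.
children = {
    "nucleus": [
        "nucleoplasm",
        "nuclear envelope",
        "nucleolus",
        "chromosome",
        "perinuclear space",
    ],
}

# Invert the mapping so each child can look up its parent(s).
parents_of = defaultdict(list)
for parent, kids in children.items():
    for kid in kids:
        parents_of[kid].append(parent)

print(len(children["nucleus"]))   # number of children of "nucleus"
print(parents_of["nucleolus"])    # parents recorded for "nucleolus"
```

Storing parents as a list rather than a single value leaves room for a term to have more than one parent.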
“Tree” Relationships
Derivation of Romance languages from Latin. From R.A. Hall Jr., Introductory Linguistics; originally published by Chilton Books, now distributed by Rand McNally & Co.
Ontology Relationships
Directed Acyclic Graph http://www.ebi.ac.uk/ego
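The difference between a cyclic graph (the yeast life cycle), a tree (the Romance languages), and a directed acyclic graph can be checked in a few lines. This is a minimal sketch in Python under stated assumptions: the toy edges are illustrative only and are not taken from the GO files, and the function names `ancestors` and `is_acyclic` are hypothetical.

```python
from typing import Dict, List, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every term reachable by following parent links upward."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def is_acyclic(parents: Dict[str, List[str]]) -> bool:
    """A directed graph is acyclic if no term is its own ancestor."""
    return all(term not in ancestors(term, parents) for term in parents)


# Toy DAG (illustrative edges): one term has two parents, which a tree
# would not allow.
dag = {
    "cell": [],
    "nucleus": ["cell"],
    "chromosome": ["cell"],
    "nuclear chromosome": ["nucleus", "chromosome"],
}

# Toy cyclic graph, loosely modelled on a haploid/diploid life cycle.
life_cycle = {
    "haploid": ["diploid"],
    "diploid": ["haploid"],
}

print(sorted(ancestors("nuclear chromosome", dag)))
print(is_acyclic(dag), is_acyclic(life_cycle))
```

In a tree every term would have at most one parent; in the DAG a term inherits from all of its parents, so an annotation to a child term implies annotation to every ancestor.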
Evidence Codes for GO Annotations
http://www.geneontology.org/doc/GO.evidence.html
IEA  Inferred from Electronic Annotation
ISS  Inferred from Sequence or Structural Similarity
IEP  Inferred from Expression Pattern
IMP  Inferred from Mutant Phenotype
IGI  Inferred from Genetic Interaction
IPI  Inferred from Physical Interaction
IDA  Inferred from Direct Assay
RCA  Inferred from Reviewed Computational Analysis
TAS  Traceable Author Statement
NAS  Non-traceable Author Statement
IC   Inferred by Curator
ND   No biological Data available
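The twelve codes listed above lend themselves to a lookup table. This is a minimal sketch in Python: the names `EVIDENCE_CODES` and `is_manually_reviewed` are hypothetical, and the only rule encoded is the one stated on the IEA slide, that electronic annotations are not manually reviewed.

```python
# Evidence code -> expanded name, as listed on the slide.
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence or Structural Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}


def is_manually_reviewed(code: str) -> bool:
    """Return False only for IEA, the one code described as not manually reviewed."""
    if code not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {code}")
    return code != "IEA"


print(EVIDENCE_CODES["IDA"])
print(is_manually_reviewed("IEA"))
```

Rejecting unknown codes early helps catch typos when annotation files are assembled by hand.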
IEA
Inferred from Electronic Annotation
• Sequence Similarity (BLAST)
• Automatic transfer from mappings (InterPro2GO, EC2GO etc.)
-> Not manually reviewed
ISS
Inferred from Sequence or Structural Similarity
• Sequence similarity
• Recognized domains
• Structural similarity
-> Use of ‘with’ column recommended
IEP
Inferred from Expression Pattern
• Transcript levels (Northerns, microarrays)
• Protein levels (Western blots)
-> Timing or localization of expression
-> Biological process annotations
IMP
Inferred from Mutant Phenotype
• Gene mutation/knockout
• Overexpression/ectopic expression
• Anti-sense experiments
• RNAi experiments
• Specific protein inhibitors
IGI
Inferred from Genetic Interaction
• Suppressors, synthetic lethals…
• Functional complementation
• Rescue experiments
-> Use of ‘with’ column recommended
IPI
Inferred from Physical Interaction
• 2-hybrid interactions
• Co-purification
• Co-immunoprecipitation
• Ion/complex/protein binding experiments
-> Use of ‘with’ column recommended
IDA
Inferred from Direct Assay
• Enzyme assays
• In vitro reconstitution (e.g. transcription)
• Immunofluorescence (for cell. comp.)
• Cell fractionation (for cell. comp.)
• Physical interaction/binding assay
RCA
Inferred from Reviewed Computational Analysis
• Non-sequence-based computational methods
• Genome-wide analyses (e.g. 2-hybrid)
• Combinations of large-scale experiments
TAS
Traceable Author Statement
• Support from review article
• Textbook ‘common knowledge’
-> Data that can be ‘traced’ back
NAS
Non-traceable Author Statement
• Database entries that don't cite a paper
-> Data that cannot be ‘traced’ back
IC
Inferred by Curator
• Not supported by any direct evidence
• Inferred from other GO annotations
-> GO term in ‘with/from’ column required
ND
No biological Data available
Curator found no information supporting any annotation
• molecular function unknown GO:0005554
• biological process unknown GO:0000004
• cellular component unknown GO:0008372
Term Hierarchy
TAS/IDA
IMP/IGI/IPI
ISS/IEP
NAS
IEA
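The ordering above can be used to pick the best-supported annotation when a gene product has several. This is a minimal sketch in Python, assuming that the groups are listed from strongest to weakest and that codes absent from the slide (RCA, IC, ND) sort after all ranked codes; both points are assumptions, and the names `TIERS`, `rank` and `strongest` are hypothetical.

```python
from typing import Iterable

# Tiers as grouped on the slide, assumed to run from strongest to weakest.
TIERS = [
    ("TAS", "IDA"),
    ("IMP", "IGI", "IPI"),
    ("ISS", "IEP"),
    ("NAS",),
    ("IEA",),
]

# Evidence code -> tier index (0 is the strongest tier).
RANK = {code: i for i, tier in enumerate(TIERS) for code in tier}


def rank(code: str) -> int:
    """Tier index of a code; codes not on the slide sort last (assumption)."""
    return RANK.get(code, len(TIERS))


def strongest(codes: Iterable[str]) -> str:
    """Return the code in the highest tier; ties go to the first one seen."""
    return min(codes, key=rank)


print(strongest(["IEA", "ISS", "IMP"]))
print(sorted(["IEA", "TAS", "NAS", "IGI"], key=rank))
```

Codes within one tier are treated as equal here; a curator may still prefer one over another depending on the experiment.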
Qualifiers
The qualifier modifies the interpretation of a GO term
NOT: explicit note that a gene product is not associated with a GO term
colocalizes_with: only transient localization, or low resolution of an assay
contributes_to: gene product that is part of a complex can be annotated to the process/function of the complex
http://www.geneontology.org/GO.annotation.shtml#qual
http://www.geneontology.org/doc/GO.evidence.html
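Qualifiers matter most when annotations are filtered: a NOT annotation must not be counted as support for a term. This is a minimal sketch in Python under stated assumptions: the `Annotation` class, the function `asserted`, and the gene product names geneA to geneC are hypothetical, each annotation is assumed to carry at most one qualifier, and the GO ID is reused from the earlier term example purely for illustration.

```python
from dataclasses import dataclass
from typing import List

# Qualifiers named on the slide; the empty string means "no qualifier".
QUALIFIERS = {"", "NOT", "colocalizes_with", "contributes_to"}


@dataclass
class Annotation:
    """Hypothetical record linking a gene product to a GO term."""
    gene_product: str
    goid: str
    evidence: str
    qualifier: str = ""

    def __post_init__(self) -> None:
        if self.qualifier not in QUALIFIERS:
            raise ValueError(f"unknown qualifier: {self.qualifier}")


def asserted(annotations: List[Annotation]) -> List[Annotation]:
    """Drop NOT annotations, which state the absence of an association."""
    return [a for a in annotations if a.qualifier != "NOT"]


annotations = [
    Annotation("geneA", "GO:0000750", "IMP"),
    Annotation("geneB", "GO:0000750", "IDA", qualifier="NOT"),
    Annotation("geneC", "GO:0000750", "IDA", qualifier="contributes_to"),
]

for a in asserted(annotations):
    print(a.gene_product, a.goid, a.qualifier or "-")
```

Annotations qualified with colocalizes_with or contributes_to are kept, since they still assert an association, only a weaker or indirect one.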