Annotating with GO: an overview
http://www.geneontology.org/
What is a Gene Ontology (GO) annotation?
Databases external to GO make cross-links between GO terms and objects in
their databases (typically, gene products, or their surrogates, genes), and
then provide tables of these links to GO. The GO itself contains no
information about genes or gene products. The GO annotation (‘gene
association’) files are all publicly available:
http://www.geneontology.org/#annotations
The DB field of an annotation holds a database name abbreviation.
Abbreviations used by GO are described here:
http://www.geneontology.org/doc/GO.xrf_abbs
A gene product is annotated to one or more terms in each of the three
ontologies: biological process, cellular component and molecular function.
Gene products are annotated to the most specific GO term possible for the
information available.
Example annotation:

Field                           Annotation 1            Annotation 2
DB                              SGD                     SGD
DB_Object_ID                    S0000296                S0000296
DB_Object_Symbol                PHO3                    PHO3
[NOT]                           -                       -
go_id                           GO:0015888              GO:0003993
DB:Reference (|DB:Reference)    SGD:8789|PMID:2676709   SGD:8789|PMID:2676709
Evidence                        IMP                     IMP
With                            -                       -
Aspect                          P                       F
DB_Object_Name (|Name)          -                       -
DB_Object_Synonym (|Synonym)    YBR092C                 YBR092C
DB_Object_Type                  gene                    gene
Taxon (|taxon)                  taxon:4932              taxon:4932
Date                            20001122                20001122

A dash marks a field with no value in the example.
Mandatory fields are highlighted in grey on the original slide.

DB_Object_ID: Database Object identifier. A Database Object is usually a
gene product, but can also be a gene or a transcript.
[NOT]: Used when the source specifies that a gene product is NOT associated
with a particular GO term, e.g. “we have found that protein Z is not
involved in the X cascade”.
go_id: Gene Ontology term identifier.
Aspect: P = biological process, F = molecular function and C = cellular
component.
DB_Object_Type: Object type: gene, transcript or protein.
Taxon: Taxonomic identifier for gene product.

A gene product is annotated with terms reflecting only its normal
activities, locations and processes.
When there is no information regarding one or more aspects of a gene
product, the gene product is annotated to the GO term ‘unknown’.
Annotation of a gene product to one ontology is independent of its
annotation to the other two ontologies.
The annotation of gene products to GO terms is performed according to two
main principles: the recording of the source of the annotation and the type
of evidence on which the annotation was based.
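
As a rough illustration of how one row of the example annotation maps onto its fields, the Python sketch below splits a line into named fields. Several things here are assumptions made for the sketch rather than statements from the GO documentation: the tab delimiter, the field order (taken from the column headers of the example), the placement of YBR092C in the synonym field, and the names FIELDS, MULTI_VALUED and parse_annotation.

```python
# Hypothetical field names, following the column headers of the example annotation.
FIELDS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "NOT", "go_id",
    "DB_Reference", "Evidence", "With", "Aspect", "DB_Object_Name",
    "DB_Object_Synonym", "DB_Object_Type", "Taxon", "Date",
]

# Fields written as (|...) in the header may hold several pipe-separated values.
MULTI_VALUED = {"DB_Reference", "DB_Object_Name", "DB_Object_Synonym", "Taxon"}


def parse_annotation(line):
    """Split one tab-separated annotation line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = {}
    for name, value in zip(FIELDS, values):
        if name in MULTI_VALUED:
            # An empty field becomes an empty list rather than [""].
            record[name] = value.split("|") if value else []
        else:
            record[name] = value
    return record


# First row of the example annotation; empty strings stand for unfilled fields.
example = "\t".join([
    "SGD", "S0000296", "PHO3", "", "GO:0015888", "SGD:8789|PMID:2676709",
    "IMP", "", "P", "", "YBR092C", "gene", "taxon:4932", "20001122",
])
record = parse_annotation(example)
print(record["go_id"], record["Aspect"], record["DB_Reference"])
```

Checking the field count up front means a line with a missing or extra column fails immediately instead of silently shifting values into the wrong fields.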
The source of an annotation may be a literature reference, a database
record or the type of computational analysis. Literature references are
entered as an accession number, either from the database in question and/or
from PubMed. Annotations based on computational analysis include a
reference to the method of analysis.
The evidence describes how the annotation was created, and provides a way
of measuring its strength or reliability. GO has developed a set of
standard evidence codes which form a loose hierarchy, with ‘inferred from
electronic annotation’ (IEA) being the least reliable type of evidence,
followed by ‘inferred from sequence similarity’ (ISS).
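
The reference field of the example annotation (SGD:8789|PMID:2676709) shows the source recorded as accession numbers from the contributing database and from PubMed. A minimal Python sketch of separating such a field into database and accession pairs follows. The function name split_references is hypothetical, and the "database:accession" layout is inferred from the example rather than from a formal specification.

```python
def split_references(field):
    """Return (database, accession) pairs from a pipe-separated reference field."""
    pairs = []
    for ref in field.split("|"):
        # partition splits on the first colon only.
        database, _, accession = ref.partition(":")
        pairs.append((database, accession))
    return pairs


refs = split_references("SGD:8789|PMID:2676709")
# Keep only the PubMed accessions.
pubmed_ids = [accession for database, accession in refs if database == "PMID"]
print(refs, pubmed_ids)
```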
Evidence codes
IDA inferred from direct assay
IEP inferred from expression pattern
IMP inferred from mutant phenotype
IGI inferred from genetic interaction
IPI inferred from physical interaction
ISS inferred from sequence similarity
IC inferred by curator
IEA inferred from electronic annotation
TAS traceable author statement
NAS non-traceable author statement
ND no biological data available
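
The evidence codes can be held in a simple lookup table, sketched below in Python. The text only states that IEA is the least reliable code, followed by ISS. Giving every other code the same higher rank is an assumption of this sketch, as are the names EVIDENCE_CODES, RELIABILITY_RANK and reliability.

```python
# Evidence codes and their meanings, as listed above.
EVIDENCE_CODES = {
    "IDA": "inferred from direct assay",
    "IEP": "inferred from expression pattern",
    "IMP": "inferred from mutant phenotype",
    "IGI": "inferred from genetic interaction",
    "IPI": "inferred from physical interaction",
    "ISS": "inferred from sequence similarity",
    "IC": "inferred by curator",
    "IEA": "inferred from electronic annotation",
    "TAS": "traceable author statement",
    "NAS": "non-traceable author statement",
    "ND": "no biological data available",
}

# Only the two lowest ranks come from the text; the shared default is an assumption.
RELIABILITY_RANK = {"IEA": 0, "ISS": 1}
DEFAULT_RANK = 2


def reliability(code):
    """Return a coarse reliability rank for a known evidence code (lower is weaker)."""
    if code not in EVIDENCE_CODES:
        raise KeyError(f"unknown evidence code: {code}")
    return RELIABILITY_RANK.get(code, DEFAULT_RANK)


# Order some codes from least to most reliable under this coarse ranking.
ordered = sorted(["IMP", "IEA", "ISS"], key=reliability)
print(ordered)
```

A ranking like this could be used, for example, to set aside annotations supported only by IEA evidence before further analysis.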
Collaborating databases
Many important databases produce GO annotations and contribute to the development of the GO. These include:
FlyBase (database for the fruitfly Drosophila melanogaster)
Berkeley Drosophila Genome Project (Drosophila informatics; GO database & software)
Saccharomyces Genome Database (SGD) (database for the budding yeast Saccharomyces cerevisiae)
Mouse Genome Database (MGD) & Gene Expression Database (GXD) (databases for the mouse Mus musculus)
The Arabidopsis Information Resource (TAIR) (database for the brassica family plant Arabidopsis thaliana)
WormBase (database for the nematode Caenorhabditis elegans)
PomBase (database for the fission yeast Schizosaccharomyces pombe)
Rat Genome Database (RGD) (database for the rat Rattus norvegicus)
DictyBase (informatics resource for the slime mold Dictyostelium discoideum)
The Pathogen Sequencing Unit (The Wellcome Trust Sanger Institute)
Genome Knowledge Base (GKB) (Cold Spring Harbor Laboratory)
EBI: InterPro - SWISS-PROT - TrEMBL groups
The Institute for Genomic Research (TIGR)
Gramene (A Comparative Mapping Resource for Monocots)
Compugen (with its Internet Research Engine)