2005-06_AnnotCamp_IntroGO_panel1


The Gene Ontologies

A Common Language for Annotation of Genes from Yeast, Flies and Mice …and Plants and Worms …and Humans …and anything else!

Gene Ontology Objectives

• GO represents concepts used to classify specific parts of our biological knowledge:
  – Biological Process
  – Molecular Function
  – Cellular Component
• GO develops a common language applicable to any organism
• GO terms can be used to annotate gene products from any species, allowing comparison of information across species

Expansion of Sequence Info

Entering the Genome Sequencing Era

Eukaryotic Genome Sequences

Organism                         Year   Genome Size (Mb)   # Genes
Yeast (S. cerevisiae)            1996   12                 6,000
Worm (C. elegans)                1998   97                 19,100
Fly (D. melanogaster)            2000   120                13,600
Plant (A. thaliana)              2001   125                25,500
Human (H. sapiens, 1st Draft)    2001   ~3000              ~35,000

Baldauf et al., Science 290 (2000): 972

Comparison of sequences from 4 organisms

MCM3 MCM2 CDC46/MCM5 CDC47/MCM7 CDC54/MCM4 MCM6

These proteins form a hexamer in the species that have been examined

http://www.geneontology.org/

Outline of Topics
• Introduction to the Gene Ontologies (GO)
• Annotations to GO terms
• GO Tools
• Applications of GO

What is an Ontology? (from OED)

1721 BAILEY, Ontology, an Account of being in the Abstract.

1733 (title) A Brief Scheme of Ontology or the Science of Being in General.

a1832 BENTHAM Fragm. Ontol. Wks. 1843 VIII. 195 The field of ontology, or as it may otherwise be termed, the field of supremely abstract entities, is a yet untrodden labyrinth.

1884 BOSANQUET tr. Lotze's Metaph. 22 Ontology..as a doctrine of the being and relations of all reality, had precedence given to it over Cosmology and Psychology, the two branches of enquiry which follow the reality into its opposite distinctive forms.

Srinija Srinivasan, Chief Ontologist, Yahoo!

The ontology. Dividing human knowledge into a clean set of categories is a lot like trying to figure out where to find that suspenseful black comedy at your corner video store. Questions inevitably come up, like are Movies part of Art or Entertainment? (Yahoo! lists them under the latter.) -Wired Magazine, May 1996

The 3 Gene Ontologies

• Molecular Function = elemental activity/task – the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity

• Biological Process = biological goal or objective – broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions

• Cellular Component = location or complex – subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

Example: Gene Product = hammer

Function (what)              Process (why)
Drive nail (into wood)       Carpentry
Drive stake (into soil)      Gardening
Smash roach                  Pest Control
Clown’s juggling object      Entertainment

Biological Examples

Terms, Definitions, IDs

term: MAPKKK cascade (mating sensu Saccharomyces)

goid: GO:0007244

definition: OBSOLETE. MAPKKK cascade involved in transduction of mating pheromone signal, as described in Saccharomyces

definition_reference: PMID:9561267

comment: This term was made obsolete because it is a gene product specific term. To update annotations, use the biological process term 'signal transduction during conjugation with cellular fusion ; GO:0000750'.
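The term record above can be modelled as a small data structure. The following is a minimal sketch only: the class name `GOTerm`, its field names, and the two helper methods are assumptions made for illustration, not part of any GO software. It treats a definition starting with "OBSOLETE" as the obsolescence flag, as on the slide, and pulls the suggested replacement ID out of the comment text.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class GOTerm:
    """Hypothetical container for one GO term record (field names are illustrative)."""
    goid: str
    name: str
    definition: str
    definition_reference: str = ""
    comment: str = ""

    @property
    def is_obsolete(self) -> bool:
        # Assumption: obsolescence is signalled by the definition prefix, as on the slide.
        return self.definition.startswith("OBSOLETE")

    def suggested_replacement(self) -> Optional[str]:
        # Return the first GO ID mentioned in the comment, or None if there is none.
        match = re.search(r"GO:\d{7}", self.comment)
        return match.group(0) if match else None


term = GOTerm(
    goid="GO:0007244",
    name="MAPKKK cascade (mating sensu Saccharomyces)",
    definition=(
        "OBSOLETE. MAPKKK cascade involved in transduction of mating "
        "pheromone signal, as described in Saccharomyces"
    ),
    definition_reference="PMID:9561267",
    comment=(
        "This term was made obsolete because it is a gene product specific term. "
        "To update annotations, use the biological process term 'signal "
        "transduction during conjugation with cellular fusion ; GO:0000750'."
    ),
)

print(term.is_obsolete, term.suggested_replacement())
```

A curation script could use `suggested_replacement()` to propose where annotations to an obsolete term should move; a real pipeline would still need a curator to confirm each move.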

Directed Cyclic Graph

Figure 4.1. Life cycles of heterothallic and homothallic strains of S. cerevisiae. Heterothallic strains can be stably maintained as diploids and haploids, whereas homothallic strains are stable only as diploids, because the transient haploid cells switch their mating type, and mate.

Source: An Introduction to the Genetics and Molecular Biology of the Yeast Saccharomyces cerevisiae, Fred Sherman 2000; Modified from: F. Sherman, Yeast genetics. The Encyclopedia of Molecular Biology and Molecular Medicine, pp. 302-325, Vol. 6. Edited by R. A. Meyers, VCH Pub., Weinheim, Germany, 1997.

Parent-Child Relationships

A child is a subset of a parent’s elements

Nucleus
  • Nucleoplasm
  • Nuclear envelope
  • Nucleolus
  • Chromosome
  • Perinuclear space

The cell component term Nucleus has 5 children

“Tree” Relationships

Derivation of Romance languages from Latin. From R.A. Hall Jr., Introductory Linguistics; originally published by Chilton Books, now distributed by Rand McNally & Co.

Ontology Relationships

Directed Acyclic Graph http://www.ebi.ac.uk/ego
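Unlike a tree, a directed acyclic graph lets a term have more than one parent. The sketch below illustrates this with a toy graph: Nucleus and its five children come from the Parent-Child slide, while "cell" and "genetic material" are invented here purely to give Chromosome a second parent. The function names are hypothetical as well.

```python
from collections import deque

# child -> list of parents.
# Nucleus and its five children are taken from the slide; "cell" and
# "genetic material" are made-up terms used only to show multiple parentage.
PARENTS = {
    "nucleus": ["cell"],
    "nucleoplasm": ["nucleus"],
    "nuclear envelope": ["nucleus"],
    "nucleolus": ["nucleus"],
    "chromosome": ["nucleus", "genetic material"],
    "perinuclear space": ["nucleus"],
    "genetic material": ["cell"],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already reached through another path in the graph
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen


def children(term, parents=PARENTS):
    """List the direct children of a term, sorted by name."""
    return sorted(child for child, ps in parents.items() if term in ps)


print(children("nucleus"))
print(sorted(ancestors("chromosome")))
```

Because "cell" is reachable from Chromosome along two paths, the `seen` set is what keeps it from being counted twice. A tree would never need that check.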

Evidence Codes for GO Annotations

http://www.geneontology.org/doc/GO.evidence.html

Code  Meaning
IEA   Inferred from Electronic Annotation
ISS   Inferred from Sequence Similarity
IEP   Inferred from Expression Pattern
IMP   Inferred from Mutant Phenotype
IGI   Inferred from Genetic Interaction
IPI   Inferred from Physical Interaction
IDA   Inferred from Direct Assay
RCA   Inferred from Reviewed Computational Analysis
TAS   Traceable Author Statement
NAS   Non-traceable Author Statement
IC    Inferred by Curator
ND    No biological Data available
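The twelve evidence codes listed on this slide lend themselves to a simple lookup table. This is a minimal sketch; the names `EVIDENCE_CODES` and `expand_evidence_code` are assumptions for illustration, and the ISS expansion follows the wording on the dedicated ISS slide.

```python
# Evidence codes and their expansions, as listed on the slides.
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence or Structural Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}


def expand_evidence_code(code: str) -> str:
    """Return the full name for an evidence code, ignoring case and padding."""
    key = code.strip().upper()
    if key not in EVIDENCE_CODES:
        raise ValueError(f"Unknown evidence code: {code!r}")
    return EVIDENCE_CODES[key]


print(expand_evidence_code("ida"))
```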

IEA

Inferred from Electronic Annotation

• Sequence Similarity (BLAST)
• Automatic transfer from mappings (InterPro2GO, EC2GO etc.)
-> Not manually reviewed

ISS

Inferred from Sequence or Structural Similarity

• Sequence similarity
• Recognized domains
• Structural similarity
-> Use of ‘with’ column recommended

IEP

Inferred from Expression Pattern

• Transcript levels (Northerns, microarrays)
• Protein levels (Western blots)
-> Timing or localization of expression
-> Biological process annotations

IMP

Inferred from Mutant Phenotype

• Gene mutation/knockout
• Overexpression/ectopic expression
• Anti-sense experiments
• RNAi experiments
• Specific protein inhibitors

IGI

Inferred from Genetic Interaction

• Suppressors, synthetic lethals…
• Functional complementation
• Rescue experiments
-> Use of ‘with’ column recommended

IPI

Inferred from Physical Interaction

• 2-hybrid interactions
• Co-purification
• Co-immunoprecipitation
• Ion/complex/protein binding experiments
-> Use of ‘with’ column recommended

IDA

Inferred from Direct Assay

• Enzyme assays
• In vitro reconstitution (e.g. transcription)
• Immunofluorescence (for cell. comp.)
• Cell fractionation (for cell. comp.)
• Physical interaction/binding assay

RCA

Inferred from Reviewed Computational Analysis

• Non-sequence-based computational methods
• Genome-wide analyses (e.g. 2-hybrid)
• Combinations of large-scale experiments

TAS

Traceable Author Statement

• Support from review article
• Textbook ‘common knowledge’
-> Data that can be ‘traced’ back

NAS

Non-traceable Author Statement

• Database entries that don't cite a paper
-> Data that cannot be ‘traced’ back

IC

Inferred by Curator

• Not supported by any direct evidence
• Inferred from other GO annotations
-> GO term in ‘with/from’ column required

ND

No biological Data available

Curator found no information supporting any annotation

molecular function unknown GO:0005554

biological process unknown GO:0000004

cellular component unknown GO:0008372
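The evidence code slides state that the ‘with/from’ column is required for IC and recommended for ISS, IGI and IPI. A curation check along those lines might be sketched as follows; the function name, the set names, and the three return values are assumptions for illustration, not an existing validator.

```python
# Taken from the slides: IC requires a 'with/from' entry;
# ISS, IGI and IPI only recommend one.
WITH_REQUIRED = {"IC"}
WITH_RECOMMENDED = {"ISS", "IGI", "IPI"}


def check_with_column(evidence_code: str, with_value: str = "") -> str:
    """Return 'error', 'warning' or 'ok' for one annotation's 'with' column."""
    has_with = bool(with_value.strip())
    if evidence_code in WITH_REQUIRED and not has_with:
        return "error"    # required value is missing
    if evidence_code in WITH_RECOMMENDED and not has_with:
        return "warning"  # recommended value is missing
    return "ok"


print(check_with_column("IC"))                # missing required value
print(check_with_column("IC", "GO:0000750"))  # required value present
print(check_with_column("IPI"))               # missing recommended value
```

Separating "error" from "warning" mirrors the slides' distinction between "required" and "recommended".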

Term Hierarchy

TAS/IDA
IMP/IGI/IPI
ISS/IEP
NAS
IEA

Qualifiers

The qualifier modifies the interpretation of a GO term

• NOT: explicit note that a gene product is not associated with a GO term

• colocalizes_with: only transient localization, or low resolution of an assay

• contributes_to: gene product that is part of a complex can be annotated to the process/function of the complex

http://www.geneontology.org/GO.annotation.shtml#qual
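Since a qualifier changes how an annotation should be read, downstream code has to look at it before counting a gene product as associated with a term. The sketch below is illustrative only: the `Annotation` class, its fields, the gene product names `geneA` and `geneB`, and the single-qualifier restriction are all assumptions. Only the three qualifier names come from the slide.

```python
from dataclasses import dataclass
from typing import List

# The three qualifiers named on the slide.
ALLOWED_QUALIFIERS = {"NOT", "colocalizes_with", "contributes_to"}


@dataclass
class Annotation:
    """Hypothetical annotation record; at most one qualifier, for simplicity."""
    gene_product: str
    goid: str
    evidence_code: str
    qualifier: str = ""

    def __post_init__(self):
        if self.qualifier and self.qualifier not in ALLOWED_QUALIFIERS:
            raise ValueError(f"Unknown qualifier: {self.qualifier!r}")


def positive_annotations(annotations: List[Annotation]) -> List[Annotation]:
    """Drop annotations that explicitly say the gene product is NOT associated."""
    return [a for a in annotations if a.qualifier != "NOT"]


# Made-up example data: gene product names are placeholders.
example = [
    Annotation("geneA", "GO:0000750", "IMP"),
    Annotation("geneB", "GO:0000750", "IDA", qualifier="NOT"),
]

print([a.gene_product for a in positive_annotations(example)])
```

Ignoring the qualifier would wrongly list `geneB` as involved in the process, so the filter is applied before any counting or cross-species comparison.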

http://www.geneontology.org/doc/GO.evidence.html