GENE ONTOLOGY FOR THE NEWBIES
Suparna Mundodi, PhD
The Arabidopsis Information Resource, Stanford, CA
The Gene Ontologies
A Common Language for Annotation of Genes from Yeast, Flies and Mice …and Plants and Worms …and Humans …and anything else!
Outline of Topics
• Introduction to the Gene Ontologies (GO)
• Annotations to GO terms
• GO Tools
• Applications of GO
Gene Ontology
• Gene annotation system
• Controlled vocabulary that can be applied to all organisms
• Used to describe gene products
What’s in a name?
What is a cell?
[Figure: five different images, each labeled "Cell". Image from http://microscopy.fsu.edu]
Bud initiation?
= bud initiation sensu Metazoa
= bud initiation sensu Saccharomyces
= bud initiation sensu Viridiplantae
What’s in a name?
The same name can be used to describe different concepts
What’s in a name?
• Glucose synthesis
• Glucose biosynthesis
• Glucose formation
• Glucose anabolism
• Gluconeogenesis
All refer to the process of making glucose from simpler components.
What’s in a name?
• The same name can be used to describe different concepts
• A concept can be described using different names
Comparison is difficult – in particular across species or across databases
What is the Gene Ontology?
A (part of the) solution:
• A controlled vocabulary that can be applied to all organisms
• Used to describe gene products (proteins and RNA) in any organism
How does GO work?
What information might we want to capture about a gene product?
What does the gene product do?
Why does it perform these activities?
Where does it act?
The 3 Gene Ontologies
Molecular Function
= elemental activity/task: the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
Biological Process
= biological goal or objective: broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
Cellular Component
= location or complex: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
Example: Gene Product = hammer
Function (what) | Process (why)
Drive nail (into wood) | Carpentry
Drive stake (into soil) | Gardening
Smash roach | Pest Control
Clown’s juggling object | Entertainment
Ontology Structure
Ontologies can be represented as graphs, where the nodes are connected by edges:
• Nodes = concepts in the ontology
• Edges = relationships between the concepts
[Figure: nodes connected by an edge]
Ontology Structure
The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG):
• Terms can have more than one parent and zero, one or more children
• Terms are linked by two relationships: is-a and part-of
Directed Acyclic Graphs (DAG)
[Figure: example DAG with the terms protein complex, organelle, mitochondrion, [other organelles], [other protein complexes], and fatty acid beta-oxidation multienzyme complex, linked by is-a and part-of edges]
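One way to hold this kind of graph in code is a map from each term to its parents, with each edge tagged is_a or part_of. The minimal Python sketch below assumes the edges given in its comments (fatty acid beta-oxidation multienzyme complex is_a protein complex and part_of mitochondrion; mitochondrion is_a organelle); they are read from the figure, not taken from a GO release. It checks the two properties described above: a term may have more than one parent, and following parent edges never loops back. The names PARENTS and is_acyclic are hypothetical, and the acyclicity test uses Kahn's topological sort.

```python
from collections import deque

# Assumed edges read from the figure: child term -> [(parent term, relationship)].
PARENTS = {
    "fatty acid beta-oxidation multienzyme complex": [
        ("protein complex", "is_a"),
        ("mitochondrion", "part_of"),
    ],
    "mitochondrion": [("organelle", "is_a")],
}


def is_acyclic(parents):
    """Return True if following parent edges can never loop (Kahn's algorithm)."""
    # Every term that appears as a child or as a parent.
    terms = set(parents) | {p for edges in parents.values() for p, _ in edges}
    # Number of parent edges each term is still waiting on.
    remaining = {t: len(parents.get(t, [])) for t in terms}
    # Invert the edges so each parent knows its children.
    children = {t: [] for t in terms}
    for child, edges in parents.items():
        for parent, _relationship in edges:
            children[parent].append(child)
    # Peel the graph from the roots (terms with no parents) downward.
    queue = deque(t for t, n in remaining.items() if n == 0)
    visited = 0
    while queue:
        term = queue.popleft()
        visited += 1
        for child in children[term]:
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    # Terms on a cycle are never released, so they are never visited.
    return visited == len(terms)
```

Here the multienzyme complex has two parents through two different relationships, and is_acyclic(PARENTS) is True; a two-term loop such as {"a": [("b", "is_a")], "b": [("a", "part_of")]} makes it return False.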
Parent-Child Relationships
A child is a subset of a parent’s elements.
[Figure: the term Nucleus with the children Nucleoplasm, Nuclear envelope, Nucleolus, Chromosome, and Perinuclear space]
The cellular component term Nucleus has 5 children.
True Path Rule
The path from a child term all the way up to its top-level parent(s) must always be true.
[Figure: the terms cell, cytoplasm, chromosome, nuclear chromosome, cytoplasmic chromosome, mitochondrial chromosome, and nucleus, linked by is-a and part-of edges]
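Under the True Path Rule, an annotation to a term also holds for every term on the path above it, so tools typically walk the graph upward to collect a term's ancestors. The minimal Python sketch below assumes the parent links given in its comments, a simplified reading of the figure rather than the current GO graph; PARENTS and ancestors are hypothetical names. In this sketch chromosome has no part_of link to nucleus: if it did, mitochondrial chromosome would inherit nucleus as an ancestor, which is exactly the kind of false path the rule forbids.

```python
# Assumed parent links for the figure's terms: child -> parents.
PARENTS = {
    "nuclear chromosome": ["chromosome", "nucleus"],        # is_a, part_of
    "cytoplasmic chromosome": ["chromosome", "cytoplasm"],  # is_a, part_of
    "mitochondrial chromosome": ["cytoplasmic chromosome"], # is_a
    "nucleus": ["cell"],                                    # part_of
    "cytoplasm": ["cell"],                                  # part_of
    "chromosome": [],
    "cell": [],
}


def ancestors(term, parents):
    """Return every term reachable by following parent links upward from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:  # depth-first walk; `seen` stops revisits where paths merge
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen
```

For example, ancestors("mitochondrial chromosome", PARENTS) gives cytoplasmic chromosome, chromosome, cytoplasm, and cell, but not nucleus.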
What’s in a GO term?
term: gluconeogenesis
id: GO:0006094
definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
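GO terms are distributed, among other formats, as OBO flat files, where each term is a [Term] stanza of tag: value lines (the term label uses the tag name rather than term, and the definition uses def). The minimal Python sketch below parses one such stanza for the term above. It is a simplification, not a full OBO parser: it assumes single-valued tags only, ignores escapes and trailing qualifiers, and the empty [] after the definition stands in for the cross-reference list a real file would carry. STANZA and parse_term_stanza are hypothetical names.

```python
import re

# One simplified [Term] stanza; the empty [] is a placeholder cross-reference list.
STANZA = """
[Term]
id: GO:0006094
name: gluconeogenesis
namespace: biological_process
def: "The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol." []
"""


def parse_term_stanza(text):
    """Parse one simplified OBO [Term] stanza into a dict of tag -> value."""
    term = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line == "[Term]":
            continue
        # Split on the first ": " only, so IDs like GO:0006094 stay intact.
        tag, _, value = line.partition(": ")
        if tag == "def":
            # def values are quoted text followed by a [cross-reference] list.
            match = re.match(r'"(.*)"\s*\[.*\]$', value)
            value = match.group(1) if match else value
        term[tag] = value  # later duplicates overwrite earlier ones
    return term
```

parse_term_stanza(STANZA)["id"] returns "GO:0006094", and the "def" entry holds the definition text without its quotes.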
Annotation of gene products with GO terms
Mitochondrial P450
Cellular component: mitochondrial inner membrane GO:0005743
Biological process: electron transport GO:0006118
[Diagram: substrate + O₂ = CO₂ + H₂O, product]
Molecular function: monooxygenase activity GO:0004497
Other gene products annotated to monooxygenase activity (GO:0004497):
- monooxygenase, DBH-like 1 (mouse)
- prostaglandin I2 (prostacyclin) synthase (mouse)
- flavin-containing monooxygenase (yeast)
- ferulate-5-hydroxylase 1 (Arabidopsis)
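Because every annotation points at a shared GO ID, gene products from different species can be grouped by term. The minimal Python sketch below stores the annotations from the two slides above as (gene product, GO ID) pairs and inverts them into a term-to-gene-products index. Identifying gene products by the names on the slides is an assumption for illustration; real GO annotation files use database identifiers and carry further fields such as the source and the evidence code. ANNOTATIONS and index_by_term are hypothetical names.

```python
from collections import defaultdict

# Annotations from the two slides above as (gene product, GO ID) pairs.
ANNOTATIONS = [
    ("Mitochondrial P450", "GO:0005743"),  # cellular component
    ("Mitochondrial P450", "GO:0006118"),  # biological process
    ("Mitochondrial P450", "GO:0004497"),  # molecular function
    ("monooxygenase, DBH-like 1 (mouse)", "GO:0004497"),
    ("prostaglandin I2 (prostacyclin) synthase (mouse)", "GO:0004497"),
    ("flavin-containing monooxygenase (yeast)", "GO:0004497"),
    ("ferulate-5-hydroxylase 1 (Arabidopsis)", "GO:0004497"),
]


def index_by_term(annotations):
    """Invert (gene product, GO ID) pairs into GO ID -> set of gene products."""
    by_term = defaultdict(set)
    for gene_product, go_id in annotations:
        by_term[go_id].add(gene_product)
    return by_term
```

Looking up "GO:0004497" in the index returns all five gene products annotated to monooxygenase activity, across mouse, yeast, and Arabidopsis.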
Two types of GO annotations:
• Electronic annotation
• Manual annotation
All annotations must:
• be attributed to a source
• indicate what evidence was found to support the GO term-gene/protein association
• IEA: Inferred from Electronic Annotation
• ISS: Inferred from Sequence Similarity
• IEP: Inferred from Expression Pattern
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• IPI: Inferred from Physical Interaction
• IDA: Inferred from Direct Assay
• RCA: Inferred from Reviewed Computational Analysis
• TAS: Traceable Author Statement
• NAS: Non-traceable Author Statement
• IC: Inferred by Curator
• ND: No biological Data available
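Since every annotation carries one of these evidence codes, a common first step when using GO data is to separate electronic from manual annotations. The minimal Python sketch below encodes the codes from this slide and splits (gene product, GO ID, evidence code) triples. Treating only IEA as electronic is an assumption of the sketch (RCA, for instance, is computational but reviewed), and the gene products "gene A" and "gene B" in the usage example are hypothetical.

```python
# Evidence codes as listed on this slide.
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}


def is_electronic(code):
    """Assumption: only IEA counts as electronic; all other known codes are manual."""
    if code not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {code}")
    return code == "IEA"


def split_by_type(annotations):
    """Split (gene product, GO ID, evidence code) triples into electronic and manual lists."""
    electronic, manual = [], []
    for annotation in annotations:
        target = electronic if is_electronic(annotation[2]) else manual
        target.append(annotation)
    return electronic, manual
```

For example, split_by_type([("gene A", "GO:0004497", "IDA"), ("gene B", "GO:0004497", "IEA")]) puts gene B in the electronic list and gene A in the manual list.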
Ensuring Stability in a Dynamic Ontology
• Terms become obsolete when they are removed or redefined
• GO IDs are never deleted
• For each obsolete term, a comment is added explaining why the term is now obsolete
[Figure: the three ontologies (Biological Process, Molecular Function, Cellular Component) shown next to Obsolete Biological Process, Obsolete Molecular Function, and Obsolete Cellular Component]
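Because GO IDs are never deleted, software can always resolve an old ID; it just has to notice when the term has been made obsolete (in the OBO flat-file format this is marked with an is_obsolete: true tag). The minimal Python sketch below keeps obsolete terms in the lookup table with their explanatory comment and flags annotations that still point at them. The term records, the "HYP:" IDs, and the function names are all hypothetical, not real GO IDs or a GO library API.

```python
# Hypothetical term records; "HYP:" IDs are made up and are not real GO IDs.
TERMS = {
    "HYP:0000001": {"name": "example current term", "obsolete": False, "comment": ""},
    "HYP:0000002": {
        "name": "example retired term",
        "obsolete": True,
        "comment": "Example comment explaining why the term was made obsolete.",
    },
}


def lookup(term_id, terms):
    """Resolve an ID to (name, note); obsolete terms are still found, with their comment."""
    record = terms[term_id]  # KeyError only for IDs that never existed
    note = f"OBSOLETE: {record['comment']}" if record["obsolete"] else None
    return record["name"], note


def annotations_to_obsolete_terms(annotations, terms):
    """Return the (gene product, term ID) pairs that point at obsolete terms."""
    return [(gp, tid) for gp, tid in annotations if terms[tid]["obsolete"]]
```

An annotation made years ago to HYP:0000002 still resolves, but annotations_to_obsolete_terms reports it so a curator can review it.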
Why modify the GO?
• GO reflects current knowledge of biology
• Adding new organisms can make existing term arrangements incorrect
• Not everything was perfect from the outset
What can scientists do with GO?
• Access gene product functional information
• Find how much of a proteome is involved in a process, function, or component in the cell
• Map GO terms and incorporate manual annotations into their own databases
• Provide a link between biological knowledge and:
  - gene expression profiles
  - proteomics data
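Counting how much of a proteome is involved in a process means counting gene products annotated to that term or to any term below it, since by the True Path Rule an annotation to a child also holds for its parents. The minimal Python sketch below does this for a hypothetical five-gene proteome (gene1 to gene5) with simplified, assumed parent links in which is_a and part_of are collapsed; neither the links nor the annotations come from a GO release.

```python
# Simplified, assumed parent links (is_a and part_of collapsed), for illustration only.
PARENTS = {
    "gluconeogenesis": ["carbohydrate metabolic process"],
    "glycolysis": ["carbohydrate metabolic process"],
    "carbohydrate metabolic process": ["metabolic process"],
    "mitosis": ["cell cycle"],
}

# Hypothetical proteome: gene product -> terms it is directly annotated to.
ANNOTATIONS = {
    "gene1": {"gluconeogenesis"},
    "gene2": {"glycolysis"},
    "gene3": {"carbohydrate metabolic process"},
    "gene4": {"mitosis"},
    "gene5": set(),  # not yet annotated
}


def with_ancestors(terms, parents):
    """Expand a set of direct annotations upward to all implied terms."""
    result = set(terms)
    stack = list(terms)
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def fraction_involved(term, annotations, parents):
    """Fraction of gene products annotated to `term` directly or via a descendant."""
    hits = sum(1 for terms in annotations.values() if term in with_ancestors(terms, parents))
    return hits / len(annotations)
```

With this toy data, 3 of the 5 gene products fall under carbohydrate metabolic process (0.6), while only gene1 counts for gluconeogenesis (0.2). Counting only direct annotations would report 0.2 for carbohydrate metabolic process and undercount.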
Microarray analysis
Whole genome analysis (J. D. Munkvold et al., 2004)
http://www.geneontology.org/GO.tools
Beyond GO – Open Biomedical Ontologies
• Orthogonal to existing ontologies to facilitate combinatorial approaches
  - Share unique identifier space
  - Include definitions
• Anatomies
• Cell Types
• Sequence Attributes
• Temporal Attributes
• Phenotypes
• Diseases
• More…
http://obo.sourceforge.net