Transcript Slide 1

GENE ONTOLOGY FOR THE NEWBIES

Suparna Mundodi, PhD

The Arabidopsis Information Resource (TAIR), Stanford, CA

The Gene Ontologies

A Common Language for Annotation of Genes from Yeast, Flies and Mice …and Plants and Worms …and Humans …and anything else!

Outline of Topics

• Introduction to the Gene Ontologies (GO)
• Annotations to GO terms
• GO Tools
• Applications of GO

Gene Ontology

• Gene annotation system
• Controlled vocabulary that can be applied to all organisms
• Used to describe gene products

What’s in a name?

• What is a cell?

[Figure: five images, each labeled "Cell"]

Image from http://microscopy.fsu.edu

Bud initiation?

= bud initiation sensu Metazoa
= bud initiation sensu Saccharomyces
= bud initiation sensu Viridiplantae

What’s in a name?

• The same name can be used to describe different concepts

What’s in a name?

• Glucose synthesis
• Glucose biosynthesis
• Glucose formation
• Glucose anabolism
• Gluconeogenesis

All refer to the process of making glucose from simpler components

What’s in a name?

• The same name can be used to describe different concepts
• A concept can be described using different names
• Comparison is difficult, in particular across species or across databases

What is the Gene Ontology?

A (part of the) solution:
• A controlled vocabulary that can be applied to all organisms
• Used to describe gene products (proteins and RNA) in any organism

How does GO work?

What information might we want to capture about a gene product?

What does the gene product do?

Why does it perform these activities?

Where does it act?

The 3 Gene Ontologies

Molecular Function = elemental activity/task
• the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity

Biological Process = biological goal or objective
• broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions

Cellular Component = location or complex
• subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

Example: Gene Product = hammer

Function (what)            Process (why)
Drive nail (into wood)     Carpentry
Drive stake (into soil)    Gardening
Smash roach                Pest Control
Clown’s juggling object    Entertainment

Ontology Structure

Ontologies can be represented as graphs, where the nodes are connected by edges
• Nodes = concepts in the ontology
• Edges = relationships between the concepts

[Diagram: nodes connected by an edge]

Ontology Structure

• The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG)
• Terms can have more than one parent and zero, one or more children
• Terms are linked by two relationships
  - is-a
  - part-of

Directed Acyclic Graphs (DAG)

[Diagram: a DAG with the nodes protein complex, organelle, mitochondrion, [other organelles], [other protein complexes], and fatty acid beta-oxidation multienzyme complex, joined by is-a and part-of edges]
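A graph like this one can be sketched as a small data structure. The Python below is a minimal illustration, not GO's own representation. The transcript preserves only the node names and the two relationship labels, so the edge assignments here (mitochondrion is-a organelle; the multienzyme complex is-a protein complex and part-of mitochondrion) are an assumption.

```python
from collections import defaultdict

# (child, relationship, parent) triples. The edge assignments are an
# assumption for illustration; the transcript lists only the node names
# and the two relationship labels.
EDGES = [
    ("mitochondrion", "is-a", "organelle"),
    ("fatty acid beta-oxidation multienzyme complex", "is-a", "protein complex"),
    ("fatty acid beta-oxidation multienzyme complex", "part-of", "mitochondrion"),
]


def build_parents(edges):
    """Map each child term to a list of (relationship, parent) pairs."""
    parents = defaultdict(list)
    for child, rel, parent in edges:
        parents[child].append((rel, parent))
    return parents


PARENTS = build_parents(EDGES)

# In a DAG a term may have more than one parent.
for rel, parent in PARENTS["fatty acid beta-oxidation multienzyme complex"]:
    print(rel, parent)
```

Storing the relationship type on each edge keeps is-a and part-of distinguishable when the graph is traversed later.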

Parent-Child Relationships

A child is a subset of a parent’s elements

Nucleus
  - Nucleoplasm
  - Nuclear envelope
  - Nucleolus
  - Chromosome
  - Perinuclear space

The cell component term Nucleus has 5 children

True Path Rule

• The path from a child term all the way up to its top-level parent(s) must always be true

[Diagram: cell, cytoplasm, chromosome, nuclear chromosome, cytoplasmic chromosome, mitochondrial chromosome, and nucleus, linked by is-a and part-of relationships]
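One way to see what the rule demands is to list every ancestor of a term and ask whether each one is a true statement about it. The sketch below assumes a particular reading of the diagram's edges (the transcript does not preserve them) and ignores the difference between is-a and part-of.

```python
# child -> list of parents. The edges are an assumed reading of the
# slide's diagram; relationship types (is-a / part-of) are ignored here.
PARENTS = {
    "mitochondrial chromosome": ["cytoplasmic chromosome"],
    "cytoplasmic chromosome": ["chromosome", "cytoplasm"],
    "nuclear chromosome": ["chromosome", "nucleus"],
    "nucleus": ["cell"],
    "cytoplasm": ["cell"],
    "chromosome": [],
    "cell": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


print(sorted(ancestors("mitochondrial chromosome")))
```

With these assumed edges, a mitochondrial chromosome never reaches nucleus on its way up, so an annotation to it does not imply a nuclear location. If chromosome were instead placed under nucleus, every path through chromosome would claim that, and the rule would be violated.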

What’s in a GO term?

term: gluconeogenesis
id: GO:0006094
definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
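The three fields of a GO term map naturally onto a small record type. This is a minimal sketch: the class name `GOTerm` and its field names are hypothetical, and the only check is that the identifier has the form "GO:" followed by seven digits.

```python
import re
from dataclasses import dataclass

GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")  # "GO:" followed by seven digits


@dataclass(frozen=True)
class GOTerm:
    """Hypothetical record holding the three fields shown on the slide."""
    name: str
    go_id: str
    definition: str

    def __post_init__(self):
        # Reject identifiers that do not look like GO IDs.
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"not a GO identifier: {self.go_id!r}")


gluconeogenesis = GOTerm(
    name="gluconeogenesis",
    go_id="GO:0006094",
    definition=(
        "The formation of glucose from noncarbohydrate precursors, "
        "such as pyruvate, amino acids and glycerol."
    ),
)
print(gluconeogenesis.go_id, gluconeogenesis.name)
```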

Annotation of gene products with GO terms

Mitochondrial P450

Cellular component: mitochondrial inner membrane GO:0005743

Biological process: electron transport GO:0006118

Molecular function: monooxygenase activity GO:0004497
substrate + O2 = CO2 + H2O product

Other gene products annotated to monooxygenase activity (GO:0004497)
 - monooxygenase, DBH-like 1 (mouse)
 - prostaglandin I2 (prostacyclin) synthase (mouse)
 - flavin-containing monooxygenase (yeast)
 - ferulate-5-hydroxylase 1 (Arabidopsis)

Two types of GO Annotations:

• Electronic Annotation
• Manual Annotation

All annotations must:
• be attributed to a source
• indicate what evidence was found to support the GO term-gene/protein association

IEA   Inferred from Electronic Annotation
ISS   Inferred from Sequence Similarity
IEP   Inferred from Expression Pattern
IMP   Inferred from Mutant Phenotype
IGI   Inferred from Genetic Interaction
IPI   Inferred from Physical Interaction
IDA   Inferred from Direct Assay
RCA   Inferred from Reviewed Computational Analysis
TAS   Traceable Author Statement
NAS   Non-traceable Author Statement
IC    Inferred by Curator
ND    No biological Data available
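The evidence codes and the two annotation requirements (a source and an evidence code) can be sketched as a lookup table plus a small check. The record layout, the function names, the evidence codes assigned to the example records, and the `example-ref-*` source strings are all hypothetical. Only the code meanings and the GO IDs come from the slides.

```python
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "ISS": "Inferred from Sequence Similarity",
    "IEP": "Inferred from Expression Pattern",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IDA": "Inferred from Direct Assay",
    "RCA": "Inferred from Reviewed Computational Analysis",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}


def validate(annotation):
    """Check the two requirements: a source and a known evidence code."""
    if not annotation.get("source"):
        raise ValueError("annotation must be attributed to a source")
    if annotation.get("evidence") not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {annotation.get('evidence')!r}")
    return annotation


def is_electronic(annotation):
    """IEA marks an electronic annotation; every other code is manual."""
    return annotation["evidence"] == "IEA"


# Hypothetical records: the evidence codes and sources are made up.
annotations = [
    {"gene_product": "Mitochondrial P450", "go_id": "GO:0004497",
     "evidence": "IDA", "source": "example-ref-1"},
    {"gene_product": "Mitochondrial P450", "go_id": "GO:0005743",
     "evidence": "IEA", "source": "example-ref-2"},
]

manual = [a for a in map(validate, annotations) if not is_electronic(a)]
print([a["go_id"] for a in manual])
```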

Ensuring Stability in a Dynamic Ontology

• Terms become obsolete when they are removed or redefined
• GO IDs are never deleted
• For each obsolete term, a comment is added to explain why the term is now obsolete

[Diagram: Biological Process, Molecular Function, and Cellular Component, each with an obsolete counterpart: Obsolete Biological Process, Obsolete Molecular Function, Obsolete Cellular Component]

Why modify the GO?

• GO reflects current knowledge of biology
• Adding new organisms makes existing term arrangements incorrect
• Not everything was perfect from the outset

What can scientists do with GO?

• Access gene product functional information
• Find how much of a proteome is involved in a process/function/component in the cell
• Map GO terms and incorporate manual annotations into own databases
• Provide a link between biological knowledge and …
  - gene expression profiles
  - proteomics data
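The second use, finding how much of a proteome is involved in a process, comes down to counting distinct gene products per GO term. The sketch below is a simplified illustration. The gene product names and their annotations are hypothetical, and it counts direct annotations only, whereas a full analysis would also count annotations to the term's child terms.

```python
def fraction_annotated(annotations, go_id, proteome_size):
    """Fraction of the proteome with a direct annotation to go_id."""
    products = {product for product, term in annotations if term == go_id}
    return len(products) / proteome_size


# Hypothetical (gene product, GO ID) pairs for a four-protein "proteome".
ANNOTATIONS = [
    ("protein_A", "GO:0006094"),  # gluconeogenesis
    ("protein_B", "GO:0006094"),
    ("protein_B", "GO:0004497"),  # monooxygenase activity
    ("protein_C", "GO:0004497"),
]

share = fraction_annotated(ANNOTATIONS, "GO:0006094", proteome_size=4)
print(f"{share:.0%} of gene products annotated to gluconeogenesis")
```

Collecting gene products into a set before counting means a product annotated twice to the same term, for example with two evidence codes, is counted once.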

Microarray analysis
Whole genome analysis (J. D. Munkvold et al., 2004)

http://www.geneontology.org/GO.tools

Beyond GO: Open Biomedical Ontologies

• Orthogonal to existing ontologies to facilitate combinatorial approaches
  - Share unique identifier space
  - Include definitions
• Anatomies
• Cell Types
• Sequence Attributes
• Temporal Attributes
• Phenotypes
• Diseases
• More….

http://obo.sourceforge.net