Transcript Database Modeling in Bioinformatics
The Gene Ontology project
Jane Lomax
Ontology (for our purposes)
• “an explicit specification of some topic” – Stanford Knowledge Systems Lab
• Includes:
  – a vocabulary of terms (names)
  – defined logical relationships to each other
GO Project Goals:
• Compile structured vocabularies describing aspects of molecular biology
• Describe gene products using vocabulary terms (annotation)
• Develop tools:
  – to query and modify the vocabularies and annotations
  – annotation tools for curators
The Three Ontologies
• Molecular Function: elemental activity or task
  – nuclease, DNA binding, transcription factor
• Biological Process: broad objective or goal
  – mitosis, signal transduction, metabolism
• Cellular Component: location or complex
  – nucleus, ribosome, origin recognition complex
DAG Structure
Directed acyclic graph: each child may have one or more parents
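A minimal sketch of the DAG idea, assuming the ontology is held as a plain child-to-parents dictionary. The term names (`root`, `term_a`, and so on) and the helper functions are hypothetical placeholders, not real GO terms or part of any GO tool.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical terms: each key is a child, each value is its set of parents.
PARENTS = {
    "root": set(),
    "term_a": {"root"},
    "term_b": {"root"},
    "term_c": {"term_a", "term_b"},  # one child with two parents
}


def is_acyclic(parents):
    """Return True if the child -> parents mapping contains no cycle."""
    try:
        # static_order() raises CycleError once iterated if a cycle exists.
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


def terms_with_multiple_parents(parents):
    """List the terms that a strict tree could not represent."""
    return sorted(term for term, ps in parents.items() if len(ps) > 1)
```

Calling `terms_with_multiple_parents(PARENTS)` picks out `term_c`, the case that separates a DAG from a strict hierarchy.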
The True Path Rule
Every path from a node back to the root must be biologically accurate
[Figure: True Path Rule example in the GO process ontology. Panel titles: "True Path Rule", "New GO Terms". Node labels: chitin catabolism, chitin biosynthesis, cell wall chitin biosynthesis, cuticle synthesis, cuticle chitin metabolism, cuticle chitin biosynthesis, cuticle chitin catabolism.]
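One way to read the True Path Rule: annotating a gene product to a term also asserts every ancestor of that term, so each ancestor has to be true for that gene product. The sketch below uses the chitin term names from the slide. The edges are assumptions made for illustration (the figure's actual layout is not recoverable here), and "chitin metabolism" is an assumed grouping term.

```python
# Assumed "before" structure (child -> parents): chitin terms sit under cuticle synthesis.
BEFORE = {
    "cuticle synthesis": set(),
    "chitin metabolism": {"cuticle synthesis"},
    "chitin biosynthesis": {"chitin metabolism"},
    "chitin catabolism": {"chitin metabolism"},
}

# Assumed "after" structure: general chitin terms are separated from cuticle-specific ones.
AFTER = {
    "cuticle synthesis": set(),
    "chitin metabolism": set(),
    "chitin biosynthesis": {"chitin metabolism"},
    "chitin catabolism": {"chitin metabolism"},
    "cuticle chitin metabolism": {"chitin metabolism"},
    "cuticle chitin biosynthesis": {
        "cuticle chitin metabolism", "chitin biosynthesis", "cuticle synthesis",
    },
    "cuticle chitin catabolism": {"cuticle chitin metabolism", "chitin catabolism"},
    "cell wall chitin biosynthesis": {"chitin biosynthesis"},
}


def implied_terms(term, parents):
    """Return every ancestor of term; an annotation to term asserts all of them."""
    implied = set()
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in implied:
                implied.add(parent)
                stack.append(parent)
    return implied
```

Under the assumed `BEFORE` edges, an annotation to "chitin biosynthesis" implies "cuticle synthesis", which would be wrong for an organism that makes chitin for a cell wall. Under `AFTER`, "cell wall chitin biosynthesis" no longer carries that implication.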
Relationship Types
• is-a: subclass; a is a type of b
• part-of:
  – physical part of (component)
  – subprocess of (process)
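The two relationship types can be sketched as labelled edges. The `Relation` and `Edge` types and the `parents_of` helper are hypothetical names for this example, and the two edges are illustrative rather than an excerpt of the ontology.

```python
from enum import Enum
from typing import NamedTuple


class Relation(Enum):
    IS_A = "is_a"        # subclass: child is a type of parent
    PART_OF = "part_of"  # child is a component or subprocess of parent


class Edge(NamedTuple):
    child: str
    relation: Relation
    parent: str


# Illustrative edges only: one term with a parent of each relationship type.
EDGES = [
    Edge("nuclear chromosome", Relation.IS_A, "chromosome"),
    Edge("nuclear chromosome", Relation.PART_OF, "nucleus"),
]


def parents_of(term, relation, edges=EDGES):
    """Return the parents of term reached through one relationship type."""
    return [e.parent for e in edges if e.child == term and e.relation is relation]
```

Keeping the relationship on the edge rather than on the term lets the same child have an is-a parent and a part-of parent at once.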
What GO is NOT:
• Not a way to unify biological databases
• Not a dictated standard
• Does not define evolutionary relationships
• Additional ontologies needed to model biology and experimentation
Terms outside the Scope of GO
• Names of gene products
• Protein domains
• Protein sequence features
• Phenotypes; diseases
• Anatomical terms (except as part of terms generated by cross-products)
Advantages of GO
• Cross-species comparisons
  – already used by an increasing number of databases
• More comprehensive
  – many terms per gene product
  – not a strict hierarchy: many-to-many relationships possible
• Simplify querying
  – uses restricted vocabulary developed by curators and annotators
• Use of evidence codes
Annotation Features:
• Database object: gene or gene product
• GO term ID
• Reference
  – publication or computational method
• Evidence supporting annotation
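The four annotation features can be sketched as a small record type. The class name, the field names, and the validation rules are assumptions for illustration, not the consortium's file format. The evidence code set is a small illustrative subset.

```python
import re
from dataclasses import dataclass

# Assumed identifier shape: "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")

# Illustrative subset of evidence codes, not a complete list.
EVIDENCE_CODES = {"IDA", "IMP", "ISS", "TAS", "IEA"}


@dataclass(frozen=True)
class Annotation:
    db_object: str  # gene or gene product identifier
    go_id: str      # GO term ID
    reference: str  # publication or computational method
    evidence: str   # evidence code supporting the annotation

    def __post_init__(self):
        # Reject malformed records at construction time.
        if not GO_ID_PATTERN.fullmatch(self.go_id):
            raise ValueError(f"malformed GO term ID: {self.go_id!r}")
        if self.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence!r}")
```

For example, `Annotation("GENE_X", "GO:0005634", "REF:placeholder", "IDA")` builds a record, where `GENE_X` and `REF:placeholder` are made-up values.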
DAG Structure
Annotate to any level within DAG
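Because annotations can attach at any level, a query for a term usually needs to gather annotations made to its descendants as well. A minimal sketch, assuming hypothetical term and gene names and a parent-to-children dictionary:

```python
# Hypothetical ontology fragment (parent -> children); term_c has two parents.
CHILDREN = {
    "root": {"term_a", "term_b"},
    "term_a": {"term_c"},
    "term_b": {"term_c"},
    "term_c": set(),
}

# Direct annotations only: gene products attached at different depths.
DIRECT = {
    "term_a": {"gene_1"},
    "term_c": {"gene_2", "gene_3"},
}


def gene_products_for(term):
    """Return gene products annotated to term or to any of its descendants."""
    found = set(DIRECT.get(term, set()))
    for child in CHILDREN[term]:
        found |= gene_products_for(child)
    return found
```

Using a set means a gene product reached through two different paths (here via `term_a` and `term_b`) is counted once.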
GOA: GO Annotation at EBI
• GO Annotations for:
  – Human proteins
  – All SWISS-PROT/TrEMBL proteins
• Annotation sets for completely sequenced proteomes
GOA: GO Annotation at EBI
• Methods:
  – Manual curation
  – SWISS-PROT keyword <-> GO term mapping
  – EC number <-> GO term mapping
  – InterPro entry <-> GO term mapping
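The three mapping methods share one shape: look up a feature already attached to a protein (keyword, EC number, or InterPro entry) in a table and emit the mapped GO terms. The sketch below is an assumption about how that could be coded, not GOA's pipeline. The table contents, the GO IDs, and the function name are placeholders, and tagging the results with the electronic-annotation evidence code `IEA` is also an assumption.

```python
# Hypothetical mapping table: keys and GO IDs are placeholders, not real mappings.
KEYWORD_TO_GO = {
    "keyword_x": ["GO:9999991"],
    "keyword_y": ["GO:9999992", "GO:9999993"],
}


def annotate_by_mapping(protein_id, source_keys, mapping, method):
    """Turn a protein's existing keys into (protein, GO ID, method, evidence) tuples."""
    annotations = []
    for key in source_keys:
        # Keys without a mapping are skipped rather than guessed.
        for go_id in mapping.get(key, []):
            annotations.append((protein_id, go_id, method, "IEA"))
    return annotations
```

The same function would serve an EC-number or InterPro table by passing a different `mapping` and `method` label.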
GO Tools
• Browsers:
  – DAG-Edit
  – AmiGO
  – “QuickGO” at EBI
  – EP:GO browser
The Future of GO:
• Developmental processes: DAG cross products with anatomy terms
• Physiological processes
• Relational database
• Expand relationship types
www.geneontology.org
• FlyBase & Berkeley Drosophila Genome Project
• Saccharomyces Genome Database
• Mouse Genome Informatics
• The Arabidopsis Information Resource
• Swiss-Prot/TrEMBL/InterPro
• Pathogen Sequencing Unit (Sanger Institute)
• PomBase (Sanger Institute)
• Rat Genome Database
• Genome Knowledge Base (CSHL)
• The Institute for Genomic Research
• WormBase
• DictyBase
• Compugen, Inc

The Gene Ontology Consortium is supported by NHGRI grant HG02273 (R01).
The Gene Ontology project thanks AstraZeneca for financial support.
The Stanford group acknowledges a gift from Incyte Genomics.