Database Modeling in Bioinformatics

The Gene Ontology project
Jane Lomax

Ontology (for our purposes)

• “an explicit specification of some topic” (Stanford Knowledge Systems Lab)
• Includes:
  – a vocabulary of terms (names)
  – defined logical relationships to each other

GO Project Goals:

• Compile structured vocabularies describing aspects of molecular biology
• Describe gene products using vocabulary terms (annotation)
• Develop tools:
  – to query and modify the vocabularies and annotations
  – annotation tools for curators

The Three Ontologies

• Molecular Function: elemental activity or task
  – nuclease, DNA binding, transcription factor
• Biological Process: broad objective or goal
  – mitosis, signal transduction, metabolism
• Cellular Component: location or complex
  – nucleus, ribosome, origin recognition complex

DAG Structure

Directed acyclic graph: each child may have one or more parents
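
A minimal sketch of the idea in Python, assuming the ontology is stored as a mapping from each term to its parents. The term names (root, A, B, C, D) and both helper functions are hypothetical and only illustrate "one or more parents" and "acyclic".

```python
from collections import deque

# Hypothetical toy ontology: each term maps to its list of parents.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],  # one child, two parents: allowed in a DAG, not in a tree
    "D": ["C"],
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    queue = deque(parents[term])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return seen


def is_acyclic(parents):
    """Return True if no term is its own ancestor (depth-first search)."""
    state = {}  # term -> "visiting" or "done"

    def visit(term):
        if state.get(term) == "done":
            return True
        if state.get(term) == "visiting":
            return False  # we came back to a term on the current path
        state[term] = "visiting"
        ok = all(visit(p) for p in parents[term])
        state[term] = "done"
        return ok

    return all(visit(t) for t in parents)
```

Here `ancestors("D", PARENTS)` collects C, A, B and root, because D inherits both of C's parent lines.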

The True Path Rule

Every path from a node back to the root must be biologically accurate
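
One way to make the rule checkable is to list every path from a term to the root and flag paths that pass through a term a curator has judged not applicable to the organism. The sketch below uses chitin-related term names from this deck, but the edges in `BEFORE` and `AFTER`, the root name "process", and both function names are assumptions made for illustration.

```python
# Assumed edges (child -> parents); only the term names come from the slides.
BEFORE = {
    "process": [],
    "cuticle synthesis": ["process"],
    "chitin biosynthesis": ["cuticle synthesis"],
}

AFTER = {
    "process": [],
    "cuticle synthesis": ["process"],
    "chitin biosynthesis": ["process"],
    "cell wall chitin biosynthesis": ["chitin biosynthesis"],
    "cuticle chitin biosynthesis": ["chitin biosynthesis", "cuticle synthesis"],
}


def paths_to_root(term, parents):
    """Return every path from `term` up to a root, as lists of term names."""
    if not parents[term]:
        return [[term]]
    return [
        [term] + path
        for parent in parents[term]
        for path in paths_to_root(parent, parents)
    ]


def violations(term, parents, not_applicable):
    """Return the paths from `term` that pass through a non-applicable term."""
    return [
        path
        for path in paths_to_root(term, parents)
        if any(t in not_applicable for t in path)
    ]


# Hypothetical curator judgement for an organism that has no cuticle.
NO_CUTICLE = {"cuticle synthesis"}

# Old structure: annotating to "chitin biosynthesis" implies cuticle synthesis.
print(violations("chitin biosynthesis", BEFORE, NO_CUTICLE))
# New structure: the more specific cell wall term has only accurate paths.
print(violations("cell wall chitin biosynthesis", AFTER, NO_CUTICLE))
```

Under these assumed edges the first call reports one offending path and the second reports none, which is the motivation for splitting a term into more specific children.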

True Path Rule

[Figure: GO process diagram showing the terms chitin catabolism and chitin biosynthesis]

New GO Terms

[Figure: GO process diagram showing the terms cell wall chitin biosynthesis, cuticle synthesis, chitin biosynthesis, cuticle chitin metabolism, cuticle chitin biosynthesis, and cuticle chitin catabolism]

Relationship Types

• is-a
  – subclass; a is a type of b
• part-of
  – physical part of (component)
  – subprocess of (process)
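
A small sketch of how typed relationships might be stored and queried, assuming each edge is a (child, relationship, parent) triple. The edges, the term "organelle", and the function names are hypothetical; they are not taken from the GO files.

```python
# Hypothetical (child, relationship, parent) triples, for illustration only.
EDGES = [
    ("origin recognition complex", "part_of", "nucleus"),
    ("nucleus", "is_a", "organelle"),
    ("ribosome", "is_a", "organelle"),
]


def parents_of(term, edges, relations=("is_a", "part_of")):
    """Return (relationship, parent) pairs for `term`, limited to `relations`."""
    return [
        (rel, parent)
        for child, rel, parent in edges
        if child == term and rel in relations
    ]


def ancestors(term, edges, relations=("is_a", "part_of")):
    """Follow only the chosen relationship types upward from `term`."""
    found = []
    stack = [term]
    while stack:
        current = stack.pop()
        for _, parent in parents_of(current, edges, relations):
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


# Following both relationship types reaches "nucleus" and then "organelle".
print(ancestors("origin recognition complex", EDGES))
# Following is_a alone finds nothing: the complex is part of the nucleus,
# not a type of nucleus.
print(ancestors("origin recognition complex", EDGES, relations=("is_a",)))
```

Keeping the relationship on the edge lets a query choose whether "part of" should count as an ancestor or not.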

What GO is NOT:

• Not a way to unify biological databases
• Not a dictated standard
• Does not define evolutionary relationships
• Additional ontologies needed to model biology and experimentation

Terms outside the Scope of GO

• Names of gene products
• Protein domains
• Protein sequence features
• Phenotypes; diseases
• Anatomical terms (except as part of terms generated by cross-products)

Advantages of GO

• Cross-species comparisons
  – already used by an increasing number of databases
• More comprehensive
  – many terms per gene product
  – not a strict hierarchy: many-to-many relationships possible
• Simplify querying
  – uses restricted vocabulary developed by curators and annotators
• Use of evidence codes

Annotation Features:

• Database object: gene or gene product
• GO term ID
• Reference
  – publication or computational method
• Evidence supporting annotation
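
The four features above map naturally onto a small record type. This is only a sketch: the class and field names are hypothetical, the example values are placeholders rather than real identifiers, and the check assumes GO term IDs have the form "GO:" followed by seven digits.

```python
import re
from dataclasses import dataclass

# Assumed ID format: "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class Annotation:
    db_object: str  # gene or gene product identifier
    go_id: str      # GO term ID
    reference: str  # publication or computational method
    evidence: str   # evidence code supporting the annotation

    def __post_init__(self):
        # Reject values that do not look like a GO term ID.
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"not a GO term ID: {self.go_id!r}")


# Placeholder values; "IDA" is the evidence code "Inferred from Direct Assay".
example = Annotation(
    db_object="GENE1",
    go_id="GO:0000000",
    reference="PMID:0000000",
    evidence="IDA",
)
```

Making the record immutable (`frozen=True`) means an annotation can be used as a set member or dictionary key when de-duplicating.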

DAG Structure

Annotate to any level within DAG
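
Because an annotation can sit at any depth, a query for a general term should also return gene products annotated to its more specific descendants. A minimal sketch, assuming a child-to-parents mapping; the edges, gene names, and function names are hypothetical.

```python
# Assumed edges (child -> parents) and annotations, for illustration only.
PARENTS = {
    "biological process": [],
    "metabolism": ["biological process"],
    "chitin biosynthesis": ["metabolism"],
}

ANNOTATIONS = [
    ("geneA", "chitin biosynthesis"),  # annotated to a specific term
    ("geneB", "metabolism"),           # annotated to a general term
]


def ancestors(term, parents):
    """Return all ancestors of `term` (recursive union over its parents)."""
    result = set()
    for parent in parents[term]:
        result.add(parent)
        result |= ancestors(parent, parents)
    return result


def gene_products_for(term, annotations, parents):
    """Gene products annotated to `term` or to any of its descendants."""
    return sorted(
        {
            gene
            for gene, annotated_term in annotations
            if annotated_term == term or term in ancestors(annotated_term, parents)
        }
    )


print(gene_products_for("metabolism", ANNOTATIONS, PARENTS))
# -> ['geneA', 'geneB']
print(gene_products_for("chitin biosynthesis", ANNOTATIONS, PARENTS))
# -> ['geneA']
```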

GOA: GO Annotation at EBI

• GO Annotations for:
  – Human proteins
  – All SWISS-PROT/TrEMBL proteins
  – Annotation sets for completely sequenced proteomes

GOA: GO Annotation at EBI

• Methods:
  – Manual curation
  – SWISS-PROT keyword <-> GO term mapping
  – EC number <-> GO term mapping
  – InterPro entry <-> GO term mapping
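
The mapping-based methods can be pictured as a lookup table applied to each protein's existing keywords. The sketch below is not GOA's actual pipeline: the table contents, GO IDs, protein name, and function name are placeholders, and tagging the results with the evidence code "IEA" (Inferred from Electronic Annotation) is an assumption about how such annotations would be labelled.

```python
# Placeholder mapping table: keyword -> list of GO term IDs (not real mappings).
KEYWORD_TO_GO = {
    "keyword one": ["GO:0000000"],
    "keyword two": ["GO:0000000", "GO:9999999"],
}


def annotate_from_keywords(protein, keywords, mapping, source="keyword mapping"):
    """Return (protein, GO ID, reference, evidence) tuples for mapped keywords.

    Keywords without an entry in `mapping` are skipped.
    """
    return [
        (protein, go_id, source, "IEA")
        for keyword in keywords
        for go_id in mapping.get(keyword, [])
    ]


rows = annotate_from_keywords(
    "PROTEIN1", ["keyword two", "unmapped keyword"], KEYWORD_TO_GO
)
for row in rows:
    print(row)
```

The same function shape would work for EC numbers or InterPro entries by swapping in a different mapping table.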

GO Tools

• Browsers:
  – DAG-Edit
  – AmiGO
  – “QuickGO” at EBI
  – EP:GO browser

The Future of GO:

• Developmental processes: DAG cross products with anatomy terms
• Physiological processes
• Relational database
• Expand relationship types

www.geneontology.org

• FlyBase & Berkeley Drosophila Genome Project
• Saccharomyces Genome Database
• Mouse Genome Informatics
• The Arabidopsis Information Resource
• Swiss-Prot/TrEMBL/InterPro
• Pathogen Sequencing Unit (Sanger Institute)
• PomBase (Sanger Institute)
• Rat Genome Database
• Genome Knowledge Base (CSHL)
• The Institute for Genomic Research
• WormBase
• DictyBase
• Compugen, Inc

The Gene Ontology Consortium is supported by NHGRI grant HG02273 (R01). The Gene Ontology project thanks AstraZeneca for financial support. The Stanford group acknowledges a gift from Incyte Genomics.