
The Gene Ontology Project:

Content for the Semantic Web

GO Project Goals • Compile structured vocabularies describing aspects of molecular biology • Describe gene products using vocabulary terms (annotation) • Develop tools to query and modify the vocabularies and annotations, including annotation tools for curators

GO Data GO provides two bodies of data: • Terms with definitions and cross references • Gene product annotations with supporting data

The Three Ontologies • Molecular Function: elemental activity or task (e.g. nuclease, DNA binding, transcription factor) • Biological Process: broad objective or goal (e.g. mitosis, signal transduction, metabolism) • Cellular Component: location or complex (e.g. nucleus, ribosome, origin recognition complex)

DAG Structure Directed acyclic graph: each child may have one or more parents
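Unlike a tree, a DAG lets a child term have several parents. A minimal sketch of one way to store such a graph, assuming each term simply maps to the set of its parents (the term names and the `parents` dictionary are hypothetical illustrations, not GO's file format), with a depth-first check that no cycle exists:

```python
from typing import Dict, Set

# Hypothetical fragment: each term maps to the set of its parents.
# "chitin biosynthesis" has two parents, which a tree could not express.
parents: Dict[str, Set[str]] = {
    "biological_process": set(),
    "metabolism": {"biological_process"},
    "biosynthesis": {"metabolism"},
    "chitin metabolism": {"metabolism"},
    "chitin biosynthesis": {"chitin metabolism", "biosynthesis"},
}


def is_acyclic(graph: Dict[str, Set[str]]) -> bool:
    """Return True if following parent links never loops back onto the current path."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for parent in graph.get(node, set()):
            state = colour.get(parent, WHITE)
            if state == GREY:
                return False  # back edge: the graph contains a cycle
            if state == WHITE and not visit(parent):
                return False
        colour[node] = BLACK
        return True

    # Start a search from every term not already finished.
    return all(colour[n] != WHITE or visit(n) for n in graph)
```

A real loader would build `parents` from the ontology file; the acyclicity check is what distinguishes a valid DAG from an arbitrary graph.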

Relationship Types • is-a: subclass; a is a type of b • part-of: physical part of (component) or subprocess of (process)
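The two relationship types can be stored as labelled edges. The sketch below uses a small hypothetical cellular component fragment (simplified, not copied from GO), records whether each child-parent link is is-a or part-of, and collects every ancestor reachable through either type:

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class Rel(Enum):
    IS_A = "is_a"        # subclass: a is a type of b
    PART_OF = "part_of"  # physical part (component) or subprocess (process)


# Hypothetical fragment: child -> list of (relation, parent) pairs.
edges: Dict[str, List[Tuple[Rel, str]]] = {
    "nucleolus": [(Rel.PART_OF, "nucleus")],
    "nucleus": [(Rel.IS_A, "organelle")],
    "organelle": [(Rel.IS_A, "cellular_component")],
    "cellular_component": [],
}


def ancestors(term: str) -> Set[str]:
    """All terms reachable from `term` by following is_a or part_of links upward."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for _rel, parent in edges.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Keeping the relation label on each edge lets a query choose to follow only is-a links, or both types, depending on the question being asked.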

The True Path Rule Every path from a node back to the root must be biologically accurate
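Because a term can have several parents, checking the true path rule means examining every route to the root, not just one. A minimal sketch that enumerates those paths so each can be reviewed, assuming a hypothetical fragment loosely modelled on the chitin example later in this talk:

```python
from typing import Dict, List, Set

# Hypothetical fragment: child -> parents.
parents: Dict[str, Set[str]] = {
    "biological_process": set(),
    "metabolism": {"biological_process"},
    "cuticle synthesis": {"biological_process"},
    "chitin metabolism": {"metabolism"},
    "cuticle chitin metabolism": {"chitin metabolism", "cuticle synthesis"},
}


def paths_to_root(term: str) -> List[List[str]]:
    """Every path from `term` up to a root; each one must be biologically accurate."""
    ups = sorted(parents.get(term, set()))
    if not ups:
        return [[term]]  # term is itself a root
    return [[term] + path for p in ups for path in paths_to_root(p)]
```

For "cuticle chitin metabolism" this yields one path through "chitin metabolism" and one through "cuticle synthesis"; a curator would confirm that both statements hold for every gene product annotated to the term.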

GO Terms: Associated Data • ID • Text string • Definition with source • Synonyms (optional) • Cross-references (optional)
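The associated data listed above map naturally onto a record type. A minimal sketch, assuming a hypothetical `GOTerm` class (not an official GO API) and the familiar `GO:` plus seven digits ID form; the definition and source strings in any example are placeholders:

```python
import re
from dataclasses import dataclass, field
from typing import List

# GO IDs take the form "GO:" followed by seven digits, e.g. GO:0005634.
GO_ID = re.compile(r"GO:[0-9]{7}")


@dataclass
class GOTerm:
    id: str
    name: str                 # the text string
    definition: str
    definition_source: str    # where the definition comes from
    synonyms: List[str] = field(default_factory=list)  # optional
    xrefs: List[str] = field(default_factory=list)     # optional, e.g. EC or TC entries

    def __post_init__(self) -> None:
        # Reject IDs that do not match the expected form.
        if not GO_ID.fullmatch(self.id):
            raise ValueError(f"malformed GO ID: {self.id!r}")
```

Usage: `GOTerm("GO:0005634", "nucleus", "(definition omitted)", "(source omitted)")` builds a term with empty synonym and cross-reference lists.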

GO Terms: Cross-References • Enzyme Commission (EC) • Transporter Classification (TC) • University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) • MetaCyc

GO Annotation • Association between gene product and applicable GO terms • Provided by member databases • Made by manual or automated methods

GO Annotation: Data • Database object: gene or gene product • GO term ID • Reference: publication or computational method • Evidence supporting the annotation
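Each annotation bundles the four pieces of data listed above. A minimal sketch using a hypothetical `Annotation` record; the gene identifiers and references are placeholders. It separates manual from automated annotations by evidence code, relying on the GO convention that IEA (inferred from electronic annotation) marks annotations not reviewed by a curator:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    db_object: str   # gene or gene product identifier
    go_id: str       # GO term ID
    reference: str   # publication or computational method
    evidence: str    # evidence code, e.g. "IDA" or "IEA"


def manual_only(annotations: List[Annotation]) -> List[Annotation]:
    """Keep annotations whose evidence code is not IEA (automated)."""
    return [a for a in annotations if a.evidence != "IEA"]


# Placeholder data for illustration only.
annotations = [
    Annotation("GENE1", "GO:0005634", "PUB:placeholder", "IDA"),
    Annotation("GENE2", "GO:0005634", "METHOD:placeholder", "IEA"),
]
```

Filtering on evidence lets a user decide how much to trust each association, which is why the evidence field travels with every annotation.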

DAG Structure Annotate to any level within DAG
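Since every path to the root is true, an annotation to a specific term also implies every more general term above it. A minimal sketch of that propagation, assuming hypothetical terms and gene products; annotations are made at whatever depth the evidence supports and then rolled up:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical fragment: child -> parents.
parents: Dict[str, Set[str]] = {
    "biological_process": set(),
    "metabolism": {"biological_process"},
    "chitin metabolism": {"metabolism"},
    "chitin biosynthesis": {"chitin metabolism"},
}

# (gene product, term) pairs annotated at different depths.
direct: List[Tuple[str, str]] = [
    ("GENE1", "chitin biosynthesis"),  # specific evidence available
    ("GENE2", "metabolism"),           # only general evidence available
]


def ancestors(term: str) -> Set[str]:
    """All terms above `term` in the DAG."""
    out: Set[str] = set()
    stack = [term]
    while stack:
        for p in parents.get(stack.pop(), set()):
            if p not in out:
                out.add(p)
                stack.append(p)
    return out


def propagate(pairs: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each term to every gene product annotated to it or to a descendant."""
    by_term: Dict[str, Set[str]] = defaultdict(set)
    for gene, term in pairs:
        for t in {term} | ancestors(term):
            by_term[t].add(gene)
    return dict(by_term)
```

A query for "metabolism" then returns both gene products, even though only one was annotated to that term directly.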

The Future of GO • Improve coverage of developmental and physiological processes • Relational database • Support ontology development for additional domains of biology

Terms outside the Scope of GO • Names of gene products • Protein domains • Protein sequence features • Phenotypes; diseases • Anatomical terms (except as part of terms generated by cross-products)

The GOBO Proposal • Global Open Biology Ontologies • Umbrella site for shared genomics and proteomics vocabularies • Present incarnation: subdirectory within GO repository: ftp://ftp.geneontology.org/pub/go/gobo/README

www.geneontology.org

• FlyBase & Berkeley Drosophila Genome Project • Saccharomyces Genome Database • Mouse Genome Informatics • The Arabidopsis Information Resource • Swiss-Prot/TrEMBL/InterPro • Pathogen Sequencing Unit (Sanger Institute) • PomBase (Sanger Institute) • Rat Genome Database • Genome Knowledge Base (CSHL) • The Institute for Genomic Research • WormBase • DictyBase • Compugen, Inc.

The Gene Ontology Consortium is supported by NHGRI grant HG02273 (R01). The Gene Ontology project thanks AstraZeneca for financial support. The Stanford group acknowledges a gift from Incyte Genomics.

Conference: Standards and Ontologies for Functional Genomics (SOFG)

Towards unified ontologies for describing biology and biomedicine

17 – 20 November 2002 Hinxton Hall Conference Centre Hinxton, Cambridge, UK www.ebi.ac.uk/SOFG/

First Standards and Ontologies for Functional Genomics (SOFG)


Keynote Speakers: Ken Buetow, NCI, USA; Win Hide, SANBI, South Africa; Peter Karp, SRI International, USA

Aims and Objectives

• Bring together scientists developing standards and ontologies, including biologists, bioinformaticians and computer scientists

Topics

• Introduction to Ontologies • Tools for building ontologies • GO and related ontologies • Species-specific ontologies • Implementation • Inter-ontology mapping • Ontologies for pathology and toxicology • Chemical ontologies

Structure

• 3 keynote speakers • ~20 invited talks • 10 short talks selected from poster abstracts • Panel discussion • Parallel working groups/tutorials

Programme Committee

Michael Ashburner, University of Cambridge, UK (Chair); Cathy Ball, Stanford University, USA; Mike Bittner, NHGRI, USA; Alvis Brazma, EMBL-EBI, UK; Catherine Brooksbank, EMBL-EBI, UK; Duncan Davidson, MRC HGU, Edinburgh, UK; Liz Ford, EMBL-EBI, UK; Midori Harris, EMBL-EBI, UK; Victor Markowitz, Gene Logic, USA; Helen Parkinson, EMBL-EBI, UK; John Quackenbush, TIGR, USA; Martin Ringwald, The Jackson Laboratory, USA; Steffen Schulze-Kremer, RZPD, Germany; Paul Spellman, U.C. Berkeley, USA; Robert Stevens, University of Manchester, UK; Chris Stoeckert, University of Pennsylvania, USA

URL

http://www.ebi.ac.uk/microarray/General/Events/SOFG/SOFG.html

The True Path Rule: chitin metabolism, before revision
[Figure: DAG fragment with the terms cell wall biosynthesis, chitin metabolism, cuticle synthesis, chitin biosynthesis and chitin catabolism]

The True Path Rule: chitin metabolism, after revision
[Figure: DAG fragment with the terms chitin metabolism, chitin biosynthesis, cuticle synthesis, cuticle chitin metabolism and cuticle chitin biosynthesis]

GOBO Criteria • Open source • Can be instantiated in DAML+OIL or GO syntax • Orthogonal • Shared ID space • Defined terms
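A shared ID space means identifiers from different ontologies can coexist without colliding, typically by giving each ontology its own prefix. A minimal sketch, assuming a hypothetical prefix registry (the `REGISTRY` contents and `parse_id` helper are illustrative, not part of any GOBO specification):

```python
from typing import Dict, Tuple

# Hypothetical registry of ontologies sharing one ID space, keyed by prefix.
REGISTRY: Dict[str, str] = {
    "GO": "Gene Ontology",
    "SO": "Sequence Ontology",
}


def parse_id(curie: str) -> Tuple[str, str]:
    """Split a PREFIX:LOCAL identifier and check that the prefix is registered."""
    prefix, sep, local = curie.partition(":")
    if not sep or not local:
        raise ValueError(f"not in PREFIX:LOCAL form: {curie!r}")
    if prefix not in REGISTRY:
        raise ValueError(f"unregistered prefix: {prefix!r}")
    return prefix, local
```

Because each ontology owns its prefix, two ontologies can both use the local ID 0000001 without ambiguity.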

DAG Cross-Products
[Figure: the terms hexose, glucose and fructose are combined with metabolism, biosynthesis and catabolism to generate terms such as hexose metabolism, hexose biosynthesis, glucose biosynthesis, fructose biosynthesis, hexose catabolism, glucose catabolism, and so on]
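Cross-products generate every combination of two small hierarchies instead of entering each term by hand. A minimal sketch, assuming hypothetical child-to-parent maps for the sugars and processes named above; each generated term's parents are derived by generalising one axis at a time (one plausible inference scheme, not necessarily GO's):

```python
from itertools import product
from typing import Dict, Set

# Two small hypothetical hierarchies, child -> parent.
molecule_parent: Dict[str, str] = {"glucose": "hexose", "fructose": "hexose"}
process_parent: Dict[str, str] = {"biosynthesis": "metabolism", "catabolism": "metabolism"}

molecules = ["hexose", "glucose", "fructose"]
processes = ["metabolism", "biosynthesis", "catabolism"]


def cross_product() -> Dict[str, Set[str]]:
    """Generate '<molecule> <process>' terms and derive each term's parents."""
    terms: Dict[str, Set[str]] = {}
    for mol, proc in product(molecules, processes):
        name = f"{mol} {proc}"
        parents: Set[str] = set()
        if mol in molecule_parent:
            # Generalise the molecule: glucose biosynthesis -> hexose biosynthesis
            parents.add(f"{molecule_parent[mol]} {proc}")
        if proc in process_parent:
            # Generalise the process: glucose biosynthesis -> glucose metabolism
            parents.add(f"{mol} {process_parent[proc]}")
        terms[name] = parents
    return terms
```

Three molecules times three processes give nine terms, and "glucose biosynthesis" ends up with two parents, "hexose biosynthesis" and "glucose metabolism", which is exactly the multi-parent structure a DAG allows.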

Some GOBO Ontologies
[Figure: draft domains and ontologies, with labels including gene, gene_attribute, gene_structure, gene_variation, SO, ME, gene_product, gene_product_attribute, molecular_function, GO, protein_family, INTERPRO, phenotype, mutant phenotype and anatomy]
For the complete current draft, see ftp://ftp.geneontology.org/pub/go/gobo/README

What GO is NOT: • Not a way to unify biological databases • Not a dictated standard • Does not define evolutionary relationships • Additional ontologies needed to model biology and experimentation


Using GO Annotation: Example Workflow
[Figure: example workflow; labels include text, ID, synonyms, definition and cross-reference]