Transcript Document

Using Ontologies to Annotate Phenotypic Data
Mouse Genome Informatics
www.informatics.jax.org
Janan T. Eppig
December 2008
Human FOXN1
forkhead box N1
T-CELL IMMUNODEFICIENCY,
CONGENITAL ALOPECIA, AND
NAIL DYSTROPHY
Frank J, et al. Nature 398, 473 - 474 (1999)
Mouse Foxn1. Homozygous
“nude” mouse. One of 8 known
phenotypic mutations in mouse
for the forkhead box N1 gene.
Data Integration
Sources:
• Primary literature
• Centers: mutagenesis, gene trap, etc.
• Data loads: GenBank, SNPs, clone collections, UniProt, RIKEN, etc.
• Electronic submissions (individual labs)
Integration:
• Gather data from multiple sources
• Factor out common objects
• Assemble integrated objects
Processing, QC, and curation
Integration is hard: it is not just a matter of combining data sources.
• Data from multiple sources can be of differing quality
• The same data can enter the system via various paths
• Naming conventions may or may not follow standards
• Some data sources don’t maintain unique accession numbers (or allow them to change)
• Periodic updates from data sources can cause problems
– if objects have disappeared (or reappeared)
– if objects have split in two
Data integration is hard
• “Bucketizing” establishes the types of correspondence between objects in the input sets.
• Allows immediate incorporation of data that correspond 1:1.
• Sorts conflicting data into bins that allow prioritization for curator resolution.
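The bucketizing step can be sketched as follows, assuming each input set is a list of object IDs and that candidate correspondences arrive as (left, right) ID pairs (for example, two records sharing an accession number). Objects joined by any chain of links form one group, and each group is binned by how many objects each side contributes. The function name and bucket labels are hypothetical; this illustrates the idea and is not MGI's loader code.

```python
from collections import defaultdict


def _label(n_left, n_right):
    """Name a bucket by how many objects each side contributes to a group."""
    if n_left == 1 and n_right == 1:
        return "1:1"
    if n_left == 0:
        return "0:1"  # object present only in the right-hand set
    if n_right == 0:
        return "1:0"  # object present only in the left-hand set
    if n_left == 1:
        return "1:n"
    if n_right == 1:
        return "n:1"
    return "n:m"


def bucketize(left_ids, right_ids, pairs):
    """Bin groups of linked objects from two input sets by correspondence type.

    Hypothetical helper: returns {bucket_label: [(left_ids, right_ids), ...]}.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Tag nodes by side so identical IDs in the two sets stay distinct.
    for ident in left_ids:
        find(("L", ident))
    for ident in right_ids:
        find(("R", ident))
    for left, right in pairs:
        parent[find(("L", left))] = find(("R", right))  # union the two groups

    # Collect the left and right members of each connected group.
    groups = defaultdict(lambda: (set(), set()))
    for side, ident in list(parent):
        lefts, rights = groups[find((side, ident))]
        (lefts if side == "L" else rights).add(ident)

    buckets = defaultdict(list)
    for lefts, rights in groups.values():
        buckets[_label(len(lefts), len(rights))].append((sorted(lefts), sorted(rights)))
    return dict(buckets)


if __name__ == "__main__":
    result = bucketize(
        ["a1", "a2", "a3", "a4", "a5"],
        ["b1", "b2", "b3", "b4", "b5"],
        [("a1", "b1"), ("a2", "b2"), ("a2", "b3"), ("a3", "b4"), ("a4", "b4")],
    )
    for label, members in sorted(result.items()):
        print(label, members)
```

In this sketch only the "1:1" bucket would be loaded directly; every other bucket is a candidate for the curator queue.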
Annotation Pipeline
(Diagram: Literature & Loads; New Gene, Strain or Sequence?; Controlled Vocabularies; Evidence & Citation; co-curation of shared objects and concepts.)
• Data Acquisition
• Object Identity
• Standardizations
• Data Associations
• Integration with other
bioinformatics resources
Making semantic sense
Controlled vocabularies/nomenclatures
• Strains
• Genes
• Alleles (phenotypic or variant)
• Classes of genetic markers
• Types of mutations
• Types of assays
• Developmental stages
• Tissues
• Clone libraries
• ES cell lines
• and more…
… organized as lists or simple hierarchies
Semantics plus relationship data
Ontologies/structured vocabularies
• Gene Ontology (GO)
• Molecular function
• Biological process
• Cellular component
• Mouse Anatomy (MA)
• Embryonic
• Adult
• Mammalian Phenotype (MP)
• Sequence Ontology (SO)
• Trait Ontology
… organized as directed acyclic graphs (DAGs)
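What distinguishes a DAG from a simple hierarchy is that a term may have more than one parent, so queries must follow every path upward. A minimal sketch, assuming an ontology held as a hypothetical child-to-parents mapping; the edges below are illustrative only and are not taken from a released MP file.

```python
from collections import deque

# Illustrative is_a edges (child -> parents); not copied from a released MP file.
PARENTS = {
    "myoclonus": ["abnormal involuntary movement", "abnormal muscle physiology"],
    "tremors": ["abnormal involuntary movement"],
    "abnormal involuntary movement": ["behavior/neurological phenotype"],
    "abnormal muscle physiology": ["muscle phenotype"],
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(parents.get(parent, []))
    return seen


def is_acyclic(parents):
    """Depth-first search with three colors; reaching a GRAY node means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GRAY
        for parent in parents.get(node, []):
            state = color.get(parent, WHITE)
            if state == GRAY:
                return False
            if state == WHITE and not visit(parent):
                return False
        color[node] = BLACK
        return True

    return all(color.get(node, WHITE) != WHITE or visit(node) for node in list(parents))


print(sorted(ancestors("myoclonus", PARENTS)))
```

Here "myoclonus" reaches both the behavior/neurological and the muscle branches, which a single-parent tree could not represent.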
Vocabularies in MGI
(Figure: vocabulary terms, each with a definition, synonyms, an ID such as MP:1956, and notes, organized as DAGs. Terms such as growth retardation, dilated renal tubules, postnatal lethality, and respiratory failure are annotated to genotypes (e.g., Strain: AEJ, Alleles: bd/bd; Strain: C57BL/6, Alleles: Ppp1r3a<tm1Adpt>/Ppp1r3a<tm1Adpt>), with evidence codes (EE, IDA, TAS) and references (J:65322, J:62648, J:65378).)
Common software for users to access vocabularies in MGI
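The figure implies a simple data model: vocabulary terms with definitions and synonyms, and annotations that tie a genotype to a term with an evidence code and a J: reference. A minimal sketch of that model, assuming these field names (they are hypothetical, not MGI's schema). Because an annotation refers to a term only by its ID, the same indexing code can serve any vocabulary, which is the point of sharing one software layer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A vocabulary term (hypothetical fields)."""
    term_id: str
    name: str
    definition: str = ""
    synonyms: tuple = ()


@dataclass(frozen=True)
class Annotation:
    """Links a genotype to a term with an evidence code and a reference."""
    genotype: str   # allele pair plus strain background
    term_id: str
    evidence: str   # evidence code such as "TAS" or "IDA"
    reference: str  # reference ID in MGI's J: style


def index_by_term(annotations):
    """Group annotations by term ID; works the same for any vocabulary."""
    index = defaultdict(list)
    for annotation in annotations:
        index[annotation.term_id].append(annotation)
    return dict(index)


# Hypothetical IDs and references, for illustration only.
terms = [Term("EX:0000001", "example phenotype A"), Term("EX:0000002", "example phenotype B")]
annotations = [
    Annotation("genotype-1", "EX:0000001", "TAS", "ref-1"),
    Annotation("genotype-2", "EX:0000001", "IDA", "ref-2"),
    Annotation("genotype-2", "EX:0000002", "TAS", "ref-2"),
]
index = index_by_term(annotations)
print({term.name: len(index.get(term.term_id, [])) for term in terms})
```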
Mammalian Phenotype Ontology
• Structured as a DAG
• >6,250 terms covering physiological systems, behavior, survival, and development
• Available in web browser and in OBO and text formats from MGI ftp and OBO sites
• Each term linked to all annotations to the term or its children
• >133,000 genotype-to-MP annotations
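Because MP is distributed in OBO format, a few lines of code are enough to read its term stanzas. A minimal sketch, assuming OBO 1.2-style `[Term]` stanzas of `tag: value` lines in which `!` starts a comment outside quoted text; it ignores trailing modifiers and other details of the format, so a maintained OBO library is the better choice for real work. The sample stanza is assembled from the ocular albinism example in this talk; the empty `[]` cross-reference list on the `def` line is a placeholder.

```python
def strip_comment(value):
    """Drop a trailing '! comment', ignoring '!' inside double-quoted text."""
    in_quote = escaped = False
    for i, ch in enumerate(value):
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_quote = not in_quote
        elif ch == "!" and not in_quote:
            return value[:i].strip()
    return value.strip()


def parse_obo_terms(text):
    """Return one {tag: [values]} dict per [Term] stanza."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank line or whole-line comment
        if line.startswith("["):
            current = {} if line == "[Term]" else None  # skip [Typedef] etc.
            if current is not None:
                terms.append(current)
            continue
        if current is None:
            continue  # header tags before the first [Term]
        tag, _, value = line.partition(":")
        current.setdefault(tag.strip(), []).append(strip_comment(value))
    return terms


SAMPLE = """format-version: 1.2

[Term]
id: MP:0006159
name: ocular albinism
def: "absence of melanin (pigment) production in the eye with identifiable melanocytes present" []
intersection_of: PATO:0001558 ! lacking processual parts
intersection_of: inheres_in MA:0000261 ! eye
intersection_of: towards GO:0006582 ! melanin metabolic process
"""

for term in parse_obo_terms(SAMPLE):
    print(term["id"][0], term["name"][0], len(term["intersection_of"]))
```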
Term in context
(Figure: an MP term shown in its DAG context, with links to all mouse genotypes with this phenotype. Terms shown: behavior/neurological phenotype, abnormal involuntary movement, abnormal reflex, tremors, muscle phenotype, abnormal muscle physiology, opisthotonus, myoclonus.)
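Linking a term to "all annotations to the term or its children" means walking down the DAG and pooling whatever is annotated along the way. A minimal sketch, assuming the same kind of hypothetical child-to-parents mapping and a hypothetical table of direct annotations; the `seen` set makes sure a term reachable by two paths is counted once.

```python
from collections import defaultdict, deque

# Illustrative is_a edges (child -> parents); not copied from a released MP file.
PARENTS = {
    "myoclonus": ["abnormal involuntary movement", "abnormal muscle physiology"],
    "tremors": ["abnormal involuntary movement"],
    "abnormal involuntary movement": ["behavior/neurological phenotype"],
    "abnormal muscle physiology": ["muscle phenotype"],
}

# Hypothetical direct annotations: term -> genotypes annotated to exactly that term.
DIRECT = {
    "tremors": {"genotype A"},
    "myoclonus": {"genotype B"},
    "abnormal muscle physiology": {"genotype C"},
}


def genotypes_for(term, parents, direct):
    """Collect genotypes annotated to `term` or to any of its descendants."""
    children = defaultdict(set)
    for child, term_parents in parents.items():
        for parent in term_parents:
            children[parent].add(child)
    found, seen = set(), set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # reached again by another path through the DAG
        seen.add(current)
        found |= direct.get(current, set())
        queue.extend(children.get(current, ()))
    return found


print(sorted(genotypes_for("muscle phenotype", PARENTS, DIRECT)))
```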
Mammalian Phenotype (MP) Ontology
…make phenotype & disease model data robust &
accessible to researchers & computational biologists
• semantically consistent search methods
• integrated access to all phenotypic variation sources
(single-gene, genomic mutations, engineered mutations, QTL, strains)
• data on human disease correlation
• access to mouse models from various approaches
- Genetic
- Phenotypic
- Computational
Developing the Mammalian Phenotype Ontology
(Figure: MP Ontology Growth, number of terms, 2004 to 2008; y-axis 0 to 7,000.)
• New terms from ongoing curation process
• Collaborative community efforts
– identify new terms
– suggest improved organization of terms
– Rat Genome Database
– Mutagenesis Centers
– Human (NCBI)
– OMIA (Online Mendelian Inheritance in Animals)
– Proprietary Databases
– Future (International Mouse Knockout Projects)
• Comparisons among Ontologies (GO Process, Mouse Anatomy, FMA, Cell Type, MPath, etc.)
• Systematic review by domain experts
Making Mammalian Phenotype Ontology Work
• accommodate bio-specific terms
• computationally useful
• human accessible
• practical for curation
• cross-reference to other ontologies
Terms in MP
(Each MP term is listed with its entity: PATO quality pairs, then its MP definition.)
microphthalmia
  eye: small size
  MP def: reduced average size of the eyes
hydrocephaly
  cerebrospinal fluid: increased, excessive, accumulated
  brain: large size (dilated)
  trauma of brain: observed
  MP def: excessive accumulation of cerebrospinal fluid in the brain, especially the cerebral ventricles, often leading to increased brain size and other brain trauma
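Splitting each MP term into entity and quality makes phenotypes queryable by anatomy, for instance finding every term that involves the brain. A minimal sketch, assuming a hypothetical `EQ` record of plain strings transcribed from the "Terms in MP" slide; a real system would store ontology IDs (anatomy and PATO terms) rather than labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """One entity-quality pair: what is affected and how (a PATO quality)."""
    entity: str
    quality: str


# Decompositions transcribed from the "Terms in MP" slide.
MP_EQ = {
    "microphthalmia": [EQ("eye", "small size")],
    "hydrocephaly": [
        EQ("cerebrospinal fluid", "increased, excessive, accumulated"),
        EQ("brain", "large size (dilated)"),
        EQ("trauma of brain", "observed"),
    ],
}


def terms_involving(entity, table):
    """List MP terms whose decomposition mentions the given entity."""
    return sorted(term for term, pairs in table.items()
                  if any(pair.entity == entity for pair in pairs))


print(terms_involving("brain", MP_EQ))
```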
Complex Examples:
id: MP:0006159 ! ocular albinism
intersection_of: PATO:0001558 ! lacking processual parts
intersection_of: inheres_in MA:0000261 ! eye
intersection_of: towards GO:0006582 ! melanin metabolic process
MP definition: absence of melanin (pigment) production in the eye
with identifiable melanocytes present
id: MP:0006110 ! ventricular fibrillation
!intersection_of: PATO:0000688 ! asynchronous
!intersection_of: inheres_in CL:0000746 ! cardiac muscle cell
!intersection_of: towards GO:0060048 ! cardiac muscle contraction
!intersection_of: located_in MA:0000079 ! ventricle endocardium
!intersection_of: located_in MA:0000082 ! ventricle myocardium
MP definition: asynchronous contraction or quivering of individual
cardiac muscle fibers in the ventricles
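The `intersection_of` lines give a term a genus-differentia (cross-product) definition: a single PATO quality as the genus, plus relation/target pairs that say where the quality inheres and what it is directed towards. A minimal sketch of reading them, assuming only single-token genus lines and two-token `relation target` lines; the function names are hypothetical. Lines that start with `!`, like the ventricular fibrillation axioms, are comments in OBO and are skipped.

```python
def parse_intersections(lines):
    """Split OBO intersection_of lines into a genus and differentia.

    Returns (genus, differentia): genus is (id, label) or None, and each
    differentia entry is (relation, target_id, label).
    """
    genus, differentia = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith("!"):
            continue  # commented-out axiom: OBO readers ignore it
        tag, _, value = line.partition(":")
        if tag.strip() != "intersection_of":
            continue
        value, _, label = value.partition("!")
        tokens, label = value.split(), label.strip()
        if len(tokens) == 1:
            genus = (tokens[0], label)
        elif len(tokens) == 2:
            differentia.append((tokens[0], tokens[1], label))
    return genus, differentia


def gloss(genus, differentia):
    """Spell a genus-differentia definition out using the comment labels."""
    return " ".join([genus[1]] + [f"{rel} {label}" for rel, _, label in differentia])


OCULAR_ALBINISM = [
    "id: MP:0006159 ! ocular albinism",
    "intersection_of: PATO:0001558 ! lacking processual parts",
    "intersection_of: inheres_in MA:0000261 ! eye",
    "intersection_of: towards GO:0006582 ! melanin metabolic process",
]
print(gloss(*parse_intersections(OCULAR_ALBINISM)))
```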
Status of Phenotype & Disease Data (Nov 2008)
Phenotype terms in MP ontology: 6,355
Phenotypic alleles cataloged: 21,996
– number of genes represented: 8,225
Targeted alleles: 13,549
– number of genes targeted: 5,547
Alleles with MP annotation: 19,458
Genotypes with MP annotation: 27,261
Total MP annotations: 137,577
Genotypes with OMIM associations: 2,520
OMIM with associated genotypes: 882
QTLs: 4,015
Strains: >10,500
Current QTL Display
(Figure: current QTL display.)
Changes planned for QTL Display
(Figure: MGI Mouse GBrowse view, genome coordinates 132851306-135646474.)
Need for a trait ontology
• What is measured
– Blood pressure
– % body fat
– Coat color
• Annotation of
– QTL
– Strain characteristics / baseline
– Measurements
Some issues
• specificity vs. breadth
• synchronizing with MP
• “how much” cross-species?
OBO-Edit, a curation tool for building ontologies
Working on Trait Ontology
• MGI
• IMPC
• MPD
• RGD
• Domestic Species (Animal QTL)
Currently:
• approx. 3,600 terms, built initially by stripping MP
• working systematically on branches
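The slide does not say exactly how MP was "stripped". One plausible reading, used here purely as an assumption, is that qualifiers such as "increased" or "abnormal" were removed from MP term names so that several phenotype terms collapse onto the trait being measured (compare "Blood pressure" among the measurements listed earlier). The qualifier list and term names below are hypothetical; keeping the mapping back to the source MP terms is one way to address the "synchronizing with MP" issue.

```python
import re
from collections import defaultdict

# Hypothetical qualifier list: an assumption about what "stripping MP" removes.
QUALIFIER = re.compile(r"^(abnormal|increased|decreased|absent)\s+", re.IGNORECASE)


def trait_name(mp_term_name):
    """Remove one leading qualifier so only the measured characteristic remains."""
    return QUALIFIER.sub("", mp_term_name.strip(), count=1)


def group_by_trait(mp_term_names):
    """Map each candidate trait to the MP term names it was derived from."""
    groups = defaultdict(list)
    for name in mp_term_names:
        groups[trait_name(name)].append(name)
    return dict(groups)


# Hypothetical MP-style term names, for illustration only.
print(group_by_trait(["increased blood pressure", "decreased blood pressure", "abnormal coat color"]))
```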
MGI Phenotype Data Staff
Anna Anagnostopoulos
Randal P. Babiuk
Susan M. Bello
Donna L. Burkart
Howard Dene
Michelle Knowlton
Ira Lu
Hiroaki Onda
Cynthia L. Smith
Monika Tomczuk
Linda L. Washburn
Jonathan S. Beal
Kim L. Forthofer
Peter Frost
NHGRI grant HG000330
The End