Selecting conditions and phenotypes



PRO AND MEDICAL GENETICS
RESOURCES AT NCBI
DONNA MAGLOTT, PH.D.
OPPORTUNITIES
The medical genetics group is a relatively recent addition to the suite of resources at NCBI. It manages the NIH Genetic Testing Registry (GTR), ClinVar, and MedGen. These databases share the need to standardize the representation of genes, proteins, small molecules, variation, conditions, and phenotypes, not only as explicit terms but also in the relationships among those terms. This presentation focuses on opportunities to use PRO (the Protein Ontology) in NCBI’s Medical Genetics group.
CASE STUDIES
MEDICAL GENETICS: CLINVAR, GENE, GTR, MEDGEN
A QUICK TOUR
From the home page…
USING THE RESOURCE SECTIONS
TRY ALL SECTIONS
MAJOR DOMAINS OF INFORMATION
Concept | NCBI database/resource | Used in
Diseases and their defining features | MedGen (Diseases, Findings…) | ClinVar, dbVar, Gene, GTR, PheGenI, dbGaP
Drugs | MedGen (Pharmacologic Substance) | ClinVar, GTR
Genes and gene products | Gene, Nucleotide, Protein, HomoloGene, RefSeq | ClinVar, dbSNP, dbVar, GTR …
Biological processes, cellular components, molecular functions | -- | Gene
Interactions and pathways | Biosystems, Gene | Biosystems, Gene
Variation | ClinVar, dbSNP, dbVar | ClinVar, dbSNP, dbVar…
Records are connected by reciprocal, generic links via database identifiers.
SOME TALKING POINTS
• Except for RefSeq, curation is minimal
• RefSeq-based, with pointers to UniProtKB
• Use ontologies to acquire and represent standard terms
• Point to ontologies, but do not use them to support node-based query interfaces
• Capture primary data that can be used to drive the development of ontologies
• Some user communities think in terms of nucleotides only
• Data are being submitted with uncertain significance
• Look for opportunities to add value to NCBI’s databases and tools
GENE AND DATA STANDARDS
• Name of the gene (nomenclature committees)
• Names of protein products
  • Primary product (Swiss-Prot)
  • Isoforms (RefSeq)
• Names of associated conditions (multiple)
• Descriptions of pathways (submitters)
• Biological processes, cellular components and molecular functions (GO)
• HIV interactions (NIAID)
  • http://www.ncbi.nlm.nih.gov/gene?term=hiv1interactions[Properties]
  • http://www.ncbi.nlm.nih.gov/projects/RefSeq/HIVInteractions/
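The property-based Gene query in the first HIV link above can also be issued through E-utilities. A minimal sketch, assuming NCBI's ESearch endpoint (`esearch.fcgi`) and its `db`, `term`, `retmode`, and `retmax` parameters; the helper name `esearch_url` is hypothetical, and the request itself is not sent here.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for an Entrez query (hypothetical helper)."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    # urlencode percent-escapes the square brackets of the field tag.
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Same query string as the Gene web link for HIV interactions.
    print(esearch_url("gene", "hiv1interactions[Properties]"))
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) returns JSON whose `esearchresult.idlist` holds the matching Gene IDs.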
HUMAN MISMATCH REPAIR
RESTRICT TO THOSE REPORTED TO BE
DISEASE-CAUSING
www.ncbi.nlm.nih.gov/gene/4292
Phrase found in:
• Summary
• Bibliography
• Interactions
• Pathways
• Gene Ontology
• General protein information
• Reference sequences
• Locus-specific databases
• Titles of pathways
• Descriptions of interactions
GENE<->PROTEIN
HOMOLOGENE
DISEASES AND PHENOTYPES
MEDGEN: UMLS, HPO, OMIM, ORDO, GTR
WHY MEDGEN?
• A stable node of identifiers within NCBI for disease names, their clinical features, and pharmacological substances
• Built on the foundation of a subset of UMLS, with supplements from HPO, OMIM (between UMLS releases), and submissions to GTR and ClinVar
• Primarily automated, but with some review by M.D.s and genetic counselors on staff, and feedback from the community
TERMS FROM UMLS/OMIM/GTR/CLINVAR
HIERARCHIES: CURATED BY GTR STAFF
Guided by OMIM’s clinical series and user feedback
HIERARCHIES: COMPUTED FROM NODES IN UMLS
Hierarchy from DNA Repair Deficiency Disorders
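A hierarchy computed from parent-child relationships can be walked to collect every term beneath a node. A minimal sketch, assuming the relationships are available as (parent, child) pairs; the edge list below is toy data for illustration, not terms exported from UMLS or MedGen, and the helper names are hypothetical.

```python
from collections import defaultdict

# Toy (parent, child) pairs for illustration only; not exported from UMLS or MedGen.
EDGES = [
    ("DNA repair deficiency disorder", "Xeroderma pigmentosum"),
    ("DNA repair deficiency disorder", "Fanconi anemia"),
    ("DNA repair deficiency disorder", "Lynch syndrome"),
    ("Xeroderma pigmentosum", "Xeroderma pigmentosum, group A"),
]


def build_children(edges):
    """Map each parent term to its direct children, in input order."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    return dict(children)


def descendants(node, children):
    """Depth-first walk returning every term below `node`, each once."""
    found, seen = [], set()
    stack = list(reversed(children.get(node, [])))
    while stack:
        term = stack.pop()
        if term in seen:
            continue  # a term reachable by two paths is listed once
        seen.add(term)
        found.append(term)
        stack.extend(reversed(children.get(term, [])))
    return found


if __name__ == "__main__":
    tree = build_children(EDGES)
    print(descendants("DNA repair deficiency disorder", tree))
```

The `seen` set matters because a term can have more than one parent in a computed hierarchy; without it, such a term would be reported once per path.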
USING HPO FOR CLINICAL FEATURES
• Partial display
• Organized by top nodes of the ontology
• Each specific term supports a link to disorders manifesting that feature
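Linking each clinical feature to the disorders that manifest it amounts to inverting the disorder-to-feature annotations. A minimal sketch, assuming annotations are available as a mapping from disorder to feature labels; the disorder names and feature labels below are placeholders, not HPO or MedGen content.

```python
from collections import defaultdict

# Toy disorder -> feature annotations; labels are placeholders, not HPO content.
ANNOTATIONS = {
    "Disorder A": ["Short stature", "Microcephaly"],
    "Disorder B": ["Microcephaly", "Photosensitivity"],
    "Disorder C": ["Photosensitivity"],
}


def disorders_by_feature(annotations):
    """Invert disorder -> features into feature -> sorted list of disorders."""
    index = defaultdict(set)
    for disorder, features in annotations.items():
        for feature in features:
            index[feature].add(disorder)
    return {feature: sorted(names) for feature, names in index.items()}


if __name__ == "__main__":
    index = disorders_by_feature(ANNOTATIONS)
    print(index["Microcephaly"])
```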
CLINVAR: REPORTED VARIATION-PHENOTYPE RELATIONSHIP
Submitter archive (not curated)
• Variant
• Disease and/or phenotypes
• Interpretation
• Confidence
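One way to picture an archived submission is as a record holding the elements listed above, with several submitters possibly describing the same variant. A minimal sketch; the `Submission` class, its field names (including the added `submitter` field), and the example values are hypothetical and do not reflect ClinVar's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """Hypothetical shape of one archived submission (not ClinVar's schema)."""
    submitter: str
    variant: str           # variant as described by the submitter
    phenotypes: tuple      # disease and/or phenotype names
    interpretation: str    # submitter's interpretation, stored as submitted
    confidence: str        # submitter-reported confidence


def group_by_variant(submissions):
    """Collect submissions per variant; records are archived, not reconciled."""
    groups = defaultdict(list)
    for sub in submissions:
        groups[sub.variant].append(sub)
    return dict(groups)


if __name__ == "__main__":
    subs = [
        Submission("Lab 1", "variant-1", ("Disease X",), "Pathogenic", "high"),
        Submission("Lab 2", "variant-1", ("Disease X",), "Uncertain significance", "low"),
        Submission("Lab 3", "variant-2", ("Phenotype Y",), "Benign", "medium"),
    ]
    for variant, group in group_by_variant(subs).items():
        print(variant, [s.interpretation for s in group])
```

Grouping keeps every submission intact, which matches an uncurated archive: differing interpretations of the same variant sit side by side rather than being merged.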
SUBSET OF A DETAILED RECORD
• Gene name and symbol
• Sequence Ontology for molecular and functional consequences
• Diseases
• Identifiers and links
• Observed phenotypes (as distinct from those reported to be characteristic of the diagnostic term)
• Protein change from the variant
DATA SOURCES AND GROWTH
SUBMISSIONS FROM UNIPROT
Summarize submissions by genes, diseases, and phenotypes
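Summarizing submissions by gene, disease, and phenotype comes down to a set of simple tallies. A minimal sketch, assuming each submission has already been reduced to a (gene, disease, phenotype) row with the phenotype optional; the row layout and the gene, disease, and phenotype labels are hypothetical.

```python
from collections import Counter


# Each row: (gene, disease, phenotype or None). Values used below are placeholders.
def summarize(rows):
    """Count submissions per gene, per disease, and per reported phenotype."""
    genes = Counter(gene for gene, _, _ in rows)
    diseases = Counter(disease for _, disease, _ in rows)
    # Phenotypes are optional, so rows without one are not counted here.
    phenotypes = Counter(pheno for _, _, pheno in rows if pheno is not None)
    return {"genes": genes, "diseases": diseases, "phenotypes": phenotypes}


if __name__ == "__main__":
    rows = [
        ("GENE1", "Disease X", "Phenotype P"),
        ("GENE1", "Disease X", None),
        ("GENE2", "Disease Y", "Phenotype P"),
    ]
    for facet, counts in summarize(rows).items():
        print(facet, counts.most_common())
```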
CURRENT STATUS: CLINGEN-RELATED
Diseases
Genes
Variants
Predictions
• Conserved sequence
• Conserved domains
• Pathways
http://www.clinicalgenome.org/
‘PHENOTYPE’ AND CLINGEN/CLINVAR
• Working group on phenotype
• Make distinctions among
  • Disease category (body system, metabolic perturbation, cancer)
  • Diagnosis
  • Characteristic features
    • General or gene-specific
  • Diseases targeted by drugs for which the response is genetically determined
  • Observed phenotypes
    • HPO
    • PhenoDB
  • Indications for testing
• Standardization
  • One ontology or many?
  • Relationship to OMIM
VARIATION AND CLINGEN/CLINVAR
• Sequence Ontology for variant location and effect
• Coordinate with PharmGKB for pharmacogenomics
• Description of haplotypes
• No discussion yet about authorities for pathways, conserved domains, or post-translational modifications
CURRENT STATUS: NCBI
• Working with UMLS to improve representation of terms and relationships
  • Mapping concepts
  • Reporting relationships
  • Supplementing current UMLS with HPO, Orphanet (ORDO, in progress), and recent data from OMIM
• Working with the Clinical Pharmacogenetics Implementation Consortium (CPIC) and PharmGKB
  • Representation of haplotypes/star alleles
  • Drug responses/disease targets
• Consumer of ontologies to standardize terminology, with definitions
  • Link to resource site
  • Provide attribution
  • Support term-specific queries
CURRENT STATUS: NCBI
• Queries currently term by term, not by node
• Some relationships based on links in Entrez
  • Gene <-> disease
  • Disease <-> clinical feature
  • Variation <-> gene
• Some relationships explicit
  • Genome -> transcript -> protein
  • Nucleotide change -> protein change
• Some relationships reported as hierarchies
  • GTR
  • MedGen (MeSH)
  • ORDO (in progress)
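The difference between querying term by term and querying by node can be sketched as query expansion: a node-based query also matches every term beneath the selected node. A minimal sketch, assuming a parent-to-children mapping is available locally; the mapping and term names are hypothetical placeholders, and the output is an ordinary Boolean query string joined with OR, not an existing NCBI feature.

```python
# Hypothetical parent -> children mapping; term names are placeholders.
CHILDREN = {
    "Disorder group": ["Disorder A", "Disorder B"],
    "Disorder A": ["Disorder A1"],
}


def expand(node, children):
    """Return `node` followed by every term beneath it, each listed once."""
    terms, seen = [], set()

    def visit(term):
        if term in seen:
            return
        seen.add(term)
        terms.append(term)
        for child in children.get(term, []):
            visit(child)

    visit(node)
    return terms


def term_query(term):
    """Term-by-term: match the selected term only."""
    return f'"{term}"'


def node_query(node, children):
    """Node-based: match the selected term or anything beneath it."""
    return " OR ".join(f'"{t}"' for t in expand(node, children))


if __name__ == "__main__":
    print(term_query("Disorder group"))
    print(node_query("Disorder group", CHILDREN))
```

The recursive walk is fine for small hierarchies; a deep ontology would call for an explicit stack to stay clear of Python's recursion limit.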
CURRENT STATUS: NCBI
• Maintenance
  • Primarily automatic
  • Some curatorial review by staff of ClinVar and the NIH Genetic Testing Registry (GTR)
  • Expect expanded review from the ClinGen group
• Data freely available by FTP or E-utilities
  • ftp://ftp.ncbi.nih.gov/pub/clinvar/
  • ftp://ftp.ncbi.nih.gov/gene/
  • ftp://ftp.ncbi.nih.gov/pub/GTR/
  • ftp://ftp.ncbi.nih.gov/pub/medgen/
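For query-driven retrieval rather than bulk files, ESearch results can be requested page by page. A minimal sketch, assuming ESearch's `retstart`/`retmax` paging parameters and its JSON response object `esearchresult` with `count` and `idlist`; the helper names are hypothetical, the total would normally come from a first request, and no network call is made here.

```python
import json
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def page_urls(db, term, total, page_size=500):
    """One ESearch URL per page, using retstart/retmax to cover `total` hits."""
    urls = []
    for start in range(0, total, page_size):
        params = {"db": db, "term": term, "retmode": "json",
                  "retstart": start, "retmax": page_size}
        urls.append(ESEARCH + "?" + urlencode(params))
    return urls


def parse_esearch(body):
    """Return (count, id list) from an ESearch JSON response body."""
    result = json.loads(body)["esearchresult"]
    # int() accepts the count whether it is serialized as a string or a number.
    return int(result["count"]), result["idlist"]


if __name__ == "__main__":
    for url in page_urls("clinvar", "MLH1", total=1200):
        print(url)
```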
ACKNOWLEDGEMENTS
Slava Gorelenkov (MedGen)
Melissa Landrum (ClinVar)
Jennifer Lee (GTR, ClinVar)
Terence Murphy (Gene)
Lon Phan (dbSNP/dbVar)
Kim Pruitt (RefSeq)
Wendy Rubinstein (GTR, MedGen)
Ming Ward (dbSNP)
and all their staff