Type_id - the Genome Database for Rosaceae

Integrating Phenotypic Data With Genomic, Genetic and Genotypic Data Using Chado
Sook Jung, Taein Lee, Stephen Ficklin, Jing Yu, Dorrie Main
Outline
• Introduction to GDR and CottonGen
• Chado, the generic schema
• Storing stock data
• Storing phenotypic data (trait, dataset, etc)
• Storing genotypic data
• Integration with genetic and genomic data
• Conclusion
Database projects of Main lab
• Major databases with genomic, genetic, phenotypic and genotypic data
  1. GDR: Genome Database for Rosaceae
     Genomic, genetic and breeding data (private data and data from the RosBreed project)
     • Fruit and Nut, Sat, 12 PM
     • Computer Demo, Mon, 1:35 PM
     • P0946, RosBreed BIM System, Mon, 10-11:30 AM
  2. CottonGen: replaced CottonDB and Cotton Marker Database
     • Cotton Genome Initiative, Sun, 3:50 PM
     • Computer Demo, Mon, 1:50 PM
• Other databases: Citrus Genome Database, Cool Season Food Legume Database, Genome Database for Vaccinium
• Built using the Chado schema and Tripal (Drupal front end for Chado)
  • Tripal presentation, GMOD workshop, Wed 11:50 AM
Chado: Modular, Generic and Ontology-driven schema
Modules shown in the diagram: general, organism, map, stock, pub (Publication), sequence, natural diversity, mage, phenotype, genetic, companalysis, cv
Chado: Modular, Generic and Ontology-driven schema
Tables shown in the diagram:
feature (gene, mRNA, marker, QTL, etc): feature_id, name, uniquename, type_id, organism_id, residues
feature_relationship: feature_relationship_id, subject_id, object_id, type_id
  Example: Abc-mRNA (subject) part_of (type) Abc-gene (object)
featureprop: featureprop_id, feature_id, type_id, value, rank
  Example property types: Repeat_motif, Product_size
cvterm: cvterm_id, name, definition, cv_id, dbxref_id
cv (Sequence Ontology, Gene Ontology, etc): cv_id, name, definition
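The subject/type/object pattern on this slide can be tried out in a few lines. The following is a minimal sketch, not the Chado DDL: it uses SQLite from the Python standard library instead of PostgreSQL, keeps only a subset of the columns named above, and the cv names `sequence` and `relationship` are illustrative assumptions.

```python
import sqlite3

# Simplified subset of the columns named on the slide. Real Chado tables have
# more columns and constraints and run on PostgreSQL; SQLite is an assumption
# made here so the sketch is self-contained.
SCHEMA = """
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id INTEGER NOT NULL REFERENCES cv(cv_id),
    name TEXT NOT NULL
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE feature_relationship (
    feature_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    object_id INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
"""


def build_feature_db():
    """Load the Abc-mRNA part_of Abc-gene example into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Illustrative cv names (assumption).
    conn.executemany("INSERT INTO cv (cv_id, name) VALUES (?, ?)",
                     [(1, "sequence"), (2, "relationship")])
    # Types are rows in cvterm rather than hard-coded columns, which is the
    # ontology-driven part of the design.
    conn.executemany(
        "INSERT INTO cvterm (cvterm_id, cv_id, name) VALUES (?, ?, ?)",
        [(1, 1, "gene"), (2, 1, "mRNA"), (3, 2, "part_of")])
    conn.executemany(
        "INSERT INTO feature (feature_id, uniquename, type_id) VALUES (?, ?, ?)",
        [(1, "Abc-gene", 1), (2, "Abc-mRNA", 2)])
    # subject = Abc-mRNA (2), object = Abc-gene (1), type = part_of (3)
    conn.execute("INSERT INTO feature_relationship "
                 "(subject_id, object_id, type_id) VALUES (2, 1, 3)")
    return conn


def relationships(conn):
    """Return (subject, relationship type, object) triples as readable names."""
    return conn.execute("""
        SELECT s.uniquename, t.name, o.uniquename
        FROM feature_relationship fr
        JOIN feature s ON s.feature_id = fr.subject_id
        JOIN feature o ON o.feature_id = fr.object_id
        JOIN cvterm t ON t.cvterm_id = fr.type_id
    """).fetchall()


print(relationships(build_feature_db()))
```

Adding a new kind of feature or relationship in this sketch means inserting a cvterm row; no table definition changes.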
Storing Stock (from samples to population; pedigree)
Tables shown in the diagram:
stock (population, cultivar, breeding line, clone, sample, etc): stock_id, name, uniquename, type_id, organism_id
stock_relationship: stock_relationship_id, subject_id, object_id, type_id
  Pedigree example: Gala (subject) maternal_parent_of (type) Sonya (object)
  Sample example: Gala-001 (subject) sample_of (type) Gala (object)
stockcollection (e.g. stock center): stockcollection_id, name, uniquename, type_id, contact_id
stockprop (e.g. description, population_size): stockprop_id, stock_id, type_id, value
cvterm: cvterm_id, name, definition, cv_id, dbxref_id
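The pedigree and sample links above follow the same subject/type/object pattern as features. A minimal sketch, assuming SQLite in place of PostgreSQL and a reduced column set; the helper name `related_subjects` is hypothetical and the relationship names are written in lowercase here.

```python
import sqlite3


def build_stock_db():
    """Load the Gala / Sonya / Gala-001 example into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE stock (
            stock_id INTEGER PRIMARY KEY,
            uniquename TEXT NOT NULL UNIQUE,
            type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
        );
        CREATE TABLE stock_relationship (
            stock_relationship_id INTEGER PRIMARY KEY,
            subject_id INTEGER NOT NULL REFERENCES stock(stock_id),
            object_id INTEGER NOT NULL REFERENCES stock(stock_id),
            type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
        );
    """)
    conn.executemany(
        "INSERT INTO cvterm (name) VALUES (?)",
        [("cultivar",), ("sample",), ("maternal_parent_of",), ("sample_of",)])
    for uniquename, type_name in [("Gala", "cultivar"),
                                  ("Sonya", "cultivar"),
                                  ("Gala-001", "sample")]:
        conn.execute(
            "INSERT INTO stock (uniquename, type_id) "
            "SELECT ?, cvterm_id FROM cvterm WHERE name = ?",
            (uniquename, type_name))
    for subject, rel, obj in [("Gala", "maternal_parent_of", "Sonya"),
                              ("Gala-001", "sample_of", "Gala")]:
        conn.execute("""
            INSERT INTO stock_relationship (subject_id, object_id, type_id)
            SELECT s.stock_id, o.stock_id, t.cvterm_id
            FROM stock s, stock o, cvterm t
            WHERE s.uniquename = ? AND o.uniquename = ? AND t.name = ?
        """, (subject, obj, rel))
    return conn


def related_subjects(conn, obj, rel):
    """Names of stocks that stand in relationship `rel` to the stock `obj`."""
    rows = conn.execute("""
        SELECT s.uniquename
        FROM stock_relationship sr
        JOIN stock s ON s.stock_id = sr.subject_id
        JOIN stock o ON o.stock_id = sr.object_id
        JOIN cvterm t ON t.cvterm_id = sr.type_id
        WHERE o.uniquename = ? AND t.name = ?
        ORDER BY s.uniquename
    """, (obj, rel)).fetchall()
    return [name for (name,) in rows]


db = build_stock_db()
print(related_subjects(db, "Sonya", "maternal_parent_of"))  # maternal parent
print(related_subjects(db, "Gala", "sample_of"))            # samples of Gala
```

The same table holds both the pedigree link and the sample link; only the type term differs.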
Storing phenotype data (from measurements to projects)
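Tables shown in the diagram:
project: linked to nd_experiment through NE_project (nd_experiment_project); projects are linked to each other through project_relationship
nd_experiment (experiment types shown: Phenotyping, Genotyping, Cross_experiment): nd_experiment_id, nd_geolocation_id, type_id
nd_geolocation: nd_geolocation_id, description, latitude, longitude, geodetic_datum
stock: stock_id, name, uniquename, type_id, organism_id; linked to nd_experiment through NE_stock (nd_experiment_stock)
(unlabeled property box in the diagram: featureprop_id, feature_id, type_id, value)
phenotype: phenotype_id, uniquename, value, attr_id; linked to nd_experiment through NE_phenotype (nd_experiment_phenotype)
cvterm: cvterm_id, name, definition, cv_id, dbxref_id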
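The chain from project down to an individual measurement can be followed with one join. This is a minimal sketch under stated assumptions: SQLite stands in for PostgreSQL, the tables carry only a few of their columns, and the project, location, stock `Gala-002`, phenotype uniquenames and measured values are hypothetical illustration data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project (project_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE nd_geolocation (
    nd_geolocation_id INTEGER PRIMARY KEY,
    description TEXT
);
CREATE TABLE nd_experiment (
    nd_experiment_id INTEGER PRIMARY KEY,
    nd_geolocation_id INTEGER REFERENCES nd_geolocation(nd_geolocation_id),
    type_id INTEGER REFERENCES cvterm(cvterm_id)
);
CREATE TABLE phenotype (
    phenotype_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    attr_id INTEGER REFERENCES cvterm(cvterm_id),
    value TEXT
);
-- Linking tables (NE_project, NE_stock, NE_phenotype on the slide).
CREATE TABLE nd_experiment_project (nd_experiment_id INTEGER, project_id INTEGER);
CREATE TABLE nd_experiment_stock (nd_experiment_id INTEGER, stock_id INTEGER);
CREATE TABLE nd_experiment_phenotype (nd_experiment_id INTEGER, phenotype_id INTEGER);
"""


def build_phenotype_db():
    """One project, one location, two stocks, one measurement per stock."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(1, "phenotyping"), (2, "SkinCol_0")])
    conn.execute("INSERT INTO project VALUES (1, 'demo_project')")
    conn.execute("INSERT INTO nd_geolocation VALUES (1, 'demo orchard')")
    conn.executemany("INSERT INTO stock VALUES (?, ?)",
                     [(1, "Gala-001"), (2, "Gala-002")])
    # Each measurement gets its own nd_experiment row.
    conn.executemany("INSERT INTO nd_experiment VALUES (?, 1, 1)", [(1,), (2,)])
    conn.executemany("INSERT INTO phenotype VALUES (?, ?, 2, ?)",
                     [(1, "Gala-001_SkinCol_0", "2"),
                      (2, "Gala-002_SkinCol_0", "3")])
    conn.executemany("INSERT INTO nd_experiment_project VALUES (?, 1)", [(1,), (2,)])
    conn.executemany("INSERT INTO nd_experiment_stock VALUES (?, ?)", [(1, 1), (2, 2)])
    conn.executemany("INSERT INTO nd_experiment_phenotype VALUES (?, ?)", [(1, 1), (2, 2)])
    return conn


def project_phenotypes(conn, project_name):
    """Return (stock, trait, value) rows measured within one project."""
    return conn.execute("""
        SELECT s.uniquename, t.name, p.value
        FROM project pr
        JOIN nd_experiment_project nep ON nep.project_id = pr.project_id
        JOIN nd_experiment_stock nes ON nes.nd_experiment_id = nep.nd_experiment_id
        JOIN stock s ON s.stock_id = nes.stock_id
        JOIN nd_experiment_phenotype nph ON nph.nd_experiment_id = nep.nd_experiment_id
        JOIN phenotype p ON p.phenotype_id = nph.phenotype_id
        JOIN cvterm t ON t.cvterm_id = p.attr_id
        WHERE pr.name = ?
        ORDER BY s.uniquename, t.name
    """, (project_name,)).fetchall()


print(project_phenotypes(build_phenotype_db(), "demo_project"))
```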
Storing phenotype data
(enabling comparison among datasets)
Tables shown in the diagram:
stock: stock_id, name, uniquename, type_id, organism_id
nd_experiment
phenotype: phenotype_id, uniquename, value, attr_id
  Example: attr_id: SkinCol_0, value: 2
cvterm: cvterm_id, name, definition, cv_id, dbxref_id
  Example: RB (cv), SkinCol_0 (cvterm)
cvtermprop: cvtermprop_id, cvterm_id, type_id, value, rank
  Values stored for SkinCol_0:
  value        rank
  Orange       1
  Orange-red   2
  Pink-red     3
  Red          4
  Dark red     5
cv
If skin_color_harvest is scored 1-10 in the Standard cv, the value can be stored again under the standard descriptor:
phenotype: phenotype_id, uniquename, value, attr_id
  Example: attr_id: Skin_color_harvest, value: 4
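A dataset-specific code can be decoded through cvtermprop and then re-expressed on the standard scale. The sketch below assumes SQLite and a reduced column set. The linear rescaling in `to_standard_scale` is an assumption that merely reproduces the slide's example (2 on the 1-5 scale becomes 4 on the 1-10 scale); the slide does not state how the conversion is actually done, and both function names are hypothetical.

```python
import sqlite3


def build_descriptor_db():
    """Store the RB descriptor SkinCol_0 with its five coded values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            cv_id INTEGER NOT NULL REFERENCES cv(cv_id),
            name TEXT NOT NULL
        );
        -- type_id from the slide is omitted to keep the sketch short.
        CREATE TABLE cvtermprop (
            cvtermprop_id INTEGER PRIMARY KEY,
            cvterm_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
            value TEXT NOT NULL,
            rank INTEGER NOT NULL
        );
    """)
    conn.execute("INSERT INTO cv (cv_id, name) VALUES (1, 'RB')")
    conn.execute("INSERT INTO cvterm (cvterm_id, cv_id, name) "
                 "VALUES (1, 1, 'SkinCol_0')")
    labels = ["Orange", "Orange-red", "Pink-red", "Red", "Dark red"]
    conn.executemany(
        "INSERT INTO cvtermprop (cvterm_id, value, rank) VALUES (1, ?, ?)",
        [(label, rank) for rank, label in enumerate(labels, start=1)])
    return conn


def decode(conn, cv_name, term_name, code):
    """Translate a stored numeric code into its label, or None if unknown."""
    row = conn.execute("""
        SELECT p.value
        FROM cvtermprop p
        JOIN cvterm t ON t.cvterm_id = p.cvterm_id
        JOIN cv ON cv.cv_id = t.cv_id
        WHERE cv.name = ? AND t.name = ? AND p.rank = ?
    """, (cv_name, term_name, code)).fetchone()
    return row[0] if row else None


def to_standard_scale(code, source_max=5, standard_max=10):
    """ASSUMPTION: linear rescaling between two ordinal scales."""
    return round(code * standard_max / source_max)


db = build_descriptor_db()
print(decode(db, "RB", "SkinCol_0", 2))
print(to_standard_scale(2))
```

Storing the converted value as a second phenotype row under the standard descriptor keeps the original score intact while making datasets comparable.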
Genotypic data integrated with genomic/genetic data
Tables shown in the diagram: map, stock, project, nd_experiment, feature and genotype, linked through NE_genotype (nd_experiment_genotype) and feature_genotype
nd_experiment: nd_experiment_id, nd_geolocation_id, type_id
feature: feature_id, name, uniquename, type_id, organism_id, residues
  Example: uniquename: CPSCT038, type: microsatellite
genotype: genotype_id, name, uniquename, description
  Example: uniquename: CPSCT038_190|192, description: 190:192
Explore sequences around the marker in GBrowse
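A genotype row is shared between the marker (through feature_genotype) and the genotyped stock (through nd_experiment). A minimal sketch, assuming SQLite, a reduced column set, and a hypothetical stock named `stock_A`; the function name `calls_for_marker` is also an assumption.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id INTEGER REFERENCES cvterm(cvterm_id)
);
CREATE TABLE genotype (
    genotype_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    description TEXT
);
CREATE TABLE feature_genotype (feature_id INTEGER, genotype_id INTEGER);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE nd_experiment (
    nd_experiment_id INTEGER PRIMARY KEY,
    type_id INTEGER REFERENCES cvterm(cvterm_id)
);
CREATE TABLE nd_experiment_stock (nd_experiment_id INTEGER, stock_id INTEGER);
CREATE TABLE nd_experiment_genotype (nd_experiment_id INTEGER, genotype_id INTEGER);
"""


def build_genotype_db():
    """Marker CPSCT038 with one genotype call on a hypothetical stock."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(1, "microsatellite"), (2, "genotyping")])
    conn.execute("INSERT INTO feature VALUES (1, 'CPSCT038', 1)")
    conn.execute("INSERT INTO genotype VALUES (1, 'CPSCT038_190|192', '190:192')")
    conn.execute("INSERT INTO feature_genotype VALUES (1, 1)")
    conn.execute("INSERT INTO stock VALUES (1, 'stock_A')")  # hypothetical stock
    conn.execute("INSERT INTO nd_experiment VALUES (1, 2)")
    conn.execute("INSERT INTO nd_experiment_stock VALUES (1, 1)")
    conn.execute("INSERT INTO nd_experiment_genotype VALUES (1, 1)")
    return conn


def calls_for_marker(conn, marker):
    """Return (stock, (allele1, allele2)) for every stock genotyped at marker."""
    rows = conn.execute("""
        SELECT s.uniquename, g.description
        FROM feature f
        JOIN feature_genotype fg ON fg.feature_id = f.feature_id
        JOIN genotype g ON g.genotype_id = fg.genotype_id
        JOIN nd_experiment_genotype neg ON neg.genotype_id = g.genotype_id
        JOIN nd_experiment_stock nes ON nes.nd_experiment_id = neg.nd_experiment_id
        JOIN stock s ON s.stock_id = nes.stock_id
        WHERE f.uniquename = ?
        ORDER BY s.uniquename
    """, (marker,)).fetchall()
    # The description holds allele sizes separated by ':' as in '190:192'.
    return [(stock, tuple(int(a) for a in desc.split(":")))
            for stock, desc in rows]


print(calls_for_marker(build_genotype_db(), "CPSCT038"))
```

Because the marker is an ordinary feature, the same row can also carry map and genome positions, which is what makes the GBrowse link possible.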
Relationship between genotype and phenotype
(haplotype and haplotype effect)
Tables shown in the diagram: map, feature, stock, project, nd_experiment, genotype, phenotype and phenstatement, linked through NE_genotype (nd_experiment_genotype), NE_phenotype (nd_experiment_phenotype) and feature_genotype
feature: feature_id, name, uniquename, type_id, organism_id, residues
  Example: uniquename: Ma, type: MTL
nd_experiment: nd_experiment_id, nd_geolocation_id, type_id
genotype: genotype_id, name, uniquename, description
  Example: uniquename: MA_H3|H4b, description: H3|H4b
phenotype: phenotype_id, uniquename, value, attr_id
  Example: attr_id: crisp, value: 2.2
phenstatement: phenstatement_id, type_id, genotype_id, phenotype_id, environment, pub
Germplasm with the H3|H4b alleles of the MA locus has a value of 2.2 for crisp
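The haplotype effect above is a single phenstatement row joining one genotype to one phenotype. A minimal sketch, assuming SQLite and a reduced column set (the environment, pub and type columns are left out); the phenotype uniquename and the function name `haplotype_effects` are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id INTEGER REFERENCES cvterm(cvterm_id)
);
CREATE TABLE genotype (
    genotype_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    description TEXT
);
CREATE TABLE feature_genotype (feature_id INTEGER, genotype_id INTEGER);
CREATE TABLE phenotype (
    phenotype_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    attr_id INTEGER REFERENCES cvterm(cvterm_id),
    value TEXT
);
CREATE TABLE phenstatement (
    phenstatement_id INTEGER PRIMARY KEY,
    genotype_id INTEGER NOT NULL REFERENCES genotype(genotype_id),
    phenotype_id INTEGER NOT NULL REFERENCES phenotype(phenotype_id)
);
"""


def build_effect_db():
    """Ma locus, haplotype pair H3|H4b, and its effect on the crisp trait."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(1, "MTL"), (2, "crisp")])
    conn.execute("INSERT INTO feature VALUES (1, 'Ma', 1)")
    conn.execute("INSERT INTO genotype VALUES (1, 'MA_H3|H4b', 'H3|H4b')")
    conn.execute("INSERT INTO feature_genotype VALUES (1, 1)")
    # Hypothetical uniquename; the slide only gives attr_id and value.
    conn.execute("INSERT INTO phenotype VALUES (1, 'MA_H3|H4b_crisp', 2, '2.2')")
    conn.execute("INSERT INTO phenstatement (genotype_id, phenotype_id) VALUES (1, 1)")
    return conn


def haplotype_effects(conn, locus):
    """Return (haplotype, trait, value) for every statement about a locus."""
    rows = conn.execute("""
        SELECT g.description, t.name, p.value
        FROM feature f
        JOIN feature_genotype fg ON fg.feature_id = f.feature_id
        JOIN genotype g ON g.genotype_id = fg.genotype_id
        JOIN phenstatement ps ON ps.genotype_id = g.genotype_id
        JOIN phenotype p ON p.phenotype_id = ps.phenotype_id
        JOIN cvterm t ON t.cvterm_id = p.attr_id
        WHERE f.uniquename = ?
        ORDER BY g.description, t.name
    """, (locus,)).fetchall()
    # Values are stored as text; convert for numeric use.
    return [(hap, trait, float(value)) for hap, trait, value in rows]


print(haplotype_effects(build_effect_db(), "Ma"))
```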
Conclusion
• The flexible, generic design of Chado enables us to store and integrate complex biological data from widely different projects and species.
• The ontology-driven design makes adding new data types relatively easy.
• Performance issues are mostly resolved by the use of materialized views.
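The performance point can be illustrated by flattening a multi-table join into one precomputed table. This is only a sketch of the idea: SQLite has no materialized views, so a table built with CREATE TABLE ... AS SELECT stands in for one, and the table name `stock_trait_mview` and both function names are hypothetical.

```python
import sqlite3


def build_db():
    """Tiny normalized dataset: one stock with one measured trait."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
        CREATE TABLE phenotype (
            phenotype_id INTEGER PRIMARY KEY,
            attr_id INTEGER REFERENCES cvterm(cvterm_id),
            value TEXT
        );
        CREATE TABLE nd_experiment_stock (nd_experiment_id INTEGER, stock_id INTEGER);
        CREATE TABLE nd_experiment_phenotype (nd_experiment_id INTEGER, phenotype_id INTEGER);
        INSERT INTO cvterm VALUES (1, 'SkinCol_0');
        INSERT INTO stock VALUES (1, 'Gala-001');
        INSERT INTO phenotype VALUES (1, 1, '2');
        INSERT INTO nd_experiment_stock VALUES (1, 1);
        INSERT INTO nd_experiment_phenotype VALUES (1, 1);
    """)
    return conn


def refresh_stock_trait_mview(conn):
    """Rebuild the flattened table from the normalized tables."""
    conn.executescript("""
        DROP TABLE IF EXISTS stock_trait_mview;
        CREATE TABLE stock_trait_mview AS
            SELECT s.uniquename AS stock, t.name AS trait, p.value AS value
            FROM stock s
            JOIN nd_experiment_stock nes ON nes.stock_id = s.stock_id
            JOIN nd_experiment_phenotype nep
                 ON nep.nd_experiment_id = nes.nd_experiment_id
            JOIN phenotype p ON p.phenotype_id = nep.phenotype_id
            JOIN cvterm t ON t.cvterm_id = p.attr_id;
        CREATE INDEX stock_trait_mview_stock ON stock_trait_mview (stock);
    """)


def traits_for_stock(conn, stock):
    """Single-table lookup; no joins are needed at query time."""
    return conn.execute(
        "SELECT trait, value FROM stock_trait_mview WHERE stock = ? ORDER BY trait",
        (stock,)).fetchall()


conn = build_db()
refresh_stock_trait_mview(conn)
print(traits_for_stock(conn, "Gala-001"))
```

The trade-off is that the flattened table must be refreshed after the underlying data changes.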
Acknowledgement
• Natural diversity module working group
  Naama Menda, Seth Redmond, Robert M. Buels, Maren Friesen, Yuri Bendana, LaceyAnne Sanderson, Hilmar Lapp, Taein Lee, Bob MacCallum, Kirstin E. Bett, Scott Cain, Dave Clements, Lukas A. Mueller and Dorrie Main
• Main Lab team
  Dorrie Main, Taein Lee, Stephen Ficklin, Jing Yu, Chun-Huai Cheng, Ping Zheng, Anna Blenda, Sushan Ru
• All project Co-PIs (tfGDR, RosBreed and CottonGen)
• Funding sources
  USDA NIFA SCRI, NSF Plant Genome Program, USDA-ARS, Washington Tree Fruit Research Commission, Cotton Incorporated, Washington State University, Clemson University, University of Florida, Boyce Thompson Institute, North Carolina State University