High throughput genome-wide scan for
epistasis with implementation to
Recombinant Inbred Lines (RIL) populations
Pavel Goldstein
Dr. Anat Reiner-Benaim
Prof. Abraham Korol
1
Outline
Problem description
Modeling epistasis:
NOIA – the model for epistasis identification
Dimensionality:
• Multi-trait complexes
• Two-stage hypothesis testing
• Hierarchical FDR control in eQTL analysis
Proposed algorithm for epistasis identification
Results:
• Simulation study
• Implementation on Arabidopsis data
Conclusions and discussion
2
eQTL analysis
The goal: find loci whose genotypic variation affects the
quantitative trait of interest, using gene expression as the
phenotype and molecular markers as the genotype information.
3
Problem description
Epistasis – nonadditivity in the contributions of several genes to a
trait.
The number of tests involved is enormous
Error control
4
Statistical epistasis
[Figure with two panels: no epistasis; epistasis]
5
Natural and Orthogonal Interactions (NOIA) model
(Alvarez-Castro and Carlborg, 2007) for RIL population
For loci A and B, trait t, loci-pair l and replicate i:
Terms of the model: gene expression (phenotypes); indicator of
genotype combinations for two loci; design matrix; vector of
genetic effects
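The model equation itself is not present in the text above. As a hedged sketch based on the general NOIA regression formulation (not necessarily the notation used on the slide), the four labelled terms could fit together as follows. The symbols y, Z, S, E and ε, and the layout of E, are assumptions introduced here for illustration.

```latex
% Assumed notation: y = gene expression (phenotypes) for trait t at loci-pair l,
% Z = indicator matrix of the four two-locus genotype combinations in a RIL,
% S = genetic-effects design matrix, E = vector of genetic effects.
y_{tl} = Z_{l}\, S_{l}\, E_{tl} + \varepsilon_{tl},
\qquad
E_{tl} = \left(\mu,\ \alpha_{A},\ \alpha_{B},\ \alpha\alpha_{AB}\right)^{\top}
```

Under this reading, the last component of E (the additive-by-additive term) is the epistatic effect being tested, and each row of y corresponds to one replicate i.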
6
The Weighted Gene Co-Expression Network
Analysis (WGCNA) (Zhang and Horvath, 2005)
Top-down hierarchical clustering.
Dynamic Tree Cut algorithm: branch cutting method for detecting
gene modules, depending on their shape
Building up meta-genes by taking the first principal component of
the genes from every cluster.
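A minimal sketch of the meta-gene step, assuming each cluster is available as a samples-by-genes NumPy array. The function name `meta_gene`, the standardization of genes, and the sign convention are assumptions made for illustration, not details given in the slides.

```python
import numpy as np

def meta_gene(expr):
    """Return the first principal component scores of one gene cluster.

    expr: array of shape (n_samples, n_genes), one column per gene.
    """
    # Center and scale each gene so no single gene dominates the component.
    X = expr - expr.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant genes at zero instead of dividing by 0
    X = X / sd

    # SVD of the standardized matrix; U[:, 0] * s[0] are the PC1 scores.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    pc1 = U[:, 0] * s[0]

    # The sign of a principal component is arbitrary; orient it so that it
    # correlates positively with the average standardized expression.
    if np.corrcoef(pc1, X.mean(axis=1))[0, 1] < 0:
        pc1 = -pc1
    return pc1
```

Each cluster then contributes a single vector of length n_samples, which is used as the trait in place of the individual genes.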
7
Two-stage hypothesis testing
Framework marker
Secondary markers
8
False Discovery Rate (FDR) in
eQTL analysis
FDR is the expected proportion of erroneously identified epistasis
effects among all identified ones.
Hierarchical FDR control (Yekutieli, 2008):
Full-tree FDR: all epistasis discoveries, whether in framework
or in secondary marker pairs.
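To make the FDR definition concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, the standard building block for FDR control within one family of hypotheses. The slides do not show how the procedure was coded, so the function name and interface are assumptions.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.1):
    """Return a boolean array marking the hypotheses rejected at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices of p-values, ascending
    thresholds = q * np.arange(1, m + 1) / m   # step-up thresholds q*k/m
    below = p[order] <= thresholds

    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest k with p_(k) <= q*k/m.
        k = np.nonzero(below)[0].max()
        reject[order[:k + 1]] = True
    return reject
```

In a hierarchical scheme, a procedure of this kind would be applied to the framework-pair family first, and then separately to each family of secondary-pair hypotheses whose parent was rejected.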
9
Hierarchical FDR control
A universal upper bound is derived for the full-tree FDR:
An upper bound for 𝜹* may be estimated using:
where R_t^{P_i=0} and R_t^{P_i=1} are the numbers of discoveries in τ_t given that H_i is,
respectively, a true null hypothesis or a false null hypothesis in τ_t.
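The deck later sets the per-family working level as q* = q/(2𝜹*). A small sketch of that adjustment follows; the function name `adjusted_level` and the input checks are assumptions added for illustration.

```python
def adjusted_level(q, delta_star):
    """Per-family level q* = q / (2 * delta_star) for a target full-tree FDR q."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if delta_star <= 0.0:
        raise ValueError("delta_star must be positive")
    return q / (2.0 * delta_star)
```

A larger estimate of 𝜹* gives a smaller q*, so each family is tested more conservatively.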
10
Simulation study
5 clusters of 10 traits each were simulated with different forms of
epistasis or no epistasis
Six configurations: effect size (1%, 2%, 3%) × two or four
epistatic clusters
Replicated 1000 times
Heritability (effect size):
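As an illustration of what an epistatic effect size of a few percent means, the sketch below simulates a single trait in which a pure two-locus interaction explains a fraction h2 of the variance, then estimates the interaction by least squares. This is a simplified, assumed setup (one trait, unlinked loci coded -1/+1, no main effects); it is not the simulation design used in the study.

```python
import numpy as np

def simulate_epistatic_trait(n, h2, rng):
    """Simulate n RIL-like individuals with a pure A-by-B interaction effect."""
    gA = rng.choice([-1.0, 1.0], size=n)   # homozygous genotypes at locus A
    gB = rng.choice([-1.0, 1.0], size=n)   # homozygous genotypes at locus B
    # gA * gB has variance 1, so the interaction explains a share h2 of Var(y).
    y = np.sqrt(h2) * gA * gB + np.sqrt(1.0 - h2) * rng.standard_normal(n)
    return gA, gB, y

def epistasis_t_value(gA, gB, y):
    """Fit y ~ 1 + A + B + A:B by OLS; return the interaction estimate and t-value."""
    X = np.column_stack([np.ones_like(gA), gA, gB, gA * gB])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[3], beta[3] / np.sqrt(cov[3, 3])
```

With a few hundred lines and h2 between 1% and 3%, the t-value of a single trait is often modest, which is the motivation for pooling correlated traits into meta-genes.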
11
The WGCNA hierarchical
clustering
12
Heritability gain
13
Power gain
14
Real data of West et al., 2006
A sample of 210 RIL individuals was derived from a
cross between two inbred Arabidopsis thaliana accessions,
Bayreuth-0 (Bay-0) and Shahdara (Sha).
The genotype map consists of 579 markers
Genome-wide transcript (mRNA) levels were quantified using
Affymetrix whole-genome microarrays
A total of 22,810 gene expression traits from all five chromosomes.
15
Preprocessing
The Variance Stabilization Normalization
Gene expression filtering: 7,244 genes retained out of 22,810
Markers preprocessing
16
Two-stage hierarchical testing for
epistasis
Identified 314 gene clusters (WGCNA)
47 sparse "framework" markers that are within 10 cM of each
other
10-12 "secondary" markers related to each "framework"
marker
First step: 1,081 marker pairs × 314 meta-genes = 339,434
tests
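The second stage can be sketched as follows, under the assumption that every significant framework pair is expanded into all pairs of secondary markers attached to its two framework markers. The function name, the dictionary layout and the marker labels are hypothetical; the slides do not specify how the secondary pairs were enumerated.

```python
from itertools import product

def second_stage_pairs(significant_framework_pairs, secondary):
    """List the secondary marker pairs to test in stage two.

    significant_framework_pairs: iterable of (framework_a, framework_b) tuples
        that were declared significant in stage one.
    secondary: dict mapping each framework marker to its secondary markers.
    """
    pairs = []
    for f_a, f_b in significant_framework_pairs:
        # Cross every secondary marker near f_a with every one near f_b.
        pairs.extend(product(secondary[f_a], secondary[f_b]))
    return pairs
```

For example, with `secondary = {"F1": ["a", "b"], "F2": ["c", "d", "e"]}`, the single significant pair `("F1", "F2")` expands into six secondary pairs. Only regions flagged in the first stage generate further tests, which keeps the second stage small.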
17
Hierarchical FDR control
A universal upper bound is derived for the full-tree FDR:
𝜹* = 1.015 (SE = 0.008)
q* = q/(2𝜹*), with q = 0.1; reported q* = 0.0472
18
Two-stage hierarchical testing for
epistasis
First stage – 11 significant epistatic areas
Second stage – 1141 significant epistatic effects out of 1673
(68%)
19
Epistasis detected, superimposed on the
Arabidopsis markers map
20
Computational advantage
Using the two-stage algorithm on meta-genes, 341,107
hypotheses were tested
Naive analysis: 121,278 loci pairs for each of 7,244 traits, namely
878,537,832 tests, would have been performed
The number of tests is reduced by a factor of about 2,575
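The test counts above can be checked with a few lines of arithmetic. The sketch assumes that the first-stage marker pairs are all pairs among the 47 framework markers and that the 1,673 second-stage tests are added to the first-stage total; the variable names are illustrative.

```python
from math import comb

# Stage one: every pair of the 47 framework markers, for each of 314 meta-genes.
first_stage = comb(47, 2) * 314

# Stage two: the 1,673 secondary-pair tests reported for the significant areas.
second_stage = 1673
two_stage_total = first_stage + second_stage

# Naive scan: 121,278 loci pairs for each of the 7,244 filtered traits.
naive_total = 121278 * 7244

# Ratio of naive to two-stage test counts.
reduction = naive_total / two_stage_total
```

Under these assumptions the two-stage total matches the 341,107 hypotheses quoted above, and the ratio comes out slightly above 2,575.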
21
Epistasis heritability: meta-genes vs single genes
[Figure panels: Meta-genes; Single genes]
22
Total heritability: meta-genes vs single genes
[Figure panels: Meta-genes; Single genes]
23
T-values of epistatic effects: meta-genes vs single genes
[Figure panels: Meta-genes; Single genes]
24
Further research
The marker selection method may take the genome-wide marker distribution into consideration.
Generalization of the NOIA model
Using GO for the validation of the approach
25
Acknowledgements
Dr. Anat Reiner-Benaim
Prof. Abraham Korol
26
Thank you
27