High throughput genome-wide scan for
epistasis with implementation to
Recombinant Inbred Lines (RIL) populations
Pavel Goldstein
Dr. Anat Reiner-Benaim
Prof. Abraham Korol
1
Outline

Problem description

Modeling epistasis: NOIA – the model for epistasis identification

Dimensionality:
• Multi-trait complexes
• Two-stage hypothesis testing
• Hierarchical FDR control in eQTL analysis

Proposed algorithm for epistasis identification

Results:
• Simulation study
• Implementation on Arabidopsis data

Conclusions and discussion
2
eQTL analysis

The goal: find loci whose genotypic variation affects the
quantitative trait of interest, using gene expression as the
phenotype and molecular markers as the genotype information.
3
Problem description

Epistasis – nonadditivity in the contributions of several genes to a
trait.

The number of tests involved is enormous

Error control
4
Statistical epistasis
[Figure: two panels, "no epistasis" and "epistasis"]
5
Natural and Orthogonal Interactions (NOIA) model
(Alvarez-Castro and Carlborg, 2007) for RIL
population
For loci A and B, trait t, loci-pair l and replicate i:
[Equation not recovered; its annotated terms were:]
• gene expression
• phenotypes
• indicator of genotype combinations for two loci
• design matrix
• vector of genetic effects
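As a rough illustration of the kind of two-locus model this slide describes, the sketch below estimates a mean, two single-locus effects and one epistatic effect for a RIL population, where each locus carries only two homozygous genotypes. It is a simplified stand-in and not the NOIA parameterization itself: it assumes the four genotype classes get equal weight (NOIA weights them by genotype frequencies), and the function name `two_locus_effects` and the -1/+1 genotype coding are illustrative choices, not taken from the slides.

```python
from collections import defaultdict


def two_locus_effects(geno_a, geno_b, expr):
    """Estimate (mean, effect of A, effect of B, A-by-B epistasis).

    geno_a, geno_b: genotype code per line, -1 or +1 (the two RIL homozygotes).
    expr: expression values of one trait, in the same line order.
    Assumption: equal weights for the four two-locus genotype classes.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for a, b, y in zip(geno_a, geno_b, expr):
        sums[(a, b)] += y
        counts[(a, b)] += 1

    cells = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    if any(counts[c] == 0 for c in cells):
        raise ValueError("every two-locus genotype class needs at least one line")

    # Mean expression within each genotype class.
    g = {c: sums[c] / counts[c] for c in cells}

    # Design rows are [1, a, b, a*b]. Its columns are mutually orthogonal
    # under equal weights, so each effect is a signed average of cell means.
    mu = sum(g[c] for c in cells) / 4
    eff_a = sum(c[0] * g[c] for c in cells) / 4
    eff_b = sum(c[1] * g[c] for c in cells) / 4
    epistasis = sum(c[0] * c[1] * g[c] for c in cells) / 4
    return mu, eff_a, eff_b, epistasis
```

A nonzero last component indicates that the effect of locus A depends on the genotype at locus B; in a real analysis this estimate would be accompanied by a test statistic rather than read off directly.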
6
The Weighted Gene Co-Expression Network
Analysis (WGCNA) (Zhang and Horvath, 2005)

Top-down hierarchical clustering.

Dynamic Tree Cut algorithm: a branch-cutting method that detects
gene modules based on the shape of the dendrogram branches.

Meta-genes are built by taking the first principal component of
the genes in each cluster.
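The meta-gene step can be sketched as follows. This is a minimal pure-Python illustration of "first principal component of a cluster", using power iteration; it is not the WGCNA implementation, and the function name `first_principal_component`, the fixed iteration count and the starting vector are assumptions made for the example.

```python
def first_principal_component(rows, iters=200):
    """Return one score per sample on the first principal component.

    rows: list of genes in one cluster, each a list of expression
    values across the same samples.
    """
    n_genes, n_samples = len(rows), len(rows[0])

    # Center each gene across samples.
    centered = []
    for r in rows:
        m = sum(r) / n_samples
        centered.append([x - m for x in r])

    def project(w):
        # Sample scores: weighted sum of the centered genes.
        return [sum(w[g] * centered[g][s] for g in range(n_genes))
                for s in range(n_samples)]

    # Power iteration on X X^T (genes x genes) without forming it.
    # The start vector is non-uniform so that it is unlikely to be
    # orthogonal to the leading direction.
    w = [1.0 + g / n_genes for g in range(n_genes)]
    for _ in range(iters):
        scores = project(w)
        w_new = [sum(centered[g][s] * scores[s] for s in range(n_samples))
                 for g in range(n_genes)]
        norm = sum(v * v for v in w_new) ** 0.5
        if norm == 0:
            raise ValueError("cluster has no variance along the start vector")
        w = [v / norm for v in w_new]
    return project(w)
```

The sign of a principal component is arbitrary, so a real pipeline would typically fix it by a convention such as aligning it with the average expression of the cluster.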
7
Two-stage hypothesis testing
[Figure labels: framework marker; secondary markers]
8
False Discovery Rate (FDR) in
eQTL analysis

FDR is the expected proportion of erroneously identified epistasis
effects among all identified ones.
Hierarchical FDR control (Yekutieli, 2008):

Full-tree FDR: all epistasis discoveries, whether in framework
or in secondary marker pairs.
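The FDR described on this slide can be written in the standard Benjamini-Hochberg form. The symbols are not from the slides: R denotes the number of identified epistasis effects and V the number of those that are erroneous; for the full-tree FDR, both counts are pooled over framework and secondary marker pairs.

```latex
\mathrm{FDR} = \mathbb{E}\!\left[\frac{V}{\max(R,\,1)}\right]
```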
9
Hierarchical FDR control
A universal upper bound is derived for the full-tree FDR:
full-tree FDR ≤ 2·𝜹*·q
An upper bound for 𝜹* may be estimated using:
[equation not recovered]
where $R_t^{P_i=0}$ and $R_t^{P_i=1}$ are the numbers of discoveries in $\tau_t$ given that $H_i$ is,
respectively, a true or a false null hypothesis in $\tau_t$.
10
Simulation study

5 clusters of 10 traits each were simulated with different forms of
epistasis or no epistasis

Six configurations: effect size (1%, 2%, 3%) × number of epistatic
clusters (two or four)

Replicated 1000 times

Heritability (effect size):
11
The WGCNA hierarchical
clustering
12
Heritability gain
13
Power gain
14
Real data of West et al., 2006


A population of 210 recombinant inbred lines (RILs) was derived from a
cross between two inbred Arabidopsis thaliana accessions,
Bayreuth-0 (Bay-0) and Shahdara (Sha).
The genotype map consists of 579 markers.

Genome-wide transcript (mRNA) levels were quantified using
Affymetrix whole-genome microarrays

A total of 22,810 gene expression traits from all five chromosomes.
15
Preprocessing

The Variance Stabilization Normalization

Gene expression filtering: 7,244 genes retained out of 22,810

Markers preprocessing
16
Two-stage hierarchical testing for
epistasis

Identified 314 gene clusters (WGCNA)

47 sparse "framework" markers that are within 10 cM of each
other

10-12 "secondary" markers related to each "framework"
marker

First step: 1,081 marker pairs × 314 meta-genes = 339,434
tests
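The two-stage scheme above can be sketched as a small hierarchical testing routine. This is an illustration under stated assumptions, not the authors' code: it applies the Benjamini-Hochberg step-up rule to the framework-level tests, then tests secondary pairs only under rejected framework pairs, treating each such group as its own family. The function names, the dictionary layout and the way secondary tests are grouped into families are all hypothetical.

```python
def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up: indices of hypotheses rejected at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank  # largest rank that passes its threshold
    return set(order[:k])


def two_stage(framework_p, secondary_p, q):
    """Hierarchical testing in two stages.

    framework_p: {framework_test: p-value}
    secondary_p: {framework_test: {secondary_test: p-value}}
    Returns (rejected framework tests, {framework_test: rejected secondary tests}).
    """
    keys = list(framework_p)
    hits = bh_reject([framework_p[k] for k in keys], q)
    stage1 = [keys[i] for i in sorted(hits)]

    stage2 = {}
    for parent in stage1:
        children = list(secondary_p.get(parent, {}))
        if not children:
            continue
        # Secondary tests are examined only under a rejected parent.
        child_hits = bh_reject([secondary_p[parent][c] for c in children], q)
        stage2[parent] = [children[i] for i in sorted(child_hits)]
    return stage1, stage2
```

Because secondary tests are run only where the first stage found something, the total number of tests stays close to the first-stage count, which is the source of the computational saving.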
17
Hierarchical FDR control

A universal upper bound is derived for the full-tree FDR:
𝜹* = 1.015 (SE = 0.008)
q* = q/(2𝜹*) = 0.0472, with q = 0.1
18
Two-stage hierarchical testing for
epistasis


First stage – 11 significant epistatic areas
Second stage – 1141 significant epistatic effects out of 1673
(68%)
19
Epistasis detected, superimposed on the
Arabidopsis markers map
20
Computational advantage

Using the two-stage algorithm on meta-genes, 341,107
hypotheses were tested

Naive analysis: 121,278 loci pairs for each of 7,244 traits, i.e.
878,537,832 tests would have been performed

A roughly 2,575-fold reduction in the number of tests
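The counts on this slide can be reproduced with a few lines of arithmetic. One reading is assumed here and not stated on the slides: the first-stage marker pairs are all unordered pairs among the 47 framework markers. The second-stage count of 1,673 tests is taken from the results slide.

```python
from math import comb

framework_pairs = comb(47, 2)           # unordered pairs of 47 framework markers: 1081
stage1_tests = framework_pairs * 314    # times 314 meta-genes: 339434
stage2_tests = 1673                     # second-stage tests reported in the results
two_stage_total = stage1_tests + stage2_tests   # 341107

naive_total = 121278 * 7244             # all loci pairs times all filtered genes: 878537832
reduction = naive_total / two_stage_total       # about 2575.5

print(two_stage_total, naive_total, round(reduction, 1))
```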
21
Epistasis heritability:
meta-genes vs single genes
[Figure panels: Meta-genes; Single genes]
22
Total heritability: meta-genes vs
single genes
[Figure panels: Meta-genes; Single genes]
23
T-values of epistatic effects:
meta-genes vs single genes
[Figure panels: Meta-genes; Single genes]
24
Further research

Marker selection could take the genome-wide marker distribution into account.

Generalization of the NOIA model

Using Gene Ontology (GO) to validate the approach
25
Acknowledgements
Dr. Anat Reiner-Benaim
Prof. Abraham Korol
26
Thank you
27