Whole Genome Association Analysis: Potentials & Pitfalls Sven Bergmann University of Lausanne & Swiss Institute of Bioinformatics http://serverdgm.unil.ch/bergmann EPFL 5.


Whole Genome Association
Analysis: Potentials & Pitfalls
Sven Bergmann
University of Lausanne &
Swiss Institute of Bioinformatics
http://serverdgm.unil.ch/bergmann
EPFL
5. March 2009
A Systems Biology approach
Large (genomic) systems
• many uncharacterized elements
• relationships unknown
• computational analysis should:
  - improve annotation
  - reveal relations
  - reduce complexity
Small systems
• elements well-known
• many relationships established
• quantitative modeling of systems properties like:
  - Dynamics
  - Robustness
  - Logics
Today
Whole Genome
Association Analysis
Overview
• Associations: Basics
• Whole genome associations
• Population stratification
• Genotype imputation
• Uncertain genotypes
• New Methods
• Challenges
Genetic variation in SNPs
(Single Nucleotide Polymorphisms)
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
Phenotypic variation:
What is association?
[Figure: a chromosome with SNPs and a trait variant]
Genetic variation
yields phenotypic variation
[Figure: distributions of “trait” for two populations carrying different alleles (x-axis from -6 to 6, y-axis from 0 to 1.2)]
Association using regression
[Figure: axes labelled phenotype, genotype and coded genotype]
Regression formalism
$f(y_i) = \beta \, x_i + \epsilon_i$
• $f$: transformation
• $y_i$: phenotype (response variable) of individual i
• $\beta$: effect size (regression coefficient)
• $x_i$: coded genotype (feature) of individual i
• $\epsilon_i$: error (residual)
Goal: Find the effect size that best explains
all (potentially transformed) phenotypes
as a linear function of the genotypes
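The goal stated above can be sketched as an ordinary least-squares fit. This is a minimal illustration, not the analysis used in the talk: the additive 0/1/2 genotype coding, the intercept term, the simulated data and the function name `fit_effect_size` are all assumptions made for the example.

```python
import random

def fit_effect_size(genotypes, phenotypes):
    """Least-squares fit of phenotype = intercept + beta * coded genotype."""
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx                    # effect size (regression coefficient)
    intercept = mean_y - beta * mean_x  # assumed intercept, not named on the slide
    return beta, intercept

# Simulated data (assumption): genotypes coded as 0, 1 or 2 copies of an allele.
random.seed(0)
true_beta = 0.5
genotypes = [random.choice([0, 1, 2]) for _ in range(2000)]
phenotypes = [true_beta * g + random.gauss(0.0, 1.0) for g in genotypes]

beta_hat, intercept_hat = fit_effect_size(genotypes, phenotypes)
print(f"estimated effect size: {beta_hat:.3f}")
```

With a couple of thousand simulated individuals the estimate should land close to the simulated effect size of 0.5; the residual noise determines how close.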
Whole Genome Association
[Figure: -log10(p) across the genome, ranging from low significance to high significance]
Similar approach,
but looking at the entire genome!
That is: 500,000 SNPs!
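Repeating the single-SNP regression for every SNP and recording -log10(p) gives the kind of genome-wide profile described here. The sketch below is illustrative only: it uses 200 simulated SNPs rather than 500,000, an assumed 0/1/2 genotype coding, and a normal approximation to the test statistic; the function names and the position of the causal SNP are hypothetical.

```python
import math
import random

def association_p_value(genotypes, phenotypes):
    """Two-sided p-value for the slope of phenotype ~ genotype.

    Assumption: the t-statistic is treated as standard normal, which is
    reasonable only for large samples.
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0.0:
        return 1.0  # monomorphic SNP carries no information
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx
    rss = sum((y - mean_y - beta * (x - mean_x)) ** 2
              for x, y in zip(genotypes, phenotypes))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    if std_err == 0.0:
        return 0.0  # perfect fit
    z = beta / std_err
    return math.erfc(abs(z) / math.sqrt(2.0))

def scan(genotype_matrix, phenotypes):
    """Return -log10(p) for every SNP (one row of genotypes per SNP)."""
    return [-math.log10(max(association_p_value(row, phenotypes), 1e-300))
            for row in genotype_matrix]

# Simulated data (assumption): one causal SNP among 200.
random.seed(1)
n_individuals, n_snps, causal_snp = 500, 200, 42
genotype_matrix = [[random.choice([0, 1, 2]) for _ in range(n_individuals)]
                   for _ in range(n_snps)]
phenotypes = [0.8 * genotype_matrix[causal_snp][i] + random.gauss(0.0, 1.0)
              for i in range(n_individuals)]

scores = scan(genotype_matrix, phenotypes)
top_snp = max(range(n_snps), key=lambda j: scores[j])
print(f"strongest association at SNP {top_snp}, -log10(p) = {scores[top_snp]:.1f}")
```

The p-values are clamped at 1e-300 before taking the logarithm so that an extremely strong signal cannot trigger a math domain error.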
• Scan the entire genome: 500,000 SNPs
• Identify local regions of interest; examine genes, SNP density, regulatory regions, etc.
• Replicate the finding
Whole Genome Association
[Figure: -log10(p) across the genome, with peaks marked by asterisks]
GWA: >20 publications in 2006/2007
Massive!
CoLaus = Cohort Lausanne
6,189 individuals
Phenotypes: 159 measurements, 144 questions
Genotypes: 500,000 SNPs
Collaboration with:
Vincent Mooser (GSK), Peter Vollenweider & Gerard Waeber (CHUV)
Analysis of Genotypes only
Principal Component Analysis reveals SNP-vectors
explaining largest variation in the data
Ethnic groups cluster according to
geographic distances
[Figure: PCA of POPRES cohort, axes PC1 and PC2]
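To illustrate how a principal component of the genotypes can separate groups of individuals, here is a small sketch using simulated data (not POPRES). It finds only the first principal component, by power iteration; the two populations, their allele frequencies and all function names are assumptions made for the example.

```python
import math
import random

def first_pc_scores(genotypes, iterations=50, seed=0):
    """PC1 score of each individual, found by power iteration.

    genotypes: one row of coded genotypes (0/1/2) per individual.
    """
    n, m = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - col_means[j] for j in range(m)] for row in genotypes]
    rng = random.Random(seed)
    loadings = [rng.gauss(0.0, 1.0) for _ in range(m)]  # random start vector
    for _ in range(iterations):
        scores = [sum(c * v for c, v in zip(row, loadings)) for row in centered]
        # New loadings are proportional to X^T X v, then renormalised.
        loadings = [sum(centered[i][j] * scores[i] for i in range(n))
                    for j in range(m)]
        norm = math.sqrt(sum(v * v for v in loadings))
        loadings = [v / norm for v in loadings]
    return [sum(c * v for c, v in zip(row, loadings)) for row in centered]

def simulate_individual(freqs, rng):
    """Genotype = number of copies of the coded allele (two random draws)."""
    return [(rng.random() < f) + (rng.random() < f) for f in freqs]

# Simulated data (assumption): two populations with shifted allele frequencies.
rng = random.Random(2)
freqs_a = [rng.uniform(0.1, 0.5) for _ in range(100)]
freqs_b = [f + 0.4 for f in freqs_a]
population_a = [simulate_individual(freqs_a, rng) for _ in range(30)]
population_b = [simulate_individual(freqs_b, rng) for _ in range(30)]

pc1 = first_pc_scores(population_a + population_b)
scores_a, scores_b = pc1[:30], pc1[30:]
separated = max(scores_a) < min(scores_b) or max(scores_b) < min(scores_a)
print("populations separated along PC1:", separated)
```

Such per-individual scores are what would be added as covariates when principal components are used to account for stratification.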
GWAS with different covariates indicate
importance of population stratification
[Figure panels: Genomic Control; Principal Components; Origin of grandparents; Both]
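Of the corrections compared here, genomic control is the simplest to sketch. In its usual form it divides every test statistic by an inflation factor, estimated as the observed median statistic over the median expected under the null. The example assumes 1-d.f. chi-square statistics and uses made-up values; the function names are hypothetical.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 d.f.

def genomic_inflation(chi2_stats):
    """Inflation factor lambda: observed median over the expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

def genomic_control(chi2_stats):
    """Divide all statistics by lambda; by common convention lambda is
    floored at 1 so that statistics are never inflated."""
    lam = max(1.0, genomic_inflation(chi2_stats))
    return [s / lam for s in chi2_stats]

# Made-up, inflated test statistics (assumption).
chi2_stats = [0.2, 0.5, 0.9098728, 3.0, 12.0]
lam = genomic_inflation(chi2_stats)
corrected = genomic_control(chi2_stats)
print(f"lambda = {lam:.2f}; top statistic {chi2_stats[-1]} -> {corrected[-1]:.2f}")
```

Because a single factor rescales all SNPs equally, this correction cannot distinguish SNPs that differ strongly between populations from those that do not, which is one reason to compare it with covariate-based approaches.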
Genotypes are called with varying uncertainty
[Figure: intensity of allele A against intensity of allele G]
Some genotypes are missing altogether …
… and are imputed with different uncertainties
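One common way to represent a called or imputed genotype with its uncertainty is a triple of genotype probabilities. The sketch below shows two summaries of such a triple: the expected allele count (dosage) and the most likely genotype. The probability values and function names are invented for illustration.

```python
def expected_dosage(genotype_probs):
    """Expected number of B alleles given (P(AA), P(AB), P(BB))."""
    p_aa, p_ab, p_bb = genotype_probs
    if abs(p_aa + p_ab + p_bb - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p_ab + 2.0 * p_bb

def best_guess(genotype_probs):
    """Most likely genotype (0, 1 or 2 copies of B) and its probability."""
    copies = max(range(3), key=lambda g: genotype_probs[g])
    return copies, genotype_probs[copies]

# Invented examples (assumption): one confident call, one uncertain call.
confident_call = (0.01, 0.02, 0.97)
uncertain_call = (0.30, 0.45, 0.25)
for probs in (confident_call, uncertain_call):
    print(probs, "dosage:", round(expected_dosage(probs), 2),
          "best guess:", best_guess(probs))
```

Keeping only the best guess discards exactly the uncertainty this slide is about, whereas the dosage or the full triple carries it into the association test.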
Using Linkage Disequilibrium
[Figure: markers 1, 2, 3, D and n along a chromosome, illustrating LD]
Markers close together on chromosomes
are often transmitted together, yielding a
non-zero correlation between the alleles.
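The correlation between alleles at two markers can be computed directly from haplotype counts. The sketch below uses the standard definitions D = p_AB - p_A p_B and r = D / sqrt(p_A (1 - p_A) p_B (1 - p_B)); the haplotype counts are invented and the function name is hypothetical.

```python
import math

def linkage_disequilibrium(haplotypes):
    """Return (D, r) for two biallelic markers.

    haplotypes: list of (a, b) pairs, each allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    r = d / denom if denom > 0 else 0.0  # undefined for monomorphic markers
    return d, r

# Invented haplotype counts (assumption).
linked = [(1, 1)] * 45 + [(0, 0)] * 45 + [(1, 0)] * 5 + [(0, 1)] * 5
unlinked = [(1, 1)] * 25 + [(0, 0)] * 25 + [(1, 0)] * 25 + [(0, 1)] * 25

for name, haps in (("linked", linked), ("unlinked", unlinked)):
    d, r = linkage_disequilibrium(haps)
    print(f"{name}: D = {d:.2f}, r = {r:.2f}")
```

A large |r| between a typed marker and an untyped one is what makes it possible to predict the untyped genotype from its neighbours.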
Copy Number Variations are also called
with varying uncertainties
[Figure panels: well-separated, reasonably separated, badly separated]
Phenotypes are never exactly normal
Propose mixture model both for
phenotypes and uncertain genotypes
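As a rough illustration of how uncertain genotypes can enter a likelihood, the sketch below treats each individual's phenotype density as a mixture over the three possible genotypes, weighted by the genotype probabilities. It is a generic textbook-style construction and not the model proposed in the talk: it keeps a plain normal phenotype distribution, and all names and numbers are assumptions.

```python
import math

def normal_pdf(y, mean, sigma):
    """Density of a normal distribution at y."""
    return (math.exp(-0.5 * ((y - mean) / sigma) ** 2)
            / (sigma * math.sqrt(2.0 * math.pi)))

def mixture_log_likelihood(phenotypes, genotype_probs, mu, beta, sigma):
    """Log-likelihood where each phenotype is a mixture over genotypes.

    genotype_probs: per individual, (P(0 copies), P(1 copy), P(2 copies)).
    """
    total = 0.0
    for y, probs in zip(phenotypes, genotype_probs):
        density = sum(p * normal_pdf(y, mu + beta * g, sigma)
                      for g, p in enumerate(probs))
        total += math.log(density)
    return total

# Invented data (assumption): four individuals with uncertain genotypes.
phenotypes = [0.1, 1.2, 1.9, 0.9]
genotype_probs = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.1),
                  (0.0, 0.2, 0.8), (0.3, 0.4, 0.3)]
ll_null = mixture_log_likelihood(phenotypes, genotype_probs,
                                 mu=1.0, beta=0.0, sigma=1.0)
ll_effect = mixture_log_likelihood(phenotypes, genotype_probs,
                                   mu=0.0, beta=1.0, sigma=1.0)
print(f"log-likelihood without effect: {ll_null:.3f}, with effect: {ll_effect:.3f}")
```

Comparing the two log-likelihoods is the basis of a likelihood-ratio style test; when every genotype is known with certainty the mixture collapses to the ordinary regression likelihood.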
Comprehensive comparison of new and
existing association methods
Many existing methods produce false positives
Our proposed mixture-model method has
increased power
Our implementation QUICKTEST runs
faster than the standard tool SNPTEST
Challenges
• Multiple Hypothesis testing:
Is one SNP with p=10^-6 a significant result
when testing 500,000 SNPs?
• Covariates & Interactions
For what do we have to correct the
phenotypes?
(Age, sex, treatments, other SNPs …)
• Data Integration
How to validate findings?
(Replication Studies, Meta-Analyses,
Re-sequencing, Functional Studies, …)
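For the multiple-testing question above, one simple and conservative answer is a Bonferroni correction, which the slide does not name and which is used here only as an illustration. The family-wise error rate of 0.05 and the function names are assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests

def expected_false_positives(p_value, n_tests):
    """Expected number of null SNPs reaching p_value or smaller by chance."""
    return p_value * n_tests

n_snps = 500_000
p_observed = 1e-6
threshold = bonferroni_threshold(0.05, n_snps)  # alpha = 0.05 is an assumption
is_significant = p_observed <= threshold
chance_hits = expected_false_positives(p_observed, n_snps)

print(f"threshold = {threshold:.1e}, significant: {is_significant}")
print(f"expected chance hits at p <= {p_observed:.0e}: {chance_hits:.1f}")
```

Under this criterion a single SNP with p=10^-6 falls short of the threshold of about 10^-7, and roughly one such hit in every two scans of 500,000 null SNPs would be expected by chance. Because neighbouring SNPs are correlated, the correction is stricter than necessary.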
Modular Approach for Integrative
Analysis of Genotypes and Phenotypes
[Figure: Phenotypes (Measurements) and Genotypes (SNPs/Haplotypes) of Individuals, connected by modular links]
Modular eQTL of the extended Hapmap panel
[Figure: Expression and SNPs/Haplotypes for 800 cell lines]
Module expression is more significantly
associated than individual gene expression
Network Approaches for Integrative Association Analysis
Acknowledgements
People:
Zoltán Kutalik
Micha Hersch
Aitana Morton
Diana Marek
Barbara Piasecka
Bastian Peter
Karen Kapur
Alain Sewer
Toby Johnson
Armand Valsessia
Gabor Csardi
Sascha Dalessi
Funding: SNSF, SIB, Cavaglieri, Leenaards, SystemsX.ch, European FP
http://serverdgm.unil.ch/bergmann