Whole Genome Association Study: What are the next steps? Sven Bergmann University of Lausanne & Swiss Institute of Bioinformatics http://serverdgm.unil.ch/bergmann TC Nestle 10 Sep.


Whole Genome Association Study:
What are the next steps?
Sven Bergmann
University of Lausanne &
Swiss Institute of Bioinformatics
http://serverdgm.unil.ch/bergmann
TC Nestle
10 Sep. 2009
What is association?
[Diagram: SNPs and a trait variant along a chromosome]
Genetic variation
yields phenotypic variation
[Figure: Distributions of “trait” for the two populations carrying different alleles; x-axis: phenotype]
Association using regression
[Figure axes: genotype, coded genotype]
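Regression formalism
$f(y_i) = \beta x_i + \varepsilon_i$
f: (monotonic) transformation
y_i: phenotype (response variable) of individual i
x_i: coded genotype (feature) of individual i
β: effect size (regression coefficient)
ε_i: error (residual)
p(β=0): p-value for the null hypothesis of no effect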
Goal: Find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes, and estimate the probability (p-value) that the data are consistent with the null hypothesis (i.e. no effect)
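The goal stated above can be sketched for a single SNP as follows. This is a minimal illustration, not the analysis pipeline used in the talk: it assumes genotypes coded as allele counts (0, 1, 2), an identity transformation of the phenotype, an added intercept term, and a large-sample normal approximation for the p-value. The function name association_test is hypothetical.

```python
import math

def association_test(genotypes, phenotypes):
    """Regress phenotype on coded genotype for one SNP.

    Returns (beta, p_value). The p-value uses a normal approximation
    to the t-statistic, which is an assumption suited to large samples.
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx                       # effect size (regression coefficient)
    intercept = mean_y - beta * mean_x     # assumed intercept term
    rss = sum((y - intercept - beta * x) ** 2
              for x, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)    # standard error of beta
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided test of beta = 0
    return beta, p_value

# Toy data (made up): phenotype rises by about one unit per allele copy.
toy_genotypes = [0, 0, 1, 1, 2, 2]
toy_phenotypes = [0.1, -0.1, 1.1, 0.9, 2.1, 1.9]
beta, p_value = association_test(toy_genotypes, toy_phenotypes)
print(beta, p_value)
```

In a genome-wide scan the same test would be repeated for every SNP, so the resulting p-values need a correction for multiple testing.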
Whole Genome Association
Current insights from GWAS:
• Well-powered (meta-)studies with (tens of) thousands of samples have identified a few (dozen) candidate loci with highly significant associations
• Many of these associations have been replicated in independent studies
Current insights from GWAS:
• Each locus explains but a tiny (<1%) fraction of the phenotypic variance
• All significant loci together explain only a small fraction (<10%) of the variance
David Goldstein:
“~93,000 SNPs would be required to explain 80% of the population variation in height.”
Common Genetic Variation and Human Traits, NEJM 360;17
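To make the "tiny fraction" of variance concrete, the sketch below uses the standard result for an additive SNP in Hardy-Weinberg equilibrium: the genotype variance is 2f(1-f), so the variance explained is 2f(1-f)β² divided by the phenotypic variance. The allele frequency and effect size in the example are made-up values, and the function name is hypothetical.

```python
def variance_explained(allele_freq, beta, phenotype_variance=1.0):
    """Fraction of phenotypic variance explained by one additive SNP.

    Assumes Hardy-Weinberg equilibrium, so the variance of the
    allele-count genotype is 2 * f * (1 - f).
    """
    genotype_variance = 2.0 * allele_freq * (1.0 - allele_freq)
    return genotype_variance * beta ** 2 / phenotype_variance

# Hypothetical locus: allele frequency 0.3, effect of 0.1 phenotypic
# standard deviations per allele copy (phenotype variance set to 1).
fraction = variance_explained(0.3, 0.1)
print(f"{fraction:.4f}")
```

With these made-up numbers the locus explains about 0.4% of the variance, which is in the range the slide describes for individual loci.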
So what do we miss?
1. Other variants like Copy Number Variations or epigenetics may play an important role
2. Interactions between genetic variants (GxG) or with the environment (GxE)
3. Many causal variants may be rare and/or poorly tagged by the measured SNPs
4. Many causal variants may have very small effect sizes
5. Overestimation of heritabilities from twin studies?
CNVs can be called from SNP probe intensities
[Figure: intensity of allele A versus intensity of allele G]
Copy Number Variations are called with varying uncertainties
[Figure panels: well-separated, reasonably separated, badly separated]
We propose a mixture model for both phenotypes and uncertain genotypes
Phenotype mixture components
genotype mixture components
(the mixture MLE/LRT model)
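One way to picture how uncertain genotype calls can enter a likelihood is sketched below: each individual's phenotype density is a mixture over genotype classes, weighted by the call probabilities. This is an illustration under stated assumptions (normal phenotype components with a shared standard deviation, parameters supplied rather than fitted), not the model from the talk. All function and parameter names are hypothetical, and a real likelihood ratio test would maximize the likelihood under both hypotheses first.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_log_likelihood(phenotypes, call_probs, class_means, sd):
    """Log-likelihood of phenotypes given uncertain genotype calls.

    call_probs[i][g] is the probability that individual i belongs to
    genotype class g; class_means[g] is the phenotype mean of class g.
    """
    total = 0.0
    for y, probs in zip(phenotypes, call_probs):
        density = sum(p * normal_pdf(y, m, sd)
                      for p, m in zip(probs, class_means))
        total += math.log(density)
    return total

def log_likelihood_ratio(phenotypes, call_probs, class_means, null_mean, sd):
    """Twice the log-likelihood difference between class-specific means
    and a single shared mean, for fixed (not fitted) parameters."""
    alt = mixture_log_likelihood(phenotypes, call_probs, class_means, sd)
    null = mixture_log_likelihood(
        phenotypes, call_probs, [null_mean] * len(class_means), sd)
    return 2.0 * (alt - null)

# Toy example (made up): the second individual's call is uncertain.
toy_phenotypes = [0.0, 1.0]
toy_call_probs = [[1.0, 0.0], [0.3, 0.7]]
print(log_likelihood_ratio(toy_phenotypes, toy_call_probs, [0.0, 1.0], 0.5, 1.0))
```

When every call is certain (one probability equal to 1 per individual), the mixture reduces to the ordinary normal likelihood with known genotypes.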
Covariates & Interactions
• For which parameters do we have to correct the phenotypes? Age, sex, other SNPs …
• Interactions: Can we test 10^6 x 10^6 interactions and does it make sense?
$R = a + b_s G_s + c_{s'} G_{s'} + d_{s,s'} G_s G_{s'} + \varepsilon$
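The scale of the interaction question can be made concrete with a short calculation. The sketch below counts the distinct SNP pairs among 10^6 SNPs (about half of the 10^6 x 10^6 grid), derives a Bonferroni-style significance threshold, and builds one row of the design matrix for the pairwise model with intercept, two main effects and an interaction term. The function names and the 0.05 significance level are assumptions for illustration.

```python
def count_pairwise_tests(n_snps):
    """Number of distinct unordered SNP pairs."""
    return n_snps * (n_snps - 1) // 2

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def interaction_row(g_s, g_t):
    """Design row [1, G_s, G_s', G_s * G_s'] matching coefficients (a, b, c, d)."""
    return [1.0, g_s, g_t, g_s * g_t]

n_tests = count_pairwise_tests(10 ** 6)
print(n_tests)                        # 499999500000
print(bonferroni_threshold(n_tests))  # roughly 1e-13
print(interaction_row(1, 2))
```

With a threshold near 10^-13, only very strong interaction effects would reach significance in an exhaustive scan, which is one motivation for prioritizing candidate pairs.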
Network Approaches
for Integrative Association Analysis
Using knowledge of physical gene interactions or pathways to prioritize the search for functional interactions
Can we reduce the complexity of the phenotypic data?
[Figure: data matrix of many measurements (thousands) by hundreds of samples]
New Analysis and Visualization Tools are needed!
New Tools: Module Visualization
http://serverdgm.unil.ch/bergmann/Fibroblasts/visualiser.html
Data Integration: Example NCI60
60 cancer cell lines (9 tissue types)
Drug response data: ~5,000 drugs
Gene expression data: ~23,000 gene probes
How to identify Co-modules?
Iteratively refine genes, cell lines and drugs to get co-modules
Z Kutalik, J Beckmann & SB, Nature Biotechnology (2008)
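The iterative refinement idea can be illustrated with a deliberately simplified scheme. The sketch below alternates between two matrices that share the cell-line dimension (genes by cell lines, drugs by cell lines), keeping at each step the entries whose average score exceeds a threshold. It is not the published algorithm: the thresholded averaging, the function names, the threshold value and the toy data are all assumptions, and real data would first be normalized.

```python
def column_scores(matrix, selected_rows):
    """Average each column over the selected rows."""
    n_cols = len(matrix[0])
    return [sum(matrix[r][c] for r in selected_rows) / len(selected_rows)
            for c in range(n_cols)]

def row_scores(matrix, selected_cols):
    """Average each row over the selected columns."""
    return [sum(row[c] for c in selected_cols) / len(selected_cols)
            for row in matrix]

def select_above(scores, threshold):
    """Indices whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]

def refine_comodule(expression, response, seed_genes, threshold=0.5, max_iter=20):
    """Iterate genes -> cell lines -> drugs -> cell lines -> genes
    until the three index sets stop changing."""
    genes, cells, drugs = list(seed_genes), [], []
    for _ in range(max_iter):
        cells_from_genes = select_above(column_scores(expression, genes), threshold)
        if not cells_from_genes:
            break
        new_drugs = select_above(row_scores(response, cells_from_genes), threshold)
        if not new_drugs:
            break
        new_cells = select_above(column_scores(response, new_drugs), threshold)
        if not new_cells:
            break
        new_genes = select_above(row_scores(expression, new_cells), threshold)
        if not new_genes:
            break
        if (new_genes, new_cells, new_drugs) == (genes, cells, drugs):
            break  # converged
        genes, cells, drugs = new_genes, new_cells, new_drugs
    return genes, cells, drugs

# Toy data (made up): genes 0-1 and drugs 0 and 2 are active in cell lines 0-1.
toy_expression = [[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 1, 0]]
toy_response = [[1, 1, 0, 0],
                [0, 0, 0, 1],
                [1, 1, 0, 0]]
print(refine_comodule(toy_expression, toy_response, seed_genes=[0]))
```

Starting from a single seed gene, the toy run settles on genes 0 and 1, cell lines 0 and 1, and drugs 0 and 2, the co-module planted in the data.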
Modular Approach for Integrative
Analysis of Genotypes and Phenotypes
[Diagram: phenotypes (individuals × measurements) connected by modular links to genotypes (individuals × SNPs/haplotypes)]
Acknowledgements
Jacqui Beckmann
People: Zoltán Kutalik, Micha Hersch, Aitana Morton, Diana Marek, Barbara Piasecka, Bastian Peter, Karen Kapur, Alain Sewer, Toby Johnson, Armand Valsessia, Gabor Csardi
Funding: SNSF, SIB, Cavaglieri, Leenaards, SystemsX.ch, European FP
http://serverdgm.unil.ch/bergmann