Whole Genome Association Analysis: Potentials & Pitfalls
Sven Bergmann
University of Lausanne & Swiss Institute of Bioinformatics
http://serverdgm.unil.ch/bergmann
EPFL, 5 March 2009

A Systems Biology approach

Large (genomic) systems
• many uncharacterized elements
• relationships unknown
• computational analysis should: improve annotation, reveal relations, reduce complexity

Small systems
• elements well-known
• many relationships established
• quantitative modeling of systems properties like: Dynamics, Robustness, Logics

Today: Whole Genome Association Analysis

Overview
• Associations: Basics
• Whole genome associations
• Population stratification
• Genotype imputation
• Uncertain genotypes
• New Methods
• Challenges

Genetic variation in SNPs (Single Nucleotide Polymorphisms)
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

Phenotypic variation: What is association?
[Figure: SNPs and a trait variant along a chromosome]

Genetic variation yields phenotypic variation
[Figure: distributions of the “trait” (phenotype) in the population with one allele and in the population with the other allele]

Association using regression
[Figure: phenotype plotted against coded genotype]

Regression formalism
[Equation with the following labeled terms]
• phenotype (response variable) of individual i
• transformation
• effect size (regression coefficient)
• coded genotype (feature) of individual i
• error (residual)
Goal: Find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes

Whole Genome Association
Similar approach, but looking at the entire genome! That is: 500,000 SNPs!
[Figure: -log10(p) along the genome, ranging from low significance to high significance]
• Scan entire genome: 500,000s of SNPs
• Identify local regions of interest; examine genes, SNP density, regulatory regions, etc.
• Replicate the finding

GWA: >20 publications in 2006/2007
Massive!
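The regression formalism and the genome-wide scan described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the analysis used in the talk: genotypes are coded additively as allele counts 0/1/2, the model is phenotype = mu + beta * genotype + error with no transformation or covariates, the p-value uses a normal approximation, and significance is judged by a Bonferroni threshold. The function names `snp_association` and `genome_scan` and the toy data are hypothetical.

```python
import math


def snp_association(genotypes, phenotypes):
    """Least-squares fit of phenotype = mu + beta * genotype + error.

    genotypes: coded genotypes, here allele counts 0/1/2 (assumed additive coding).
    Returns (beta, standard_error, p_value). The two-sided p-value uses a
    normal approximation to the test statistic (an assumption; adequate
    only for large samples).
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("monomorphic SNP: genotype has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx                      # effect size (regression coefficient)
    mu = mean_y - beta * mean_x           # intercept
    rss = sum((y - mu - beta * x) ** 2 for x, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of beta
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p


def genome_scan(snp_genotypes, phenotypes, alpha=0.05):
    """Test each SNP separately and flag those below a Bonferroni threshold."""
    threshold = alpha / len(snp_genotypes)
    results = {}
    for name, genotypes in snp_genotypes.items():
        beta, _, p = snp_association(genotypes, phenotypes)
        results[name] = (beta, p, p < threshold)
    return threshold, results


# Toy data (hypothetical): six individuals, two SNPs.
phenotypes = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
snps = {
    "snp_a": [0, 0, 1, 1, 2, 2],  # tracks the phenotype closely
    "snp_b": [1, 0, 2, 0, 1, 2],  # only weakly related to the phenotype
}
threshold, results = genome_scan(snps, phenotypes)
for name, (beta, p, significant) in results.items():
    print(name, round(beta, 3), p, significant)
```

With 500,000 SNPs and alpha = 0.05, the same Bonferroni rule gives a per-SNP threshold of 0.05 / 500,000 = 10^-7, which is why the plots show -log10(p) rather than p itself.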
CoLaus = Cohort Lausanne
6,189 individuals
Phenotypes: 159 measurements, 144 questions
Genotypes: 500,000 SNPs
Collaboration with: Vincent Mooser (GSK), Peter Vollenweider & Gerard Waeber (CHUV)

Analysis of Genotypes only
Principal Component Analysis reveals SNP-vectors explaining largest variation in the data
[Figure: PC1 vs. PC2]

Ethnic groups cluster according to geographic distances
[Figure: PCA of POPRES cohort, PC1 vs. PC2]

GWAS with different covariates indicate importance of population stratification
[Figure panels: Genomic Control; Principal Components; Origin of grandparents; Both]

Genotypes are called with varying uncertainty
[Figure: intensity of allele A vs. intensity of allele G]

Some genotypes are missing altogether …
… and are imputed with different uncertainties

Using Linkage Disequilibrium
[Figure: markers 1, 2, 3, D, n; LD]
Markers close together on chromosomes are often transmitted together, yielding a non-zero correlation between the alleles.
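The correlation mentioned above can be made concrete with a small sketch. It assumes genotypes coded as allele counts 0/1/2 and computes the squared Pearson correlation between two markers across individuals. This genotype-level r² is a common proxy for LD; LD is strictly defined on haplotypes, which this sketch does not reconstruct. The function name `genotype_r2` and the example data are hypothetical.

```python
def genotype_r2(marker_a, marker_b):
    """Squared Pearson correlation between two markers' coded genotypes.

    Both inputs list allele counts (0/1/2) for the same individuals, in the
    same order. Values near 1 suggest the markers are inherited together;
    values near 0 suggest they vary independently.
    """
    n = len(marker_a)
    mean_a = sum(marker_a) / n
    mean_b = sum(marker_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(marker_a, marker_b))
    var_a = sum((a - mean_a) ** 2 for a in marker_a)
    var_b = sum((b - mean_b) ** 2 for b in marker_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a marker without variation has no defined correlation")
    return cov * cov / (var_a * var_b)


# Toy data (hypothetical): six individuals, three markers.
marker_1 = [0, 1, 2, 1, 0, 2]
marker_2 = [0, 1, 2, 1, 0, 2]  # identical pattern to marker_1
marker_3 = [0, 2, 1, 1, 2, 0]  # largely unrelated pattern

r2_close = genotype_r2(marker_1, marker_2)
r2_far = genotype_r2(marker_1, marker_3)
print(r2_close, r2_far)
```

Imputation builds on this kind of correlation: when a missing marker is strongly correlated with typed neighbours, its genotype can be inferred with little uncertainty, and with more uncertainty when the correlation is weak.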
Copy Number Variations are also called with varying uncertainties
[Figure panels: Well-separated; Reasonably separated; Badly separated]

Phenotypes are never exactly normal
Propose mixture model both for phenotypes and uncertain genotypes
Comprehensive comparison of new and existing association methods
Many existing methods produce false positives
Our proposed mixture model method has increased power
Our implementation QUICKTEST runs faster than the standard tool SNPTEST

Challenges
• Multiple Hypothesis testing: Is one SNP with p = 10^-6 a significant result when testing 500,000 SNPs?
• Covariates & Interactions: What do we have to correct the phenotypes for? (Age, sex, treatments, other SNPs …)
• Data Integration: How to validate findings? (Replication Studies, Meta-Analyses, Re-sequencing, Functional Studies, …)

Modular Approach for Integrative Analysis of Genotypes and Phenotypes
[Figure: phenotypes (measurements × individuals) and genotypes (SNPs/haplotypes × individuals) connected by modular links]

Modular eQTL of the extended Hapmap panel
[Figure: expression of 800 cell lines vs. SNPs/haplotypes]
Module expression is more significantly associated than individual gene expression

Network Approaches for Integrative Association Analysis

Acknowledgements
People: Zoltán Kutalik, Micha Hersch, Aitana Morton, Diana Marek, Barbara Piasecka, Bastian Peter, Karen Kapur, Alain Sewer, Toby Johnson, Armand Valsessia, Gabor Csardi, Sascha Dalessi
Funding: SNSF, SIB, Cavaglieri, Leenaards, SystemsX.ch, European FP
http://serverdgm.unil.ch/bergmann