Genome-Wide Association Studies
Xiaole Shirley Liu
Stat 115/215

Association Studies
• Association between genetic markers and phenotype
  – E.g. cystic fibrosis: ~70% of cystic fibrosis patients have a deletion of 3 base pairs in the CFTR gene, resulting in the loss of the phenylalanine at position 508 of the CFTR protein
• In particular, find disease genes and SNP / haplotype markers for susceptibility prediction and diagnosis
• Influences individual decisions on lifestyle, prevention, screening, and treatment

Warfarin and CYP2C9: SNPs in Pharmacogenomics
• Warfarin is an anticoagulant drug; the enzyme encoded by the CYP2C9 gene metabolizes warfarin.
• A patient requiring a low warfarin dose, compared with the normal population, has an odds ratio of 6.21 for carrying a variant allele.
• The subgroup of patients who are poor metabolisers of warfarin is potentially at higher risk of bleeding.
Aithal et al., 1999, Lancet.

Genome-Wide Association Studies
• Quality control
  – Unusual similarity between individuals
  – Wrong sex
  – Trio has non-Mendelian inheritance
  – Genotyping quality
• Two strategies:
  – Family-based association studies
  – Population-based case-control association studies

Quality Control: SNP calls
• % SNP called
[Figure: example SNP call plots labeled "Good calls!" and "Bad calls!"]
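The "% SNP called" check can be illustrated with a minimal sketch. It assumes genotypes are coded 0/1/2 with None for a failed call, and it uses a call-rate cutoff chosen only for the example; the coding, the function names, and the cutoff are assumptions for illustration, not values from the lecture.

```python
from typing import List, Optional

# Assumed coding: 0, 1 or 2 copies of an allele; None means the SNP was not called.
Genotype = Optional[int]


def snp_call_rates(genotypes: List[List[Genotype]]) -> List[float]:
    """Fraction of samples with a successful call, for each SNP.

    `genotypes` is a samples x SNPs matrix (one row per individual).
    """
    if not genotypes:
        return []
    n_samples = len(genotypes)
    n_snps = len(genotypes[0])
    rates = []
    for j in range(n_snps):
        called = sum(1 for row in genotypes if row[j] is not None)
        rates.append(called / n_samples)
    return rates


def passing_snps(genotypes: List[List[Genotype]], min_call_rate: float = 0.95) -> List[int]:
    """Indices of SNPs whose call rate meets the (hypothetical) cutoff."""
    rates = snp_call_rates(genotypes)
    return [j for j, rate in enumerate(rates) if rate >= min_call_rate]


if __name__ == "__main__":
    # Toy data: 4 samples x 3 SNPs.
    toy = [
        [0, 1, None],
        [1, 1, None],
        [2, 0, 1],
        [0, None, None],
    ]
    print(snp_call_rates(toy))
    print(passing_snps(toy, min_call_rate=0.75))
```

In a real study the same per-SNP summary would normally come from the genotyping software or a dedicated toolkit; the point here is only what "% SNP called" measures.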
Family-based Association Studies
• Look at allele transmission in unrelated families, with one affected child in each
• Like a coin toss: test the likelihood of a fair coin

TDT: Transmission Disequilibrium Test
• Only heterozygous parents matter; compare observed with expected transmissions, where A and a are the numbers of times heterozygous parents transmit allele A or allele a to the affected child:
  $Z_{TDT}^2 = \frac{(A - a)^2}{A + a} = \frac{(9 - 2)^2}{9 + 2}, \quad Z_{TDT}^2 \sim \chi^2, \ 1\ \text{df}$
• Could also compare allele frequency between affected and unaffected children in the same family

Case Control Studies
• SNP/haplotype marker frequency in a sample of affected cases compared with that in an age/sex/population-matched sample of unaffected controls

From Genotyping to Allele Counts

Test Significant Associations
• Expected:
  – (24 + 278) * (24 + 86) / (24 + 278 + 86 + 296) = 49
  – (278 + 296) * (86 + 296) / (24 + 278 + 86 + 296) = 321
• $\chi^2 = \sum_{i,j} \frac{(e_{ij} - o_{ij})^2}{e_{ij}} = 27.5$, 1 df, p < 0.001

Association of Alleles and Genotypes of rs1333049 ('3049) with Myocardial Infarction
            C, N (%)        G, N (%)
Cases       2,132 (55.4)    1,716 (44.6)
Controls    2,783 (47.4)    3,089 (52.6)
χ² (1 df) = 55.1, P-value = 1.2 × 10^-13
Allelic odds ratio = 1.38
• OR = 1: no disease association
• OR > 1: allele increases risk of disease
• OR < 1: allele decreases risk of disease
Samani N et al., N Engl J Med 2007; 357:443-453.

Multiple hypotheses testing?

GWAS P-values
• GWAS p-values for type II diabetes (Manhattan plot)
• Bonferroni correction: most common, typically p < 10^-7 or 10^-8
McCarthy et al., Nat Rev Genetics, 2008

Size Matters
Visscher, AJHG 2012

How to Improve Statistical Power?
• Without increasing samples?
• Test association of disease with haplotypes instead of individual SNPs
  – Also reduces genotyping errors
• Split samples:
  – First half: narrow down promising SNPs / haplotypes
  – Second half: refine hits (far fewer multiple hypotheses)

Unusual P-value distributions
• P-value QQ plot
• Population stratification
Marchini, Nat Genet. 2004

European population structure
• 1,387 samples, ~200K SNPs

UK WTCCC1 Study
[Figure: population structure plot; labels: Afro-Caribbean samples, South Asian samples, Africa, European, Chinese + Japanese]

Genomic control
• Devlin and Roeder (1999) used theoretical arguments to propose that, with population structure, the distribution of Cochran-Armitage trend test statistics, genome-wide, is inflated by a constant multiplicative factor λ.
• We can estimate the multiplicative inflation factor using the statistic $\lambda = \mathrm{median}(X_i^2) / 0.456$, where 0.456 is approximately the median of a χ² distribution with 1 df.
• An inflation factor λ > 1 indicates population structure and/or genotyping error.
• We can carry out an adjusted test of association that takes account of any mismatching of cases/controls at any SNP using the statistic $X_i^2 / \lambda$.
[Figure: QQ plot; annotations: "True hits?", "Population outliers and/or structure?", "Inflation factor λ = 1.11"]

IBD: Identity By Descent Test
• If two individuals share a common ancestor, they will share many SNPs / haplotype blocks on their genomes (identical by state: IBS)
• Pairwise IBD probability between samples
• Probability that two individuals share 0 (Z0), 1 (Z1), or 2 (Z2) haplotypes across the genome
• Remove IBDs
Manolio et al., J Clin Invest 2008

Acknowledgement
• Tim Niu
• Kenneth Kidd, Judith Kidd and Glenys Thomson
• Joel Hirschhorn
• Greg Gibson & Spencer Muse
• Jim Stankovich
• Teri Manolio
• David Evans
• Guodong Wu
• Bo Li
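As a recap, the arithmetic on the TDT, case-control, odds-ratio, Bonferroni, and genomic-control slides can be reproduced with a short sketch. It uses only the Python standard library. The function names, the 2 x 2 table layout, the reading of the rs1333049 table with the 2,132 / 1,716 row as cases, the count of 500,000 tests, and the toy list of test statistics are all assumptions for illustration. Expected counts are left unrounded here, whereas the slide shows them rounded, so the case-control statistic comes out slightly different from the slide's figure.

```python
import math
from statistics import median


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def tdt_statistic(n_A, n_a):
    """TDT: (A - a)^2 / (A + a), counting transmissions from heterozygous parents."""
    return (n_A - n_a) ** 2 / (n_A + n_a)


def chi2_2x2(table):
    """Pearson chi-square for a 2 x 2 table of observed counts (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (expected - observed) ** 2 / expected
    return stat


def odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic odds ratio from the four allele counts."""
    return (case_risk * ctrl_other) / (case_other * ctrl_risk)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def genomic_control(stats, null_median=0.456):
    """Return (lambda, adjusted statistics); 0.456 is the constant quoted on the slide."""
    lam = median(stats) / null_median
    return lam, [x / lam for x in stats]


if __name__ == "__main__":
    z2 = tdt_statistic(9, 2)
    print("TDT statistic and p-value:", z2, chi2_1df_pvalue(z2))

    x2 = chi2_2x2([[24, 278], [86, 296]])
    print("Case-control chi-square and p-value:", x2, chi2_1df_pvalue(x2))

    # Assumed reading of the rs1333049 table: cases C/G first, then controls C/G.
    print("Allelic OR:", odds_ratio(2132, 1716, 2783, 3089))

    # Hypothetical number of SNPs tested.
    print("Bonferroni threshold:", bonferroni_threshold(0.05, 500_000))

    # Toy statistics, not real GWAS output.
    lam, adjusted = genomic_control([0.1, 0.3, 0.5, 0.9, 4.0])
    print("lambda and adjusted statistics:", lam, adjusted)
```

The p-value helper relies on the fact that a chi-square variable with 1 df is the square of a standard normal, so no statistics package is needed for these single-df tests.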