Statistical Challenges in Genetic Studies of Mental Disorders
Heping Zhang
Collaborative Center for Statistics in Science, Yale University School of Medicine
June 9, 2009
Institute of Mathematical Statistics, National University of Singapore

Outline
• Heredity of Psychiatric Disorders
  – A Century Ago
  – One Example
• Genetic Studies of Mental Disorders
  – As We Are Speaking
  – Three Examples
• Statistical Challenges
  – Our Progress
  – Ordinal Traits
  – Multivariate Traits
• Closing Comments and Acknowledgements

PRELIMINARY REPORT OF A STUDY OF HEREDITY IN INSANITY IN THE LIGHT OF THE MENDELIAN LAWS
By Gertrude L. Cannon, A.M., and A. J. Rosanoff, M.D., Kings Park State Hospital, New York

Pedigrees from 11 Neuropathetic Patients
Correlated Phenotypes
Theoretical Conclusions

The Genetics of Tourette Syndrome
Tourette syndrome is a complex disorder characterized by repetitive, sudden, and involuntary movements or noises called tics.
• Concordance in MZ twins: about 50%
• Concordance in DZ twins: below 10%
In 1986, Pauls and Leckman concluded that Tourette syndrome is inherited as a highly penetrant, sex-influenced, autosomal dominant trait.
Pete Bennett, winner of the 7th series of Big Brother

The Genetics of Tourette Syndrome
In 2005, State's lab identified mutations involving the SLITRK1 gene (13q31.1) in a small number of people with Tourette syndrome. Most people with Tourette syndrome do not have a mutation in the SLITRK1 gene. Because mutations have been reported in so few people with this condition, the association of the SLITRK1 gene with this disorder has not been confirmed.
• TSICG (2008): Lack of association between SLITRK1var321 and Tourette syndrome in a large family-based sample

Schizophrenia
Schizophrenia is a chronic, severe, and disabling brain disorder that affects about 1.1 percent of the U.S. population age 18 and older in a given year.
People with schizophrenia sometimes hear voices others don't hear, believe that others are broadcasting their thoughts to the world, or become convinced that others are plotting to harm them. These experiences can make them fearful and withdrawn and cause difficulties when they try to have relationships with others.
http://www.nimh.nih.gov

Genetic Studies of Schizophrenia
• Kraepelin (Textbook of Psychiatry, 1896) described "dementia praecox" as an inherited disorder.
• Kety, Rosenthal, and Wender conducted a series of adoption studies beginning in 1968, establishing a genetic basis for schizophrenia.
• In 1987–88, two reports appeared: "Bipolar affective disorders linked to DNA markers on chromosome 11" and "Localization of a susceptibility locus for schizophrenia on chromosome 5." Both attracted a lot of publicity but could not be replicated.
• Some regions (e.g., dysbindin on chromosome 6p, neuregulin on 8p, and G72 on 13q) have been identified more consistently as candidate regions.
• There may not be a true sequence variation in a gene that causes illness. Rather, variable expression through epigenetic modification of gene activation may be the key (DeLisi et al. 2007).

Genetic Studies of Schizophrenia
Nature Online, July 30, 2008:
• International Schizophrenia Consortium (ISC): 3,391 schizophrenia cases and 3,181 controls in a European sample
• Stefansson et al.: 1,433 schizophrenia cases and 33,250 controls; a further 3,285 cases and 7,951 controls
• Both groups report genetic deletions associated with schizophrenia in the same three locations: on chromosomes 1 and 15, plus a third deletion on chromosome 22 that has previously been connected with increased susceptibility to schizophrenia.

Genetic Studies of Schizophrenia
Nature, July 31, 2008:
• The surveys have identified sections of the human genome that, when deleted, can elevate the risk of developing schizophrenia by up to 15 times compared with the general population.
• In the ISC study, a total of 890 CNVs were each observed only once in the sample (in a single case or a single control). This set of CNVs showed a 1.45-fold increase in cases (empirical P = 5 × 10⁻⁶). On average, 13.1% of cases of schizophrenia possessed a deletion or duplication observed only once in the sample, in contrast to 10.4% of controls.

Smoking
• In the 1990s, a series of large-sample twin studies in the US and other countries repeatedly showed that smoking is a heritable behavior. The heritability of nicotine dependence is estimated at around 50%.
• In the last decade, about 20 genome-wide linkage scans for smoking behavior have been reported, but only a limited number of putative genomic linkages have been replicated in independent studies (Li 2007).
• Challenges include genetic heterogeneity, the size of the genetic effect, the density of markers, the definition and assessment of the phenotypes, and the statistical approaches (Li 2007).

Diagnosis of Psychiatric Disorders
• Yale Global Tic Severity Scale and symptom checklist, and the Yale-Brown Obsessive Compulsive Scale (ordinal scales)
• Review with the family
• Comorbid psychiatric diagnoses using the Schedule for Affective Disorders and Schizophrenia for School-Age Children, the Children's Depression Rating Scale-Revised, and the Revised Children's Manifest Anxiety Scale

Schizophrenia – DSM-IV
Example 1: 295.30 Schizophrenia, Paranoid Type, Continuous
• Current:
  – With severe psychotic dimension
  – With absent disorganized dimension
  – With moderate negative dimension
• Lifetime:
  – With mild psychotic dimension
  – With absent disorganized dimension
  – With mild negative dimension
Example 2: 295.60 Schizophrenia, Residual Type, Episodic With Residual Symptoms
• Current:
  – With mild psychotic dimension
  – With mild disorganized dimension
  – With mild negative dimension
• Lifetime:
  – With moderate psychotic dimension
  – With mild disorganized dimension
  – With mild negative dimension
http://www.psychiatryonline.com

Substance Abuse and Dependence
An individual continues use of the substance despite significant substance-related problems. Dependence is defined as a cluster of three or more of the symptoms (tolerance, withdrawal, etc.) occurring at any time in the same 12-month period.

In Summary
• Psychiatric disorders are generally assessed with instruments based on ordinal severity scores.
• Comorbid psychiatric disorders are common:
  – TS, OCD, ADHD, etc.
  – Smoking, alcohol, depression, etc.

Fagerström Test for Nicotine Dependence (FTND)
1. How many cigarettes a day do you usually smoke?
   1 to 10: 0 points; 11 to 20: 1 point; 21 to 30: 2 points; 31 or more: 3 points
2. How soon after you wake up do you smoke your first cigarette?
   After 60 minutes: 0 points; 31 to 60 minutes: 1 point; 6 to 30 minutes: 2 points; within 5 minutes: 3 points
3. Do you smoke more during the first two hours of the day than during the rest of the day?
   No: 0 points; Yes: 1 point
4. Which cigarette would you most hate to give up?
   The first cigarette in the morning: 1 point; any other cigarette: 0 points
5. Do you find it difficult to refrain from smoking in places where it is forbidden, such as public buildings, on airplanes, or at work?
   No: 0 points; Yes: 1 point
6. Do you still smoke even when you are so ill that you are in bed most of the day?
   No: 0 points; Yes: 1 point
Total points: 0 to 10

Ordinal Traits
April 24, 2009
September 17, 2008
Experimental Cross

Genetic Analysis of Ordinal Traits

Software for Analysis of Ordinal Traits

LOT: Linkage Analysis of Ordinal Traits
LOT is a software program that performs linkage analysis of ordinal traits for pedigree data. It implements a latent-variable proportional-odds logistic model that relates inheritance patterns to the distribution of the ordinal trait.
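The proportional-odds logistic model that LOT builds on can be illustrated with a short sketch. It is not LOT's code. It assumes an ordinal trait with m ordered categories 0, …, m−1 and strictly increasing cutpoints α_0 < … < α_{m−2}. A single linear predictor η stands in for the covariate and latent-variable terms. The cumulative probabilities are P(Y ≤ k) = expit(α_k + η), and each category probability is the difference of consecutive cumulative probabilities. The function names and numbers are hypothetical.

```python
import math


def expit(x: float) -> float:
    """Inverse of the logit function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def category_probs(cutpoints, eta):
    """Category probabilities under a cumulative-logit (proportional-odds) model.

    For m ordered categories 0..m-1 there are m-1 strictly increasing
    cutpoints, and logit P(Y <= k) = cutpoints[k] + eta for k = 0..m-2.
    P(Y = k) is the difference of consecutive cumulative probabilities.
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    cumulative = [expit(a + eta) for a in cutpoints] + [1.0]  # P(Y <= m-1) = 1
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical values: 4 categories, hence 3 cutpoints.
alphas = [-1.0, 0.5, 2.0]
for eta in (-1.0, 0.0, 1.0):
    print(eta, [round(q, 3) for q in category_probs(alphas, eta)])
```

Because η is added to every cumulative logit, a larger η raises each P(Y ≤ k) and moves probability toward the lower categories. The cumulative odds ratio between two values of η is the same at every cutpoint, which is the proportional-odds property.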
Contents
1. Citation
2. Condition of use
3. Versions
4. Methodology
5. Input file formats
   1. .loc file
   2. .ped file
6. Downloads
7. Running LOT
   1. Running LOT with GUI on Windows and Linux
   2. Running LOT from command line in Windows
   3. Running LOT from command line in Linux
   4. Running LOT from command line in Mac OS X
8. Genehunter License Agreement

LOT: Methodology
Inference of Inheritance Vectors v(t)
• Nuclear family: 2 founders and n nonfounders
• Alleles of the two founders: (1, 2) and (3, 4)
• $v(t) = (v_1, v_2, \ldots, v_{2n-1}, v_{2n})'$, where, for the jth sibling,
$$v_{2j-1} = \begin{cases} 1, & \text{if the grandpaternal allele is transmitted in the paternal meiosis to the } j\text{th sibling},\\ 2, & \text{if the grandmaternal allele is transmitted in the paternal meiosis to the } j\text{th sibling}, \end{cases}$$
$$v_{2j} = \begin{cases} 3, & \text{if the grandpaternal allele is transmitted in the maternal meiosis to the } j\text{th sibling},\\ 4, & \text{if the grandmaternal allele is transmitted in the maternal meiosis to the } j\text{th sibling}. \end{cases}$$
• More complex pedigree: f founders and n nonfounders. Alleles of the f founders: (1, 2), (3, 4), (5, 6), …, (2f − 1, 2f)

Genetic Model and Hypothesis Testing
• Latent variables:
  – U_1: common genetic or environmental factors in a family not observed through the covariates
  – U_2: genetic susceptibility of the family founders and nonfounders
• Proportional-odds logistic model:
$$\operatorname{logit} P(Y_{ij} \le k \mid U_i, v_i) = \alpha_k + x_{ij}'\beta + \gamma_1 U_{1i} + \gamma_2 U_{2ij}, \quad k = 0, 1, \ldots, K$$

LOT: Data Files
Two input files are required: a locus data file and a pedigree file.
• Locus file: contains the genetic distances between markers, the number of alleles at each locus, and their frequencies. The format of this file is very similar to the standard GENEHUNTER (or LINKAGE) format.
• Pedigree file: consists of columns with the following information, in this order:
  Pedigree_ID Person_ID Father_ID Mother_ID Sex Phenotype Marker_genotypes Covariates

LOT: Output

Association Analysis
[Figure: n nuclear families, with n_1, …, n_i, …, n_n siblings]

O-TDT General Test Statistic
Assume that there are n nuclear families.
In the ith family, there are n_i siblings, i = 1, …, n. For the jth child in the ith family, the trait value is y_ij, the covariate vector is z_ij, and the genotype is g_ij. X_ij is the number of copies of allele A in genotype g_ij. The association test statistic can be constructed as
$$T = \sum_{i=1}^{n} T_i = \sum_{i=1}^{n} \sum_{j=1}^{n_i} W_{ij} X_{ij},$$
where W_ij is a weight function of y_ij and z_ij.

O-TDT Model and Method
• Di-allelic marker with possible alleles A and a.
• Assume that there is a trait-increasing allele D, and use d to denote the wild-type allele(s).
• Consider a trait taking ordinal values 1, …, K.
Model:
$$\operatorname{logit} P(Y \le k \mid g) = \alpha_k + \beta I(g) + \gamma' z, \quad k = 1, \ldots, K-1,$$
where the α_k's are level parameters, β is the genetic effect, and I(g) is the number of copies of allele D in genotype g.

O-TDT Score Statistic
The score function under the null hypothesis is T − E(T | Y, M_P), where
$$T = \sum_{i=1}^{n} \sum_{j=1}^{n_i} w(y_{ij}, z_{ij})\, X_{ij}, \qquad w(k, z) = 1 - \hat\pi(k, z) - \hat\pi(k-1, z),$$
$$\hat\pi(k, z) = \frac{\exp(\hat\alpha_k + \hat\gamma' z)}{1 + \exp(\hat\alpha_k + \hat\gamma' z)}, \quad k = 1, \ldots, K-1, \qquad \hat\pi(0, z) = 0, \quad \hat\pi(K, z) = 1.$$

Simulation Powers Based on 10,000 Replications – Test for Association in the Presence of Linkage
(#F: number of families; K: number of trait levels)
#F   K   Sig. level   OTDT     QTDT     TDT
200  3   0.05         0.4067   0.2334   0.1961
200  3   0.01         0.1853   0.0842   0.0654
200  3   0.001        0.0469   0.0171   0.0116
200  4   0.05         0.4531   0.2354   0.1844
200  4   0.01         0.2201   0.0862   0.0618
200  4   0.001        0.0596   0.0164   0.0102
400  3   0.05         0.6960   0.4266   0.3471
400  3   0.01         0.4486   0.2068   0.1549
400  3   0.001        0.1887   0.0594   0.0384
400  4   0.05         0.7704   0.4609   0.3508
400  4   0.01         0.5405   0.2323   0.1556
400  4   0.001        0.2572   0.0707   0.0404

Collaborative Studies on Genetics of Alcoholism (COGA)
• In the United States, 12.5% of adults have had alcohol dependence at some point in their lifetime (Hasin et al. 2007).
• A large-scale, multi-center study to map genes for susceptibility to alcohol dependence.
• 143 families with 1,614 individuals; 4,720 SNPs from the Illumina genotype data set.
• One ordinal trait with 4 levels was recorded (pure unaffected, never drank, unaffected with some symptoms, and affected).
• FBAT was also used for comparison.

Association Analysis of COGA Data
SNP markers that are significant at the 0.001 level based on O-TDT after adjusting for gender and age
SNP        Chromosome  Physical location  P-value (gender and age adjusted)  P-value (unadjusted)
rs1972373  14          18435498           0.00038                            0.00017
rs1571423  10          125256948          0.00046                            0.00035
rs485874   1           18182512           0.00050                            0.00101
rs619      X           29916017           0.00055                            0.07736
rs718251   8           52437707           0.00067                            0.01073
rs1869907  15          38835904           0.00087                            0.03067
Gene names: LOC440007, GK

Multivariate Traits
[Figure: Smoking, Nicotine, Drinking, and an Extraneous Variable]
Comorbid psychiatric disorders are common and their determinants are multi-factorial.

Multivariate Traits
In theory, comorbid disorders should be considered jointly. Technically, testing multiple traits simultaneously can avoid adjusting for multiple testing. But:
• How beneficial is it to consider multiple traits?
• In what situations is it most beneficial to consider multiple traits?

Graphical Structures for Simulation Models
Although we do not observe the causal relationships between the genotypes and the traits or among the traits, we generate the data from 40 directed acyclic graphs (DAGs).
[Figure: example DAG with gene G and traits Y1, Y2, Y3]
An arrow between any two elements indicates a causal relationship.
[Figure: DAGs 1-20 and DAGs 21-40]

SEMs for Each DAG (Quantitative Traits)
For Y_j in a DAG, suppose there are arrows pointing to Y_j, say, an arrow from gene G to Y_j and an arrow from Y_k to Y_j. We reflect these relationships through a linear regression model:
$$Y_j = \mu_j + \beta_j X_G + \gamma_{kj} Y_k + \varepsilon_j, \quad j, k = 1, 2, 3,$$
where ε_j is distributed as N(0, σ_j²) and ε_1, ε_2, ε_3 are mutually independent. Conditional on X_G and Y_k, Y_j can be generated from the normal distribution N(μ_j + β_j X_G + γ_kj Y_k, σ_j²). If there are no arrows pointing to Y_j, then Y_j is independent of the disease gene and the other traits, and is distributed as N(μ_j, σ_j²).
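Generating data from these SEMs amounts to visiting the traits parents-first and drawing each Y_j from its conditional normal distribution. Here is a minimal sketch. The DAG, the coefficient values, and the function name are hypothetical. X_G is assumed to count copies of the trait allele in one individual under Hardy-Weinberg equilibrium; the family-based genotype simulation used in the study is not reproduced. A trait may have several trait parents, each with its own coefficient γ_kj.

```python
import random


def simulate_traits(x_g, parents, mu, beta, gamma, sigma, rng):
    """Draw one value per trait from a linear SEM defined on a DAG:

        Y_j = mu_j + beta_j * X_G + sum_k gamma[(k, j)] * Y_k + eps_j,
        eps_j ~ N(0, sigma_j^2), independent across traits.

    `parents[j]` lists the traits with an arrow into Y_j. The dict must be
    in topological order (every parent listed before its children) so that
    each Y_k is already drawn when a child needs it. A trait with
    beta_j = 0 and no parents is simply N(mu_j, sigma_j^2).
    """
    y = {}
    for j, pa in parents.items():
        mean = mu[j] + beta[j] * x_g + sum(gamma[(k, j)] * y[k] for k in pa)
        y[j] = rng.gauss(mean, sigma[j])
    return y


# Hypothetical DAG: G -> Y1, G -> Y2, Y1 -> Y2; Y3 has no incoming arrows.
parents = {1: [], 2: [1], 3: []}
mu = {1: 0.0, 2: 0.0, 3: 0.0}
beta = {1: 0.3, 2: 0.2, 3: 0.0}  # beta_3 = 0: no arrow from G to Y3
gamma = {(1, 2): 0.4}
sigma = {1: 1.0, 2: 1.0, 3: 1.0}

rng = random.Random(2009)
p = 0.3  # hypothetical trait-allele frequency
for _ in range(3):
    # X_G = copies of the trait allele from two independent draws (Hardy-Weinberg)
    x_g = sum(rng.random() < p for _ in range(2))
    print(x_g, simulate_traits(x_g, parents, mu, beta, gamma, sigma, rng))
```

Setting every σ_j to 0 collapses the draws to their conditional means, which is a quick way to check that a DAG is wired as intended. Correlated errors induced by an extraneous variable would require drawing (ε_1, ε_2, ε_3) jointly from a multivariate normal rather than independently as done here.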
Heritability and Interability
Without loss of generality, we use the following models for illustration:
$$Y_1 = \beta_1 X_G + \varepsilon_1,$$
$$Y_2 = \beta_2 X_G + \gamma_{12} Y_1 + \varepsilon_2,$$
$$Y_3 = \beta_3 X_G + \gamma_{13} Y_1 + \gamma_{23} Y_2 + \varepsilon_3.$$
Heritability: $h_j^2 = \mathrm{Var}(\beta_j X_G) / \mathrm{Var}(Y_j)$. Interability: $t_{kj}^2 = \mathrm{Var}(\gamma_{kj} Y_k) / \mathrm{Var}(Y_j)$.
After some simple algebra, we have the expressions below. They take σ_j² = 1 and treat the terms on the right-hand side of each model as uncorrelated, so that Var(Y_j) = 1/(1 − h_j² − Σ_k t_kj²).
$$\beta_1 = \sqrt{\frac{h_1^2}{2p(1-p)(1-h_1^2)}}, \quad \beta_2 = \sqrt{\frac{h_2^2}{2p(1-p)(1-t_{12}^2-h_2^2)}}, \quad \gamma_{12} = \sqrt{\frac{t_{12}^2(1-h_1^2)}{1-t_{12}^2-h_2^2}},$$
$$\beta_3 = \sqrt{\frac{h_3^2}{2p(1-p)(1-t_{13}^2-t_{23}^2-h_3^2)}}, \quad \gamma_{13} = \sqrt{\frac{t_{13}^2(1-h_1^2)}{1-t_{13}^2-t_{23}^2-h_3^2}}, \quad \gamma_{23} = \sqrt{\frac{t_{23}^2(1-t_{12}^2-h_2^2)}{1-t_{13}^2-t_{23}^2-h_3^2}}.$$

Extraneous Variables (EV)
There may exist one or more extraneous variables that are not among the traits under consideration and that induce correlations among those traits.
[Figure: DAG with gene G, an extraneous variable EV, and traits Y1, Y2, Y3]
To accommodate this situation, we assume that (ε_1, ε_2, ε_3)' is distributed as N(0, Σ), where Σ = (σ_kj²)_{3×3} represents the correlation among the traits under consideration that is induced by extraneous variables.

Simulation Design and Settings
• Generate the parents' genotypes from the haplotype frequencies AD = 0.2, Ad = 0.1, aD = 0.1, ad = 0.6. Here D is the minor allele at trait locus G and A is the minor allele at the marker locus.
• Given the parental genotypes, generate the offspring genotypes using a distance of 1 cM between the trait locus and the marker locus.
• Conditional on the trait genotype, use the SEMs of each DAG discussed above to generate the trait values for the different scenarios. Let μ_j = 0, σ_j² = 1, h_j² = 0.05, and t_kj² = 0.05, 0.15, and 0.35. In the presence of extraneous variables, let σ_jj² = 1 and σ_kj² = ρ_kj = 0.2 or −0.2 for k ≠ j.

Testing Strategies
• Univariate FBAT (Rabinowitz 1997; Whittaker and Lewis 1998)
• FBAT-GEE for multiple traits (Lange et al. 2003)

Type I Errors: Quantitative Traits (α = 0.01)
(ρ_kj = none: no extraneous variable)
                     t_kj² = 0.05         t_kj² = 0.15         t_kj² = 0.35
ρ_kj   Structure   Un-FBAT  FBAT-GEE    Un-FBAT  FBAT-GEE    Un-FBAT  FBAT-GEE
none   S1          0.0099   0.0100      0.0099   0.0100      0.0099   0.0100
none   S2          0.0096   0.0096      0.0085   0.0092      0.0101   0.0097
none   S3          0.0088   0.0095      0.0092   0.0091      0.0081   0.0089
none   S4          0.0098   0.0095      0.0095   0.0098      0.0092   0.0093
none   S5          0.0095   0.0091      0.0094   0.0091      0.0098   0.0099
none   S6          0.0090   0.0093      0.0091   0.0091      0.0070   0.0085
0.2    S1          0.0090   0.0097      0.0090   0.0097      0.0090   0.0097
0.2    S2          0.0100   0.0101      0.0094   0.0097      0.0094   0.0097
0.2    S3          0.0101   0.0101      0.0092   0.0096      0.0084   0.0096
0.2    S4          0.0095   0.0099      0.0101   0.0102      0.0087   0.0102
0.2    S5          0.0099   0.0100      0.0092   0.0101      0.0085   0.0095
0.2    S6          0.0093   0.0092      0.0080   0.0092      0.0078   0.0096
−0.2   S1          0.0095   0.0097      0.0095   0.0097      0.0095   0.0097
−0.2   S2          0.0102   0.0101      0.0095   0.0097      0.0094   0.0097
−0.2   S3          0.0104   0.0089      0.0098   0.0096      0.0093   0.0096
−0.2   S4          0.0098   0.0096      0.0094   0.0097      0.0103   0.0102
−0.2   S5          0.0090   0.0096      0.0095   0.0097      0.0093   0.0097
−0.2   S6          0.0093   0.0091      0.0094   0.0096      0.0078   0.0097

Power: Quantitative Traits (α = 0.01)
[Figure: power (0 to 1) versus structure number, one panel each for t_kj² = 0.05, 0.15, and 0.35. FBAT: dots; FBAT-GEE: triangles. Black: no extraneous variable; red: ρ_kj = 0.2; green: ρ_kj = −0.2.]

Multivariate Trait
Kendall's tau is a non-parametric statistic measuring the strength of the relationship between two variables. Let (X_i, Y_i) and (X_j, Y_j) be a pair of observations. If X_j − X_i and Y_j − Y_i have the same sign, the pair is concordant. If they have different signs, the pair is discordant. For a sample of size n, Kendall's tau is defined as
$$\tau = \frac{2(C - D)}{n(n-1)},$$
where C and D are the numbers of concordant and discordant pairs.
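The definition of Kendall's tau translates directly into a loop over all n(n − 1)/2 pairs. This sketch computes the form given on the slide (often called tau-a). Pairs tied in either variable count as neither concordant nor discordant, and the denominator stays n(n − 1)/2. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau as 2(C - D) / (n(n - 1)).

    A pair (i, j) is concordant when x_j - x_i and y_j - y_i have the same
    sign (positive product) and discordant when the signs differ (negative
    product); pairs tied in x or y contribute to neither count.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[j] - x[i]) * (y[j] - y[i])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return 2 * (concordant - discordant) / (n * (n - 1))


# 5 concordant and 1 discordant pair: 2 * (5 - 1) / (4 * 3), about 0.667
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```

`scipy.stats.kendalltau` defaults to the tau-b variant, which adjusts the denominator for ties. With ordinal traits that have many tied values, its result therefore differs from this version.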
Association Test
Observations: a vector of traits T = (T^(1), …, T^(p))' and a vector of markers M = (M^(1), …, M^(G))'.
Notation: let
$$U = \binom{n}{2}^{-1} \sum_{i<j} u_{ij} \otimes v_{ij},$$
where
$$u_{ij} = \big(f_1(T_i^{(1)} - T_j^{(1)}), \ldots, f_p(T_i^{(p)} - T_j^{(p)})\big)', \qquad v_{ij} = \big(C_i(1) - C_j(1), \ldots, C_i(G) - C_j(G)\big)'.$$
Test statistic:
$$W = U' \, \mathrm{Var}_a^{-1}(U) \, U \sim \chi^2_{\mathrm{rank}(\mathrm{Var}_0(U))}.$$

Simulation Study: Model Setting
• Nominal type I error comparison: the coefficient of linkage disequilibrium takes the value 0.
• Power evaluation: the coefficient of linkage disequilibrium takes the value 0.11.
• Given the genotype at the trait locus, a non-proportional-odds model is used to generate ordinal phenotype data and a Gaussian model is used for quantitative phenotypes.

Type I Error Comparison
                  α = 0.05          α = 0.01          α = 0.001
#(family)  K   O-FBAT  FBAT      O-FBAT  FBAT      O-FBAT  FBAT
200        3   0.043   0.044     0.009   0.009     0.001   0.001
200        4   0.049   0.051     0.008   0.007     0.001   0.001
200        5   0.059   0.062     0.013   0.010     <0.001  <0.001
200        6   0.047   0.043     0.005   0.005     <0.001  <0.001
400        3   0.049   0.051     0.012   0.009     0.002   0.002
400        4   0.055   0.054     0.009   0.011     0.001   0.001
400        5   0.042   0.041     0.006   0.006     0.001   0.002
400        6   0.045   0.045     0.006   0.008     0.001   0.001
600        3   0.036   0.038     0.006   0.006     <0.001  <0.001
600        4   0.054   0.055     0.013   0.010     0.001   0.001
600        5   0.061   0.055     0.005   0.009     0.001   <0.001
600        6   0.038   0.038     0.006   0.007     <0.001  <0.001

Power Comparison
                  α = 0.05          α = 0.01          α = 0.001
#(family)  K   O-FBAT  FBAT      O-FBAT  FBAT      O-FBAT  FBAT
200        3   0.783   0.778     0.553   0.541     0.261   0.249
200        4   0.732   0.702     0.492   0.456     0.213   0.184
200        5   0.760   0.672     0.541   0.429     0.277   0.193
200        6   0.504   0.403     0.266   0.184     0.076   0.042
400        3   0.980   0.982     0.922   0.916     0.757   0.752
400        4   0.961   0.946     0.882   0.857     0.664   0.627
400        5   0.978   0.949     0.914   0.839     0.757   0.604
400        6   0.792   0.664     0.584   0.437     0.328   0.203
600        3   0.999   0.999     0.989   0.991     0.958   0.954
600        4   0.996   0.988     0.978   0.970     0.920   0.885
600        5   0.999   0.990     0.987   0.957     0.935   0.837
600        6   0.947   0.859     0.826   0.658     0.582   0.379

Application to COGA Data
• Phenotypes:
  – Alcohol DX-DSM3R+Feighner (ALDX1): 4 categories
  – Maximum number of drinks in a 24-hour period (MaxDrink): 4 categories
  – Spent so much time drinking, had little time for anything else (TimeDrink): 3 categories

Single-trait analysis
At marker D7S679, the p-value for ALDX1 is 0.002879. This exceeds the Bonferroni threshold 0.05/(3 × 31) = 0.000538 for 3 traits and 31 markers, so it is not significant.

Multiple-trait analysis
At marker D7S679, the p-value is 0.000553. This is below the Bonferroni threshold 0.05/31 = 0.0016129 for 31 markers. D7S679 is about 1 cM away from D7S1793, which has previously been reported to show evidence of linkage.

Closing Comments
• Genetic studies of mental diseases involve many challenges: some are clinical, some are statistical, and some are scientific.
• We attempted to deal with a few statistical challenges. It remains to be seen whether we succeeded; however, our solutions appear promising.
• We need more people to pay attention to these challenges and to be persistent in our pursuit.

Acknowledgements
Xiang Chen, Rui Feng, Minghui Wang, Yuanqing Ye, Ching-Ti Liu, Xueqin Wang, Meizhuo Zhang, Wensheng Zhu