Statistical Challenges in Genetic Studies of Mental Disorders Heping Zhang Collaborative Center for Statistics in Science Yale University School of Medicine June 9, 2009 Institute of.


Statistical Challenges in Genetic Studies
of Mental Disorders
Heping Zhang
Collaborative Center for Statistics in Science
Yale University School of Medicine
June 9, 2009
Institute of Mathematical Statistics
National University of Singapore
Outline
• Heredity of Psychiatric Disorders – A Century
Ago
– One Example
• Genetic Studies of Mental Disorders – As We
Are Speaking
– Three Examples
• Statistical Challenges – Our Progress
– Ordinal Traits
– Multivariate Traits
• Closing Comments and Acknowledgements
PRELIMINARY REPORT OF A STUDY OF HEREDITY
IN INSANITY IN THE LIGHT OF THE
MENDELIAN LAWS
BY GERTRUDE L. CANNON, A.M., AND A. J. ROSANOFF, M.D.
KINGS PARK STATE HOSPITAL, NEW YORK
Pedigrees from 11 Neuropathic
Patients
Correlated Phenotypes
Theoretical Conclusions
The Genetics of Tourette
Syndrome
Tourette syndrome is a complex disorder
characterized by repetitive, sudden, and
involuntary movements or noises called tics.
Concordance in MZ twins ~ 50%
Concordance in DZ twins < 10%
In 1986, Pauls and Leckman concluded that
Tourette's syndrome is inherited as a highly
penetrant, sex-influenced, autosomal
dominant trait.
Pete Bennett, winner of the
7th series of Big Brother
The Genetics of Tourette
Syndrome
In 2005, State’s lab identified mutations involving the
SLITRK1 gene (13q31.1) in a small number of people with
Tourette syndrome.
Most people with Tourette syndrome do not have a mutation
in the SLITRK1 gene. Because mutations have been reported
in so few people with this condition, the association of the
SLITRK1 gene with this disorder has not been confirmed.
TSICG (2008): Lack of association between SLITRK1var321
and Tourette syndrome in a large family-based sample
Schizophrenia
Schizophrenia is a chronic, severe, and disabling brain
disorder that affects about 1.1 percent of the U.S.
population age 18 and older in a given year.
People with schizophrenia sometimes hear voices
others don’t hear, believe that others are broadcasting
their thoughts to the world, or become convinced that
others are plotting to harm them.
These experiences can make them fearful and
withdrawn and cause difficulties when they try to have
relationships with others.
http://www.nimh.nih.gov
Genetic Studies of Schizophrenia
Kraepelin (Textbook of Psychiatry, 1896) described ‘Dementia
Praecox’ as an inherited disorder.
Kety, Rosenthal, and Wender conducted a series of adoption
studies beginning in 1968, establishing a genetic basis for
schizophrenia.
In 1987-88, it was reported “Bipolar affective disorders linked to
DNA markers on chromosome 11” and “Localization of a
susceptibility locus for schizophrenia on chromosome 5.”
These reports attracted a lot of publicity but could not be
replicated.
Some regions (e.g., dysbindin on chromosome 6p, neuregulin
on 8p and G72 on 13q) have been more consistently identified
as candidate regions.
There may not be a true sequence variation in a gene that
causes illness. Rather, variable expression through
epigenetic modification of gene activation may be the key
(DeLisi et al. 2007).
Genetic Studies of Schizophrenia
Nature Online July 30, 2008:
• International Schizophrenia Consortium:
3,391 schizophrenia cases and 3,181 controls in a European
sample
• Stefansson et al.:
1,433 schizophrenia cases and 33,250 controls
3,285 cases and 7,951 controls
• Both groups report genetic deletions associated
with schizophrenia in the same three locations:
on chromosomes 1 and 15, plus
a third deletion on chromosome 22 that has previously been
connected with increased susceptibility to schizophrenia.
Genetic Studies of Schizophrenia
Nature July 31, 2008:
• The surveys have identified sections of the human
genome that, when deleted, can elevate the risk of
developing schizophrenia by up to 15 times compared
with the general population.
• In ISC study, a total of 890 CNVs were observed in
either a case or a control as a single occurrence. This
set of CNVs showed a 1.45-fold increase in cases
(empirical P = 5E-6). On average, 13.1% of cases of
schizophrenia possessed a deletion or duplication
observed only once in the sample, in contrast to 10.4%
of controls.
Smoking
In the 1990s, a series of large-sample twin studies in the US
and other countries showed repeatedly that smoking is a
heritable behavior.
The heritability for nicotine dependence is estimated
around 50%.
In the last decade, about 20 genome-wide linkage scans
for smoking behavior have been reported, but only a
limited number of putative genomic linkages have been
replicated in independent studies (Li 2007).
Challenges include genetic heterogeneity, the size of the
genetic effect, the density of markers, the definition and
assessment of the phenotypes, and the statistical
approaches (Li 2007).
Diagnosis of Psychiatric Disorders
Yale Global Tic Severity Scale
and the symptom checklist and
Yale-Brown Obsessive
Compulsive Scale
Ordinal scales
Review with the family
Perform comorbid psychiatric diagnoses
using the Schedule for Affective
Disorders and Schizophrenia for School-Age Children, the Children’s Depression
Rating Scale-Revised, and the Revised
Children’s Manifest Anxiety Scale.
Schizophrenia – DSM-IV
Example 1: 295.30 Schizophrenia,
Paranoid Type, Continuous
• Current:
– With severe psychotic
dimension
– With absent disorganized
dimension
– With moderate negative
dimension
• Lifetime:
– With mild psychotic dimension
– With absent disorganized
dimension
– With mild negative dimension
Example 2: 295.60 Schizophrenia,
Residual Type, Episodic With
Residual Symptoms
• Current:
– With mild psychotic dimension
– With mild disorganized
dimension
– With mild negative dimension
• Lifetime:
– With moderate psychotic
dimension
– With mild disorganized
dimension
– With mild negative dimension
http://www.psychiatryonline.com
Substance Abuse and Dependence
An individual continues use of the substance despite
significant substance-related problems.
Dependence is defined as a cluster of three or more
of the symptoms (Tolerance, Withdrawal, etc.)
occurring at any time in the same 12-month period.
In Summary
Psychiatric disorders are generally assessed with
instruments based on ordinal severity scores
Comorbid psychiatric disorders are common:
• TS, OCD, ADHD, etc.
• Smoking, Alcohol, Depression, etc.
Fagerstrom Test for Nicotine Dependence (FTND)
1. How many cigarettes a day do you usually smoke?
   1 to 10: 0 point
   11 to 20: 1 point
   21 to 30: 2 points
   31 or more: 3 points
2. How soon after you wake up do you smoke your first cigarette?
   After 60 minutes: 0 point
   31 - 60 minutes: 1 point
   6 - 30 minutes: 2 points
   < 5 minutes: 3 points
3. Do you smoke more during the first two hours of the day than during the rest of the day?
   No: 0 point
   Yes: 1 point
4. Which cigarette would you most hate to give up?
   The first cigarette in the morning: 1 point
   Any other cigarette than the first one: 0 point
5. Do you find it difficult to refrain from smoking in places where it is forbidden, such as
public buildings, on airplanes or at work?
   No: 0 point
   Yes: 1 point
6. Do you still smoke even when you are so ill that you are in bed most of the day?
   No: 0 point
   Yes: 1 point
Total points
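The FTND total is an ordinal severity score obtained by summing the six item scores. A minimal sketch of that scoring follows; the function name, the argument names, and the treatment of the boundary values (exactly 30 cigarettes scored as 2 points, 5 minutes or less scored as 3 points) are assumptions made for illustration, not part of the questionnaire as shown.

```python
def ftnd_score(cigs_per_day, minutes_to_first, more_in_morning,
               hardest_is_first, hard_where_forbidden, smokes_when_ill):
    """Return the FTND total (0 to 10) from the six questionnaire items."""
    # Item 1: cigarettes per day, 0 to 3 points (boundary at 30 is an assumption).
    if cigs_per_day <= 10:
        item1 = 0
    elif cigs_per_day <= 20:
        item1 = 1
    elif cigs_per_day <= 30:
        item1 = 2
    else:
        item1 = 3

    # Item 2: minutes until the first cigarette, 0 to 3 points
    # (5 minutes or less is assumed to score 3 points).
    if minutes_to_first > 60:
        item2 = 0
    elif minutes_to_first > 30:
        item2 = 1
    elif minutes_to_first > 5:
        item2 = 2
    else:
        item2 = 3

    # Items 3 to 6 are yes/no questions worth one point each.
    yes_no = (more_in_morning, hardest_is_first,
              hard_where_forbidden, smokes_when_ill)
    return item1 + item2 + sum(int(bool(answer)) for answer in yes_no)
```

For instance, `ftnd_score(15, 45, False, True, False, False)` adds 1 point for item 1, 1 point for item 2, and 1 point for item 4.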
Ordinal Traits
April 24, 2009
September 17, 2008
Experimental Cross
Genetic Analysis of Ordinal Traits
Software for Analysis of Ordinal Traits
LOT: Linkage Analysis of Ordinal Traits
LOT is a software program that performs linkage analysis of ordinal traits for
pedigree data. It implements a latent-variable proportional-odds logistic model
that relates inheritance patterns to the distribution of the ordinal trait.
Contents
1. Citation
2. Condition of use
3. Versions
4. Methodology
5. Input file formats
   1. .loc file
   2. .ped file
6. Downloads
7. Running LOT
   1. Running LOT with GUI on Windows and Linux
   2. Running LOT from command line in Windows
   3. Running LOT from command line in Linux
   4. Running LOT from command line in Mac OS X
8. Genehunter License Agreement
LOT: Methodology
Inference of Inheritance Vectors v(t)
• Nuclear family: 2 founders and n nonfounders
• Alleles of the two founders (1,2) (3,4)
• $v(t) = (v_1, v_2, \ldots, v_{2n-1}, v_{2n})'$, where for the jth sibling
  $v_{2j-1} = 1$ if the grandpaternal allele is transmitted in the paternal meiosis,
  $v_{2j-1} = 2$ if the grandmaternal allele is transmitted in the paternal meiosis,
  $v_{2j} = 3$ if the grandpaternal allele is transmitted in the maternal meiosis,
  $v_{2j} = 4$ if the grandmaternal allele is transmitted in the maternal meiosis.
• More complex pedigree: f founders and n nonfounders.
  Alleles of the f founders (1,2) (3,4) (5,6) … (2f-1,2f)
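To make the size of the inheritance-vector space concrete, the sketch below enumerates every possible vector for a nuclear family with n siblings, using the founder allele labels (1,2) and (3,4) from the slide. The function name is hypothetical, and this is only an enumeration of the state space, not the inference procedure used by LOT.

```python
from itertools import product


def inheritance_vectors(n_sibs):
    """Yield every inheritance vector (v1, ..., v_2n) for n_sibs siblings.

    Odd positions (paternal meiosis) take values 1 or 2;
    even positions (maternal meiosis) take values 3 or 4.
    """
    per_sibling = [(pat, mat) for pat in (1, 2) for mat in (3, 4)]
    for combo in product(per_sibling, repeat=n_sibs):
        # Flatten ((v1, v2), (v3, v4), ...) into one tuple of length 2n.
        yield tuple(allele for pair in combo for allele in pair)
```

With two siblings there are 4^2 = 16 vectors, so the space grows quickly with family size.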
Genetic Model and Hypothesis Testing
• Latent variable
• U1: common genetic or environmental factors in a family not observed through the
covariates
• U2: genetic susceptibility of the family founders and nonfounders
• Proportional-odds logistic model
\[ \operatorname{logit} P(Y_j^i \le k \mid U^i, v^i) = x_j^i \beta + \alpha_k + U_1^i \gamma_1 + U_{2,j}^i \gamma_2, \quad k = 0, 1, \ldots, K \]
LOT: Data Files
Two input files are required: a locus data file and pedigree file.
• Locus file: This file contains information on genetic distances
between markers, number of alleles at each locus and their
frequencies. The format of this file is very similar to the standard
GENEHUNTER (or LINKAGE) format.
• Pedigree file: This file consists of columns with the following
information, in this order:
Pedigree_ID Person_ID Father_ID Mother_ID Sex Phenotype
Marker_genotypes Covariates
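As an illustration of the pedigree-file column order, here is a minimal parsing sketch. It assumes whitespace-delimited fields, two allele columns per marker, and numeric covariates; these details, and every name in the code, are assumptions for illustration and are not taken from the LOT documentation.

```python
from typing import NamedTuple


class PedRecord(NamedTuple):
    pedigree_id: str
    person_id: str
    father_id: str
    mother_id: str
    sex: str
    phenotype: str
    genotypes: list   # one (allele1, allele2) pair per marker (assumed layout)
    covariates: list  # remaining columns, read as floats (assumed layout)


def parse_ped_line(line, n_markers):
    """Split one pedigree-file line into its fixed and variable columns."""
    fields = line.split()
    fixed, rest = fields[:6], fields[6:]
    if len(fixed) < 6 or len(rest) < 2 * n_markers:
        raise ValueError("line has too few columns")
    alleles = rest[:2 * n_markers]
    genotypes = [(alleles[2 * m], alleles[2 * m + 1]) for m in range(n_markers)]
    covariates = [float(value) for value in rest[2 * n_markers:]]
    return PedRecord(*fixed, genotypes, covariates)
```

The number of markers has to be supplied by the caller here because the column layout alone does not say where genotypes end and covariates begin.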
LOT: Output
Association Analysis
[Diagram: n families; family i has n_i siblings (n_1, …, n_i, …, n_n)]
O-TDT
General Test Statistic
Assume that there are n nuclear families. In the i-th
family, there are $n_i$ siblings, i = 1, …, n. For the j-th
child in the i-th family, the trait value is $y_{ij}$, the
covariates are $z_{ij}$, and the genotype is $g_{ij}$. $X_{ij}$ is the
number of copies of allele A in the genotype $g_{ij}$. The
association test statistic can be constructed as
follows:
\[ T = \sum_{i=1}^{n} T_i = \sum_{i=1}^{n} \sum_{j=1}^{n_i} W_{ij} X_{ij}, \]
where $W_{ij}$ is a weight function of $y_{ij}$ and $z_{ij}$.
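The general statistic is a weighted sum of allele counts over all children in all families. A minimal sketch, assuming each family is given as a list of (trait, covariates, allele count) tuples and the weight is passed in as a function; both the data layout and the names are illustrative assumptions.

```python
def association_statistic(families, weight):
    """Compute T = sum over families i and children j of W_ij * X_ij.

    families: list of families, each a list of (y, z, x) tuples, where
              y is the trait value, z the covariates, and x the number
              of copies of allele A carried by the child.
    weight:   function (y, z) -> W, the weight for that child.
    """
    return sum(weight(y, z) * x
               for family in families
               for (y, z, x) in family)
```

Different choices of the weight function give different tests; for example, a weight that centers the trait value, such as `lambda y, z: y - 2`, gives a simple quantitative-trait style statistic.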
O-TDT
Model and Method
• Di-allelic marker with possible alleles A and a.
• Assume that there is a trait-increasing allele D, and
we use d to denote the wild-type allele(s).
• Consider a trait taking values in ordinal responses
1, …, K.
Model:
\[ \operatorname{logit} P(Y \le k \mid g) = \alpha_k + \beta I(g) + \gamma' z, \quad k = 1, \ldots, K-1, \]
where the $\alpha_k$'s are level parameters, $\beta$ is the genetic effect, and
$I(g)$ is the number of copies of allele D in genotype g.
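To see what a cumulative-logit (proportional-odds) model of this form implies for the individual trait levels, the sketch below converts level parameters, a genetic effect, and covariate effects into the probabilities P(Y = k) for k = 1, …, K. The parameter names `alphas`, `beta`, and `gamma` are assumed labels for the level parameters, genetic effect, and covariate coefficients; the code is an illustration of the model form, not the authors' software.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(alphas, beta, n_copies, gamma=(), z=()):
    """Return [P(Y=1), ..., P(Y=K)] for a cumulative-logit model.

    alphas:   increasing level parameters alpha_1 < ... < alpha_{K-1}
    beta:     genetic effect per copy of allele D
    n_copies: I(g), the number of copies of allele D (0, 1, or 2)
    gamma, z: covariate coefficients and covariate values
    """
    shift = beta * n_copies + sum(c * v for c, v in zip(gamma, z))
    # Cumulative probabilities P(Y <= k) for k = 0, 1, ..., K.
    cumulative = [0.0] + [expit(a + shift) for a in alphas] + [1.0]
    return [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
```

Because the model is written for P(Y ≤ k), a positive genetic effect in this parameterization moves probability toward the lower trait levels.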
O-TDT
Score Statistic
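The score function under the null hypothesis is
$T - E(T \mid Y, M_P)$, where
\[ T = \sum_{i=1}^{n} \sum_{j=1}^{n_i} w(y_{ij}, z_{ij}) X_{ij}, \]
\[ w(k, z) = 1 - \hat{\pi}(k, z) - \hat{\pi}(k-1, z), \]
\[ \hat{\pi}(k, z) = \frac{\exp(\hat{\alpha}_k + \hat{\gamma}' z)}{1 + \exp(\hat{\alpha}_k + \hat{\gamma}' z)}, \quad k = 1, \ldots, K-1, \]
\[ \hat{\pi}(0, z) = 0, \quad \hat{\pi}(K, z) = 1. \]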
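The score weight for a child with trait level k is one minus the fitted cumulative probabilities at levels k and k-1 under the null model. A minimal sketch of the weights and of the centered statistic follows. It assumes that the conditional expectation of a child's allele count given the parental genotypes is the Mendelian value (half the sum of the parents' counts); the function names and the data layout are hypothetical, and variance estimation is not covered.

```python
import math


def cum_prob(k, z, alphas, gamma):
    """Fitted P(Y <= k | z) under the null; 0 at k = 0 and 1 at k = K."""
    n_levels = len(alphas) + 1  # K
    if k <= 0:
        return 0.0
    if k >= n_levels:
        return 1.0
    eta = alphas[k - 1] + sum(c * v for c, v in zip(gamma, z))
    return 1.0 / (1.0 + math.exp(-eta))


def score_weight(k, z, alphas, gamma):
    """w(k, z) = 1 - P(Y <= k | z) - P(Y <= k-1 | z)."""
    return 1.0 - cum_prob(k, z, alphas, gamma) - cum_prob(k - 1, z, alphas, gamma)


def centered_score(children, alphas, gamma):
    """Sum of w(y, z) * (X - E[X | parents]) over all children.

    children: iterable of (y, z, x_child, x_father, x_mother), where the
              x values count copies of the marker allele.
    """
    total = 0.0
    for y, z, x, x_father, x_mother in children:
        expected = 0.5 * (x_father + x_mother)  # Mendelian expectation
        total += score_weight(y, z, alphas, gamma) * (x - expected)
    return total
```

With three levels and symmetric level parameters, the lowest and highest levels receive weights of equal size and opposite sign, and the middle level receives weight zero.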
Simulation
Powers Based on 10,000 Replications – Test
for Association in the Presence of Linkage
#F    K    Sig. level    OTDT      QTDT      TDT
200   3    0.05          0.4067    0.2334    0.1961
200   3    0.01          0.1853    0.0842    0.0654
200   3    0.001         0.0469    0.0171    0.0116
200   4    0.05          0.4531    0.2354    0.1844
200   4    0.01          0.2201    0.0862    0.0618
200   4    0.001         0.0596    0.0164    0.0102
400   3    0.05          0.6960    0.4266    0.3471
400   3    0.01          0.4486    0.2068    0.1549
400   3    0.001         0.1887    0.0594    0.0384
400   4    0.05          0.7704    0.4609    0.3508
400   4    0.01          0.5405    0.2323    0.1556
400   4    0.001         0.2572    0.0707    0.0404
Quantitative Trait
Collaborative Study on the
Genetics of Alcoholism (COGA)
• In the United States, 12.5% of adults have had
alcohol dependence at some point in their lifetime
(Hasin et al., 2007)
• A large-scale, multi-center study to map alcohol
dependence susceptibility genes.
• 143 families with 1614 individuals. 4720 SNPs
from the Illumina genotype data set.
• One ordinal trait with 4 levels was recorded
(pure unaffected, never drank, unaffected with
some symptoms, and affected).
• FBAT was also used for comparison
Association Analysis of COGA Data
SNP Markers That Are Significant at the 0.001 Level Based on O-TDT after
Adjusting for Gender and Age
                                                 P-values
SNP Marker   Chromosome   Physical location   Gender and Age Adjusted   Unadjusted
rs1972373    14           18435498            0.00038                   0.00017
rs1571423    10           125256948           0.00046                   0.00035
rs485874     1            18182512            0.00050                   0.00101
rs619        X            29916017            0.00055                   0.07736
rs718251     8            52437707            0.00067                   0.01073
rs1869907    15           38835904            0.00087                   0.03067
Gene Names: LOC440007, GK
Multivariate Traits
[Diagram: Smoking, Nicotine, Drinking, and an Extraneous Variable]
Comorbid psychiatric disorders are common and their
determinants are multi-factorial.
Multivariate Traits
In theory, comorbid disorders should be considered. Technically, testing
multiple traits simultaneously can avoid adjusting for multiple testing.
But
• How beneficial is it to consider multiple traits?
• In what situations is it most beneficial to consider multiple
traits?
Graphical Structures for Simulation
Models
Although we do not observe the causal relationship between the
genotypes and traits or among the traits, we generate the data from
40 directed acyclic graphs (DAGs). For example,
[Diagram: example DAG on G, Y1, Y2, and Y3]
An arrow between any two elements indicates a causal relationship
[Figures: DAGs 1-20; DAGs 21-40]
SEMs for each DAG (quantitative traits)
For Yj in a DAG, if there exist some arrows pointing to Yj, say, an arrow
from gene G to Yj and an arrow from Yk to Yj, we reflect these
relationships through a linear regression model as follows,
\[ Y_j = \mu_j + \beta_j X_G + \gamma_{kj} Y_k + \varepsilon_j, \quad j, k = 1, 2, 3, \]
where $\varepsilon_j$ is distributed as $N(0, \sigma_j^2)$ and $\varepsilon_1, \varepsilon_2, \varepsilon_3$ are mutually independent.
Conditional on $X_G$ and $Y_k$, $Y_j$ can be generated from the normal distribution
$N(\mu_j + \beta_j X_G + \gamma_{kj} Y_k, \sigma_j^2)$.
If there are no arrows pointing to Yj, Yj is independent of the disease
gene and other traits, and distributed as $N(\mu_j, \sigma_j^2)$.
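The structural equations can be simulated one trait at a time, each trait drawn from a normal distribution whose mean depends on the genotype and on the traits pointing to it. A minimal sketch follows. It assumes that arrows between traits only run from a lower-numbered trait to a higher-numbered one, so that Y1, Y2, Y3 can be generated in that order; the function name and the dictionary encoding of the arrows are illustrative assumptions.

```python
import random


def simulate_traits(x_g, beta, gamma, mu=(0.0, 0.0, 0.0),
                    sigma=(1.0, 1.0, 1.0), rng=random):
    """Draw (Y1, Y2, Y3) given the genotype score x_g.

    beta:  (beta_1, beta_2, beta_3); use 0 where G has no arrow to Y_j
    gamma: dict mapping (k, j) with k < j to the effect of Y_k on Y_j;
           omit pairs with no arrow
    sigma: residual standard deviations
    """
    y = {}
    for j in (1, 2, 3):
        mean = mu[j - 1] + beta[j - 1] * x_g
        # Add contributions from every trait with an arrow into Y_j.
        mean += sum(effect * y[k]
                    for (k, target), effect in gamma.items() if target == j)
        y[j] = rng.gauss(mean, sigma[j - 1])
    return y[1], y[2], y[3]
```

A structure with no arrows into a trait is obtained by setting its genetic effect to 0 and leaving it out of `gamma`, in which case the trait is drawn from N(mu_j, sigma_j^2).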
Heritability and Interability
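Without loss of generality, we use the following models for illustration
\[ Y_1 = \beta_1 X_G + \varepsilon_1, \]
\[ Y_2 = \beta_2 X_G + \gamma_{12} Y_1 + \varepsilon_2, \]
\[ Y_3 = \beta_3 X_G + \gamma_{13} Y_1 + \gamma_{23} Y_2 + \varepsilon_3, \]
\[ \text{Heritability: } h_j^2 = \frac{\operatorname{Var}(\beta_j X_G)}{\operatorname{Var}(Y_j)}, \qquad \text{Interability: } t_{kj}^2 = \frac{\operatorname{Var}(\gamma_{kj} Y_k)}{\operatorname{Var}(Y_j)}. \]
After some simple algebra, we have
\[ \beta_1 = \frac{1}{\sqrt{2p(1-p)}} \sqrt{\frac{h_1^2}{1 - h_1^2}}, \quad \beta_2 = \frac{1}{\sqrt{2p(1-p)}} \sqrt{\frac{h_2^2}{1 - t_{12}^2 - h_2^2}}, \quad \gamma_{12} = \sqrt{\frac{t_{12}^2}{1 - t_{12}^2 - h_2^2}}, \]
\[ \beta_3 = \frac{1}{\sqrt{2p(1-p)}} \sqrt{\frac{h_3^2}{1 - t_{13}^2 - t_{23}^2 - h_3^2}}, \quad \gamma_{13} = \sqrt{\frac{t_{13}^2}{1 - t_{13}^2 - t_{23}^2 - h_3^2}}, \quad \gamma_{23} = \sqrt{\frac{t_{23}^2}{1 - t_{13}^2 - t_{23}^2 - h_3^2}}. \]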
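As a numerical check of the relation between heritability and the genetic effect, the sketch below computes a regression coefficient from a target heritability and then recovers the heritability of Y1 from that coefficient. It assumes Hardy-Weinberg proportions, so that Var(X_G) = 2p(1-p), and unit residual variance; the function names are hypothetical, and the round trip is shown only for Y1, where no other trait contributes.

```python
import math


def beta_from_heritability(h2, p, t2_total=0.0):
    """Genetic effect giving heritability h2 at allele frequency p.

    t2_total is the sum of the interabilities t_kj^2 of the traits that
    point to the trait in question (0 for Y1).
    """
    return math.sqrt(h2 / (2.0 * p * (1.0 - p) * (1.0 - t2_total - h2)))


def heritability_y1(beta1, p):
    """h_1^2 = Var(beta1 * X_G) / Var(Y1) with unit residual variance."""
    var_genetic = 2.0 * p * (1.0 - p) * beta1 ** 2
    return var_genetic / (var_genetic + 1.0)
```

For example, with p = 0.3 and a target heritability of 0.05, `heritability_y1(beta_from_heritability(0.05, 0.3), 0.3)` returns 0.05 up to rounding error.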
Extraneous
Variables
(EV)
There may exist one or more extraneous variables that are not included
in the traits under consideration and that induce correlations among
the traits under consideration
[Diagram: G, EV, Y1, Y2, and Y3]
To accommodate this situation, we consider that $(\varepsilon_1, \varepsilon_2, \varepsilon_3)'$ is distributed as
$N(0, \Sigma)$, where $\Sigma = (\sigma_{kj}^2)_{3 \times 3}$ represents the correlation among the traits under
consideration that is induced by extraneous variables.
Simulation Design and Settings
• Generate the parents’ genotypes via the haplotype frequencies
(AD=0.2, Ad=0.1, aD=0.1, ad=0.6, where D is the minor allele at the trait
locus G and A is the minor allele at the marker locus)
• Given the parental genotypes, generate the offspring genotype
using 1 cM between the trait locus and the marker locus
• Conditional on the trait genotype, use the SEMs of each DAG
discussed above to generate the trait values for the different scenarios.
Let $\mu_j = 0$, $\sigma_j^2 = 1$, $h_j^2 = 0.05$, and $t_{kj}^2 = 0.05$, $0.15$, and $0.35$.
In the presence of extraneous variables, we let
$\sigma_{jj}^2 = 1$, $\sigma_{kj}^2 = \rho_{kj} = 0.2$ and $-0.2$ for $k \ne j$.
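The genotype-generation steps can be sketched as follows, using the haplotype frequencies from the slide. The sketch assumes that 1 cM corresponds to a recombination fraction of about 0.01 and that the two haplotypes of a parent are drawn independently; all function names are hypothetical.

```python
import random

# Haplotype frequencies from the slide: marker allele (A/a) then trait allele (D/d).
HAP_FREQ = {"AD": 0.2, "Ad": 0.1, "aD": 0.1, "ad": 0.6}


def draw_parent(rng):
    """Draw a parent's two haplotypes independently from HAP_FREQ."""
    haps = rng.choices(list(HAP_FREQ), weights=list(HAP_FREQ.values()), k=2)
    return tuple(haps)


def transmit(parent, rng, theta=0.01):
    """Return the haplotype passed to a child, recombining with probability theta."""
    first, second = parent if rng.random() < 0.5 else parent[::-1]
    if rng.random() < theta:
        return first[0] + second[1]  # marker allele from one, trait allele from the other
    return first


def draw_child(father, mother, rng, theta=0.01):
    """Draw a child's two haplotypes, one transmitted by each parent."""
    return (transmit(father, rng, theta), transmit(mother, rng, theta))


def count_allele(genotype, allele):
    """Number of copies of an allele (e.g. 'D' or 'A') in a two-haplotype genotype."""
    return sum(hap.count(allele) for hap in genotype)
```

`count_allele(child, "D")` gives the trait-locus score that would feed the structural equations, and `count_allele(child, "A")` gives the marker score used in the tests.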
Testing Strategies
• Univariate FBAT
Rabinowitz, 1997; Whittaker and Lewis 1998
• FBAT-GEE for multiple traits
Lange et al. 2003
Type I Errors: Quantitative Traits (alpha=0.01)
                       t_kj^2 = 0.05          t_kj^2 = 0.15          t_kj^2 = 0.35
rho_kj   Structure     Un-FBAT   FBAT-GEE     Un-FBAT   FBAT-GEE     Un-FBAT   FBAT-GEE
         No.
--       S1            0.0099    0.0100       0.0099    0.0100       0.0099    0.0100
--       S2            0.0096    0.0096       0.0085    0.0092       0.0101    0.0097
--       S3            0.0088    0.0095       0.0092    0.0091       0.0081    0.0089
--       S4            0.0098    0.0095       0.0095    0.0098       0.0092    0.0093
--       S5            0.0095    0.0091       0.0094    0.0091       0.0098    0.0099
--       S6            0.0090    0.0093       0.0091    0.0091       0.0070    0.0085
0.2      S1            0.0090    0.0097       0.0090    0.0097       0.0090    0.0097
0.2      S2            0.0100    0.0101       0.0094    0.0097       0.0094    0.0097
0.2      S3            0.0101    0.0101       0.0092    0.0096       0.0084    0.0096
0.2      S4            0.0095    0.0099       0.0101    0.0102       0.0087    0.0102
0.2      S5            0.0099    0.0100       0.0092    0.0101       0.0085    0.0095
0.2      S6            0.0093    0.0092       0.0080    0.0092       0.0078    0.0096
-0.2     S1            0.0095    0.0097       0.0095    0.0097       0.0095    0.0097
-0.2     S2            0.0102    0.0101       0.0095    0.0097       0.0094    0.0097
-0.2     S3            0.0104    0.0089       0.0098    0.0096       0.0093    0.0096
-0.2     S4            0.0098    0.0096       0.0094    0.0097       0.0103    0.0102
-0.2     S5            0.0090    0.0096       0.0095    0.0097       0.0093    0.0097
-0.2     S6            0.0093    0.0091       0.0094    0.0096       0.0078    0.0097
Power: Quantitative Traits (Alpha=0.01)
[Figure: power (0.0 to 1.0) plotted against Structure No. (7 to 40) in three panels, for t^2 = 0.35, 0.15, and 0.05. Black: rho_kj = -- (no extraneous variable), Red: rho_kj = 0.2, Green: rho_kj = -0.2. FBAT: dots and FBAT-GEE: triangles.]
Multivariate Trait
Kendall’s Tau: a non-parametric statistic measuring the
strength of the relationship between two variables
Let $(X_i, Y_i)$ and $(X_j, Y_j)$ be a pair of observations. If $X_j - X_i$ and
$Y_j - Y_i$ have the same sign, we say that the pair is concordant. If they
have different signs, we say that the pair is discordant.
For a sample of size n, Kendall's tau is defined as
\[ \tau = \frac{2(C - D)}{n(n-1)}, \]
where C and D are the numbers of concordant and discordant pairs.
Association Test
Observations:
A vector of traits $T = (T^{(1)}, \ldots, T^{(p)})'$ and a vector of markers $M = (M^{(1)}, \ldots, M^{(G)})'$.
Notations:
Let
\[ U = \binom{n}{2}^{-1} \sum_{i<j} u_{ij} \otimes v_{ij}, \]
where
\[ u_{ij} = (f_1(T_i^{(1)} - T_j^{(1)}), \ldots, f_p(T_i^{(p)} - T_j^{(p)}))' \]
and
\[ v_{ij} = (C_i(1) - C_j(1), \ldots, C_i(G) - C_j(G))'. \]
Test Statistic
\[ W = U' \operatorname{Var}_0^{-1}(U)\, U \sim \chi^2_{\operatorname{rank}(\operatorname{Var}_0(U))} \text{ distributed} \]
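A minimal sketch of the vector U follows. It assumes that the product between u_ij and v_ij is a Kronecker product, that the same function f is applied to every trait, and that f is the sign function (which gives a Kendall's tau style statistic); these choices and the function names are illustrative assumptions, and the null variance needed for the test statistic W is not computed here.

```python
from itertools import combinations


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def generalized_tau_u(traits, markers, f=sign):
    """Average over pairs i < j of the Kronecker product u_ij (x) v_ij.

    traits[i]:  list of p trait values for subject i
    markers[i]: list of G marker codes for subject i
    Returns a list of length p * G.
    """
    n, p, g = len(traits), len(traits[0]), len(markers[0])
    total = [0.0] * (p * g)
    for i, j in combinations(range(n), 2):
        u = [f(traits[i][a] - traits[j][a]) for a in range(p)]
        v = [markers[i][b] - markers[j][b] for b in range(g)]
        for a in range(p):
            for b in range(g):
                total[a * g + b] += u[a] * v[b]  # entry (a, b) of the Kronecker product
    n_pairs = n * (n - 1) / 2.0
    return [t / n_pairs for t in total]
```

With one trait and one marker the result is a single number that is positive when higher trait values tend to go with higher marker codes.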
Simulation Study-Model Setting
Nominal type I error comparison
the coefficient of linkage disequilibrium takes the value 0
Power evaluation
the coefficient of linkage disequilibrium takes the value 0.11
Given the genotype at the trait locus, a non-proportional odds model is
used to generate ordinal phenotype data and a Gaussian
model is used for the quantitative phenotype
Type I error comparison
                 alpha = 0.05        alpha = 0.01        alpha = 0.001
#(family)   K    O-FBAT   FBAT       O-FBAT   FBAT       O-FBAT   FBAT
200         3    0.043    0.044      0.009    0.009      0.001    0.001
200         4    0.049    0.051      0.008    0.007      0.001    0.001
200         5    0.059    0.062      0.013    0.010      <0.001   <0.001
200         6    0.047    0.043      0.005    0.005      <0.001   <0.001
400         3    0.049    0.051      0.012    0.009      0.002    0.002
400         4    0.055    0.054      0.009    0.011      0.001    0.001
400         5    0.042    0.041      0.006    0.006      0.001    0.002
400         6    0.045    0.045      0.006    0.008      0.001    0.001
600         3    0.036    0.038      0.006    0.006      <0.001   <0.001
600         4    0.054    0.055      0.013    0.010      0.001    0.001
600         5    0.061    0.055      0.005    0.009      0.001    <0.001
600         6    0.038    0.038      0.006    0.007      <0.001   <0.001
Power Comparison
                 alpha = 0.05        alpha = 0.01        alpha = 0.001
#(family)   K    O-FBAT   FBAT       O-FBAT   FBAT       O-FBAT   FBAT
200         3    0.783    0.778      0.553    0.541      0.261    0.249
200         4    0.732    0.702      0.492    0.456      0.213    0.184
200         5    0.760    0.672      0.541    0.429      0.277    0.193
200         6    0.504    0.403      0.266    0.184      0.076    0.042
400         3    0.980    0.982      0.922    0.916      0.757    0.752
400         4    0.961    0.946      0.882    0.857      0.664    0.627
400         5    0.978    0.949      0.914    0.839      0.757    0.604
400         6    0.792    0.664      0.584    0.437      0.328    0.203
600         3    0.999    0.999      0.989    0.991      0.958    0.954
600         4    0.996    0.988      0.978    0.970      0.920    0.885
600         5    0.999    0.990      0.987    0.957      0.935    0.837
600         6    0.947    0.859      0.826    0.658      0.582    0.379
Application for COGA Data
• Phenotypes:
– Alcohol DX-DSM3R+Feighner (ALDX1)
• 4 categories
– Maximum number of drinks in a 24 hour
period (MaxDrink)
• 4 categories
– Spent so much time drinking, had little time
for anything else (TimeDrink)
• 3 categories
Single trait analysis
At marker D7S679, the p-value for ALDX1 is 0.002879, which is greater than the
threshold 0.000538 = 0.05/(3*31), so it is not significant.
Multiple traits analysis
At marker D7S679, the p-value is 0.000553 < 0.0016129 = 0.05/31.
D7S679 is around 1 cM away from D7S1793, which
has been reported to have linkage evidence.
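The two thresholds can be checked with a short calculation. The sketch below assumes they are Bonferroni thresholds, with 3*31 tests in the single-trait analysis and 31 tests in the multiple-trait analysis, as the divisors on the slide suggest; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the overall level at alpha."""
    return alpha / n_tests


single_trait = bonferroni_threshold(0.05, 3 * 31)  # one test per trait per marker
multi_trait = bonferroni_threshold(0.05, 31)       # one joint test per marker

single_significant = 0.002879 < single_trait  # False: 0.002879 exceeds 0.05/93
multi_significant = 0.000553 < multi_trait    # True: 0.000553 is below 0.05/31
```

Testing the traits jointly divides the significance level by 31 instead of 93, which is why the same marker can pass the multiple-trait threshold while failing the single-trait one.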
Closing Comments
• Genetic studies of mental diseases involve many
challenges: some are clinical, some are
statistical, and some are scientific.
• We attempted to deal with a few statistical
challenges. It remains to be seen as to whether
we succeeded. However, our solutions appear
promising.
• We need more people to pay attention to these
challenges and be persistent in our pursuit.
Acknowledgements
Xiang Chen
Rui Feng
Minghui Wang
Yuanqing Ye
Ching-Ti Liu
Xueqin Wang
Meizhuo Zhang
Wensheng Zhu