Introduction to Genetic Epidemiology [M.Tevfik DORAK]



Genetic Epidemiology
M. Tevfik DORAK
http://www.dorak.info/epi/genetepi.html
Approaches to the identification of susceptibility genes
Rebbeck TR. Cancer 1999 (www)
Palmer LJ. Webcast (www)
GENETIC EPIDEMIOLOGIC RESEARCH METHODS
Handbook of Statistical Genetics
(John Wiley & Sons)
Fig.28-1 (www)
GENETIC EPIDEMIOLOGY
Flow of research
Disease characteristics: Descriptive epidemiology
Familial clustering: Family aggregation studies
Genetic or environmental: Twin/adoption/half-sibling/migrant studies
Mode of inheritance: Segregation analysis
Disease susceptibility loci: Linkage analysis
Disease susceptibility markers: Association studies
Autosomal recessive disorders are usually more common in populations with a high level of inbreeding (restricted gene pool). Examples are Tangier disease on Tangier Island in the Chesapeake Bay, Virginia, USA; many genetic disorders in Ashkenazi Jews (Tay-Sachs disease, Gaucher disease, Fanconi anaemia, Niemann-Pick disease); congenital adrenal hyperplasia (CAH) due to 21-hydroxylase deficiency in Yupik Eskimos; CAH due to 11-beta-hydroxylase deficiency in Moroccan Jews; and the thalassaemias (beta and alpha) in Cyprus and Sardinia.
Populations such as those of Finland, Iceland and Newfoundland show an increased prevalence of rare recessive diseases (e.g. congenital nephrotic syndrome of the Finnish type and Newfoundland rod-cone dystrophy).
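The effect of inbreeding on recessive disease can be made concrete with Wright's inbreeding coefficient F, the probability that the two alleles at a locus are identical by descent. A minimal sketch, assuming a single biallelic locus with recessive allele frequency q: the expected frequency of affected (aa) individuals becomes q²(1 − F) + qF instead of q². The allele frequency below is hypothetical; F = 1/16 is the standard value for the offspring of first cousins.

```python
def recessive_genotype_freq(q: float, f: float = 0.0) -> float:
    """Expected frequency of the homozygous recessive genotype aa.

    q: frequency of the recessive allele a
    f: Wright's inbreeding coefficient (0 under random mating)
    """
    # With probability f the two alleles are identical by descent (aa with prob q);
    # otherwise they are drawn independently (aa with prob q^2).
    return q * q * (1 - f) + q * f


q = 0.01  # hypothetical recessive allele frequency
random_mating = recessive_genotype_freq(q)
first_cousin = recessive_genotype_freq(q, f=1 / 16)  # offspring of first cousins
print(f"random mating: {random_mating:.6f}")
print(f"first-cousin offspring: {first_cousin:.6f}")
print(f"relative increase: {first_cousin / random_mating:.1f}x")
```

For a recessive allele at 1%, first-cousin offspring are roughly seven times as likely to be affected as the offspring of unrelated parents; the rarer the allele, the larger this ratio.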
Study Designs in Genetic Epidemiology
* nuclear families
(index case and parents)
* affected relative pairs
(sibs, cousins, any two members of the family)
* extended pedigrees
* twins
(monozygotic and dizygotic)
* unrelated population samples
Risk Ratio (Lambda)
Genetics in Clinical Research
(www)
Sibling Recurrence Risk / Sibling Risk Ratio (λs)
Curnow & Smith: J Roy Stat Soc 1975;138:139-169
ROCHE Genetic Education (www)
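The sibling risk ratio is the recurrence risk in siblings of affected individuals (Ks) divided by the population prevalence (K): λs = Ks / K. Values above 1 indicate familial clustering, which may have genetic or shared environmental causes. A minimal sketch, with hypothetical counts:

```python
def sibling_risk_ratio(affected_sibs: int, total_sibs: int, prevalence: float) -> float:
    """lambda_s = K_s / K.

    K_s: recurrence risk in siblings of affected individuals,
         estimated here as affected_sibs / total_sibs
    K:   population prevalence of the disease
    """
    k_s = affected_sibs / total_sibs
    return k_s / prevalence


# Hypothetical family data: 20 of 400 siblings of affected probands are affected,
# and the disease affects 0.5% of the general population.
lam = sibling_risk_ratio(affected_sibs=20, total_sibs=400, prevalence=0.005)
print(f"lambda_s = {lam:.1f}")
```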
(MacGregor, 2000)
ROCHE Genetic Education (www)
Adoption Studies
1. Compare the risk in biological relatives with that in adoptive relatives of affected adoptees (beware of adoption bias)
2. Compare the risk in biological relatives with that in adoptive relatives of unaffected adoptees
Migrant Studies
Liao CK et al. Endometrial cancer in Asian migrants to the United States and their descendants. Cancer Causes Control 2003;14:357-60 (www)
Flood DM et al. Colorectal cancer incidence in Asian migrants to the United States and their descendants. Cancer Causes Control 2000;11:403-11 (www)
Feltbower RG et al. Trends in the incidence of childhood diabetes in south Asians and other children in Bradford, UK. Diabet Med 2002;19:162-6 (www)
“Children in south Asia have a low incidence of type 1 diabetes but migrants to the UK have similar overall rates to the indigenous population. However, a more steeply rising incidence is seen in the south Asian population, and our data suggest that incidence in this group may eventually outstrip that of the non-south Asians. Genetic factors are unlikely to explain such a rapid change, implying an influence of environmental factors in disease aetiology.”
Washington University (www)
Modes of inheritance
ROCHE Genetic Education (www)
Differences between linkage and association
Linkage
* Linkage is a property of loci
* Role: to identify a biological mechanism for transmission of a trait, and to locate the gene involved
* Coarse mapping (>1 cM)
* No information about which allelic variant is associated with a higher risk of disease
* Requires family pedigrees
* Uses highly polymorphic markers
Association
* Association is a property of alleles
* Role: to identify association between an allelic variant and a disease, and to identify linkage disequilibrium between a disease allele and a marker
* Fine mapping (<1 cM)
* Case-control or family-based approach
* Usually bi-allelic markers
Risch NJ. Nature 2000
Association Studies
Population-based
Cases and unrelated population controls from the
same study base
Family-based
Child-parent trios analysed with the TDT are the most common design
Odds Ratio: 3.6 (95% CI = 1.3 to 10.4)
ROCHE Genetic Education (www)
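The slide reports an odds ratio with its 95% confidence interval. As a sketch of how such figures are commonly computed from a 2x2 case-control table, the code below uses Woolf's logit method; the table layout and counts are hypothetical (not the data behind the OR of 3.6), and published intervals may come from other methods, such as exact ones.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    Assumed table layout ("exposed" = carrying the allele or genotype studied):
                 exposed  unexposed
        cases       a        b
        controls    c        d
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts for illustration only.
or_, lo, hi = odds_ratio_ci(a=30, b=20, c=15, d=35)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f} to {hi:.2f}")
```

The interval is symmetric on the log scale, which is why a CI such as 1.3 to 10.4 around 3.6 looks lopsided on the original scale.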
Genetic Models and
Case-Control Association Data Analysis
The data may also be analysed assuming a prespecified genetic model. For example, with the hypothesis that carrying allele B increases the risk of disease (dominant model), the AB and BB genotypes are pooled, giving a 2x2 table. This is particularly relevant when allele B is rare, with few BB observations in cases and controls. Alternatively, under a recessive model for allele B, genotypes AA and AB would be pooled. Analysing by alleles provides an alternative perspective for case-control data. This breaks genotypes down to compare the total number of A and B alleles in cases and controls, regardless of the genotypes from which these alleles are constructed. This analysis is counter-intuitive, since alleles do not act independently, but it provides the most powerful method of testing under a multiplicative genetic model, where the risk of developing a disease increases by a factor r for each B allele carried: risk r for genotype AB and r² for genotype BB. If a multiplicative genetic model is appropriate, both case and control genotypes will be in Hardy–Weinberg equilibrium, and this can be tested for. A fourth possible genetic model is additive, with an increased disease risk of r for AB genotypes and 2r for BB genotypes. This model shows a clear trend of an increased number of AB and BB genotypes, with the risk for AB genotypes approximately half that for BB genotypes. The additive genetic model can be tested for using Armitage’s test for trend.
Lewis CM. Brief Bioinform 2002 (www)
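The genetic models in the passage above can be sketched in code. This is an illustration under stated assumptions, not the analysis used by Lewis (2002): genotype counts are ordered (AA, AB, BB) with B as the putative risk allele; the dominant, recessive and allelic analyses reduce the data to 2x2 tables tested with a Pearson chi-square (no continuity correction); the additive model uses the Cochran-Armitage form of Armitage's trend test with weights 0, 1, 2; and a goodness-of-fit chi-square checks Hardy–Weinberg proportions. The counts are hypothetical.

```python
import math

# Genotype counts are tuples ordered (AA, AB, BB); B is the putative risk allele.
# 2x2 tables are returned as (a, b, c, d): cases row first, then controls row.


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail P value for a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))


def dominant_table(cases, controls):
    # Carriers of B (AB + BB) versus AA.
    return (cases[1] + cases[2], cases[0], controls[1] + controls[2], controls[0])


def recessive_table(cases, controls):
    # BB versus pooled AA + AB.
    return (cases[2], cases[0] + cases[1], controls[2], controls[0] + controls[1])


def allelic_table(cases, controls):
    # B and A allele counts: each AB gives one of each, each BB gives two B alleles.
    def alleles(g):
        return (g[1] + 2 * g[2], 2 * g[0] + g[1])
    return alleles(cases) + alleles(controls)


def armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend statistic (1 df); weights (0, 1, 2) = additive."""
    r1, r2 = sum(cases), sum(controls)
    n = r1 + r2
    col = [x + y for x, y in zip(cases, controls)]
    t = sum(w * (x * r2 - y * r1) for w, x, y in zip(weights, cases, controls))
    var = (r1 * r2 / n) * (
        sum(w * w * c * (n - c) for w, c in zip(weights, col))
        - 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                  for i in range(3) for j in range(i + 1, 3))
    )
    return t * t / var


def hwe_chi2(genotypes):
    """Goodness-of-fit chi-square (1 df) against Hardy-Weinberg proportions."""
    aa, ab, bb = genotypes
    n = aa + ab + bb
    p = (2 * aa + ab) / (2 * n)  # frequency of allele A
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    return sum((o - e) ** 2 / e for o, e in zip(genotypes, expected))


cases, controls = (10, 20, 30), (30, 20, 10)  # hypothetical genotype counts
models = [("dominant", dominant_table(cases, controls)),
          ("recessive", recessive_table(cases, controls)),
          ("allelic", allelic_table(cases, controls))]
for name, table in models:
    stat = chi2_2x2(*table)
    print(f"{name:9s} chi2 = {stat:6.2f}  P = {p_value_1df(stat):.2g}")
trend = armitage_trend(cases, controls)
print(f"{'trend':9s} chi2 = {trend:6.2f}  P = {p_value_1df(trend):.2g}")
print(f"HWE in controls: chi2 = {hwe_chi2(controls):.2f}")
```

The allelic test treats the 2N alleles as independent observations, which is only valid when genotypes are in Hardy–Weinberg equilibrium; the trend test does not rely on that assumption. Other weights, such as (0, 1, 1) or (0, 0, 1), give trend tests aimed at dominant or recessive effects.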
ROCHE Genetic Education (www)
Linkage disequilibrium and population demography
Mapping disease genes by association requires the identification of linkage disequilibrium (LD) between a marker and a disease phenotype. Several studies of African populations have indicated that levels and patterns of LD in these populations differ from those in non-African populations owing to the age of African populations, admixture with other African and non-African populations, and historical differences in population size and substructure. A disease mutation (shown in violet) that occurs on a single haplotype background will initially be in complete LD with flanking markers on that chromosome (see panel a). In each generation, LD between a marker and a disease allele decays owing to recombination between the sites, and also because of the effects of mutation and gene conversion at marker loci. Young populations, and those that have undergone recent bottlenecks (as probably occurred during the migration of ancestral humans out of Africa), will have haplotype blocks of large to moderate size (panel b, shown in green). In older and larger African populations, in which there has been more recombination, haplotype blocks will probably be smaller (panel c). LD can also be established by a founder event, with the strength and extent of the LD depending on the severity and length of the bottleneck event. Population substructure increases LD owing to a smaller effective population size and to higher levels of genetic drift in subdivided populations. So, if a pooled sample derived from several African populations was analysed, spurious LD would be detected even if the haplotypes in each subpopulation were in linkage equilibrium. This could lead to erroneous conclusions about the association between genetic markers and disease phenotype. Small populations of stable size are expected to show LD between closely linked loci as a result of increased genetic drift, whereas larger populations will have fewer sites in LD. New mutations are less likely to be in LD in growing populations owing to the smaller effect of genetic drift, but allelic associations that existed before population expansion might persist for longer in an expanding population than in a population of constant size.
Tishkoff, Nat Reviews Genet 2002 (www)
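Two points in the caption can be made concrete with textbook population-genetics formulas: under random mating the disequilibrium coefficient D between two loci decays as D_t = D_0(1 − θ)^t, where θ is the recombination fraction, and pooling subpopulations that are each in linkage equilibrium but differ in allele frequencies creates LD in the combined sample. The sketch assumes two biallelic loci and an infinite population (no drift, mutation or gene conversion); all frequencies are hypothetical.

```python
def ld_stats(p_hap_ab: float, p_a: float, p_b: float):
    """Return (D, r^2) for two biallelic loci.

    p_hap_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_hap_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


def ld_decay(d0: float, theta: float, generations: int) -> float:
    """D after t generations of random mating: D_t = D_0 * (1 - theta)^t."""
    return d0 * (1 - theta) ** generations


def pooled_d(subpops, weights):
    """D in a pooled sample of subpopulations that are each in linkage
    equilibrium (haplotype frequency = p_a * p_b within each subpopulation).

    subpops: list of (p_a, p_b) per subpopulation; weights: mixing proportions.
    """
    p_hap_ab = sum(w * pa * pb for w, (pa, pb) in zip(weights, subpops))
    p_a = sum(w * pa for w, (pa, _) in zip(weights, subpops))
    p_b = sum(w * pb for w, (_, pb) in zip(weights, subpops))
    return p_hap_ab - p_a * p_b


# Only two haplotypes present (AB and ab, 50% each): complete LD, r^2 = 1.
d0, _ = ld_stats(0.5, 0.5, 0.5)
for t in (0, 10, 100, 1000):
    print(f"theta = 0.01, generation {t:4d}: D = {ld_decay(d0, 0.01, t):.4f}")

# Two subpopulations, each in linkage equilibrium, pooled 50:50.
print(f"pooled D = {pooled_d([(0.9, 0.9), (0.1, 0.1)], [0.5, 0.5]):.2f}")
```

With θ = 0.01, D falls to about 37% of its starting value after 100 generations. In the pooled example each subpopulation has D = 0, yet the combined sample shows D = 0.16, which is the spurious LD the caption warns about.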
Mapping Disease Susceptibility Genes by Association Studies
Plot of −log(P value) from case-control tests for allelic association with Alzheimer disease (AD), for SNPs immediately surrounding APOE (<100 kb)
Martin, 2000 (www)
Carlson, 2004 (www)
Sample size requirements for different genetic models
Palmer & Cardon, Lancet 2005 (www)
Sample size requirements as a function of allele
frequencies
Johnson GC et al. Nat Genet 2001 (www)
Sample size requirements as a function of the strength of
association
Botstein & Risch. Nat Genet 2003 (www)
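How sample size depends on allele frequency and on the strength of association can be illustrated with the usual normal-approximation formula for comparing two proportions, applied to allele counts. This is a sketch under simplifying assumptions, not the calculation behind the cited figures: a two-sided allelic test with equal numbers of cases and controls, Hardy–Weinberg equilibrium (so each person contributes two independent alleles), and a case allele frequency derived from the control frequency p0 and an allelic odds ratio.

```python
import math
from statistics import NormalDist


def case_allele_freq(p0: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by control frequency p0 and an
    allelic odds ratio (odds of the allele in cases / odds in controls)."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


def individuals_per_group(p0: float, odds_ratio: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Cases (= controls) needed for a two-sided allelic test, using the normal
    approximation for two proportions. Assumes each person contributes two
    independent alleles (Hardy-Weinberg equilibrium)."""
    p1 = case_allele_freq(p0, odds_ratio)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    alleles = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
               + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2 / (p1 - p0) ** 2
    return math.ceil(alleles / 2)  # two alleles per person


for p0 in (0.05, 0.2, 0.5):
    for or_ in (1.2, 1.5, 2.0):
        n = individuals_per_group(p0, or_)
        print(f"control allele freq {p0:.2f}, OR {or_:.1f}: {n} cases and {n} controls")
```

Rarer risk alleles and weaker associations both push the required numbers up sharply; a stricter significance level, as used when many markers are tested, raises them further.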
SNP Selection for Association Studies
- Regulatory / Functional SNPs -
FastSNP (www)
Yuan, 2006 (www)
SNP Selection for Association Studies
- Regulatory / Functional SNPs -
Yue, 2006 (www)
SNP Selection for Association Studies
- Haplotype Tagging SNPs -
Haplotype Association
Tabor HK et al. Nature Rev Genetics 2002 (www)
Illustration of tagging SNPs
a | The diagram shows five haplotypes. Twelve single nucleotide polymorphisms (SNPs) are localized in order along the chromosome. The letters on the top indicate groups of SNPs that have perfect pairwise linkage disequilibrium (LD) with one another, and the numbers on the bottom indicate each of the 12 SNPs. SNP 9 is the causal variant, which in this simple example determines drug response: allele C results in a therapeutic response, whereas allele G results in an adverse reaction. In this example, the selection of just one SNP from each of the groups A–E would be sufficient to fully represent all of the haplotype diversity. Each haplotype can be identified by just five tagging SNPs (tSNPs), and the causal variant would be tagged even if it were not itself typed (in fact, multi-marker approaches to tSNP selection would reduce the set of tags to fewer than five, but this is ignored for simplicity). So, the highlighted tSNP profiles predict an adverse reaction to the medicine. Normally, LD patterns are not so clear-cut and statistical methods are required to select appropriate sets of tSNPs.
b | The diagram depicts the same 12 SNPs, but with different associations among them, as might happen in a different population group. Because patterns of LD are different, some patients would be misclassified if the same five tSNPs were used and interpreted in the same way; that is, using the same SNP profiles as defined in population A, haplotype profiles 1, 2 and 3 are predicted to have allele C at the causal SNP 9 (a therapeutic response), whereas haplotype profiles 4 and 5 are predicted to have an adverse response. However, because the pattern of association has changed, the new haplotypes 6 and 7 in population B are misclassified.
Goldstein, Nat Rev Genet 2003 (www)
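The caption's idea of choosing one SNP from each group of SNPs in perfect pairwise LD can be sketched as a simple greedy procedure: compute r² between SNP pairs across the observed haplotypes, group each untagged SNP with all later SNPs whose r² reaches a threshold, and keep one tag per group. This is an illustrative assumption, not the method of Goldstein et al. or of any particular tagging program; the haplotypes are hypothetical, each counted once, with alleles coded 0/1. Lowering the threshold below 1 (for example to 0.8) mimics the less clear-cut situation the caption describes.

```python
def r_squared(x, y):
    """r^2 between two biallelic SNPs given 0/1 alleles on the same haplotypes."""
    n = len(x)
    p_x, p_y = sum(x) / n, sum(y) / n
    p_xy = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    return (p_xy - p_x * p_y) ** 2 / denom if denom > 0 else 0.0


def greedy_tag_groups(haplotypes, threshold=1.0):
    """Group SNPs so that every member has r^2 >= threshold with the group's
    first SNP (its tag). Returns a list of groups of SNP indices."""
    n_snps = len(haplotypes[0])
    columns = [[h[k] for h in haplotypes] for k in range(n_snps)]
    assigned, groups = set(), []
    for i in range(n_snps):
        if i in assigned:
            continue
        group = [i] + [j for j in range(i + 1, n_snps)
                       if j not in assigned
                       and r_squared(columns[i], columns[j]) >= threshold - 1e-9]
        assigned.update(group)
        groups.append(group)
    return groups


# Hypothetical haplotypes over five SNPs (0/1 alleles).
haplotypes = [
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 0, 0],
]
groups = greedy_tag_groups(haplotypes)
print("groups:", groups)
print("tag SNPs:", [g[0] for g in groups])
```

Here SNPs 0 and 1, and SNPs 2 and 3, are in perfect LD, so three tags capture all five SNPs in this sample. As panel b illustrates, the same tags could mislead in a population with a different LD pattern.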
Erichsen & Chanock. Br J Cancer 2004 (www)
Associations with Ancestral Haplotypes
(Schork, 1998)
Dorak, 2002 (www)
Ayala, 1994 (www)
Palmer LJ. Webcast (www)
Wacholder, 2002 (www)
Population Stratification
Marchini, 2004 (www)
Population Stratification
Cardon & Palmer, 2003 (www)
Multiple Comparisons & Spurious Associations
Diepstra, Lancet 2005 (www)
Family-based association study designs
* Haplotype Relative Risk (HRR) Method
(Falk & Rubinstein, 1987; Knapp, 1993)
* Affected Family-Based Controls (AFBAC) Method
(Thomson, 1995)
* Transmission Disequilibrium/Distortion Test (TDT)
(Spielman, 1993 & 1994; Ewens & Spielman, 1995)
Reviews:
(Thomson, 1995; Gauderman, 1999)
Parent-Case Trios in TDT/HRR
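Non-transmitted parental alleles serve as the “controls”; transmitted alleles serve as the “cases”.
[Figure: pedigree diagrams of parent-case trios (father □, mother ○, affected child ■ or ●) with parental and child genotypes]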
- AN EXAMPLE OF THE TDT: TRANSMISSION DISEQUILIBRIUM OF HLA-B62 TO PATIENTS WITH CHILDHOOD AML
(Dorak et al, BSHI 2002)
                             Nontransmitted allele
                             B62      Other
Transmitted allele   B62     x        12
                     Other   1        y
Out of 13 parents heterozygous for B62, 12 transmitted B62 to the affected child and 1 did not (cells x and y, from parents who are not heterozygous for B62, do not enter the test).
McNemar’s test results:
P = 0.006 (with continuity correction)
odds ratio = 12.0, 95% CI = 1.8 to 513
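McNemar's test applied to this table is the TDT: only the discordant cells (12 and 1) enter the calculation. A minimal sketch, assuming the transmissions from heterozygous parents have already been counted; the confidence interval on the slide (1.8 to 513) is not reproduced here.

```python
import math


def tdt(b: int, c: int, continuity: bool = True):
    """Transmission/disequilibrium test from heterozygous parents.

    b: heterozygous parents who transmitted the allele under study
    c: heterozygous parents who transmitted the other allele
    Returns (chi-square with 1 df, P value, odds ratio b / c).
    """
    diff = abs(b - c) - 1 if continuity else abs(b - c)
    chi2 = max(diff, 0) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    odds_ratio = b / c if c > 0 else math.inf
    return chi2, p, odds_ratio


chi2, p, odds_ratio = tdt(b=12, c=1)  # counts from the HLA-B62 example above
print(f"chi2 = {chi2:.2f}, P = {p:.4f}, OR = {odds_ratio:.1f}")
```

With the continuity correction this gives chi-square = 100/13 (about 7.69) and P of about 0.0055, which rounds to the 0.006 on the slide; the odds ratio is simply 12/1.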
Multifactorial Etiology
ROCHE Genetic Education (www)
Models of gene–environment interactions
Hunter, 2005 (www)
Sample size requirement for gene-environment
interaction studies
Hunter, 2005 (www)
An example of a gene-environment interaction
In Alzheimer disease, the risk of cognitive decline, as measured by the TICS (Telephone Interview for Cognitive Status) test, is particularly high in APOE4 carriers who have untreated hypertension (APOE4+/HT+).
Hunter, 2005 (www)
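Whether a joint effect like the APOE4/hypertension one counts as interaction depends on the scale. As a sketch using a common 2x4 layout (genotype by exposure, each cell compared with the doubly unexposed group), the code computes the ratio of odds ratios (departure from a multiplicative model) and the relative excess risk due to interaction, RERI (departure from an additive model). The counts are hypothetical, not data from Hunter (2005), and using odds ratios in place of relative risks assumes a rare outcome.

```python
def odds_ratios_vs_reference(cells):
    """cells: {(g, e): (cases, controls)} with g, e in {0, 1}.
    Returns the odds ratio of each cell relative to the doubly unexposed cell (0, 0)."""
    ref_cases, ref_controls = cells[(0, 0)]
    ref_odds = ref_cases / ref_controls
    return {k: (ca / co) / ref_odds for k, (ca, co) in cells.items()}


def interaction_measures(or_11, or_10, or_01):
    """Ratio of odds ratios (multiplicative scale) and RERI (additive scale)."""
    multiplicative = or_11 / (or_10 * or_01)
    reri = or_11 - or_10 - or_01 + 1
    return multiplicative, reri


# Hypothetical counts: g = carrier genotype (e.g. APOE4+), e = exposure (e.g. HT+).
cells = {(0, 0): (50, 200), (1, 0): (30, 60), (0, 1): (40, 80), (1, 1): (60, 40)}
ors = odds_ratios_vs_reference(cells)
mult, reri = interaction_measures(ors[(1, 1)], ors[(1, 0)], ors[(0, 1)])
print({k: round(v, 2) for k, v in ors.items()})
print(f"ratio of ORs = {mult:.2f}, RERI = {reri:.2f}")
```

A ratio of ORs of 1 means the joint effect is exactly multiplicative, and a RERI of 0 means it is exactly additive; the same data can show interaction on one scale and not the other.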
Falconer's polygenic threshold model for dichotomous nonmendelian characters:
Liability to the condition is polygenic and normally distributed (upper curve). People whose liability
is above a certain threshold value are affected. Their sibs (lower curve) have a higher average
liability than the population mean and a greater proportion of them have liability exceeding the
threshold. Therefore the condition tends to run in families (Falconer DS, 1967).
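Falconer's argument can be checked by simulation. A minimal sketch, assuming standard normal liability, a threshold set by the population prevalence, and a correlation in liability between sibs of h²/2 (additive genetic effects only, no shared environment); the prevalence and heritability values are hypothetical. Sibs of affected individuals have a raised mean liability, so more of them exceed the threshold than in the general population.

```python
import math
import random
from statistics import NormalDist


def simulate_sib_recurrence(prevalence, h2, n_pairs=200_000, seed=1):
    """Monte Carlo sib recurrence risk under a liability-threshold model.

    Liability is standard normal and people above the threshold are affected.
    Sib liabilities are correlated with r = h2 / 2, which assumes additive
    genetic effects only and no shared environment.
    Returns (threshold, mean liability of sibs of affected, sib recurrence risk).
    """
    threshold = NormalDist().inv_cdf(1 - prevalence)
    r = h2 / 2
    rng = random.Random(seed)
    sib_liabilities = []
    for _ in range(n_pairs):
        shared = rng.gauss(0.0, 1.0)  # scaled by sqrt(r), contributes variance r
        x1 = math.sqrt(r) * shared + math.sqrt(1 - r) * rng.gauss(0.0, 1.0)
        x2 = math.sqrt(r) * shared + math.sqrt(1 - r) * rng.gauss(0.0, 1.0)
        if x1 > threshold:  # first sib affected: record the co-sib's liability
            sib_liabilities.append(x2)
    if not sib_liabilities:
        raise ValueError("no affected individuals simulated; increase n_pairs")
    risk = sum(x > threshold for x in sib_liabilities) / len(sib_liabilities)
    mean_liability = sum(sib_liabilities) / len(sib_liabilities)
    return threshold, mean_liability, risk


k = 0.01  # hypothetical prevalence
t, mean_sib, k_s = simulate_sib_recurrence(prevalence=k, h2=0.6)
print(f"threshold: {t:.2f} SD above the population mean")
print(f"mean liability of sibs of affected: {mean_sib:.2f} SD")
print(f"sib recurrence risk {k_s:.3f} vs prevalence {k}; lambda_s = {k_s / k:.1f}")
```

Setting h2 to 0 makes the sibs' liabilities independent, and the simulated recurrence risk then falls back to the population prevalence.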
M.Tevfik DORAK
http://www.dorak.info