Transcript Intro to Mx Scripts - Virginia Commonwealth University
Introduction to Genetic Epidemiology
HGEN619, 2006 Hermine H. Maes
Genetic Epidemiology
Establishing / quantifying the role of genes and environment in variation in disease and complex traits
~ Answering questions about the importance of nature and nurture on individual differences
Finding those genes and environmental factors
Genes & Environment
How much of the variation in a trait is accounted for by genetic factors?
Do shared environmental factors contribute significantly to the trait variation?
The first of these questions addresses heritability, defined as the proportion of the total variance explained by genetic factors
Nature-nurture question
Sir Francis Galton: comparing the similarity of identical and fraternal twins yields information about the relative importance of heredity vs environment on individual differences
Gregor Mendel: classical experiments demonstrated that the inheritance of model traits in carefully bred material agreed with a simple theory of particulate inheritance
Ronald Fisher: first coherent account of how the ‘correlations between relatives’ could be explained ‘on the supposition of Mendelian inheritance’
People and Ideas
[Timeline diagram of contributors and their ideas]
Galton (1865-ish): Correlation, Family Resemblance, Twins, Ancestral Heredity
Mendel (1865): Particulate Inheritance; Genes: single in gamete, double in zygote; Segregation ratios
Darwin (1858, 1871): Natural Selection, Sexual Selection, Evolution
Fisher (1918): Correlation & Mendel, Maximum Likelihood, ANOVA: partition of variance
Mather (1949) & Jinks (1971): Biometrical Genetics, Model Fitting (plants)
Thurstone (1930's): Multiple Factor Analysis
Jinks & Fulker (1970): Model Fitting applied to humans
Spearman (1904): Common Factor Analysis
Wright (1921): Path Analysis
Watson & Crick (1953)
Joreskog (1960): Covariance Structure Analysis, LISREL
Morton (1974): Path Analysis & Family Resemblance
Elston etc (19..): Segregation, Linkage
Martin & Eaves (1977): Genetic Analysis of Covariance Structure
Rao, Rice, Reich, Cloninger (1970's): Assortment, Cultural Inheritance
Neale (1990): Mx
Other labels: Population Genetics, Molecular Genetics, 2000
Biometrical Model
[Figure: genotypic values on a linear scale, with aa at -d, the midpoint m, Aa at h and AA at d]
To make the simple two-allele model concrete, let us imagine that we are talking about genes that influence adult stature. Let us assume that the normal range of height for males is from 4 feet 10 inches to 6 feet 8 inches; that is, about 22 inches. And let us assume that each somatic chromosome has one gene of roughly equivalent effect. Then, roughly speaking, we are thinking in terms of loci for which the homozygotes contribute ±1/2 inch (from the midpoint), depending on whether they are AA, the increasing homozygote, or aa, the decreasing homozygote. In reality, although some loci may contribute greater effects than this, others will almost certainly contribute less; thus we are talking about the kind of model in which any particular polygene is having an effect that would be difficult to detect by the methods of classical genetics.
(from the Biometrical Genetics chapter in Methodology for Genetic Studies of Twins and Families)
Polygenic Traits
Genes   Genotypes   Phenotypes
1       3           3
2       9           5
3       27          7
4       81          9
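The counts above follow a simple pattern if each gene has two alleles with equal, additive effects (an assumption that reproduces the numbers on the slide but is not stated there): n genes give 3^n genotypes and 2n + 1 phenotype classes. A minimal Python sketch, with illustrative function names; the frequency function additionally assumes random mating, independent loci and the same allele frequency at every locus.

```python
from math import comb


def genotype_count(n_genes: int) -> int:
    """Each biallelic gene has 3 genotypes (aa, Aa, AA), so n genes give 3**n."""
    return 3 ** n_genes


def phenotype_count(n_genes: int) -> int:
    """With equal additive effects the phenotype is the number of increasing
    alleles, which ranges from 0 to 2n, giving 2n + 1 classes."""
    return 2 * n_genes + 1


def phenotype_frequencies(n_genes: int, p: float = 0.5) -> list[float]:
    """Expected frequency of each phenotype class (binomial in the number of
    increasing alleles), assuming allele frequency p at every locus."""
    n_alleles = 2 * n_genes
    return [comb(n_alleles, k) * p**k * (1 - p) ** (n_alleles - k)
            for k in range(n_alleles + 1)]


for n in (1, 2, 3, 4):
    print(n, genotype_count(n), phenotype_count(n))
```

As the number of genes grows, the list returned by `phenotype_frequencies` takes on the bell shape that the polygenic model predicts.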
Stature in adolescent twins
[Histogram of stature in women: x-axis Stature from 145.0 to 190.0, y-axis counts from 0 to 700; N = 1785, Mean = 169.1, Std. Dev = 6.40]
Individual differences
Physical attributes (height, eye color)
Disease susceptibility (asthma, anxiety)
Behavior (intelligence, personality)
Life outcomes (income, children)
Polygenic Model
Polygenic model: variation for a trait caused by a large number of individual genes, each inherited in strict conformity to Mendel’s laws
Multifactorial model: many genes and many environmental factors, also of small and equal effect
The combined effects of many small factors give a normal (Gaussian) distribution of trait values, according to the central limit theorem
Central Limit Theorem
The normal distribution is to be expected whenever variation is produced by the addition of a large number of effects, none of them predominant
This holds quite often
Quantitative traits
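The central limit theorem can be illustrated with a small simulation. The sketch below is only an illustration under arbitrary assumptions: each trait value is the sum of 100 independent effects drawn uniformly from -1 to 1, and the function names, sample size and seed are made up for the example. It then checks one property of a normal distribution, namely that roughly 68% of values lie within one standard deviation of the mean.

```python
import random
import statistics


def simulate_trait(n_factors: int, n_people: int, seed: int = 1) -> list[float]:
    """Each trait value is the sum of n_factors small independent effects,
    every effect drawn uniformly from [-1, 1] (an illustrative choice)."""
    rng = random.Random(seed)
    return [sum(rng.uniform(-1.0, 1.0) for _ in range(n_factors))
            for _ in range(n_people)]


def share_within_one_sd(values: list[float]) -> float:
    """Fraction of values within one standard deviation of the mean.
    For a normal distribution this is about 0.68."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mean) <= sd for v in values) / len(values)


trait = simulate_trait(n_factors=100, n_people=5000)
print(statistics.pstdev(trait), share_within_one_sd(trait))
```

No single effect dominates the sum, which is the condition the slide names; replacing one effect with a much larger one would distort the shape.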
Continuous or Categorical ?
Body Mass Index vs “obesity”
Blood pressure vs “hypertensive”
Bone Mineral Density vs “fracture”
Bronchial reactivity vs “asthma”
Neuroticism vs “anxious/depressed”
Reading ability vs “dyslexic”
Aggressive behavior vs “delinquent”
Multifactorial Threshold Model of Disease
Single threshold: disease liability divided into unaffected and affected
Multiple thresholds: disease liability divided into normal, mild, mod and severe
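A threshold model of this kind can be sketched numerically. The example below assumes that liability follows a standard normal distribution (the usual convention, though not spelled out on the slide); the prevalence and threshold values are hypothetical, and the function names are illustrative.

```python
from statistics import NormalDist

LIABILITY = NormalDist(mu=0.0, sigma=1.0)  # standard normal liability (assumed)


def threshold_for_prevalence(prevalence: float) -> float:
    """Liability value above which a fraction `prevalence` is affected."""
    return LIABILITY.inv_cdf(1.0 - prevalence)


def category_proportions(thresholds: list[float]) -> list[float]:
    """Proportion of the population in each ordered category
    (e.g. normal, mild, moderate, severe) cut by the given thresholds."""
    cuts = [0.0] + [LIABILITY.cdf(t) for t in sorted(thresholds)] + [1.0]
    return [hi - lo for lo, hi in zip(cuts, cuts[1:])]


# Hypothetical inputs: a 10% prevalence, and three thresholds at 0, 1 and 2.
print(threshold_for_prevalence(0.10))
print(category_proportions([0.0, 1.0, 2.0]))
```

A single threshold is the special case of one cut point, which yields the unaffected and affected proportions.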
Genetically Complex Diseases
Imprecise phenotype
Phenocopies / sporadic cases
Low penetrance
Locus heterogeneity / polygenic effects
Complex Trait Model
[Diagram labels: Marker; Linkage; Association; Linkage disequilibrium; Gene 1, Gene 2, Gene 3; Mode of inheritance; Polygenic background; Common environment; Individual environment; Disease Phenotype]
Causes of Variation
pre-1990: estimation of ‘anonymous’ genetic and environmental components of phenotypic variation (genetic epidemiologic studies)
post-1990: identification of QTLs, quantitative trait loci contributing to genetic variation of complex (quantitative) traits (linkage and association studies)
Stages of Genetic Mapping
Are there genes influencing this trait? Genetic epidemiological studies
Where are those genes? Linkage analysis
What are those genes? Association analysis
Partitioning Variation
Phenotypic variance (VP) partitioned into genetic (VG) and environmental (VE) components
VP = VG + VE
Assumptions: additivity & independence of genetic and environmental effects
Heritability (h²): proportion of variance due to genetic influences (h² = VG / VP)
Property of a group (not an individual), thus specific to a group in place & time
Sources of Variance
Genetic factors: Additive (A), Dominance (D)
Environmental factors: Common / Shared (C), Specific / Unique (E)
Measurement Error, confounded with E
Genetic Factors
Additive genetic factors (A): sum of all the effects of individual loci
Non-additive genetic factors: result of interactions between alleles at the same locus (dominance, D) or between alleles at different loci (epistasis)
Environmental Factors
Shared [common or between-family] environmental factors (C): aspects of the environment shared by members of the same family or people who live together, which contribute to similarity between relatives
Non-shared [specific, unique or within-family] environmental factors (E): unique to an individual, contribute to variation within family members, but not to their covariation
Estimating Components
Estimate phenotypic variance components from data on covariances of related individuals
Different types of relative pairs share different amounts of phenotypic variance
Biometrical genetics theory: specify amounts in terms of genetic and environmental variances
Three major types of study: family, adoption and twin
Designs to disentangle G+E
Resemblance between relatives caused by:
Shared Genes (G = A + D)
Environment Common to family members (C)
Differences between relatives caused by:
Non-shared Genes
Unique environment (E)
Informative Designs
Family studies: G + C confounded
MZ twins alone: G + C confounded
MZ twins reared apart: rare, atypical, selective placement?
Adoption studies: increasingly rare, atypical, selective placement?
MZ and DZ twins reared together
Extended twin design
Classical Twin Study
MZ and DZ twins reared together
MZ twins genetically identical
DZ twins share on average half their genes
Equal Environments Assumption: MZ and DZ twins share relevant environmental influences to the same extent
Zygosity
[Figure panels: MZ, DZ, DZ]
Identity at marker loci, except for rare mutation
MZ and DZ twins: determining zygosity using ABI Profiler™ genotyping (9 STR markers + sex)
MZ & DZ Correlations
rMZ > rDZ: G (heritability)
C: increases both rMZ & rDZ
Relative magnitude of the MZ and DZ correlations indicates the contribution of additive genetic (G) and shared environmental (C) factors
1 - rMZ: importance of specific environmental (E) factors
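These rules of thumb can be turned into rough estimates. The sketch below uses the well-known Falconer-style formulas, which follow from assuming an ACE model with equal environments (rMZ = a² + c², rDZ = ½a² + c²); the slide itself does not give these formulas, and the correlations used are hypothetical.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict[str, float]:
    """Rough standardized variance components from twin correlations.

    Assumes rMZ = a2 + c2 and rDZ = 0.5 * a2 + c2 (ACE model, no dominance).
    """
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic share (heritability)
    c2 = 2.0 * r_dz - r_mz     # shared environmental share
    e2 = 1.0 - r_mz            # specific environmental share
    return {"a2": a2, "c2": c2, "e2": e2}


# Hypothetical correlations, chosen only for illustration.
print(falconer_estimates(r_mz=0.8, r_dz=0.6))
```

These are quick descriptive estimates without standard errors or fit statistics; they can also return negative values when rDZ exceeds rMZ or is less than half of it.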
Twin Correlations
[Figure: bar charts of example DZ and MZ twin correlations for different combinations of A, C and E; values shown include 1.0, .8, .7, .6, .5 and .4]
Example
If
VP = VA + VC + VE = 2.0
CovMZ = VA + VC = 1.6
CovDZ = 1/2 VA + VC = 1.2
then, by algebra, VA = 0.8, VC = 0.8, VE = 0.4
But it isn’t always so simple. Consider VP = 1.0, CovMZ = 0.6, CovDZ = 0.65; then VA = -0.1, VC = 0.7, VE = 0.4, which includes a nonsensical negative variance component
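The algebra in this example can be written out directly. The sketch below solves the three equations for VA, VC and VE and flags inadmissible (negative) solutions; the function names are illustrative, and the inputs are the two sets of values from the example.

```python
def ace_from_covariances(vp: float, cov_mz: float, cov_dz: float) -> tuple[float, float, float]:
    """Solve VP = VA + VC + VE, CovMZ = VA + VC, CovDZ = VA/2 + VC."""
    va = 2.0 * (cov_mz - cov_dz)   # CovMZ - CovDZ = VA / 2
    vc = cov_mz - va               # what is left of CovMZ
    ve = vp - cov_mz               # what MZ twins do not share
    return va, vc, ve


def is_admissible(components: tuple[float, float, float]) -> bool:
    """A variance cannot be negative; report whether all components are >= 0."""
    return all(v >= 0.0 for v in components)


for stats in [(2.0, 1.6, 1.2), (1.0, 0.6, 0.65)]:
    solution = ace_from_covariances(*stats)
    print([round(v, 2) for v in solution], is_admissible(solution))
```

The second case shows why simple algebra is not enough in practice: model fitting with bounds on the parameters avoids solutions such as a negative VA.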
Observed Statistics
Trait variance & MZ and DZ covariance as unique observed statistics
Estimate the contributions of additive genes (A), shared (C) and specific (E) environmental factors, according to the genetic model
Path analysis is a useful tool to generate the expectations for the variances and covariances under a model
Path Analysis
Allows us to diagrammatically represent linear models for the relationships between variables
Easy to derive expectations for the variances and covariances of variables in terms of the parameters of the proposed linear model
Permits translation into matrix formulation
Variance Components
P = eE + aA + cC + dD
[Path diagram: the latent factors Unique Environment (E), Shared Environment (C), Additive Genetic (A) and Dominance Genetic (D) point to the Phenotype (P) through paths e, c, a and d]
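As a worked step, the expected phenotypic variance follows from the linear model for P once we assume, as is conventional in path analysis but not stated on the slide, that the latent factors E, A, C and D have unit variance and are mutually uncorrelated:

```latex
\begin{aligned}
P &= eE + aA + cC + dD \\
\operatorname{Var}(P) &= e^{2}\operatorname{Var}(E) + a^{2}\operatorname{Var}(A)
                        + c^{2}\operatorname{Var}(C) + d^{2}\operatorname{Var}(D) \\
                      &= e^{2} + a^{2} + c^{2} + d^{2}
\end{aligned}
```

Under these assumptions the squared path coefficients play the role of the variance components, for example a² for VA and e² for VE.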
ACE Model Path Diagram for MZ & DZ Twins
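[Path diagram: latent factors A, C and E for each twin point to the phenotypes P T1 and P T2 through paths a, c and e; the two C factors correlate 1, and the two A factors correlate 1.0 in MZ and 0.5 in DZ pairs]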
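The path diagram translates into expected covariance matrices for a twin pair. The sketch below assumes latent factors with unit variance, so that the variance of each twin is a² + c² + e² and the covariance is r_a·a² + c²; the function name is illustrative, and the path coefficients are chosen to reproduce the variance components of the earlier example (VA = 0.8, VC = 0.8, VE = 0.4).

```python
def expected_covariance(a: float, c: float, e: float, r_a: float) -> list[list[float]]:
    """Expected 2x2 twin-pair covariance matrix under the ACE path model.

    r_a is the correlation between the twins' additive genetic factors
    (1.0 for MZ, 0.5 for DZ); the shared environment correlates 1.0.
    Latent factors are assumed to have unit variance.
    """
    variance = a**2 + c**2 + e**2
    covariance = r_a * a**2 + c**2
    return [[variance, covariance], [covariance, variance]]


# Path coefficients are square roots of the example variance components.
a, c, e = 0.8**0.5, 0.8**0.5, 0.4**0.5
sigma_mz = expected_covariance(a, c, e, r_a=1.0)
sigma_dz = expected_covariance(a, c, e, r_a=0.5)
print(sigma_mz)
print(sigma_dz)
```

The two matrices differ only in the off-diagonal element, which is what lets MZ and DZ data separate A from C.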
Model Fitting
Evaluate significance of variance components: effect size & sample size
Evaluate goodness-of-fit of model: closeness of observed & expected values
Compare fit under alternative models
Obtain maximum likelihood estimates
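To make "closeness of observed and expected values" concrete, the sketch below evaluates the standard maximum likelihood discrepancy function for covariance matrices, F = ln|Σ| - ln|S| + tr(SΣ⁻¹) - p, for two hand-picked parameter sets. It is not Mx code and does no optimization: the observed matrices reuse the earlier example, the sample sizes are hypothetical, and all function names are illustrative.

```python
from math import log


def ml_discrepancy(observed, expected):
    """ML fit function for 2x2 covariance matrices (p = 2).
    It is 0 when expected equals observed and grows with misfit."""
    (s11, s12), (s21, s22) = observed
    (e11, e12), (e21, e22) = expected
    det_s = s11 * s22 - s12 * s21
    det_e = e11 * e22 - e12 * e21
    # trace of S times the inverse of Sigma, written out for the 2x2 case
    trace = (s11 * e22 - s12 * e21 - s21 * e12 + s22 * e11) / det_e
    return log(det_e) - log(det_s) + trace - 2.0


def twin_model(va, vc, ve, r_a):
    """Expected covariance matrix for a twin pair given variance components."""
    total, cov = va + vc + ve, r_a * va + vc
    return [[total, cov], [cov, total]]


def chi_square(params, s_mz, s_dz, n_mz, n_dz):
    """Misfit summed over MZ and DZ groups, weighted by (N - 1) per group."""
    va, vc, ve = params
    return ((n_mz - 1) * ml_discrepancy(s_mz, twin_model(va, vc, ve, 1.0))
            + (n_dz - 1) * ml_discrepancy(s_dz, twin_model(va, vc, ve, 0.5)))


s_mz = [[2.0, 1.6], [1.6, 2.0]]   # observed MZ covariance matrix (example values)
s_dz = [[2.0, 1.2], [1.2, 2.0]]   # observed DZ covariance matrix (example values)
n_mz = n_dz = 200                 # hypothetical numbers of pairs

fit_ace = chi_square((0.8, 0.8, 0.4), s_mz, s_dz, n_mz, n_dz)
fit_ae = chi_square((1.6, 0.0, 0.4), s_mz, s_dz, n_mz, n_dz)  # C fixed to zero
print(fit_ace, fit_ae)
```

A model-fitting program would search for the parameter values that minimize this function and compare nested models by the difference in fit; here the AE values are simply fixed by hand, so the comparison is only indicative.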
Mx
Structural equation modeling package
Software: www.vcu.edu/mx
Manual: Neale et al. 2006
Free
Structural equation modeling
Both continuous and categorical variables
Systematic approach to hypothesis testing
Tests of significance
Can be extended to:
More complex questions
Multiple variables
Other relatives
SEM: more complex questions I
Are the same genes acting in males and females? (sex limitation)
Role of age on (a) mean (b) variance (c) variance components
Are G & E equally important in age, country cohorts? (heterogeneity)
Are G & E the same in other strata (e.g. married/unmarried)? (G x E interaction)
SEM: more complex questions II
Do the same genes account for variation in multiple phenotypes? (multivariate analysis)
Do the same genes account for variation in phenotypes measured at different ages? (longitudinal analysis)
Do specific genes account for variation/covariation in phenotypes? (linkage/association)
Linkage & Association Analysis
Linkage Analysis
Sharing between relatives
Identifies large regions
Include several candidates
Complex disease
Scans on sets of small families popular
No strong assumptions about disease alleles
Low power
Limited resolution
Linkage Scan
Association Analysis
Sharing between unrelated individuals
Trait alleles originate in common ancestor
High resolution
Recombination since common ancestor
Large number of independent tests
Powerful if assumptions are met
Same disease haplotype shared by many patients
Sensitive to population structure
Association Scan
Proof of Concept: Genes/Regions
Genome Scan: Breast cancer, Lung cancer, Melanoma, Type 2 diabetes, HDL-C plasma level, Osteoarthritis, Schizophrenia
Gene 1: DLC-1, CD44, B-RAF, PPAR, CETP, AGC1, DDC
Gene 2: Chr 8q, Chr 22q, PPP1R3A, LPL
Gene 3: Chr 13q, FOXA2
Gene 4: Chr 1q
First (unequivocal) positional cloning of a complex disease QTL!
Number of genes identified from QTL by year
From QTL to gene: the harvest begins. R Korstanje & B Paigen, Nature Genetics 31, 235–236 (2002)