Intro to Mx Scripts - Virginia Commonwealth University


Introduction to Genetic Epidemiology

HGEN619, 2006 Hermine H. Maes

Genetic Epidemiology

- Establishing / quantifying the role of genes and environment in variation in disease and complex traits (answering questions about the importance of nature and nurture on individual differences)
- Finding those genes and environmental factors

Genes & Environment

- How much of the variation in a trait is accounted for by genetic factors?
- Do shared environmental factors contribute significantly to the trait variation?
- The first of these questions addresses heritability, defined as the proportion of the total variance explained by genetic factors

Nature-nurture question

- Sir Francis Galton: comparing the similarity of identical and fraternal twins yields information about the relative importance of heredity vs environment on individual differences
- Gregor Mendel: classical experiments demonstrated that the inheritance of model traits in carefully bred material agreed with a simple theory of particulate inheritance
- Ronald Fisher: first coherent account of how the ‘correlations between relatives’ could be explained ‘on the supposition of Mendelian inheritance’

People and Ideas

- Galton (1865-ish): Correlation; Family Resemblance; Twins; Ancestral Heredity
- Mendel (1865): Particulate Inheritance; Genes: single in gamete, double in zygote; Segregation ratios
- Darwin (1858, 1871): Natural Selection; Sexual Selection; Evolution
- Spearman (1904): Common Factor Analysis
- Fisher (1918): Correlation & Mendel; Maximum Likelihood; ANOVA: partition of variance
- Wright (1921): Path Analysis
- Thurstone (1930's): Multiple Factor Analysis
- Mather (1949) & Jinks (1971): Biometrical Genetics; Model Fitting (plants)
- Watson & Crick (1953)
- Joreskog (1960): Covariance Structure Analysis; LISREL
- Jinks & Fulker (1970): Model Fitting applied to humans
- Rao, Rice, Reich, Cloninger (1970's): Assortment; Cultural Inheritance
- Morton (1974): Path Analysis & Family Resemblance
- Martin & Eaves (1977): Genetic Analysis of Covariance Structure
- Elston etc (19..): Segregation; Linkage
- Neale (1990): Mx
- Other labels on the timeline (which runs to 2000): Population Genetics; Molecular Genetics

Biometrical Model

[Figure: genotypic values on a linear scale. aa at -d, the midpoint m, Aa at h, and AA at +d]

To make the simple two-allele model concrete, let us imagine that we are talking about genes that influence adult stature. Let us assume that the normal range of height for males is from 4 feet 10 inches to 6 feet 8 inches; that is, about 22 inches. And let us assume that each somatic chromosome has one gene of roughly equivalent effect. Then, roughly speaking, we are thinking in terms of loci for which the homozygotes contribute ± 1/2 inch (from the midpoint), depending on whether they are AA, the increasing homozygote, or aa, the decreasing homozygote. In reality, although some loci may contribute greater effects than this, others will almost certainly contribute less; thus we are talking about the kind of model in which any particular polygene is having an effect that would be difficult to detect by the methods of classical genetics.

From the Biometrical Genetics chapter in Methodology for Genetic Studies of Twins and Families

Polygenic Traits

- 1 Gene → 3 Genotypes → 3 Phenotypes
- 2 Genes → 9 Genotypes → 5 Phenotypes
- 3 Genes → 27 Genotypes → 7 Phenotypes
- 4 Genes → 81 Genotypes → 9 Phenotypes
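
The counts on this slide can be reproduced by enumeration. The sketch below is illustrative only: it assumes biallelic loci with equal, purely additive effects, so that a phenotype is just the total number of increasing alleles. The function name `genotype_phenotype_counts` is hypothetical and not part of the original slides.

```python
from itertools import product


def genotype_phenotype_counts(n_genes):
    """Count genotypes and distinct phenotype scores for n biallelic genes.

    Assumption: every locus has equal, additive effect, so the phenotype
    is the total number of increasing alleles across loci.
    """
    # Each locus has three genotypes, scored 0 (aa), 1 (Aa) or 2 (AA).
    per_locus = (0, 1, 2)
    genotypes = list(product(per_locus, repeat=n_genes))
    # Different genotypes with the same allele total give the same phenotype.
    phenotypes = {sum(g) for g in genotypes}
    return len(genotypes), len(phenotypes)


for n in range(1, 5):
    n_geno, n_pheno = genotype_phenotype_counts(n)
    print(f"{n} gene(s): {n_geno} genotypes, {n_pheno} phenotypes")
```

Under these assumptions the counts follow 3^n genotypes and 2n + 1 phenotypes, which is why the phenotype distribution looks increasingly continuous as genes are added.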

Stature in adolescent twins

[Figure: histogram of stature in women. x-axis: Stature, 145.0 to 190.0; y-axis: count, 0 to 700. Mean = 169.1, Std. Dev = 6.40, N = 1785]

Individual differences

- Physical attributes (height, eye color)
- Disease susceptibility (asthma, anxiety)
- Behavior (intelligence, personality)
- Life outcomes (income, children)

Polygenic Model

- Polygenic model: variation for a trait caused by a large number of individual genes, each inherited in strict conformity to Mendel’s laws
- Multifactorial model: many genes and many environmental factors, also of small and equal effect
- Effects of many small factors combined → normal (Gaussian) distribution of trait values, according to the central limit theorem

Central Limit Theorem

- The normal distribution is to be expected whenever variation is produced by the addition of a large number of effects, none of which predominates
- This holds quite often
- Quantitative traits
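
The central limit theorem claim can be checked with a small simulation. This is a minimal sketch, assuming each factor contributes +1 or -1 with equal probability; the function names, the number of factors and the sample size are illustrative choices, not values from the slides. Excess kurtosis is used as a rough normality check (it is 0 for a normal distribution).

```python
import random
import statistics


def simulate_trait(n_factors, n_people, seed=0):
    """Trait value = sum of n_factors small, equal, independent effects."""
    rng = random.Random(seed)
    # Each factor adds +1 or -1 with equal probability; none predominates.
    return [
        sum(rng.choice((-1, 1)) for _ in range(n_factors))
        for _ in range(n_people)
    ]


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)
    fourth = statistics.fmean([(v - mean) ** 4 for v in values])
    return fourth / variance ** 2 - 3.0


one_factor = simulate_trait(n_factors=1, n_people=5000)
many_factors = simulate_trait(n_factors=50, n_people=5000)
print("excess kurtosis, 1 factor:  ", round(excess_kurtosis(one_factor), 2))
print("excess kurtosis, 50 factors:", round(excess_kurtosis(many_factors), 2))
```

With a single factor the trait takes only two values and is far from normal; with many factors the excess kurtosis moves close to zero, as the slide suggests.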

Continuous or Categorical ?

- Body Mass Index vs “obesity”
- Blood pressure vs “hypertensive”
- Bone Mineral Density vs “fracture”
- Bronchial reactivity vs “asthma”
- Neuroticism vs “anxious/depressed”
- Reading ability vs “dyslexic”
- Aggressive behavior vs “delinquent”

Multifactorial Threshold Model of Disease

[Figure: liability distributions under the multifactorial threshold model. Single threshold: unaffected | affected. Multiple thresholds: normal | mild | mod | severe. x-axis: Disease liability]
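
The threshold idea can be sketched numerically. The example below assumes liability follows a standard normal distribution (mean 0, variance 1), which is the usual convention but is not stated on the slide; the prevalence and threshold values are hypothetical, and the function names are illustrative.

```python
from statistics import NormalDist

# Assumption: liability is standard normal in the population.
liability = NormalDist(mu=0.0, sigma=1.0)


def threshold_for_prevalence(prevalence):
    """Liability value above which a fraction `prevalence` is affected."""
    return liability.inv_cdf(1.0 - prevalence)


def category_proportions(thresholds):
    """Population share in each category cut by the ordered thresholds."""
    cuts = [0.0] + [liability.cdf(t) for t in sorted(thresholds)] + [1.0]
    return [upper - lower for lower, upper in zip(cuts, cuts[1:])]


# Single threshold: a hypothetical disease affecting 10% of the population.
print(round(threshold_for_prevalence(0.10), 3))

# Multiple thresholds: normal / mild / mod / severe at hypothetical cut points.
print([round(p, 3) for p in category_proportions([0.5, 1.5, 2.5])])
```

A single threshold splits the distribution into unaffected and affected; several ordered thresholds give the graded categories shown in the second panel.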

Genetically Complex Diseases

- Imprecise phenotype
- Phenocopies / sporadic cases
- Low penetrance
- Locus heterogeneity / polygenic effects

Complex Trait Model

[Figure: diagram of the complex trait model. Labels: Marker; Linkage; Association; Linkage disequilibrium; Gene 1, Gene 2, Gene 3; Mode of inheritance; Polygenic background; Common environment; Individual environment; Disease Phenotype]

Causes of Variation

- pre-1990
  - estimation of ‘anonymous’ genetic and environmental components of phenotypic variation
  - genetic epidemiologic studies
- post-1990
  - identification of QTLs: quantitative trait loci contributing to genetic variation of complex (quantitative) traits
  - linkage and association studies

Stages of Genetic Mapping

- Are there genes influencing this trait?
  - Genetic epidemiological studies
- Where are those genes?
  - Linkage analysis
- What are those genes?
  - Association analysis

Partitioning Variation

- phenotypic variance (VP) partitioned into genetic (VG) and environmental (VE) variance
- VP = VG + VE
- Assumptions: additivity & independence of genetic and environmental effects
- heritability (h²): proportion of variance due to genetic influences (h² = VG / VP)
- property of a group (not an individual), thus specific to a group in place & time

Sources of Variance

- Genetic factors:
  - Additive (A)
  - Dominance (D)
- Environmental factors:
  - Common / Shared (C)
  - Specific / Unique (E)
  - Measurement Error, confounded with E

Genetic Factors

- Additive genetic factors (A): sum of all the effects of individual loci
- Non-additive genetic factors: result of interactions between alleles at the same locus (dominance, D) or between alleles at different loci (epistasis)

Environmental Factors

- Shared [common or between-family] environmental factors (C): aspects of the environment shared by members of the same family or people who live together, which contribute to similarity between relatives
- Non-shared [specific, unique or within-family] environmental factors (E): unique to an individual; contribute to variation within families, but not to covariation between family members

Estimating Components

- Estimate phenotypic variance components from data on covariances of related individuals
- Different types of relative pairs share different amounts of phenotypic variance
- Biometrical genetics theory: specify amounts in terms of genetic and environmental variances
- Three major types of study: family, adoption and twin

Designs to disentangle G+E

- Resemblance between relatives caused by:
  - Shared Genes (G = A + D)
  - Environment Common to family members (C)
- Differences between relatives caused by:
  - Non-shared Genes
  - Unique environment (E)

Informative Designs

- Family studies – G + C confounded
- MZ twins alone – G + C confounded
- MZ twins reared apart – rare, atypical, selective placement?
- Adoption studies – increasingly rare, atypical, selective placement?
- MZ and DZ twins reared together
- Extended twin design

Classical Twin Study

- MZ and DZ twins reared together
- MZ twins genetically identical
- DZ twins share on average half their genes
- Equal Environments Assumption
  - MZ and DZ twins share relevant environmental influences to same extent

Zygosity

MZ and DZ twins: determining zygosity using ABI Profiler™ genotyping (9 STR markers + sex)

[Figure: marker genotype profiles labeled MZ, DZ, DZ. MZ: identity at marker loci, except for rare mutation]

MZ & DZ Correlations

- rMZ > rDZ: G (heritability)
- C: increases both rMZ & rDZ
- Relative magnitude of the MZ and DZ correlations → contribution of additive genetic (G) and shared environmental (C) factors
- 1 - rMZ: importance of specific environmental (E) factors

Twin Correlations

[Figure: MZ and DZ twin correlations. Panel labels: A, C, E; plotted values: 1.0, .5, .8, .4, .8, .6, .8, .7]

Example

- thus if
  - VP = VA + VC + VE = 2.0
  - CovMZ = VA + VC = 1.6
  - CovDZ = 1/2 VA + VC = 1.2
- then, by algebra, VA = 0.8, VC = 0.8, VE = 0.4
- but it isn’t always so simple: consider VP = 1.0, CovMZ = 0.6, CovDZ = 0.65
- then VA = -0.1, VC = 0.7, VE = 0.4
- nonsensical negative variance component
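
The algebra in this example can be written out as a short function. This is a sketch that simply solves the three equations on the slide (VP = VA + VC + VE, CovMZ = VA + VC, CovDZ = 1/2 VA + VC); the function name `ace_from_covariances` is hypothetical.

```python
def ace_from_covariances(vp, cov_mz, cov_dz):
    """Solve the three ACE expectations for VA, VC and VE.

    CovMZ - CovDZ = 1/2 VA, so VA = 2 * (CovMZ - CovDZ);
    VC = CovMZ - VA; VE = VP - CovMZ.
    """
    va = 2.0 * (cov_mz - cov_dz)
    vc = cov_mz - va
    ve = vp - cov_mz
    return va, vc, ve


# First example from the slide.
print([round(v, 3) for v in ace_from_covariances(2.0, 1.6, 1.2)])

# Second example: CovDZ exceeds CovMZ, so the solved VA is negative.
va, vc, ve = ace_from_covariances(1.0, 0.6, 0.65)
print([round(v, 3) for v in (va, vc, ve)])
if min(va, vc, ve) < 0:
    print("negative variance component: the simple algebra breaks down")
```

The second call shows why direct algebra is not enough in practice: nothing stops the solution from leaving the range of valid variances, which is one motivation for fitting the model with bounded parameters instead.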

Observed Statistics

- Trait variance & MZ and DZ covariance as unique observed statistics
- Estimate the contributions of additive genes (A), shared (C) and specific (E) environmental factors, according to the genetic model
- Useful tool to generate the expectations for the variances and covariances under a model is path analysis

Path Analysis

- Allows us to diagrammatically represent linear models for the relationships between variables
- Easy to derive expectations for the variances and covariances of variables in terms of the parameters of the proposed linear model
- Permits translation into matrix formulation

Variance Components

P = eE + aA + cC + dD

[Figure: path diagram. Latent factors E (Unique Environment), C (Shared Environment), A (Additive Genetic) and D (Dominance Genetic) point to the Phenotype through paths e, c, a and d]

ACE Model Path Diagram for MZ & DZ Twins

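[Figure: path diagram for a twin pair. Latent factors E, C, A load on the phenotype of Twin 1 (P T1) and A, C, E on the phenotype of Twin 2 (P T2) through paths e, c, a. The two C factors correlate 1; the two A factors correlate 1.0 for MZ and 0.5 for DZ pairs]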
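
The expectations implied by this path diagram can be sketched in a few lines. The code assumes standardized latent factors (variance 1) and uses the path coefficients a, c, e from the diagram; the function name and the numeric values are illustrative, chosen so that a², c², e² match the earlier example (0.8, 0.8, 0.4).

```python
import math


def expected_twin_covariance(a, c, e, zygosity):
    """Expected 2x2 covariance matrix of (P T1, P T2) under the ACE model.

    Tracing rules: each twin's variance is a^2 + c^2 + e^2; the cross-twin
    covariance runs through A (correlated 1.0 in MZ, 0.5 in DZ) and C
    (correlated 1 in both); E contributes nothing to covariance.
    """
    if zygosity not in ("MZ", "DZ"):
        raise ValueError("zygosity must be 'MZ' or 'DZ'")
    r_a = 1.0 if zygosity == "MZ" else 0.5
    variance = a ** 2 + c ** 2 + e ** 2
    covariance = r_a * a ** 2 + c ** 2
    return [[variance, covariance], [covariance, variance]]


# Illustrative path coefficients whose squares are 0.8, 0.8 and 0.4.
a, c, e = math.sqrt(0.8), math.sqrt(0.8), math.sqrt(0.4)
print("MZ:", expected_twin_covariance(a, c, e, "MZ"))
print("DZ:", expected_twin_covariance(a, c, e, "DZ"))
```

These two matrices are the model's predictions that get compared with the observed MZ and DZ covariance matrices during model fitting.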

Model Fitting

- Evaluate significance of variance components: effect size & sample size
- Evaluate goodness-of-fit of model: closeness of observed & expected values
- Compare fit under alternative models
- Obtain maximum likelihood estimates
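
"Closeness of observed & expected values" can be made concrete with the standard maximum likelihood discrepancy for covariance matrices, F = ln|Σ| + tr(SΣ⁻¹) - ln|S| - p, summed over the MZ and DZ groups. The sketch below is not Mx code and does no optimization; it only evaluates the fit of given parameter values. All function names, the observed matrices (taken from the earlier worked example) and the sample sizes of 100 pairs per group are assumptions for illustration.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def trace_product2(x, y):
    """trace(x @ y) for 2x2 matrices."""
    return sum(x[i][k] * y[k][i] for i in range(2) for k in range(2))


def ml_discrepancy(observed, expected):
    """ML discrepancy between observed and expected 2x2 covariances."""
    p = 2  # number of observed variables (twin 1, twin 2)
    return (math.log(det2(expected)) + trace_product2(observed, inv2(expected))
            - math.log(det2(observed)) - p)


def ace_expected(va, vc, ve, r_a):
    """Expected twin covariance matrix for given variance components."""
    total, cov = va + vc + ve, r_a * va + vc
    return [[total, cov], [cov, total]]


def ace_fit(va, vc, ve, obs_mz, obs_dz, n_mz, n_dz):
    """Discrepancy summed over groups, each weighted by (pairs - 1)."""
    return ((n_mz - 1) * ml_discrepancy(obs_mz, ace_expected(va, vc, ve, 1.0))
            + (n_dz - 1) * ml_discrepancy(obs_dz, ace_expected(va, vc, ve, 0.5)))


obs_mz = [[2.0, 1.6], [1.6, 2.0]]
obs_dz = [[2.0, 1.2], [1.2, 2.0]]
print("ACE fit:", ace_fit(0.8, 0.8, 0.4, obs_mz, obs_dz, 100, 100))
print("CE fit: ", ace_fit(0.0, 1.4, 0.6, obs_mz, obs_dz, 100, 100))
```

The ACE values reproduce the observed matrices, so their discrepancy is essentially zero, while the CE model (VA fixed at 0) fits worse. Comparing such values between nested models is the idea behind "compare fit under alternative models"; a real analysis would search for the parameter values that minimize the function.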

Mx

- Structural equation modeling package
- Software: www.vcu.edu/mx
- Manual: Neale et al. 2006
- Free

Structural equation modeling

- Both continuous and categorical variables
- Systematic approach to hypothesis testing
- Tests of significance
- Can be extended to:
  - More complex questions
  - Multiple variables
  - Other relatives

SEM: more complex questions I

- Are the same genes acting in males and females? (sex limitation)
- Role of age on (a) mean (b) variance (c) variance components
- Are G & E equally important in age, country cohorts? (heterogeneity)
- Are G & E same in other strata (e.g. married/unmarried)? (G x E interaction)

SEM: more complex questions II

- Do the same genes account for variation in multiple phenotypes? (multivariate analysis)
- Do the same genes account for variation in phenotypes measured at different ages? (longitudinal analysis)
- Do specific genes account for variation/covariation in phenotypes? (linkage/association)

Linkage & Association Analysis

Linkage Analysis

- Sharing between relatives
- Identifies large regions
- Include several candidates
- Complex disease
- Scans on sets of small families popular
- No strong assumptions about disease alleles
- Low power
- Limited resolution

Linkage Scan

Association Analysis

- Sharing between unrelated individuals
- Trait alleles originate in common ancestor
- High resolution
- Recombination since common ancestor
- Large number of independent tests
- Powerful if assumptions are met
- Same disease haplotype shared by many patients
- Sensitive to population structure

Association Scan

Proof of Concept: Genes/Regions

Genome Scan: Breast cancer, Lung cancer, Melanoma, Type 2 diabetes, HDL-C plasma level, Osteoarthritis, Schizophrenia

Gene 1: DLC-1, CD44, B-RAF, PPARγ, CETP, AGC1, DDC

Gene 2: Chr 8q, Chr 22q, PPP1R3A, LPL

Gene 3: Chr 13q, FOXA2

Gene 4: Chr 1q

First (unequivocal) positional cloning of a complex disease QTL!

Number of genes identified from QTL by year

R Korstanje & B Paigen, From QTL to gene: the harvest begins. Nature Genetics 31, 235–236 (2002)