2: Population genetics
Population genetics:
Introduction and motivation
Reference.
The big questions of population genetics:
•What are the evolutionary forces shaping
diversity among individuals within the same
species?
The language of population genetics
Locus: a place on the chromosome where
an allele resides.
Allele: a particular variant of the DNA sequence found at a specific
location on the chromosome.
A locus is thus a template for the alleles. An
allele is an instantiation of a locus.
A diploid organism has two alleles at a
particular autosomal locus, one from its
mother and the other from its father.
Allele frequencies
Genotype and allele frequencies
Say we have 3 AA, 2 Aa and 1 aa individuals. What is the
frequency of the allele A?
Answer:
there are 6 copies of A in the 3 AA individuals,
there are 2 copies of A in the 2 Aa individuals,
and there are 0 copies of A in the 1 aa individual.
Altogether there are 8 copies of A, out of 12 alleles in
total, thus the frequency of A is 8/12 ≈ 0.67.
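The counting argument above can be sketched in a few lines of Python. This is only an illustration; the function name `allele_frequency` and its argument names are assumptions made for this example.

```python
from fractions import Fraction

def allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency of allele A computed from diploid genotype counts."""
    copies_of_A = 2 * n_AA + n_Aa              # each AA carries two A, each Aa one
    total_alleles = 2 * (n_AA + n_Aa + n_aa)   # every diploid carries two alleles
    return Fraction(copies_of_A, total_alleles)

freq = allele_frequency(3, 2, 1)
print(freq, float(freq))   # 8/12 reduces to 2/3
```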
Assume we sample 11 alleles, 6 “A” and 5 “a”.
The frequency of the “A” allele is 6/11 = 0.55.
0.55 is the observed frequency of “A” in the
sample. It is also the estimate $\hat{p}$ of the allele
frequency in the entire population.
If n is the sample size, the 95% confidence
interval can be approximated by
$$\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Here this gives approximately (0.26, 0.84).
Intervals constructed this way contain the population
allele frequency in about 95% of samples. When n increases,
the confidence interval becomes narrower.
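As a rough check of the interval above, here is a minimal sketch of the normal-approximation interval. The function name `wald_interval` is an assumption for this example. It uses the unrounded estimate 6/11, so the lower bound comes out slightly below the value obtained with the rounded 0.55.

```python
import math

def wald_interval(count_A, n, z=1.96):
    """Normal-approximation confidence interval for an allele frequency."""
    p_hat = count_A / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(6, 11)
print(round(low, 3), round(high, 3))

# A larger sample with the same observed frequency gives a narrower interval.
low_big, high_big = wald_interval(600, 1100)
print(round(low_big, 3), round(high_big, 3))
```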
General homozygosity
Genetic drift
The general homozygosity for k alleles with frequencies p_1, ..., p_k is defined as
$$G = \sum_{i=1}^{k} p_i^2$$
and the general heterozygosity is defined as H = 1-G.
G is the probability that two alleles sampled at random from
the population, with replacement, are the same allele type.
Note that the definition of heterozygosity uses only allele
frequencies:
$$H = 1 - \sum_{i=1}^{k} p_i^2$$
Thus, it can be used even for populations that are not in Hardy-Weinberg (HW)
equilibrium, or even for non-diploid populations.
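A minimal sketch of these two definitions, assuming the allele frequencies are given as a list that sums to 1 (the function names are chosen for this example only):

```python
def homozygosity(freqs):
    """G: sum of squared allele frequencies."""
    return sum(p * p for p in freqs)

def heterozygosity(freqs):
    """H = 1 - G."""
    return 1.0 - homozygosity(freqs)

freqs = [0.5, 0.3, 0.2]          # three alleles, hypothetical frequencies
print(homozygosity(freqs))       # 0.25 + 0.09 + 0.04
print(heterozygosity(freqs))
```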
Population genetics: finite populations.
In real life all populations are finite. For some populations
(e.g., bacteria), the assumption of infinite size is a good
approximation. For others, it is completely unrealistic.
HW assumes that the population is infinite. When a
population is finite, random genetic drift can take place.
Random genetic drift is the random change of allele
frequencies. The source of the random changes is random
variation in the number of offspring between individuals
and, for diploid sexual organisms, Mendel’s law of
segregation.
Two mathematical approaches to studying genetic changes
in populations:
Deterministic models
HW
selection, mutation, migration
Stochastic models
Drift
Mutation
Some “green genes” randomly mutate to “brown
genes” (although since any particular mutation is rare,
this process alone cannot account for a big change in
allele frequency over one generation).
Migration
•Migration (or gene flow): Some beetles with
brown genes immigrated from another
population, or some beetles carrying green
genes emigrated.
Selection
•Natural selection: Beetles with brown genes
escaped predation and survived to reproduce more
frequently than beetles with green genes, so that
more brown genes got into the next generation.
Genetic drift
•Genetic drift: When the beetles reproduced, purely by
chance more brown genes than green genes ended up in
the offspring. In the diagram on the right, brown genes
occur slightly more frequently in the offspring (29%) than in
the parent generation (25%).
Drift
An important factor in producing changes in
allele frequencies is the random sampling of
gametes during reproduction.
Niche capacity = 10 plants
Random genetic drift has one effect:
Removal of variation by fixation.
If no mutations are introduced – after enough time we will
all be Cohen (or any other one allele).
If mutations are introduced, due to genetic drift, they have
a good chance of getting lost.
The probabilistic model for the random genetic drift is the
random walk…
We will see that the rate of removal is inversely
proportional to the population size. Random genetic
drift therefore has its largest effect in small
populations.
[Figure panels: “Drift has a small effect” / “Drift has a large effect”]
Mutations introduce variation. Random
genetic drift removes variation. What is the
equilibrium between these two?
[Diagram: mutations add variation; drift removes it]
The neutral theory states that much of the
molecular variation in nature is due to
mutation and random genetic drift.
Selection has a very minor role in shaping
allele variability.
The neutral theory was tremendously controversial,
partly because it was difficult to test and mostly
because it seemed an outrageous claim that most of
evolution is due to genetic drift rather than natural
selection, as Darwin proposed.
Modeling genetic drift
A computer simulation approach to studying genetic drift
(simulating the Wright-Fisher model):
Let N be the number of diploid individuals. (Say we have
N=20 individuals, with 40 alleles.)
Let the two alleles be A and a, where the frequency of A is
p. (Say p=0.2, so we start with 8 “A” alleles and 32 “a”
alleles.)
To go to the next generation, we randomly choose 2N
alleles from the previous generation with replacement (the
same allele can be chosen more than once).
We repeat this process for many generations.
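The simulation described above might be sketched as follows. This is a minimal illustration under the stated setup (N=20, p=0.2); the function name `wright_fisher`, its parameters, and the fixed random seed are assumptions made for this example.

```python
import random

def wright_fisher(n_diploid, p0, generations, seed=None):
    """Simulate the frequency of allele A under pure drift.

    Each generation, 2N alleles are drawn with replacement from the
    previous generation, so each draw is an A with probability p.
    """
    rng = random.Random(seed)
    n_alleles = 2 * n_diploid
    count_A = round(p0 * n_alleles)
    trajectory = [count_A / n_alleles]
    for _generation in range(generations):
        p = count_A / n_alleles
        count_A = sum(1 for _draw in range(n_alleles) if rng.random() < p)
        trajectory.append(count_A / n_alleles)
    return trajectory

traj = wright_fisher(n_diploid=20, p0=0.2, generations=100, seed=1)
print(traj[:5], "...", traj[-1])
```

Running it with different seeds shows different trajectories; once the frequency hits 0 or 1 it stays there, which is the fixation (or loss) discussed in the text.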
Assumptions
•All the individuals in the population have the same fitness
(selection does not operate).
•The generations are nonoverlapping.
•Adult population size is finite and does not change from
generation to generation.
•Gamete population size is infinite.
•The population is diploid (N individuals, 2N alleles).
•One locus with two alleles, A1 and A2, with frequencies p and q
= 1 – p, respectively.
Life cycle in the model:
n adults: A1 in proportion p, A2 in proportion q
Random mating produces offspring genotypes A1A1 (p^2), A1A2 (2pq), A2A2 (q^2)
But only n survive to the next adult stage: A1 in proportion p’, A2 in proportion q’
Magnitude of fluctuations depends on
population size.
Large population = Small fluctuations.
Small population = Large fluctuations.
Mean time to fixation or loss depends on
population size.
Large population = Long time.
Small population = Short time.
Ex. What is the probability that a particular allele gets a
copy into the next generation? Assume N is the population
size of diploids.
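Solution: a single draw misses a given allele copy with probability 1 - 1/2N,
and the next generation is formed by 2N independent draws, so
$$p = 1 - \left(1 - \frac{1}{2N}\right)^{2N} \xrightarrow{N \to \infty} 1 - \frac{1}{e} \approx 0.63$$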
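A quick numerical look at this solution, as a sketch (the function name is an assumption for this example): the probability is evaluated for a few population sizes and compared with the limit 1 - 1/e.

```python
import math

def prob_allele_leaves_copy(n_diploid):
    """Probability that a given allele copy is drawn at least once in 2N draws."""
    two_n = 2 * n_diploid
    return 1 - (1 - 1 / two_n) ** two_n

for n in (1, 10, 100, 10_000):
    print(n, prob_allele_leaves_copy(n))
print("limit:", 1 - 1 / math.e)
```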
Ex. Consider a population consisting of a single hermaphroditic
diploid individual that is heterozygous, with genotype Aa. Say this
population mates randomly (a strange notion for a
single individual, but still). What is the chance that the
population is fixed after 1 generation? After 2 generations?
After n? (Assume the size of the population remains the
same, i.e., 1.) On average, how much time will it take for
the population to become fixed?
Solution: After 1 generation there is a 0.25 chance of AA,
a 0.25 chance of aa, and a 0.5 chance of Aa. Thus, the
probability of fixation after 1 generation is 0.5. The probability
that fixation first occurs in generation n is (1/2)^n, so the
probability that the population is fixed by generation n is
1 - (1/2)^n (0.75 after 2 generations). The time to fixation follows
a geometric distribution with success probability 1/2, whose
expectation is 2 generations.
This exercise shows that the time till fixation is a random
variable, and has a specific distribution.
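The single-individual exercise can also be explored by simulation. The sketch below is an illustration only; the function name, the number of replicates, and the seed are assumptions made for this example.

```python
import random

def generations_until_fixed(rng):
    """Number of generations until a selfing Aa lineage becomes AA or aa."""
    generations = 0
    genotype = "Aa"
    while genotype == "Aa":
        generations += 1
        # A heterozygous parent passes A or a with probability 1/2 per gamete.
        offspring = rng.choice("Aa") + rng.choice("Aa")
        genotype = "Aa" if offspring in ("Aa", "aA") else offspring
    return generations

rng = random.Random(0)
times = [generations_until_fixed(rng) for _ in range(100_000)]
mean_time = sum(times) / len(times)
print("mean time to fixation:", mean_time)
print("fraction fixed after 1 generation:", times.count(1) / len(times))
```

The mean should come out close to 2, and roughly half of the replicates should fix in the first generation.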
The general homozygosity for k alleles is defined as
$$G = \sum_{i=1}^{k} p_i^2$$
and H = 1-G.
Our goal will be to formulate G(t) and H(t)=1-G(t) where
t is the number of generations. This function should
depend on N – the population size.
G(t) should increase with t, and H(t) should decrease…
Let G’ be the probability that two alleles drawn at random
from the population without replacement are identical by
state. G’ also measures homozygosity, but it is not the same as
G.
G’ approximates G in the following sense:
G is the probability that sampling two gametes with
replacement results in gametes with identical states.
G’ is the same probability, but without replacement.
G can be computed from G’, because sampling with
replacement decomposes into two events: either the
second draw picks exactly the same allele copy as the first
(probability 1/2N), or it picks a different copy
(probability 1-1/2N). In the first event, the two alleles are
the same with probability 1. In the second event, the two
alleles are the same with probability G’. Thus,
$$G = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G'$$
For large N, G and G’ are almost the same.
From the math point of view, it is easier to work with G’
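The relation between G and G’ can be checked exactly on a small example. In this sketch the allele counts (8 copies of one allele and 32 of another, so 2N = 40) and the function names are assumptions chosen for illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def g_with_replacement(counts):
    """G: probability that two draws with replacement match in state."""
    total = sum(counts)
    return sum(Fraction(c, total) ** 2 for c in counts)

def g_without_replacement(counts):
    """G': probability that two draws without replacement match in state."""
    total = sum(counts)
    return sum(Fraction(c * (c - 1), total * (total - 1)) for c in counts)

counts = [8, 32]                 # allele copy numbers, 2N = 40
two_n = sum(counts)
G = g_with_replacement(counts)
G_prime = g_without_replacement(counts)
rhs = Fraction(1, two_n) + (1 - Fraction(1, two_n)) * G_prime
print(G, G_prime, rhs)           # G and rhs should be the same fraction
```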
A recursion formula for G’ (without replacement).
Let G’(t) be G’ in generation t.
G’(t+1) is the chance that two alleles drawn without
replacement in generation t+1 are the same by state. Take two
alleles in generation t+1. Both are copies of alleles that existed in
generation t. There are two possibilities: either they are copies of
the same allele in generation t (i.e., that allele was sampled more
than once in the transition from t to t+1, so the two are identical
by descent), or they are copies of two different alleles. The
probability that they are copies of the same allele is 1/2N; in that case
they have the same state with probability 1. Otherwise,
the probability that they have the
same state is G’(t). Putting it all together, we get
$$G'(t+1) = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G'(t)$$
Exercise: show that when t approaches infinity, G’(t)
approaches 1.
Answer: Let $G = \lim_{t \to \infty} G'(t)$;
then also $G = \lim_{t \to \infty} G'(t+1)$.
Taking the limit of both sides of the equation above, we obtain:
$$\lim_{t \to \infty} G'(t+1) = \lim_{t \to \infty} \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G'(t)\right]$$
$$G = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G$$
$$0 = \frac{1}{2N} - \frac{G}{2N}$$
$$G = 1$$
Conclusion: The homozygosity approaches 1 after many generations.
Solving the equation for G’
$$G'(t+1) = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G'(t)$$
Define H’(t) to be 1-G’(t) (H’ is similar to heterozygosity,
but without replacement). We get:
$$1 - G'(t+1) = 1 - \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G'(t)\right] = \left(1 - \frac{1}{2N}\right) - \left(1 - \frac{1}{2N}\right) G'(t) = \left(1 - \frac{1}{2N}\right)\left(1 - G'(t)\right)$$
$$H'(t+1) = \left(1 - \frac{1}{2N}\right) H'(t)$$
This is a geometric progression:
$$H'(t) = \left(1 - \frac{1}{2N}\right)^{t} H'(0)$$
The conclusion is that the heterozygosity decreases at an
exponential rate that depends on the population size. Since we are
talking about discrete organisms, H(t) will eventually become 0.
Half life (not the computer game)
$$H'(t) = \frac{H'(0)}{2}$$
The half-life is the t that solves the above equation. We denote this
t by the symbol $t_{1/2}$.
Solving:
$$H'(0)\left(1 - \frac{1}{2N}\right)^{t_{1/2}} = \frac{H'(0)}{2}$$
$$t_{1/2} \ln\left(1 - \frac{1}{2N}\right) = -\ln(2)$$
$$t_{1/2} = \frac{-\ln(2)}{\ln\left(1 - \frac{1}{2N}\right)}$$
Taylor series of ln(1+x):
$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots$$
Hence, for very small x we can approximate ln(1+x) by x.
$$t_{1/2} = \frac{-\ln(2)}{\ln\left(1 - \frac{1}{2N}\right)} \approx \frac{-\ln(2)}{-\frac{1}{2N}} = 2N \ln(2)$$
In other words: for large enough populations, the
time it takes for genetic drift to reduce H’ by one-half is proportional to the population size:
$$t_{1/2} \approx 2N \ln(2)$$
Example: for a population of one million, it takes about
1.39 million generations to reduce H’ by one half.
If the generation time is 20 years, it takes about 28
million years to reduce the genetic variation by
half.
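The numbers in this example can be reproduced with a short sketch that compares the exact half-life with the 2N ln(2) approximation. The function names are assumptions for this example.

```python
import math

def drift_half_life_exact(n_diploid):
    """Solve (1 - 1/2N)^t = 1/2 for t."""
    return -math.log(2) / math.log1p(-1 / (2 * n_diploid))

def drift_half_life_approx(n_diploid):
    """Large-N approximation: t = 2N ln 2."""
    return 2 * n_diploid * math.log(2)

n = 1_000_000
generations = drift_half_life_approx(n)
print("exact:      ", drift_half_life_exact(n))
print("approximate:", generations)
print("years at 20 years per generation:", generations * 20)
```

For small populations the two values differ more noticeably, for example with `n = 5`.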
Fixation probability
Say we have 2N different alleles. Eventually, one
of these will be fixed. The probability that it will
be allele i is 1/2N. If m alleles are the same, the
probability that one of them will be fixed is m/2N.
If the initial frequency is p – the fixation
probability would also be p.
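The claim that the fixation probability equals the initial frequency can be checked numerically. The sketch below assumes the Wright-Fisher sampling scheme described earlier (2N binomial draws per generation) and follows the full probability distribution of the allele count over many generations; the function name and the small population size are assumptions for this example.

```python
from math import comb

def fixation_probability(n_diploid, start_copies, generations=2000):
    """Probability mass at fixation after many generations of pure drift."""
    two_n = 2 * n_diploid
    # transition[i][j] = P(j copies next generation | i copies now)
    transition = [
        [comb(two_n, j) * (i / two_n) ** j * (1 - i / two_n) ** (two_n - j)
         for j in range(two_n + 1)]
        for i in range(two_n + 1)
    ]
    dist = [0.0] * (two_n + 1)
    dist[start_copies] = 1.0
    for _ in range(generations):
        dist = [sum(dist[i] * transition[i][j] for i in range(two_n + 1))
                for j in range(two_n + 1)]
    return dist[two_n]

# 2N = 10 alleles, 2 of them are A, so the initial frequency is 0.2
print(fixation_probability(n_diploid=5, start_copies=2))
```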
HW, Drift, etc’…
Random mating acts on a time scale of 1-2
generations (HW). Genetic drift acts on a time scale of
2N generations. In the short term, random mating
will change genotype frequencies much more than
drift.
How important genetic drift is in large populations
is still debated.
Modeling mutations
As before, to go to the next generation, we choose 2N alleles from the
previous generation, but this time without replacement (so every allele
is passed on exactly once and there is no drift).
In each generation there is a probability u for any allele to mutate. We
assume a mutation always results in a new allele that was never found
in the population before.
This model is called the infinite-allele model.
u is the mutation probability, but it is sometimes also called the mutation
rate.
A recursion formula for H’ (without replacement).
Let H’(t) be H’ in generation t (H’ = 1-G’).
H’(t+1) is the chance that two alleles drawn without replacement in
generation t+1 are different by state.
We neglect the chance that both are copies of the same allele from
generation t (i.e., we neglect drift).
Take two alleles in generation t+1. They both existed already in
generation t. There are two possibilities: either they already differed by
state in generation t (probability H’(t)), or not (probability 1-H’(t)). If
they already differed, they still differ, because every mutation creates a new allele.
If they were the same by state in generation t, they can now differ by state
only due to mutation in at least one of them. The chance for this is 1-(1-u)^2.
$$H'(t+1) = H'(t) + \left(1 - H'(t)\right)\left(1 - (1-u)^2\right)$$
Solving the equation for G’
Expanding 1-(1-u)^2 = 2u - u^2 and neglecting u^2, the recursion for H’ becomes
$$H'(t+1) \approx H'(t) + \left(1 - H'(t)\right)(2u)$$
$$\Delta H' \approx \left(1 - H'(t)\right)(2u)$$
$$1 - H'(t+1) = 1 - H'(t) - \left(1 - H'(t)\right)(2u)$$
$$G'(t+1) = G'(t)(1 - 2u) \quad \text{(geometric progression)}$$
$$G'(t) = G'(0)(1 - 2u)^{t}$$
The conclusion is that the homozygosity decreases
at an exponential rate that depends
on the mutation rate. Since we are talking
about discrete organisms, G(t) will eventually
become 0.
Half life
$$G'(t) = \frac{G'(0)}{2}$$
The half-life is the t that solves the above equation. We denote this
t by the symbol $t_{1/2}$.
$$G'(0)(1 - 2u)^{t_{1/2}} = \frac{G'(0)}{2}$$
$$t_{1/2} \ln(1 - 2u) = -\ln(2)$$
$$t_{1/2} = \frac{-\ln(2)}{\ln(1 - 2u)} \approx \frac{-\ln(2)}{-2u} = \frac{\ln(2)}{2u} = \frac{1}{u} \cdot \frac{\ln(2)}{2}$$
The time scale of mutation is proportional to 1/u. If u=10^-5, the time
scale 1/u is 100,000 generations; more precisely, it takes about
ln(2)/(2u) ≈ 35,000 generations for mutation to reduce the homozygosity by a
factor of 2.
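A short sketch of the mutation half-life, comparing the exact solution of (1-2u)^t = 1/2 with the approximation ln(2)/(2u). The function names are assumptions for this example. With u = 10^-5 both give roughly 35,000 generations, which is of the same order of magnitude as 1/u.

```python
import math

def mutation_half_life_exact(u):
    """Solve (1 - 2u)^t = 1/2 for t."""
    return -math.log(2) / math.log1p(-2 * u)

def mutation_half_life_approx(u):
    """Small-u approximation: t = ln(2) / (2u)."""
    return math.log(2) / (2 * u)

u = 1e-5
print("exact:      ", mutation_half_life_exact(u))
print("approximate:", mutation_half_life_approx(u))
print("1/u:        ", 1 / u)
```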
Modeling genetic drift + mutation
If genetic drift removes variation, why
does genetic variation exist?
Mutations introduce new variation into the
population.
What is the relationship between drift and
mutation?
A model for mutation.
$$G'(t+1) = (1-u)^2 \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G'(t)\right]$$
G’ is the probability of getting two identical
alleles when drawing without replacement.
After one generation, the computation is as
before, but each allele has a chance u of
mutating, and any mutation makes the two alleles
different. Hence the drift expression is multiplied
by (1-u)^2, the probability that neither of the two
alleles mutated.
We are interested in the equilibrium
between mutation and drift.
When t approaches infinity, G’(t) approaches
a constant between zero and 1. We want to
compute this constant: the probability that
two alleles different by origin are identical
by state after equilibrium is reached.
Let $G = \lim_{t \to \infty} G'(t) = \lim_{t \to \infty} G'(t+1)$. Taking the limit of both sides of the recursion gives
$$G = (1-u)^2 \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G\right]$$
$$G = \frac{(1-u)^2}{2N} + \left(1 - \frac{1}{2N}\right)(1-u)^2 G$$
$$G\left[1 - \left(1 - \frac{1}{2N}\right)(1-u)^2\right] = \frac{(1-u)^2}{2N}$$
$$G = \frac{\frac{(1-u)^2}{2N}}{1 - \left(1 - \frac{1}{2N}\right)(1-u)^2}$$
Simplifying G
$$G = \frac{\frac{(1-u)^2}{2N}}{1 - \left(1 - \frac{1}{2N}\right)(1-u)^2} = \frac{\frac{(1-u)^2}{2N}}{1 - \frac{2N-1}{2N}(1-u)^2}$$
$$= \frac{(1-u)^2}{2N - (2N-1)(1-u)^2} = \frac{1 - 2u + u^2}{2N - (2N-1)(1 - 2u + u^2)}$$
$$= \frac{1 - 2u + u^2}{2N - 2N(1 - 2u + u^2) + (1 - 2u + u^2)}$$
$$= \frac{1 - 2u + u^2}{4Nu - 2Nu^2 + 1 - 2u + u^2}$$
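An approximate solution for G
$$G = \frac{1 - 2u + u^2}{4Nu - 2Nu^2 + 1 - 2u + u^2}$$
Assumptions:
•N is much bigger than u
•u is much smaller than 1
•u^2 N is also very small
Dropping the terms 2u, u^2 and 2Nu^2, which are negligible under these assumptions, gives
$$G \approx \frac{1}{1 + 4Nu}$$
This is a classic formula:
$$G' \approx \frac{1}{1 + 4Nu}$$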
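To see how good the approximation is, the recursion for G’ under drift plus mutation can be iterated until it settles, then compared with the exact equilibrium expression and with 1/(1+4Nu). This is a sketch under assumed parameter values (N = 1000, u = 10^-4); the function names are chosen for this example.

```python
def equilibrium_exact(n_diploid, u):
    """Exact fixed point of G' = (1-u)^2 [1/2N + (1 - 1/2N) G']."""
    two_n = 2 * n_diploid
    a = (1 - u) ** 2
    return (a / two_n) / (1 - (1 - 1 / two_n) * a)

def equilibrium_approx(n_diploid, u):
    """Approximation 1 / (1 + 4Nu)."""
    return 1 / (1 + 4 * n_diploid * u)

def iterate_recursion(n_diploid, u, g0=1.0, generations=100_000):
    """Apply the drift + mutation recursion for many generations."""
    two_n = 2 * n_diploid
    g = g0
    for _ in range(generations):
        g = (1 - u) ** 2 * (1 / two_n + (1 - 1 / two_n) * g)
    return g

N, u = 1000, 1e-4
print("iterated:   ", iterate_recursion(N, u))
print("exact:      ", equilibrium_exact(N, u))
print("approximate:", equilibrium_approx(N, u))
```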
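Intermediate summary
Drift only:
$$G'(t+1) = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G'(t)$$
$$G' \to 1, \qquad t_{1/2} \approx 2N \ln 2$$
Mutation only:
$$H'(t+1) = H'(t) + \left(1 - H'(t)\right)(2u - u^2)$$
$$G' \to 0, \qquad t_{1/2} \approx \frac{\ln 2}{2u}$$
Both:
$$G'(t+1) = (1-u)^2 \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G'(t)\right]$$
$$G' \to \frac{1}{1 + 4Nu}$$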
Computing delta H
Drift only:
$$G'(t+1) = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G'(t)$$
$$H'(t+1) = 1 - \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G'(t)\right]$$
$$H'(t+1) = \left(1 - \frac{1}{2N}\right)\left(1 - G'(t)\right)$$
$$H'(t+1) = \left(1 - \frac{1}{2N}\right) H'(t)$$
$$H'(t+1) - H'(t) = -\frac{H'(t)}{2N}$$
$$\Delta H' = -\frac{H'(t)}{2N}$$
Mutation only:
$$H'(t+1) = H'(t) + \left(1 - H'(t)\right)(2u - u^2)$$
$$\Delta H' = \left(1 - H'(t)\right)(2u - u^2) \approx \left[1 - H'(t)\right] 2u$$
Mutation + Drift:
$$G'(t+1) = (1-u)^2 \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G'(t)\right]$$
Substituting G’ = 1 - H’:
$$H'(t+1) = 1 - \frac{(1-u)^2}{2N} - \left(1 - \frac{1}{2N}\right)(1-u)^2 \left(1 - H'(t)\right)$$
$$H'(t+1) = 1 - \frac{(1-u)^2}{2N} - \left(1 - \frac{1}{2N}\right)(1-u)^2 + \left(1 - \frac{1}{2N}\right)(1-u)^2 H'(t)$$
$$H'(t+1) = 1 - \frac{(1-u)^2}{2N} - (1-u)^2 + \frac{(1-u)^2}{2N} + \left(1 - \frac{1}{2N}\right)(1-u)^2 H'(t)$$
$$H'(t+1) = 2u - u^2 + \left(1 - \frac{1}{2N}\right)(1-u)^2 H'(t)$$
$$H'(t+1) = 2u - u^2 + (1-u)^2 H'(t) - \frac{1}{2N}(1-u)^2 H'(t)$$
$$H'(t+1) = 2u - u^2 + H'(t) - (2u - u^2) H'(t) - \frac{1}{2N}(1-u)^2 H'(t)$$
$$\Delta H' = (2u - u^2) - (2u - u^2) H'(t) - \frac{1}{2N}(1-u)^2 H'(t)$$
$$\Delta H' = (2u - u^2)\left(1 - H'(t)\right) - \frac{1}{2N}(1-u)^2 H'(t)$$
Neglecting the terms of order u^2 and u/2N:
$$\Delta H' \approx 2u\left(1 - H'(t)\right) - \frac{H'(t)}{2N}$$
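Summary with delta H’
Drift only:
$$G'(t+1) = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G'(t)$$
$$\Delta H' = -\frac{H'(t)}{2N}, \qquad G' \to 1, \qquad t_{1/2} \approx 2N \ln 2$$
Mutation only:
$$H'(t+1) = H'(t) + \left(1 - H'(t)\right)(2u - u^2)$$
$$\Delta H' \approx 2u\left(1 - H'(t)\right), \qquad G' \to 0, \qquad t_{1/2} \approx \frac{\ln 2}{2u}$$
Both:
$$G'(t+1) = (1-u)^2 \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G'(t)\right]$$
$$\Delta H' \approx -\frac{H'(t)}{2N} + 2u\left(1 - H'(t)\right), \qquad G' \to \frac{1}{1 + 4Nu}$$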
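The combined expression for ΔH’ can be iterated numerically to see where it settles. Setting ΔH’ = 0 in that expression gives H’ = 4Nu/(1+4Nu), which is just 1 minus the equilibrium G’ = 1/(1+4Nu) listed for the combined model. The sketch below assumes N = 100 and u = 0.001; the function names are chosen for this example.

```python
def delta_h(h, n_diploid, u):
    """Approximate one-generation change in H' under drift plus mutation."""
    drift_term = -h / (2 * n_diploid)   # drift removes heterozygosity
    mutation_term = 2 * u * (1 - h)     # mutation adds heterozygosity
    return drift_term + mutation_term

def equilibrium_heterozygosity(n_diploid, u):
    """Value of H' at which the drift and mutation terms cancel."""
    return 4 * n_diploid * u / (1 + 4 * n_diploid * u)

N, u = 100, 0.001
h = 0.0                                 # start from a population with no variation
for _ in range(5000):
    h += delta_h(h, N, u)

print("iterated H':   ", h)
print("4Nu / (1+4Nu): ", equilibrium_heterozygosity(N, u))
```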
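Delta H for the model of drift+mutation
$$\Delta H' = H'(t+1) - H'(t) \approx -\frac{H'(t)}{2N} + 2u\left(1 - H'(t)\right)$$
The first term, $\Delta_N = -H'(t)/2N$, is the drift term and is always negative.
The second term, $\Delta_u = 2u(1 - H'(t))$, is the mutation term and is always positive.
$$\Delta H' = \Delta_N + \Delta_u$$
Equilibrium is reached when
$$\Delta H' = 0, \quad \text{i.e.} \quad -\Delta_N = \Delta_u$$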