Evolutionary rates
Reference: Dan’s book chapter 4
Evolutionary rates - history
•The first to suggest using DNA and proteins to
investigate evolutionary history.
(They discussed molecular evolution before the
genetic code was established).
Linus Pauling (1901-1994)
•The only person ever to receive two
unshared Nobel Prizes—for
Chemistry (1954) and for Peace
(1962).
•His introductory textbook General
Chemistry, revised three times since
its first printing in 1947 and translated
into 13 languages, has been used by
generations of undergraduates.
Linus Pauling (1901-1994)
•Also wrote popular science books,
e.g., “How to Live Longer and Feel
Better”, and “Vitamin C and the
Common Cold”.
•Published over 1,000 articles and
books.
•Used to protest against nuclear
testing.
Linus Pauling (1901-1994)
•He received a Ph.D. in chemistry and
mathematical physics from California
Institute of Technology (Caltech) in
1925 (age 24).
Evolutionary rates
Rate is distance divided by time. Distance is number of
substitutions per site. Time is in years. The time must be
doubled, because the sequences evolved independently.
$r = \frac{d}{2T}$
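A minimal sketch of the rate formula above. The function name and the input values (0.16 substitutions per site, a split 80 million years ago) are hypothetical and chosen only to illustrate the arithmetic.

```python
def substitution_rate(d: float, t_years: float) -> float:
    """Return r = d / (2T), in substitutions per site per year.

    d is the distance between two sequences (substitutions per site) and
    t_years is the time since they diverged. The time is doubled because
    both lineages have been accumulating substitutions independently.
    """
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    return d / (2.0 * t_years)


# Hypothetical inputs, not taken from the slides.
rate = substitution_rate(0.16, 80e6)
print(f"{rate:.1e} substitutions/site/year")
```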
Evolutionary rates
This formula is not accurate for closely related taxa, in
which polymorphism must be taken into account
(Takahata and Satta 1997).
$r = \frac{d}{2T}$
Mean Rate of Nucleotide Substitutions in
Mammalian Genomes
~$10^{-9}$ substitutions/site/year
Evolution is a very slow process at the molecular
level (“Nothing happens…”)
Sequence alignments
Alignment is needed for phylogeny
and for molecular evolution. We will
assume that the alignment is given.
How to construct alignment is outside
the scope of this course.
Synonymous vs. nonsynonymous substitutions
For most proteins, it is observed that the
rate of synonymous substitutions (silent
substitutions) is much larger than the
nonsynonymous rate (amino-acid
modifying substitutions).
UUU -> UUC (both encode phenylalanine ): synonymous
UUU -> CUU (phenylalanine to leucine): nonsynonymous
A lot
A little
Synonymous vs. nonsynonymous substitutions
Synonymous vs. nonsynonymous substitutions
Empirical findings:
Important proteins evolve
slower than unimportant ones.
Insulin
Insulin
1953, Frederick Sanger
determines the amino-acid
sequence of insulin.
This is the FIRST protein
whose amino-acid sequence
was determined.
It demonstrated that insulin
is comprised of only
L-amino acids.
Insulin
Insulin was characterized to be composed of two
chains (A&B), linked together by S-S bonds.
21 AA
30 AA
Insulin
How is the two-chain protein synthesized?
Donald Steiner (University of Chicago) gave
the answer.
He studied an islet-cell adenoma of the
pancreas, a rare human tumor producing large
amounts of insulin.
Adenoma
Adenoma is a benign tumor (not a malignant
tumor). Benign in English = harmless
Benign tumor: A tumor that does not recur
locally and does not spread to other parts of the
body.
Adenoma is from a glandular (i.e., from a gland)
origin.
Adenomas can grow from many organs including
the colon, adrenal, pituitary, thyroid.
Insulin
He sliced the pancreatic tumor and incubated
it with tritiated leucine and then analyzed it.
He found a new protein that was later proven
to be the biosynthetic precursor of insulin, the
proinsulin.
Insulin
Proinsulin has 30 residues that are absent
from insulin.
Insulin
There is an even earlier precursor of proinsulin,
called preproinsulin. It contains an additional 19
AA at the N-terminus. This 19-AA
hydrophobic stretch directs the preproinsulin
to the ER.
Preproinsulin -> Proinsulin (ER membrane)
From the ER it moves on to the Golgi and
then to secretory granules.
Proinsulin -> Insulin (Granules)
Alignment of preproinsulin

Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL
         **** : * *.*: *:..* :. *:****

Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG
         ***************:***** ** :*::*

Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV
         .**.     ** *       *    *****

Xenopus  EQCCHSTCSLFQLENYCN
Bos      EQCCASVCSLYQLENYCN
         **** *.***:*******
Empirical findings:
Functional regions evolve
slower than nonfunctional
regions.
Rates of amino-acid replacements in different proteins

Protein            Rate (replacements per site per 10^9 years)
Fibrinopeptide     8.3
Insulin C-peptide  2.4
Ribonuclease       2.1
Hemoglobin         1.0
Cytochrome c       0.3
Histone H4         0.01
Clotting – The end reaction
fibrinogen -> fibrin (catalyzed by thrombin)
Synonymous vs. nonsynonymous substitutions
Histone H4 between human and wheat: excess of
synonymous substitutions
Mean nonsynonymous rate
0.74 ± 0.67 (× 10^-9 substitutions per site per year)
Mean synonymous rate
3.51 ± 1.01 (× 10^-9 substitutions per site per year)
The coefficient of variation is an attribute of a distribution:
its standard deviation divided by its mean
Coefficient of variation of nonsynonymous rate
91%
Coefficient of variation of synonymous rate
29%
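The two percentages can be checked with a few lines of arithmetic. This sketch assumes that the ± values quoted for the mean rates are standard deviations, which is what the reported coefficients imply; the function name is illustrative.

```python
def coefficient_of_variation(mean: float, sd: float) -> float:
    """Standard deviation divided by the mean."""
    return sd / mean


# Mean and (assumed) standard deviation, in 10^-9 substitutions/site/year.
cv_nonsyn = coefficient_of_variation(0.74, 0.67)
cv_syn = coefficient_of_variation(3.51, 1.01)

print(f"nonsynonymous: {cv_nonsyn:.0%}")  # 91%
print(f"synonymous:    {cv_syn:.0%}")     # 29%
```

The nonsynonymous rate varies far more between genes than the synonymous rate does, relative to its mean.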
Transition vs. transversion rates

Degeneracy class   4     2     0
Ratio              1.5   4.4   1.1
Computing synonymous and nonsynonymous rates
Silent and non-silent…
Computing synonymous and nonsynonymous rates
Ka/Ks
Our goal is to be able to compare two (or later,
more) sequences and to compare the rate of
neutral evolution (determined by the
synonymous rate) with the nonsynonymous rate.
The lower the ratio of nonsynonymous
substitutions to synonymous ones, the higher
the intensity of the purifying selection.
Computing synonymous and nonsynonymous rates
p-distance of synonymous subs. = 3/6
p-distance of nonsynonymous subs. = 3/6
Problematic: p-distance does not correct for multiple
substitutions…
Solution: compute the JC correction to the p-distance.
Computing synonymous and nonsynonymous rates
Assume a protein without selection (evolving
neutrally).
AAA (Lys) and its nine single-nucleotide neighbors:
First position: CAA (Gln), GAA (Glu), TAA (Stop)
Second position: ACA (Thr), AGA (Arg), ATA (Ile)
Third position: AAC (Asn), AAG (Lys), AAT (Asn)
The random chance of a synonymous substitution is much
smaller than the chance of a nonsynonymous one.
Computing synonymous and nonsynonymous rates
Assume a protein without selection (evolving
neutrally).
GCA (Ala) and its nine single-nucleotide neighbors:
First position: ACA (Thr), CCA (Pro), TCA (Ser)
Second position: GAA (Glu), GGA (Gly), GTA (Val)
Third position: GCC (Ala), GCG (Ala), GCT (Ala)
This is also different for different codons.
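The two codon examples can be reproduced by enumerating every single-nucleotide change and checking whether the amino acid changes. This is a minimal sketch that assumes the standard genetic code, written in DNA letters; the helper names are illustrative.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def neighbors(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def synonymous_count(codon):
    """Number of single-nucleotide changes that keep the same amino acid."""
    aa = CODON_TABLE[codon]
    return sum(CODON_TABLE[n] == aa for n in neighbors(codon))


for codon in ("AAA", "GCA"):
    print(codon, f"{synonymous_count(codon)}/9 changes are synonymous")
```

For AAA only one of the nine changes is silent, for GCA three are, so the raw counts of observed substitutions cannot be compared without normalization.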
Computing synonymous and nonsynonymous rates
So when one “observes” six times more nonsynonymous
substitutions than synonymous ones, does it indicate that
the protein is under purifying selection?
We must normalize for the potential for silent vs. non-silent mutations in the codons in question.
Nei & Gojobori (1986)
method
Masatoshi Nei
Takashi Gojobori
Counting synonymous sites
Consider a particular position in a
codon (j=1,2,3). Let fj be the fraction
of synonymous changes at this site.
In TTT (Phe), the first two positions are
nonsynonymous, because no synonymous changes
can occur in them, and the third position is 1/3
synonymous and 2/3 nonsynonymous because one
of the three possible changes is synonymous.
Counting synonymous sites
Let s be the number of synonymous sites for
each codon. s is in fact, the proportion, out of 3,
of synonymous substitutions, assuming equal
probability for each type of substitution.
$s = \sum_{j=1}^{3} f_j$
For this example,
s = 1/3.
Counting synonymous sites
Let n be the number of non-synonymous sites
for each codon. n is in fact, the proportion, out
of 3, of non-synonymous substitutions,
assuming equal probability for each type of
substitution.
$n = 3 - s$
For this example,
n = 2+2/3.
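The per-codon counts can be sketched as follows. The code assumes the standard genetic code and, as a simplification, counts a change to a stop codon as nonsynonymous; the published method treats stop codons specially, which this sketch does not attempt. Function names are illustrative.

```python
from fractions import Fraction
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def site_counts(codon):
    """Return (s, n) for one codon as exact fractions, with n = 3 - s."""
    aa = CODON_TABLE[codon]
    s = Fraction(0)
    for pos in range(3):
        # f_j: fraction of the three possible changes at this position
        # that leave the encoded amino acid unchanged.
        syn = sum(
            CODON_TABLE[codon[:pos] + b + codon[pos + 1:]] == aa
            for b in BASES
            if b != codon[pos]
        )
        s += Fraction(syn, 3)
    return s, 3 - s


s, n = site_counts("TTT")
print(s, n)  # 1/3 8/3
```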
Counting synonymous sites
Assume we have r codons (3r sites). Let si and
ni be s and n for the i’th codon. We define:
$S = \sum_{i=1}^{r} s_i$
$N = \sum_{i=1}^{r} n_i$
$S + N = 3r$
Classification of sites
S is in fact, the proportion, out of 3r, of
synonymous substitutions, assuming equal
probability for each type of substitution.
Classification of sites
We have two sequences
ACG CCG ATT
ATG CCT CTA
S for these two sequences is the average of the S values
of the two sequences. The same goes for N.
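As a worked example, the averaging can be applied to the two sequences on this slide. The sketch assumes the standard genetic code and counts changes to stop codons as nonsynonymous (a simplification); function names are illustrative.

```python
from fractions import Fraction
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """s for one codon: sum over positions of the synonymous fraction f_j."""
    aa = CODON_TABLE[codon]
    s = Fraction(0)
    for pos in range(3):
        syn = sum(
            CODON_TABLE[codon[:pos] + b + codon[pos + 1:]] == aa
            for b in BASES
            if b != codon[pos]
        )
        s += Fraction(syn, 3)
    return s


def total_sites(seq):
    """Return (S, N) for a coding sequence, where S + N = 3r."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    S = sum((synonymous_sites(c) for c in codons), Fraction(0))
    return S, 3 * len(codons) - S


S1, N1 = total_sites("ACGCCGATT")  # ACG CCG ATT
S2, N2 = total_sites("ATGCCTCTA")  # ATG CCT CTA
S = (S1 + S2) / 2
N = (N1 + N2) / 2
print(S1, S2, S, N)  # 8/3 7/3 5/2 13/2
```

Under these assumptions the pair has S = 2.5 synonymous sites and N = 6.5 nonsynonymous sites, which sum to the 9 sites of three codons.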
Counting synonymous substitutions
So far we have counted the potential for
synonymous and nonsynonymous substitutions.
Now we wish to count the actual number of
synonymous and nonsynonymous substitutions.
Counting synonymous substitutions
For two codons that differ by only one nucleotide,
the difference is easily inferred.
GTC (Val) -> GTT (Val): synonymous
GTC (Val) -> GCC (Ala): nonsynonymous
Counting synonymous substitutions
We define sd and nd to be the number of
synonymous and nonsynonymous substitutions per
codon.
GTC (Val) -> GTT (Val): sd = 1, nd = 0
GTC (Val) -> GCC (Ala): sd = 0, nd = 1
Counting synonymous substitutions
For two codons that differ by two or more
nucleotides, the estimation problem is more
complicated, because we need to determine the
order in which the substitutions occurred.
Pathway (1) requires one synonymous and one
nonsynonymous substitution, whereas pathway
(2) requires two nonsynonymous substitutions.
If there are 3 differences between two codons,
there are 6 possible paths.
ABC -> XYZ
A changed first, B second and finally C.
A changed first, C second and finally B.
B changed first, A second and finally C.
B changed first, C second and finally A.
C changed first, A second and finally B.
C changed first, B second and finally A.
There are two approaches to deal with multiple
substitutions at a codon:
The unweighted method: Average the numbers of the different
types of substitutions for all the possible scenarios. For example, if
we assume that the two pathways are equally likely, then the number
of nonsynonymous substitutions is (1 + 2)/2 = 1.5, and the number
of synonymous substitutions is (1 + 0)/2 = 0.5.
The weighted method: Employ an a priori criterion to assign a
probability to each pathway. For instance, if the weight of pathway 1
is 0.9, and the weight of pathway 2 is 0.1, then the number of
nonsynonymous substitutions between the two codons is (0.9 × 1) +
(0.1 × 2) = 1.1, and the number of synonymous substitutions is
(0.9 × 1) + (0.1 × 0) = 0.9.
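The unweighted method can be sketched by walking every ordering of the differing positions and averaging the counts. The codon pair CCC/CAA is an illustrative assumption chosen because it has one pathway with a synonymous step and one without; the slides do not say which pair their figure used. The sketch assumes the standard genetic code and does not exclude pathways that pass through a stop codon.

```python
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_substitutions(c1, c2):
    """Return (sd, nd) between two codons, averaged over all pathways."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    totals = []
    for order in permutations(diff):  # one pathway per ordering
        cur, syn, non = c1, 0, 0
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODON_TABLE[nxt] == CODON_TABLE[cur]:
                syn += 1
            else:
                non += 1
            cur = nxt
        totals.append((syn, non))
    k = len(totals)
    return sum(s for s, _ in totals) / k, sum(n for _, n in totals) / k


# CCC (Pro) -> CCA (Pro) -> CAA (Gln): 1 synonymous + 1 nonsynonymous
# CCC (Pro) -> CAC (His) -> CAA (Gln): 2 nonsynonymous
print(count_substitutions("CCC", "CAA"))  # (0.5, 1.5)
```

A weighted variant would replace the plain average with a weighted sum over the same pathways.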
Counting synonymous substitutions
Assume we have r codons (3r sites). Let sd_i
and nd_i be sd and nd for the i’th codon. We
define:
$S_d = \sum_{i=1}^{r} sd_i$
$N_d = \sum_{i=1}^{r} nd_i$
$S_d + N_d$ = total number of “observed” substitutions
Counting synonymous substitutions per
synonymous sites
We define p-distances for each type of
substitution:
$p_s = \frac{S_d}{S}$
$p_n = \frac{N_d}{N}$
These distances are then corrected using the JC
formula:
$d_s = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p_s\right)$
$d_n = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p_n\right)$
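A minimal sketch of the JC correction applied to a p-distance. The function name is illustrative, and the input 0.5 matches the 3/6 p-distance from the earlier counting example.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a p-distance for multiple substitutions: -3/4 * ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        # The logarithm is undefined once p reaches 3/4 (saturation).
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


print(round(jukes_cantor(0.5), 3))  # 0.824
```

The corrected distance is always at least as large as the p-distance, and the gap widens as p approaches 0.75.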
Three types of selection
If dn < ds -> purifying selection
If dn = ds -> neutral evolution
If dn > ds -> positive selection
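The three cases translate directly into a comparison. In this sketch the function name and the tolerance `tol` are hypothetical, and the example call uses the mean nonsynonymous and synonymous rates quoted earlier (0.74 and 3.51) as stand-ins for dn and ds.

```python
def classify_selection(dn: float, ds: float, tol: float = 1e-9) -> str:
    """Classify the mode of selection from dn and ds."""
    if ds <= 0:
        raise ValueError("ds must be positive")
    if abs(dn - ds) <= tol:
        return "neutral evolution"
    return "purifying selection" if dn < ds else "positive selection"


print(classify_selection(0.74, 3.51))  # purifying selection
```

In practice dn and ds are estimates with sampling error, so a real analysis would test whether the difference is statistically significant rather than compare the point estimates alone.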
Humans are not
so special?
Generation time and genomic evolution in
primates
Vincent M. Sarich & Allan C. Wilson
Science vol 179: 1144-1147 (1973).
A primate
Some background on Primates
[Figure: primate phylogeny. Labels: Prosimians (Strepsirhines), Tarsiers, Haplorhines (higher primates), New World monkeys (Platyrrhines), Catarrhines, Old World monkeys, Gibbons, Hominidae]
http://www.whozoo.org/mammals/Primates/primatephylogeny.htm
Some background on Primates
•Primates: 233 species
and 13 families
•The smallest living
primate is the pygmy
marmoset (NW
monkey), which
weighs around 70 g;
the largest is the
gorilla, weighing up to
around 175 kg.
http://animaldiversity.ummz.umich.edu/site/accounts/information/Primates.html
Some background on Primates
•Most primate species live in the tropics or
subtropics, although a few, most notably humans,
also inhabit temperate regions.
•Except for a few terrestrial species, primates are
arboreal. Some species eat leaves or fruit; others
are insectivorous or carnivorous.
Arbor = tree in Latin
Prosimians
Great apes
Hominidae is the primate family, which
includes the extant species of humans,
chimpanzees, gorillas, and orangutans, as well
as many extinct species.
The members of the family are called
hominids. The family is also called “great
apes”.
Great apes
Originally non-human great apes were called
Pongidae. However, this original definition
suggests that Pongidae is a monophyletic
family – which is not the case.
Many studies have shown a correlation
between time of divergence and amount of
evolutionary (molecular) distance:
Protein sequences of species that diverged
earlier show more differences.
[Figure: p-distance plotted against time]
There is considerable disagreement over whether time
should be measured in terms of astronomical time (i.e.,
years) or generation length.
The generation-time-hypothesis:
The number of substitutions is proportional
to the number of generations.
[Figure: A (human) and B (tree shrew) diverging from their common ancestor O]
Prediction:
Short generation -> More generations since
divergence -> More substitutions (in B)
Absolute rates of evolution demand
knowledge of divergence dates (from the
fossil record).
However, relative rates of evolution can be
computed from the phylogeny…
This will be done using the “relative rate
method”.
Assume 3 taxa, A, B and C.
[Figure: tree in which A (human) and B (tree shrew) join at node O, with C as the outgroup; times labeled T1 and T2]
Assume 3 taxa, A, B and C.
O
A (human)
B (tree shrew)
C (outgroup)
The generation time hypothesis predicts:
BO > AO
BO+OC > AO+OC
BC > AC
In words, the distance of species with short
generation time from an outgroup, should be higher
compared to species with longer generation time.
They used (C) modern carnivore species as their
outgroup.
The authors compared immunological distances
between a few species and carnivore species.
The distance between Homo sapiens and each
one of 4 carnivore species was computed, and
they reported the average.
The 4 carnivore species are: Hyaena, Genetta,
Ursus, and Arctogalidia.
Hyaena, Genetta, Ursus, and Arctogalidia.
Genetta genetta (small-spotted genet)
Although catlike in appearance and
habit, the genet is not a cat but a
member of the family Viverridae.
Genets were kept as pets by the ancient
Egyptians as they are today by Berbers
in North Africa. From the Greek empire
to the Middle Ages, the genet was kept
as a rat catcher and was often portrayed
on tapestries of the period. The
domestic cat eventually replaced the
genet, probably because it is more
efficient in killing rats, and perhaps
because it is less smelly.
Results:
Immunological distances from carnivore species:
Species                            Distance
Homo sapiens                       162
Macaca mulatta (rhesus monkey)     166
Ateles geoffroyi (spider monkey)   149
Nycticebus coucang (slow loris)    125
Lemur fulvus (brown lemur)         135
Tarsius spectrum (tarsier)         137
Tupaia glis (tree shrew)           156
Nycticebus coucang (slow loris)
Range: India, Malaysia, Sumatra, Java, Borneo, Philippines.
Life span is 20 years (generation time < 20
years). Nocturnal and arboreal, they spend the
day sleeping in a tight ball up a tree.
These results are against the generation-time
hypothesis…
Prosimian
No correlation of distances with generation length
when Homo is compared with the prosimians.

Species                            Distance
Homo sapiens                       162
Macaca mulatta (rhesus monkey)     166
Ateles geoffroyi (spider monkey)   149
Nycticebus coucang (slow loris)    125
Lemur fulvus (brown lemur)         135
Tarsius spectrum (tarsier)         137
Tupaia glis (tree shrew)           156
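The relative-rate comparison can be checked directly against these distances. In this sketch the generation-time hypothesis predicts that every species with a shorter generation time than Homo should lie farther from the carnivore outgroup than Homo does; the variable names are illustrative and the numbers are the immunological distances listed above.

```python
# Immunological distance of each species from the carnivore outgroup.
distances = {
    "Homo sapiens": 162,
    "Macaca mulatta": 166,
    "Ateles geoffroyi": 149,
    "Nycticebus coucang": 125,
    "Lemur fulvus": 135,
    "Tarsius spectrum": 137,
    "Tupaia glis": 156,
}

human = distances["Homo sapiens"]

# d(B, C) - d(A, C), with A = Homo and C = the outgroup. The hypothesis
# predicts a positive difference for every shorter-generation species B.
diffs = {sp: d - human for sp, d in distances.items() if sp != "Homo sapiens"}
farther_than_human = [sp for sp, diff in diffs.items() if diff > 0]

for sp, diff in diffs.items():
    print(f"{sp:20s} {diff:+d}")
print("farther from outgroup than Homo:", farther_than_human)
```

Only the rhesus monkey is farther from the outgroup than Homo, and the tree shrew is 6 units closer, which is the opposite of what the hypothesis predicts.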
Common tree shrew - TUPAIA GLIS
Order: Climbing Mammals (Scandentia)
Family: Tupaiidae.
Common tree shrew - TUPAIA GLIS
This small order of
tree shrews was at one
time in the
midst of controversy:
is it a primate (order
Primates) or an
insectivore (order
Insectivora)?
For several years, different groups placed the tree shrews in
either one of these orders. Finally, in 1984 this issue was
resolved when they were placed in their own order, called
Scandentia. Some researchers still argue that they are the
most primitive form of the primates, however.
Tarsius spectrum
(tarsier)
Although data are not available on the lifespan of this
species, another member of the genus, T. syrichta, is
reported to have lived 13.5 years in captivity. Tarsius
spectrum is likely to have a similar maximum lifespan.
Results:
Immunological distances from carnivore species:

Species                            Distance
Homo sapiens                       162
Macaca mulatta (rhesus monkey)     166
Ateles geoffroyi (spider monkey)   149
Nycticebus coucang (slow loris)    125
Lemur fulvus (brown lemur)         135
Tarsius spectrum (tarsier)         137
Tupaia glis (tree shrew)           156

No correlation of distances with generation length.
Homo has the longest generation length, the tree shrew the shortest.
An evolutionary experiment
Spalax ehrenbergi
The structural protein composing the lens is called
α-crystallin.
It is composed of two subunits, αA and αB.
Each subunit is encoded by a single-copy gene located on a
different chromosome.
The two subunits have approximately 57%
sequence homology, probably reflecting ancient
gene duplication.
They also have low sequence similarity to heat-shock proteins (possible origin of family).
In Spalax, αA-crystallin lost its
functional role
more than 25
million years ago,
when the mole rat
became
subterranean and
presumably lost use
of its eyes.
The αA-crystallin of Spalax evolves 4
times faster than the αA-crystallins in
other rodents, such as rats, mice, hamsters,
gerbils and squirrels. Functional
relaxation.
The αA-crystallin of Spalax evolves 5
times slower than pseudogenes. It is still
functional.
The αA-crystallin of Spalax possesses all the
prerequisites for normal function and expression,
including the proper signals for alternative splicing.
The αA-crystallin of Spalax was shown to still be
present in the rudimentary lens of the mole rat.
Functional.
Explanation 1:
There is good evidence that the rudimentary eye,
though no longer able to detect light, is still of
vital importance for photoperiod perception, which
is required for the physiological adaptations of the
animal to seasonal changes.
Explanation 2:
The blind mole rat lost its vision more recently than
25 million years ago. The rate of nonsynonymous
substitutions after nonfunctionalization has been
underestimated.
Contradicting evidence:
The αA-crystallin gene is still an intact gene as far as
the essential molecular structures for its expression
are concerned.
Explanation 3:
The αA-crystallin gene product serves a function
unrelated to that of the eye.
Supporting evidence:
1. αA-crystallin has been found in other tissues.
2. αA-crystallin also functions as a chaperone that
binds denaturing proteins and prevents their
aggregation.
3. The regions within αA-crystallin responsible
for chaperone activity are conserved in the mole
rat.