CHAPTER 24 Molecular Evolution
Peter J. Russell
edited by Yue-Wen Wang, Ph.D.
Dept. of Agronomy, National Taiwan University (NTU)
Genetics 601 20000
1. Populations and genes change over evolutionary time. Molecular
evolution examines DNA and proteins, addressing two types of
questions:
a. How do DNA and protein molecules evolve?
b. How are genes and organisms evolutionarily related?
2. Population genetics focuses on changes between generations.
Molecular evolution considers the hundreds or thousands of generations
needed for speciation, where small departures from Hardy-Weinberg
equilibrium, random effects and slight differences in fitness can
become very significant.
3. Development of techniques in molecular biology makes it possible to
study molecular evolution, using genomes as historical records that can:
a. Reveal the dynamics of evolutionary processes.
b. Indicate the chronology of change.
c. Identify phylogenetic relationships between organisms.
Patterns and Modes of Substitutions
Substitutions in Protein and DNA Sequences
1. Patterns of variation within homologous genes show that some amino
acid substitutions are found more frequently than others. Substitutions
often involve amino acids with similar chemical characteristics,
supporting two evolutionary principles:
a. Mutations are rare events.
b. Most dramatic changes are removed by natural selection.
2. Chemically similar amino acids tend to have similar codons, so a
substitution of one for another may result from a single mutation.
a. Natural selection acting on this variation produces proteins optimized
for role and environment.
b. More substantial alterations of protein structure are likely to be
deleterious and so be removed from the gene pool.
Sequence Alignments
1. Sequence comparison begins with alignment using computer
algorithms based on the idea that the best alignments reflect true
ancestral relationships.
a. Matching nucleotides are interpreted as unchanged since a common
ancestor.
b. Substitutions, insertions and deletions can be identified.
c. Gaps inserted to maximize the similarity between aligned sequences
indicate the occurrence of insertions or deletions (indels).
2. Many alignments are possible between sequences, and algorithms
typically maximize the number of matching amino acids or nucleotides while
invoking the smallest possible number of indel events.
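The idea of maximizing matches while keeping indels few can be sketched with a global alignment by dynamic programming. The slides do not name an algorithm; the Needleman-Wunsch scheme below, and the scoring values (match = 1, mismatch = -1, gap = -1), are assumptions chosen only for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (aligned_a, aligned_b, score).

    Scoring values are illustrative assumptions, not taken from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of match/mismatch or a gap.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        step = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + step:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1   # gap in b
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1   # gap in a
    return "".join(reversed(top)), "".join(reversed(bottom)), score[-1][-1]


aligned_a, aligned_b, total = global_align("GATTACA", "GATACA")
print(aligned_a)
print(aligned_b)
print(total)
```

A gap character ("-") in either output row marks an inferred indel; raising the gap penalty makes the algorithm invoke fewer indels at the cost of more mismatches.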
Substitutions and the Jukes-Cantor Model
1. When DNA sequences diverge, they begin to collect mutations. The
number of substitutions (K) found in an alignment is widely used in
molecular evolution analysis.
a. If the alignment shows few substitutions, a simple count is used.
b. If many substitutions have occurred, it is likely that a simple count will
underestimate the substitution events, due to the probability of multiple
changes at the same site (Figure 24.1).
2. Jukes and Cantor (1969) assumed that each nucleotide is equally likely
to change into any other nucleotide, and created a mathematical model
to describe multiple base substitutions.
a. Rate of change to any of the other three nucleotides is designated α, so
the overall rate of substitution for any given nucleotide is 3α.
b. For example, if the beginning (t = 0) nucleotide was C, the probability
(P) of the site still being C at the first time point (t = 1) is $P_C(1) = 1 - 3\alpha$.
c. After more time has passed (t = 2), the probability $P_C(2)$ is calculated from
the equation: $P_C(2) = (1 - 3\alpha)P_C(1) + \alpha[1 - P_C(1)]$.
d. The probability of that site containing C at any given time in the future is
defined by the equation: $P_C(t) = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$.
3. As data became available a decade later, the observation that different
mutations occur at different rates (e.g., transitions are more common
than transversions) revealed oversimplifications in the Jukes-Cantor
model.
4. The model provided a framework to estimate the actual number of
substitutions (K) when multiple substitutions were possible.
a. $K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$
b. p is the fraction of nucleotides that are different in a simple count of
sequence mismatches.
i. When few mismatches are observed, p is small and the chance of
multiple mutations at a given site is also small.
ii. When many mismatches are counted, the actual number of
substitutions is calculated to be even larger than the direct count.
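The Jukes-Cantor relationships above can be checked numerically. The sketch below iterates the one-step recurrence, compares it with the exponential formula, and applies the correction for K. The function names and the example values of α and p are hypothetical, chosen only for illustration.

```python
import math


def prob_same_recurrence(alpha, steps):
    """Probability the site is still the starting base after `steps` steps."""
    p = 1.0  # at t = 0 the site is certainly the starting nucleotide
    for _ in range(steps):
        # Stay unchanged (1 - 3*alpha), or mutate back from another base (alpha).
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
    return p


def prob_same_closed_form(alpha, t):
    """Continuous-time form: 1/4 + (3/4) * exp(-4 * alpha * t)."""
    return 0.25 + 0.75 * math.exp(-4 * alpha * t)


def jukes_cantor_distance(p):
    """Estimated substitutions per site K from the observed mismatch fraction p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)


# For a small alpha the step-by-step recurrence and the exponential form agree closely.
print(prob_same_recurrence(1e-4, 1000), prob_same_closed_form(1e-4, 1000))

# The corrected K is always at least as large as the raw mismatch fraction p.
for p in (0.01, 0.10, 0.40):
    print(p, round(jukes_cantor_distance(p), 4))
```

The gap between p and K widens as p grows, which is the underestimation described in point 4.b.ii; as p approaches 3/4 the sequences are effectively saturated and K is no longer defined.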
Fig. 24.1 Two possible scenarios in which multiple substitutions at a single site would
lead to underestimation of the number of substitutions that had occurred if a
simple count was performed (t = time)
Peter J. Russell, iGenetics: Copyright © Pearson Education, Inc., publishing as Benjamin Cummings.
Rates of Nucleotide Substitutions
1. The number of substitutions in homologous sequences
since divergence is central to molecular evolution
analysis.
a. Number of substitutions per site (K) coupled with divergence
time (T) is converted to a rate (r) of substitution in the equation:
r = K/(2T).
b. Substitutions are assumed to accumulate simultaneously and
independently in both species.
2. Substitution rate comparison provides insight into the
mechanisms of molecular change and evolutionary
events.
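As a small worked example of r = K/(2T): the factor of 2 appears because substitutions accumulate along both lineages after the split, so the total elapsed evolutionary time is 2T. The values of K and T below are hypothetical.

```python
def substitution_rate(k, t):
    """Substitutions per site per year, given K (per site) and divergence time T (years)."""
    if t <= 0:
        raise ValueError("divergence time must be positive")
    # Both lineages have been evolving independently for t years each.
    return k / (2 * t)


# Hypothetical example: K = 0.1 substitutions per site, divergence 80 million years ago.
print(substitution_rate(0.1, 80e6))
```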
Variation in Evolutionary Rates Within Genes
1. Studies show that different regions of genes evolve at
different rates.
2. Distinctions are seen between and within coding and
noncoding regions. Examples of noncoding regions
include introns, leaders and trailers, nontranscribed
flanking regions and pseudogenes.
3. Even within the coding region, not all nucleotide
substitutions create changes in the gene product (e.g., a
substitution at the third position of a codon may produce a
synonymous codon).
Synonymous and Nonsynonymous Sites
1. Different gene regions evolve at different rates (Table 24.1).
2. Synonymous changes, which do not alter the amino acids in the protein,
are found five times more often than nonsynonymous changes.
a. Both types of change are equally likely to occur, but nonsynonymous
changes are usually detrimental to fitness, and are eliminated by natural
selection.
b. This creates a distinction between mutations and substitutions:
i. Mutations are changes in nucleotide sequences due to errors in
replication or repair.
ii. Substitutions are mutations that have passed through the filter of
selection.
c. Synonymous substitutions probably reflect the actual mutation rate in
the genome. Nonsynonymous substitution rates do not.
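The synonymous/nonsynonymous distinction can be made concrete by translating aligned codons with the standard genetic code and comparing the encoded amino acids. This is a minimal sketch: the function names and example sequences are hypothetical, and it assumes the two coding sequences are already aligned in frame without gaps.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(codon_a, codon_b):
    """Label a codon pair as 'identical', 'synonymous' or 'nonsynonymous'."""
    if codon_a == codon_b:
        return "identical"
    if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
        return "synonymous"
    return "nonsynonymous"


def count_changes(seq_a, seq_b):
    """Count (synonymous, nonsynonymous) codon differences between two sequences."""
    synonymous = nonsynonymous = 0
    usable = min(len(seq_a), len(seq_b)) // 3 * 3  # ignore any trailing partial codon
    for i in range(0, usable, 3):
        kind = classify_change(seq_a[i:i + 3], seq_b[i:i + 3])
        if kind == "synonymous":
            synonymous += 1
        elif kind == "nonsynonymous":
            nonsynonymous += 1
    return synonymous, nonsynonymous


print(classify_change("CTT", "CTC"))            # both codons encode leucine
print(classify_change("GAA", "GAT"))            # glutamate versus aspartate
print(count_changes("CTTGAAATG", "CTCGATATG"))
```

Real analyses normalize these counts by the number of synonymous and nonsynonymous sites and correct for multiple hits; the sketch only shows the classification step.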
Flanking Regions
1. Changes in 3’ flanking regions have no known effect on amino acid
sequence, and little effect on gene expression, so most are tolerated by
natural selection.
2. Introns have rates of change higher than exons, but not as high as 3’
flanking regions, due to their need to retain:
a. Sequences required at splice junctions and branch points.
b. In some cases, alternative ORFs used by alternative splicing that takes
place in some tissues but not others.
3. The 5’ flanking regions have lower rates of change than 3’ regions, due
to the presence of promoters and other gene regulatory elements. Small
changes in these sequences may have a large effect on protein
production, and so be subject to natural selection.
4. Leader and trailer regions have lower rates than the 5’ flanking region,
because they contain signals for processing and translation of mRNA.
5. Nonsynonymous sites in coding sequences have the lowest rate of change,
because most protein-coding sequences produce products optimized for
their role and environment. Most substitutions are eliminated by natural
selection.
Pseudogenes
1. Non-functional pseudogenes have the highest evolution rate seen. Because
they no longer encode proteins, changes in them do not affect fitness and
are not eliminated by natural selection.
2. Between mice and humans, for example, pseudogenes show about five
times as many changes as regions that encode proteins or regulate gene
expression.
3. Natural selection evaluates the consequences of an enormous number of
changes, on an evolutionary time scale.
Codon Usage Bias
1. Codon bias is an example of the effect of small changes over many generations.
a. The slightly lower rate of evolutionary change at synonymous sites compared with
pseudogenes suggests that some triplet codons are favored over others.
b. Sequence data show that synonymous codons are not used equally throughout the
coding sequences of an organism. Leucine codons are an example:
i. There are six codons that specify leucine (UUA, UUG, CUU, CUC, CUA and
CUG).
ii. 60% of the leucine codons in a bacterium are CUG.
iii. In yeast, 80% of the leucine codons are UUG.
2. Selection appears to favor some codons over others. Proposed reasons why one
codon would be more successful include:
a. Synonymous codons may be recognized by different tRNAs, and some may be favored
because their cognate tRNA is more abundant or efficient.
b. Bonding energy between the tRNA and codon may differ, due to differences in base
pairs.
3. Selective pressure acting on translation efficiency and/or bonding energy appears
to be especially significant in:
a. Genes that are expressed at high levels.
b. Organisms with short generation times and large
populations (e.g., bacteria, yeast and
fruit flies).
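Codon usage of the kind described for leucine can be measured by tallying codons in a coding sequence. The sketch below reports the fraction of leucine codons contributed by each of the six synonymous codons; the function name and the short mRNA string are hypothetical, and the sequence is assumed to be in frame.

```python
from collections import Counter

# The six leucine codons listed above, written as mRNA.
LEUCINE_CODONS = ("UUA", "UUG", "CUU", "CUC", "CUA", "CUG")


def leucine_codon_usage(mrna):
    """Return the fraction of leucine codons contributed by each synonymous codon."""
    usable = len(mrna) - len(mrna) % 3  # drop any trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codon for codon in codons if codon in LEUCINE_CODONS)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: counts[codon] / total for codon in LEUCINE_CODONS}


# Hypothetical in-frame fragment: three CUG codons, one UUA codon, one AUG codon.
print(leucine_codon_usage("CUGCUGCUGUUAAUG"))
```

Applied to all coding sequences of a genome, equal usage would give each codon about one sixth of the total; strong departures from that are the bias described in point 1.b.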
Variation in Evolutionary Rates Between Genes
1. Striking differences also occur in the rate of gene evolution within a
species. The difference results from one or both of these factors:
a. Differences in substitution frequency.
b. The action of natural selection on a locus.
2. Distinguishing between adaptive and random changes in nucleotide
sequences requires statistical analysis.
a. An example is the McDonald-Kreitman test (1991) comparing within-species
polymorphism with between-species divergence at synonymous and
nonsynonymous sites in a gene.
i. If the ratio of nonsynonymous to synonymous substitutions in a species is
the same as between species, the substitutions are likely to be neutral.
ii. If the ratios are not the same, natural selection must be responsible.
b. In this analysis applied to mammals:
i. Synonymous substitution rates usually differ by a factor of less than two.
ii. Nonsynonymous substitutions show about 1,000-fold difference between
different classes of genes (Table 24.2).
c. Variations in substitution rates between genes must also be largely due to
differences in the intensity of natural selection at each locus.
d. An example is two classes of genes, histones and apolipoproteins, which
have different levels of functional constraint.
i. Histones are essential DNA binding proteins in eukaryotes, and most
substitutions decrease their ability to bind DNA. Histones are thus very
slow to evolve, and are highly conserved across species.
ii. Apolipoproteins nonspecifically bind lipids in their hydrophobic
domains and carry them in the blood. The hydrophobic domains
work with any similar hydrophobic amino acids, so many substitutions
are tolerated.
3. Amino acid substitutions are generally deleterious, but sometimes
natural selection favors variability. Genes of the mammalian major
histocompatibility complex (MHC) are an example.
a. The MHC genes are under pressure to diversify, and show a greater rate of
nonsynonymous substitutions than synonymous ones.
b. MHC is a large multigene family involved with the immune system’s
ability to recognize foreign antigens.
i. About 90% of humans receive different sets of MHC genes from each
parent.
ii. A sample of 200 humans will have 15–30 different alleles.
c. MHC diversity reduces the risk of an entire population being vulnerable to
infection with a single virus. It also drives viral evolution, which is much
faster than that for mammalian genes, due to error-prone replication and
diversifying selection.
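The McDonald-Kreitman comparison described earlier in this section amounts to a 2 x 2 table: nonsynonymous and synonymous changes, counted within a species (polymorphism) and between species (fixed differences). The sketch below compares the two ratios and uses Fisher's exact test to ask whether they differ by more than chance. The choice of Fisher's exact test and the counts are assumptions for illustration, not data from the slides.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= observed * (1 + 1e-9))


def mk_test(pn, ps, dn, ds):
    """Return (within-species N/S ratio, between-species N/S ratio, p-value)."""
    within = pn / ps    # polymorphic nonsynonymous / polymorphic synonymous
    between = dn / ds   # fixed nonsynonymous / fixed synonymous
    return within, between, fisher_exact_two_sided(dn, ds, pn, ps)


# Illustrative counts only.
within, between, p_value = mk_test(pn=2, ps=42, dn=7, ds=17)
print(round(within, 3), round(between, 3), round(p_value, 4))
```

Similar ratios with a large p-value are consistent with neutral change; a between-species ratio well above the within-species ratio, as in this example, points toward adaptive fixation of nonsynonymous changes.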
Rates of Evolution in Mitochondrial DNA
1. Organelle genomes are distinct from nuclear genomes in replication,
transmission and their increased rate of substitutions.
a. Mammalian mitochondria have about 15 kb of circular dsDNA (mtDNA).
b. Human mtDNA encodes two rRNAs, 22 tRNAs and 13 proteins.
2. Mammalian mitochondrial genes have an average synonymous
substitution rate about 10 times the average for nuclear genes. Possible
reasons for the increased rate include:
a. Lack of proofreading during replication.
b. Differences in DNA repair.
c. Higher levels of mutagens (e.g., oxygen free radicals) due to metabolic
processes.
d. Lack of selective pressure, since most cells contain several dozen
mitochondria.
3. Mammalian mtDNA is inherited almost exclusively from the mother, and
does not undergo meiosis, so all offspring have the maternal mtDNA
genotype. Tracing maternal lines via mtDNA is a valuable tool in
population genetics.
Fig. 24.2 Lineage relationships among mtDNA types in pocket gophers
Molecular Clocks
1. Genes with similar functions can show very uniform rates of molecular
evolution over long periods of time.
2. This led Zuckerkandl and Pauling (1960s) to suggest that amino acid
changes accumulate at a constant rate over many tens of millions of
years, functioning as a molecular clock that measures divergence from
a common ancestor.
3. The molecular clock runs at different rates in different proteins.
Comparison of the divergence between two homologous proteins
correlates well with time since speciation. This allows calculation of:
a. Phylogenetic relationships between species.
b. The time of their divergence (in much the same way as radioactive
decay is used to date geological times).
4. The molecular clock hypothesis has been challenged on the basis of:
a. Inconsistencies with morphological (classical) evolution, based on a
fossil record which has a more erratic tempo.
b. Questions about the uniformity of evolutionary rates in all genes.
Fig. 24.3 The molecular clock runs at different rates in different proteins
Relative Rate Tests
1. Divergence dates from the fossil record are of questionable accuracy,
and so a method to estimate the overall rate of substitution in different
lineages without knowing their divergence date was devised by Sarich
and Wilson (1973).
a. To determine the relative rate of substitution for two species, a third species
less related to both is used as an outgroup (e.g., if humans and gorillas are
compared, the outgroup might be a baboon, or other primate) (Figure 24.4).
b. The number of substitutions between any two species is assumed to be the
sum of the number of substitutions along the branches of the tree
connecting them.
c. Simple algebra is used to calculate the amount of divergence that has taken
place since the two species last shared a common ancestor.
2. As DNA sequence data have become available, the molecular clock
premise has been tested.
a. Substitution rates are similar in rats and mice.
b. Substitution rates in humans and apes are about half as rapid as those in
rodents.
c. The molecular clock clearly varies among taxonomic groups, complicating
the use of molecular divergence to date the last common ancestor.
d. In groups with a uniform clock (e.g., rodents) this model
is useful.
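The "simple algebra" of the relative rate test follows from treating each pairwise distance as a sum of branch lengths. With species 1 and 2, outgroup 3, and A the last common ancestor of 1 and 2, d12 = dA1 + dA2, d13 = dA1 + dA3 and d23 = dA2 + dA3, which can be solved for the three branches. The sketch below does this; the function name and the distances are hypothetical.

```python
def branch_lengths(d12, d13, d23):
    """Solve for branch lengths from ancestor A to species 1, 2 and outgroup 3.

    Assumes pairwise distances are additive along the tree, as in point 1.b.
    """
    d_a1 = (d12 + d13 - d23) / 2
    d_a2 = (d12 + d23 - d13) / 2
    d_a3 = (d13 + d23 - d12) / 2
    return d_a1, d_a2, d_a3


# Hypothetical distances (substitutions per site).
d_a1, d_a2, d_a3 = branch_lengths(d12=0.10, d13=0.30, d23=0.32)
print(round(d_a1, 3), round(d_a2, 3), round(d_a3, 3))
```

Because species 1 and 2 have had the same amount of time since A, unequal values of dA1 and dA2 indicate unequal substitution rates in the two lineages, without any need for a fossil date.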
Fig. 24.4 Phylogenetic tree used in a relative rate test
Causes of Variation in Rates
1. Some possible explanations for the observed differences in
evolutionary rates:
a. Generation time varies greatly between species. Substitution rates
should be related more closely to the number of germ line replications
than to simple divergence times.
b. Other differences in the lines since the time of divergence may be
involved. These include:
i. Average repair efficiency.
ii. Average exposure to mutagens.
iii. Opportunities to adapt to new ecological niches and environments.
Molecular Phylogeny
1. Evolution is defined as genetic change in the face of selective
dynamics, and so genetic relationships are key to understanding
evolutionary relationships.
2. Organisms that are similar at the molecular level are expected to be
more closely related than dissimilar organisms. Phylogenetic
relationships among living things are inferred from molecular
similarity.
a. Before molecular biology, phenotype was used for evolutionary studies to
infer genetic information.
b. Original studies used gross anatomy. Later, behavioral, ultrastructural and
biochemical traits were also used.
c. Evolutionary trees were constructed for many groups of plants and animals,
and these continue to provide a basis for evolutionary study.
d. Phenotypes can be misleading, because they do not always reflect genetic
relatedness.
i. Sometimes similarities result from convergent evolution, complicating
the study of divergence among organisms (e.g., wings alone would put
birds, bats and insects in the same evolutionary group).
ii. Not all organisms have easily studied phenotypic features (e.g.,
bacteria).
iii. Among distant relatives (e.g., humans and bacteria), few phenotypic
features are shared, and it is difficult to determine how such species
should be compared.
3. Molecular evolution provides important information, because the
effects of natural selection are generally less pronounced at the DNA
sequence level.
4. Comparison of molecular and morphological phylogenies is valuable
for examining the effect of natural selection on phenotypic differences
at levels from molecular to gross anatomical.
Phylogenetic Trees
1. Phylogenetic trees are used in phylogenetic reconstructions to describe
the relationship between species.
a. All living things on earth shared a common ancestor about 4 billion years
ago.
b. Every phylogenetic tree uses branches that connect adjacent nodes.
i. Terminal nodes indicate taxa for which molecular information is
available.
ii. Internal nodes represent common ancestors of the two (or more) groups.
c. Branch length may be scaled to show the amount of divergence between taxa.
d. If all nodes on the tree have a common ancestor, it is possible to make it a
rooted tree, indicating an evolutionary path.
i. Unrooted trees show a relationship between nodes, and do not indicate an
evolutionary path.
ii. Roots for unrooted trees can usually be determined by using an outgroup
for comparison.
iii. In a situation where only three taxa are considered, there
are three possible rooted trees, and only one unrooted tree (Figure 24.5).
Fig. 24.5 The relationship between three taxa can be described by only one unrooted
tree but three different rooted trees
Number of Possible Trees
1. As more taxa are considered, the number of possible trees quickly
becomes enormous (Table 24.3).
2. The number of trees can be determined for any number of taxa (n).
a. For rooted trees ($N_R$) the equation is:
$N_R = \frac{(2n-3)!}{2^{n-2}(n-2)!}$
b. For unrooted trees ($N_U$) the equation is:
$N_U = \frac{(2n-5)!}{2^{n-3}(n-3)!}$
3. The value for n can be as large as every species, or even every
individual.
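The two formulas can be evaluated directly to see how fast the number of trees grows. This is a minimal sketch with hypothetical function names; it assumes strictly bifurcating trees, n ≥ 2 for rooted trees and n ≥ 3 for unrooted trees.

```python
from math import factorial


def rooted_trees(n):
    """Number of rooted bifurcating trees for n taxa (n >= 2)."""
    if n < 2:
        raise ValueError("need at least two taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n taxa (n >= 3)."""
    if n < 3:
        raise ValueError("need at least three taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 10):
    print(n, rooted_trees(n), unrooted_trees(n))
```

For three taxa this gives three rooted trees and one unrooted tree, and for four taxa three unrooted trees; by ten taxa there are already tens of millions of rooted trees.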
Gene Versus Species Trees
1. A gene tree is a phylogenetic tree based on divergence within a single
homologous gene.
a. A gene tree represents the history of the gene, but not necessarily the
history of the species.
b. Species trees usually analyze data from multiple genes.
2. Divergence within genes typically occurs prior to speciation.
a. This means that members of separate groups may be more similar to
each other than they are to members of their own population.
b. Divergence is especially high for loci where diversity is advantageous
(e.g., MHC). On the basis of MHC alone, many humans would be
grouped with gorillas rather than other humans, because the
polymorphism predates the split in the two lineages.
Fig. 24.6 Transspecies or shared polymorphism may occur if the ancestor was
polymorphic for two or more alleles and if alleles persist to the present in
both species
Reconstruction Methods
1. Many possibilities exist for phylogenetic trees, and it is
generally impossible to know which is the true tree that
represents actual events in evolution. Most phylogenetic
trees generated with molecular data are considered
inferred trees.
2. Computer algorithms that generate these inferred trees
use two types of approaches, distance matrix and
parsimony-based methods.
Distance Matrix Approaches to Phylogenetic
Reconstruction
1. Distance matrix approaches group organisms on the basis of overall similarity.
a. The unweighted pair group method with arithmetic mean (UPGMA) is a statistical
clustering method. It requires data that can be condensed to a measure of genetic
distance between all pairs of taxa being considered.
i. Pairwise distances are calculated for every pair of taxa.
ii. UPGMA begins by clustering the two taxa with the smallest difference,
joining them into one composite group.
iii. Then a new distance matrix is computed between the group and the remaining
taxa, and taxa separated by the smallest distance are clustered.
iv. The process repeats until all taxa are grouped.
b. If branch lengths represent evolutionary distance, branch points are positioned
halfway between the taxa being grouped.
2. The distance matrix approach works well with either morphological or molecular
data, as well as combinations of both, and takes all data into account.
3. The UPGMA approach assumes a constant rate of molecular evolution in all
lineages, which is probably not accurate. Several alternative matrix-based
approaches (e.g., transformed distance and neighbor-joining methods)
incorporate different evolutionary rates for different
lineages.
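The UPGMA steps listed above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the function name, the nested-tuple tree representation and the distance values are all hypothetical, and ties are broken by whichever pair is encountered first.

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster taxa by UPGMA.

    `distances` maps frozenset({a, b}) to the distance between taxa a and b.
    Returns (tree, heights): the tree as nested tuples, and the height of each
    internal node (half the distance between the two clusters it joins).
    """
    sizes = {name: 1 for name in names}   # cluster label -> number of taxa in it
    dist = dict(distances)
    heights = {}
    while len(sizes) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        merged = (a, b)
        heights[merged] = dist[frozenset((a, b))] / 2  # branch point halfway between
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance from the new group to each remaining cluster is the
        # size-weighted mean, i.e. the average over all original taxon pairs.
        for other in sizes:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    (tree,) = sizes
    return tree, heights


# Hypothetical pairwise distances for four taxa.
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
tree, heights = upgma("ABCD", {frozenset(k): v for k, v in pairs.items()})
print(tree)
print(heights)
```

Placing every branch point at half the joining distance is exactly where the constant-rate assumption in point 3 enters; neighbor-joining and related methods relax it.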
Parsimony-Based Approaches to Phylogenetic
Reconstruction
1. Parsimony approaches group organisms to minimize the number of
substitutions since the last shared ancestor.
a. The underlying principle is that mutations are rare events, and so the tree that
invokes the fewest mutations is considered best (the tree of maximum
parsimony).
b. This approach focuses only on sequence positions that favor one tree over
another with regard to number of substitutions (informative sites), rather than
on all sites equally (Figure 24.7).
c. For a site to be informative, it has to have at least two different nucleotides,
and each nucleotide has to be present at least twice in the array of sequences
considered.
2. Maximum parsimony trees are constructed by identifying all informative
sites within an alignment, and then determining which unrooted tree
invokes the fewest mutations at these sites.
a. This also produces inferred ancestral sequences at each node of the tree,
filling in for “missing links” and providing insight into ancestral organisms.
b. The approach assumes that all mutations are equally likely, although more
complex algorithms consider the differing probabilities of transitions and
transversions.
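The rule for informative sites can be applied column by column to an alignment. The sketch below counts a site as informative when at least two different nucleotides each appear at least twice, which is the usual reading of point 1.c. The function name and sequences are hypothetical, and gap characters are not given special treatment.

```python
from collections import Counter


def informative_sites(alignment):
    """Return 0-based column indices that are parsimony-informative."""
    sites = []
    for index, column in enumerate(zip(*alignment)):
        counts = Counter(column)
        # Nucleotides that occur in at least two of the sequences.
        shared = [base for base, n in counts.items() if n >= 2]
        if len(shared) >= 2:
            sites.append(index)
    return sites


# Hypothetical alignment of four taxa.
alignment = [
    "AGGC",
    "AGAC",
    "ATAT",
    "ATGC",
]
print(informative_sites(alignment))
```

In this example column 0 is invariant and column 3 has a nucleotide seen only once, so neither can favor one tree over another; only columns 1 and 2 would be passed on to the tree search.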
Fig. 24.7 Three different unrooted trees describe all possible relationships between
four taxa
Bootstrapping and Tree Reliability
1. Large numbers (e.g., ≥ 30 species) of long sequences are difficult to
analyze, even with fast computers and streamlined algorithms.
2. Neither distance matrix nor maximum parsimony methods can
guarantee the correct tree, but generally, if a similar tree results from
both of these fundamentally different methods, it is considered to be
fairly reliable.
3. The confidence level for portions of inferred trees can be determined by
bootstrap tests, in which a sample of the original data is drawn with
replacement and a new tree is inferred.
a. When this is repeated hundreds or thousands of times, and the same
groupings usually emerge, these parts of the tree are well supported.
b. The fraction of similar groupings is placed next to the nodes in
bootstrapped trees to convey the confidence in that part of the tree.
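The resampling step of a bootstrap test can be sketched as follows, assuming (as is conventional, though the slides do not say so explicitly) that the units drawn with replacement are alignment columns. The function name and sequences are hypothetical; tree inference on each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement."""
    length = len(alignment[0])
    # Draw as many column indices as the alignment has columns; repeats are allowed.
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(sequence[i] for i in picks) for sequence in alignment]


alignment = ["AGGC", "AGAC", "ATAT", "ATGC"]  # hypothetical four-taxon alignment
rng = random.Random(0)                        # fixed seed so the run is repeatable
for _ in range(3):
    print(bootstrap_alignment(alignment, rng))
```

Each replicate keeps the same taxa and length, but some columns appear more than once and others drop out. Inferring a tree from each of hundreds of replicates and counting how often a grouping recurs gives the support values placed next to the nodes.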
Phylogenetic Trees on a Grand Scale
1. Sequence data have provided new insights into the evolutionary
relationships underlying the primary divisions of life.
a. The simple dichotomy of plants and animals was revised as more
organisms were discovered.
b. The basic division became prokaryotes and eukaryotes, even though
grouping by the absence of structures (e.g., internal membranes) is
recognized as a bad way to construct taxonomic groupings.
2. More recently, Whittaker proposed five kingdoms:
a. Prokaryotes.
b. Protista.
c. Plants.
d. Fungi.
e. Animals.
The Tree of Life
1. DNA and RNA sequences were first used for phylogenetic purposes in
the mid-1980s.
a. Woese and Pace constructed an evolutionary tree based upon 16S
rRNA sequences, because homologs are found in all organisms, as well
as in mitochondria and chloroplasts (Figure 24.8).
b. The tree showed three major groups:
i. Eubacteria, including traditional bacteria, mitochondria and
chloroplasts.
ii. Eukaryotes.
iii. Archaebacteria, including thermophiles and other extremophiles.
c. Archaebacteria and eubacteria, although both prokaryotes, were as different
genetically as eubacteria are from eukaryotes.
2. Later work comparing other genes (e.g., 5S rRNAs, large rRNAs and
genes for fundamental proteins) supports this phylogeny, and also
shows that eukaryotic mitochondrial and chloroplast genes have
different origins than their nuclear counterparts
(Box 24.1).
Fig. 24.8 An evolutionary tree of life revealed by comparison of 16S rRNA sequences
Human Origins
1. DNA sequence analysis is also used to understand human evolution, because
differences between human populations are relatively small.
a. For example, mtDNA differs by only about 0.33% between two human populations,
while other primates have much larger differences (e.g., subspecies of orangutans at
5%).
b. The greatest differences are not separated by continents, but are found in Africa.
i. The differences found between African populations are greater than those seen
between Africans and humans on other continents.
ii. This is generally believed to mean that humans arose in Africa, developed
diverse populations, and then migrated to other continents (the “out of Africa”
theory).
iii. All humans alive today are believed to carry mtDNA derived from an African
ancestor, and all males to have a Y chromosome from the same source.
iv. These analyses set the date of divergence for humans at about 200,000 years
ago, although the out-of-Africa theory is not universally accepted.
2. DNA sequence data are increasingly important in the study of evolution of
humans and other organisms.
Acquisition/Origins of New Functions
1. Haldane (1932) suggested that spare copies of existing
genes could give rise to new genes.
2. This appears to account for most new genes, although
other mechanisms (e.g., transposons) also exist.
Multigene Families
1. Eukaryotes often have tandemly arrayed copies of genes with very similar
sequences (multigene families) that appear to be the result of gene duplication.
2. The human globin genes are a classic example of a multigene family, with a general
distribution of seven α-like genes on chromosome 16, and six β-like genes on
chromosome 11.
a. Globin-like genes are found in many animals and even plants, suggesting an ancient
origin.
b. Animal globin genes have the same general structure (three exons and two introns), but
their number and order vary among species (Figure 24.9).
c. Sequence and structure suggest duplication of an ancestral gene, which diverged to
produce the α-like and β-like genes.
d. Duplication and divergence would then produce the modern α-like and β-like gene
groups.
3. Variation in globin gene number and distribution found in modern humans suggests
that duplication and deletion of genes is an ongoing process still operating today.
a. Duplications and deletions may result from unequal crossing-over.
b. Duplications may also arise through transposition.
Fig. 24.9 Organization of the globin gene families in several mammalian species
Gene Duplication and Gene Conversion
1. Duplication frees a copy of the sequence to undergo changes, since a
functional copy will still exist.
a. Most changes would produce less functional products, or even
nonfunctional pseudogenes.
b. A few changes, however, might alter function and/or pattern of
expression to something more advantageous for the organism. Selection
would allow these genes to become widespread in the population.
2. Misalignment between a pseudogene and a functional copy can result in
gene conversion through recombination events, giving the organism
even more opportunities to create a gene with a new function.
3. Gene conversion continues to operate in modern humans. An example
is two genes for red-green color vision on the X chromosome, which
undergo gene conversion in most of the known cases of spontaneous
deficiencies in green color vision.
Arabidopsis Genome Results
1. More gene duplications are being found as genomic
sequencing projects proceed. An example is Arabidopsis
thaliana (thale cress), the first plant genome to be
completely sequenced.
2. The Arabidopsis genome appears to have undergone
significantly less duplication than other commercially
important plants, but still, more than half of its genes are
duplicates.
Domain (Exon) Shuffling
1. Often, less than an entire gene is duplicated, resulting in copies of protein
domains.
a. An example is human serum albumin, whose gene has three copies of a 195
amino acid domain.
b. Internal duplication is not a rapid method of producing proteins with new
functions, however.
2. Most complex proteins arise from assemblages of several protein
domains performing different functions (e.g., substrate binding or
membrane spanning).
a. The beginnings and ends of exons and protein domains often correspond.
b. Gilbert (1978) proposed that most gene families today arose through domain
shuffling involving duplication and rearrangement of domains (usually
encoded by single exons).
c. Domain shuffling proposes that introns were a feature of early life on earth,
even though they are now missing from prokaryotes.
d. Numerous examples of complex genes made from segments of other genes
are known, and clearly some novel functions have been created in this way.