Transcript Document

Teosinte
Maize Landraces
Inbreds/Hybrids
Sequence Diversity in
Evolution and Crop Improvement
Sherry Flint-Garcia
Research Geneticist
USDA-ARS
MU Division of Plant Sciences
Photos courtesy J. Doebley
Sequence Diversity
Evolution:
What are the forces that cause evolution?
Speciation & hybridization
Uncovering evolutionary history
Crop Improvement:
The teosinte-maize story
The Four Forces of Evolution
Mutation -- spontaneous changes in the DNA of gametes.
Prerequisite to all other evolution.
Natural Selection -- genetically-based differences in survival
or reproduction that leads to genetic change in a
population.
Gene flow -- movement of genes between populations. In
plants this can be accomplished by pollen or seed
dispersal.
Genetic drift -- random changes in gene frequency. This is
very important in small populations.
Mutation: Generation of New Alleles
Mutations are the result of mistakes in DNA
replication, exposure to UV or to some
chemicals (mutagens) and other causes.
Point mutations
changing one nucleotide to another
e.g., C-->T
Sickle Cell Anemia
A single point mutation causes a
dramatic change in phenotype.
Other types of mutations
Indels
insertions/deletions
Cause frame-shifts, & usually
premature ‘stops’
Gene duplication
May lead to new functions
Chromosomal mutations
Inversions, translocations,
deletions
Polyploidy
Very common in plants
May lead to new species in one
step
Most point mutations have no effect or
almost no effect. Why?
Most of the genome seems to be ‘junk’ -- at least it
doesn’t code for proteins.
Many mutations within protein-coding region of genes
don’t change the amino acid specified. i.e., there is
redundancy in the genetic code.
For example,
6 different codons
specify the amino
acid leucine.
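The redundancy described above can be checked directly against the standard genetic code. The following is a minimal sketch: the 64-letter string is the standard code listed in T, C, A, G order, and the helper names (`synonymous_codons`, `is_silent`) are illustrative assumptions, not part of the lecture.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in T, C, A, G order for each position.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def synonymous_codons(amino_acid):
    """Return every codon that specifies the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)

def is_silent(codon_before, codon_after):
    """True when a mutation leaves the encoded amino acid unchanged."""
    return CODON_TABLE[codon_before] == CODON_TABLE[codon_after]

leucine_codons = synonymous_codons("L")
print(len(leucine_codons), leucine_codons)

# A third-position change within the leucine codons is silent ...
print(is_silent("CTT", "CTC"))
# ... whereas the sickle cell change GAG -> GTG swaps glutamic acid for valine.
print(is_silent("GAG", "GTG"))
```

Most of leucine's redundancy sits in the third codon position, which is why many point mutations there do not change the protein.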
Natural Selection
Peppered moth (Biston betularia) evolution during
the industrial revolution in England
Early 1800s = pre-industrial
Tree bark was white
Almost all moths were of the typica form
‘typica’ form
1895 = Industrial Era
Tree bark was covered in black soot
98% of moths were of the carbonaria form
Today = Clean Air laws enforced
Prevalence of carbonaria form declining
‘carbonaria’ form
Brassica oleracea
Gene Flow
Tends to homogenize populations.
Rates of gene flow depend on the spatial
arrangement of populations.
“Directional” movement of alleles
Migration occurs at random among
a group of equivalent populations.
Migration along a linear set of populations
Populations
are continuous.
Founder effect: Gene flow and genetic drift are responsible
for the limited genetic variation on islands, relative to
mainland populations.
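Why drift matters most in small populations, such as a handful of island founders, can be illustrated with a simple simulation. This is a minimal sketch under assumed conditions (a Wright-Fisher style model with random sampling only, no selection, mutation, or gene flow); the function name, population sizes, and seed are illustrative choices.

```python
import random

def wright_fisher(pop_size, start_freq, generations, rng):
    """Track one allele's frequency under pure drift in a diploid population."""
    n_copies = 2 * pop_size  # each diploid individual carries two gene copies
    freq = start_freq
    trajectory = [freq]
    for _generation in range(generations):
        # Every gene copy in the next generation is drawn at random
        # from the current gene pool.
        count = sum(rng.random() < freq for _ in range(n_copies))
        freq = count / n_copies
        trajectory.append(freq)
    return trajectory

rng = random.Random(1)  # fixed seed so the run is repeatable
small = wright_fisher(pop_size=10, start_freq=0.5, generations=100, rng=rng)
large = wright_fisher(pop_size=1000, start_freq=0.5, generations=100, rng=rng)

print("small population, final frequency:", small[-1])
print("large population, final frequency:", large[-1])
```

In runs like this, the small population typically wanders far from 0.5 and often loses or fixes the allele, while the large population stays close to its starting frequency.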
Speciation and Hybridization
Speciation – how do new species arise?
What is a species, anyway?
Most species were originally described by their
morphology.
The Problem: Convergence
Similar features in unrelated organisms due to evolution
of traits that “work” in similar environments
Convergent structures in the ocotillo (left) from the American
Southwest, and in Alluaudia (right) from Madagascar.
Nectar feeders have converged on
this hovering long-tongued
morphology.
Speciation and Hybridization
Biological Species Concept (BSC)
Based on reproductive compatibility
Natural spatial, temporal, and morphological
discontinuities generally correspond to fertility barriers
The Problem: In plants, many named species can
hybridize.
Most dandelions are asexual.
So the biological species
concept (BSC) doesn’t apply.
How can you name species
depending on who can mate
with whom when the
organisms do not mate at all?!
Scarlet and Black
oaks can hybridize
and inhabit the same
range -- but they
have different
microhabitat
preferences so
hybridization is rare.
These pines can also hybridize but they shed their
pollen at different times of the season
Speciation by Hybridization
Hybridization
often shows
how difficult it
is to apply the
BSC to plants.
The hybrid in this
case is a new
species. The
rearrangements of
its chromosomes
make it infertile with
either parent.
hybrid
As the climate
becomes drier the
desert splits the
range of this
hypothetical tree
species. This
reduces gene flow
between the now
isolated populations
and sets the stage
for speciation.
Evolution of species
that are geographically
separated. Genetic drift plays
a significant role.
“Edge effect” where evolution
of reproductive barriers occurs
between neighboring populations.
Requires considerable selection
pressure.
Establishment of a new population
with a different ecological niche
within the same geographical range
of the parental population
Uncovering Evolutionary History
Taxonomy vs. Systematics
Estimating Phylogeny
Distance Methods
Maximum Parsimony Methods
Maximum Likelihood Methods
Taxonomy vs. Systematics
Taxonomy
Discovering
Describing
Naming
Classifying
Systematics
Figuring out the evolutionary relationships of species
Summarize the evolutionary history of a group
Plant Taxonomy
taxon - any group at any rank
Rank                 Example: corn (common name)
kingdom              Plantae (Viridiplantae)
division (phylum)    Anthophyta
class                Liliopsida
order                Commelinales
family               Poaceae
genus                Zea (genus name: always capitalized)
species              Zea mays (specific epithet: never capitalized)
Plant Systematics
A phylogenetic tree is used
to illustrate systematic
relationships
Modern taxonomic groups
generally correspond to
clades on a phylogenetic
tree (i.e. cladogram)
Example: phylogenetic
tree of the grass family
Mathews et al. 2000 American Journal of Botany
Angiosperm
Phylogeny Group
Tree
“Dicots” are not a
monophyletic
group.
Data Types that can be used to
Estimate a Phylogeny
Cross Compatibility
Uses the ‘Biological Species
Concept’
Morphological
Continuous traits
Meristic (countable) traits
Cytological
Chromosome number
Chromosome features
Pairing in hybrids
Molecular data
Secondary chemicals
Proteins
DNA
Allele frequencies at many loci
(isozymes, SSR)
DNA sequences, considered
as a whole
DNA sequences, considered
site-by-site
Maximum Parsimony
(Minimum Evolution) Methods
The process of attaching preference to the pathway
that requires the invocation of the smallest
number of mutational events.
Most effective when examining sequences with
strong similarity
Underlying premises:
Mutations are exceedingly rare events.
The more unlikely events a model invokes, the less likely
the model is to be correct.
Using only trait 1 …
             trait1   trait2   trait3   trait4   trait5
Species 1    0        1.2      red      A        T
Species 2    0        3.4      blue     G        C
Species 3    1        3.5      red      A        T
Species 4    1        4.0      red      A        T
Species 5    1        2.8      blue     G        T
[Tree: sp1 and sp2 on one side, sp3, sp4 and sp5 on the other, separated by a single change in trait 1 (0<->1)]
Traits must have discrete character states.
Must have same character state in at least 2 taxa.
But traits 3 & 4 disagree with trait 1.
             trait1   trait2   trait3   trait4   trait5
Species 1    0        1.2      red      A        T
Species 2    0        3.4      blue     G        C
Species 3    1        3.5      red      A        T
Species 4    1        4.0      red      A        T
Species 5    1        2.8      blue     G        T
[Tree: sp2 and sp5 on one side, sp1, sp3 and sp4 on the other, separated by the changes Red<->blue (trait 3) and A<->G (trait 4)]
Every possible tree is considered individually for each
informative site (computationally intensive).
After all informative sites have been considered, the
tree that invokes the smallest total number of
substitutions is the most parsimonious.
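[Figure: two candidate trees with the trait 1 (0/1), trait 3 (red/blue) and trait 4 (A/G) changes mapped onto their branches. Tree with tips 5, 1, 3, 4, 2: 4 substitutions required. Tree with tips 2, 3, 4, 5, 1: 5 substitutions required.]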
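The counting step can be sketched in code. The lecture does not name a counting procedure, so this example swaps in Fitch's small-parsimony algorithm, and it makes two assumptions: the three-taxon groups are resolved into binary trees (an arbitrary choice that does not change the totals here), and only traits 1, 3 and 4 are scored, since trait 2 is continuous and trait 5 has a state found in only one taxon.

```python
# Discrete, informative traits from the table: (trait 1, trait 3, trait 4).
TRAITS = {
    "sp1": ("0", "red", "A"),
    "sp2": ("0", "blue", "G"),
    "sp3": ("1", "red", "A"),
    "sp4": ("1", "red", "A"),
    "sp5": ("1", "blue", "G"),
}

def fitch(tree, site):
    """Return (possible ancestral states, changes needed) for one trait.

    `tree` is a taxon name or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {TRAITS[tree][site]}, 0
    left_states, left_changes = fitch(tree[0], site)
    right_states, right_changes = fitch(tree[1], site)
    shared = left_states & right_states
    if shared:
        # The two subtrees can agree on a state: no extra change needed.
        return shared, left_changes + right_changes
    # No common state: one change must occur on a branch below this node.
    return left_states | right_states, left_changes + right_changes + 1

def parsimony_score(tree):
    """Total number of changes over all scored traits."""
    return sum(fitch(tree, site)[1] for site in range(3))

# Grouping suggested by trait 1, and grouping suggested by traits 3 and 4.
tree_by_trait1 = (("sp1", "sp2"), ("sp3", ("sp4", "sp5")))
tree_by_traits34 = (("sp2", "sp5"), ("sp1", ("sp3", "sp4")))

print("trait 1 grouping:", parsimony_score(tree_by_trait1))
print("traits 3 & 4 grouping:", parsimony_score(tree_by_traits34))
```

The grouping supported by traits 3 and 4 needs fewer changes in total, so it is the more parsimonious of the two.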
Distance-based approaches
Compare each taxon to every other taxon to estimate a “distance matrix”
       Sp1   Sp2   Sp3   Sp4   Sp5
Sp1    0     d12   d13   d14   d15
Sp2          0     d23   d24   d25
Sp3                0     d34   d35
Sp4                      0     d45
Sp5                            0
Distances are then ‘clustered’ to estimate a phylogenetic tree.
Distance-based approaches
Compare each taxon to every other taxon to estimate a “distance matrix”
Example: DNA sequence considered as a whole
             10         20         30         40         50
Sp1: GTGCTGCACG GCTCAGTATA GCATTTACCC TTCCATCTTC AGATCCTGAA
Sp2: ACGCTGCACG GCTCAGTGCG GTGCTTACCC TCCCATCTTC AGATCCTGAA
Sp3: GTGCTGCACG GCTCGGCGCA GCATTTACCC TCCCATCTTC AGATCCTATC
Sp4: GTATCACACG ACTCAGCGCA GCATTTGCCC TCCCGTCTTC AGATCCTAAA
Sp5: GTATCACATA GCTCAGCGCA GCATTTGCCC TCCCGTCTTC AGATCTAAAA
       Sp1   Sp2   Sp3   Sp4   Sp5
Sp1    0     9     8     12    15
Sp2          0     11    15    18
Sp3                0     10    13
Sp4                      0     5
Sp5                            0
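The distance matrix for this example can be reproduced by counting the sites at which two aligned sequences differ. This is a minimal sketch that assumes the distance is the raw number of differing sites, with no correction for multiple substitutions at the same site; the function and variable names are illustrative.

```python
# The five aligned 50-base sequences from the example (spaces separate blocks of 10).
SEQUENCES = {
    "Sp1": "GTGCTGCACG GCTCAGTATA GCATTTACCC TTCCATCTTC AGATCCTGAA",
    "Sp2": "ACGCTGCACG GCTCAGTGCG GTGCTTACCC TCCCATCTTC AGATCCTGAA",
    "Sp3": "GTGCTGCACG GCTCGGCGCA GCATTTACCC TCCCATCTTC AGATCCTATC",
    "Sp4": "GTATCACACG ACTCAGCGCA GCATTTGCCC TCCCGTCTTC AGATCCTAAA",
    "Sp5": "GTATCACATA GCTCAGCGCA GCATTTGCCC TCCCGTCTTC AGATCTAAAA",
}

def count_differences(seq_a, seq_b):
    """Number of aligned sites at which two sequences differ."""
    a = seq_a.replace(" ", "")
    b = seq_b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

names = list(SEQUENCES)
distance = {
    (p, q): count_differences(SEQUENCES[p], SEQUENCES[q])
    for p in names
    for q in names
}

# Print the upper triangle, one row per taxon.
for i, p in enumerate(names):
    print(p, [distance[(p, q)] for q in names[i:]])
```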
Distance-based approaches
       Sp1   Sp2   Sp3   Sp4   Sp5
Sp1    0     9     8     12    15
Sp2          0     11    15    18
Sp3                0     10    13
Sp4                      0     5
Sp5                            0
Distances are then ‘clustered’ to estimate a phylogenetic tree.
Example: UPGMA algorithm
Unweighted Pair-Group Method using Arithmetic means
The smallest distance is identified, the two closest taxa are combined, their
distances to every other taxon are averaged, and the matrix is recalculated.
This iteration is repeated.
[Tree so far: Sp4 and Sp5 joined, with a branch length of 2.5 each]
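Distance-based approaches
       Sp1   Sp2   Sp3   4-5
Sp1    0     9     8     13.5
Sp2          0     11    16.5
Sp3                0     11.5
4-5                      0
[Tree so far: Sp1 and Sp3 joined, with a branch length of 4 each; Sp4 and Sp5 joined, with a branch length of 2.5 each]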
Distance-based approaches
       Sp2   1-3   4-5
Sp2    0     10    16.5
1-3          0     12.5
4-5                0
[Tree so far: Sp1 and Sp3 joined (4 each); Sp2 joined to the 1-3 cluster (branch length 5); Sp4 and Sp5 joined (2.5 each)]
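Distance-based approaches
         1-2-3   4-5
1-2-3    0       12.5
4-5              0
[Final tree: the 1-2-3 cluster and the 4-5 cluster are joined, with the two deepest branches labeled 6.5 and 6.5; Sp1 and Sp3 (4 each), Sp2 (5), Sp4 and Sp5 (2.5 each)]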
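The clustering walkthrough above can be sketched as a short program. This is a minimal illustration, not a full tree builder: it records only the order and distance of each join, and it assumes the distance between two clusters is the mean over all pairs of their member taxa. Under that size-weighted convention the final join comes out near 13.8 rather than the 12.5 shown in the last matrix, so the deepest branch length depends on the averaging rule chosen.

```python
from itertools import combinations

# Pairwise distances from the five-taxon example.
PAIRWISE = {
    ("Sp1", "Sp2"): 9, ("Sp1", "Sp3"): 8, ("Sp1", "Sp4"): 12, ("Sp1", "Sp5"): 15,
    ("Sp2", "Sp3"): 11, ("Sp2", "Sp4"): 15, ("Sp2", "Sp5"): 18,
    ("Sp3", "Sp4"): 10, ("Sp3", "Sp5"): 13,
    ("Sp4", "Sp5"): 5,
}

def upgma(pairwise):
    """Return the joins as a list of (cluster_a, cluster_b, distance)."""
    dist = {frozenset(pair): d for pair, d in pairwise.items()}
    taxa = sorted({taxon for pair in pairwise for taxon in pair})
    clusters = [(taxon,) for taxon in taxa]  # every taxon starts alone

    def cluster_distance(a, b):
        # Arithmetic mean over every pair of taxa drawn from the two clusters.
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    joins = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        joins.append((a, b, cluster_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return joins

joins = upgma(PAIRWISE)
for a, b, d in joins:
    # Each joined lineage sits at half the join distance (the branch height).
    print("-".join(a), "+", "-".join(b), "distance", d, "height", d / 2)
```

The first three joins reproduce the branch lengths in the walkthrough: 2.5 for Sp4 and Sp5, 4 for Sp1 and Sp3, and 5 for Sp2.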
Maximum Likelihood Methods
Best suited for DNA and protein sequence data
Requires a model of evolution
Each nucleotide/amino acid substitution has an
associated likelihood
A function is derived to represent the likelihood of
the data given the tree, branch-lengths and
additional parameters
Function is maximized
Example alignment:
1: A C G C G T T G G G
2: A C G C G T T G G G
3: A C G C A A T G A A
4: A C A C A G G G A A
Based on a model of nucleotide substitution matrix (transitions and transversions):
no change = 1, transition = 2 x 10^-6, transversion = 10^-6
[Figure: the unrooted trees for taxa 1-4, and Tree 1 for a single site, with bases assigned to the nodes, root probability L0 = 0.25, and branch likelihoods L1 to L6]
L(Tree 1) = L0 x L1 x L2 x L3 x L4 x L5 x L6 = 5 x 10^-13
Consider every possible base assignment to each node and calculate the
likelihood
L(Tree 1) = L0 x L1 x L2 x L3 x L4 x L5 x L6 = 5 x 10^-13
L(Tree 2) = L0 x L1 x L2 x L3 x L4 x L5 x L6 = 1 x 10^-18
[Figure: Tree 1 and Tree 2, each showing one assignment of bases to the nodes, with root probability L0 = 0.25 and branch likelihoods L1 to L6]
Repeat for each node assignment, and each site in the alignment.
Probability of that unrooted tree is the sum of all individual trees.
Repeat for each unrooted tree and choose the tree with the highest likelihood.
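The per-site calculation can be sketched as follows. Several details here are assumptions made for illustration: the rooted shape ((1, 2), (3, 4)), the use of the alignment column T, T, A, G, and the particular node assignment (root T, left node T, right node G) chosen because it reproduces 5 x 10^-13. The branch values are the toy numbers from the example (1, 2 x 10^-6, 10^-6), not a normalized substitution model.

```python
from itertools import product

BASES = "ACGT"
PURINES = {"A", "G"}

def branch_value(parent, child):
    """Toy per-branch value: 1 (no change), 2e-6 (transition), 1e-6 (transversion)."""
    if parent == child:
        return 1.0
    same_class = (parent in PURINES) == (child in PURINES)
    return 2e-6 if same_class else 1e-6

def assignment_likelihood(root, left, right, tips):
    """L0 x L1 x ... x L6 for one assignment of bases to the three inner nodes."""
    likelihood = 0.25  # L0: probability of the base at the root
    likelihood *= branch_value(root, left) * branch_value(root, right)
    likelihood *= branch_value(left, tips[0]) * branch_value(left, tips[1])
    likelihood *= branch_value(right, tips[2]) * branch_value(right, tips[3])
    return likelihood

def site_likelihood(tips):
    """Sum over every possible base assignment to the three inner nodes."""
    return sum(
        assignment_likelihood(root, left, right, tips)
        for root, left, right in product(BASES, repeat=3)
    )

tips = ("T", "T", "A", "G")  # one alignment column for taxa 1-4
one_assignment = assignment_likelihood("T", "T", "G", tips)
print("single assignment:", one_assignment)
print("summed over all 64 assignments:", site_likelihood(tips))
```

In practice the per-site values are multiplied across sites (or their logarithms summed) before comparing candidate trees.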
The Teosinte-Maize Story
6000 – 10,000 years ago
The practical side of sequence diversity
PLANT BREEDING!
Sequence Diversity in Teosinte
Sequence Diversity in Maize
Selection During Domestication and Improvement
Sequence Diversity and Plant Breeding
Genetic diversity within a crop species is the raw
material for current plant breeding
Genetic diversity is the insurance policy to enable
plant breeders to adapt crops to changing
environments
The Problem
To what degree is limiting genetic diversity
inhibiting genetic improvement in corn?
[Figure: U.S. corn yield in bushels per acre (0 to 160) by year, 1866 to 1996, with eras labeled Open Pollinated Varieties, Double Cross Hybrids, and Single Cross Hybrids]
Two Views of the Problem
“Most of the corn germplasm in use in the USA today is
derived from mixtures of only two major races [out of ~ 300
races total] (Wallace and Brown, 1956). The simplest means of
correcting this situation and of increasing the genetic
diversity of this important crop is to introduce unrelated
sources of germplasm” (Brown and Goodman, 1977, Races of Corn,
in Corn and Corn Improvement)
[From a project comparing sequence diversity in 21 genes
of nine U.S. inbred lines with 16 diverse maize landraces]
“We found that our sample of [U.S.] inbreds contained a
level of [SNP] diversity that was 77% the level of diversity in
our landrace sample.” (Tenaillon et al., 2001, PNAS, 98:9161-9166)
Sequence Diversity in Maize
How has selection shaped sequence diversity in maize?
Survey SNPs from ~1800 genes in diverse maize and
teosinte germplasm
Screen 4000 candidate genes for evidence of selection
Practical Goal: identify genes exhibiting selection
Domestication, agronomic improvement, and local
adaptation
[Figure: Allele frequencies in teosinte, in landraces (after domestication), and in modern inbreds (after plant breeding), shown for an unselected gene, a domestication gene, and an improvement gene]
Can we develop genomic screens to
identify genes that have undergone
selection?
1. Invariant SSR approach
2. Direct Sequencing Approach
What proportion of genomic sequences that have
low allelic diversity among inbreds result from
selection for domestication?
Contrast sequence diversity among teosintes,
landraces, and inbreds
Screening SSR
primers against
12 inbred lines
1,772 total SSRs
1,053 were polymorphic (Class I)
719 were invariant (Class II)
Invariant SSR primers
Invariant SSR Screening
470 invariant SSR primer sets
321 monomorphic throughout
60 polymorphic in both exotics and teosintes
14 polymorphic only in exotics
75 polymorphic only in teosintes (Class II-E)
[Figure: SSR allele patterns in teosinte (6), landraces (5), and US inbreds, shown for non-Class II and Class II SSRs]
Vigouroux et al. 2002. PNAS 99: 9650
Analysis of Class II-E SSRs
31 Class I SSRs and 44 Class II-E SSRs
44 teosinte and 45 landrace accessions
Tested for selection (loss of diversity)
0 Class I SSRs showed evidence of selection
15 Class II-E SSRs showed evidence of selection
Extrapolated back to the 1772 total SSRs:
“1.4% genes have been selected”
Direct Sequencing Approach
Purpose: to develop a SNP resource for the
maize community
Result: a LOT of data!!!
Distribution of SNP Haplotypes (patterns)
470 maize unigenes in 14 maize lines
Mean haplotype # = 4.46
> 80% of unigenes have 2 to 7 haplotypes
[Figure: histogram of the number of unigenes (y-axis, 0 to 100) by number of haplotypes (x-axis, 1 to 12), running from conserved (few haplotypes) to diverse (many haplotypes)]
For each gene, a few haplotypes account for much of the diversity
Are genes with low inbred diversity enriched for
domestication and improvement candidates?
(Masanori Yamasaki, post-doc in McMullen Lab)
36 genes with no diversity among a 14-inbred set
Sequenced the same region in 16 landraces,
16 teosintes, and a Tripsacum dactyloides
sample.
Test for selection on inbreds, landraces and
teosintes compared to four neutral genes.
[Figure labels: Tripsacum, teosinte, landraces, inbreds]
Selection Tests for 33 (of 36) Genes
5 genes were significant in both the inbreds and
the landraces (evidence for domestication genes).
7 genes were significant in the inbreds but not the
landraces (evidence for improvement genes).
1 additional gene was classified as either
domestication or improvement depending on the
test.
13 out of 33 genes = 39% !!
Yamasaki et al. submitted
Selection on a Genomic Scale
Sequenced 774 maize unigenes in 14 maize
inbreds and 16 teosinte accessions
Tested for selection using coalescent simulations
Result: 2-4% had experienced artificial selection
Assume 59,000 genes in maize
59,000 x 2% = ~1200 selected genes
Wright et al. 2005 Science 308: 1310
Where are we going with this?
Before genomics, 11 genes had been identified
as selected by population genetic approaches.
By sequencing 1000 genes, we have ~50 novel
candidates.
That leaves ~1140 more!
We need:
1. to completely sequence the maize genome to
identify ALL genes.
2. to resequence all remaining genes in multiple
maize inbreds and teosinte accessions.
Signatures of Selection
If selected genes were important in past
improvement, continued manipulation might
contribute to future gain.
If selected genes suffered a loss of diversity
because of selection, they are prime
candidates for introgressive breeding from wild
relatives.
Hypothesis: manipulation of the expression of
domestication and improvement genes will alter
key agronomic traits
Selection for Amino Acid Content?
Four genes that show evidence of selection are
involved in amino acid biosynthesis
[Figure: bar charts comparing teosinte, landraces, and maize. One panel shows total amino acids as % of kernel weight. The other shows each amino acid as % of total AA: alanine, arginine, ornithine, arginine total, aspartic acid, cysteine, taurine, cysteine total, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, hydroxyproline, proline total, serine, threonine, tryptophan, tyrosine, valine]
Selection for Amino Acid Content?
Are there more genes in amino acid pathways
that have been selected?
Sequenced 16 genes in 28 maize inbreds, 16
teosinte, and 2 tripsacum.
Result: we found 4 genes that may have been
selected during domestication/improvement.
The Ultimate Selection Project
B73 with knockout
in selected gene
B73 – inbred line
B73 with teosinte
allele of selected gene
teosinte
Molecular Diversity:
[Figure: aligned sequences from inbred lines B73, CO159, GT119, Tx501, Tx303, Mo17, Mp708, IHO, and T218, marking an insertion, a deletion, a SNP, an InDel, and a conserved region]
SNP: Single nucleotide polymorphism
InDel: Insertion/deletion
SNPs and InDels are used as markers for genetic analysis