Transcript Document
[Title slide images: teosinte, maize landraces, inbreds/hybrids]
Sequence Diversity in Evolution and Crop Improvement
Sherry Flint-Garcia, Research Geneticist, USDA-ARS, MU Division of Plant Sciences
Photos courtesy J. Doebley

Sequence Diversity
Evolution: What are the forces that cause evolution? Speciation and hybridization. Uncovering evolutionary history.
Crop Improvement: The teosinte-maize story.

The Four Forces of Evolution
Mutation -- spontaneous changes in the DNA of gametes. The prerequisite for all other evolution.
Natural Selection -- genetically based differences in survival or reproduction that lead to genetic change in a population.
Gene flow -- movement of genes between populations. In plants this can be accomplished by pollen or seed dispersal.
Genetic drift -- random changes in gene frequency. This is very important in small populations.

Mutation: Generation of New Alleles
Mutations result from mistakes in DNA replication, exposure to UV light or to some chemicals (mutagens), and other causes.
Point mutations change one nucleotide to another, e.g., C-->T.
Sickle cell anemia: a single point mutation causes a dramatic change in phenotype.

Other types of mutations
Indels (insertions/deletions) -- unless their length is a multiple of three, they cause frame-shifts and usually premature stop codons.
Gene duplication -- may lead to new functions.
Chromosomal mutations -- inversions, translocations, deletions.
Polyploidy -- very common in plants; may lead to new species in one step.

Most point mutations have no effect or almost no effect. Why?
Most of the genome seems to be ‘junk’ -- at least it doesn’t code for proteins.
Many mutations within the protein-coding regions of genes don’t change the amino acid specified, because the genetic code is redundant. For example, 6 different codons specify the amino acid leucine.

Natural Selection
Peppered moth (Biston betularia) evolution during the industrial revolution in England:
Early 1800s (pre-industrial): tree bark was white, and almost all moths were of the ‘typica’ form.
1895 (industrial era): tree bark was covered in black soot, and 98% of moths were of the ‘carbonaria’ form.
Today (clean-air laws enforced): the prevalence of the carbonaria form is declining.
[Figure: Brassica oleracea]

Gene Flow
Gene flow tends to homogenize populations.
Rates of gene flow depend on the spatial arrangement of populations:
"Directional" movement of alleles.
Migration occurs at random among a group of equivalent populations.
Migration along a linear set of populations.
Populations are continuous.

Genetic Drift
Founder effect: limited gene flow and genetic drift are responsible for the limited genetic variation on islands relative to mainland populations.
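The point about redundancy in the genetic code can be checked directly against the standard codon table. The sketch below is a minimal Python illustration, assuming the standard nuclear genetic code (NCBI translation table 1); the names `CODE`, `is_synonymous`, and `synonymous_fraction` are illustrative helpers, not from the lecture. It counts the codons for each amino acid and tests whether a single-base change alters the amino acid.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def is_synonymous(codon: str, pos: int, new_base: str) -> bool:
    """True if replacing codon[pos] with new_base leaves the amino acid unchanged."""
    mutant = codon[:pos] + new_base + codon[pos + 1:]
    return CODE[codon] == CODE[mutant]


def synonymous_fraction() -> float:
    """Share of all single-base changes in sense codons that are synonymous."""
    syn = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # skip stop codons
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    total += 1
                    syn += is_synonymous(codon, pos, base)
    return syn / total


codons_per_aa = Counter(CODE.values())
print("Leucine codons:", codons_per_aa["L"])                   # 6
print("CTT -> CTC synonymous?", is_synonymous("CTT", 2, "C"))  # True
# Sickle-cell change in beta-globin codon 6: GAG (Glu) -> GTG (Val)
print("GAG -> GTG synonymous?", is_synonymous("GAG", 1, "T"))  # False
print(f"Synonymous share of point mutations: {synonymous_fraction():.2f}")
```

Most synonymous changes fall at the third codon position, which is why a third-position change such as CTT to CTC is silent while the second-position sickle-cell change is not.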
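Selection, gene flow, and drift can be explored together in a Wright-Fisher simulation. The sketch below is a toy model under stated assumptions, not taken from the lecture: one locus with two alleles, a diploid population of n individuals (2n gene copies), genic selection with coefficient s (no dominance, which simplifies cases like the dominant carbonaria allele), and one-way migration at rate m from a source population whose allele frequency `p_source` stays fixed. The function and parameter names are illustrative.

```python
import random
from statistics import mean


def wright_fisher(p0, n, generations, s=0.0, m=0.0, p_source=0.5, seed=None):
    """Return the trajectory of the frequency of allele A, one value per generation.

    p0: starting frequency of A; n: diploid population size (2n gene copies);
    s: selective advantage of A (relative fitness 1 + s vs 1);
    m: fraction of gene copies replaced each generation by migrants from a
       source population with allele frequency p_source.
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection: deterministic shift in the expected frequency.
        p = p * (1 + s) / (1 + s * p)
        # Gene flow: mix in migrant gene copies.
        p = (1 - m) * p + m * p_source
        # Drift: the next generation is a random sample of 2n gene copies.
        copies = sum(rng.random() < p for _ in range(2 * n))
        p = copies / (2 * n)
        trajectory.append(p)
    return trajectory


# Pure drift in a small population: many replicates fix or lose the allele...
finals = [wright_fisher(0.5, n=10, generations=100, seed=i)[-1] for i in range(200)]
print("replicates fixed or lost:", sum(f in (0.0, 1.0) for f in finals), "of 200")
# ...but averaged over replicates, the frequency stays near its starting value.
print("mean final frequency:", round(mean(finals), 2))
# Strong gene flow from a source at 0.9 pulls a local population toward 0.9.
print("with gene flow:", wright_fisher(0.1, n=50, generations=50, m=0.2, p_source=0.9, seed=1)[-1])
```

Setting `s` and `m` to zero leaves pure drift; raising `n` shows why drift matters most in small populations.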
Speciation and Hybridization
Speciation: how do new species arise? What is a species, anyway?
Most species were originally described by their morphology.
The problem: convergence. Unrelated organisms evolve similar features because those traits "work" in similar environments.
[Figure: convergent structures in the ocotillo (left) from the American Southwest and in Alluaudia (right) from Madagascar]
[Figure: nectar feeders have converged on this hovering, long-tongued morphology]

Speciation and Hybridization
Biological Species Concept (BSC): based on reproductive compatibility. Natural spatial, temporal, and morphological discontinuities generally correspond to fertility barriers.
The problem: in plants, many named species can hybridize.
Most dandelions are asexual, so the BSC doesn’t apply. How can you name species according to who can mate with whom when the organisms do not mate at all?!
Scarlet and black oaks can hybridize and inhabit the same range, but they have different microhabitat preferences, so hybridization is rare.
These pines can also hybridize, but they shed their pollen at different times of the season.

Speciation by Hybridization
Hybridization often shows how difficult it is to apply the BSC to plants. The hybrid in this case is a new species: the rearrangements of its chromosomes make it infertile with either parent.

As the climate becomes drier, a desert splits the range of this hypothetical tree species. This reduces gene flow between the now-isolated populations and sets the stage for speciation.
Allopatric speciation: evolution of species that are geographically separated. Genetic drift plays a significant role.
Parapatric speciation: an "edge effect" in which reproductive barriers evolve between neighboring populations. Requires considerable selection pressure.
Sympatric speciation: establishment of a new population with a different ecological niche within the same geographical range as the parental population.

Uncovering Evolutionary History
Taxonomy vs. Systematics
Estimating phylogeny: distance methods, maximum parsimony methods, maximum likelihood methods.

Taxonomy vs. Systematics
Taxonomy: discovering, describing, naming, classifying.
Systematics: figuring out the evolutionary relationships of species; summarizing the evolutionary history of a group.

Plant Taxonomy
Taxon: any group at any rank.
Kingdom: Plantae (Viridiplantae)
Division (phylum): Anthophyta
Class: Liliopsida
Order: Commelinales
Family: Poaceae
Genus: Zea (always capitalized)
Species: Zea mays (the species epithet is never capitalized)
Common name: corn

Plant Systematics
A phylogenetic tree is used to illustrate systematic relationships. Modern taxonomic groups generally correspond to clades on a phylogenetic tree (i.e., a cladogram).
Example: phylogenetic tree of the grass family (Mathews et al. 2000, American Journal of Botany).
Angiosperm Phylogeny Group tree: "dicots" are not a monophyletic group.

Data Types that can be used to Estimate a Phylogeny
Cross compatibility: uses the Biological Species Concept.
Morphological: continuous traits; meristic (countable) traits.
Cytological: chromosome number; chromosome features; pairing in hybrids.
Molecular data: secondary chemicals; proteins; DNA.
DNA can be used as allele frequencies at many loci (isozymes, SSRs), as DNA sequences considered as a whole, or as DNA sequences considered site by site.

Maximum Parsimony (Minimum Evolution) Methods
Parsimony prefers the pathway that requires the smallest number of mutational events. It is most effective when examining sequences with strong similarity.
Underlying premises: mutations are exceedingly rare events, and the more unlikely events a model invokes, the less likely the model is to be correct.

Species     trait1  trait2  trait3  trait4  trait5
Species 1   0       1.2     red     A       T
Species 2   0       3.4     blue    G       C
Species 3   1       3.5     red     A       T
Species 4   1       4.0     red     A       T
Species 5   1       2.8     blue    G       T

Traits must have discrete character states, and a trait is informative only if at least two of its states are each shared by two or more taxa. Trait 2 (continuous) and trait 5 (which varies only in species 2) are therefore not informative.
Using only trait 1, the tree separates species 1 and 2 from species 3, 4, and 5, with a single change (0<->1) between them. But traits 3 and 4 disagree with trait 1.
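Scoring a candidate tree by parsimony can be done one character at a time with Fitch's small-parsimony algorithm, a standard technique the lecture does not name. The minimal Python sketch below assumes rooted binary trees written as nested tuples (root placement does not change a Fitch score) and uses the character table above without trait 2; the function names are illustrative. It checks which traits are informative and counts the changes each tree requires.

```python
from collections import Counter

# Character table from the example (trait 2 is continuous, so it is omitted).
TABLE = {
    "sp1": {"t1": "0", "t3": "red", "t4": "A", "t5": "T"},
    "sp2": {"t1": "0", "t3": "blue", "t4": "G", "t5": "C"},
    "sp3": {"t1": "1", "t3": "red", "t4": "A", "t5": "T"},
    "sp4": {"t1": "1", "t3": "red", "t4": "A", "t5": "T"},
    "sp5": {"t1": "1", "t3": "blue", "t4": "G", "t5": "T"},
}


def is_informative(states):
    """Parsimony-informative: at least two states, each seen in two or more taxa."""
    return sum(1 for n in Counter(states).values() if n >= 2) >= 2


def fitch(tree, state):
    """Return (possible ancestral states, minimum changes) for a subtree.

    tree: a leaf name or a (left, right) tuple; state: leaf name -> character state.
    """
    if isinstance(tree, str):
        return {state[tree]}, 0
    left_set, left_cost = fitch(tree[0], state)
    right_set, right_cost = fitch(tree[1], state)
    shared = left_set & right_set
    if shared:  # children can agree: no extra change needed
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change needed


def parsimony_score(tree, table, traits):
    """Total number of changes the tree requires over the given traits."""
    return sum(fitch(tree, {sp: row[t] for sp, row in table.items()})[1] for t in traits)


informative = [t for t in ("t1", "t3", "t4", "t5")
               if is_informative([row[t] for row in TABLE.values()])]
tree_a = (("sp2", "sp5"), ("sp1", ("sp3", "sp4")))  # groups species 2 with 5
tree_b = (("sp1", "sp2"), ("sp3", ("sp4", "sp5")))  # groups species 1 with 2
print(informative)                                  # ['t1', 't3', 't4']
print(parsimony_score(tree_a, TABLE, informative))  # 4
print(parsimony_score(tree_b, TABLE, informative))  # 5
```

A full search would call `parsimony_score` on every possible topology and keep the lowest score, which is why exhaustive parsimony becomes computationally intensive as taxa are added.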
Traits 3 and 4 instead group species 2 with species 5 (blue, G) and species 1, 3, and 4 together (red, A).
Every possible tree is considered individually for each informative site (computationally intensive). After all informative sites have been considered, the tree that invokes the smallest total number of substitutions is the most parsimonious.
[Figure: two candidate trees scored on traits 1, 3, and 4]
A tree that groups species 2 with species 5 requires 4 substitutions (red<->blue and A<->G once each, 0<->1 twice). A tree that groups species 1 with species 2 requires 5 substitutions (0<->1 once, red<->blue and A<->G twice each). The first tree is more parsimonious.

Distance-based approaches
Compare each taxon to every other taxon to estimate a "distance matrix":
        Sp1    Sp2    Sp3    Sp4    Sp5
Sp1     0
Sp2     d12    0
Sp3     d13    d23    0
Sp4     d14    d24    d34    0
Sp5     d15    d25    d35    d45    0
Distances are then "clustered" to estimate a phylogenetic tree.

Example: DNA sequence considered as a whole (positions 1-50)
Sp1: GTGCTGCACG GCTCAGTATA GCATTTACCC TTCCATCTTC AGATCCTGAA
Sp2: ACGCTGCACG GCTCAGTGCG GTGCTTACCC TCCCATCTTC AGATCCTGAA
Sp3: GTGCTGCACG GCTCGGCGCA GCATTTACCC TCCCATCTTC AGATCCTATC
Sp4: GTATCACACG ACTCAGCGCA GCATTTGCCC TCCCGTCTTC AGATCCTAAA
Sp5: GTATCACATA GCTCAGCGCA GCATTTGCCC TCCCGTCTTC AGATCTAAAA

Distance = number of sites that differ:
        Sp1    Sp2    Sp3    Sp4    Sp5
Sp1     0
Sp2     9      0
Sp3     8      11     0
Sp4     12     15     10     0
Sp5     15     18     13     5      0

Example: UPGMA algorithm (Unweighted Pair-Group Method using Arithmetic means)
The smallest distance is identified and those two taxa are joined. The distance from the new group to every other taxon is recalculated as the average distance over the group's members, and the matrix is rewritten. This iteration is repeated until all taxa are joined.
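The distance matrix and the UPGMA steps can be reproduced from the five 50-bp sequences. The Python sketch below is a minimal illustration under stated assumptions: distance is the raw count of differing sites (no correction for multiple substitutions at one site), ties would be broken by list order, and cluster labels are plain strings. The function names (`differences`, `distance_matrix`, `upgma`) are illustrative.

```python
from itertools import combinations

# The five 50-bp sequences from the example, written in 10-base blocks.
SEQS = {
    "Sp1": "GTGCTGCACG" "GCTCAGTATA" "GCATTTACCC" "TTCCATCTTC" "AGATCCTGAA",
    "Sp2": "ACGCTGCACG" "GCTCAGTGCG" "GTGCTTACCC" "TCCCATCTTC" "AGATCCTGAA",
    "Sp3": "GTGCTGCACG" "GCTCGGCGCA" "GCATTTACCC" "TCCCATCTTC" "AGATCCTATC",
    "Sp4": "GTATCACACG" "ACTCAGCGCA" "GCATTTGCCC" "TCCCGTCTTC" "AGATCCTAAA",
    "Sp5": "GTATCACATA" "GCTCAGCGCA" "GCATTTGCCC" "TCCCGTCTTC" "AGATCTAAAA",
}


def differences(a: str, b: str) -> int:
    """Number of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Pairwise distances keyed by frozenset({name1, name2})."""
    return {frozenset(pair): differences(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}


def upgma(names, dist):
    """Cluster with UPGMA; return the merges as (cluster1, cluster2, height)."""
    d = dict(dist)
    size = {name: 1 for name in names}
    active = list(names)
    merges = []
    while len(active) > 1:
        # 1. Find the closest pair of clusters.
        a, b = min(combinations(active, 2), key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        merges.append((a, b, d[frozenset((a, b))] / 2))  # node height = distance / 2
        # 2. Distance to the new cluster = size-weighted average, which equals
        #    the mean over all pairs of original taxa.
        for c in active:
            if c not in (a, b):
                d[frozenset((merged, c))] = (size[a] * d[frozenset((a, c))]
                                             + size[b] * d[frozenset((b, c))]) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        active = [c for c in active if c not in (a, b)] + [merged]
    return merges


dist = distance_matrix(SEQS)
for a, b, height in upgma(list(SEQS), dist):
    print(f"join {a} + {b} at height {height:.2f}")
# join Sp4 + Sp5 at height 2.50
# join Sp1 + Sp3 at height 4.00
# join Sp2 + (Sp1,Sp3) at height 5.00
# join (Sp4,Sp5) + (Sp2,(Sp1,Sp3)) at height 6.92
```

Weighting the update by cluster size is what makes the method "unweighted": every original taxon contributes equally to a group's distance, no matter when it joined.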
Distance-based approaches (UPGMA, continued)
Step 1: the smallest distance is Sp4-Sp5 (5), so Sp4 and Sp5 join at height 5/2 = 2.5, and distances to the new group 4-5 are averaged:
        Sp1    Sp2    Sp3    4-5
Sp1     0
Sp2     9      0
Sp3     8      11     0
4-5     13.5   16.5   11.5   0
Step 2: the smallest distance is Sp1-Sp3 (8), so Sp1 and Sp3 join at height 4:
        Sp2    1-3    4-5
Sp2     0
1-3     10     0
4-5     16.5   12.5   0
Step 3: Sp2 joins group 1-3 (distance 10) at height 5:
        1-2-3  4-5
1-2-3   0
4-5     13.83  0
Step 4: groups 1-2-3 and 4-5 join at height 13.83/2 = 6.92. The distance 13.83 is the average of the six pairwise distances between the two groups: (12 + 15 + 15 + 18 + 10 + 13)/6 = 83/6.
[Figure: the resulting UPGMA tree, with Sp4 and Sp5 joined at 2.5, Sp1 and Sp3 at 4, and Sp2 at 5]

Maximum Likelihood Methods
Best suited for DNA and protein sequence data.
Requires a model of evolution: each nucleotide/amino acid substitution has an associated probability.
A function is derived to represent the likelihood of the data given the tree, branch lengths, and additional parameters. The function is maximized (equivalently, the negative log-likelihood is minimized).

Example alignment (four sequences, ten sites):
1: A C G C G T T G G G
2: A C G C G T T G G G
3: A C G C A A T G A A
4: A C A C A G G G A A

The example uses a nucleotide substitution matrix that distinguishes transitions from transversions. Along each branch: no change has probability ~1, a transition (A<->G, C<->T) 2 x 10^-6, and a transversion 10^-6. The base at the root (L0) has probability 0.25.
For one site and one assignment of bases to the internal nodes, the likelihood is the product of the root probability and the substitution probability along each of the six branches:
L(Tree 1) = L0 x L1 x L2 x L3 x L4 x L5 x L6 = 5 x 10^-13
L(Tree 2) = L0 x L1 x L2 x L3 x L4 x L5 x L6 = 1 x 10^-18
[Figure: Tree 1 and Tree 2 with bases assigned to the internal nodes and branch probabilities L0-L6]
Consider every possible base assignment to the internal nodes and calculate the likelihood of each. The likelihood of the tree at that site is the sum over all of these assignments.
Repeat for each site in the alignment; because sites are treated as independent, the site likelihoods are multiplied.
Repeat for each unrooted tree and choose the tree with the highest likelihood.

The Teosinte-Maize Story (6,000 to 10,000 years ago)
The practical side of sequence diversity: PLANT BREEDING!
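The brute-force procedure described above (sum over every base assignment to the internal nodes, then combine sites) can be sketched in Python. Assumptions, not from the lecture: the per-branch probabilities are fixed at the example's values (transition 2 x 10^-6, transversion 10^-6, no change otherwise) instead of depending on branch length, the root sits on the internal branch so each tree has six branches (L1 to L6), and the ten-site alignment is the one from the example written row by row. All names are illustrative.

```python
import math
from itertools import product

BASES = "ACGT"
TRANSITIONS = {frozenset("AG"), frozenset("CT")}
P_TRANSITION = 2e-6    # assumed per-branch probability of a transition
P_TRANSVERSION = 1e-6  # assumed per-branch probability of a transversion


def p_change(a: str, b: str) -> float:
    """Probability that base a becomes base b along one branch."""
    if a == b:
        # each base has one transition partner and two transversion partners
        return 1.0 - P_TRANSITION - 2 * P_TRANSVERSION
    return P_TRANSITION if frozenset((a, b)) in TRANSITIONS else P_TRANSVERSION


def site_likelihood(site: str, tree) -> float:
    """Likelihood of one alignment column on a four-taxon tree.

    site: four bases, one per sequence; tree: ((i, j), (k, l)) leaf indices.
    """
    (i, j), (k, l) = tree
    total = 0.0
    for root, u, v in product(BASES, repeat=3):  # every internal-node assignment
        total += (0.25                           # L0: base frequency at the root
                  * p_change(root, u) * p_change(root, v)
                  * p_change(u, site[i]) * p_change(u, site[j])
                  * p_change(v, site[k]) * p_change(v, site[l]))
    return total


def log_likelihood(alignment, tree) -> float:
    """Sites are treated as independent, so site log-likelihoods add up."""
    columns = ["".join(col) for col in zip(*alignment)]
    return sum(math.log(site_likelihood(col, tree)) for col in columns)


ALIGNMENT = ["ACGCGTTGGG", "ACGCGTTGGG", "ACGCAATGAA", "ACACAGGGAA"]
TREES = {"(1,2),(3,4)": ((0, 1), (2, 3)),
         "(1,3),(2,4)": ((0, 2), (1, 3)),
         "(1,4),(2,3)": ((0, 3), (1, 2))}

for name, tree in TREES.items():
    print(name, f"site 6 L = {site_likelihood('TTAG', tree):.2e}",
          f"log L = {log_likelihood(ALIGNMENT, tree):.2f}")
best = max(TREES, key=lambda name: log_likelihood(ALIGNMENT, TREES[name]))
print("Maximum-likelihood tree:", best)
```

Enumerating assignments is feasible only for tiny trees. Real maximum-likelihood programs derive branch-specific probabilities from a continuous-time substitution model and branch lengths, and use Felsenstein's pruning algorithm instead of brute-force enumeration.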
Sequence Diversity in Teosinte
Sequence Diversity in Maize
Selection During Domestication and Improvement

Sequence Diversity and Plant Breeding
Genetic diversity within a crop species is the raw material for current plant breeding.
Genetic diversity is the insurance policy that enables plant breeders to adapt crops to changing environments.

The Problem
To what degree is limited genetic diversity inhibiting genetic improvement in corn?
[Figure: U.S. corn yield (bushels per acre), 1866-1996, spanning the eras of open-pollinated varieties, double-cross hybrids, and single-cross hybrids]

Two Views of the Problem
"Most of the corn germplasm in use in the USA today is derived from mixtures of only two major races [out of ~300 races total] (Wallace and Brown, 1956). The simplest means of correcting this situation and of increasing the genetic diversity of this important crop is to introduce unrelated sources of germplasm." (Brown and Goodman, 1977, Races of Corn, in Corn and Corn Improvement)
[From a project comparing sequence diversity in 21 genes of nine U.S. inbred lines with 16 diverse maize landraces] "We found that our sample of [U.S.] inbreds contained a level of [SNP] diversity that was 77% the level of diversity in our landrace sample." (Tenaillon et al., 2001, PNAS, 98:9161-9166)

Sequence Diversity in Maize
How has selection shaped sequence diversity in maize?
Survey SNPs from ~1,800 genes in diverse maize and teosinte germplasm.
Screen 4,000 candidate genes for evidence of selection.
Practical goal: identify genes exhibiting selection during domestication, agronomic improvement, and local adaptation.
[Figure: allele frequencies from teosinte (through domestication) to landraces (through plant breeding) to modern inbreds, for an unselected gene, a domestication gene, and an improvement gene]

Can we develop genomic screens to identify genes that have undergone selection?
1. Invariant SSR approach
2. Direct sequencing approach

Invariant SSR Approach
What proportion of genomic sequences that have low allelic diversity among inbreds result from selection for domestication?
Contrast sequence diversity among teosintes, landraces, and inbreds.
Screening SSR primers against 12 inbred lines:
1,772 total SSRs
1,053 were polymorphic (Class I)
719 were invariant (Class II)

Invariant SSR Screening
470 invariant SSR primer sets:
321 monomorphic throughout
60 polymorphic in both exotics and teosintes
14 polymorphic only in exotics
75 polymorphic only in teosintes (Class II-E)
[Figure: SSR variation in teosinte (6), landraces (5), and U.S. inbreds, for non-Class II and Class II SSRs]
Vigouroux et al. 2002. PNAS 99: 9650

Analysis of Class II-E SSRs
31 Class I SSRs and 44 Class II-E SSRs were tested for selection (loss of diversity) in 44 teosinte and 45 landrace accessions.
0 Class I SSRs showed evidence of selection.
15 Class II-E SSRs showed evidence of selection.
Extrapolated back to the 1,772 total SSRs: "1.4% of genes have been selected."

Direct Sequencing Approach
Purpose: to develop a SNP resource for the maize community.
Result: a LOT of data!!!
[Figure: distribution of SNP haplotypes (patterns) for 470 maize unigenes in 14 maize lines; x-axis: number of haplotypes, from 1 (conserved) to 12 (diverse); y-axis: number of unigenes]
Mean number of haplotypes = 4.46; more than 80% of unigenes have 2 to 7 haplotypes.
For each gene, a few haplotypes account for much of the diversity.

Are genes with low inbred diversity enriched for domestication and improvement candidates? (Masanori Yamasaki, post-doc in McMullen Lab)
36 genes had no diversity among a 14-inbred set.
The same regions were sequenced in 16 landraces, 16 teosintes, and a Tripsacum dactyloides sample.
Inbreds, landraces, and teosintes were tested for selection by comparison with four neutral genes.

Selection Tests for 33 (of 36) Genes
5 genes were significant in both the inbreds and the landraces (evidence for domestication genes).
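Comparisons such as "inbreds retain 77% of landrace diversity" and tests for loss of diversity rest on a per-site diversity statistic. The lecture does not say which statistic each study used; as an assumption, the sketch below uses nucleotide diversity (pi, the average number of pairwise differences per site). The sequences are hypothetical toy data, not results from any of the studies cited above, and the function name is illustrative.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Nucleotide diversity (pi): mean number of pairwise differences per site.

    seqs: aligned sequences of equal length, one per sampled line.
    """
    if len(seqs) < 2:
        raise ValueError("need two or more sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / length


# Hypothetical toy data: one 10-bp region in four landraces and three inbreds.
landraces = ["ACGTACGTAC", "ACGTACGTTC", "ACCTACGTAC", "ACGTACCTAC"]
inbreds = ["ACGTACGTAC", "ACGTACGTTC", "ACGTACGTAC"]

pi_landraces = nucleotide_diversity(landraces)  # 9 differences / 6 pairs / 10 sites = 0.15
pi_inbreds = nucleotide_diversity(inbreds)      # 2 / 3 / 10, about 0.067
print(f"inbreds retain {pi_inbreds / pi_landraces:.0%} of landrace diversity")  # 44%
```

A gene under strong selection would show a ratio far below the genome-wide average, which is the "loss of diversity" signature the selection tests look for.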
7 genes were significant in the inbreds but not the landraces (evidence for improvement genes).
1 additional gene was classified as either a domestication or an improvement gene, depending on the test.
In total, 13 of 33 genes (39%)!!
Yamasaki et al. submitted

Selection on a Genomic Scale
Sequenced 774 maize unigenes in 14 maize inbreds and 16 teosinte accessions.
Tested for selection using coalescent simulations.
Result: 2-4% had experienced artificial selection.
Assuming 59,000 genes in maize: 59,000 x 2% = 1,180, or roughly 1,200 selected genes.
Wright et al. 2005 Science 308: 1310

Where are we going with this?
Before genomics, 11 genes had been identified as selected by population genetic approaches.
Sequencing ~1,000 genes has yielded ~50 novel candidates.
That leaves ~1,140 more to find!
We need:
1. to completely sequence the maize genome to identify ALL genes.
2. to resequence all remaining genes in multiple maize inbreds and teosinte accessions.

Signatures of Selection
If selected genes were important in past improvement, continued manipulation might contribute to future gain.
If selected genes suffered a loss of diversity because of selection, they are prime candidates for introgressive breeding from wild relatives.
Hypothesis: manipulation of the expression of domestication and improvement genes will alter key agronomic traits.

Selection for Amino Acid Content?
Four genes that show evidence of selection are involved in amino acid biosynthesis.
[Figure: amino acid composition of teosinte, landraces, and maize; each amino acid as % of total amino acids, and total amino acids as % of kernel weight]

Selection for Amino Acid Content?
Are there more genes in amino acid pathways that have been selected?
Sequenced 16 genes in 28 maize inbreds, 16 teosinte, and 2 Tripsacum.
Result: we found 4 genes that may have been selected during domestication/improvement.

The Ultimate Selection Project
[Figure: B73 (inbred line), B73 with a knockout in the selected gene, B73 with the teosinte allele of the selected gene, and teosinte]

Molecular Diversity
[Figure: aligned sequences of one region in B73, CO159, GT119, Tx501, Tx303, Mo17, Mp708, IHO, and T218, marking a conserved region, an insertion, a deletion, and SNPs]
SNP: single nucleotide polymorphism
InDel: insertion/deletion
SNPs and InDels are used as markers for genetic analysis.
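To make the SNP/InDel distinction concrete, the sketch below compares two aligned sequences and reports each single-base substitution as a SNP and each run of gap columns as an InDel. It is a toy illustration under stated assumptions: the two short sequences are hypothetical (not the B73/Mo17 region shown above), "-" marks an alignment gap, and `call_variants` is an illustrative name, not part of any genotyping pipeline.

```python
def call_variants(seq1: str, seq2: str):
    """List SNPs and InDels between two aligned sequences ('-' marks a gap).

    Returns (kind, start, end) tuples using 0-based alignment columns, end
    exclusive. A run of adjacent gap columns in the same sequence counts as
    one InDel event.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    i = 0
    while i < len(seq1):
        a, b = seq1[i], seq2[i]
        if a == "-" and b == "-":
            i += 1  # gap in both sequences: no difference
        elif a == "-" or b == "-":
            gap_in_first = a == "-"
            j = i
            # extend the run while the gap stays in the same sequence
            while (j < len(seq1) and (seq1[j] == "-") == gap_in_first
                   and (seq2[j] == "-") != gap_in_first):
                j += 1
            variants.append(("InDel", i, j))
            i = j
        else:
            if a != b:
                variants.append(("SNP", i, i + 1))  # single-base substitution
            i += 1
    return variants


line_a = "ACGTAC--GTTACG"
line_b = "ACGAACTTGT-ACG"
print(call_variants(line_a, line_b))
# [('SNP', 3, 4), ('InDel', 6, 8), ('InDel', 10, 11)]
```

Without an outgroup, a gap in one line cannot be labeled an insertion or a deletion, which is why such variants are usually reported together as InDels.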