Association Mapping in Wheat
Flavio Breseghello
Advisor
Association Mapping as a Breeding Strategy
Mark E. Sorrells and Flavio Breseghello Department of Plant Breeding & Genetics Cornell University
Presentation Overview
• A Genetic Model for Association Mapping in Plant Breeding Populations
• Comparison of Different Plant Breeding Materials for Association Mapping
• Association Mapping of Kernel Size and Milling Quality in Soft Winter Wheat Cultivars
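A Genetic Model for AM in Plant Breeding Populations: Association as Conditional Probabilities
Gene and marker loci are separated by recombination frequency $c$; population genetics theory (Hedrick 2005).
A new parent carrying haplotype (A,M) is introduced into a breeding pool in which the gene carries only allele a and the marker segregates for alleles m and M. Initial haplotype frequencies:
$\Pr(A,M)=\varphi$, $\Pr(a,M)=\theta$, $\Pr(a,m)=1-\varphi-\theta$, $\Pr(A,m)=0$
After $t$ generations, the quantity of interest is $\Pr(A \mid M, c, t, \varphi, \theta, w)$: "Probability of a plant with marker allele M to have gene allele A, t generations after the introduction of A".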
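Under the simplest reading of this model, with $w = 1$ (no selection) plus two assumptions the slide does not state, random mating and a large population, allele frequencies stay constant and only the linkage disequilibrium $D$ between A and M decays, by a factor $(1-c)$ per generation. A sketch of the resulting conditional probability:

$$p_A=\varphi,\qquad p_M=\varphi+\theta,\qquad D_0=\Pr(A,M)-p_A\,p_M=\varphi(1-\varphi-\theta)$$
$$D_t=D_0\,(1-c)^t$$
$$\Pr(A\mid M)=\frac{p_A\,p_M+D_t}{p_M}=\varphi+\frac{\varphi(1-\varphi-\theta)(1-c)^t}{\varphi+\theta}$$

At $t=0$ this reduces to $\varphi/(\varphi+\theta)$, so a marker allele that is new to the pool ($\theta=0$) starts out perfectly predictive, while a pre-existing one is diluted from the outset.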
Recombination x initial frequency of M in the breeding pool
Frequency of new parent: $\varphi=0.05$; relative fitness: $w=1$; frequency of M in the original population: $\theta$; recombination frequency: $c$.
A novel marker allele 10 cM from the QTL can be more predictive of the QTL allele than an allele 1 cM away, if the latter was already present in the original population at a frequency of 0.05.
[Figure: $\Pr(A\mid M)$ over 0-20 generations for $c$ = 0.10, 0.05, 0.01 and $\theta$ = 0, 0.05, 0.25; about 8 and about 18 generations are marked]
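The closed form for $\Pr(A\mid M)$ under $w=1$ can be evaluated directly to reproduce the comparison in this figure. This is a sketch that assumes random mating in a large population with no selection (not stated on the slide), so that LD decays as $D_t=D_0(1-c)^t$; the helper `first_generation_closer_wins` is hypothetical. With the slide's parameters, the 1 cM pre-existing allele ($\theta=0.05$) first overtakes the 10 cM novel allele at generation 8, and overtakes a 5 cM novel allele at generation 19 (the continuous crossing is near 18.1), which lines up with the values of about 8 and 18 marked on the figure.

```python
def pr_a_given_m(t, c, phi=0.05, theta=0.0):
    """Pr(A|M) t generations after a new parent with haplotype (A, M) is introduced.

    Assumes (not stated on the slide) w = 1, random mating and a large
    population: allele frequencies stay constant and LD decays as
    D_t = D_0 * (1 - c)**t.
    """
    p_a = phi                        # frequency of the new QTL allele A
    p_m = phi + theta                # frequency of marker allele M
    d0 = phi * (1.0 - phi - theta)   # initial LD: Pr(A,M) - p_A * p_M
    d_t = d0 * (1.0 - c) ** t
    return (p_a * p_m + d_t) / p_m


def first_generation_closer_wins(c_far=0.10, theta_far=0.0,
                                 c_near=0.01, theta_near=0.05, max_t=100):
    """Hypothetical helper: first generation at which the tightly linked but
    pre-existing marker allele predicts A better than the loosely linked novel one."""
    for t in range(max_t + 1):
        near = pr_a_given_m(t, c_near, theta=theta_near)
        far = pr_a_given_m(t, c_far, theta=theta_far)
        if near > far:
            return t
    return None


if __name__ == "__main__":
    for t in (0, 5, 10, 15, 20):
        far = pr_a_given_m(t, c=0.10, theta=0.0)
        near = pr_a_given_m(t, c=0.01, theta=0.05)
        print(f"t={t:2d}  10 cM, theta=0: {far:.3f}   1 cM, theta=0.05: {near:.3f}")
    print("1 cM allele overtakes 10 cM allele at generation", first_generation_closer_wins())
```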
Recombination x selection for M
• The generation at which the marker is depleted [$\Pr(A\mid M)=\Pr(A)$] depends on the selection intensity applied;
• The final frequency of A depends on selection and on the tightness of linkage between marker and gene.
Frequency of new parent: $\varphi=0.05$; relative fitness: $w$ = 2 (green), 4 (red), 1.25 (blue); frequency of M in the original population: $\theta=0$; recombination frequency: $c$ = 0.01, 0.05, 0.10.
[Figure: $\Pr(A\mid M)$ and $\Pr(A)$ vs. generations]
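Both bullet points can be reproduced with a small deterministic simulation. The selection scheme here is an assumption for illustration, not the authors' stated model: haplotypes carrying marker allele M get relative fitness $w$ (haploid selection on the marker), followed by recombination at rate $c$ under random union of gametes. As M approaches fixation, $\Pr(A\mid M)$ converges to $\Pr(A)$, and the amount of A carried along depends on both $w$ and $c$.

```python
def simulate_marker_selection(w, c, phi=0.05, theta=0.0, generations=30):
    """Track Pr(A|M) and Pr(A) when plants carrying marker allele M are selected.

    Hypothetical sketch: haploid selection where haplotypes with M have
    relative fitness w (all others 1), then recombination at rate c.
    Returns a list of (generation, Pr(A|M), Pr(A)).
    """
    # haplotype frequencies keyed by (gene allele, marker allele)
    x = {("A", "M"): phi, ("A", "m"): 0.0,
         ("a", "M"): theta, ("a", "m"): 1.0 - phi - theta}
    history = []
    for t in range(generations + 1):
        p_m = x[("A", "M")] + x[("a", "M")]
        p_a = x[("A", "M")] + x[("A", "m")]
        history.append((t, x[("A", "M")] / p_m, p_a))
        # selection on the marker: M-carrying haplotypes are weighted by w
        weighted = {h: f * (w if h[1] == "M" else 1.0) for h, f in x.items()}
        total = sum(weighted.values())
        x = {h: f / total for h, f in weighted.items()}
        # recombination: LD is reduced by a factor (1 - c) each generation
        d = x[("A", "M")] * x[("a", "m")] - x[("A", "m")] * x[("a", "M")]
        x[("A", "M")] -= c * d
        x[("a", "m")] -= c * d
        x[("A", "m")] += c * d
        x[("a", "M")] += c * d
    return history


if __name__ == "__main__":
    for w in (1.25, 2.0, 4.0):
        for c in (0.01, 0.05, 0.10):
            t, pam, pa = simulate_marker_selection(w, c)[-1]
            print(f"w={w:<4} c={c:<4}  gen {t}: Pr(A|M)={pam:.3f}  Pr(A)={pa:.3f}")
```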
Summary
• In plant breeding populations, the locus most associated with the trait is not necessarily the closest locus; • Loosely linked markers can still be useful for MAS if high intensity of selection is applied.
MAS for Complex Traits: Issues
• Accurate detection and estimation of QTL effects
• Pre-existing marker alleles in a breeding population can be linked to non-target QTL alleles
• Multiple QTL alleles can have different relative values
• Gene × gene and gene × environment interactions
Association Analysis as a Breeding Strategy
• Most association studies have focused on estimating linkage disequilibrium and fine mapping.
• Breeding programs are dynamic, complex genetic entities that require frequent evaluation of marker/phenotype relationships.
Breseghello, F., and M.E. Sorrells. 2006. Association mapping of kernel size and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics 172:1165-1177.
Breseghello, F., and M.E. Sorrells. 2006. Association analysis as a strategy for improvement of quantitative traits in plants. Crop Sci. In press.
Association Mapping versus QTL Mapping
• Association Mapping can be conducted directly on the breeding material, therefore:
  • Direct inference from data analysis to breeding is possible
  • Phenotypic variation is observed for most traits of interest
  • Marker polymorphism is higher than in biparental populations
  • Routine variety trial evaluations provide phenotypic data
• Association Mapping provides other useful information about:
  • Organization of genetic variation in relevant breeding populations
  • Novel alleles, which can be identified and their relative value assessed as often as necessary
Association Mapping versus QTL Mapping
• Type I error (false positives) can be higher because of:
  • Unaccounted-for population structure
  • Simultaneous selection of combinations of alleles at different loci
  • High sampling variance of rare alleles
• Type II error (false negatives) can be higher (low power) because of:
  • Lower LD than in biparental mapping populations
  • Unbalanced design due to differences in allele frequencies
  • A larger multiple-testing problem because of lower LD
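Lower LD means more markers must be tested to cover the genome, and each extra test tightens the per-test threshold needed to keep false positives under control. The slides do not say which correction was used in this study; the sketch below uses the Bonferroni correction only because it is the simplest illustration of how the threshold scales with the number of tests.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value that passes the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # The same family-wise alpha demands far smaller p-values as tests multiply.
    for n in (10, 100, 1000):
        print(n, "tests -> per-test threshold", bonferroni_threshold(0.05, n))
```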
Integration of Association Analysis in a Breeding Program
[Diagram: a breeding cycle linking Germplasm, Parental Selection, Hybridization, New Populations, Selection (Intermating), Marker Assisted Selection, New Synthetics, Lines, Varieties, Evaluation Trials, Genotypic & Phenotypic data, Novel & Validated QTL/Marker Associations, and Elite Synthetics, Lines, Varieties; elite germplasm feeds back into the hybridization nursery]
Types of Populations
• Germplasm Bank Collection
  • A collection of genetic resources including landraces, exotic material and wild relatives.
• Synthetic Populations
  • Outcrossing populations (using either male sterility or manual crossing) synthesized from inbred lines. May be used for recurrent selection.
• Elite Lines
  • Inbred lines (and checks) developed with the objective of releasing new varieties in the short term.
Characteristics Related to Association Mapping: Practical Aspects
Aspects of AM | Germplasm bank | Synthetic Populations | Elite Germplasm
Sample | Core-collection | Segregating progenies | Elite lines and checks
Sample turnover | Static | Ephemeral | Gradually substituted
Source of phenotypic data | Screenings | Progeny tests | Yield trials
Type of traits | High heritability traits; domestication traits | Depends on the evaluation scheme | Low heritability traits: yield, resistance to abiotic stresses
Characteristics Related to Association Mapping: Genetic Expectations
Aspects of AM
Linkage Disequilibrium Population structure Allele diversity among samples Allele diversity within samples
Germplasm bank Synthetic Populations
Low Intermediate and fast-decaying Medium Low
Elite Germplasm
High High High Variable Intermediate 1 or 2 alleles (diploid species) Low 1 allele (inbred lines)
Characteristics Related to Association Mapping: Potential Applications
Aspects
Power Resolution
Germplasm bank
Low
Synthetic Populations
Intermediate and decreasing High; could allow fine mapping Intermediate and increasing
Elite Germplasm
High; could allow genome scan Low Use of significant markers Transfer of new alleles by marker-assisted backcross Incorporation in selection index Forward Breeding MAS in progenies (requires validation)
Previous QTL information
• Doubled-Haploid Population AC Reed x Grandin
  • QTL for kernel size (width) near Xwmc18 on 2D
• Recombinant Inbred Population Synthetic W7984 x Opata (ITMI population)
  • QTL for kernel size (length) on 5A and 5B
[Figures: QTL profiles for width on 2D and length on 5B]
Association Analysis
Materials
• 95/149 soft winter wheat cultivars from the Northeastern US:
  • representing 35 seed companies/institutions
  • mostly recent releases
• 93 SSR loci:
  • 33 on 2D, 20 on 5A, 9 on 5B, 31 on 16 other chromosomes
  • Rare alleles (freq < 5%): considered as missing for LD and population structure analysis; considered as alleles for AM analysis
Methods
• Population structure: 36 "unlinked" SSR markers
  • Structure, without admixture
  • SPAGeDi (Hardy & Vekemans) program for kinship
  • Visualization: Factorial (Multiple) Correspondence Analysis (Benzécri, 1973. L'Analyse des correspondances. Dunod)
• Linkage disequilibrium: Tassel (maizegenetics.net) used to compute r², with p-values from 1000 permutations
• Association analysis: R function lme (package nlme) used to fit a linear mixed-effects model with markers as fixed effects (selected from previously identified QTL regions) and subpopulations or kinship as random effects (no obvious differentiating characteristics); two-marker models tested by likelihood ratio test
Jianming Yu, Gael Pressoir, et al. (2006) A Unified Mixed-Model Method for Association Mapping Accounting for Multiple Levels of Relatedness. Nature Genetics 38:203-208
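The methods name the model but not its matrices. A sketch of the general form, following the unified mixed model of Yu et al. (2006) and assuming normal random effects (the exact covariance structures used in this study are not given here):

$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u}+\mathbf{e},\qquad \mathbf{e}\sim N(\mathbf{0},\,\mathbf{I}\sigma^2_e)$$
$$\text{kinship as random effect: }\mathbf{u}\sim N(\mathbf{0},\,2\mathbf{K}\sigma^2_g)\qquad\text{subpopulations as random effect: }\mathbf{u}\sim N(\mathbf{0},\,\mathbf{I}\sigma^2_s)$$

Here $\mathbf{y}$ holds the phenotypes, $\boldsymbol{\beta}$ the marker (fixed) effects with design matrix $\mathbf{X}$, and $\mathbf{Z}$ assigns cultivars to the levels of $\mathbf{u}$. For the two-marker models, the likelihood ratio statistic compares the fit with both markers against the fit with one:

$$\Lambda=2\left[\ell(\text{two markers})-\ell(\text{one marker})\right]\;\overset{\text{approx.}}{\sim}\;\chi^2_{d}$$

where $d$ is the difference in the number of fixed-effect parameters (for an added SSR with $k$ alleles, $d=k-1$). Because the two models differ in their fixed effects, both should be fitted by maximum likelihood rather than REML for this comparison.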
Estimating Relatedness: The K Matrix
$$F_{ij}=\frac{Q_{ij}-Q_m}{1-Q_m}\qquad\text{(Ritland; Loiselle)}$$
$$[\Theta_{ij}]\cong\mathbf{K}=\begin{pmatrix}F_{11}&\cdots&F_{1n}\\ \vdots&F_{ij}&\vdots\\ F_{n1}&\cdots&F_{nn}\end{pmatrix}$$
If $F_{ij}$ is negative, it is set to zero.
Relatedness (K): in cattle studies the analogous matrix is estimated from pedigrees, and it controls for the polygene effect.
Jianming Yu, Gael Pressoir, et al. (2006) A Unified Mixed-Model Method for Association Mapping Accounting for Multiple Levels of Relatedness. Nature Genetics 38:203-208
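The slide gives the rescaling but not how $Q_{ij}$ is obtained; SPAGeDi implements Ritland's and Loiselle's estimators, which weight allele sharing by allele frequency. The sketch below swaps in a much simpler, hypothetical stand-in: $Q_{ij}$ is the proportion of loci at which two inbred lines carry the same SSR allele, and $Q_m$ is taken as the mean of the off-diagonal $Q_{ij}$. It then applies $F_{ij}=(Q_{ij}-Q_m)/(1-Q_m)$ and sets negative values to zero, as on the slide.

```python
def ibs_matrix(genotypes):
    """Q[i][j]: fraction of jointly scored loci where lines i and j share an allele.

    genotypes: list of per-line allele lists (inbred lines, one allele per
    locus); None marks missing data and is skipped pairwise.
    """
    n = len(genotypes)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            pairs = [(a, b) for a, b in zip(genotypes[i], genotypes[j])
                     if a is not None and b is not None]
            q[i][j] = sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0
    return q


def kinship_matrix(genotypes):
    """K with F_ij = (Q_ij - Q_m) / (1 - Q_m), negative values set to zero.

    Hypothetical simplification: Q_m is the mean off-diagonal Q_ij (needs n >= 2).
    """
    q = ibs_matrix(genotypes)
    n = len(q)
    off_diag = [q[i][j] for i in range(n) for j in range(n) if i != j]
    q_m = sum(off_diag) / len(off_diag)
    if q_m >= 1.0:
        raise ValueError("all lines identical at every locus; K is undefined")
    return [[max((q[i][j] - q_m) / (1.0 - q_m), 0.0) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    lines = [["a", "x"], ["a", "x"], ["b", "y"]]   # toy SSR alleles, 3 lines x 2 loci
    for row in kinship_matrix(lines):
        print(row)
```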
Population Structure: Sample Subdivisions
Subpopulation | No. of Varieties | Fst
1 | 19 | 0.337
2 | 32 | 0.111
3 | 13 | 0.295
4 | 31 | 0.064
Total | 95 | 0.188
Moderate population subdivision
Population Structure: Factorial Correspondence Analysis
[Figure: orthogonal views of 4 soft winter wheat subpopulations (S1-S4)]
Linkage Disequilibrium: Germplasm Sample Selection
R² probability for unlinked SSR markers: 149 lines were genotyped with 18 unlinked SSR markers, and the most similar lines were excluded, leaving 95 lines.
• "Normalizing" the sample drastically reduced LD among unlinked markers.
[Figure: significance of r² among unlinked markers (p<.0001, p<.001, p<.01) for 149 lines vs. 95 lines]
Definition of a baseline LD specific to our sample
Defined as the 95th percentile of the distribution of r² among unlinked loci; r² estimates above this value are probably due to genetic linkage.
Baseline LD for this sample: r² = 0.0654
[Figure: distribution of the correlation coefficient r (0.0-0.4) among unlinked loci, with a fitted normal curve and its 95th percentile marked as the LD baseline]
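A sketch of the three LD calculations described in this section: $r^2$ between two loci, a permutation p-value from 1000 shuffles, and a baseline taken as the 95th percentile of $r^2$ among unlinked loci. Assumptions: each locus is coded 0/1 per line (for example, presence of one SSR allele), a simplification of how Tassel handles multiallelic markers; `baseline_normal` fits a normal curve to $r=\sqrt{r^2}$, as the figure's axis labels suggest, while `baseline_empirical` takes the percentile directly from the data. All function names are hypothetical.

```python
import math
import random
import statistics


def r_squared(x, y):
    """Squared correlation between two loci, each coded 0/1 per line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return sxy * sxy / (sxx * syy)


def permutation_p_value(x, y, n_perm=1000, seed=1):
    """Share of shuffles of y (across lines) whose r^2 reaches the observed r^2."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def baseline_empirical(unlinked_r2, percentile=95):
    """Nearest-rank percentile of r^2 among pairs of unlinked loci."""
    values = sorted(unlinked_r2)
    k = math.ceil(percentile * len(values) / 100)
    return values[max(k, 1) - 1]


def baseline_normal(unlinked_r2, percentile=95):
    """Fit a normal curve to r = sqrt(r^2) and square its upper percentile."""
    r = [math.sqrt(v) for v in unlinked_r2]
    dist = statistics.NormalDist(statistics.mean(r), statistics.stdev(r))
    return dist.inv_cdf(percentile / 100) ** 2
```

Pairs of linked loci whose $r^2$ exceeds the chosen baseline would then be read as LD due to linkage.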
Linkage Disequilibrium: Chromosome 2D
Consistent LD extended less than 1 cM; localized LD extended 1-5 cM.
[Figure: LD along chromosome 2D (0-100 cM) relative to the baseline LD]
Linkage Disequilibrium: Chromosome 5A
Significant LD extended for 5 cM in the pericentromeric region.
[Figure: LD along chromosome 5A (0-50 cM) relative to the baseline LD]
Loci Associated with Kernel Size (p-values)
Chromosome 2D Agreed with QTL in Reed x Grandin
Kernel Size
Locus Weight Area Length Width cM Name NY OH NY OH NY OH NY OH 7 11
Xcfd56 Xwmc111
0.069
0.005
0.160
0.020
0.012
0.005
0.119
0.108
0.076
0.003’ 0.031
0.107
0.000* 0.000* 0.252
0.000** 23
Xgwm261
28
Xwmc112
64
Xgwm30
91
Xgwm539
0.145
0.012
0.081
0.042
0.016
0.057
0.862
0.038
0.019
0.047
0.053
0.030
0.009
0.120
0.848
0.027
0.480
0.312
0.039
0.001* 0.009
0.058
0.367
0.001* 0.820
0.000** 0.005
0.290
0.001* 0.024
0.212
0.334
Milling Quality
None of the loci on 2D were significant after multiple testing correction
n.s.
Loci Associated with Kernel Size (p-values): Chromosome 5A
Agreed with QTL in M6 x Opata
Kernel Size
cM | Locus | Weight NY | Weight OH | Area NY | Area OH | Length NY | Length OH | Width NY | Width OH
55 | Xcfa2250 | 0.021 | 0.007 | 0.044 | 0.014 | 0.014 | 0.002* | 0.637 | 0.649
55 | Xwmc150b | 0.002* | 0.003 | 0.003 | 0.005 | 0.009 | 0.002* | 0.093 | 0.429
56 | Xbarc117 | 0.009 | 0.002* | 0.021 | 0.005 | 0.118 | 0.022 | 0.044 | 0.039
60 | Xbarc141 | | | | | | | |
Milling Quality
cM Locus 55
Xcfa2250
0.631
0.037
0.232
Milling Score 0.010
Flour Yield 0.029
0.024
0.038
ESI 0.047
0.002* 0.852
Friability 0.002* 0.863
Break-Flour Yield 0.081
B.L.U.E. of allele effects Kernel Length
N. of Cultivars: 9 5 18 37 9 9 41 45 43 49
B.L.U.E. of allele effects Kernel Width
N. of Cultivars: 41 14 8 15 18 24 5 10 19
B.L.U.E of allele effects Kernel Weight
N. of Cultivars: 41 45 43 49
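In a mixed model of the kind used here, the best linear unbiased estimator (BLUE) of the fixed allele effects is the generalized least squares solution. A sketch, assuming $\mathbf{X}$ has full column rank and writing $\mathbf{V}$ for the phenotypic covariance implied by the random effects (in practice built from estimated variance components, which makes the result an empirical BLUE):

$$\mathbf{V}=\mathbf{Z}\mathbf{G}\mathbf{Z}^{\top}+\mathbf{R},\qquad \hat{\boldsymbol{\beta}}=\left(\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{y},\qquad \operatorname{Var}(\hat{\boldsymbol{\beta}})=\left(\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X}\right)^{-1}$$

Allele classes carried by few cultivars contribute little information to $\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X}$, so their estimated effects have correspondingly larger sampling variances.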
Conclusions
• Linkage Disequilibrium
  • Variation in LD across the genome can be characterized in relevant germplasm
  • Markers closely linked to QTL of interest can be identified and allelic effects quantified
• Association Mapping as a Breeding Strategy
  • For recurrent selection, markers could be used to carry information from a “good year” to a “bad year”
  • In pedigree breeding, markers could carry information about traits of interest from replicated field trials to single-row or single-plant selection
  • Allelic values of previously identified alleles can be updated annually based on advanced trial data combined with genotypic data
  • New alleles can be identified and characterized to determine their relative value
  • A selection index can be used to incorporate both phenotypic and molecular data
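One simple way to combine phenotypic and molecular data in an index is sketched below, under stated assumptions: a line's molecular score is the sum of estimated allele effects (for example, the BLUEs of allele effects) at the marker loci it carries, and the index is a linear combination $I=b_P P+b_M M$. The weights $b_P$ and $b_M$, the locus and allele names, and all function names are hypothetical; in practice the weights would be derived from variance and covariance estimates rather than set by hand.

```python
def molecular_score(line_alleles, allele_effects):
    """Sum of estimated allele effects over the marker loci scored on a line.

    line_alleles: {locus: allele}; allele_effects: {(locus, allele): effect}.
    Alleles without an estimate (e.g. rare ones) contribute 0.
    """
    return sum(allele_effects.get((locus, allele), 0.0)
               for locus, allele in line_alleles.items())


def selection_index(phenotype, line_alleles, allele_effects, b_pheno, b_marker):
    """Linear index I = b_pheno * P + b_marker * M with user-supplied weights."""
    return b_pheno * phenotype + b_marker * molecular_score(line_alleles, allele_effects)


def rank_lines(lines, allele_effects, b_pheno=1.0, b_marker=1.0):
    """Line names sorted from highest to lowest index.

    lines: {name: (phenotype, {locus: allele})}
    """
    scores = {name: selection_index(p, alleles, allele_effects, b_pheno, b_marker)
              for name, (p, alleles) in lines.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    effects = {("locus_1", "a"): 0.5, ("locus_2", "b"): -0.2}   # hypothetical effects
    lines = {"line_1": (10.0, {"locus_1": "a", "locus_2": "b"}),
             "line_2": (10.2, {"locus_1": "c", "locus_2": "b"})}
    print(rank_lines(lines, effects))
```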
Acknowledgements
• USDA Soft Wheat Quality Lab, Wooster, OH
• Embrapa
Technical Support:
• David Benscher
• James Tanaka
• Gretchen Salm
Kangaroo Island Wayne Powell