Linkage Disequilibrium and Association Mapping: Issues & Opportunities for the Triticeae

Mark E. Sorrells and Flavio Breseghello
Department of Plant Breeding & Genetics
Cornell University
Overview
• Part I: A Genetic Model for Association Mapping in Plant
Breeding Populations
• Part II: Comparison of Different Plant Breeding Materials for
Association Mapping
• Part III: Association Mapping of Kernel Size and Milling
Quality in Soft Winter Wheat Cultivars
A Definition of Association Mapping
“Association analysis, also known as LD mapping or
association mapping, is a population-based survey
used to identify trait-marker relationships based on
linkage disequilibrium”
(Flint-Garcia et al. 2003)
Association Mapping as a Plant Breeding Strategy:
AM versus QTL Mapping
• Association Mapping can be conducted directly on the breeding
material, therefore:
• Direct inference from research to breeding is possible
• Phenotypic variation is observed for most traits of interest
• Marker polymorphism is higher than in biparental populations
• Routine evaluations provide phenotypic data
• Association Mapping provides other useful information about:
• Organization of genetic variation
• Polymorphism across the genome
Association Mapping as a Plant Breeding Strategy:
AM versus QTL Mapping
• Type I error (false positives) can be higher because of:
• Unaccounted population structure
• Simultaneous selection of combinations of alleles at different loci
• High sampling variance of rare alleles
• Type II error can be higher (low power) because of:
• Lower LD than in mapping populations
• Unbalanced design due to differences in allele frequencies
• Serious multiple-testing problem
A Genetic Model for AM in Plant Breeding Populations:
Association as Conditional Probabilities
Population genetics theory (Hedrick 2005)
• Breeding Pool: Gene={a}, Marker={m,M}
• New Parent: (A,M)
• Gene and Marker are separated by recombination frequency c and followed for t generations
Initial haplotype frequencies:
Pr(A,M)=φ
Pr(a,M)=θ
Pr(a,m)=1-φ-θ
Pr(A,m)=0
Pr(A|M,c,t,φ,θ,w)
“Probability of a plant with marker allele M to have gene allele A, t generations after the introduction of A”
Recombination x initial frequency of M in the breeding pool
Freq. new parent: φ=0.05
Relative fitness: w=1
Freq. Recombination: c = 0.01, 0.05, 0.10
Freq. M from original pop: θ = 0, 0.05, 0.25
[Figure: Pr(A|M) versus t generations (0 to 20) for the values of c and θ above; generations ~8 and ~18 are marked on the axis]
• A novel marker allele at 10 cM distance can be more predictive of the QTL allele than one at 1 cM distance that was present in the original population at a frequency of 0.05
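As a worked sketch of the case w=1, assume random mating, no drift, and the standard decay of the linkage disequilibrium coefficient D by a factor (1-c) per generation. These assumptions are made for illustration and are not stated on the slides. Allele frequencies then stay constant, and Pr(A|M) has a closed form:

```latex
\begin{align*}
D_0 &= \Pr(A,M)\,\Pr(a,m) - \Pr(A,m)\,\Pr(a,M) = \varphi\,(1-\varphi-\theta) \\
D_t &= D_0\,(1-c)^t \\
\Pr(A \mid M) &= \frac{\varphi\,(\varphi+\theta) + D_t}{\varphi+\theta}
              = \varphi + \frac{\varphi\,(1-\varphi-\theta)}{\varphi+\theta}\,(1-c)^t
\end{align*}
```

With φ=0.05, a novel marker (θ=0) at c=0.10 gives Pr(A|M) = 0.05 + 0.95(0.90)^t, while a marker at c=0.01 with θ=0.05 gives 0.05 + 0.45(0.99)^t. Under these assumptions the novel marker stays more predictive until about generation 8 (and until about generation 18 for c=0.05), which is consistent with the ~8 and ~18 marks on the generations axis.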
Recombination x selection for M
• The generation at which the marker is depleted
depends on the selection intensity applied;
• The final frequency of A depends on selection and
tightness of linkage between marker and gene.
Freq. new parent: φ=0.05
Relative fitness: w = 4 (red), 2 (green), 1.25 (blue)
Freq. M from original pop: θ=0
Freq. Recombination: c = 0.01, 0.05, 0.10
[Figure: Pr(A|M) and Pr(A) versus generations]
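The two statements on this slide can be explored with a minimal deterministic sketch. It assumes random mating, an infinite population, and gametic selection in which every haplotype carrying M has relative fitness w. The slides do not spell out the selection model, so this scheme and the function name simulate_haplotypes are illustrative assumptions, not the authors' method.

```python
def simulate_haplotypes(phi, theta, c, w, generations):
    """Track Pr(A|M) and Pr(A) after a new parent (A,M) enters the pool.

    Assumed model (not taken from the slides): random mating, infinite
    population, and gametic selection with relative fitness w for every
    haplotype that carries marker allele M.
    """
    # Initial haplotype frequencies: Pr(A,M), Pr(a,M), Pr(a,m), Pr(A,m).
    x_AM, x_aM, x_am, x_Am = phi, theta, 1.0 - phi - theta, 0.0
    pr_a_given_m, pr_a = [], []
    for _ in range(generations + 1):
        pr_a_given_m.append(x_AM / (x_AM + x_aM))
        pr_a.append(x_AM + x_Am)
        # Recombination: disequilibrium D shrinks by a factor (1 - c).
        d = x_AM * x_am - x_Am * x_aM
        x_AM, x_am = x_AM - c * d, x_am - c * d
        x_Am, x_aM = x_Am + c * d, x_aM + c * d
        # Selection: weight M-bearing haplotypes by w, then renormalize.
        mean_w = w * (x_AM + x_aM) + x_am + x_Am
        x_AM, x_aM = w * x_AM / mean_w, w * x_aM / mean_w
        x_am, x_Am = x_am / mean_w, x_Am / mean_w
    return pr_a_given_m, pr_a


if __name__ == "__main__":
    # Parameter grid taken from the slide: phi = 0.05, theta = 0.
    for w in (4, 2, 1.25):
        for c in (0.01, 0.05, 0.10):
            cond, freq = simulate_haplotypes(0.05, 0.0, c, w, 20)
            print(f"w={w} c={c}: Pr(A|M)={cond[-1]:.3f} Pr(A)={freq[-1]:.3f}")
```

In this sketch, for a given w, tighter linkage (smaller c) leaves a higher final frequency of A, in line with the second bullet of the slide.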
Summary Part I
• In plant breeding populations, the locus most
associated with the trait is not necessarily the
closest locus;
• Loosely linked markers can still be useful for MAS if
high intensity of selection is applied.
Part II: Comparison of Different Plant Breeding Materials for Association Mapping
Types of Populations
• Germplasm Bank Collection
• A collection of genetic resources including landraces, exotic
material and wild relatives.
• Synthetic Populations
• Outcrossing populations (either male-sterile or manually crossed)
synthesized from inbred lines. May be used for recurrent selection.
• Elite Lines
• Inbred lines (and checks) manipulated with the objective of
releasing new varieties in the short term.
Characteristics Related to Association Mapping:
Practical aspects
Aspects of AM | Germplasm bank | Synthetic Populations | Elite Germplasm
Sample | Core-collection | Segregating progenies | Elite lines and checks
Sample turnover | Static | Ephemeral | Gradually substituted
Source of phenotypic data | Screenings | Progeny tests | Yield trials
Type of traits | High heritability traits; Domestication traits | Depends on the evaluation scheme | Low heritability traits: yield, resistance to abiotic stresses
Type of marker | SNP | SSR / SNP | SSR
Characteristics Related to Association Mapping:
Genetic Expectations
Aspects of AM | Germplasm bank | Synthetic Populations | Elite Germplasm
Linkage Disequilibrium | Low | Intermediate and fast-decaying | High
Population structure | Medium | Low | High
Allele diversity among samples | High | Intermediate | Low
Allele diversity within samples | Variable | 1 or 2 alleles (diploid species) | 1 allele (inbred lines)
Characteristics Related to Association Mapping:
Potential Applications
Aspects | Germplasm bank | Synthetic Populations | Elite Germplasm
Power | Low | Intermediate and decreasing | High; could allow genome scan
Resolution | High; could allow fine mapping | Intermediate and increasing | Low
Use of significant markers | Transfer of new alleles by marker-assisted backcross | Incorporation in selection index | MAS in progenies (requires validation)
Summary Part II
• Germplasm bank core-collections could be useful for allele-mining of
candidate genes and fine-mapped QTLs;
• Elite lines could be useful to detect genomic regions associated with
traits of interest;
• Synthetic populations might represent a balance between power and
precision, and have the major advantage of being unstructured.
Part III: Association Mapping of Kernel Size and Milling Quality in Soft Winter Wheat Cultivars
Previous QTL information
• Doubled-Haploid Population AC Reed x Grandin
• QTL for kernel size (width) near Xwmc18-2D
• Recombinant Inbred Population Synthetic W7984 x Opata
• QTL for kernel size (length) on 5A and 5B
Plant Material
• 95 cultivars of soft winter wheat from the northeastern USA
• Mostly recent releases: 92 released after 1990; 39 after 2000
• Representing 35 seed companies / institutions
• Selected from 149 cultivars based on 18 unlinked SSR markers
Genotypic Data
• Marker distribution: 93 SSR loci
• 33 on chromosome 2D
• 20 on chromosome 5A
• 9 on chromosome 5B
• 31 on 16 other chromosomes
• Data trimming
• Rare alleles (freq<5%) were pooled with missing data, and this pooled class was:
• considered as missing for LD and population structure analysis
• considered as an allele for AM analysis
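The data-trimming rule can be sketched for a single SSR locus as follows. The function name pool_rare_alleles, the pooled label, and the choice to compute allele frequencies among non-missing calls are assumptions made for illustration; the slides give only the 5% threshold and the two uses of the pooled class.

```python
from collections import Counter


def pool_rare_alleles(calls, min_freq=0.05, pooled_label="rare_or_missing"):
    """Pool rare alleles with missing data at one locus.

    calls: one allele label per cultivar, with None for missing data.
    Returns two lists of the same length as calls:
      - for_ld: pooled class coded as missing (None), for LD and
        population structure analysis
      - for_am: pooled class coded as its own allele, for AM analysis
    """
    observed = [a for a in calls if a is not None]
    counts = Counter(observed)
    n = len(observed)
    # Assumption: frequency is computed among non-missing calls.
    rare = {a for a, k in counts.items() if n > 0 and k / n < min_freq}
    for_ld = [None if (a is None or a in rare) else a for a in calls]
    for_am = [pooled_label if (a is None or a in rare) else a for a in calls]
    return for_ld, for_am


if __name__ == "__main__":
    # Hypothetical fragment sizes for 95 cultivars at one locus.
    calls = ["152"] * 60 + ["156"] * 30 + ["160"] * 3 + [None] * 2
    for_ld, for_am = pool_rare_alleles(calls)
    print(Counter(for_ld))
    print(Counter(for_am))
```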
Methods: Population Structure
• Data: 36 “unlinked” SSR markers
• Program: Structure (Pritchard et al., 2000, Genetics 155: 945)
• Model: without admixture (cultivars discretely assigned to subpopulations)
• Validated subpopulations: Resampled subsets of 12, 18, 24 and 30 unlinked loci
• Visualization: Factorial Correspondence Analysis (Benzecri, 1973, L'Analyse des correspondances. Dunod)
Methods: Linkage Disequilibrium
• Statistic: r², with p-values from 1000 permutations
• Program: Tassel (maizegenetics.net)
• LD among linked loci:
• Scan of entire chromosome 2D
• Scan of pericentromeric region of chromosome 5A
• LD among unlinked loci:
• Computed among 36 unlinked loci
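As a rough illustration of the statistic, the sketch here computes r² and a permutation p-value for two loci scored as 0/1 on inbred lines. The biallelic coding is a simplifying assumption (SSR loci are multi-allelic), and this is not a reproduction of how Tassel computes its values; the function names are hypothetical.

```python
import random


def r_squared(locus_a, locus_b):
    """r² between two biallelic loci coded 0/1, one value per inbred line."""
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


def permutation_p_value(locus_a, locus_b, n_perm=1000, seed=0):
    """Share of permutations whose r² is at least the observed r²."""
    rng = random.Random(seed)
    observed = r_squared(locus_a, locus_b)
    shuffled = list(locus_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association between the loci
        if r_squared(locus_a, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    a = [1, 1, 0, 0, 1, 0, 1, 0] * 5
    print(r_squared(a, a), permutation_p_value(a, a))
```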
Methods: Association Mapping
• Statistical Model: Linear mixed-effects model
• marker as fixed effects
• subpopulations as random effects
• Program: R function lme, package nlme (Pinheiro & Bates, 2000, Mixed-Effects Models in S and S-PLUS. Springer)
• Multiple testing correction: 1000 permutations chromosome-wise
• Two-marker models: tested by likelihood ratio test
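The chromosome-wise permutation correction can be sketched with a max-statistic scheme: phenotypes are shuffled, the strongest statistic across all markers on the chromosome is recorded for each shuffle, and each observed statistic is compared with that distribution. For brevity the sketch uses the squared correlation between a 0/1 marker and the trait as a stand-in statistic. The analysis described above used a linear mixed-effects model with subpopulations as random effects, which is not reproduced here, and all function names are hypothetical.

```python
import random


def correlation_sq(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def chromosome_wise_p_values(markers, phenotype, n_perm=1000, seed=0):
    """Adjusted p-value per marker from the permutation max statistic.

    markers: list of 0/1 marker vectors, all on the same chromosome.
    """
    rng = random.Random(seed)
    observed = [correlation_sq(m, phenotype) for m in markers]
    shuffled = list(phenotype)
    exceed = [0] * len(markers)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Strongest association on the chromosome under the null.
        max_stat = max(correlation_sq(m, shuffled) for m in markers)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]


if __name__ == "__main__":
    m0 = [0, 1] * 20                 # marker associated with the trait
    m1 = [0] * 20 + [1] * 20         # marker unrelated to the trait
    trait = [10 + 2 * g + 0.1 * ((i * 7) % 5) for i, g in enumerate(m0)]
    print(chromosome_wise_p_values([m0, m1], trait, n_perm=200, seed=1))
```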
Population Structure:
Sample Subdivisions
Moderate Population Subdivision
[Figure: cultivars grouped into subpopulations S1, S2, S3 and S4]
Subpopulation | No. of Varieties | Fst
1 | 19 | 0.337
2 | 32 | 0.111
3 | 13 | 0.295
4 | 31 | 0.064
Total | 95 | 0.188
Population Structure:
Factorial Correspondence Analysis
[Figure: Factorial Correspondence Analysis plot showing subpopulations S1, S2, S3 and S4]
Population Structure:
Resampling
[Figure: percentage of cultivars assigned to one of 4 subpopulations versus number of unlinked markers used for inference of population structure]
Linkage Disequilibrium:
Germplasm Sample Selection
• 149 lines genotyped with 18 unlinked SSR markers
• Most similar lines were excluded
• "Normalizing" the sample drastically reduced LD among unlinked markers
[Figure: r² probability for unlinked SSR markers (p<.0001, p<.001, p<.01), 149 lines versus 95 lines]
Definition of a baseline-LD specific for our sample
• Defined as the 95th percentile of the distribution of r² among unlinked loci
• r² estimates above this value are probably due to genetic linkage
• Baseline LD for this sample: r² = 0.0654
[Figure: distributions of r² and of the correlation coefficient r among unlinked loci (density, with a normal curve), with the 95th percentile marked as the LD baseline]
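The baseline definition translates into a short calculation: collect r² for all pairs of unlinked loci, take the 95th percentile, and flag linked pairs that exceed it. The sketch here uses a linear-interpolation percentile, which is one of several conventions; the slides do not say which was used. The function names and the marker names in the example are hypothetical.

```python
def percentile(values, q):
    """Percentile q (0 to 100) by linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def ld_baseline(r2_unlinked, q=95.0):
    """Baseline LD: the q-th percentile of r² among unlinked loci."""
    return percentile(r2_unlinked, q)


def pairs_above_baseline(r2_linked, baseline):
    """Keep linked marker pairs whose r² exceeds the baseline."""
    return {pair: r2 for pair, r2 in r2_linked.items() if r2 > baseline}


if __name__ == "__main__":
    # Hypothetical r² values for pairs of linked markers.
    linked = {("XmarkerA", "XmarkerB"): 0.30, ("XmarkerA", "XmarkerC"): 0.02}
    # 0.0654 is the baseline reported for this sample.
    print(pairs_above_baseline(linked, 0.0654))
```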
Linkage Disequilibrium: Chromosome 2D
• Consistent LD was below 1 cM
[Figure: r² versus cM (0 to 100) on chromosome 2D, with the baseline LD marked]
Linkage Disequilibrium: Chromosome 5A
• LD extended for 5 cM
[Figure: r² versus cM (0 to 50) on chromosome 5A, with the baseline LD and a ~5 cM span marked]
Loci Associated with Kernel Size (p-values)
Chromosome 2D
Kernel Size
cM | Locus | Weight NY | Weight OH | Area NY | Area OH | Length NY | Length OH | Width NY | Width OH
7 | Xcfd56 | 0.069 | 0.160 | 0.012 | 0.119 | 0.076 | 0.031 | 0.000* | 0.252
11 | Xwmc111 | 0.005 | 0.020 | 0.005 | 0.108 | 0.003 | 0.107 | 0.000* | 0.000**
23 | Xgwm261 | 0.145 | 0.016 | 0.019 | 0.009 | 0.027 | 0.009 | 0.058 | 0.001*
28 | Xwmc112 | 0.012 | 0.057 | 0.047 | 0.120 | 0.480 | 0.367 | 0.001* | 0.024
64 | Xgwm30 | 0.081 | 0.862 | 0.053 | 0.848 | 0.312 | 0.820 | 0.000** | 0.212
91 | Xgwm539 | 0.042 | 0.038 | 0.030 | 0.039 | 0.001* | 0.005 | 0.290 | 0.334
• Agreed with QTL in Reed x Grandin
• Likelihood Ratio Test: **
Milling Quality
• None of the loci on 2D were significant after multiple testing correction
Loci Associated with Kernel Size (p-values)
Chromosome 5A
Kernel Size
cM | Locus | Weight NY | Weight OH | Area NY | Area OH | Length NY | Length OH | Width NY | Width OH
55 | Xcfa2250 | 0.021 | 0.007 | 0.044 | 0.014 | 0.014 | 0.002* | 0.637 | 0.649
55 | Xwmc150b | 0.002* | 0.003 | 0.003 | 0.005 | 0.009 | 0.002* | 0.093 | 0.429
56 | Xbarc117 | 0.009 | 0.002* | 0.021 | 0.005 | 0.118 | 0.022 | 0.044 | 0.039
60 | Xbarc141 | 0.631 | 0.037 | 0.232 | 0.024 | 0.038 | 0.002* | 0.852 | 0.863
• Agreed with QTL in M6 x Opata
• Likelihood Ratio Test: n.s., **
Milling Quality
cM | Locus | Milling Score | Flour Yield | ESI | Friability | Break-Flour Yield
55 | Xcfa2250 | 0.010 | 0.029 | 0.047 | 0.002* | 0.081
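B.L.U.E. of allele effects
Kernel Length
[Figure: B.L.U.E. of allele effects on kernel length. N. of Cultivars: 9, 5, 18, 37, 9, 9, 41, 45, 43, 49]
B.L.U.E. of allele effects
Kernel Width
[Figure: B.L.U.E. of allele effects on kernel width. N. of Cultivars: 41, 14, 8, 15, 18, 24, 5, 10, 19]
B.L.U.E. of allele effects
Kernel Weight
[Figure: B.L.U.E. of allele effects on kernel weight. N. of Cultivars: 41, 45, 43, 49]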
Summary Part III
• Linkage Disequilibrium
• LD on chromosome 2D was in the subcentimorgan scale
• LD on chromosome 5A extended for 5 cM, forming an LD block
• Association Mapping
• Loci on chromosome 2D were associated with kernel width
• Loci on chromosome 5A were associated with kernel length and friability
• Favorable and unfavorable marker alleles were identified
• In recurrent selection, markers could be used to carry information from a “good year” to a “bad year”
• In pedigree breeding, markers could carry information about yield potential from the phase of replicated field trials to the phase of single-plant selection
Acknowledgements
• USDA Soft Wheat Quality Lab, Wooster, OH
• Embrapa Arroz e Feijão
• Technical Support:
• David Benscher
• James Tanaka
• Gretchen Salm
Cornell Small Grains Breeding & Genetics Project
• James Tanaka
• Dani Satwayan
• Gretchen Salm
• Mike Gifford
• Rob Elshire
• Jesse Munkvold
• David Benscher
• Abigail Losh