CSS 650 Advanced Plant Breeding

Module 1:
• Introduction
• Population Genetics
  – Hardy-Weinberg Equilibrium
  – Linkage Disequilibrium

Plant Breeding

“The science, art, and business of improving plants for human benefit”

Considerations:
– Crop(s)
– Production practices
– End-use(s)
– Target environments
– Type of cultivar(s)
– Traits to improve
– Breeding methods
– Source germplasm
– Time frame
– Varietal release and intellectual property rights

Bernardo, Chapter 1

Plant Breeding

A common mistake that breeders make is to improve productivity without sufficient regard for other characteristics that are important to producers, processors and consumers.

• Well-defined Objectives
• Good Parents
• Genetic Variation
• Good Breeding Methods
• Functional Seed System
• Adoption of Cultivars by Farmers

Quantitative Traits

• Continuum of phenotypes (metric traits)
• Often many genes with small effects
• Environmental influence is greater than for qualitative traits

Specific genes and their mode of inheritance may be unknown

Analysis of quantitative traits

– population parameters
  • means
  • variances
– molecular markers linked to QTL

Populations

• In the genetic sense, a population is a breeding group
  – individuals with different genetic constitutions
  – sharing time and space
• In animals, mating occurs between individuals
  – ‘Mendelian population’
  – genes are transmitted from one generation to the next
• In plants, there are additional ways for a population to survive
  – self-fertilization
  – vegetative propagation
• Definition of ‘population’ may be slightly broader for plants
  – e.g., lines from a germplasm collection

Falconer, Chapt. 1; Lynch and Walsh, Chapt. 4

What do population geneticists do?

Study genes in populations

– Frequency and interaction of alleles
– Mating patterns, genotype frequencies
– Gene flow
– Selection and adaptation vs random genetic drift
– Genetic diversity and relationship
– Population structure

Related Fields

– Evolutionary Biology
  – e.g., crop domestication
– Landscape Genetics

Gene and genotype frequencies

For a population of diploid organisms:

                        Frequency    Number   Proportion
Alleles     $A_1$       $p$          80       0.4
            $A_2$       $q$          120      0.6
Genotypes   $A_1A_1$    $P_{11}$     16       0.16
            $A_1A_2$    $P_{12}$     48       0.48
            $A_2A_2$    $P_{22}$     36       0.36

$p + q = 1$

$P_{11} + P_{12} + P_{22} = 1$

$p = P_{11} + \tfrac{1}{2}P_{12} = 0.16 + 0.24 = 0.4$

$q = P_{22} + \tfrac{1}{2}P_{12} = 0.36 + 0.24 = 0.6$

Bernardo, Chapter 2

Gene frequencies (another way)

Number of individuals = $N = N_{11} + N_{12} + N_{22} = 100$

Number of alleles = $2N = N_1 + N_2 = 200$

                        Frequency    Number   Proportion
Alleles     $A_1$       $p$          80       0.4
            $A_2$       $q$          120      0.6
Genotypes   $A_1A_1$    $P_{11}$     16       0.16
            $A_1A_2$    $P_{12}$     48       0.48
            $A_2A_2$    $P_{22}$     36       0.36

$p = \dfrac{N_{11} + \tfrac{1}{2}N_{12}}{N} = \dfrac{2N_{11} + N_{12}}{2N} = \dfrac{2 \cdot 16 + 48}{200} = 0.4$

$q = \dfrac{N_{22} + \tfrac{1}{2}N_{12}}{N} = \dfrac{2N_{22} + N_{12}}{2N} = \dfrac{2 \cdot 36 + 48}{200} = 0.6$
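The allele-counting calculation on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `allele_frequencies` and its argument names are assumptions made for the example, not part of the course material.

```python
def allele_frequencies(n11, n12, n22):
    """Return (p, q) for a biallelic locus from diploid genotype counts.

    n11, n12, n22 are the numbers of A1A1, A1A2 and A2A2 individuals.
    """
    n = n11 + n12 + n22              # number of individuals (N)
    p = (2 * n11 + n12) / (2 * n)    # A1A1 carries two A1 copies, A1A2 carries one
    q = (2 * n22 + n12) / (2 * n)    # same counting for A2
    return p, q


# Counts from the slide: 16 A1A1, 48 A1A2, 36 A2A2
p, q = allele_frequencies(16, 48, 36)
print(p, q)
```

With the slide's counts this reproduces p = 0.4 and q = 0.6.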

Allele frequencies in crosses

Inbred x inbred

Alleles are unknown, but allele frequencies at segregating loci are known

$F_1$ and $F_2$: $p = q = 0.5$

        $BC_1$    $BC_2$    $BC_3$     $BC_4$
$p$     0.75      0.875     0.9375     0.96875
$q$     0.25      0.125     0.0625     0.03125

Value of $q$ is reduced by ½ in each backcross generation
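The halving rule can be checked with a short sketch. It assumes an inbred x inbred cross followed by repeated backcrossing to the recurrent parent, with no selection; the function name `backcross_frequencies` is hypothetical.

```python
def backcross_frequencies(generations):
    """Return [(label, p, q)] for BC1..BCn at a segregating locus.

    p is the frequency of the recurrent-parent allele and q the frequency
    of the donor allele, assuming no selection.
    """
    rows = []
    q = 0.5  # F1 of an inbred x inbred cross: p = q = 0.5
    for t in range(1, generations + 1):
        q /= 2  # each backcross to the recurrent parent halves q
        rows.append((f"BC{t}", 1 - q, q))
    return rows


for label, p, q in backcross_frequencies(4):
    print(label, p, q)
```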

Factors that may change gene frequencies

• Population size
  – changes may occur due to sampling
  → assume ‘large’ population
• Differences in fertility and viability
  – parents may differ in fertility
  – gametes may differ in viability
  – progeny may differ in survival rate
  → assume no selection
• Migration and mutation
  → assume no migration and no mutation

Factors that may change genotype frequencies

Changes in genotype frequency (not gene frequency)

• Mating system
  – assortative or disassortative mating
  – selfing
  – geographic isolation
  → assume that mating occurs at random (panmixia)

Hardy-Weinberg Equilibrium

• Assumptions
  – large, random-mating population
  – no selection, mutation, migration
  – normal segregation
  – equal gene frequencies in males and females
  – no overlap of generations (no age structure)
• Note that assumptions only need to be true for the locus in question

→ Gene and genotype frequencies remain constant from one generation to the next
→ Genotype frequencies in progeny can be predicted from gene frequencies of the parents
→ Equilibrium attained after one generation of random mating

Hardy-Weinberg Equilibrium

                                    Frequencies          Example
Genes in parents       $A_1$        $p$                  0.4
                       $A_2$        $q$                  0.6
Genotypes in progeny   $A_1A_1$     $P_{11} = p^2$       0.16
                       $A_1A_2$     $P_{12} = 2pq$       0.48
                       $A_2A_2$     $P_{22} = q^2$       0.36

Expected genotype frequencies are obtained by expanding the binomial $(p + q)^2 = p^2 + 2pq + q^2 = 1$

                      $A_1$ ($p$ = 0.4)    $A_2$ ($q$ = 0.6)
$A_1$ ($p$ = 0.4)     $p^2$ = .16          $pq$ = .24
$A_2$ ($q$ = 0.6)     $pq$ = .24           $q^2$ = .36

Equilibrium with multiple alleles

For multiple alleles, expected genotype frequencies can be found by expanding the multinomial $(p_1 + p_2 + \dots + p_n)^2$

For example, for three alleles:

$(p_1 + p_2 + p_3)^2 = p_1^2 + 2p_1p_2 + 2p_1p_3 + p_2^2 + 2p_2p_3 + p_3^2$

Corresponding genotypes: $A_1A_1$, $A_1A_2$, $A_1A_3$, $A_2A_2$, $A_2A_3$, $A_3A_3$

Lynch and Walsh (pg 57) describe equilibrium for autopolyploids
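The multinomial expansion can be written as a small function that works for any number of alleles. This is a minimal sketch assuming random mating at a single diploid locus; `hwe_genotype_frequencies` and the allele labels are illustrative names, and the three-allele frequencies are made up for the example.

```python
from itertools import combinations_with_replacement


def hwe_genotype_frequencies(allele_freqs):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    allele_freqs maps an allele label to its frequency (values should sum to 1).
    Returns a dict keyed by unordered genotype, e.g. ("A1", "A2").
    """
    expected = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        # Homozygotes: p_i^2; heterozygotes: 2 * p_i * p_j
        expected[(a, b)] = pa * pa if a == b else 2 * pa * pb
    return expected


# Two alleles, using the slide's example (p = 0.4, q = 0.6)
print(hwe_genotype_frequencies({"A1": 0.4, "A2": 0.6}))

# Three alleles with hypothetical frequencies: six genotype classes summing to 1
three = hwe_genotype_frequencies({"A1": 0.5, "A2": 0.3, "A3": 0.2})
print(sum(three.values()))
```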

Relationship between gene and genotype frequencies

• f($A_1A_2$) has a maximum of 0.5, which occurs when $p = q = 0.5$
• Most rare alleles occur in heterozygotes
• Implications for
  – $F_1$?
  – $F_2$?
  – Any BC?

[Figure: frequencies of genotypes $A_1A_1$, $A_1A_2$ and $A_2A_2$ (0 to 1) plotted against the frequency of $A_2$ (0 to 1)]

Applications of the Hardy-Weinberg Law

• Predict genotype frequencies in random-mating populations
• Use frequency of recessive genotypes to estimate the frequency of a recessive allele in a population
  – Example: assume that the incidence of individuals homozygous for a recessive allele is about 1/11,000.
    $q^2 = 1/11{,}000$, so $q \approx 0.0095$
• Estimate frequency of individuals that are carriers for a recessive allele
    $p = 1 - 0.0095 = 0.9905$
    $2pq = 0.0188 \approx 2\%$
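The carrier calculation is easy to reproduce. The sketch below assumes Hardy-Weinberg equilibrium at the locus; `carrier_frequency` is a hypothetical helper name. Because it does not round q to four decimals before multiplying, the carrier frequency comes out marginally higher than the slide's 0.0188, but still about 2%.

```python
import math


def carrier_frequency(incidence):
    """Return (q, carriers) given the frequency of recessive homozygotes.

    Assumes Hardy-Weinberg equilibrium, so incidence = q**2 and
    the carrier (heterozygote) frequency is 2pq.
    """
    q = math.sqrt(incidence)
    p = 1 - q
    return q, 2 * p * q


q, carriers = carrier_frequency(1 / 11000)
print(round(q, 4), round(carriers, 3))
```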

Testing for Hardy-Weinberg Equilibrium

All genotypes must be distinguishable

              Genotypes                            Gene frequencies
              $A_1A_1$    $A_1A_2$    $A_2A_2$     $A_1$      $A_2$
Observed      233         385         129          0.5696     0.4304
Expected      242.36      366.26      138.38

$N = N_{11} + N_{12} + N_{22} = 233 + 385 + 129 = 747$

$p_1 = \dfrac{N_{11} + 0.5\,N_{12}}{N} = \dfrac{233 + \tfrac{1}{2}(385)}{747} = 0.5696$

$E(N_{11}) = p_1^2 \cdot N = (0.5696)^2 \cdot 747 = 242.36$

Chi-square test for Hardy-Weinberg Equilibrium

$\chi^2 = \sum \dfrac{(Obs - Exp)^2}{Exp} = 1.96$

critical $\chi^2_{1\,df} = 3.84$

Example in Excel

Only 1 df because gene frequencies are estimated from the progeny data

• Fail to reject $H_0$: no reason to think that assumptions for Hardy-Weinberg equilibrium have been violated
  – does not tell you anything about the fertility of the parents
• When you reject $H_0$, there is an indication that one or more of the assumptions is not valid
  – does not tell you which assumption is not valid
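As an alternative to the spreadsheet, the chi-square calculation can be sketched in plain Python. The function name `hwe_chi_square` is an assumption made for the illustration; the critical value 3.84 (1 df, 5% level) is the one quoted on the slide and is hard-coded here rather than looked up from a statistics library.

```python
def hwe_chi_square(n11, n12, n22):
    """Chi-square goodness-of-fit statistic for HWE at a biallelic locus.

    Returns (chi2, expected), where expected holds the expected counts
    of A1A1, A1A2 and A2A2 given the allele frequencies in the sample.
    """
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)   # allele frequency estimated from the same data
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n11, n12, n22)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


CRITICAL_1DF = 3.84  # 5% critical value for 1 df, as quoted on the slide

chi2, expected = hwe_chi_square(233, 385, 129)
print(round(chi2, 2), "reject H0" if chi2 > CRITICAL_1DF else "fail to reject H0")
```

For the slide's data the statistic is about 1.96, below the critical value, so there is no evidence against equilibrium.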

Exact Test for Hardy-Weinberg Equilibrium

• Chi-square is only appropriate for large sample sizes
• If sample sizes are small or some alleles are rare, Fisher’s Exact test is a better alternative

$\Pr(N_{AA}, N_{Aa}, N_{aa} \mid n_A, n_a) = \dfrac{N!\; n_A!\; n_a!\; 2^{N_{Aa}}}{N_{AA}!\; N_{Aa}!\; N_{aa}!\; (2N)!}$

– Calculate the probability of all possible arrays of genotypes for the observed numbers of alleles
– Rank outcomes in order of increasing probability
– Reject those that constitute a cumulative probability of <5%

Example in Excel

Weir (1996) Chapt. 3
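The exact-test procedure can be sketched as follows. This is one common way to turn the ranking step into a p-value (summing the probabilities of all genotype arrays that are no more probable than the observed one); it is an illustration under that assumption, not necessarily the calculation in the course spreadsheet. The name `hwe_exact_test` and the small sample used in the example are hypothetical.

```python
from fractions import Fraction
from math import factorial


def hwe_exact_test(n_AA, n_Aa, n_aa):
    """Exact test for HWE, conditional on the observed allele counts.

    Returns (probabilities, p_value). probabilities maps every possible
    heterozygote count to its exact conditional probability.
    """
    n = n_AA + n_Aa + n_aa
    n_A = 2 * n_AA + n_Aa   # copies of allele A
    n_a = 2 * n_aa + n_Aa   # copies of allele a

    def prob(het):
        hom_A = (n_A - het) // 2
        hom_a = (n_a - het) // 2
        num = factorial(n) * factorial(n_A) * factorial(n_a) * 2 ** het
        den = (factorial(hom_A) * factorial(het) * factorial(hom_a)
               * factorial(2 * n))
        return Fraction(num, den)

    # The heterozygote count must have the same parity as n_A (and n_a).
    probabilities = {het: prob(het)
                     for het in range(n_A % 2, min(n_A, n_a) + 1, 2)}
    observed = probabilities[n_Aa]
    # p-value: total probability of arrays as or less probable than observed
    p_value = sum(pr for pr in probabilities.values() if pr <= observed)
    return probabilities, float(p_value)


# Hypothetical small sample: 3 AA, 2 Aa, 5 aa
probabilities, p_value = hwe_exact_test(3, 2, 5)
print(p_value)
```

Exact fractions are used so that the probabilities of all possible arrays sum to exactly 1, which is a convenient sanity check on the formula.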

Likelihood Ratio Test

$\Lambda = \dfrac{L(\hat{\theta}_r \mid z)}{L(\hat{\theta} \mid z)}$

Numerator: maximum of the likelihood function given the data ($z$) when some parameters are assigned hypothesized values

Denominator: maximum of the likelihood function given the data ($z$) when there are no restrictions

When the hypothesis is true:

$LR = -2\ln\Lambda = -2\left[\ln L(\hat{\theta}_r \mid z) - \ln L(\hat{\theta} \mid z)\right] \sim \chi^2$, with df = number of parameters assigned values

Likelihood ratio tests for multinomial proportions are often called G-tests (for goodness of fit)

Lynch and Walsh Appendix 4

Likelihood Ratio Test for HWE

$G = -2\sum_{i=1}^{n}\sum_{j \ge i}^{n} N_{ij}\,\ln\!\left(\dfrac{\hat{N}_{ij}}{N_{ij}}\right)$

where $\hat{N}_{ij}$ is the expected number and $N_{ij}$ is the observed number of the $ij$th genotype

Calculations in Excel
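A minimal Python version of the G-test is sketched below, applied to the same observed counts used in the chi-square example earlier in the module (233, 385, 129). It uses the equivalent form G = 2 Σ Obs · ln(Obs / Exp); the function name `hwe_g_test` is an assumption for the illustration.

```python
import math


def hwe_g_test(n11, n12, n22):
    """Likelihood ratio (G) statistic for HWE at a biallelic locus."""
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)
    q = 1 - p
    observed = (n11, n12, n22)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Classes with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)


g = hwe_g_test(233, 385, 129)
print(round(g, 2))  # compare with the 1-df critical value of 3.84
```

With a sample this large, G and the chi-square statistic are nearly identical, as expected.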

Gametic phase equilibrium

Random association of alleles at different loci (independence)

$P_{AB} = p_A\,p_B$

          B           b
A         $P_{AB}$    $P_{Ab}$
a         $P_{aB}$    $P_{ab}$

Disequilibrium

$D_{AB} = P_{AB} - p_A\,p_B$

$D_{AB} = P_{AB}P_{ab} - P_{Ab}P_{aB}$

Example:

          A        a
B         .40      .10      $p_B$ = .50
b         .10      .40      $p_b$ = .50
          $p_A$ = .50   $p_a$ = .50

$D_{AB} = 0.40 - 0.5 \cdot 0.5 = 0.15$

$D_{AB} = 0.4 \cdot 0.4 - 0.1 \cdot 0.1 = 0.15$

Lynch and Walsh, pg 94-100; Falconer, pg 15-19
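Both definitions of D on this slide give the same value, which can be confirmed numerically. The sketch below takes the four gametic (haplotype) frequencies as input; `disequilibrium` is an illustrative name.

```python
def disequilibrium(p_AB, p_Ab, p_aB, p_ab):
    """Return D computed two ways from the four gamete frequencies."""
    p_A = p_AB + p_Ab                      # marginal frequency of allele A
    p_B = p_AB + p_aB                      # marginal frequency of allele B
    d_deviation = p_AB - p_A * p_B         # observed minus expected AB frequency
    d_cross = p_AB * p_ab - p_Ab * p_aB    # coupling product minus repulsion product
    return d_deviation, d_cross


print(disequilibrium(0.40, 0.10, 0.10, 0.40))
```

For the slide's example both values equal 0.15, up to floating-point rounding.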

Linkage Disequilibrium

• Nonrandom association of alleles at different loci
  – the covariance in frequencies of alleles between the loci
• Refers to frequencies of alleles in gametes (haplotypes)
• May be due to various causes in addition to linkage
  – ‘gametic phase disequilibrium’ is a more accurate term
  – ‘linkage disequilibrium’ (LD) is widely used to describe associations of alleles in the same or in different linkage groups

Linkage Disequilibrium

Gametic types    Observed     Expected       Disequilibrium
AB               $P_{AB}$     $p_A\,p_B$     +D
Ab               $P_{Ab}$     $p_A\,p_b$     -D
aB               $P_{aB}$     $p_a\,p_B$     -D
ab               $P_{ab}$     $p_a\,p_b$     +D

Excess of coupling phase gametes → +D
Excess of repulsion phase gametes → -D

Sources of linkage disequilibrium

• Linkage
• Multilocus selection (particularly with epistasis)
• Assortative mating
• Random drift in small populations
• Bottlenecks in population size
• Migration or admixtures of different populations
• Founder effects
• Mutation

Two locus equilibrium

• For two loci, it may take many generations to reach equilibrium even when there is independent assortment and all other conditions for equilibrium are met
  – New gamete types can only be produced when the parent is a double heterozygote

Parent AB/Ab (heterozygous at one locus only): gametes 0.5 AB, 0.5 Ab
Parent AB/ab (double heterozygote): gametes 0.25 AB, 0.25 Ab, 0.25 aB, 0.25 ab

Decay of linkage disequilibrium

• In the absence of linkage, LD decays by one-half with each generation of random mating

$c$ = recombination frequency

$D_{t+1} = (1 - c)\,D_t$

$D_t = (1 - c)^t\,D_0$

[Figure: $D_t$ (0.00 to 0.25) plotted against generation (0 to 100) for $c$ = .50, .20, .10 and .01]
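The decay curves in the figure follow directly from the recurrence. A minimal sketch under random mating, with hypothetical function names `ld_decay` and `generations_to_halve`:

```python
import math


def ld_decay(d0, c, generations):
    """Return [D_0, D_1, ..., D_t] under random mating with recombination frequency c."""
    return [d0 * (1 - c) ** t for t in range(generations + 1)]


def generations_to_halve(c):
    """Number of generations of random mating needed to reduce D by one-half."""
    return math.log(0.5) / math.log(1 - c)


print(ld_decay(0.25, 0.5, 3))                 # unlinked loci: halves every generation
print(round(generations_to_halve(0.01), 1))   # tightly linked loci decay slowly
```

With c = 0.01 it takes roughly 69 generations for D to fall by half, which is why LD between tightly linked loci persists for so long.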

Factors that delay approach to equilibrium

$D_t = (1 - c)^t\,D_0$

• Linkage
• Selfing
  – because it decreases the frequency of double heterozygotes
• Small population size
  – because it reduces the likelihood of obtaining rare recombinants

Implications for breeding

P1 ($A_1A_1B_1B_1$) x P2 ($A_2A_2B_2B_2$)

$F_1$ ($A_1A_2B_1B_2$) gamete frequencies:

Gamete       Frequency
$A_1B_1$     0.5*(1-c)
$A_1B_2$     0.5*c
$A_2B_1$     0.5*c
$A_2B_2$     0.5*(1-c)

Effect of inbreeding on the frequency of a recombinant genotype

[Figure: frequency of a recombinant genotype (0.00 to 0.25) plotted against c = recombination frequency (0 to 0.5) for Inbreds, F2 and F2 (adjusted)]

• Gametic Phase Disequilibrium that is not due to linkage is eliminated by making the $F_1$ cross
• Recombination occurs during selfing
• There would be greater recombination with additional random mating, but it may not be worth the time and resources
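As a worked check, the disequilibrium among the gametes produced by the $F_1$ can be derived from the gamete frequencies given on this slide, using the cross-product definition of D from earlier in the module. This is a short derivation added for illustration, not a formula taken from the slides:

```latex
\begin{aligned}
D &= P_{A_1B_1}\,P_{A_2B_2} - P_{A_1B_2}\,P_{A_2B_1} \\
  &= \left[\tfrac{1}{2}(1-c)\right]^2 - \left[\tfrac{1}{2}c\right]^2 \\
  &= \tfrac{1}{4}\left[(1-c)^2 - c^2\right] \\
  &= \tfrac{1}{4}(1 - 2c)
\end{aligned}
```

So D is 0.25 with complete linkage (c = 0) and 0 with independent assortment (c = 0.5), consistent with the statement that only the disequilibrium caused by linkage survives the $F_1$ cross.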

Effect of mating system on LD decay

Effective recombination rate = $c\left(1 - \dfrac{s}{2 - s}\right)$, where $c$ = recombination rate and $s$ = the fraction of selfing

[Figure: LD decay over generations under outcrossing ($s$ = 0.00) and 99% selfing ($s$ = 0.99) for $c$ = 0.05, 0.25 and 0.50 (no linkage)]

Alternative measures of LD

fyi

$r^2 = \dfrac{D_{AB}^2}{p_A\,p_a\,p_B\,p_b}$

• D is the covariance between alleles at different loci
• Maximum values of D depend on allele frequencies
• It is convenient to consider $r^2$ to be the square of the correlation coefficient, but it can only attain a value of 1 when allele frequencies at the two loci are the same
• $r^2$ indicates the degree of association between alleles at different loci due to various causes (linkage, mutation, migration)

D – minimum and maximum values

          A                               a
B         $P_{AB} = p_A\,p_B + D$         $P_{aB} = p_a\,p_B - D$         $p_B$
b         $P_{Ab} = p_A\,p_b - D$         $P_{ab} = p_a\,p_b + D$         $p_b$
          $p_A$                           $p_a$

If D > 0, look for the maximum value D can have:

$P_{Ab} = p_A\,p_b - D \ge 0 \;\Rightarrow\; D \le p_A\,p_b$

$P_{aB} = p_a\,p_B - D \ge 0 \;\Rightarrow\; D \le p_a\,p_B$

$D \le \min(p_A\,p_b,\; p_a\,p_B)$

If D < 0, look for the minimum value D can have:

$P_{AB} = p_A\,p_B + D \ge 0 \;\Rightarrow\; D \ge -p_A\,p_B$

$P_{ab} = p_a\,p_b + D \ge 0 \;\Rightarrow\; D \ge -p_a\,p_b$

$D \ge \max(-p_A\,p_B,\; -p_a\,p_b)$

Alternative measures of LD

fyi

$D' = \dfrac{D_{AB}}{\min(p_A\,p_b,\; p_a\,p_B)}$ when $D_{AB} > 0$

$D' = \dfrac{D_{AB}}{(-1) \cdot \min(p_A\,p_B,\; p_a\,p_b)}$ when $D_{AB} < 0$

• D’ is scaled to have a minimum of 0 and a maximum of 1
• D’ indicates the degree to which gametes exhibit the maximum potential disequilibrium for a given array of allele frequencies
• D’ = 1 indicates that one of the haplotypes is missing
• D’ is very unstable for small sample sizes, so $r^2$ is more widely utilized to measure LD
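The three LD measures covered in this module (D, D’ and r²) can be computed together from the four haplotype frequencies. This is a sketch under the assumption that both loci are polymorphic; the function name `ld_measures` is hypothetical, and the example reuses the gamete frequencies from the gametic phase equilibrium slide (.40, .10, .10, .40).

```python
def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D_prime, r2) from the four haplotype frequencies."""
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab
    if min(p_A, p_a, p_B, p_b) <= 0:
        raise ValueError("both loci must be polymorphic")

    d = p_AB - p_A * p_B
    # The bound on |D| depends on the sign of D.
    if d > 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = abs(d) / d_max          # scaled to lie between 0 and 1
    r2 = d * d / (p_A * p_a * p_B * p_b)
    return d, d_prime, r2


d, d_prime, r2 = ld_measures(0.40, 0.10, 0.10, 0.40)
print(round(d, 2), round(d_prime, 2), round(r2, 2))
```

For this example D = 0.15, D’ = 0.6 and r² = 0.36. Setting one haplotype frequency to zero (for instance 0.5, 0.0, 0.2, 0.3) gives D’ = 1 while r² stays below 1, which illustrates why the two measures are not interchangeable.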

Testing for gametic phase disequilibrium

• Best when you can determine haplotypes
  – inbred lines or doubled haploids
  – haplotypes of double heterozygotes inferred from progeny tests
• Use a Goodness of Fit test if the sample size is large
  – Chi-square
  – G-test (likelihood ratio)
• Use Fisher’s exact test for smaller sample sizes
• Use a permutation test for multiple alleles
• Need a fairly large sample to have reasonable power for LD (~200 individuals or more)

See Weir (1996) pg 112-133 for more information

Depiction of Linkage Disequilibrium

[Figure: disequilibrium matrix for polymorphic sites within sh1 in maize, showing $r^2$ and the probability value from Fisher’s Exact Test]

Flint-Garcia et al., 2003. Annual Review of Plant Biology 54: 357-374.

Extent of LD in Maize

Average LD decay distance is 5–10 kb

[Figure: linkage disequilibrium ($r^2$) across the 10 maize chromosomes measured with 914 SNPs in a global collection of 632 maize inbred lines]

Yan et al. 2009. PLoS ONE 4(12): e8451

Extent of LD in Barley

Elite North American Barley

[Figure: $r^2$ in two panels, "No adjustment for population structure" and "Adjusted for population structure"]

Average LD decay distance is ~5 cM

Other studies
– Wild barley: LD decays within a gene
– Landraces: ~90 kb
– European germplasm: significant LD, mean 3.9 cM, median 1.16 cM

Waugh et al., 2009, Current Opinion in Plant Biology 12:218-222

References on linkage disequilibrium

Flint-Garcia et al., 2003. Structure of linkage disequilibrium in plants. Annual Review of Plant Biology 54: 357–374.

Gupta et al., 2005. Linkage disequilibrium and association studies in higher plants: present status and future prospects. Plant Molecular Biology 57: 461–485.

Slatkin, M. 2008. Linkage disequilibrium – understanding the evolutionary past and mapping the medical future. Nature Reviews Genetics 9: 477–485.

Waugh, R., J.-L. Jannink, G.J. Muehlbauer, L. Ramsay. 2009. The emergence of whole genome association scans in barley. Current Opinion in Plant Biology 12(2): 218–222.

Yan, J., T. Shah, M.L. Warburton, E.S. Buckler, M.D. McMullen, et al. 2009. Genetic characterization and linkage disequilibrium estimation of a global maize collection using SNP markers. PLoS ONE 4(12): e8451.

Zhu et al., 2008. Status and prospects of association mapping in plants. The Plant Genome 1: 5–20.