
Lab 7. Estimating Population Structure
Goals
1. Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure.
2. Demonstrate the roles of gene flow and genetic drift in population structure.
Gene flow and Genetic drift
Gene flow keeps allele frequencies similar across subpopulations.
Genetic drift causes random differences in allele frequencies among small subpopulations.
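[Figure: Wright's island model, with arrows labeled m (migration rate) running from a source labeled qm to islands labeled q0.]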
Wright’s island model assumes that gene flow occurs with equal probability from the continent (a large source population) to each island (a smaller subpopulation).
Gene flow and Genetic drift
Assuming equilibrium between gene flow (which increases variation within subpopulations) and genetic drift (which reduces variation in finite populations), and also assuming Wright’s island model, differentiation among subpopulations (FST) can be calculated as:
$$F_{ST} = \frac{(1 - m)^2}{2N - (2N - 1)(1 - m)^2} \approx \frac{1}{4Nm + 1}$$
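If m = 0, FST = 1; i.e., strong genetic differentiation exists among subpopulations.
If m = 1, FST = 0; i.e., no genetic differentiation exists among subpopulations.
Both limits follow from the exact expression; the approximation 1/(4Nm + 1) holds only when m is small.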
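The equilibrium expression above can be checked numerically. The following is a minimal sketch in Python. The function names are illustrative, and the generation-by-generation recursion is the standard textbook form for the island model rather than something shown on the slide.

```python
def fst_exact(N, m):
    """Equilibrium FST under the island model (exact expression)."""
    a = (1.0 - m) ** 2
    return a / (2 * N - (2 * N - 1) * a)


def fst_approx(N, m):
    """Approximation 1 / (4Nm + 1), reasonable only when m is small."""
    return 1.0 / (4 * N * m + 1)


def fst_by_iteration(N, m, generations=20000):
    """Apply one generation of drift and migration repeatedly, starting at F = 0.

    Assumed recursion: F' = (1 - m)^2 * [1/(2N) + (1 - 1/(2N)) * F].
    """
    f = 0.0
    for _ in range(generations):
        f = (1.0 - m) ** 2 * (1.0 / (2 * N) + (1.0 - 1.0 / (2 * N)) * f)
    return f


# Compare the three calculations for a subpopulation of N = 100 diploids.
for m in (0.0, 0.001, 0.01, 0.1, 1.0):
    print(m, fst_exact(100, m), fst_approx(100, m))
```

Note that at m = 1 the exact expression gives 0 while the approximation gives 1/(4N + 1), which illustrates why the approximation should be reserved for small migration rates.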
F-coefficients with different levels of
structure
FIT
Formula: $F_{IT} = \frac{H_T - H_O}{H_T}$
Meaning: Measure of deviation from HWE in the total population (TP).
0: No deviation from HWE in TP.
Positive: Deviation due to a deficiency of heterozygotes in TP.
Negative: Deviation due to an excess of heterozygotes in TP.

FST
Formula: $F_{ST} = \frac{H_T - H_S}{H_T}$
Meaning: Measure of genetic differentiation among subpopulations. It is never negative.
0: No genetic differentiation among subpopulations.
1: Strong genetic differentiation among subpopulations.

FIS
Formula: $F_{IS} = \frac{H_S - H_O}{H_S}$
Meaning: Measure of deviation from HWE within subpopulations (SP).
0: No deviation from HWE within SP.
Positive: Deviation due to a deficiency of heterozygotes within SP.
Negative: Deviation due to an excess of heterozygotes within SP.
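The three coefficients above can be computed directly from observed and expected heterozygosities. The sketch below uses hypothetical heterozygosity values chosen only for illustration, and the function name is an assumption. It also checks the standard identity (1 - FIT) = (1 - FIS)(1 - FST), which follows from the definitions.

```python
def f_statistics(h_o, h_s, h_t):
    """Return (FIS, FST, FIT) from heterozygosities.

    h_o: mean observed heterozygosity within subpopulations
    h_s: mean expected heterozygosity within subpopulations
    h_t: expected heterozygosity in the total population
    """
    f_is = (h_s - h_o) / h_s
    f_st = (h_t - h_s) / h_t
    f_it = (h_t - h_o) / h_t
    return f_is, f_st, f_it


# Hypothetical values, not taken from the lab data.
f_is, f_st, f_it = f_statistics(h_o=0.30, h_s=0.40, h_t=0.50)
print(f_is, f_st, f_it)

# The partition of heterozygote deficiency across levels should hold.
print(abs((1 - f_it) - (1 - f_is) * (1 - f_st)) < 1e-12)
```

Here all three values are positive, meaning heterozygotes are deficient both within subpopulations and in the total population.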
F-coefficients with different levels of
structure
FSR
Formula: $F_{SR} = \frac{H_R - H_S}{H_R}$
Meaning: Measure of genetic differentiation among subpopulations within a region.
0: No genetic differentiation among subpopulations within a region.
1: Strong genetic differentiation among subpopulations within a region.

FRT
Formula: $F_{RT} = \frac{H_T - H_R}{H_T}$
Meaning: Measure of genetic differentiation among regions for the total population (TP).
0: No genetic differentiation among regions in TP.
1: Strong genetic differentiation among regions in TP.
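The hierarchical coefficients follow the same pattern as FST, with the regional heterozygosity HR as an intermediate level. A minimal sketch with hypothetical heterozygosities (the function name and input values are assumptions for illustration):

```python
def hierarchical_f(h_s, h_r, h_t):
    """Return (FSR, FRT, FST) from subpopulation, regional, and total heterozygosity."""
    f_sr = (h_r - h_s) / h_r   # subpopulations within regions
    f_rt = (h_t - h_r) / h_t   # regions within the total population
    f_st = (h_t - h_s) / h_t   # subpopulations within the total population
    return f_sr, f_rt, f_st


# Hypothetical values, not taken from the lab data.
f_sr, f_rt, f_st = hierarchical_f(h_s=0.40, h_r=0.45, h_t=0.50)
print(f_sr, f_rt, f_st)

# By definition the levels combine as (1 - FST) = (1 - FSR)(1 - FRT).
print(abs((1 - f_st) - (1 - f_sr) * (1 - f_rt)) < 1e-12)
```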
Estimation of F Coefficients using AMOVA
Parameter    AMOVA (Arlequin)
FST          ΦST or FST
FSR          ΦSC or FSC
FRT          ΦCT or FCT
Population structure from worldwide human population
Population = subpopulation; group = region.
Regions: Eurasia, East Asia, Oceania, America, Africa.
AMOVA result interpretations:
----------------------------------------------------------------------
Source of variation                          Percentage of variation
----------------------------------------------------------------------
Among groups (regions)                       10
Among (sub)populations within a region       4
Within (sub)populations                      86
----------------------------------------------------------------------
Fixation indices:
FST: 0.14
FSC: 0.04
FCT: 0.10
----------------------------------------------------------------------
14% of the total genetic variation is due to differentiation among subpopulations.
86% of the total genetic variation is found within subpopulations.
4% of the genetic variation within regions is due to differentiation among subpopulations.
10% of the total genetic variation is due to differentiation among regions.
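The fixation indices above can be recovered from the percentages of variation, using the usual AMOVA ratios of variance components. A minimal sketch (the function and argument names are illustrative):

```python
def phi_statistics(among_groups, among_pops_within_groups, within_pops):
    """Return (PhiCT, PhiSC, PhiST) from variance components or percentages."""
    total = among_groups + among_pops_within_groups + within_pops
    phi_ct = among_groups / total
    phi_sc = among_pops_within_groups / (among_pops_within_groups + within_pops)
    phi_st = (among_groups + among_pops_within_groups) / total
    return phi_ct, phi_sc, phi_st


# Percentages from the worldwide human AMOVA table.
phi_ct, phi_sc, phi_st = phi_statistics(10, 4, 86)
print(round(phi_ct, 2), round(phi_sc, 2), round(phi_st, 2))
```

Note that the within-region index divides by the variation found inside regions (4 + 86), not by the total, which is why it is interpreted as a share of regional rather than total variation.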
[Figure: annotated data sheet titled "Human structure data".]
Header values: # of individuals = 87; # of pops. = 4; 10 (unlabeled on the slide); # individuals in pops. = 13, 24, 25, 25 (Colombian, Karitiana, Maya, Pima); # of regions = 2; # individuals in regions = 37, 50 (SA, NA).
Example rows:
ID   Population   Genotypes
46   Colombian    120 120 128 124 142 124 133 129
47   Colombian    120 120 128 124 146 124 129 129
48   Colombian    126 126 128 124 146 144 129 129
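The counts in the data-sheet header have to agree with one another before the file can be converted for AMOVA. The sketch below checks the numbers shown on the slide. The assignment of Colombian and Karitiana to SA and of Maya and Pima to NA is an assumption inferred from the sums (13 + 24 = 37, 25 + 25 = 50), and the function name is illustrative.

```python
def check_counts(n_individuals, pop_sizes, region_sizes, region_of):
    """Return a list of inconsistencies among the header counts (empty if none)."""
    problems = []
    if sum(pop_sizes.values()) != n_individuals:
        problems.append("population sizes do not sum to the number of individuals")
    if sum(region_sizes.values()) != n_individuals:
        problems.append("region sizes do not sum to the number of individuals")
    for region, size in region_sizes.items():
        members = sum(n for pop, n in pop_sizes.items() if region_of[pop] == region)
        if members != size:
            problems.append(
                f"region {region} lists {size} individuals but its populations hold {members}"
            )
    return problems


pop_sizes = {"Colombian": 13, "Karitiana": 24, "Maya": 25, "Pima": 25}
region_sizes = {"SA": 37, "NA": 50}
# Assumed mapping of populations to regions (not stated explicitly on the slide).
region_of = {"Colombian": "SA", "Karitiana": "SA", "Maya": "NA", "Pima": "NA"}

problems = check_counts(87, pop_sizes, region_sizes, region_of)
print(problems)
```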
Infinite Alleles Model (Crow and Kimura Model)
• Each mutation creates a completely new allele
• Reversion is so rare as to be essentially nonexistent
• Any single mutation is as likely as any other single mutation
Stepwise Mutation Model
• Do all loci conform to the Infinite Alleles Model?
• Are mutations from one state to another equally probable?
• Consider microsatellite loci: are small insertions/deletions more likely than large ones?
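The two mutation models imply different ways of measuring how far apart two alleles are. The sketch below shows one common choice for each: an identity-based distance for the IAM and a squared difference in repeat number for the SMM. The allele sizes and the 4 bp repeat unit are hypothetical, and this is not necessarily the exact computation performed by any particular program.

```python
def iam_distance(a, b):
    """Under the IAM, alleles are either identical (0) or different (1)."""
    return 0 if a == b else 1


def smm_distance(a, b, repeat_length=4):
    """Under the SMM, distance grows with the squared number of repeat steps.

    a, b: allele sizes in base pairs; repeat_length is an assumed repeat unit.
    """
    steps = (a - b) / repeat_length
    return steps ** 2


# Hypothetical microsatellite alleles (sizes in bp).
for a, b in [(120, 120), (120, 124), (120, 136)]:
    print(a, b, iam_distance(a, b), smm_distance(a, b))
```

The IAM treats 124 and 136 as equally different from 120, whereas the SMM treats 136 as much more distant because it is four mutational steps away rather than one.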
Problem 1. File human_struc.xls (which is already in GenAlEx format) contains data for 10
microsatellite loci used to genotype 41 human populations from a worldwide sample.
a) Five regions are already defined in the file (AFRICA, AMERICA, EAST ASIA, EURASIA,
and OCEANIA). Convert the file into Arlequin format and perform AMOVA based on this
grouping of populations within regions using distance measures based on the Infinite
Alleles Model (IAM) and the Stepwise Mutation Model (SMM). How do you interpret
these results? Report values of Φ-statistics and their statistical significance for each
AMOVA you run.
b) Do you think that any of these regions can justifiably be divided into subregions? Pick a
region, form a hypothesis for what would be a reasonable grouping of populations into
subregions (see information in Appendix 1 and map in Appendix 2), then run AMOVA
only for the region you selected using distance measures based on both the IAM and the
SMM. Was your hypothesis supported by the data?
c) How do Φ-statistics calculated from distance measures based on the SMM compare to
those based on the IAM?
d) GRADUATE STUDENTS ONLY: Which of the 5 initially defined regions has the highest
diversity in terms of effective number of alleles? What is your biological explanation for
this? Make sure that you cite your sources, and avoid dubious internet sites.
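For reference, the effective number of alleles at a locus is conventionally defined as the reciprocal of the sum of squared allele frequencies. A minimal sketch with hypothetical frequencies (the function name is illustrative, and the values are not from the lab data):

```python
def effective_number_of_alleles(freqs):
    """Return 1 / sum(p_i^2) for a list of allele frequencies that sum to 1."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 / sum(p * p for p in freqs)


# Four equally common alleles behave like four effective alleles.
even = effective_number_of_alleles([0.25, 0.25, 0.25, 0.25])

# Four alleles with one dominant allele behave like fewer than two.
skewed = effective_number_of_alleles([0.7, 0.1, 0.1, 0.1])

print(even, skewed)
```

Both hypothetical loci carry four alleles, but the skewed locus contributes far less diversity by this measure.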
How to choose K?
Picking the Best K
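K    Log-likelihood
2    -1235
3    -1238

$$P(K = 2 \mid \text{Data}) = \frac{e^{-1235}}{e^{-1235} + e^{-1238}} = \frac{e^{-1235}}{e^{-1235}(1 + e^{-3})} = \frac{1}{1 + e^{-3}} = 0.9526$$

$$P(K = 3 \mid \text{Data}) = \frac{e^{-1238}}{e^{-1235} + e^{-1238}} = \frac{e^{-1238}}{e^{-1238}(e^{3} + 1)} = \frac{1}{e^{3} + 1} = 0.0474$$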
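The same calculation extends to any number of candidate K values. The sketch below assumes a uniform prior over the K values compared, as the worked example implicitly does, and subtracts the largest log-likelihood before exponentiating so that values such as e^-1235 do not underflow. The function name is illustrative.

```python
import math


def posterior_of_k(log_likelihoods):
    """Map {K: ln P(Data | K)} to {K: P(K | Data)}, assuming a uniform prior on K."""
    top = max(log_likelihoods.values())
    # Rescale by the largest value; the common factor cancels in the ratio.
    weights = {k: math.exp(v - top) for k, v in log_likelihoods.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


post = posterior_of_k({2: -1235.0, 3: -1238.0})
print(round(post[2], 4), round(post[3], 4))
```

Factoring out the largest term is the same step performed by hand in the worked example, where e^-1235 or e^-1238 is cancelled from the numerator and denominator.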
Picking the Best K
Problem 2. Use Structure to further test the hypotheses you developed in Problem 1.
a) Calculate the posterior probabilities to test whether:
(i) All subpopulations form a single, genetically homogeneous group.
(ii) There are two genetically distinct groups within your selected region.
(iii) There are three genetically distinct groups within your selected region.
b) Use the ΔK method to determine the most likely number of groups. How does this
compare to the method based on posterior probabilities?
c) How do the groupings of subpopulations compare to your expectations from Problem 1?
d) Is there evidence of admixture among the groups? If so, include a table or figure showing
the proportion of each subpopulation assigned to each group.
e) GRADUATE STUDENTS ONLY: Provide a brief, literature-based explanation for the
groupings you observe.
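For part b), the ΔK statistic (Evanno et al. 2005) compares the second-order change in log-likelihood at each K with the variability among replicate runs. The sketch below uses one common formulation based on the mean log-likelihood across replicates. The replicate values are hypothetical, the function name is illustrative, and ΔK cannot be evaluated for the smallest or largest K tested.

```python
from statistics import mean, stdev


def delta_k(runs):
    """Return {K: deltaK} for interior K values.

    runs: {K: [ln P(Data | K) for each replicate run]}
    Assumed formulation: |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd of L(K).
    """
    means = {k: mean(v) for k, v in runs.items()}
    result = {}
    for k in sorted(runs):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(runs[k])
    return result


# Hypothetical log-likelihoods from three replicate runs per K.
runs = {
    1: [-1300.0, -1302.0, -1298.0],
    2: [-1235.0, -1236.0, -1234.0],
    3: [-1238.0, -1240.0, -1236.0],
    4: [-1250.0, -1245.0, -1255.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

With these made-up numbers the largest ΔK falls at K = 2, where the log-likelihood stops improving sharply and replicate runs agree closely.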