Quantifying uncertainty in species discovery with approximate Bayesian computation (ABC): single samples and recent radiations
Mike Hickerson
Chris Meyer
Craig Moritz
University of California, Berkeley
Museum of Vertebrate Zoology
Outline
Introduction - Species Discovery
Potential problems - Simulations
Potential problems - Empirical data
Potential statistical solutions
New specimen in the field
Match new specimen’s DNA “barcode” to voucher
specimens with barcodes in database
Organizes an enormous flood of data
Proposed genetic thresholds for discovery
Comparing sample to closest sister taxon in reference database
1. Hebert’s 10X rule: between-species divergence must be > 10 times the average within-species divergence
2. Reciprocal Monophyly
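As an illustration of the 10X rule stated above, here is a minimal sketch for a single query specimen compared against one reference species. It assumes aligned sequences and uncorrected p-distances; the function names `p_distance` and `passes_10x_rule` are hypothetical and are not taken from any barcoding tool.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return diffs / len(seq_a)

def passes_10x_rule(query, reference_seqs, factor=10.0):
    """True if the mean query-to-reference divergence exceeds `factor`
    times the mean pairwise divergence within the reference species."""
    if len(reference_seqs) < 2:
        raise ValueError("need at least two reference sequences")
    within = [p_distance(a, b) for a, b in combinations(reference_seqs, 2)]
    between = [p_distance(query, r) for r in reference_seqs]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return mean_between > factor * mean_within

# Toy data (assumed, not real barcodes): within-species divergence is 1%.
reference = ["A" * 100, "A" * 99 + "C"]
print(passes_10x_rule("A" * 80 + "G" * 20, reference))  # 20% divergence
print(passes_10x_rule("A" * 95 + "G" * 5, reference))   # 5% divergence
```

The rule returns only a yes/no answer, which is the limitation the later slides address with posterior probabilities.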
Noisy Problem
Species Tree ≠ Gene Tree
[Figure: species tree (Species A, B, C) and gene tree of 4 sampled individuals; annotation: usually a “near miss”]
Doubly Noisy Problem
Genetic threshold (mtDNA barcode locus)
Equal?
Species delimitation criteria (moving target; mental construct?)
Threshold not sensitive enough: under-discovery
Threshold too sensitive: over-discovery
Joint Simulation Exploration
Simple BDM (Bateson-Dobzhansky-Muller) model of reproductive isolation
DNA-barcode gene (mtDNA, CO1 690 bp)
Coalescent model
Problematic parameter space?
Potential statistical solutions?
BDM Model
(Bateson-Dobzhansky-Muller)
Genotype
A,b
a,b
OK
A, B
a, B
Bad
Neutral and divergent selection (Gavrilets 2004)
Speciation events - Poisson process
BDM loci
Barcode locus
(mtDNA)
Divergence Time (generations)
Island/Continent (peripatric)
Hickerson et al. 2006 (in press; Systematic Biology)
Thresholds: reciprocal monophyly and 10X (below threshold: “Not Species”)
[Figure panels; sources cited on the panels: Coyne and Orr 1997; Zigler et al. 2005; Presgraves 2002; Sasa et al. 1998; Mendelson 2003; Bolnick and Near 2005]
Move beyond “Yes/No” answers:
Nielsen and Matz 2005
Bayesian posterior probabilities w/ ABC
- answers with quantified uncertainty
- very fast (< 30 seconds per query)
- flexible (parameter threshold, model, and prior change according to taxonomic group)
= moderate support for new species
Migration
Isolation time
Prior, parameter threshold, and operative model are adjustable as appropriate for the particular taxonomic group
?
Mymarommatid wasps (10 rare living fossil species)
African Cichlids (recent radiation)
Ongoing Work
Extension of msBayes software pipeline
Determining appropriate priors, thresholds and models
Testing:
Simulated data - Yule model (stochastic speciation/extinction)
Empirical data - Chris Meyer (marine taxa)
Simulated data - Yule model
Speciation and extinction follow a random birth/death process
Time
Speciation
Extinction
Test = what % of sisters and orphans are detected as new species “discoveries”?
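The random birth/death process described above can be sketched as a simple event-by-event simulation of lineage counts. This is a minimal sketch, not the simulator used for the talk: the function name `simulate_birth_death`, the constant per-lineage rates, and the example rate values are all assumptions. Setting the extinction rate to zero gives a pure-birth process.

```python
import random

def simulate_birth_death(birth_rate, death_rate, t_max, seed=None):
    """Simulate lineage counts under a constant-rate birth/death process.

    Starts from one lineage and returns a list of
    (time, event, n_lineages) tuples, where event is
    "speciation" or "extinction".
    """
    if birth_rate <= 0 or death_rate < 0:
        raise ValueError("birth_rate must be > 0 and death_rate >= 0")
    rng = random.Random(seed)
    t, n = 0.0, 1
    events = []
    while n > 0:
        # Waiting time to the next event, pooled over all n lineages.
        t += rng.expovariate(n * (birth_rate + death_rate))
        if t > t_max:
            break
        # The event is a speciation with probability birth / (birth + death).
        if rng.random() < birth_rate / (birth_rate + death_rate):
            n += 1
            events.append((t, "speciation", n))
        else:
            n -= 1
            events.append((t, "extinction", n))
    return events

# Assumed example rates, in arbitrary time units.
history = simulate_birth_death(birth_rate=1.0, death_rate=0.5, t_max=3.0, seed=2)
print(len(history), "events")
```

A full test set would also need the tree topology, so that sister pairs and orphans can be identified; this sketch tracks only the number of lineages through time.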
Test Data
1. Closest divergence times - sisters and orphans
2. Population sizes - Gamma distributed, 50K-2.5M
3. Single specimens from “new” species; 3, 5, 10, 20, and 40 specimens from reference species
Orphan
Sister-pair
[Figure: two histograms of count against divergence time, one for empirical data (cowries) and one for the Yule model; labels on the plots: “135 lineages” and “100 lineages per clade”]
Is it a new species?
A function of the posterior probability of divergence time and gene flow
Discovery?
Reference
Species
msBayes Software pipeline (ABC)
Observed data
Simulate 1,000,000 draws from model (flexible, pre-simulated prior)
Accept 0.2%
Posterior probability surface
Under about 1 minute
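The simulate-then-accept steps above follow the general pattern of ABC rejection sampling, sketched here in miniature. This is not msBayes code. The toy simulator, the uniform priors, the mutation rate `mu`, the single summary statistic, and all function names are assumptions made for illustration; only the 0.2% acceptance fraction and the 50K-2.5M population size range come from the slides (which use a Gamma prior rather than the uniform one assumed here).

```python
import random

def toy_simulator(t_div, n_eff, mu, rng):
    """Toy stand-in for a coalescent simulator: pairwise divergence is
    2 * mu * (divergence time + a random ancestral coalescence time)."""
    coal_time = rng.expovariate(1.0 / n_eff)  # mean n_eff generations
    return 2.0 * mu * (t_div + coal_time)

def abc_rejection(d_obs, n_draws=100_000, accept_frac=0.002, seed=0):
    """Draw parameters from the prior, simulate a summary statistic,
    and keep the fraction of draws closest to the observed value."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        t_div = rng.uniform(0.0, 2_000_000.0)       # assumed prior on T
        n_eff = rng.uniform(50_000.0, 2_500_000.0)  # assumed prior on N
        d_sim = toy_simulator(t_div, n_eff, mu=1e-8, rng=rng)
        draws.append((abs(d_sim - d_obs), t_div, n_eff))
    draws.sort(key=lambda d: d[0])
    n_keep = max(1, round(n_draws * accept_frac))
    # The accepted (T, N) pairs approximate the joint posterior.
    return [(t, n) for _, t, n in draws[:n_keep]]

accepted = abc_rejection(d_obs=0.02, n_draws=20_000)
print(len(accepted), "accepted draws")
```

Because the prior draws do not depend on the observed data, they can be simulated once and reused for every query, which is what makes a pre-simulated prior fast.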
Approximate Bayesian Computation (ABC)
[Figure: prior and posterior distributions, with a parameter threshold marked “Parameter threshold?”]
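One way to read the parameter threshold is as a cutoff on a posterior sample: the support for a new species is the share of posterior draws whose parameter value exceeds the threshold. The sketch below assumes the posterior is available as a list of sampled divergence times; the function name and the example numbers are hypothetical.

```python
def posterior_prob_above(samples, threshold):
    """Fraction of posterior samples strictly above the threshold."""
    if not samples:
        raise ValueError("need at least one posterior sample")
    return sum(1 for s in samples if s > threshold) / len(samples)

# Assumed posterior sample of divergence times (generations).
divergence_times = [100_000, 250_000, 400_000, 900_000]
print(posterior_prob_above(divergence_times, threshold=300_000))
```

Changing the threshold changes the answer without rerunning any simulations, which fits the idea of adjusting the threshold by taxonomic group.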
M1 = yes, new species
M2 = no, same old species
\[ \text{Bayes Factor} = \frac{f(M_1 \mid \text{Data}) \,/\, f(M_2 \mid \text{Data})}{\text{prior}(M_1) \,/\, \text{prior}(M_2)} \]
A way to compare evidence for these 2 discrete models
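The ratio above can be estimated from counts when each accepted simulation carries a label for the model that generated it. This is a sketch under that assumption: the function name, the labels (1 for M1, 2 for M2), and the example counts are hypothetical.

```python
def bayes_factor(accepted_models, prior_m1=0.5):
    """Posterior odds of M1 over M2 divided by the prior odds.

    `accepted_models` holds the model label (1 or 2) of each accepted
    simulation; `prior_m1` is the prior probability of M1.
    """
    n1 = sum(1 for m in accepted_models if m == 1)
    n2 = len(accepted_models) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("both models must appear among the accepted draws")
    posterior_odds = n1 / n2
    prior_odds = prior_m1 / (1.0 - prior_m1)
    return posterior_odds / prior_odds

# Assumed example: 150 accepted draws from M1 and 50 from M2, equal priors.
labels = [1] * 150 + [2] * 50
print(bayes_factor(labels))  # 3.0
```

With equal priors the Bayes factor equals the posterior odds; dividing by the prior odds removes the effect of an unequal prior on the two models.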
From simulated Yule phylogenies
[Plot: frequency of false negatives (0 to 0.5) against sample size (3, 5, 10, 20, 40)]
Sample size optimized at 5 (so far)
Very Near Future
1. Better priors
Species divergence time AND intra-species coalescence
2. Incorporate Migration
3. Hierarchical Model
Hyper-parameter: new species status (with hyper-prior)
Yes: Prior(T, N)
No: Prior(N, T=0)
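A draw from a hierarchical prior of this shape could look like the sketch below: the hyper-parameter (new species status) is drawn first, and it decides whether the divergence time T is drawn from its prior or fixed at zero. The uniform priors, their bounds, the hyper-prior probability `p_new`, and the function name are all assumptions.

```python
import random

def draw_from_hierarchical_prior(rng, p_new=0.5, t_max=2_000_000.0,
                                 n_min=50_000.0, n_max=2_500_000.0):
    """Return (is_new, t_div, n_eff) for one draw from the prior."""
    is_new = rng.random() < p_new      # hyper-parameter: new species status
    n_eff = rng.uniform(n_min, n_max)  # population size N, drawn either way
    # T comes from its prior only under "yes"; under "no" it is fixed at 0.
    t_div = rng.uniform(0.0, t_max) if is_new else 0.0
    return is_new, t_div, n_eff

rng = random.Random(0)
draws = [draw_from_hierarchical_prior(rng) for _ in range(5)]
print(sum(1 for is_new, _, _ in draws if is_new), "of 5 draws are 'new species'")
```

Feeding such draws into an ABC run means the share of accepted draws with the "yes" status estimates the posterior probability of new species status directly.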
ACKNOWLEDGEMENTS
Discussion: C. Moritz, C. Meyer, T. Mendelson, K. Zigler, N. Rosenberg, J. Degnan
Coauthors: C. Meyer, C. Moritz
cpu resources: J. McGuire, Museum of Vertebrate Zoology
Funding: NSF, DIMACS