Quantifying uncertainty in species discovery with approximate Bayesian computation (ABC): single samples and recent radiations
Mike Hickerson, Chris Meyer, Craig Moritz
University of California, Berkeley
Museum of Vertebrate Zoology

Outline
Introduction - Species Discovery
Potential problems - Simulations
Potential problems - Empirical data
Potential statistical solutions

New specimen in the field
Match new specimen’s DNA “barcode” to voucher specimens with barcodes in database
Organizes an enormous flood of data

Proposed genetic thresholds for discovery
Comparing sample to closest sister taxon in reference database
1. Hebert’s 10X rule: between-species divergence must be > 10 times the average within-species divergence
2. Reciprocal Monophyly

Noisy Problem
Species Tree ≠ Gene Tree
Usually a “near miss”
[Figure labels: Species A, Species B, Species C; 4 Sampled Individuals]

Doubly Noisy Problem (mtDNA Barcode locus)
Genetic Threshold: not sensitive enough (Under-Discovery) or too sensitive (Over-Discovery)
Equal?
Species Delimitation Criteria: Moving Target (Mental Construct?)

Joint Simulation Exploration
Simple BDM Model of Reproductive isolation (Bateson-Dobzhansky-Muller)
DNA-barcode gene (mtDNA, CO1 690 bp), Coalescent model
Problematic parameter space?
Potential statistical solutions?

BDM Model (Bateson-Dobzhansky-Muller)
Genotype diagram labels: A,b  a,b  OK  A,B  a,B  Bad
Neutral and divergent selection (Gavrilets 2004)

Speciation events - Poisson process
BDM loci
Barcode locus (mtDNA)
Divergence Time (generations)
Island/Continent (peripatric)
Hickerson et al. 2006 (in press; Systematic Biology)

Reciprocal monophyly
Threshold
[Figure panels. Labels: 10X, Not Species. Sources shown: Coyne and Orr 1997; Zigler et al. 2005; Presgraves 2002; Sasa et al. 1998; Mendelson 2003; Bolnick and Near 2005]

Move beyond “Yes/No” answers: Nielsen and Metz 2005
Bayesian posterior probabilities w/ ABC
- answers with quantified uncertainty
- very fast (< 30 seconds per query)
- flexible (parameter threshold, model and prior change according to taxonomic group)

[Figure: axes Migration and Isolation time; annotation "= moderate support for new species"]
Prior, parameter threshold and operative model are adjustable as appropriate for the particular taxonomic group
?
Mymarommatid wasps (10 rare living fossil species)
African Cichlids (recent radiation)

Ongoing Work
Extension of msBayes software pipeline
Determining appropriate priors, thresholds and models
Testing:
- Simulated data - Yule model (stochastic speciation/extinction)
- Empirical data - Chris Meyer (marine taxa)

Simulated data - Yule model
Speciation and extinction follow a random birth/death process
[Figure labels: Time, Speciation, Extinction]
Test = what % of sisters and orphans are detected as new species “discoveries”?

Test Data
1. Closest divergence times - sisters and orphans
2. Population sizes - Gamma distributed, 50K-2.5M
3. Single specimens from “new” species; 3, 5, 10, 20, and 40 specimens from reference species
[Figure labels: Orphan, Sister-pair]

Empirical Data (Cowries) and Yule model
[Figure: histograms of Count against Divergence Time. Labels: 135 lineages; 100 lineages per clade]

Is it a new species?
Function of posterior probability of divergence time and gene flow
[Figure labels: Discovery?, Reference Species]

msBayes Software pipeline
ABC
Observed data
Simulate 1,000,000 draws from model
Flexible pre-simulated prior
Accept 0.2%
Posterior probability surface
~< 1 minute

Approximate Bayesian Computation (ABC)
[Figure labels: Posterior, Prior]
Parameter threshold?
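
The accept/reject step named on the pipeline slide (simulate from the prior, keep the closest 0.2%, read a probability off the accepted draws) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the msBayes implementation: the toy simulator, the uniform prior, the threshold of 2.0, and every function name are hypothetical, and 100,000 draws are used instead of 1,000,000 only to keep the run short.

```python
import random

def simulate_distance(tau, rng):
    # Toy simulator (assumption): coalescence time between the new specimen
    # and a reference individual is the divergence time tau plus an
    # exponential waiting time (mean 1) in the ancestral population.
    return tau + rng.expovariate(1.0)

def abc_rejection(observed, n_draws=100_000, accept_frac=0.002,
                  tau_max=10.0, seed=1):
    # Rejection ABC: draw from the prior, simulate, keep the closest draws.
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        tau = rng.uniform(0.0, tau_max)      # hypothetical uniform prior
        sim = simulate_distance(tau, rng)    # simulated summary statistic
        draws.append((abs(sim - observed), tau))
    draws.sort(key=lambda pair: pair[0])     # closest simulations first
    n_keep = max(1, round(n_draws * accept_frac))
    return [tau for _, tau in draws[:n_keep]]  # approximate posterior sample

def prob_above_threshold(posterior, threshold):
    # Share of accepted draws whose divergence time exceeds the threshold.
    return sum(tau > threshold for tau in posterior) / len(posterior)

posterior = abc_rejection(observed=4.0)
p_new = prob_above_threshold(posterior, threshold=2.0)
print(f"accepted draws: {len(posterior)}")
print(f"P(tau > 2.0 | data) is roughly {p_new:.2f}")
```

The output is a probability rather than a yes/no call, which is the point of the "Parameter threshold?" question: moving the threshold or swapping the prior changes the answer without changing the procedure.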
[Figure labels: Posterior, Prior]
M1 = yes, new species
M2 = no, same old species
\[ \text{Bayes Factor} = \frac{f(M_1 \mid \text{Data}) \,/\, f(M_2 \mid \text{Data})}{\text{prior}(M_1) \,/\, \text{prior}(M_2)} \]
A way to compare evidence for these 2 discrete models

From simulated Yule phylogenies
[Figure: Freq. False Negatives (0 to 0.5) against Sample Size (3, 5, 10, 20, 40)]
Sample size optimized at 5 (so far)

Very Near Future
1. Better priors: species divergence time AND intra-species coalescence
2. Incorporate Migration
3. Hierarchical Model
[Diagram labels: New species status; Yes: Prior(T,N); No: Prior(N, T=0); Hyper-Parameter; Hyper-Prior]

ACKNOWLEDGEMENTS
Discussion: C. Moritz, C. Meyer, T. Mendelson, K. Zigler, N. Rosenberg, J. Degnan
Coauthors: C. Meyer, C. Moritz
cpu resources: J. McGuire, Museum of Vertebrate Zoology
Funding: NSF, DIMACS