DNA Barcoding: An Emerging Global Standard for Species Identification
Consortium for the Barcode of Life
National Museum of Natural History
Smithsonian Institution
http://www.barcoding.si.edu
202/633-0808; fax 202/633-2938
Academia Sinica, 16 January 2007

A DNA barcode is a short gene sequence taken from standardized portions of the genome, used to identify species.

Characteristics of Barcode Regions
• Flanked by conserved regions
• Easy to amplify
• Low intraspecies variability
• Discontinuous variation between species
• Long enough to work in all groups
• Short enough for single reads

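The two variability criteria above (low variation within species, discontinuous variation between species) can be checked on a set of aligned sequences by comparing the largest within-species distance with the smallest between-species distance. The following is a minimal sketch, not a method taken from the slides: it assumes pre-aligned sequences of equal length and uses the uncorrected p-distance, and the function names, species labels and toy sequences are all invented for illustration.

```python
from itertools import combinations

def p_distance(a, b):
    """Uncorrected p-distance: fraction of compared sites that differ.
    Sites where either sequence has a gap or ambiguous base are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared

def barcode_gap(records):
    """records: list of (species, aligned_sequence) pairs.
    Returns (max intraspecific distance, min interspecific distance)."""
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra, default=0.0), min(inter, default=float("nan"))

# Toy data (hypothetical, 20 bp only; real barcodes are far longer)
toy = [
    ("species_A", "ACGTACGTACGTACGTACGT"),
    ("species_A", "ACGTACGTACGTACGTACGA"),
    ("species_B", "ACGTTCGTAGGTACCTACTT"),
    ("species_B", "ACGTTCGTAGGTACCTACTA"),
]
max_intra, min_inter = barcode_gap(toy)
print(f"max within-species: {max_intra:.2f}, min between-species: {min_inter:.2f}")
```

A region behaves well as a barcode for these taxa when the second number is clearly larger than the first.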
The Mitochondrial Genome
[Figure: map of the mitochondrial genome showing the H-strand and L-strand, with labels for the D-Loop, small and large ribosomal RNA, ND1, ND2, COI, COII, ATPase subunits 8 and 6, COIII, ND3, ND4L, ND4, ND5, ND6 and Cyt b]

Using DNA Barcodes
• Establish reference library of barcodes from identified voucher specimens
• If necessary, revise species limits
• Then:
– Identify unknowns by searching against reference sequences
– Look for matches (mismatches) against ‘library on a chip’
– Before long: Analyze relative abundance in multi-species samples

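Identifying an unknown by searching against reference sequences can be pictured as a nearest-match lookup with a cutoff. The sketch below is only an illustration under simple assumptions: sequences are already aligned to equal length, distance is the plain fraction of mismatching positions, and the `max_distance` cutoff, the function names and the toy library are hypothetical rather than taken from any barcode identification system.

```python
def mismatch_fraction(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def identify(query, library, max_distance):
    """Return (species, distance) for the closest reference sequence,
    or (None, distance) when even the closest one exceeds max_distance."""
    best_species, best_distance = None, float("inf")
    for species, reference in library:
        d = mismatch_fraction(query, reference)
        if d < best_distance:
            best_species, best_distance = species, d
    if best_distance > max_distance:
        return None, best_distance
    return best_species, best_distance

# Hypothetical reference library built from identified voucher specimens
library = [
    ("species_A", "ACGTACGTACGTACGTACGT"),
    ("species_B", "ACGTTCGTAGGTACCTACTT"),
]

match = identify("ACGTACGTACGTACGTACGA", library, max_distance=0.1)
no_match = identify("T" * 20, library, max_distance=0.1)
print(match)     # closest reference is within the cutoff
print(no_match)  # no reference is close enough: candidate for further study
```

Returning `None` rather than the nearest name when the cutoff is exceeded keeps identification separate from the question of whether the unknown is a species missing from the library.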
Analytical chain
1. Databasing
2. Labeling
3. Imaging
4. Tissue sampling
5. DNA extraction
6. PCR
7. PCR check
8. Sequencing reaction
9. Sequencing cleanup
10. Sequencing
11. Trace editing & submission
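The eleven steps above form a strict sequence, so a sample's progress can be tracked as a position in an ordered list. This is a hypothetical sketch of that idea only; the `Sample` class and its methods are invented for illustration and do not describe how any laboratory information system actually works.

```python
STAGES = [
    "Databasing", "Labeling", "Imaging", "Tissue sampling", "DNA extraction",
    "PCR", "PCR check", "Sequencing reaction", "Sequencing cleanup",
    "Sequencing", "Trace editing & submission",
]

class Sample:
    """Tracks which steps of the analytical chain a specimen has completed."""

    def __init__(self, sample_id):
        self.sample_id = sample_id
        self.completed = []

    def next_stage(self):
        """Name of the next pending step, or None when the chain is finished."""
        if len(self.completed) < len(STAGES):
            return STAGES[len(self.completed)]
        return None

    def complete(self, stage):
        """Record a finished step; steps must be completed in order."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"{self.sample_id}: expected {expected!r}, got {stage!r}")
        self.completed.append(stage)

sample = Sample("VOUCHER-0001")  # hypothetical identifier
sample.complete("Databasing")
sample.complete("Labeling")
print(sample.next_stage())
```

Rejecting out-of-order steps makes a skipped stage (for example, sequencing without a PCR check) visible immediately.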
BoLD Data System
• Developed/hosted by Univ. Guelph
• Workbench for most barcode projects
• Laboratory Information Management System (LIMS) for assembling data
• Management and Analysis System
• Identification system for matching unknowns to reference records
• Uploading to GenBank

Methods
Barcode of Life Database
Current Norm: High throughput
Large capacity PCR and sequencing reactions
ABI 3100 capillary automated sequencer

Cost of Reagents and Disposables
Step                  Fresh/Frozen   Museum
Tissue Sampling       $0.41          $0.41
DNA Extraction        $0.34          $2.00
PCR Amplification     $0.24          $0.48
PCR Product Check     $0.35          $0.70
Cycle Sequencing      $1.04          $2.08
Sequencing Cleanup    $0.32          $0.64
Sequence              $0.40          $0.80
Total:                $3.10          $7.11

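The totals in the cost table can be reproduced by summing the step costs, and the same figures give a rough reagent budget for a project. This sketch assumes the amounts are per-sample costs (the slide does not say so explicitly); the `project_cost` helper and the sample counts in the example are hypothetical.

```python
from decimal import Decimal

# (fresh/frozen, museum) reagent and disposable costs in dollars, from the slide
COSTS = {
    "Tissue Sampling":    (Decimal("0.41"), Decimal("0.41")),
    "DNA Extraction":     (Decimal("0.34"), Decimal("2.00")),
    "PCR Amplification":  (Decimal("0.24"), Decimal("0.48")),
    "PCR Product Check":  (Decimal("0.35"), Decimal("0.70")),
    "Cycle Sequencing":   (Decimal("1.04"), Decimal("2.08")),
    "Sequencing Cleanup": (Decimal("0.32"), Decimal("0.64")),
    "Sequence":           (Decimal("0.40"), Decimal("0.80")),
}

fresh_total = sum(fresh for fresh, _ in COSTS.values())
museum_total = sum(museum for _, museum in COSTS.values())

def project_cost(n_fresh, n_museum):
    """Reagent cost for a mix of fresh/frozen and museum samples,
    assuming the table's figures are per-sample costs."""
    return n_fresh * fresh_total + n_museum * museum_total

print(fresh_total, museum_total)   # matches the Total row: 3.10 and 7.11
print(project_cost(1000, 200))     # hypothetical project: 1000 fresh, 200 museum
```

`Decimal` is used instead of floats so that the sums match the printed totals exactly.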
Producing Barcode Data: 2008
Faster, more portable: Hundreds of samples per hour
Integrated DNA microchips
Table-top microfluidic systems
Producing Barcode Data: 2010?
Barcode data anywhere, instantly
• Data in seconds to minutes
• Pennies per sample
• Link to reference database
• A taxonomic GPS
• Usable by nonspecialists

Uses of DNA Barcodes
Applied tool for identifying regulated species:
• Disease vectors, agricultural pests, invasives
• Environmental indicators, protected species
• Using minimal samples, damaged specimens, gut contents, droppings
Research tool for improving species-level taxonomy:
• Associating all life history stages, genders
• Testing species boundaries, finding new variants
“Triage” tool for flagging potential new species:
• Undescribed and cryptic species

Associating Life Stages, Processed Parts, Dimorphic Genders

Steatogenini until the early 90’s
Hypopygus lepturus Hoedeman 1962
Steatogenys elegans
Steatogenys duidae

Color patterns in Hypopygus
Nijssen & Isbrücker 1972

Steatogenini during the 90’s
Hypopygus lepturus Hoedeman 1962
Hypopygus neblinae Mago-Leccia 1994
Steatogenys

Steatogenini during the 90’s / today
Hypopygus lepturus Hoedeman 1962
Hypopygus neblinae Mago-Leccia 1994
Stegostenopos Triques 1997
Steatogenys

Steatogenys sp.
Stegostenopos cryptogenes
Hypopygus lepturus
R. Bernhard, 2004
[Figure: RAG 1 tree (MP/ML/Dist); labels: Stegostenopos, Hypopygus neblinae, A, C, D, H. lepturus, Steatogenys]

[Figure: 12S/16S tree (strict consensus of ML/MP/Dist); labels: Stegostenopos, H. neblinae, A, C, D, E, H. lepturus, Steatogenys]

[Figure: D-loop tree (MP/ML/Dist); labels: H. lepturus, E, D; specimens 2781, 2885, 2876, 2845, 2792, 2791]

[Figure: COI - BARCODE tree (MP); labels: Stegostenopos, H. neblinae, A, C, D, E, H. lepturus, Eigenmannia sp.; specimens 2792, 2791, 2781, 2845]

Wider Impacts of Barcoding: 2008
• Catalyzing interoperability of databases
– Barcode data standards link sequences, specimens, species names and publications
• Improving the information infrastructure
– Digital library initiative in taxonomy
• Renewing the mission of museums
– DNA recovery from formalin-fixed specimens
– Promoting the growth of DNA banks
• Expanding analytical toolbox for taxonomy

What DNA Barcoding is NOT
• Barcoding is not DNA taxonomy; no single gene (or character) is adequate
• Barcoding is not Tree of Life; barcode clusters are not phylogenetic trees
• Barcoding is not just COI; standardizing on one region has benefits and limits
• Using molecules in taxonomy is not new, but the large scale and the standardization are new
• BUT… Barcoding can help to create a 21st century research environment for taxonomy

Consortium for the Barcode of Life (CBOL)
• First barcoding publications in 2002
• Cold Spring Harbor planning workshops in 2003
• Sloan Foundation grant, launch in May 2004
• Secretariat opens at Smithsonian, September 2004
• First international conference February 2005
• Now an international affiliation of:
– 130+ Member Organizations, 40 countries, 6 continents
– Natural history museums, biodiversity organizations
– Users: e.g., government agencies
– Private sector biotech companies, database providers

CBOL Member Organizations, June 2006: 120 Member Organizations, 40 countries

CBOL’s Working Groups
• Database: Designing/constructing the Barcode Section of GenBank
• DNA: Protocols for formalin-fixed and old museum specimens; Producing LIMS for dissemination
• Data Analysis: Beyond phenetic methods; population genetics perspective
• Plants: Identify gene region(s) for barcoding

Infrastructure of Taxonomy: Fragmented, Disconnected
• Collections and databases of specimens
• Compilations of taxonomic names
• Data repositories (characters, gene sequences, images, trees)
• Monographs
• Floristic and faunistic surveys/inventories
• Revisions
• The (undigitized) Taxonomic Literature

Barcode Records in INSDC
• Consensus results of Front Royal meeting
– GBIF
– NBII
– ICZN
– ITIS
– Species2000
– ZooRecord
– GRIN
– IPNI
– OBIS
• Structured link to voucher specimen
• Species name selected from authority
• Online access to metadata
• Trace files and quality scores
• Minimum sequence length

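The requirements listed above lend themselves to an automated completeness check before a record is submitted. The sketch below is an illustration only: the field names are invented and do not reflect the actual INSDC or GenBank record format, and the 500 bp minimum is an assumed placeholder because the slide does not state the required length.

```python
ASSUMED_MIN_LENGTH = 500  # assumption for illustration; not stated on the slide

# Hypothetical field names, one per requirement on the slide
REQUIRED_FIELDS = (
    "voucher_id",       # structured link to voucher specimen
    "species_name",     # species name ...
    "name_authority",   # ... selected from an authority
    "metadata_url",     # online access to metadata
    "trace_files",      # trace files and quality scores
    "sequence",
)

def check_record(record, min_length=ASSUMED_MIN_LENGTH):
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    sequence = record.get("sequence") or ""
    if sequence and len(sequence) < min_length:
        problems.append(f"sequence too short: {len(sequence)} < {min_length}")
    return problems

good = {
    "voucher_id": "MUSEUM:Fish:12345",          # invented example values
    "species_name": "Hypopygus lepturus",
    "name_authority": "Catalog of Life",
    "metadata_url": "https://example.org/specimen/12345",
    "trace_files": ["12345_F.ab1", "12345_R.ab1"],
    "sequence": "ACGT" * 160,
}
bad = {"species_name": "Hypopygus lepturus", "sequence": "ACGT" * 10}

print(check_record(good))
print(check_record(bad))
```

Collecting every problem in one pass, rather than stopping at the first, gives the submitter a complete list to fix.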
BARCODE records in GenBank
[Diagram: a BARCODE record (barcode sequence, trace files, primers) linked to:
– Voucher specimen and specimen metadata (georeference, habitat, character sets, images, behavior, other genes)
– Species name, through indices (Catalog of Life, GBIF/ECAT), nomenclators (Zoo Record, IPNI) and NameBank
– Literature: publication links, new species (link to content or citation)
– Other databases (phylogenetic, population genetics, ecological)]

Digitizing Taxonomic Literature
• CBOL’s catalytic efforts:
– Library-Laboratory meeting in London on electronic access to taxonomic literature
– Led to formation of Biodiversity Heritage Library initiative
– Proactive steps with PubMed to add taxonomic journals to online abstracts
– Aggressive negotiation with publishers of barcoding papers

The Barcode Assembly Line: 2006
Freshly collected specimens
Frozen tissue
Young museum specimens
DNA Barcode Data

The Barcode Assembly Line: 2008
Opening the museum treasure-trove
Formalin-fixed specimens
Frozen tissue
Freshly collected specimens
Older museum specimens
Young museum specimens
DNA Barcode Data

CBOL Formalin Workshop
• Literature survey of DNA recovery protocols from formalin-fixed specimens
• Solicited proposal from National Research Council
• May 8-9 workshop in Washington
• Chemists, biochemists, biophysicists, biomedical researchers
• Create a new research agenda

Data analysis protocols in 2008
A Bigger, Better Analytical Toolkit to handle the Barcode Data Explosion
• Collaboration of statisticians, computer scientists, population geneticists
• Sampling issues:
– Sample size versus confidence level
– Sample size in light of geography, gene flow
• Analytical tools and protocols:
– Treatment of missing DNA site data
– Identification versus species delimitation (classification versus clustering)

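One simple way to frame "sample size versus confidence level" is to ask how many individuals must be sampled to see a variant of a given frequency at least once. The sketch below uses the textbook relation P(detect) = 1 - (1 - p)^n, which assumes independent random sampling from a single well-mixed population; it is an illustrative model only, not a protocol from the slides, and it ignores the geography and gene flow issues listed above. The function names are hypothetical.

```python
import math

def detection_probability(frequency, n):
    """Probability that a variant with the given frequency appears at least
    once among n independently sampled individuals."""
    return 1.0 - (1.0 - frequency) ** n

def samples_needed(frequency, confidence):
    """Smallest n for which the detection probability reaches the confidence level."""
    if not (0.0 < frequency < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("frequency and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - frequency))

for freq in (0.10, 0.05):
    n = samples_needed(freq, 0.95)
    print(f"variant frequency {freq:.2f}: sample {n} individuals for 95% confidence")
```

Halving the variant frequency roughly doubles the required sample, which is why rare variants dominate sampling budgets.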
Progress toward Plant Barcode
• Kress 2005 proposal for ITS and trnH-psbA
• Kew Gardens receives Sloan/Moore Foundation support
• Phase 1 screens 100 genes across 50 sibling species pairs
• Phase 2 tests of matK, rpoC1, rpoB, ndhJ, and accD
• Canadian proposal for rbcL
• CBOL protocols for approving barcode regions

Current and Planned CBOL Barcoding Projects
• FishBOL and All Birds Initiatives
• “Demonstrator Systems” by 2008:
– Tephritid fruit flies (agricultural pests)
– Mosquitoes (disease vectors)
• African Scale Insect Barcoding Initiative (planned at Cape Town Regional Meeting)
• Barcoding for Conservation Committee

Launching CBOL Projects
• Assembling Steering Committee
– Users
– Taxonomists, collection curators
– Service providers (BoLD, analytical labs)
• Plan for scope, timetable, logistics
• Pilot tests of primers, PCR amplification
• Assemble pipeline of specimens to lab

ABBI and FISH-BOL
• Global initiatives to create reference library
• Enable users to adopt barcode ID systems
• All-species barcode database will:
– Strengthen specimen/species data
– Improve collections, tissue/DNA resources
– Attract users to barcoding for specimen IDs
• Regional Working Groups
• Small Steering Committee and CBOL
Planned Outreach
• Regional meetings in:
– Cape Town, South Africa, 7-8 April 2006, SANBI
– Nairobi, Kenya, 18-19 October 2006, NMK
– São Paulo, Brazil, February 2007, INPA
– Southern/SE Asia, mid-2007
• Second International Barcode Conference
– Southeast Asia (?), September 2007 (?)
• Support from CBOL, host governments and international development agencies

Milestones for 2008
[Timeline chart running from Q1 2006 to Q2 2008.
Rows: Database; DNA WG; Plant WG; Data Analysis WG; Database WG; Campaigns.
Milestones shown: 100K records, 200K records, 500K records; Formalin Study; Advanced Lab Protocols; Development of Consensus Plant Barcode Region; International Conference; Data Analysis Protocols and S/W; Data Standards; BoLI Data Portal Launched; Extended DB Interoperability; Regional Groups Operational; First Data Releases; 10K birds; 30K fish; Demonstrator System Launched]