FishTrace_Paris 07_06 - Consortium for the Barcode of Life


Fish barcoding from the FishTrace database: the control gene, the data validation analysis and the backup reference biological data
José M. Bautista
FishTrace Consortium
DNA Barcoding of Life
Sequencing billions of base pairs, but first sampling millions of DNA samples
From M. Blaxter, Nature (2003)
Fish
- More than 30,000 teleost species (~60% of vertebrates)
- More than 500 species consumed in Europe (FAO 2003)
- More than 100 million tonnes of world fish catch
- Practical and ecological interest
"Marina con Pesci": detail of a mosaic found at Pompeii.
Sources of data for taxonomic identification
- Dichotomous identification keys (e.g. FNAM, Whitehead et al., 1984-1986)
- Fisheries field guides (e.g. FAO)
- Fish systematics (e.g. Nelson, 1994)
- Taxonomic databases (e.g. FishBase)
- Molecular databases (e.g. GenBank)
+ expert taxonomists
Obstacles to reliable taxonomic identification of fish
- Definition of standards: standard names and species covered
- Available methodology
- Control sample: analytical data from a standard (control) sample
- Availability of regional samples
- Availability of regional information
- Adequate sample conservation for analysis (countercheck)
Fish barcoding scenario
- 30,000 fish species
- Sampling cost
- Ad hoc sampling vs. random sampling
- Regional divergence calls for regional sampling
- Limited availability of taxonomic expertise
Fish barcoding relies on sampling: how many fish species will be available for DNA barcoding in 5 years?
Fish barcoding: linking taxonomic and molecular ID
Traceability, population and fish species identification have traditionally relied on molecular markers:
- DNA restriction fragment length polymorphisms (RFLP)
- Microsatellites (VNTR)
- Sequencing of mitochondrial genes
Mitochondrial DNA is a traditional marker for species identification
- Neutral marker
- Multicopy: present in many copies in every cell
- Evolutionary rate 5-10 times higher than that of nuclear genes
- Maternally inherited, without genetic recombination
- For many fish species, most mtDNA sequences in GenBank are of 16S rRNA, cyt b and COI
FishTrace www.fishtrace.org
2003-2006
Barcoding of European Marine Fish: molecular ID + taxonomy + reference collections in a single database
2000: Pilot project in the Canary Islands [92 species]: www.pescabase.org
2002: FishTrace Consortium: 10 European institutions
2003: FishTrace starts as an EC project (January)
2006: FishTrace ends as an EC project (June)
FishTrace: Geographical areas covered
FishTrace in numbers:
- 3240 specimens sampled (tissue taken for DNA analysis)
- 9 geographical sampling areas
- 5 specimens/species/area
- 221 species [516 geo-overlapping]
- 2724 sequences in total
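The structure of the FishTrace Database
Three parallel modules, each standardized and validated before entering the database, which is published on the WWW:
- Taxonomy and sampling: Standardization → Ad hoc sampling → Identification → Local description → Validation
- Molecular genetics: Standardization → PCR + sequencing → Genetic variation → Methodological description → Validation
- Biological collections: Standardization → Storage → Cataloging → Sharing → Validation
All three modules → Database → WWW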
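The three-module structure can be read as a small state machine per specimen. The sketch below is a hypothetical model, not the FishTrace schema: stage names paraphrase the diagram, and the rules that stages run in order and that a record is published only after validation in all three modules are assumptions drawn from the diagram's layout.

```python
from dataclasses import dataclass, field

# Hypothetical encoding of the three modules in the structure diagram.
# Stage names paraphrase the diagram; they are not FishTrace field names.
MODULE_STAGES = {
    "taxonomy_and_sampling": ["standardization", "ad_hoc_sampling", "identification",
                              "local_description", "validation"],
    "molecular_genetics": ["standardization", "pcr_sequencing", "genetic_variation",
                           "methodological_description", "validation"],
    "biological_collections": ["standardization", "storage", "cataloging",
                               "sharing", "validation"],
}


@dataclass
class SpecimenRecord:
    specimen_id: str
    # Stages already completed, tracked separately for each module.
    completed: dict[str, set[str]] = field(
        default_factory=lambda: {module: set() for module in MODULE_STAGES})

    def complete(self, module: str, stage: str) -> None:
        """Mark a stage as done, refusing to skip earlier stages of that module."""
        stages = MODULE_STAGES[module]
        if stage not in stages:
            raise ValueError(f"unknown stage {stage!r} for module {module!r}")
        missing = [s for s in stages[:stages.index(stage)]
                   if s not in self.completed[module]]
        if missing:
            raise ValueError(f"{module}: finish {missing} before {stage!r}")
        self.completed[module].add(stage)

    def publishable(self) -> bool:
        """Only records validated in all three modules go to the database/WWW."""
        return all("validation" in done for done in self.completed.values())
```

Completing every stage of every module in order makes `publishable()` return True; trying to validate a module before its earlier stages raises `ValueError`.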
Flow of information in the database for reliable fish identification
Taxonomy and sampling + biological collections + genetic identification (Rhod and Cyt b)
- Sampling error (misidentified specimens): 2.1%
- Low-quality DNA or inhibitors (no PCR amplification): 0.0%
- PCR failed to amplify one of the fragments (cytb or Rhod): 1.8%
- PCR or sequencing error (final sequence not matching the phylogeny): 3.0%
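Rates like those above can be tallied once each specimen is assigned to one failure category. The sketch below assumes hypothetical per-specimen flags and assumes failures are checked from the most upstream step (sampling) to the most downstream (sequence vs. phylogeny), so each specimen lands in exactly one category; the source does not say how overlapping failures were counted.

```python
from collections import Counter


def classify(specimen: dict) -> str:
    """Put a specimen in exactly one QC category, checking upstream steps first.

    Field names are hypothetical; the ordering is an assumption.
    """
    if specimen["misidentified"]:
        return "sampling error"
    cytb, rhod = specimen["cytb_amplified"], specimen["rhod_amplified"]
    if not cytb and not rhod:
        return "no PCR amplification"
    if not (cytb and rhod):
        return "one fragment not amplified"
    if not specimen["matches_phylogeny"]:
        return "PCR or sequencing error"
    return "ok"


def qc_rates(specimens: list[dict]) -> dict[str, float]:
    """Percentage of specimens falling in each QC category."""
    counts = Counter(classify(s) for s in specimens)
    return {category: 100.0 * n / len(specimens) for category, n in counts.items()}
```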
Reliability of fish DNA barcoding in the FishTrace database rests on:
- The control gene: Rhodopsin
- The data validation strategy: from taxonomy to sequence
- The backup reference biological data: from sequence to collections
Reliability: the trustworthiness of the system to do what it is designed to do.
The control gene: Rhodopsin
- Single copy
- Nuclear gene that contains no introns in Teleostei
- Encodes a transmembrane G-protein-coupled receptor of the visual transduction cascade
- In fish, its evolutionary rate is lower than that of cytb, but by less than 2-fold (0.167 vs 0.247, roughly 1.5-fold)
- Rhodopsin trees contain a high number of well-supported Teleostei clades
The control gene: Rhodopsin (benefits)
- More information: when cytb cannot be obtained, rhodopsin is still available
- Increased reliability: consistent branching in Rhod and Cytb trees is a control of correct PCR and sequencing
- Prediction: unknown fish specimens with no representation in the database are correctly placed taxonomically in Cytb, Rhod and combined Cytb-Rhod phylograms
- Phylogenetic performance: Rhod trees recover Teleostei clades better than mtDNA trees
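The cross-check between markers can be approximated without building trees. The sketch below is a simplified stand-in for branch consistency, not the FishTrace procedure: it assigns a query to its nearest reference species separately for Cytb and Rhod (by uncorrected p-distance on pre-aligned sequences) and flags the specimen when the two markers disagree. Reference dictionaries and species keys are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap or N in either sequence are skipped; if none remain,
    the distance is infinite so that reference is never chosen.
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in "-N" and y not in "-N"]
    if not pairs:
        return float("inf")
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest_species(query: str, references: dict[str, str]) -> str:
    """Species whose reference sequence is closest to the query."""
    return min(references, key=lambda species: p_distance(query, references[species]))


def markers_agree(query_cytb: str, query_rhod: str,
                  ref_cytb: dict[str, str], ref_rhod: dict[str, str]) -> bool:
    """True if Cytb and Rhod point to the same species; False flags a re-check."""
    return (nearest_species(query_cytb, ref_cytb)
            == nearest_species(query_rhod, ref_rhod))
```

A disagreement does not say which marker is wrong; in this simplified scheme it only marks the specimen for repeated PCR, sequencing or a check against the voucher.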
[Table: Teleostei clades recovered (marked X) in Cytb, Rhod and Cytb + Rhod trees]
Clades:
Clupeiformes
Anguilliformes
Cypriniformes
Gadiformes
Osmeriformes
Salmoniformes
Aulopiformes
Beryciformes 1
Perciformes - Sparidae 1
Perciformes - Centracanthidae
Perciformes - Sparidae 2
Tetraodontiformes 1
Scorpaeniformes 1
Lophiiformes
Perciformes - Zoarcidae
Perciformes - Anarhichadidae
Scorpaeniformes 2
Perciformes - Scombridae
Perciformes - Bramidae
Perciformes - Centrolophidae
Perciformes - Pomatomidae
Perciformes - Mullidae
Ophidiiformes
Beryciformes 2
Perciformes - Labridae
Perciformes - Scaridae
Atheriniformes
Beloniformes
Perciformes - Mugilidae
Perciformes - Sciaenidae
Perciformes - Trachinidae
Perciformes - Serranidae 1
Scorpaeniformes 3
Perciformes - Serranidae 2
Tetraodontiformes 2
Perciformes - Polyprionidae
Perciformes - Haemulidae
Perciformes - Xiphiidae
Perciformes - Carangidae
Perciformes - Sphyraenidae
Perciformes - Moronidae
Perciformes - Pleuronectiformes
Species:
Alosa fallax
Alosa alosa
Sardina pilchardus
Sardinella maderensis
Engraulis encrasicolus
Gymnothorax afer
Muraena robusta
Anguilla anguilla
Anguilla japonica
Danio rerio
Carassius auratus
Melanogrammus aeglefinus
Merlangius merlangus
Gadus morhua
Pollachius virens
Micromesistius poutassou
Gadiculus argenteus
Molva molva
Brosme brosme
Enchelyopus cimbrius
Ciliata septentrionalis
Phycis phycis
Phycis blennoides
Merluccius merluccius
Osmerus eperlanus
Plecoglossus altivelis
Oncorhynchus mykiss
Salmo salar
Salmo trutta
Zeus faber
Chlorophthalmus agassizi
Argentina sphyraena
Beryx decadactylus
Lithognathus mormyrus
Diplodus vulgaris
Boops boops
Sparus aurata
Sarpa salpa
Spicara smaris
Spicara maena
Dentex dentex
Pagrus pagrus
Pagellus erythrinus
Ranzania laevis
Sphoeroides pachygaster
Fugu rubripes
Tetraodon nigroviridis
Chelidonichthys lucernus
Chelidonichthys gurnardus
Aspitrigla cuculus
Lophius piscatorius
Lophius budegassa
Zoarces viviparus
Anarhichas lupus
Myoxocephalus scorpius
Liparis liparis
Cyclopterus lumpus
Thunnus albacares
Thunnus obesus
Thunnus alalunga
Thunnus thynnus
Sarda sarda
Katsuwonus pelamis
Euthynnus alletteratus
Scomber scombrus
Scomber japonicus
Taractichthys longipinnis
Schedophilus ovalis
Pomatomus saltatrix
Mullus surmuletus
Mullus barbatus
Brotula barbata
Sargocentron sp.
Myripristis sp.
Xyrichthys novacula
Labrus bergylta
Scarus hoefleri
Sparisoma rubripinne
Atherina presbyter
Atherina boyeri
Oryzias latipes
Tylosurus acus
Belone belone
Mugil cephalus
Liza ramado
Liza aurata
Chelon labrosus
Umbrina canariensis
Trachinus draco
Echiichthys vipera
Serranus hepatus
Serranus cabrilla
Scorpaena porcus
Sebastes viviparus
Helicolenus dactylopterus
Epinephelus costae
Epinephelus marginatus
Sparisoma cretense
Balistes capriscus
Polyprion americanus
Pomadasys perotaei
Pomadasys incisus
Xiphias gladius
Trachurus trachurus
Caranx crysos
Seriola rivoliana
Seriola fasciata
Sphyraena sphyraena
Dicentrarchus labrax
Dicologlossa cuneata
Buglossidium luteum
Microchirus azevia
Pegusa lascaris
Solea solea
Lepidorhombus boscii
Limanda limanda
Hippoglossoides platessoides
Hippoglossus hippoglossus
Microstomus kitt
Glyptocephalus cynoglossus
The data validation strategy: from taxonomy to sequence
The online database
Species data and specimen data: general info; taxonomy + biology; DNA sequences; biological collections; regional info
Validation yields: reference sequences; genetic variation; reference material
The data validation strategy: specimen comparison of the collections
FishTrace Database: 82 different PCR conditions
Complete Cytb (20): CytB-IFRE_Direct, cytBcp-IFRE_met1, cytBcp-IFRE_met11, cytBcp-IFRE_met12, cytBcp-IFRE_met20, cytBcp-IFRE_met21, cytBcp-IFRE_met7, cytBcp-IFRE_met8, cytBcp-NRM_Anguilla, cytBcp-NRM_Gadus, cytBcp-NRM_Scombrus, cytBcp-NRM_Sebastes, cytBcp-NRM_std, cytBcp-RICO-TriEsm, cytBcp-RIVO-Diclab, cytBcp-RIVO-ScoSco, cytBcp-RIVO-analup, cytBcp-RIVO-pleuron., cytBcp-RIVO_TriLus, cytBcp-Rivo_direct
Cytb 5' (18): cytB5-IFRE_met1, cytB5-IFRE_met9, cytb5-IFRE_met14_dir, cytb5-IFRE_met15, cytb5-IFRE_met17_dir, cytb5-IFRE_met24, cytb5-IFRE_met27_dir, cytb5-IFRE_met4, cytb5-IFRE_met5_dire, cytb5-NAGREF, cytb5-NAGREF1_direct, cytb5-NRM_std, cytb5-RIVO-CilCep, cytb5-RIVO_pol, cytb5-UCM_11, cytb5-UCM_12, cytb5-UCM_21, cytb5_direct_NS
Cytb 3' (31): cytB3-RIVO-cytb6/Tru, RIVO-cytb6/THR, cytB3-RIVO-melaeg, cytB3-IFRE_met10, cytB3-RIVO-pleuronec, cytB3-IFRE_met16_dir, cytB3-RIVO_cytb7/THR, cytB3-IFRE_met18_dir, cytB3-UCM_11, cytB3-IFRE_met19, cytB3-UCM_12, cytB3-IFRE_met2, cytB3-UCM_13, cytB3-IFRE_met22, cytB3-UCM_14, cytB3-IFRE_met23_dir, cytB3-UCM_Mer1, cytB3-IFRE_met25, cytB3-cytb7/truc, cytB3-IFRE_met26, cytB3_sol_NS, cytB3-IFRE_met3, cytB3-IFRE_met6_dire, cytB3-NAGREF-1, cytB3-NAGREF-2, cytB3-NAGREF-3, cytB3-NAGREF1_direct, cytB3-NAGREF_MS, cytB3-NRM_std, cytB3-RIVO-Micpou, cytB3-RIVO-MulSur
Rhodopsin (13): rhod-IFRE_RO1-1, rhod-NAGREF, rhod-NRM_flatfish, rhod-NRM_std, rhod-RIVO-F2-F4, rhod-RIVO-nested, rhod-RIVO_mulsurDNA1, rhod-UCM_11, rhod-UCM_12, rhod-UCM_13, rhod-UCM_21, rhod-UCM_22, rhod-UCM_23
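With 82 conditions, it helps to sort identifiers by target fragment automatically. The sketch below assumes a prefix convention inferred from the names in the list (cytBcp/CytB- for complete cytb, cytb5 for the 5' fragment, cytB3 for the 3' fragment, rhod for rhodopsin); this convention is not documented in the source, and identifiers without a recognised prefix, such as RIVO-cytb6/THR, are left for manual assignment.

```python
import re
from collections import defaultdict

# Assumed prefix convention, inferred from the identifiers listed above.
FRAGMENT_PATTERNS = [
    ("Complete Cytb", re.compile(r"^(cytbcp|cytb-)", re.IGNORECASE)),
    ("Cytb 5'", re.compile(r"^cytb5", re.IGNORECASE)),
    ("Cytb 3'", re.compile(r"^cytb3", re.IGNORECASE)),
    ("Rhodopsin", re.compile(r"^rhod", re.IGNORECASE)),
]


def group_conditions(names: list[str]) -> dict[str, list[str]]:
    """Group PCR condition identifiers by target fragment, keeping input order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        for label, pattern in FRAGMENT_PATTERNS:
            if pattern.match(name):
                groups[label].append(name)
                break
        else:
            # No recognised prefix: must be assigned by hand.
            groups["Unassigned"].append(name)
    return dict(groups)
```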
The data validation strategy: from taxonomy to sequence
Standard protocols
- Automated sequencing: ABI 3730
- 100% efficiency
- Primers: FishcytB-F and CytBI-5R
- Sequencing chromatogram run (~800 bp)
- 48 samples / 2 different primers / 96 reactions
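The 48 samples × 2 primers = 96 reactions arithmetic maps exactly onto a standard 96-well plate (8 rows × 12 columns). The sketch below uses the two primer names from the slide, but the layout itself is hypothetical: samples fill columns 1-6 column by column with the first primer, and each sample reappears six columns to the right with the second primer.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]                # plate rows A-H
PRIMERS = ("FishcytB-F", "CytBI-5R")      # primer names from the slide


def plate_layout(samples: list[str]) -> dict[str, tuple[str, str]]:
    """Map well -> (sample, primer) for up to 48 samples sequenced with two primers.

    Hypothetical layout: first primer in columns 1-6 (filled column by column),
    second primer in the mirrored well six columns to the right.
    """
    if len(samples) > 48:
        raise ValueError("a 96-well plate holds at most 48 samples x 2 primers")
    layout = {}
    for i, sample in enumerate(samples):
        row, col = ROWS[i % 8], i // 8 + 1   # columns 1-6
        layout[f"{row}{col}"] = (sample, PRIMERS[0])
        layout[f"{row}{col + 6}"] = (sample, PRIMERS[1])
    return layout
```

Mirroring the two primers six columns apart keeps both reads of a sample on the same row, which makes the plate easy to check by eye.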
The backup reference biological data: from sequence to collections
Combined information for fish identification
Taxonomy and sampling + genetic identification (Rhod and Cyt b) + biological collections
→ Counterchecking and further genetic analysis
Trisopterus minutus / Trisopterus minutus capellanus
Fish Barcoding Database + Specimen Identification
Solea solea / Microchirus azevia
Trisopterus minutus / Trisopterus minutus capellanus: specimens TriMin-CB-02, TriMin-EM-04
Solea solea (SolSol-CS-02) / Microchirus azevia (MicAze-CI-04)
Sample/Specimen Identification
Diagnostic Tools Tailored to Users
FishTrace Database: Sample/Specimen Identification: BLAST
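The database offers BLAST searches of a query against its reference sequences. BLAST itself uses seeded local alignment; the sketch below is a much simpler, alignment-free stand-in, not BLAST: it ranks reference specimens by the fraction of the query's k-mers they contain. The k = 8 default, the reference IDs and the toy sequences are all hypothetical choices.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_matches(query: str, references: dict[str, str],
                 k: int = 8, top: int = 3) -> list[tuple[str, float]]:
    """Rank reference specimens by the fraction of the query's k-mers they share."""
    q = kmers(query, k)
    if not q:
        raise ValueError("query is shorter than k")
    scores = [(ref_id, len(q & kmers(seq, k)) / len(q))
              for ref_id, seq in references.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top]
```

A score of 1.0 means every k-mer of the query occurs in that reference; unlike BLAST, this gives no alignment, E-value or percent identity.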
FishTrace Database: Diagnostic Tools Tailored to Users: RFLP
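An RFLP diagnostic tool predicts the fragment pattern a restriction enzyme would give for each reference sequence. A minimal in-silico digest is sketched below, assuming a linear sequence, an exact (non-degenerate) recognition site and a single top-strand cut position; the EcoRI example (G^AATTC, cut after the first base) is illustrative and not taken from the source.

```python
import re


def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting a linear sequence at every occurrence of `site`.

    cut_offset is where the enzyme cuts within its recognition site (top strand),
    e.g. 1 for EcoRI (G^AATTC). Degenerate sites are not supported.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping sites are all found.
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]
```

For example, `digest("AAAGAATTCAAAAGAATTCAA", "GAATTC", 1)` cuts at positions 4 and 14 and returns fragments of 4, 10 and 7 bp.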
FishTrace Database: Sample/Specimen Identification: TREE
No. of Taxa: 24
No. of Groups: 3
Data Title: FishTrace Cytb+Rhod
Data Type: Nucleotide
Analysis: Phylogeny reconstruction
Tree Inference
  Method: Minimum Evolution
  Phylogeny Test and options: Bootstrap (1000 replicates)
  Search Options: CNI (level=1) with initial tree=NJ MaxTrees=1
Include Sites
  Gaps/Missing Data: Pairwise Deletion
Substitution Model
  Model: Nucleotide: Kimura 2-parameter
  Substitutions to Include: d: Transitions + Transversions
  Pattern among Lineages: Same (Homogeneous)
  Rates among sites: Uniform rates
No. of Sites: 1601
[Figure: Minimum Evolution tree (FishTrace Cytb+Rhod, 24 taxa) with bootstrap support values at nodes; scale bar 0.05. Taxa: Pollachius virens, Pollachius pollachius, Gadus morhua, Merlangius merlangus, Melanogrammus aeglefinus, Micromesistius poutassou, Trisopterus esmarkii, Sample EM0789, Trisopterus minutus, Gadiculus argenteus, Brosme brosme, Molva molva, Phycis blennoides, Phycis phycis, Merluccius polli, Merluccius merluccius, Merluccius capensis, Ciliata septentrionalis, Enchelyopus cimbrius, Gaidropsarus biscayensis, Gaidropsarus mediterraneus, Oncorhynchus mykiss, Engraulis encrasicolus, Gymnothorax afer]
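The tree settings above specify Kimura 2-parameter distances with transitions and transversions, uniform rates and pairwise deletion of gaps and missing data. A minimal sketch of that distance step follows; it is the standard K2P formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q) with P and Q the transition and transversion proportions, and not the exact MEGA implementation. Building the Minimum Evolution tree and bootstrapping from the resulting matrix are not shown.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")
VALID = PURINES | PYRIMIDINES


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    Pairwise deletion: a site is skipped only if either of these two sequences
    has a gap or an ambiguous base there.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = sites = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in VALID or y not in VALID:
            continue  # pairwise deletion
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """K2P distance for every unordered pair of named sequences."""
    names = list(seqs)
    return {(m, n): k2p_distance(seqs[m], seqs[n])
            for i, m in enumerate(names) for n in names[i + 1:]}
```

Because deletion is pairwise, each pair of sequences is compared over its own set of shared sites, so different cells of the matrix can rest on different numbers of sites.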
FishTrace Database: Sample/Specimen Identification: MORPHO
Acknowledgements
European Commission
Amalia Diez
Antonio Puyet
Guy Duhamel
MNHN - Paris
NAGREF - Kavala
Sebastián Jiménez
Manuel Biscoito
TFMC - Tenerife
Hilde Van Pelt
IMAR - Funchal
Véronique Verrez
RIVO - Ijmuiden
Monique Etienne
IFREMER - Nantes
ICCM - Las Palmas
NRM - Stockholm
JRC - Ispra
UCM - Madrid
José A. González
Sven O. Kullander
Naouma Kourti
Philippe Carreau
Rafael G. Sevilla
Rosa Pestana
Antonio M. García
Gema Escalera
Gema González
João Delgado
José A. Pérez
Jesús Soria
Grigorios Krey
Mafalda Freitas
Ignacio J. Lozano
Susana Pérez
Patrice Pruvost
Kees Groeneveld
José I. Santana
Fátima Hernández
Rosa Domínguez
Montserrat Gimeno
Hamid R. Ghanavi
Panos Leontarakis
Fernanda Marrero
Daniel San Andrés
Rocío González
Laurence Favre-Krey
Faye Taylor
Miguel Rabassó
Delphine Ortega
Víctor M. Tuset
Romas Statkus
Prudencio Calderín
Michael Norén
Erik Alander
Georg Fridriksson
Anders Silfvergrip
Laura Ramírez
Marc Jérôme
Olivier Mouchel
Afne Stein
Alejandro De Vera
Angeliki Adamidou
Alexis Tsangridis
Samuel Iglésias
Mélyne Hautecoeur
Romain Causse
Laurent Nandrin
Corinne Guchereau