DeepFin Update: how far are we from “a phylogeny of all

Recent advances in molecular
phylogenies of
actinopterygian fishes
Guillermo Ortí
University of Nebraska, USA
Molecular Systematics of Ray-finned Fishes
DeepFin will Advance The Phylogeny of “Fishes”
A Research Coordination Network
1. To promote fish phylogenetics (resolve the fish tree!)
2. To develop cyberinfrastructure, a portal for fish
phylogenetics (www.deepfin.org) with networking
tools and interconnected relational databases
3. To develop educational material to foster education
on fish biodiversity, fish evolution, and current
knowledge on the phylogenetic relationships of fishes
DeepFin will Advance The Phylogeny of “Fishes”
A Research Coordination Network
1. To promote fish phylogenetics—how far are we from
the “tree of all fishes”??
Integrate all sources of information:
• Morphology
• Genetics
• Paleontology
Issues with molecular phylogenies
based on a single gene or few loci
• Low resolution or low support
(characters v taxa)
• Conflicts among trees inferred from
different loci.
– Analytical reasons (base compositional
bias / long branch attraction /
heterotachy).
GC% at the 3rd codon position of RAG1
[Figure: GC content at the third codon position of RAG1 (axis ticks 0 to 1) plotted by taxon, with the mean marked. Groups shown: Elasmobranchii, Tetrapoda, Polypteriformes, basal actinopterygians, Osteoglossomorpha, Clupeomorpha, Ostariophysi, Elopomorpha, Protacanthopterygii, Stomiiformes, basal neoteleosts, “Paracanthopterygii”, Lophiiformes, Acanthopterygii. Labeled points: Muraenesox, galaxiids, Albuliformes, Gonostoma, Ogcocephalus, Elops, Engraulis, Arnoglossus, scorpaenids, Sparus, Albula, Colisa, Trigla, Gasterosteus, Megalops, Zeus.]
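As a rough illustration of the quantity behind the GC% plot, the sketch below computes the GC fraction at third codon positions for an in-frame coding sequence. The function name `gc3` and the two toy sequences are hypothetical; they are not RAG1 data from the talk.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C among third codon positions of an in-frame CDS."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    third = [seq[i] for i in range(2, usable, 3)]
    called = [b for b in third if b in "ACGT"]  # skip gaps and ambiguity codes
    if not called:
        raise ValueError("no unambiguous third-position bases")
    return sum(b in "GC" for b in called) / len(called)


# Hypothetical toy sequences (five codons each), for illustration only.
toy = {
    "taxon_A": "ATGGCCGACCTGAAG",
    "taxon_B": "ATGGCAGATTTAAAA",
}
gc3_by_taxon = {name: gc3(seq) for name, seq in toy.items()}
```

Large differences in this value among taxa are one symptom of the base compositional bias listed among the analytical reasons for conflict.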
Issues with molecular phylogenies
based on a single gene or few loci
• Low resolution or low support
(characters v taxa)
• Conflicts among trees inferred from
different loci.
– Biological reasons (gene tree vs.
organismal tree)
Gene trees within organismal trees
• Lineage sorting
• Gene duplication
• Horizontal transfer
“Phylogenomics”: use many (genome-scale) loci to infer phylogeny
• Large number of characters will
increase statistical power
• Analysis of many independent loci may
reduce systematic error
• Genome-scale nuclear gene markers
will be more likely to represent
organismal evolution
How to collect “phylogenomic” data
(from multiple loci)
• Using available genome databases
(model organisms)
• Sequencing cDNA/EST libraries
• Directly amplify and sequence target
fragments from genomic DNA using
‘universal’ nuclear markers
How can we find new ‘universal’
nuclear gene markers???
Three criteria to choose “good”
nuclear gene markers*
1) Orthologous genes should be easy to identify and
amplify in all taxa of interest. To minimize the
chance of “mistaken paralogy”, we seek only
single-copy genes (so, what about gene
duplications?)
* Chenhong Li (UNL) and Guoqing Lu (UN-Omaha)
[Figure: three panels (a, b, c) showing gene copies a and b on a tree of Taxon 1, Taxon 2 and Taxon 3. Tip labels: a1, a2, a3 (two panels) and a1 b1, a2 b2, a3 b3 (one panel). Events marked: gene duplication, 1st speciation, 2nd speciation, gene loss.]
Three criteria to choose “good”
nuclear gene markers
2) The amplicon (i.e. target sequences
amplified by the PCR primers) should be of
reasonable size (exons >800 bp).
zebrafish elongation factor 1-alpha (ef1a)
Three criteria to choose “good”
nuclear gene markers
3) The gene should be “reasonably”
conserved, so universal primers can be
designed and the sequences can be easily
aligned.
Gonadotropin-releasing hormone
• If we agree on these 3 criteria (single copy,
long exon, reasonable conservation) for good
nuclear markers, “randomly testing” genes
provides a poor chance of finding a good marker
(additional criteria are possible)
• Directly apply the 3 criteria to screen the genomes
of two model organisms, zebrafish (Danio rerio)
and pufferfish (Takifugu rubripes).
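The three criteria can be read as a simple filter over candidate exons. The sketch below is only an illustration of that idea, not the pipeline used in the talk: the record fields, the candidate names and the identity bounds are hypothetical, and only the 800 bp exon threshold comes from the slides.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    exon_id: str            # hypothetical identifier
    exon_length_bp: int
    copies_zebrafish: int   # number of hits in the zebrafish genome
    copies_pufferfish: int  # number of hits in the pufferfish genome
    identity_pct: float     # zebrafish vs. pufferfish sequence identity


MIN_EXON_BP = 800                        # from the slides: exons > 800 bp
MIN_IDENTITY, MAX_IDENTITY = 70.0, 95.0  # assumed bounds, not from the slides


def passes_screen(c: Candidate) -> bool:
    """Apply the three criteria: single copy, long exon, reasonable conservation."""
    single_copy = c.copies_zebrafish == 1 and c.copies_pufferfish == 1
    long_exon = c.exon_length_bp > MIN_EXON_BP
    conserved = MIN_IDENTITY <= c.identity_pct <= MAX_IDENTITY
    return single_copy and long_exon and conserved


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("exon_A", 945, 1, 1, 85.0),   # passes all three criteria
    Candidate("exon_B", 650, 1, 1, 88.0),   # exon too short
    Candidate("exon_C", 1200, 2, 1, 80.0),  # duplicated in zebrafish
    Candidate("exon_D", 900, 1, 1, 99.0),   # too conserved to be informative
]
kept = [c.exon_id for c in candidates if passes_screen(c)]
```

An upper identity bound is included because a marker that is nearly invariant aligns easily but carries little phylogenetic signal; whether the original screen used one is not stated.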
Scheme of our marker-developing strategy
130 candidate loci were identified ‘in silico’
Distribution of 109 candidate markers in
zebrafish chromosomes
109 are located on 24 of the 25 chromosomes (21 with no location information).
Chi-square test did not reject the Poisson distribution of these markers (p=0.0746).
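The slide does not say which chi-square test was used. One common choice for asking whether counts per chromosome are consistent with a Poisson distribution is the index-of-dispersion test, sketched below. The per-chromosome counts are hypothetical (chosen only to total 109 markers over 25 chromosomes with one empty chromosome), so the resulting p-value is not expected to match the reported 0.0746.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper tail of the chi-square distribution, via the series expansion
    of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def dispersion_test(counts):
    """Index-of-dispersion test: under a Poisson model,
    sum((x - mean)^2) / mean is approximately chi-square with n - 1 df."""
    n = len(counts)
    mean = sum(counts) / n
    stat = sum((c - mean) ** 2 for c in counts) / mean
    df = n - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical marker counts for 25 chromosomes (sum = 109, one chromosome empty).
counts = [6, 4, 5, 3, 7, 4, 2, 5, 6, 3, 4, 8, 5,
          2, 4, 6, 3, 5, 4, 0, 7, 3, 5, 4, 4]
stat, df, p_value = dispersion_test(counts)
```

A large p-value here means the spread of markers across chromosomes is no more uneven than random placement would produce.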
Summary of the 130
candidate loci
• Size range: from 802 bp to 5811 bp in zebrafish.
• Base composition: GC content ranges from 41.6% to
63.9% in zebrafish.
• Identity: sequence identity of these markers between
zebrafish and pufferfish ranges from 77.3% to 93.2%.
Experimental test of
the candidate markers
• A random sample of 15 candidate markers was
examined in 52 ray-finned fish taxa (40/47 orders of
Actinopterygii).
• PCR primers were designed to conserved regions
(nested PCR strategy)
• 10 out of the 15 markers tested were successfully
amplified by PCR from genomic DNA in most taxa
Marker*   Exon ID†             PCR fragment size (bp)   No. of PI sites‡   Average p-distance§
zic1      ENSDARE00000015655   945                      344                0.156
myh6      ENSDARE00000025410   735                      329                0.179
RYR3      ENSDARE00000465292   837                      421                0.210
ptr       ENSDARE00000145053   708                      372                0.205
tbr1      ENSDARE00000055502   723                      313                0.189
ENC1      ENSDARE00000367269   810                      360                0.180
Gylt      ENSDARE00000039808   882                      510                0.211
SH3PX3    ENSDARE00000117872   708                      317                0.167
plagl2    ENSDARE00000136964   690                      345                0.173
sreb2     ENSDARE00000029022   987                      387                0.149
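Two of the columns in this table, the number of parsimony-informative (PI) sites and the average p-distance, can be computed directly from an alignment. The sketch below shows the standard definitions on a hypothetical four-taxon toy alignment; the function names are illustrative and the data are not from the talk.

```python
from collections import Counter
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring gaps and ambiguity codes."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_p_distance(alignment: dict) -> float:
    """Average p-distance over all pairs of sequences."""
    ds = [p_distance(a, b) for a, b in combinations(alignment.values(), 2)]
    return sum(ds) / len(ds)


def count_pi_sites(alignment: dict) -> int:
    """A site is parsimony informative if at least two different bases
    each occur in at least two sequences."""
    n = 0
    for column in zip(*alignment.values()):
        counts = Counter(b for b in column if b in "ACGT")
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            n += 1
    return n


# Hypothetical toy alignment (4 taxa, 8 sites).
toy_alignment = {
    "t1": "ACGTACGT",
    "t2": "ACGTACGA",
    "t3": "ACCTACGA",
    "t4": "ACCTTCGT",
}
pi_sites = count_pi_sites(toy_alignment)
avg_p = mean_p_distance(toy_alignment)
```

In the toy alignment, sites 3 and 8 are informative, while site 5 is variable but not informative because only one taxon differs.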
New Data: 10 genes, 8025 bp, 52 taxa, ML tree
[Figure: ML tree shown against the Nelson 94 classification. Labels: Basal lineages, Basal teleosts, Neoteleosts; POLYPTERIFORMES, ACIPENSERIFORMES, SEMIONOTIFORMES, AMIIFORMES, OSTEOGLOSSOMORPHA, ELOPOMORPHA, CLUPEOMORPHA, OSTARIOPHYSI, PROTACANTHOPTERYGII, STOMIIFORMES, ATELEOPODIFORMES, AULOPIFORMES, MYCTOPHIFORMES, LAMPRIDIFORMES, POLYMIXIIFORMES, PARACANTHOPTERYGII, ACANTHOPTERYGII; TELEOSTEI, EUTELEOSTEI, NEOTELEOSTEI, ACANTHOMORPHA; Clupeo-Ostario, Protacantho.]
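The 8025 bp quoted for the new data set equals the sum of the ten PCR fragment sizes listed in the marker table. The sketch below checks that arithmetic and shows one simple way to concatenate per-gene alignments into a single matrix; the `concatenate` helper and the two-gene toy input are hypothetical, and padding missing genes with "?" is an assumption rather than the procedure used in the talk.

```python
# Fragment sizes (bp) as listed in the marker table.
fragment_bp = {
    "zic1": 945, "myh6": 735, "RYR3": 837, "ptr": 708, "tbr1": 723,
    "ENC1": 810, "Gylt": 882, "SH3PX3": 708, "plagl2": 690, "sreb2": 987,
}
total_bp = sum(fragment_bp.values())


def concatenate(per_gene: dict, gene_order: list) -> dict:
    """Join per-gene alignments taxon by taxon; a taxon missing from a gene
    is padded with '?' to that gene's alignment length."""
    taxa = sorted({t for aln in per_gene.values() for t in aln})
    lengths = {g: len(next(iter(per_gene[g].values()))) for g in gene_order}
    return {
        t: "".join(per_gene[g].get(t, "?" * lengths[g]) for g in gene_order)
        for t in taxa
    }


# Hypothetical two-gene toy input; taxon_B lacks gene2.
toy = {
    "gene1": {"taxon_A": "ACGT", "taxon_B": "ACGA"},
    "gene2": {"taxon_A": "GGC"},
}
supermatrix = concatenate(toy, ["gene1", "gene2"])
```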
[Figure: seven small trees relating Polypterus, sturgeons, gars, Amia and Teleostei, some with the clade labels Chondrostei, Holostei or Neopterygii.]
1. G. Nelson (1969): branchial arch morphology
2. Jessen (1973): pectoral anatomy
3. Olsen (1984): skull and pectoral girdle
4. J. Nelson (1994): most reasonable
5. Bemis et al. (1997): morphology
6. Lê et al. (1993): 28S rRNA; Venkatesh et al. (1999): 8 nuclear introns
7. Inoue et al.: mtDNA; Ortí et al.: RAG-1 and rhodopsin
Basal Actinopterygians
[Figure: mtDNA tree, 421 taxa, ME tree, with support values on branches and a 0.02 scale bar. Tip labels run from Raja, Scyliorhinus and Squalus through lungfishes (Neoceratodus, Lepidosiren, Protopterus), Latimeria, bichirs (Calamoichthys, Polypterus), Amia, gars (Lepisosteus), sturgeons and paddlefishes, to elopomorph and osteoglossomorph teleosts. Labeled groups: Chondrostei, Holostei, Teleostei, Euteleostei; POLYPTERIFORMES, AMIIFORMES, SEMIONOTIFORMES, ACIPENSERIFORMES, ELOPIFORMES, OSTEOGLOSSIFORMES (1. Osteoglossoidei, 2. Notopteroidei), ALBULIFORMES, ANGUILLIFORMES (1. Anguilloidei, 2. Muraenoidei, 3. Congroidei).]
[Figures: three further slides, each labeled “10 genes, 8025 bp” and “Nelson 94”, repeating the same list of groups (POLYPTERIFORMES through ACANTHOPTERYGII, with TELEOSTEI, EUTELEOSTEI, NEOTELEOSTEI and ACANTHOMORPHA) and highlighting in turn Elopomorpha, Clupeo-Ostario, and Protacantho.]
Table 1 | Summary information for the 10 gene markers developed.
Gene*        Ensembl gene ID   No. of bp   % of variable sites   % of PI sites   Genetic distance (%)   Sub. rate   SDR    CI-MP   α      RCV    Treeness
FEM-1        00000010567       894         33.11                 23.49           28 (2.6-65.8)          0.64        1.00   0.61    1.64   0.13   0.23
PAK-6        00000041216       735         38.44                 25.53           36 (10.1-59.5)         1.35        2.06   0.54    0.68   0.11   0.22
RAB15        00000026484       825         37.04                 27.26           36 (10.1-58.1)         1.25        1.65   0.56    0.67   0.11   0.21
Frizzled-3   00000021664       705         34.85                 22.80           41 (6.1-93.6)          1.03        1.64   0.57    1.64   0.12   0.29
CENP-A       00000016745       666         43.12                 33.19           28 (3.1-79.1)          0.65        0.67   0.67    2.91   0.10   0.28
CutA         00000035396       810         41.13                 32.06           38 (8.4-78.0)          1.13        1.48   0.55    1.10   0.16   0.33
Gylt         00000010941       870         38.52                 30.62           41 (7.6-77.0)          1.18        1.35   0.60    1.70   0.12   0.27
HMG20A       00000031198       705         53.22                 38.51           30 (7.5-60.0)          1.11        1.70   0.55    1.53   0.14   0.22
UNK-1        00000020319       675         47.15                 31.27           29 (6.0-60.6)          0.81        1.04   0.61    0.92   0.10   0.33
UNK-2        00000038383       987         43.95                 31.97           30 (4.6-75.5)          0.85        1.33   0.61    0.88   0.11   0.23
RAG1         -                 1344        50.89                 38.24           38 (9.8-75.0)          1.28        1.51   0.57    1.68   0.05   0.23
PI, parsimony informative sites;
SDR, standard deviation of substitution rates among three codon positions;
CI-MP, consistency index;
α, gamma distribution shape parameter;
RCV, relative composition variability;
Treeness, ratio of internal branch length to total branch length.
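Two of the statistics defined in the footnotes are easy to illustrate. Treeness follows the footnote directly. For RCV the slides give only the name, so the sketch assumes one common formulation: the summed absolute deviation of each taxon's base counts from the across-taxon mean, divided by (number of taxa × number of sites). The tree representation and toy inputs are hypothetical.

```python
def treeness(branches) -> float:
    """Ratio of internal branch length to total branch length.
    `branches` is a list of (length, is_internal) pairs."""
    total = sum(length for length, _ in branches)
    internal = sum(length for length, is_internal in branches if is_internal)
    return internal / total


def rcv(alignment: dict) -> float:
    """Relative composition variability (assumed formulation, see lead-in)."""
    seqs = list(alignment.values())
    n_taxa, n_sites = len(seqs), len(seqs[0])
    counts = [[s.count(base) for base in "ACGT"] for s in seqs]
    means = [sum(c[i] for c in counts) / n_taxa for i in range(4)]
    deviation = sum(abs(c[i] - means[i]) for c in counts for i in range(4))
    return deviation / (n_taxa * n_sites)


# Hypothetical unrooted four-taxon tree: four terminal branches, one internal.
quartet = [(0.1, False), (0.1, False), (0.1, False), (0.1, False), (0.1, True)]
quartet_treeness = treeness(quartet)

# Hypothetical toy alignment with strongly differing base composition.
toy_rcv = rcv({"t1": "AACC", "t2": "AAGG"})
```

Higher treeness suggests more signal on internal branches relative to terminal ones, and lower RCV indicates more homogeneous base composition across taxa.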
Summary
• Gene markers that satisfied the three criteria
are widely distributed in the zebrafish genome
• Ten out of 15 markers tested seem useful for
phylogenetic inference. Their profiles are
comparable to the popular RAG1 gene
• The strategy is successful!
– The new markers developed will help to infer the
tree of ray-finned fishes
– The bioinformatic tool developed can be used in
other taxonomic groups (S: similarity may vary)
www.deepfin.org
• Member database (Directory): currently 606 records
• Literature database: currently ~800 records (for members only)
• Literature database: upload PDF files to share with other members
• “Collaboratories”: a virtual environment to share files and information
www.tolweb.org
Contribute tree pages!
Contact
Join DeepFin!
Questions, comments, suggestions?