iwgscStdProtocolesPAG2014_Choulet - WHEAT URGI
A pseudomolecule of 774 Mb: the 3B experience
Frédéric CHOULET
INRA GDEC – Clermont-Ferrand, France
3B MTP-BAC sequencing
Sequenced physical map
  #MTP BACs: 8452
  #BAC pools: 922
  #Roche 8 kb MP lib.: 922
  bp coverage (Roche/454): 36x
3B
  BAC-ends (Sanger): 42,551
  Whole Genome Prof. tags: 327,282
  Whole 3B shotgun (Illumina): 82x
3B physical map
3B
900 Mb
Physical map
  #BACs: 132,000 (19x)
  #BAC-contigs: 1282
  #MTP-BACs: 8452
Assembly and scaffolding
3B-v1: 16,136 scaff, 1,040 Mb
o Curation of the scaffolding
V. Barbe, S. Mangenot (Genoscope)
  Parsing of MP read positions
  Integration of BAC-end match positions
  (figure labels: scaff00001, scaff00024, scaff00013, scaff00008, scaff00011, scaff00007, scaff00005)
3B-v1: 18% Ns
3B-v3: 4,999 scaff, 992 Mb, 13% Ns
o Gap filling
o Seq. error corrections
JM. Aury, A. Couloux (Genoscope)
3B-v4
  Illumina reads (Whole 3B Shotgun)
  109,914 gaps filled
  126,290 bases corrected (error rate: 0.1%)
  8% Ns
3B-v4: 4,999 scaff, 992 Mb
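The N percentages quoted for each assembly version can be turned into absolute amounts of undetermined sequence. The sketch below assumes each percentage is relative to the total assembly size given on the slides; the slides do not state the denominator, so the results are rough estimates only.

```python
# Assembly size (Mb) and N content (%) as quoted on the slides.
# Assumption: each percentage refers to the total assembly size.
VERSIONS = {
    "3B-v1": (1040, 18),
    "3B-v3": (992, 13),
    "3B-v4": (992, 8),
}


def n_content_mb(size_mb, pct_n):
    """Estimated amount of N (undetermined) sequence in Mb."""
    return size_mb * pct_n / 100


estimates = {name: n_content_mb(size, pct) for name, (size, pct) in VERSIONS.items()}
for name, mb in estimates.items():
    print(f"{name}: about {mb:.0f} Mb of Ns")
```

Under that assumption, curation plus gap filling brings the undetermined sequence from roughly 187 Mb down to roughly 79 Mb.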
o Redundancy removal and scaffold merging
S. Theil (INRA GDEC)
  scaffAssembler.pl
  Search for shared TE-junctions
  (figure labels: Pool_A, ctg1, ctg2, Pool_B)
  redundancy: 160 Mb
3B-v443: 2,808 scaff, 833 Mb
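The idea of searching for shared TE-junctions can be illustrated with a minimal sketch. This is not the code of scaffAssembler.pl, whose internals are not described on the slides. It assumes each scaffold has already been reduced to a set of junction signatures (the identifiers `J1`, `J2`, ... and the scaffold names are hypothetical), and it reports scaffold pairs that share several of them as candidates for redundancy removal or merging.

```python
from collections import defaultdict
from itertools import combinations


def shared_junction_pairs(junctions_by_scaffold, min_shared=2):
    """Return scaffold pairs sharing at least `min_shared` junction signatures."""
    # Invert the mapping: junction signature -> scaffolds that carry it.
    scaffolds_by_junction = defaultdict(set)
    for scaffold, junctions in junctions_by_scaffold.items():
        for junction in junctions:
            scaffolds_by_junction[junction].add(scaffold)

    # Count how many signatures each pair of scaffolds has in common.
    counts = defaultdict(int)
    for scaffolds in scaffolds_by_junction.values():
        for a, b in combinations(sorted(scaffolds), 2):
            counts[(a, b)] += 1

    return {pair: n for pair, n in counts.items() if n >= min_shared}


# Hypothetical input: scaffolds from two BAC pools with overlapping content.
demo = {
    "scaff_poolA_1": {"J1", "J2", "J3"},
    "scaff_poolB_7": {"J2", "J3", "J4"},
    "scaff_poolB_9": {"J4", "J5"},
}
candidates = shared_junction_pairs(demo)
print(candidates)
```

The `min_shared` threshold is an illustrative parameter: requiring more than one shared junction reduces the chance of pairing scaffolds on a single spurious match.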
Ordering scaffolds
o SNP discovery
  SureSelect® seq. capture (E. Paux, N. Cubizolles, E. Rey)
  (figure labels: Bait, gene, TE)
  DNA captured from 10 genotypes
  52,265 baits
  39,077 SNPs
  isbpProbeDesign.pl
o Genotyping mapping pop
  3,075 SNPs
  • Genetic mapping (P. Sourdille)
    Anchor map: 384 indiv. Chinese Spring (Cs) x Renan
    + Neighbor map: 3865 markers
  • LD mapping (F. Balfourier)
    367 lines from a core-collection
Ordering scaffolds
3B genetic map: 0 cM to 133 cM, 366 bins
  44.8 cM: 152 scaffolds
  64 markers at the same genetic position
LD map (Linkage Disequilibrium): 19 LD blocks
554 bins
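Markers that show no recombination between them in the mapping population end up at the same genetic position, which is why many markers and scaffolds fall into a single bin. A minimal sketch of that grouping follows, assuming clean genotype calls with no missing data or errors. The marker names and genotype strings are hypothetical, and real mapping software is considerably more involved.

```python
from collections import defaultdict


def group_into_bins(genotypes):
    """Group markers whose genotype calls are identical across all individuals."""
    bins = defaultdict(list)
    for marker, calls in genotypes.items():
        # Identical call patterns mean no observed recombinant between markers.
        bins[tuple(calls)].append(marker)
    return list(bins.values())


# Hypothetical calls for five individuals (A / B = parental alleles).
demo = {
    "snp1": "AABAB",
    "snp2": "AABAB",  # co-segregates with snp1: same bin
    "snp3": "AABBB",  # one recombinant individual: separate bin
}
marker_bins = group_into_bins(demo)
print(marker_bins)
```

Within a bin, the relative order of markers (and of the scaffolds carrying them) cannot be resolved from the genetic data alone.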
o Integration of phys. map info
  pseudomolBuilder.pl
  pseudomolecule: 93%, 1358 scaff, 774 Mb
  unlocalized: 7%, 1450 scaff, 59 Mb
  (figure: scaffolds A to E placed against a cM axis running from 0 to 6, separated by N spacers, with some placements marked "?")
o orientation unknown: 48% of the seq.
o micro-order unknown: 554 bins / 1358 scaff
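The construction step can be sketched as follows. This is an illustration under stated assumptions, not the logic of pseudomolBuilder.pl: scaffolds carrying a genetic position are sorted by cM and joined with N spacers, and the rest are set aside as unlocalized. The `Scaffold` class, the spacer length of 100 Ns, and the demo sequences are all hypothetical; orientation is ignored, and scaffolds at the same position simply keep their input order.

```python
from dataclasses import dataclass
from typing import Optional

GAP = "N" * 100  # hypothetical spacer; the slides do not give a gap length


@dataclass
class Scaffold:
    name: str
    seq: str
    cm: Optional[float]  # genetic position in cM, None if unanchored


def build_pseudomolecule(scaffolds):
    """Join anchored scaffolds in cM order; return (sequence, unlocalized)."""
    anchored = sorted((s for s in scaffolds if s.cm is not None), key=lambda s: s.cm)
    unlocalized = [s for s in scaffolds if s.cm is None]
    return GAP.join(s.seq for s in anchored), unlocalized


def anchored_fraction(anchored_mb, unlocalized_mb):
    """Share of the assembled sequence placed in the pseudomolecule."""
    return anchored_mb / (anchored_mb + unlocalized_mb)


demo = [
    Scaffold("scaffB", "GGCC", 44.8),
    Scaffold("scaffA", "ACGT", 0.0),
    Scaffold("scaffU", "TTTT", None),
]
pseudomolecule, unlocalized = build_pseudomolecule(demo)
pct = round(100 * anchored_fraction(774, 59))
print(pct)
```

With the figures from the slide, 774 Mb out of 774 + 59 = 833 Mb gives the quoted 93%, and 1358 + 1450 scaffolds add up to the 2,808 scaffolds of 3B-v443.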
Future Improvements
o RH map
o Optical map
o Long reads
Annotation
CLARI-TE
TRIANNOT
774 Mb
• 7264 protein coding genes
• 234,606 TEs
Bioinformatics
Assembly
Newbler
gapCloser
ssrFinishing
Scaffolding/pseudomolecule construction
isbpProbeDesign.pl
scaffAssembler.pl
pseudomolBuilder.pl
Annotation
triAnnot (new modules: filtering, pseudogenes, transfer annotation)
clari-TE & clari-TE-lib
Data management
gowDB (Bio::DB::seqFeatureStore)
Gbrowse @ URGI
Acknowledgments
Catherine Feuillet
Sébastien Theil
Natasha Glover
Josquin Daron
Lise Pingault
Hélène Rimbert
Nelly Cubizolles
Etienne Paux
Pierre Sourdille
François Balfourier
Jacques Le Gouis
Nicolas Guilhot
Philippe Leroy
Aurélien Bernard
Genoscope: A. Alberti, V. Barbe, J. Poulain, C. Durand, S. Mangenot, JM. Aury, A. Couloux, P. Wincker
URGI: M. Alaux, L. Couderc, V. Jamilloux, H. Quenesville
BIA: C. Gaspin
VIB: K. Vandepoele
MIPS: K. Mayer et al.
CNRGV: H. Berges, A. Bellec
IEB: J. Dolezel, J. Safar
TGAC: J. Rogers, M. Caccamo et al.
SAB: P. Schnable, J. Rogers, S. Rounsley, K. Eversole, D. Ware