iwgscStdProtocolesPAG2014_Choulet - WHEAT URGI

A pseudomolecule of 774 Mb: the 3B experience
Frédéric CHOULET
INRA GDEC – Clermont-Ferrand, France
3B MTP-BAC sequencing
Sequenced physical map (3B)
• MTP (minimal tiling path) BACs: 8,452
• BAC pools: 922
• Roche 8 kb mate-pair (MP) libraries: 922
• Coverage (Roche/454): 36x
• BAC-ends (Sanger): 42,551
• Whole Genome Profiling tags: 327,282
• Whole 3B shotgun (Illumina): 82x
3B physical map
• Chromosome 3B: 900 Mb
• BACs: 132,000 (19x)
• BAC-contigs: 1,282
• MTP-BACs: 8,452
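As a rough consistency check on the physical map figures, the mean BAC insert size they imply can be worked out. This assumes that the 19x figure was computed as (number of BACs × mean insert size) / chromosome size; the slide does not say how coverage was derived, so this is only an assumption, and the function name is illustrative.

```python
def implied_mean_insert_kb(n_bacs: int, coverage: float, chrom_size_mb: float) -> float:
    """Mean insert size (kb) implied by coverage = n_bacs * insert / chromosome size.

    Assumption: coverage is defined this way; the talk does not state it.
    """
    return coverage * chrom_size_mb * 1000 / n_bacs


# Figures from the slide: 132,000 BACs, 19x, 900 Mb.
insert_kb = implied_mean_insert_kb(132_000, 19, 900)
print(f"implied mean insert size: about {insert_kb:.0f} kb")
```

Under that assumption the slide figures are mutually consistent with inserts on the order of 130 kb.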
Assembly and scaffolding
3B-v1: 16,136 scaffolds, 1,040 Mb
o Curation of the scaffolding (V. Barbe, S. Mangenot, Genoscope)
   • Parsing of MP read positions
   • Integration of BAC-end match positions
   • Result: 3B-v1 (16,136 scaffolds, 1,040 Mb, 18% Ns) to 3B-v3 (4,999 scaffolds, 992 Mb, 13% Ns)
o Gap filling and sequence error correction (JM. Aury, A. Couloux, Genoscope)
   • Input: Illumina reads from the whole 3B shotgun
   • 109,914 gaps filled
   • 126,290 bases corrected (error rate: 0.1%)
   • Result: 3B-v4, 8% Ns (down from 13% Ns in 3B-v3)
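The N percentages reported for each assembly version can be converted into absolute gap content, which makes the effect of curation and gap filling easier to compare. The sketch below is illustrative only: `n_fraction` and `n_megabases` are hypothetical helper names, not part of the tools named in this talk, and the sizes are the rounded values given on the slides.

```python
def n_fraction(seq: str) -> float:
    """Fraction of a sequence made of N (unknown) bases, case-insensitive."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def n_megabases(total_mb: float, pct_n: float) -> float:
    """Absolute N content (Mb) from an assembly size and an N percentage."""
    return total_mb * pct_n / 100


# (assembly size in Mb, % Ns) as reported on the slides
versions = {"3B-v1": (1040, 18), "3B-v3": (992, 13), "3B-v4": (992, 8)}

for name, (size_mb, pct) in versions.items():
    print(f"{name}: about {n_megabases(size_mb, pct):.0f} Mb of Ns")
```

With the rounded slide values this gives roughly 187 Mb, 129 Mb and 79 Mb of Ns for 3B-v1, 3B-v3 and 3B-v4 respectively.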
o Redundancy removal and scaffold merging (S. Theil, INRA GDEC)
   • Tool: scaffAssembler.pl
   • Search for shared TE-junctions (figure: ctg1 from Pool_A and ctg2 from Pool_B)
   • Redundancy: 160 Mb
   • Result: 3B-v4 (4,999 scaffolds, 992 Mb) to 3B-v443 (2,808 scaffolds, 833 Mb)
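One way to picture a search for shared TE-junctions: take a short window of sequence spanning each junction and report the windows that occur in more than one scaffold, on either strand. The following is a minimal sketch under stated assumptions, not the logic of scaffAssembler.pl, whose internals are not described in this talk. Junction coordinates are assumed to be known already (for example from a TE annotation), and the function names and the `flank` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(window: str) -> str:
    """Strand-independent form: the smaller of a sequence and its reverse complement."""
    window = window.upper()
    revcomp = window.translate(_COMPLEMENT)[::-1]
    return min(window, revcomp)


def junction_window(seq: str, pos: int, flank: int) -> Optional[str]:
    """Return `flank` bases on each side of `pos`, or None if truncated or containing Ns."""
    if pos - flank < 0 or pos + flank > len(seq):
        return None
    window = seq[pos - flank:pos + flank].upper()
    return None if "N" in window else window


def shared_junctions(scaffolds: Dict[str, str],
                     junctions: Dict[str, List[int]],
                     flank: int = 10) -> Dict[str, List[str]]:
    """Map each junction signature to the scaffolds carrying it; keep those seen in two or more."""
    index = defaultdict(set)
    for name, positions in junctions.items():
        for pos in positions:
            window = junction_window(scaffolds[name], pos, flank)
            if window is not None:
                index[canonical(window)].add(name)
    return {sig: sorted(names) for sig, names in index.items() if len(names) > 1}
```

Scaffold pairs reported by `shared_junctions` would only be candidates for overlap; a real pipeline would still need to confirm them by alignment before merging.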
Ordering scaffolds
3B-v443: 2,808 scaffolds, 833 Mb
o SNP discovery
   • SureSelect® sequence capture (E. Paux, N. Cubizolles, E. Rey)
   • Tool: isbpProbeDesign.pl (figure labels: bait, gene, TE)
   • DNA captured from 10 genotypes
   • 52,265 baits
   • 39,077 SNPs
o Genotyping of mapping populations: 3,075 SNPs
   • Genetic mapping (P. Sourdille): anchor map of 384 individuals (Cs x Renan), plus a neighbor map of 3,865 markers
   • LD mapping (F. Balfourier): 367 lines from a core collection
[Figure: 3B genetic map and LD (linkage disequilibrium) map]
   • 3B genetic map: 0 cM to 133 cM, 366 bins
   • Marked on the map: 44.8 cM; 152 scaffolds; 64 markers at the same genetic position
   • LD map: 19 LD blocks, 554 bins
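Markers that share the same genetic position cannot be ordered relative to one another by linkage alone, which is why the map is counted in bins rather than in individual markers. A minimal sketch of that grouping step follows; the marker names and the `bin_markers` function are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def bin_markers(markers: List[Tuple[str, float]]) -> List[Tuple[float, List[str]]]:
    """Group (marker, cM) pairs into bins of identical position, sorted along the map."""
    bins: Dict[float, List[str]] = defaultdict(list)
    for name, cm in markers:
        bins[cm].append(name)
    return sorted(bins.items())


# Hypothetical markers: two of them co-segregate at 44.8 cM and form one bin.
example = [("m1", 0.0), ("m2", 44.8), ("m3", 44.8), ("m4", 133.0)]
for position, names in bin_markers(example):
    print(f"{position} cM: {len(names)} marker(s)")
```

Scaffolds carrying markers from the same bin inherit the same position, so their relative order within the bin stays unresolved unless another data source separates them.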
o Integration of physical map information
   • Tool: pseudomolBuilder.pl
[Figure: scaffolds placed along the cM axis and joined, separated by Ns, into the pseudomolecule; some placements marked "?"]
   • Pseudomolecule: 93% of the sequence, 1,358 scaffolds, 774 Mb
   • Unlocalized: 7% of the sequence, 1,450 scaffolds, 59 Mb
o Orientation unknown: 48% of the sequence
o Micro-order unknown: 554 bins / 1,358 scaffolds
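The construction step can be pictured as sorting scaffolds by their bin, joining them with runs of Ns, and setting aside scaffolds that have no position. The sketch below is a simplified illustration, not pseudomolBuilder.pl itself. The function names, the input layout and the 100 bp spacer are assumptions made for the example; scaffolds in the same bin are ordered arbitrarily by name, which mirrors the unknown micro-order.

```python
from typing import Dict, List, Tuple

GAP = "N" * 100  # hypothetical spacer length between scaffolds


def build_pseudomolecule(scaffolds: Dict[str, str],
                         placements: Dict[str, int],
                         gap: str = GAP) -> Tuple[str, List[str], List[str]]:
    """Join placed scaffolds in bin order; return (sequence, ordered names, unlocalized names).

    scaffolds:  scaffold name -> sequence
    placements: scaffold name -> bin index along the map (absent = unlocalized)
    """
    ordered = sorted(placements, key=lambda name: (placements[name], name))
    sequence = gap.join(scaffolds[name] for name in ordered)
    unlocalized = sorted(set(scaffolds) - set(placements))
    return sequence, ordered, unlocalized


def anchored_percent(anchored_mb: float, unlocalized_mb: float) -> float:
    """Share of the total sequence that is placed in the pseudomolecule."""
    return 100 * anchored_mb / (anchored_mb + unlocalized_mb)


# Check against the slide figures: 774 Mb anchored, 59 Mb unlocalized.
print(f"anchored: {anchored_percent(774, 59):.1f}% of {774 + 59} Mb")
print(f"scaffolds: {1358 + 1450}")
```

The slide figures add up: 774 Mb + 59 Mb gives the 833 Mb of 3B-v443, and 1,358 + 1,450 gives its 2,808 scaffolds, with about 93% of the sequence anchored.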
Future improvements
o RH (radiation hybrid) map
o Optical map
o Long reads
Annotation (774 Mb)
• TRIANNOT: 7,264 protein-coding genes
• CLARI-TE: 234,606 TEs
Bioinformatics
• Assembly
   o Newbler
   o gapCloser
   o ssrFinishing
• Scaffolding/pseudomolecule construction
   o isbpProbeDesign.pl
   o scaffAssembler.pl
   o pseudomolBuilder.pl
• Annotation
   o triAnnot (new modules: filtering, pseudogenes, transfer annotation)
   o clari-TE & clari-TE-lib
• Data management
   o gowDB (Bio::DB::SeqFeature::Store)
   o GBrowse @ URGI
Acknowledgments
Catherine Feuillet, Sébastien Theil, Natasha Glover, Josquin Daron, Lise Pingault, Hélène Rimbert, Nelly Cubizolles, Etienne Paux, Pierre Sourdille, François Balfourier, Jacques Le Gouis, Nicolas Guilhot, Philippe Leroy, Aurélien Bernard
Genoscope: A. Alberti, V. Barbe, J. Poulain, C. Durand, S. Mangenot, JM. Aury, A. Couloux, P. Wincker
URGI: M. Alaux, L. Couderc, V. Jamilloux, H. Quesneville
BIA: C. Gaspin
VIB: K. Vandepoele
MIPS: K. Mayer et al.
CNRGV: H. Berges, A. Bellec
IEB: J. Dolezel, J. Safar
TGAC: J. Rogers, M. Caccamo et al.
SAB: P. Schnable, J. Rogers, S. Rounsley, K. Eversole, D. Ware