Genome Assembly- Final presentation


Final Results
Genome Assembly Team
Kelley Bullard, Henry Dewhurst, Kizee Etienne, Esha Jain, VivekSagar KR, Benjamin
Metcalf, Raghav Sharma, Charles Wigington, Juliette Zerick
Original Pipeline
[Flowchart: stages and the tools named at each stage]
Inputs: 454 raw reads; Illumina raw reads; published genomes from public databases (V. vulnificus YJ016, V. vulnificus CMCP6, V. vulnificus MO6-24/O)
PRE-PROCESSING: Fastqc, Prinseq, NGS QC, samstats; read stats, statistical analysis
DENOVO ASSEMBLY (Illumina/454/Hybrid, with parameter optimization)
• Illumina DeNovo: Allpaths LG, SOAP DeNovo, Velvet, Taipan, SUTTA
• Hybrid DeNovo: Ray, MIRA
• 454 DeNovo: Newbler, CABOG, SUTTA
REFERENCE SELECTION: align Illumina reads against the reference (bwa); compare mapping statistics; chosen reference
REFERENCE BASED ASSEMBLY: AMOScmp; Illumina/(454?) reference based assembly; unmapped reads
CONTIG MERGING: all possible combinations of the best 3 contigs * 3; align Illumina reads against 454 contigs; Mac vector, CLC wb, Minimus, MAIA
Scaffolds: GRASS, built-in
Evaluation: GAGE, Hawk-eye, MUMmer, DNA Diff, reference evaluation
GENOME FINISHING: gap filling, nucleotide identity; PAGIT, Mauve; draft/finished genome
LEGEND: Process, Info., Assemblers
Read Visualization – spot the differences
Comparison of 454 Reads for 08-2462 (low coverage) and 2541-90 (improved coverage)
Pipeline / Read Processing / Assembler Results / Contig Merging / Assembler Review / Pipeline / Final Results
Read Visualization - more is better!
Nav 08-2462 454 reads compared to Nav 08-2462 Illumina reads.
Read Visualization – cousins or siblings?
Nav_2541-90 and Vul_06-2432 (454 and Illumina reads) coverage comparison.
Data Quality
Effect of pre-processing data (using prinseq)
V. navarrensis (454; non-preprocessed | pre-processed)
Samples: 2423-01, 08-2462, 2541-90, 2756-81
[Table: one FastQC metric per row, one sample per column]
Metrics: Per Base Seq. Quality; Per Seq. Quality Score; Per Base Seq. Content; Per Base GC Content; Per Seq. GC Content; Per Base N Content; Seq. Length Dist.; Seq. Dup. Levels; Overrepresented Seqs.; Kmer Content
V. vulnificus (454; non-preprocessed | pre-processed)
Samples: 2009V_1368, 06-2432, 08-2435, 08-2439, 07-2444
[Table: one FastQC metric per row, one sample per column]
Metrics: Per Base Seq. Quality; Per Seq. Quality Score; Per Base Seq. Content; Per Base GC Content; Per Seq. GC Content; Per Base N Content; Seq. Length Dist.; Seq. Dup. Levels; Overrepresented Seqs.; Kmer Content
V. navarrensis (Illumina; non-preprocessed | pre-processed)
Samples: 2423-01, 08-2462, 2541-90, 2756-81
[Table: one FastQC metric per row, one sample per column]
Metrics: Per Base Seq. Quality; Per Seq. Quality Score; Per Base Seq. Content; Per Base GC Content; Per Seq. GC Content; Per Base N Content; Seq. Length Dist.; Seq. Dup. Levels; Overrepresented Seqs.; Kmer Content
V. vulnificus (Illumina; non-preprocessed | pre-processed)
Samples: 2009V_1368, 06-2432, 08-2435, 08-2439, 07-2444
[Table: one FastQC metric per row, one sample per column]
Metrics: Per Base Seq. Quality; Per Seq. Quality Score; Per Base Seq. Content; Per Base GC Content; Per Seq. GC Content; Per Base N Content; Seq. Length Dist.; Seq. Dup. Levels; Overrepresented Seqs.; Kmer Content
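The pre-processing step can be illustrated with a minimal sketch of read filtering by length and mean base quality. This is not Prinseq's implementation; the function names, the thresholds (`min_length`, `min_mean_quality`), and the Phred+33 quality encoding are assumptions made for the example.

```python
from statistics import mean


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_line]


def keep_read(sequence, quality_line, min_length=50, min_mean_quality=20):
    """Return True if the read is long enough and its mean quality is high enough.

    Both thresholds are illustrative defaults, not values from the slides.
    """
    if len(sequence) < min_length:
        return False
    return mean(phred_scores(quality_line)) >= min_mean_quality


def filter_fastq(lines, **thresholds):
    """Yield (header, sequence, quality) for passing reads.

    Expects plain 4-line FASTQ records: header, sequence, '+', quality.
    """
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, sequence, _plus, quality = records[i:i + 4]
        if keep_read(sequence, quality, **thresholds):
            yield header, sequence, quality


# Two toy reads: 'I' encodes Phred 40, '#' encodes Phred 2.
example = ["@r1", "ACGTACGT", "+", "IIIIIIII",
           "@r2", "ACGTACGT", "+", "########"]
kept = list(filter_fastq(example, min_length=5, min_mean_quality=20))
```

Running a quality report before and after such a filter is what the non-preprocessed | pre-processed comparison in the tables above corresponds to.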
Assembly
Reference-guided and de-Novo
Reference guided assembly
Comparison of reference guided assembly vs de-novo assembly
ARE – Assembly Score
Reference-guided vs de-Novo assembly
[Figure: Y axis ARE, 0 to 90. Series: 454 (Vul_06-2432), 454 (Nav_2541-90), Illumina (Vul_06-2432), Illumina (Nav_2541-90).]
Summary of Reference-guided assembly
• Using the V. vulnificus (CMCP6) reference strain: 84% coverage
• De-Novo assemblers overall gave a higher assembly score than reference-based assembly
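A coverage figure such as the 84% above is typically the fraction of reference bases covered by at least one alignment. The sketch below shows one way to compute that from alignment intervals. The interval representation (0-based, half-open) and the function name are assumptions, and the toy intervals are made up to land on 0.84; they are not the team's data.

```python
def covered_fraction(intervals, reference_length):
    """Fraction of reference positions covered by at least one interval.

    intervals: iterable of (start, end) pairs, 0-based and half-open.
    Overlapping or touching intervals are merged before counting.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block, if any, and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / reference_length


# Hypothetical alignments on a 100 bp reference: 70 + 14 covered bases.
toy_fraction = covered_fraction([(0, 50), (40, 70), (80, 94)], 100)
```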
De Novo Assembly
[Figure: Newbler (denovo). Y axis ARE, 0 to 100. X axis K-MER SIZE (40, 50, 100). Series: Nav_2541-90, Vul_06-2432.]
De Novo Assembly
[Figure: CABOG. Y axis ARE, 0 to 50. X axis K-MER Size (15, 22, 25). Series: Nav_2541-90, Vul_06-2432.]
De Novo Assembly
[Figure: SOAPdenovo. Y axis ARE, -0.5 to 4. X axis K-MER Size (20, 30, 40, 50, 60, 70). Series: Nav_2541-90, Vul_06-2432.]
De Novo Assembly
[Figure: Velvet. Y axis ARE, 0 to 7. X axis K-MER Size (19, 25, 31). Series: Nav_2541-90, Vul_06-2432.]
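The parameter sweeps plotted for each assembler amount to scoring one assembly per k-mer size and keeping the best. A minimal sketch follows; the k values are those on the Velvet axis, but the scores are placeholders invented for illustration because the plotted values are not recoverable from the slides.

```python
def best_parameter(scores_by_k):
    """Return (k, score) for the highest-scoring k-mer size."""
    if not scores_by_k:
        raise ValueError("no parameter values were scored")
    return max(scores_by_k.items(), key=lambda item: item[1])


# Placeholder scores, NOT measured results: one assembly score per k.
hypothetical_velvet_scores = {19: 2.1, 25: 5.4, 31: 3.8}
best_k, best_score = best_parameter(hypothetical_velvet_scores)
```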
De-Novo Assembler Comparison (Optimal Parameters)
[Figure: Y axis ARE, 0 to 100. X axis assemblers: CABOG, Newbler (dn), Ray, Ray (hybrid), SOAPdn, Velvet. Series: 454 (Vul_06-2432), Illumina (Vul_06-2432), 454 (Nav_2541-90), Illumina (Nav_2541-90), Hybrid (Vul_06-2432), Hybrid (Nav_2541-90).]
Final Results – V. vulnificus
[Figure: assemblers shown are Velvet, CABOG, Ray (hybrid), SOAPdenovo, Newbler (ref;Illumina), Ray (Illumina), Newbler (ref;454), Ray (454), AMOScmp.]
Graph comparing assemblers on 3 criteria: Assembly Score, Span Ratio, 1/(Break Points). Higher scores on all criteria are preferable. Newbler (dn) has been removed to show the variance among the other tools.
Final Results – V. vulnificus
Graph comparing assemblers on 3 criteria: Assembly Score, Span Ratio, 1/(Break Points). Higher scores on all criteria are preferable.
Summary of de-Novo results
• OLC assemblers (CABOG, Newbler) differed considerably in ARE from de Bruijn-based assemblers (SOAPdenovo, Velvet)
• The hybrid assembler, Ray, did not perform as well in terms of assembly score
Merging-Vul_06-2432
Merge score for each pair of assemblers (row assembler, column assembler):
 | AMOScmp | CABOG | Newbler (dn;454) | Newbler (ref;454) | Newbler (ref;Illumina) | Ray (454) | Ray (Illumina) | Ray (hybrid) | SOAPdn | Velvet
AMOScmp | - | 164.00 | ND | 6.35 | 4.69 | 63.51 | 55.13 | 64.51 | 44.38 | 67.22
CABOG | 164.00 | - | 225.12 | 101.30 | 62.66 | 73.23 | 93.88 | 98.11 | 75.98 | 113.08
Newbler (dn;454) | ND | 221.89 | - | 5.48 | ND | 311.98 | ND | 419.76 | 104.46 | 127.01
Newbler (ref;454) | 6.35 | 99.30 | 5.48 | - | 1.44 | 67.72 | 64.99 | 72.79 | 35.07 | 72.34
Newbler (ref;Illumina) | 4.69 | 62.66 | ND | 1.44 | - | 35.28 | ND | ND | ND | ND
Ray (454) | 63.50 | 72.56 | 311.99 | 67.72 | 35.28 | - | 33.81 | 49.94 | 22.92 | 37.68
Ray (Illumina) | 55.13 | 93.88 | ND | 64.99 | ND | 33.81 | - | ND | ND | ND
Ray (hybrid) | 64.51 | 97.17 | 419.76 | 72.79 | ND | 49.94 | ND | - | ND | ND
SOAPdn | 44.38 | 75.98 | 104.46 | 35.07 | ND | 22.92 | ND | ND | - | 234.69
Velvet | 67.22 | 113.08 | 127.01 | 72.34 | ND | 37.68 | ND | ND | 234.69 | -
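One way to read a merging table like this is to skip the ND cells and pick the pair with the highest score. The sketch below does that for the Ray (hybrid) row of the Vul_06-2432 table, using the values as given on the slide. The function name and the dictionary layout are assumptions, and the slides do not state that the final pairs were chosen by this rule.

```python
def best_merge(scores):
    """Return ((assembler_a, assembler_b), score) for the best scored pair.

    scores maps a pair of assembler names to a number or the string "ND";
    "ND" cells are ignored.
    """
    numeric = {pair: float(value)
               for pair, value in scores.items() if value != "ND"}
    if not numeric:
        raise ValueError("no pair has a numeric score")
    return max(numeric.items(), key=lambda item: item[1])


# Ray (hybrid) row of the Vul_06-2432 merging table.
ray_hybrid_row = {
    ("Ray (hybrid)", "AMOScmp"): 64.51,
    ("Ray (hybrid)", "CABOG"): 97.17,
    ("Ray (hybrid)", "Newbler (dn;454)"): 419.76,
    ("Ray (hybrid)", "Newbler (ref;454)"): 72.79,
    ("Ray (hybrid)", "Newbler (ref;Illumina)"): "ND",
    ("Ray (hybrid)", "Ray (454)"): 49.94,
    ("Ray (hybrid)", "Ray (Illumina)"): "ND",
}
best_pair, best_pair_score = best_merge(ray_hybrid_row)
```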
Merging-Nav_2541-90
Merge score for each pair of assemblers (row assembler, column assembler):
 | AMOScmp | CABOG | Newbler (dn) | Newbler (ref;454) | Newbler (ref;Illumina) | Ray (454) | Ray (Illumina) | Ray (hybrid) | SOAPdn | Velvet
AMOScmp | - | 133.95 | ND | 0.03 | 0.03 | 15.26 | 14.00 | 15.77 | 11.23 | 45.32
CABOG | 133.95 | - | ND | 107.60 | 114.60 | 82.62 | 92.44 | 92.53 | 80.73 | 123.02
Newbler (dn) | ND | ND | - | ND | ND | 54.21 | 59.81 | 60.47 | 33.17 | 94.89
Newbler (ref;454) | 0.03 | 107.60 | 59.94 | - | 0.11 | 11.6 | 11.78 | 11.86 | 10.17 | 39.2
Newbler (ref;Illumina) | 0.03 | 114.60 | ND | 0.28 | - | 12.66 | 12.15 | 12.41 | 9.6 | 39.60
Ray (454) | 15.26 | 82.62 | 54.21 | 11.60 | 12.66 | - | 59.19 | 76.36 | 13.65 | 63.75
Ray (Illumina) | 14.01 | 92.44 | 59.81 | 11.78 | 12.15 | 33.79 | - | 24.21 | 11.54 | 39.84
Ray (hybrid) | 15.77 | 92.53 | 60.47 | 11.86 | 12.41 | 40.33 | 36.79 | - | 14.06 | ND
SOAPdn | 11.22 | 80.73 | 33.17 | 10.04 | 9.54 | 13.61 | 11.40 | 13.91 | - | 8.47
Velvet | 45.32 | 123.02 | 94.89 | 39.20 | 39.84 | 64.54 | 39.84 | ND | 8.31 | -
Assembler Review
[Slide columns 454, Illumina, Hybrid: marks not recoverable]
Assembler | Status | Algorithm
Allpaths LG | Paired-end only | DBG
AMOScmp | | BB
CABOG | | OLC
MIRA | | ZEBRA
Newbler | | OLC
Ray | | DBG
SOAPdenovo | | DBG
SUTTA | Unresolved errors | BB
Velvet | | DBG
MIRA performed as well as our merged contigs, but it is impractical (40 hr run time)
BB = branch-and-bound; OLC = overlap-layout-consensus; DBG = de Bruijn graph; ZEBRA
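For readers unfamiliar with the DBG label, the sketch below builds a de Bruijn graph from reads: every k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. It is a toy illustration of the data structure only, with no error correction or graph simplification, and is not how any of the listed assemblers is implemented.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it.

    One edge is added per k-mer occurrence, so repeated k-mers
    produce repeated edges.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy read; with k = 3 the k-mers are ACG and CGT.
toy_graph = de_bruijn_graph(["ACGT"], 3)
```

The choice of k in this construction is the K-MER Size parameter swept in the SOAPdenovo and Velvet plots.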
Final Pipeline
[Flowchart: stages and the tools named at each stage]
Inputs: 454 raw reads; Illumina raw reads
PRE-PROCESSING: Fastqc, Prinseq; read stats, statistical analysis
DENOVO ASSEMBLY (Illumina/454/Hybrid)
• Illumina DeNovo: Velvet
• Hybrid DeNovo: Ray, MIRA
• 454 DeNovo: Newbler, CABOG
CONTIG MERGING (Minimus): merge Ray-hyb/Newbler; merge CABOG/Velvet; MIRA-hyb; align Illumina reads against 454 contigs
Output: draft genome
LEGEND: Process, Info., Assemblers
Splinter
Pipeline 1
Strain | NUM | AVG | N50 | Assembly Size (Mb) | Assembly Score
Nav_2423-01 | 106 | 42657.2 | 156064 | 4.52 | 136.53
Nav_082462 | 149 | 25736.8 | 51230 | 3.83 | 19.48
Nav_2541-90 | 166 | 26172.5 | 130386 | 4.34 | 62.57
Nav_2756-81 | 107 | 42939.4 | 131591 | 4.59 | 122.31
Vul_2009v-1368 | 83 | 57787.2 | 401973 | 4.80 | 345.03
Vul_062432 | 57 | 85122.7 | 322525 | 4.85 | 419.76
Vul_082435 | 111 | 42872.9 | 230373 | 4.76 | 144.01
Vul_082439 | 98 | 50885.7 | 250789 | 4.99 | 210.94
Vul_072444 | 70 | 73255.1 | 492706 | 5.13 | 656.10
Pipeline 2
Strain | NUM | AVG | N50 | Assembly Size (Mb) | Assembly Score
Nav_2423-01 | 125 | 35357.0 | 164305 | 4.42 | 111.36
Nav_082462 | 451 | 311.9 | 2253 | 0.14 | 0.09
Nav_2541-90 | 106 | 40547.5 | 169781 | 4.30 | 123.02
Nav_2756-81 | 111 | 41840.8 | 132119 | 4.64 | 124.55
Vul_2009v-1368 | 97 | 49705.8 | 228408 | 4.82 | 170.81
Vul_062432 | 167 | 28489.7 | 78353 | 4.76 | 32.53
Vul_082435 | 193 | 24903.7 | 204178 | 4.85 | 75.19
Vul_082439 | 114 | 44047.9 | 180889 | 5.02 | 134.64
Vul_072444 | 143 | 35905.1 | 130942 | 5.13 | 85.93
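The NUM, AVG, N50 and Assembly Size columns can all be derived from the list of contig lengths. The sketch below uses the standard N50 definition: the largest length L such that contigs of length at least L together contain at least half of the assembled bases. The function name and the toy lengths are illustrative; the Assembly Score column is not reproduced because its formula is not given in the slides.

```python
def assembly_stats(contig_lengths):
    """Return contig count, mean length, N50 and total size (in bases)."""
    if not contig_lengths:
        raise ValueError("at least one contig is required")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = lengths[-1]
    for length in lengths:
        running += length
        # First contig at which the running sum reaches half the assembly.
        if running * 2 >= total:
            n50 = length
            break
    return {
        "num": len(lengths),
        "avg": total / len(lengths),
        "n50": n50,
        "size": total,
    }


# Toy contig lengths: total 300, so N50 is reached at the 80 bp contig.
toy_stats = assembly_stats([100, 80, 60, 40, 20])
```

Note that NUM times AVG gives the assembly size, which is a quick consistency check on a row: 106 contigs averaging 42657.2 bases is about 4.52 million bases.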
Visualization
Newbler
Ray Hybrid
Merged
Demo