SOAP3-dp
Workflow
SOAP3-dp workflow for paired-end alignment
Step 1: Use SOAP3 to align paired-end reads
[Figure: paired-end reads are aligned with SOAP3 (2-mismatch), giving paired alignments, e.g. chr 6, +4,059, -4,369.]
Step 2: For reads with one end mapped but the other not, use Default-DP to align the unmapped ends
[Figure: the alignment of the mapped end (e.g. chr 9, +49,538) gives the mapped region of one end and a candidate region on chr 9 for the unmapped end. Default-DP uses DP to align the unmapped end within the candidate region, giving paired alignments (e.g. chr 9, +49,538, -49,829).]
Step 3: For reads with both ends unaligned, use SOAP3 to align the seeds and then use Deep-DP to align both ends
[Figure: seeds from both ends are aligned with SOAP3 (1-mismatch), giving seed alignments of the first end (e.g. chr 18, +349,683) and seed alignments of the second end (e.g. chr 18, -349,998). The seed alignments are paired up (e.g. chr 18, +349,683, -349,998) to define a candidate region on chr 18. Deep-DP uses DP to align both ends within the candidate region, giving paired alignments (e.g. chr 18, +349,664, -349,923).]
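The "pair up the seed alignments" box can be sketched as follows. This is only an illustration under stated assumptions: the tuple layout, the name pair_seed_hits and the max_insert parameter are hypothetical, and the slides do not show how SOAP3-dp actually performs this step.

```python
def pair_seed_hits(first_hits, second_hits, max_insert):
    """Pair seed hits of the two ends that could belong to one fragment.

    Each hit is a (chrom, strand, pos) tuple (an assumed layout).
    A pair is kept when both hits are on the same chromosome, the left
    hit is on the forward strand, the right hit is on the reverse strand,
    and the two positions are at most max_insert apart.
    """
    pairs = []
    for hit1 in first_hits:
        for hit2 in second_hits:
            chrom1, strand1, pos1 = hit1
            chrom2, strand2, pos2 = hit2
            if chrom1 != chrom2 or strand1 == strand2:
                continue
            # Order the two hits by reference position.
            (left_pos, left_strand), (right_pos, right_strand) = sorted(
                [(pos1, strand1), (pos2, strand2)]
            )
            if left_strand == "+" and right_strand == "-" \
                    and right_pos - left_pos <= max_insert:
                pairs.append((hit1, hit2))
    return pairs


# Seed hits taken from the figure; max_insert=500 is an arbitrary example value.
first = [("chr18", "+", 349683)]
second = [("chr18", "-", 349998), ("chr9", "-", 49829)]
paired = pair_seed_hits(first, second, max_insert=500)
```

The nested loop is quadratic in the number of hits, which is acceptable for a sketch because the number of hits per seed is capped.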
Step 1: SOAP3
SOAP3 (2-mismatch) places each read pair in one of the following cases:
- Both ends can be mapped and paired properly: report the alignments.
- Only one end can be mapped, with not too many hits (i.e. <= 30): store the readID (of the aligned end) and hits to ARRAY A.
- Only one end can be mapped, with too many hits (i.e. > 30): store the readID (of the aligned end) and hits to ARRAY B.
- Neither end can be mapped: store the readID (of the first read of the pair) and hits to ARRAY C.
- Both ends can be mapped but not paired properly: store the readID and hits to ARRAY A or B (described in more detail on the next slide).
A read pair is paired properly if:
1. Both ends are mapped within the insert size (i.e. a range of distances between the two ends, supplied by the user).
2. The ends are in proper orientation (for Illumina reads, the end aligned on the left side is on the forward strand, while the end aligned on the right side is on the reverse strand).
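The two conditions for a properly paired read, and the routing of pairs with at most one mapped end, might be sketched like this. The Hit class, the function names and the definition of the insert size as the outer distance between the two ends are assumptions made for the example; the slides only give the rules themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    chrom: str
    pos: int      # leftmost reference coordinate of the aligned end
    strand: str   # "+" or "-"


def is_properly_paired(a, b, min_insert, max_insert, read_len):
    """Check the two rules: insert size in range and proper orientation."""
    if a.chrom != b.chrom:
        return False
    left, right = (a, b) if a.pos <= b.pos else (b, a)
    # Illumina orientation: left end forward, right end reverse.
    if left.strand != "+" or right.strand != "-":
        return False
    # Assumption: insert size = outer distance covered by both ends.
    span = right.pos + read_len - left.pos
    return min_insert <= span <= max_insert


def route_step1(n_hits_1, n_hits_2, max_hits=30):
    """Route a pair in which at most one end was mapped by SOAP3."""
    if n_hits_1 > 0 and n_hits_2 > 0:
        raise ValueError("both ends mapped: handled by the pairing rules")
    mapped = max(n_hits_1, n_hits_2)
    if mapped == 0:
        return "C"                      # neither end mapped
    return "A" if mapped <= max_hits else "B"


# Positions from the Step 1 figure; read length and insert range are made up.
ok = is_properly_paired(Hit("chr6", 4059, "+"), Hit("chr6", 4369, "-"),
                        min_insert=200, max_insert=600, read_len=100)
```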
Step 1: SOAP3 -- Both ends can be mapped but not paired properly
Let x = # of all valid hits of read 1.
Let y = # of all valid hits of read 2.
If x > 30, retain only the best hits of read 1 and reset x = # of best hits of read 1.
If y > 30, retain only the best hits of read 2 and reset y = # of best hits of read 2.
Store the read ID and hits of the read marked YES to ARRAY A or B:

Case              read 1   read 2   Stored to
a) x, y <= 30     YES      NO       ARRAY A
                  NO       YES      ARRAY A
b) x <= 30 < y    YES      NO       ARRAY A
c) y <= 30 < x    NO       YES      ARRAY A
d) 30 < x < y     YES      NO       ARRAY B
e) 30 < y <= x    NO       YES      ARRAY B
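Cases a) to e) can be written as a small routing function. This is a sketch only: the function name and argument names are hypothetical, and for case a) the slide shows both reads as possible YES entries without saying which applies, so the sketch simply returns both.

```python
def route_unpaired(x_all, x_best, y_all, y_best, limit=30):
    """Return (reads marked YES, array name) for a mapped-but-unpaired pair.

    x_all / y_all:   number of all valid hits of read 1 / read 2
    x_best / y_best: number of best hits of read 1 / read 2
    """
    # Reads with too many hits keep only their best hits.
    x = x_best if x_all > limit else x_all
    y = y_best if y_all > limit else y_all

    if x <= limit and y <= limit:
        # Case a): the slide lists both options; the choice is not stated.
        return ("read1", "read2"), "A"
    if x <= limit:                 # case b): x <= 30 < y
        return ("read1",), "A"
    if y <= limit:                 # case c): y <= 30 < x
        return ("read2",), "A"
    if x < y:                      # case d): 30 < x < y
        return ("read1",), "B"
    return ("read2",), "B"         # case e): 30 < y <= x
```

For example, route_unpaired(50, 2, 40, 35) first resets x to 2 (best hits of read 1), leaves y at 35, and so falls into case b).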
Step 2 and step 3: Default DP and New default DP
ARRAY A -> Default DP
- Both ends can be mapped and paired properly: report the alignments.
- Otherwise: store the readID of the first read of the pair to ARRAY C.
ARRAY B -> New default DP
- Both ends can be mapped and paired properly: report the alignments.
- Otherwise: store the readID of the first read of the pair to ARRAY C.
Detailed picture of Default DP and New Default DP
For reads with one end mapped but the other not, AND not too many hits,
use Default-DP to align the unmapped ends
[Figure: the alignment of the mapped end (e.g. chr 9, +49538) gives the mapped region of one end and a candidate region on chr 9 for the unmapped end. Default-DP uses DP to align the unmapped end within the candidate region, giving paired alignments (e.g. chr 9, +49538, -49829).]
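One way to derive the candidate region from the mapped end is sketched below. The function and its parameters are hypothetical, the insert size is assumed to be the outer distance between the two ends, and the region is simply clipped to the chromosome; the slides do not specify how SOAP3-dp sizes the region.

```python
def candidate_region(anchor_pos, anchor_strand, read_len,
                     min_insert, max_insert, chrom_len):
    """Half-open interval [start, end) where the unmapped end may lie.

    anchor_pos is the leftmost coordinate of the mapped end.
    """
    if anchor_strand == "+":
        # Mapped end is on the left; the mate lies to its right.
        start = anchor_pos + min_insert - read_len
        end = anchor_pos + max_insert
    else:
        # Mapped end is on the right; the mate lies to its left.
        right_edge = anchor_pos + read_len
        start = right_edge - max_insert
        end = right_edge - min_insert + read_len
    # Keep the region inside the chromosome.
    return max(0, start), min(chrom_len, end)


# Anchor from the figure (chr 9, +49538); the other values are made up.
region = candidate_region(49538, "+", read_len=100,
                          min_insert=200, max_insert=500,
                          chrom_len=1_000_000)
```

With these example values the region is [49638, 50038), which contains the mate position 49829 shown in the figure.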
For reads with one end mapped but the other not, AND too many hits,
use New-Default-DP to align the unmapped ends
[Figure: the alignments of the mapped end (e.g. chr 18, +349683) give the mapped region of one end. Seeds from the unmapped end are aligned with SOAP3 (1-mismatch), giving seed alignments of the unmapped end (e.g. chr 18, -349998). The seed alignments are paired up with the alignments of the other end (e.g. chr 18, +349683, -349998) to define a candidate region on chr 18. New-Default-DP uses DP to align the unmapped end within the candidate region, giving paired alignments (e.g. chr 18, +349683, -349923).]
Step 4: 2-level Deep DP
Input: ARRAY C
ROUND 1 SEEDING for both ends
  Seed length: 26
  Sample rate: 1/13
  Max # of hits allowed: 100
- If there exist pairs of hits within the insert size: perform DP for those pairs of hits.
- If (1) there exists a seed with too many hits AND (2) there are no pairs of hits within the insert size: go to ROUND 2 SEEDING.
ROUND 2 SEEDING for both ends
  Seed length: 30
  Sample rate: 1/15
  Max # of hits allowed: 1000
- If there exist pairs of hits within the insert size: perform DP for those pairs of hits.
After DP:
  Case 1: Valid paired alignments found: report the alignments.
  Case 2: No valid paired alignment found: store the readID of both ends to ARRAY D.
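The two seeding rounds can be illustrated with a short sketch. It assumes that a sample rate of 1/13 means one seed start every 13 bases (and 1/15 every 15 bases); the slides do not define the term, and the names ROUNDS, extract_seeds and usable_hits are made up for the example.

```python
# Round parameters from the slide; "step" encodes the assumed meaning
# of the sample rate (one seed start every `step` bases).
ROUNDS = [
    {"seed_len": 26, "step": 13, "max_hits": 100},
    {"seed_len": 30, "step": 15, "max_hits": 1000},
]


def extract_seeds(read, seed_len, step):
    """Return (offset, seed) for every full-length seed taken at `step`."""
    return [(i, read[i:i + seed_len])
            for i in range(0, len(read) - seed_len + 1, step)]


def usable_hits(hits, max_hits):
    """Drop the hits of a seed that exceeds the allowed number of hits."""
    return hits if len(hits) <= max_hits else []


read = "ACGT" * 25                     # a 100-base example read
round1 = ROUNDS[0]
seeds = extract_seeds(read, round1["seed_len"], round1["step"])
offsets = [offset for offset, _ in seeds]
```

For a 100-base read, round 1 under this assumption yields seeds at offsets 0, 13, 26, 39, 52 and 65.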
Step 5: Single DP
ARRAY D -> Single DP
- The end can be mapped: report the alignments.
- Otherwise: report that the ends cannot be aligned.
Detailed picture of Single DP
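[Figure: seeds from the read are aligned with SOAP3 (1-mismatch), giving seed alignments (e.g. chr 18, +349,683). Each seed alignment defines a candidate region on chr 18. Single-DP uses DP to align the read within the candidate region, and the resulting alignments (e.g. chr 18, +349,664) are reported.]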
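The "use DP to align" step can be illustrated with a plain CPU dynamic-programming sketch that fits a whole read into a candidate region. The scoring values (match +1, mismatch -2, linear gap -3) and the function name are assumptions for the example; the slides do not give SOAP3-dp's scoring scheme, and the real tool runs its DP on the GPU.

```python
def fit_align(read, region, match=1, mismatch=-2, gap=-3):
    """Align the whole read against any substring of the region.

    Returns (best score, end offset in region). Gaps at either end of
    the region are free; the read must be consumed completely.
    """
    # Row 0: the alignment may start anywhere in the region at no cost.
    prev = [0] * (len(region) + 1)
    for i in range(1, len(read) + 1):
        # Column 0: read bases aligned before the region starts are gaps.
        cur = [i * gap] + [0] * len(region)
        for j in range(1, len(region) + 1):
            sub = match if read[i - 1] == region[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,   # match or mismatch
                         prev[j] + gap,       # gap in the region
                         cur[j - 1] + gap)    # gap in the read
        prev = cur
    # The alignment may end anywhere in the region.
    best_end = max(range(len(region) + 1), key=prev.__getitem__)
    return prev[best_end], best_end


score, end = fit_align("ACGT", "TTACGTTT")
```

Only the previous row is kept, so memory grows with the region length rather than with read length times region length.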
Paired-end alignment (overall workflow)
1. Load 6M reads (3M pairs).
2. SOAP3 (2-mismatch)
3. New default DP
4. Default DP
5. 2-level deep DP
6. Single DP
7. More reads to process? Yes: continue with the next 6M reads. No: END.
Create a new CPU thread to load the next 6M reads.
Note: New-default DP needs the 2BWT in the GPU, while Default DP does not. Thus we run New-default DP before Default DP, because after SOAP3 the 2BWT index is already inside the GPU.
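The batch loop with a separate loading thread can be sketched as below. This is a minimal illustration, assuming a hypothetical load_batch() that returns an empty list when no reads are left and a hypothetical process_batch() that stands for the alignment stages of one batch; it is not taken from the SOAP3-dp source.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batches(load_batch, process_batch):
    """Process batches while a second thread loads the next one."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load_batch)       # load the first batch
        while True:
            batch = pending.result()              # wait for loading to finish
            if not batch:                         # no more reads: END
                break
            pending = loader.submit(load_batch)   # start loading the next batch
            results.append(process_batch(batch))  # align the current batch
    return results


# Toy stand-ins: three "batches" of numbers, processed by summing them.
batches = iter([[1, 2], [3], []])
totals = run_batches(lambda: next(batches, []), sum)
```

Loading and alignment overlap because the next batch is requested before the current one is processed.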
SOAP3 Architecture
Memory-resident data structures
  Device (GPU): 2BWT
  Host (CPU): 2BWT + SA
Execution (repeated for each batch of 1M reads)
  Device (GPU): Process 1M reads for round 1 and round 2 alignments
  Host (CPU): Process round 3 alignment & report results
DP with seeding
Memory-resident data structures
  Host (CPU): 2BWT + SA
  Device (GPU): 2BWT / DP tables
Execution
1. Copy 2BWT index to GPU & extract seeds of reads in Array C.
2. SOAP3 (1-mismatch): process 1M seeds for round 1 and round 2 alignments, then process round 3 alignment (repeated for each batch of 1M seeds).
3. Pair up the seed alignments, clear 2BWT index in GPU & create DP tables in GPU.
4. Perform DP between the reads and the candidate regions.
Default DP
Memory-resident data structures
  Host (CPU): 2BWT + SA
  Device (GPU): DP tables
Execution
1. Create DP tables in GPU.
2. Perform DP between the reads and the candidate regions.
Single-end alignment (overall workflow)
1. Load 6M single-end reads.
2. SOAP3 (2-mismatch)
3. Single DP
4. More reads to process? Yes: continue with the next 6M reads. No: END.
Create a new CPU thread to load the next 6M reads.
Paired-end alignment (for read length > 150)
1. Load 6M reads (3M pairs).
2. 2-level deep DP
3. Single DP
4. More reads to process? Yes: continue with the next 6M reads. No: END.
Create a new CPU thread to load the next 6M reads.