Transcript Workflowx
SOAP3-dp Workflow

SOAP3-dp workflow for paired-end alignment

Step 1: Use SOAP3 to align paired-end reads.
  Paired-end reads -> SOAP3 (2-mismatch) -> paired alignments: chr 6, +4,059, -4,369; ...

Step 2: For reads with one end mapped but the other not, use Default-DP to align the unmapped ends.
  One end's alignments (chr 9, +49,538; ...) + the unmapped ends -> Default-DP -> paired alignments: chr 9, +49,538, -49,829; ...
  Diagram (chr 9): mapped region of one end at + 49,538; candidate region for the unmapped end; use DP to align.

Step 3: For reads with both ends unaligned, use SOAP3 to align the seeds and then use Deep-DP to align both ends.
  Seeds -> SOAP3 (1-mismatch) -> seed alignments of the first end and of the second end: chr 18, +349,683; ... and chr 18, -349,998; ...
  Pair up the seed alignments -> paired seed alignments: chr 18, +349,683, -349,998; ...
  Paired seed alignments -> Deep-DP -> paired alignments: chr 18, +349,664, -349,923; ...
  Diagram (chr 18): candidate region between + 349,683 and - 349,998; use DP to align.

Step 1: SOAP3

A read pair is paired properly if:
  1. Both ends are mapped within the insert size (i.e. a range of distance between the two ends, supplied by the user).
  2. The ends are in proper orientation (for Illumina reads, the end aligned to the left side is on the forward strand, while the other, aligned to the right, is on the reverse strand).

Outcomes after SOAP3 (2-mismatch):
  Both ends can be mapped and paired properly: report the alignments.
  Only one end can be mapped with not too many hits (i.e. <= 30): store the readID (of the aligned end) and hits to ARRAY A.
  Only one end can be mapped with too many hits (i.e.
> 30): store the readID (of the aligned end) and hits to ARRAY B.
  Both ends cannot be mapped: store the readID (of the first read of the pair) and hits to ARRAY C.
  Both ends can be mapped but not paired properly: store the readID and hits to ARRAY A or B (described further in the next slide).

Step 1: SOAP3 -- Both ends can be mapped but not paired properly

Let x = # of all valid hits of read 1.
Let y = # of all valid hits of read 2.
If x > 30, retain only the best hits of read 1 and reset x = # of best hits of read 1.
If y > 30, retain only the best hits of read 2 and reset y = # of best hits of read 2.

  Case              read 1   read 2   Stored to
  a) x, y <= 30     YES      NO       ARRAY A
                    NO       YES      ARRAY A
  b) x <= 30 < y    YES      NO       ARRAY A
  c) y <= 30 < x    NO       YES      ARRAY A
  d) 30 < x < y     YES      NO       ARRAY B
  e) 30 < y <= x    NO       YES      ARRAY B

Store the read ID and hits of the read marked YES to ARRAY A or B.

Step 2 and step 3: default DP and new default DP

  Array A -> Default DP
    Both ends can be mapped and paired properly: report the alignments.
    Otherwise: store the readID of the first read of the pair to ARRAY C.
  Array B -> New default DP
    Both ends can be mapped and paired properly: report the alignments.
    Otherwise: store the readID of the first read of the pair to ARRAY C.

Detailed picture of Default DP and New Default DP

For reads with one end mapped but the other not, AND the number of hits is not too many, use Default-DP to align the unmapped ends.
  One end's alignments (chr 9, +49538; ...) + the unmapped ends -> Default-DP -> paired alignments: chr 9, +49538, -49829; ...
  Diagram (chr 9): mapped region of one end at + 49538; candidate region for the unmapped end; use DP to align.

For reads with one end mapped but the other not, AND the number of hits is too many, use New-Default-DP to align the unmapped ends.
  One end's alignments: chr 18, +349683; ...
  The unmapped ends -> seeds -> SOAP3 (1-mismatch) -> seed alignments of the unmapped end: chr 18, -349998; ...
  Pair up the seed alignments with the alignments of the other end: chr 18, +349683, -349998; ...
  Paired-up alignments -> New-Default-DP -> paired alignments: chr 18, +349683, -349923; ...
  Diagram (chr 18): mapped region of one end at + 349683; candidate region up to - 349998; use DP to align.

Step 4: 2-level Deep DP

  ARRAY C -> ROUND 1 SEEDING for both ends
    Seed length: 26
    Sample rate: 1/13
    Max # of hits allowed: 100
  If there exist pairs of hits within the insert size: perform DP for those pairs of hits.
  If (1) there exists a seed with too many hits AND (2) there are no pairs of hits within the insert size -> ROUND 2 SEEDING for both ends
    Seed length: 30
    Sample rate: 1/15
    Max # of hits allowed: 1000
  If there exist pairs of hits within the insert size: perform DP for those pairs of hits.
  Case 1: Valid paired alignments found: report the alignments.
  Case 2: No valid paired alignment found: store the readID of both ends to ARRAY D.

Step 5: Single DP

  Array D -> Single DP
    The end can be mapped: report the alignments.
    Otherwise: report that the ends cannot be aligned.

Detailed picture of Single DP

  Seeds -> SOAP3 (1-mismatch) -> seed alignments: chr 18, +349,683; ...
  Seed alignments -> Single-DP -> report the alignments: chr 18, +349,664; ...
  Diagram (chr 18): candidate region around + 349,683; use DP to align.

Paired-end alignment (overall workflow)

  Load 6M reads (3M pairs)
  -> SOAP3 (2-mismatch)
  -> New default DP
  -> Default DP
  -> 2-level deep DP
  -> single DP
  -> More reads to process?

Note: New-default DP needs the 2BWT index in the GPU, while default DP does not. Thus new-default DP is run before default DP, because after SOAP3 the 2BWT index is already inside the GPU.

More reads to process? Yes: repeat from the load step.
No: END.
Create a new CPU thread to load the next 6M reads.

SOAP3 Architecture

  Memory-resident data structures
    Device (GPU): 2BWT
    Host (CPU): 2BWT + SA
  Execution (repeated batch after batch)
    Process 1M reads for round 1 and round 2 alignments
    Process round 3 alignment & report results

DP with seeding

  Memory-resident data structures
    Device (GPU): 2BWT / DP tables
    Host (CPU): 2BWT + SA
  Execution
    Copy 2BWT index to GPU & extract seeds of reads in Array C
    -> SOAP3 (1-mismatch): process 1M seeds for round 1 and round 2 alignments, then process round 3 alignment (repeated batch after batch)
    -> Pair up the seed alignments, clear 2BWT index in GPU & create DP tables in GPU
    -> Perform DP between the reads and the candidate regions

Default DP

  Memory-resident data structures
    Device (GPU): DP tables
    Host (CPU): 2BWT + SA
  Execution
    Create DP tables in GPU
    -> Perform DP between the reads and the candidate regions

Single-end alignment (overall workflow)

  Load 6M single-end reads
  -> SOAP3 (2-mismatch)
  -> single DP
  -> More reads to process? Yes: repeat from the load step. No: END.
  Create a new CPU thread to load the next 6M reads.

Paired-end alignment (For read length > 150)

  Load 6M reads (3M pairs)
  -> 2-level deep DP
  -> single DP
  -> More reads to process? Yes: repeat from the load step. No: END.
  Create a new CPU thread to load the next 6M reads.
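
The Step 1 routing rules (report, ARRAY A, ARRAY B, or ARRAY C) can be sketched in a few lines of Python. This is only an illustration of the slide logic, not the SOAP3-dp source: `EndHits`, `effective_hits`, and `route_pair` are hypothetical names, and the `properly_paired` flag is assumed to be computed elsewhere. For case a) the slides do not say which read is chosen, so the sketch assumes the same rule that cases b) to e) follow (the read with fewer hits, ties going to read 2).

```python
from dataclasses import dataclass

MAX_HITS = 30  # "not too many hits" threshold from the Step 1 slide


@dataclass
class EndHits:
    """Hits of one end of a read pair (hypothetical container for this sketch)."""
    all_hits: list   # every valid hit found by SOAP3
    best_hits: list  # the subset of best hits


def effective_hits(end):
    """If an end has more than MAX_HITS hits, retain only its best hits."""
    return end.best_hits if len(end.all_hits) > MAX_HITS else end.all_hits


def route_pair(read1, read2, properly_paired):
    """Return (destination, stored_end).

    destination is 'REPORT', 'A', 'B' or 'C'; stored_end is 1, 2 or None.
    """
    n1, n2 = len(read1.all_hits), len(read2.all_hits)
    if n1 == 0 and n2 == 0:
        return "C", None                      # both ends cannot be mapped
    if n1 == 0 or n2 == 0:                    # only one end can be mapped
        end = 1 if n1 > 0 else 2
        return ("A" if max(n1, n2) <= MAX_HITS else "B"), end
    if properly_paired:
        return "REPORT", None
    # Both ends mapped but not paired properly: cases a) to e).
    x = len(effective_hits(read1))
    y = len(effective_hits(read2))
    end = 1 if x < y else 2                   # assumption for case a), see lead-in
    return ("A" if min(x, y) <= MAX_HITS else "B"), end
```

For example, a pair whose reads keep 40 and 50 best hits after the reset falls under case d) and is routed to ARRAY B with read 1 stored.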
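
The two seeding rounds of Step 4 can be illustrated with a small Python sketch. It rests on two assumptions that the slides do not state explicitly: that a sample rate of 1/13 means one seed starts every 13 bases, and that "too many hits" means more than the round's maximum number of hits. `SeedRound`, `extract_seeds`, and `needs_round_2` are hypothetical names used only here.

```python
from typing import NamedTuple


class SeedRound(NamedTuple):
    seed_length: int
    step: int       # assumed meaning of "sample rate 1/step"
    max_hits: int   # max # of hits allowed per seed


ROUND_1 = SeedRound(seed_length=26, step=13, max_hits=100)
ROUND_2 = SeedRound(seed_length=30, step=15, max_hits=1000)


def extract_seeds(read, rnd):
    """Return (start, seed) pairs, one seed every rnd.step bases."""
    last_start = len(read) - rnd.seed_length
    return [(s, read[s:s + rnd.seed_length])
            for s in range(0, last_start + 1, rnd.step)]


def needs_round_2(seed_hit_counts, has_pair_within_insert, rnd=ROUND_1):
    """Round 2 runs if some seed has too many hits AND no hit pair fits the insert size."""
    too_many = any(count > rnd.max_hits for count in seed_hit_counts)
    return too_many and not has_pair_within_insert
```

Under these assumptions a 100-base read yields seeds starting at 0, 13, 26, 39, 52 and 65 in round 1, and at 0, 15, 30, 45 and 60 in round 2.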
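
The overall workflow loop, in which a separate CPU thread loads the next batch while the current one is aligned, might look roughly like the following Python sketch. It is a simplified model rather than the actual implementation: `run_pipeline`, `load_batch`, and the stage functions are hypothetical, each stage is assumed to return a pair (aligned reads, reads still unresolved), and an empty batch is assumed to signal that no reads are left.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(load_batch, stages):
    """Run every stage on each batch while the next batch loads in the background.

    load_batch() returns a list of reads, or an empty list when input is exhausted.
    Each stage takes the unresolved reads and returns (aligned, still_unresolved).
    """
    aligned, unaligned = [], []
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load_batch)          # load the first batch
        while True:
            batch = pending.result()
            if not batch:                            # "More reads to process? No"
                break
            pending = loader.submit(load_batch)      # prefetch the next batch
            remaining = batch
            for stage in stages:                     # e.g. SOAP3, DP stages, single DP
                done, remaining = stage(remaining)
                aligned.extend(done)
            unaligned.extend(remaining)
    return aligned, unaligned
```

For the paired-end workflow the `stages` list would hold stand-ins for SOAP3, new default DP, default DP, 2-level deep DP and single DP, in that order; the single-end and long-read workflows would simply pass shorter lists.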