Parallelization of mrFAST on GPGPU
Hongyi Xin, Donghyuk Lee
Milestone II
Original Algorithm - mrFAST

Goal: Find the matching coordinates of each fragment on the reference.

Reference DNA sequence: …ACAGTAACTATT ACAAAAAAAACACGATTCAGATTAAACATAACATACGACCCTTACACTG…
Address: 1225

Algorithm
    Reference DNA sequence: create the hash table.
    Sample fragment sequence: get the coordinate list from the hash table.
    Compare the fragment against the reference at each coordinate by edit-distance calculation. --- Expensive!

[Figure: hash table keyed by 4-letter patterns (AAAA, AAAC, ..., TTTT). Each key maps to a list of reference coordinates (Coordinate 1, Coordinate 2, Coordinate 3, ...); the AAAC entry contains coordinate 1225.]

Problem - High cost of edit-distance calculation (high complexity and many memory accesses)
    1 memory access to the hash table, then 188 reference DNA lookups on average.
    At least 108 character compares and at least 324 additions.
    On average, 188 edit-distance calculations for each fragment!

Edit-Distance Calculation
[Figure]

New Idea: Binary Search Filtering

Insight: Use the hash table to search for the expected coordinate of each of the fragment's substrings.
Pros.
    + Avoids accessing the reference sequence.
    + Fewer memory accesses.

Individual DNA sequence: ACCCTTACACTAAAAA
Reference DNA sequence:  …CAGTACCCTTACACTAAAAAGTMTTCCAAACC…
(The fragment matches the reference at coordinate m; its 4-letter substrings start at m, m+4, m+8 and m+12.)

Hash table:
    Key     Coordinate 1    Coordinate 2    Coordinate 3
    AAAA    f               m+12            n+11
    ACCC    m               n               p
    ACTA    d               m+8             n+7
    TTAC    m+4             n+4             t
    ...
    TTTT    (not shown)

Load Imbalance of the Hash Table
[Figure: the highlighted keys have very large coordinate lists.]

New Idea: Prefiltering for Load Balancing

Insight: In binary search filtering, pick the cheap keys, i.e. the keys with small coordinate lists.
Pros.
    + Reduces the number of binary searches.
    + Balances the computation load of binary search.

Individual DNA sequence: AAAATTACACTAAAAA

    Key                 AAAA     TTAC
    # of same pattern   Large    Small
    # of coordinates    Large    Small
    # of computations   Large    Small

Balance the load of the binary search computation by selecting keys based on the size of their coordinate lists.
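The two ideas above (binary search filtering, and prefiltering by coordinate-list size) can be illustrated with a small CPU-side sketch in Python. It is not the authors' GPU implementation. The function names (build_index, contains, filter_candidates) and the min_hits threshold are hypothetical, introduced only for this illustration. The sketch also makes two simplifying assumptions: it searches for exact expected coordinates only (it does not handle indel-shifted entries such as n+7 in the table above), and it anchors on a single cheapest key, so a fragment with an error inside that key would be missed.

```python
from bisect import bisect_left
from collections import defaultdict

K = 4  # key length; the slides use 4-letter keys such as AAAA and ACCC


def build_index(reference, k=K):
    """Map every k-letter key of the reference to its sorted coordinate list."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        # Positions are visited in increasing order, so each list stays sorted.
        index[reference[pos:pos + k]].append(pos)
    return index


def contains(sorted_coords, target):
    """Binary search: is target present in the sorted coordinate list?"""
    i = bisect_left(sorted_coords, target)
    return i < len(sorted_coords) and sorted_coords[i] == target


def filter_candidates(fragment, index, k=K, min_hits=None):
    """Return candidate start coordinates that pass binary search filtering.

    min_hits (hypothetical parameter) is the number of keys that must be found
    at their expected coordinate; by default every key must be found.
    """
    # Split the fragment into non-overlapping keys at offsets 0, k, 2k, ...
    keys = [(off, fragment[off:off + k])
            for off in range(0, len(fragment) - k + 1, k)]
    if min_hits is None:
        min_hits = len(keys)

    # Prefiltering: order keys by coordinate-list size, cheapest first.
    keys.sort(key=lambda item: len(index.get(item[1], ())))
    anchors = [item for item in keys if index.get(item[1])]
    if not anchors:
        return []
    anchor_off, anchor_key = anchors[0]
    others = [item for item in keys if item[0] != anchor_off]

    passed = []
    for coord in index[anchor_key]:
        start = coord - anchor_off  # implied start of the whole fragment
        if start < 0:
            continue
        hits = 1  # the anchor key itself
        for off, key in others:
            # Look for the expected coordinate in the hash table only;
            # the reference sequence itself is never read here.
            if contains(index.get(key, ()), start + off):
                hits += 1
        if hits >= min_hits:
            passed.append(start)
    return passed


reference = "CAGTACCCTTACACTAAAAAGTATTCCAAACC"
index = build_index(reference)
print(filter_candidates("ACCCTTACACTAAAAA", index))  # prints [4]
```

Only the coordinates returned by filter_candidates would then go on to the expensive edit-distance calculation. Anchoring on the key with the shortest coordinate list keeps the number of binary searches per fragment small, which is the load-balancing effect the prefiltering slide describes.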
Effectiveness of Binary Search Filtering
[Figure] We want all dots to fall into the left box, as far left as possible!

Effectiveness of Binary Search Filtering
[Figure]

Future Work
    Implement on the GPU.
    Analyze the load imbalance problem.
        The number of coordinates that pass binary search filtering may vary.
    Solve the divergence problem.
        Edit-distance calculation may diverge.
        Divergence is bad for the GPU SIMT model.

Q&A
Thank you!