Transcript Slides
CS 6293 AT: Current Bioinformatics
HW2 Papers
1. BLAT--The BLAST-Like Alignment Tool
2. Classification of DNA sequences using Bloom Filters
Course Instructor: Dr. Jianhua Ruan
Presenters: Husnu Narman, Nihat Altiparmak

BLAT--The BLAST-Like Alignment Tool
W. James Kent (2002), UCSC
Cited by 2229 (Google Scholar)

Brief Information About BLAST
• BLAST: Basic Local Alignment Search Tool
• Finds a gene in different kinds of databases
• Main steps:
  – Divide the query into short words and compare
  – Scan the database for exact matches
  – Extend exact matches to High-Scoring Segment Pairs (HSPs)
  – List all of the HSPs in the database
  – Evaluate, handle exceptions, and report

BLAT
• BLAT: The BLAST-Like Alignment Tool
• Finds a gene in different kinds of databases
• Why a new search tool?

Differences between BLAST and BLAT
BLAST
• Builds an index of the query
• Triggers an extension when one or two hits occur
• Returns a list of exons sorted by size
BLAT
• Builds an index of the database
• Triggers extensions on any number of perfect or near-perfect hits
• Looks up the location of a sequence in the genome or determines the exon structure of an mRNA

Classification of DNA sequences using Bloom Filters
Stranneheim et al. (2010), Stockholm, Sweden

Classification of DNA sequences using Bloom Filters
• New-generation sequencing technologies
  – Complex datasets
  – Need for new, efficient, specialized sequence analysis algorithms
• Often only novel sequences are required; unnecessary sequences (those belonging to a known genome) need to be removed
• A new algorithm (FACS) classifies sequences as belonging or not belonging to a reference sequence
• Source code available at:
  – http://facs.biotech.kth.se

Bloom Filter
• A memory-efficient data structure for testing whether an element is part of a reference set
• An m-bit vector with k hash functions
• Never returns a false negative; may, however, return a false positive
• Optimal number of hash functions, where n is the number of elements stored: $k = \frac{m}{n} \ln 2$

Example Bloom Filter
[Figure: example Bloom filter with elements x, y, z and w; m = 18, k = 3]

Method
• A Bloom filter is created from the reference sequence with the desired k-mer size and false positive rate.
• The query sequences are then classified using the Bloom filter.

Evaluation
• Experimental metagenome dataset (Allander et al. 2005) containing 177184 reads
• Analysis using the human genome as a reference
• FACS, BLAT and SSAHA2 compared
[Figure annotations: 21x, 31x]

Evaluation
[Figure: bar charts titled "False Positive Rate (Missed)" and "False Positive Rate"; y-axis: Percentage (%), 0 to 0.06; x-axis: FACS, BLAT, SSAHA2]

Any Questions?
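
Supplementary sketch for the Bloom Filter and Method slides: the Python below is a minimal illustration of an m-bit Bloom filter with k hash functions, built from the k-mers of a reference sequence and then used to score a query read. It is not the FACS implementation. The double-hashing scheme over SHA-256, the sizing formula m = -n ln p / (ln 2)^2 (the standard companion to the k formula on the slide), and all function names (`optimal_params`, `build_filter`, `match_fraction`) are assumptions made for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: an m-bit vector probed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Assumption: k positions are derived by double hashing two parts of
        # one SHA-256 digest. FACS may use different hash functions.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


def optimal_params(n: int, fp_rate: float) -> tuple[int, int]:
    """Return (m, k) for n elements and a target false positive rate."""
    m = math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))  # k = (m / n) ln 2
    return m, k


def kmers(seq: str, size: int) -> list[str]:
    """All overlapping substrings of length `size`."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def build_filter(reference: str, kmer_size: int, fp_rate: float) -> BloomFilter:
    """Create a Bloom filter holding every distinct k-mer of the reference."""
    words = set(kmers(reference, kmer_size))
    if not words:
        raise ValueError("reference is shorter than the k-mer size")
    m, k = optimal_params(len(words), fp_rate)
    bloom = BloomFilter(m, k)
    for word in words:
        bloom.add(word)
    return bloom


def match_fraction(read: str, bloom: BloomFilter, kmer_size: int) -> float:
    """Fraction of the read's k-mers that the filter reports as present."""
    words = kmers(read, kmer_size)
    if not words:
        return 0.0
    return sum(word in bloom for word in words) / len(words)


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGCTAGGCTAACGTTTGACCA"
    bloom = build_filter(reference, kmer_size=8, fp_rate=0.01)
    # A read cut from the reference always scores 1.0 (no false negatives).
    print(match_fraction(reference[5:25], bloom, 8))
    # An unrelated read usually scores near 0, subject to false positives.
    print(match_fraction("GGGGGGGGCCCCCCCCAAAA", bloom, 8))
```

A read would then be classified as belonging to the reference when its match fraction exceeds a chosen cutoff; the cutoff value is a tuning parameter and is not specified in these slides.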