CS 6293 AT: Current Bioinformatics
HW2
Papers
1. BLAT--The BLAST-Like Alignment Tool
2. Classification of DNA sequences using Bloom Filters
Course Instructor
Dr. Jianhua Ruan
Presenters
Husnu Narman
Nihat Altiparmak
BLAT--The BLAST-Like Alignment Tool
W. James Kent (2002)
UCSC
Cited by 2229 (Google Scholar)
Brief Information About BLAST
• BLAST: Basic Local Alignment Search Tool
• Find a gene in different kinds of databases
• BLAST workflow (flowchart), built around High-Scoring Segment Pairs (HSPs):
– Divide the query into small words and compare
– Scan for exact matches
– Extend exact matches to HSPs
– List all of the HSPs in the database
– Evaluate, handle exceptions, and report
BLAT
• BLAT: The BLAST-Like Alignment Tool
• Find a gene in different kinds of databases
• Why a new search tool?
Differences between BLAST and BLAT
BLAST
• Builds an index of the query
• Triggers an extension when one or two hits occur
• Returns a list of exons sorted by size
BLAT
• Builds an index of the database
• Triggers extensions on any number of perfect or near-perfect hits
• Looks up the location of a sequence in a genome, or determines the exon structure of an mRNA
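The "index of the database" idea can be sketched as follows. This is an illustrative simplification rather than BLAT's implementation: the names `build_index` and `lookup_hits` are hypothetical, `k=4` is arbitrary, and the sketch assumes the database is indexed by non-overlapping k-mers while every overlapping k-mer of the query is looked up.

```python
from collections import defaultdict


def build_index(genome, k=4):
    """Index non-overlapping k-mers of the database sequence by start position."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def lookup_hits(index, query, k=4):
    """Scan every overlapping k-mer of the query against the prebuilt index."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get avoids creating empty entries in the defaultdict.
        for gpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, gpos))
    return hits


if __name__ == "__main__":
    idx = build_index("AAAACCCCGGGGTTTT")
    print(lookup_hits(idx, "ACCCCG"))
```

The index is built once and reused for every query, which is the point of indexing the database instead of the query.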
Classification of DNA sequences using Bloom Filters
Stranneheim et al. (2010)
Stockholm, Sweden
Classification of DNA sequences using Bloom Filters
• New generation sequencing technologies
– Complex datasets
– New efficient, specialized sequence analysis algorithms
• Often only novel sequences are required; unnecessary sequences (those belonging to a known genome) need to be removed
• A new algorithm (FACS) classifies sequences as belonging or not belonging to a reference sequence
• Source code available at:
– http://facs.biotech.kth.se
Bloom Filter
• A memory efficient data structure for testing
whether an element is part of a reference set
• An m-bit vector with k hash functions
• Never returns a false negative, but may return a false positive
• Optimal number of hash functions: $k = \frac{m}{n} \ln 2$
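A short numeric sketch of the sizing formula may help. It uses the standard textbook approximations, which are not taken from the slides: n is assumed to be the number of elements inserted, the false positive rate is approximated as (1 - e^(-kn/m))^k, and the bits needed for a target rate p as m = -n ln p / (ln 2)^2. The function names are hypothetical.

```python
import math


def optimal_k(m, n):
    """Number of hash functions minimising false positives: (m / n) * ln 2, rounded."""
    return max(1, round(m / n * math.log(2)))


def false_positive_rate(m, n, k):
    """Textbook approximation of the false positive probability."""
    return (1 - math.exp(-k * n / m)) ** k


def bits_needed(n, p):
    """Bits required to hold n elements at target false positive rate p."""
    return math.ceil(-n * math.log(p) / math.log(2) ** 2)


if __name__ == "__main__":
    m, n = 18, 3
    print("optimal k:", optimal_k(m, n))
    print("rate with k = 3:", false_positive_rate(m, n, 3))
    print("bits for n = 1000, p = 0.01:", bits_needed(1000, 0.01))
```

For an 18-bit filter holding three elements, the formula suggests about four hash functions; using three gives an approximate false positive rate of about 6%.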
Example Bloom Filter
[Figure: a Bloom filter bit array with 𝑚 = 18, 𝑘 = 3. The elements x, y and z are inserted, then w is queried; its bit positions are marked √ or x.]
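The example can be reproduced in spirit with a minimal Bloom filter using the same m = 18 and k = 3. The class below is a sketch under stated assumptions: the hashing scheme (k salted SHA-256 digests reduced modulo m) is a hypothetical choice for illustration, so the bit positions will not match the figure.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit vector probed by k hash functions."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, item):
        # Assumption: k hash functions simulated by salting SHA-256 with a seed.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True only if every probed bit is set; may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(m=18, k=3)
    for element in ("x", "y", "z"):
        bf.add(element)
    print(bf.bits)
    print("x" in bf, "w" in bf)
```

Inserted elements are always reported as present. A query such as w is usually rejected because at least one of its bits is unset, but with so few bits a false positive is possible.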
Method
• A Bloom filter is created from the reference sequence with the desired k-mer size and false positive rate.
• The query sequences are then classified using the Bloom filter.
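The two steps above can be sketched as follows. This is not the FACS implementation: the slides do not say how a read's k-mer matches are turned into a decision, so the `cutoff` on the fraction of matching k-mers is an assumption, as are the function names, the salted SHA-256 hashing, and the filter size used in the example.

```python
import hashlib


def positions(kmer, m, k_hashes):
    """Bit positions for a k-mer (assumed scheme: salted SHA-256 mod m)."""
    for seed in range(k_hashes):
        digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % m


def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_filter(reference, k, m, k_hashes):
    """Step 1: insert every k-mer of the reference into an m-bit filter."""
    bits = bytearray(m)
    for kmer in kmers(reference, k):
        for pos in positions(kmer, m, k_hashes):
            bits[pos] = 1
    return bits


def match_fraction(read, bits, k, k_hashes):
    """Fraction of the read's k-mers that the filter reports as present."""
    total = hits = 0
    for kmer in kmers(read, k):
        total += 1
        hits += all(bits[pos] for pos in positions(kmer, len(bits), k_hashes))
    return hits / total if total else 0.0


def classify(read, bits, k, k_hashes, cutoff=0.8):
    """Step 2: call a read 'belonging' if enough k-mers match (assumed rule)."""
    return match_fraction(read, bits, k, k_hashes) >= cutoff


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTTAACGGATC"
    bits = build_filter(reference, k=8, m=4096, k_hashes=3)
    print(classify(reference[5:25], bits, k=8, k_hashes=3))
    print(match_fraction("TTTTTTTTTTTTTTTTTTTT", bits, k=8, k_hashes=3))
```

Because a Bloom filter never returns a false negative, a read taken directly from the reference always matches on all of its k-mers; reads from elsewhere match only through false positives.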
Evaluation
• Experimental metagenome dataset (Allander et al. 2005) containing 177,184 reads
• Analysis using the human genome as a reference
• FACS, BLAT and SSAHA2 compared
[Figure: comparison of FACS, BLAT and SSAHA2; surviving labels: 21x, 31x]
Evaluation
[Figure: bar chart titled "False Positive Rate (Missed)"; series: False Positive Rate; y-axis: Percentage (%), 0 to 0.06; bars for FACS, BLAT and SSAHA2]
Any Questions?