
Sequence Alignment Technology
Chengwei Lei
Fang Yuan
Saleh Tamim
Goal
• Save time
“PASS: a Program to Align Short Sequences
Davide Campagna et al. Bioinformatics (2009)”
• Save money
“Optimal pooling for genome re-sequencing with ultrahigh-throughput short-read technologies,
Iman Hajirasouliha et al. Bioinformatics (2008)”
Keywords in both papers
• Reference sequence:
A long Genomic sequence.
• Short reads:
Input short strings. e.g. ATGCGTAC
Save time – PASS program
• PASS, a new algorithm to align short DNA
sequences allowing gaps and mismatches.
• The performance of the program is very
striking both for sensitivity and speed. For
instance, gap alignment is achieved hundreds
of times faster than BLAST and several times
faster than SOAP, especially when gaps are
allowed.
PASS
• Program to Align Short Sequences
• Performs gapped and ungapped alignment
onto a reference sequence
• Seed words (11 and 12 bases)
• Short words (7 and 8 bases), scored against each
other in a precomputed score table (PST)
• PST: calculated with the Needleman and
Wunsch algorithm and supplied with PASS
• Handles data generated by Solexa, SOLiD
or 454 technologies
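The PST bullet above can be illustrated with a minimal sketch: score every pair of short DNA words with a Needleman and Wunsch global alignment and keep the results in a lookup table. The scoring values (MATCH, MISMATCH, GAP) and the function names are assumptions made for this example, not PASS's actual parameters or code, and the word length is kept at 2 so the table stays tiny.

```python
from itertools import product

# Hypothetical scoring values, chosen only for illustration.
MATCH, MISMATCH, GAP = 1, -1, -2

def nw_score(a: str, b: str) -> int:
    """Global alignment score of a and b (Needleman-Wunsch, linear gap cost)."""
    prev = [j * GAP for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * GAP]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr.append(max(diag, prev[j] + GAP, curr[j - 1] + GAP))
        prev = curr
    return prev[-1]

def build_score_table(k: int) -> dict:
    """Precompute the score of every pair of DNA words of length k."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    return {(u, v): nw_score(u, v) for u in words for v in words}

# Word length 2 gives 16 x 16 = 256 entries; 7 or 8 bases would be far larger.
table = build_score_table(2)
print(table[("AC", "AC")], table[("AC", "AG")])
```

Once the table exists, comparing two short words during alignment becomes a dictionary lookup instead of a fresh dynamic-programming run.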
Approach/Algorithm
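A rough sketch of the seed-word approach described on the PASS slide: index the positions of every seed word in the reference, look up the seeds of each read, and check the candidate positions. This is an assumption-laden simplification, not the PASS implementation: the function names are hypothetical, the extension step only counts mismatches (no gaps), and the seed length is 4 rather than 11 or 12 to keep the example small.

```python
from collections import defaultdict

def build_seed_index(reference: str, k: int) -> dict:
    """Map every k-base seed word to the reference positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def align_read(read, reference, index, k, max_mismatches):
    """Return sorted (start, mismatches) candidates found via seed hits.

    Ungapped check only; a real aligner would also allow gaps here.
    """
    hits = {}
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset  # where the whole read would begin
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(x != y for x, y in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())

reference = "TTACGATGCGTACCGTAGGCTA"   # toy reference sequence
index = build_seed_index(reference, 4)
print(align_read("ATGCGTAC", reference, index, 4, 1))  # read from the slides
print(align_read("ATGCGTCC", reference, index, 4, 1))  # same read, one base changed
```

The index is built once per reference, so the cost of each read depends on how many seed hits it produces rather than on the length of the genome.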
Analysis and Results
• Comparison of PASS with SOAP
• PASS has better sensitivity with seed words of 11 bases
and runs at least 10 times faster than SOAP
Save money - Optimal pooling method
• The paper studies a set of experiments using the Solexa
technology, based on bacterial artificial chromosome (BAC)
clones, and addresses an experimental design problem.
• Basic idea: pool more than one BAC per lane in order to
maximize the throughput of the Solexa technology and hence
minimize its cost.
Inputs
Input strings (short reads)
Reference sequences
Normal pooling method
• One other hurdle in designing a globally optimal experiment is
the rapid proliferation of the number of possible configurations. For
instance, if we would like to pool m=150 BACs into 15 groups of
size 10, we would need to consider every possible assignment of BACs to groups.
Infeasible to search all
these configurations
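To get a feel for the size of that search space, the count can be computed directly. This is a sketch under a stated assumption: it uses the standard formula for splitting m items into equal-sized groups, m! / (size!)^groups, divided by groups! when the pools are treated as interchangeable. Whether the original slide counted labeled or unlabeled pools is not recoverable from the transcript, so both are shown; the function name is hypothetical.

```python
from math import factorial

def pooling_configurations(m: int, groups: int, size: int, labeled: bool = False) -> int:
    """Number of ways to split m BACs into `groups` pools of `size` BACs each."""
    if groups * size != m:
        raise ValueError("groups * size must equal m")
    count = factorial(m) // factorial(size) ** groups  # pools distinguishable
    return count if labeled else count // factorial(groups)  # pools interchangeable

unlabeled = pooling_configurations(150, 15, 10)
labeled = pooling_configurations(150, 15, 10, labeled=True)
print(len(str(unlabeled)), "digits (unlabeled pools)")
print(len(str(labeled)), "digits (labeled pools)")
```

Even with interchangeable pools the count is on the order of 10^152, which is why exhaustive search is ruled out.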
Optimal Pooling method
Input strings (short reads)
Reference sequence
Optimal Pooling method
Input strings (short reads)
Reference sequences
Optimal Pooling method
Reference sequences
Input strings (short reads)
Pool
Problem
• How to separate the groups of short reads?
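One simple way to picture the separation problem, as a sketch only: if the reference sequence of each BAC in a pool is known, a read can be assigned to the BAC whose reference contains it. Exact substring matching stands in for a real aligner here, the BAC names and sequences are made up, and this is not presented as the method of the paper.

```python
def separate_reads(reads, bac_references):
    """Split pooled reads by the BAC reference they match.

    Returns (assigned, ambiguous, unmatched). Exact matching is used
    as a stand-in for a real short-read aligner.
    """
    assigned = {name: [] for name in bac_references}
    ambiguous, unmatched = [], []
    for read in reads:
        owners = [name for name, seq in bac_references.items() if read in seq]
        if len(owners) == 1:
            assigned[owners[0]].append(read)   # read is unique to one BAC
        elif owners:
            ambiguous.append(read)             # shared by several BACs in the pool
        else:
            unmatched.append(read)
    return assigned, ambiguous, unmatched

# Hypothetical toy pool of two BACs.
bacs = {"BAC1": "ACGTTGCAAGGCTT", "BAC2": "TTGACCGGTAAGGC"}
pool_reads = ["GTTGCA", "CCGGTA", "AAGGC", "GGGGGG"]
assigned, ambiguous, unmatched = separate_reads(pool_reads, bacs)
print(assigned, ambiguous, unmatched)
```

Reads that fall in sequence shared by two BACs of the same pool cannot be assigned to either one, which suggests why the choice of which BACs to pool together matters.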
Optimal Pooling method
Reference sequence
Input strings (short reads)
Pool
Two cases
Result
Conclusion
• Program to save time
“PASS: a Program to Align Short Sequences
Davide Campagna et al. Bioinformatics (2009)”
• Algorithm to save money
“Optimal pooling for genome re-sequencing with ultrahigh-throughput short-read technologies,
Iman Hajirasouliha et al. Bioinformatics (2008)”
Q&A