
A Fast Hybrid Short Read Fragment Assembly Algorithm
Introduction
• Second-generation DNA sequencing technologies
• Traditional: Sanger shotgun techniques
• New techniques (2007 & 2008):
• SSAKE, VCAKE and SHARCGS
--based on greedy extension
• Edena, Velvet, Euler-SR
--based on graphs
Taipan Method: Two steps
• 1. Greedy Extension
• contigs are iteratively extended by one base at a time in both the 3’ and 5’ directions
• 2. Graph-based Method
• used to assemble the contigs constructed in the previous step.
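The greedy extension step can be illustrated with a minimal sketch. This is not Taipan's actual implementation: the function names, the unanimous-vote rule, and the `max_len` guard are assumptions made for illustration, and reverse-complement reads are ignored.

```python
from collections import Counter


def extend_right(contig, reads, k, max_len=10_000):
    """Extend `contig` in the 3' direction, one base at a time.

    A base is appended only if every read that overlaps the last k bases
    of the contig agrees on the next base (illustrative rule).
    """
    if len(contig) < k:
        raise ValueError("seed must be at least k bases long")
    seen = {contig[-k:]}  # k-mers already used as suffix, to stop on repeats
    while len(contig) < max_len:
        suffix = contig[-k:]
        votes = Counter()
        for read in reads:
            start = read.find(suffix)
            while start != -1:
                nxt = start + k
                if nxt < len(read):
                    votes[read[nxt]] += 1
                start = read.find(suffix, start + 1)
        if len(votes) != 1:
            break  # no overlapping read, or the reads disagree
        contig += next(iter(votes))
        if contig[-k:] in seen:
            break  # entered a repeat; stop instead of looping
        seen.add(contig[-k:])
    return contig


def greedy_extend(seed, reads, k):
    """Extend a seed in the 3' direction, then in the 5' direction."""
    right = extend_right(seed, reads, k)
    # 5' extension is done by reversing everything and extending right again.
    reversed_reads = [r[::-1] for r in reads]
    return extend_right(right[::-1], reversed_reads, k)[::-1]


genome = "ATGGCGTACCTGA"
reads = [genome[i:i + 7] for i in range(7)]  # overlapping 7-base reads
contig = greedy_extend("CGTA", reads, k=4)
```

Here `k` plays the role of a minimal overlap; extension stops as soon as the overlapping reads are missing or ambiguous, which is where a graph-based step would take over.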
Example
• Usage:
taipan -f {inputfilename} -k {minimal_overlap} [-t {threshold}] [-o {seed_occ}] [-v {verbose}] [-c {min_contig_length}]
• Result:
Optimal spliced alignments of short sequence reads
Fabio De Bona
Bioinformatics, 2008
Genome VS Transcriptome
• Analysis of sequence reads from genomic DNA
Sequence assembly
Align the reads to the genome
• Transcriptome analysis
First align the single reads to the genome
Then merge the alignments to infer gene structures.
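The "align, then merge" idea can be sketched as follows. This is a deliberate simplification and not the method from the paper: each alignment is assumed to be an unspliced, half-open `(start, end)` interval on the genome, and overlapping intervals are merged into exon-like blocks. The function name and the coordinates are hypothetical.

```python
def merge_alignments(intervals):
    """Merge overlapping or touching (start, end) genomic intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous block: extend that block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical read alignments on one strand of one chromosome.
alignments = [(100, 136), (120, 156), (400, 436), (150, 186)]
blocks = merge_alignments(alignments)
```

The gaps between consecutive blocks are candidate introns; confirming them requires reads that span the junction, which is what spliced alignment addresses.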
Genome VS Transcriptome
• Reconstruct the whole genome from cDNA data
• Reconstruct the transcriptome from EST data (transcribed cDNA)
Problem Formulation
Limitations:
1. The read length of next-generation (NG) sequencing is relatively short.
2. Read error rate (assumed to be 5%).
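To see why a 5% error rate matters even for short reads, here is a small worked calculation. It assumes errors are independent per base, and the read length of 36 is a hypothetical value chosen for illustration; the slides do not state one.

```python
def expected_errors(read_length, error_rate):
    """Expected number of erroneous bases in one read."""
    return read_length * error_rate


def prob_error_free(read_length, error_rate):
    """Probability that a read contains no errors at all."""
    return (1.0 - error_rate) ** read_length


READ_LENGTH = 36   # hypothetical read length, not from the slides
ERROR_RATE = 0.05  # the 5% rate assumed on the slide

mean_errors = expected_errors(READ_LENGTH, ERROR_RATE)
p_clean = prob_error_free(READ_LENGTH, ERROR_RATE)
```

Under these assumptions most reads carry at least one error, so an aligner that only accepts exact matches would discard the majority of the data.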
General Description
Smith-Waterman
– Quality Score
– Splice Site Info
– Intron Length
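For reference, the plain Smith-Waterman recurrence that these extensions build on can be sketched as below. This is only the textbook local-alignment score with a linear gap penalty; the scoring values are illustrative assumptions, and none of the quality, splice site, or intron length terms from the slides are modelled.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + sub,  # match or mismatch
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            best = max(best, H[i][j])
    return best


score = smith_waterman("TTACGTTT", "ACGT")
```

An extension along the lines listed above would replace the fixed `match`/`mismatch` values with quality-dependent scores and add a transition that skips an intron between predicted splice sites.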
Method
1. Original
2. With Quality Score
3. With Splice Site Info
4. With Intron Length
Test Data
• 10 000 sequences with known alignments
• three different scorings
1. quality information
2. splice site predictions
3. intron length