msa2015 12542

Ultra-large alignments using Ensembles of HMMs
Nam-phuong Nguyen
Institute for Genomic Biology
University of Illinois at Urbana-Champaign
UPP: Ultra-large alignment
• UPP: Ultra-large alignments using Phylogeny-aware Profiles
• Objective: Estimate accurate alignments on large datasets, which may be evolutionarily divergent and contain fragmentary sequences
• Nguyen N., Mirarab S., Kumar K., and Warnow T. RECOMB 2015.
UPP Algorithmic Strategy
RNASim: alignment error
1 Million RNASim: On the 1-million-sequence RNASim dataset, UPP(Fast) generated an alignment in 12 days, compared to 15 days for PASTA. UPP(Fast) produced a more accurate alignment (5.7% lower error), but PASTA produced a more accurate tree (1.5% lower error).
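As a quick check on the 1-million-sequence running times (12 days for UPP(Fast), 15 days for PASTA), the relative reduction in wall-clock time works out as follows. This is simple arithmetic on the two reported figures and makes no assumption about how the error percentages were computed:

```latex
\[
\frac{15\ \text{days} - 12\ \text{days}}{15\ \text{days}} = \frac{3}{15} = 20\%,
\qquad
\frac{15\ \text{days}}{12\ \text{days}} = 1.25
\]
```

UPP(Fast) therefore used about 20% less wall-clock time, or equivalently PASTA took 1.25 times as long.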
Note: All methods were given 24 hours on a 12-core machine. MAFFT fails to complete on 200K sequences, and Clustal Omega completes only on the 10K dataset.
Running Time
Wall-clock time (in hours) using 12 processors
Ensemble of HMMs
• Use a collection of HMMs instead of a single HMM to represent a backbone alignment
• Improves alignment accuracy, which can lead to better downstream analyses
– Phylogenetic placement (SEPP; PSB 2012)
– Taxonomic identification (TIPP; Bioinformatics 2014)
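The Python below is a minimal sketch of the "collection of HMMs instead of a single HMM" idea. It rests on simplifying assumptions and is not UPP's implementation. Each ensemble member is reduced to an ungapped position-specific log-odds profile with no insert or delete states, so it stands in for a real profile HMM. Each profile is built from a hypothetical subset of backbone sequences. A query fragment is then scored against every member and assigned to the highest-scoring one. The function names (`build_profile`, `score_fragment`, `choose_profile`), the toy RNA sequences, the uniform background, and the pseudocount of 1 are all illustrative assumptions.

```python
import math
from collections import Counter

ALPHABET = "ACGU"  # assumption: RNA sequences, as in the RNASim data


def build_profile(subset, pseudocount=1.0):
    """Build an ungapped log-odds profile (in bits) from equal-length sequences.

    Simplified stand-in for a profile HMM: one match column per alignment
    column, no insert or delete states, uniform background frequencies.
    """
    background = 1.0 / len(ALPHABET)
    total = len(subset) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(len(subset[0])):
        counts = Counter(seq[col] for seq in subset)
        # Smoothed column frequency divided by background, on a log2 scale.
        profile.append({
            c: math.log2((counts[c] + pseudocount) / total / background)
            for c in ALPHABET
        })
    return profile


def score_fragment(profile, fragment):
    """Best ungapped score of a fragment over all start columns of a profile."""
    if len(fragment) > len(profile):
        raise ValueError("fragment is longer than the profile")
    best = -math.inf
    for start in range(len(profile) - len(fragment) + 1):
        score = sum(profile[start + i][c] for i, c in enumerate(fragment))
        best = max(best, score)
    return best


def choose_profile(ensemble, fragment):
    """Return (index, score) of the ensemble member that scores the fragment highest."""
    scores = [score_fragment(p, fragment) for p in ensemble]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical backbone split into two subsets of aligned, gap-free sequences.
    subset_a = ["ACGUACGU", "ACGUACGA", "ACGAACGU"]
    subset_b = ["UUGCAUGC", "UUGCAUGG", "UAGCAUGC"]
    ensemble = [build_profile(subset_a), build_profile(subset_b)]

    for fragment in ["GUACG", "GCAUG"]:
        index, score = choose_profile(ensemble, fragment)
        print(f"{fragment}: profile {index}, {score:.2f} bits")
```

In this toy run, the fragment "GUACG" is assigned to profile 0 and "GCAUG" to profile 1. Each profile is built from a smaller, more homogeneous subset, so a fragment can match its closest subset sharply instead of being scored against a profile averaged over the whole backbone. A real implementation would use full profile HMMs, which also model insertions and deletions.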