Multiple Sequence Alignment

Urmila Kulkarni-Kale
Bioinformatics Centre
University of Pune
October 2005
Approaches: MSA
• Dynamic programming
• Progressive alignment: ClustalW
• Genetic algorithms: SAGA
Progressive alignment approach
• Align the most related sequences
• Add less related sequences to the initial alignment
• Perform pairwise alignments of all sequences
• Use alignment scores to produce a phylogenetic tree
• Align sequences sequentially, guided by the tree
• Gaps are added to an existing profile in progressive methods
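A minimal sketch of the "most related first, less related later" idea, assuming the pairwise distances are already known. The function name `addition_order`, the labels A to D, and the distance values are all hypothetical. It is a simplification: it builds a single chain of additions, whereas a tree-guided method follows the branching order of the guide tree.

```python
def addition_order(names, dist):
    """Order in which sequences would join a progressive alignment.

    names: list of sequence labels
    dist:  dist[a][b] is the pairwise distance between a and b (symmetric)
    """
    # Seed the alignment with the closest (most related) pair.
    pairs = [(dist[a][b], a, b)
             for i, a in enumerate(names)
             for b in names[i + 1:]]
    _, first, second = min(pairs)
    order = [first, second]
    remaining = [n for n in names if n not in order]

    # Repeatedly add the sequence with the smallest mean distance
    # to everything already aligned.
    while remaining:
        nxt = min(remaining,
                  key=lambda n: sum(dist[n][m] for m in order) / len(order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical distances for four sequences labelled A to D.
names = ["A", "B", "C", "D"]
raw = {("A", "B"): 0.10, ("A", "C"): 0.40, ("A", "D"): 0.70,
       ("B", "C"): 0.35, ("B", "D"): 0.75, ("C", "D"): 0.60}
dist = {n: {} for n in names}
for (a, b), d in raw.items():
    dist[a][b] = d
    dist[b][a] = d

print(addition_order(names, dist))
```

With these made-up distances, A and B seed the alignment, and C joins before the more distant D.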
Number of pairwise alignments for N sequences: N*(N-1)/2
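The count N*(N-1)/2 is the number of unordered pairs among N sequences. A short check, using six placeholder labels (the names carry no sequence data and are there only to enumerate pairs):

```python
from itertools import combinations


def count_pairwise_alignments(n: int) -> int:
    """Number of unordered sequence pairs among n sequences: n*(n-1)/2."""
    return n * (n - 1) // 2


# Placeholder labels, used only to enumerate the pairs.
names = ["human", "chimpanzee", "goat", "pig", "horse", "mouse"]
pairs = list(combinations(names, 2))

print(len(pairs), count_pairwise_alignments(len(names)))
```

Because the count grows quadratically, 100 sequences already require 4950 pairwise alignments.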
Steps in ClustalW Algorithm
• Pairwise alignment: calculate the distance matrix
• Unrooted neighbor-joining (NJ) tree
• Rooted NJ tree
• Sequence weights
• Progressive alignment using the guide tree
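The first two stages (distance matrix, then guide tree) can be sketched as follows. This is an illustration under stated assumptions, not ClustalW's implementation: the toy sequences are invented and already of equal length, so the distance is simply 1 minus fractional identity with no pairwise alignment step, and the tree is built by average-linkage (UPGMA-style) clustering as a simpler stand-in for neighbor joining. All function names are hypothetical.

```python
from itertools import combinations


def identity_distance(s1, s2):
    """1 - fraction of identical positions (equal-length sequences only)."""
    if len(s1) != len(s2):
        raise ValueError("this sketch assumes equal-length sequences")
    matches = sum(a == b for a, b in zip(s1, s2))
    return 1.0 - matches / len(s1)


def distance_matrix(seqs):
    """seqs: dict name -> sequence. Returns {frozenset({a, b}): distance}."""
    return {frozenset((a, b)): identity_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}


def guide_tree(seqs):
    """Average-linkage clustering; returns the tree as nested tuples."""
    d = distance_matrix(seqs)
    # Each cluster is (subtree, list of leaf names in that subtree).
    clusters = [(name, [name]) for name in seqs]

    def mean_distance(c1, c2):
        total = sum(d[frozenset((x, y))] for x in c1[1] for y in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        # Merge the two closest clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: mean_distance(clusters[p[0]], clusters[p[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Invented toy sequences, not real protein data.
seqs = {
    "seqA": "VLSPADKTNV",
    "seqB": "VLSPADKTNV",
    "seqC": "VLSAADKTNV",
    "seqD": "VLSGEDKSNI",
}
print(guide_tree(seqs))
```

The nested tuples give the order of alignment: the innermost pair is aligned first, and each enclosing level adds a more distant sequence or group.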
ClustalW: weight
• Groups of related sequences receive lower weights
• Highly divergent sequences without any close relatives receive high weights
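One way to obtain this behaviour is to derive each weight from the rooted tree: sum the branch lengths from root to leaf, dividing each branch equally among the sequences that share it. The sketch below follows that idea. The tree shape, branch lengths, and function names are hypothetical, and no normalisation of the weights is attempted.

```python
def count_leaves(node):
    """node is (branch_length, name) for a leaf,
    or (branch_length, [child, ...]) for an internal node."""
    _, content = node
    if isinstance(content, str):
        return 1
    return sum(count_leaves(child) for child in content)


def sequence_weights(node, inherited=0.0, weights=None):
    """Each branch contributes length / (number of leaves below it)."""
    if weights is None:
        weights = {}
    length, content = node
    total = inherited + length / count_leaves(node)
    if isinstance(content, str):
        weights[content] = total
    else:
        for child in content:
            sequence_weights(child, total, weights)
    return weights


# Hypothetical rooted tree: A and B are close relatives, C is divergent.
tree = (0.0, [
    (0.2, [(0.1, "A"), (0.1, "B")]),
    (0.5, "C"),
])
print(sequence_weights(tree))
```

A and B split their shared 0.2 branch, so each ends up with a lower weight than C, which keeps its whole 0.5 branch to itself.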
ClustalW: affine gap penalty
• GOP: Gap Opening Penalty
• GEP: Gap Extension Penalty
Heuristics in calculating gap penalty
• Position-specific penalty
– gap at position?
• yes → lower GOP and GEP
• no, but gap within 8 residues → increase GOP
– stretch of hydrophilic residues?
• yes → lower GOP
• no → use residue-specific gap propensities
Once a gap, always a gap
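A rough sketch of an affine gap cost and of the decision list above. Several things here are assumptions: the convention cost = GOP + GEP × length (some texts charge extension for length − 1), the example values 10.0 and 0.2, and above all the scaling factors `LOWER` and `RAISE`, which are placeholders and not ClustalW's actual formulas. The function names are hypothetical, and the rule order is one reading of the list.

```python
LOWER = 0.5   # placeholder factor for "lower the penalty"
RAISE = 1.5   # placeholder factor for "increase the penalty"


def affine_gap_cost(length, gop=10.0, gep=0.2):
    """Cost of one gap: opening charged once, extension per gapped residue."""
    if length <= 0:
        return 0.0
    return gop + gep * length


def local_penalties(gop, gep, has_gap, gap_within_8,
                    hydrophilic_stretch, residue_factor=1.0):
    """Return (GOP, GEP) for one alignment position, applying the rules in order."""
    if has_gap:                    # gap already at this position
        return gop * LOWER, gep * LOWER
    if gap_within_8:               # no gap here, but one nearby
        return gop * RAISE, gep
    if hydrophilic_stretch:        # likely loop region
        return gop * LOWER, gep
    # Otherwise scale by a residue-specific gap propensity (hypothetical value).
    return gop * residue_factor, gep


# One long gap is far cheaper than five separate single-residue gaps.
print(affine_gap_cost(5), 5 * affine_gap_cost(1))
print(local_penalties(10.0, 0.2, has_gap=False, gap_within_8=True,
                      hydrophilic_stretch=False))
```

The affine form is what makes a single long gap preferable to many scattered short ones, since the opening penalty is paid only once per gap.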
Variation in local GOP
[Figure labels: initial GOP; lowest GOP in hydrophilic regions; highest GOP in ‘gapped regions’]
MSA: helps detect similarity
Hemoglobin: human, chimpanzee, goat, pig, horse & mouse
Sample MSA
Applications of MSA
• Detecting diagnostic patterns
• Phylogenetic analysis
• Primer design
• Prediction of protein secondary structure
• Finding novel relationships between genes
• Similar genes conserved across organisms
– Same or similar function
• Simultaneous alignment of similar genes yields:
– regions subject to mutation
– regions of conservation
– mutations or rearrangements causing change in conformation or function
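To make "regions of conservation" versus "regions subject to mutation" concrete, the sketch below scores each column of an alignment by the frequency of its most common character. The alignment rows are invented, the function name is hypothetical, and the gap character `-` is counted like any residue, which is a simplifying assumption.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of rows sharing the most common character, per column."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    scores = []
    for column in zip(*alignment):
        _, count = Counter(column).most_common(1)[0]
        scores.append(count / len(column))
    return scores


# Invented toy alignment; '-' marks a gap.
alignment = [
    "MKT-LV",
    "MKTALV",
    "MRT-LI",
]
scores = column_conservation(alignment)
conserved = [i for i, s in enumerate(scores) if s == 1.0]
variable = [i for i, s in enumerate(scores) if s < 1.0]
print(conserved, variable)
```

Fully conserved columns are candidates for diagnostic patterns or primer sites, while variable columns mark positions that tolerate substitution or insertion.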
Limitations of progressive alignment approach
• Greedy nature
• Any errors in the initial alignment are carried through
• More efficient for closely related sequences than for divergent sequences