Sequence analysis course


Introduction to bioinformatics, lecture 9: Multiple sequence alignment (II)

Scoring a profile position

[Figure: Profile 1 and Profile 2, each with one row per amino acid (A, C, D, ..., Y)]

At each position (column) we have different residue frequencies for each amino acid (rows).

So, instead of saying $S = M(aa_1, aa_2)$ (one residue pair), for each frequency f > 0 (the amino acid is actually there) we take:

$$S = \sum_{i=1}^{20} \sum_{j=1}^{20} f_{aa_i} \, f_{aa_j} \, M(aa_i, aa_j)$$

Progressive alignment

1. Perform pair-wise alignments of all of the sequences;
2. Use the alignment scores to produce a dendrogram using neighbour-joining methods (guide tree);
3. Align the sequences sequentially, guided by the relationships indicated by the tree.

• Biopat (first method ever)
• MULTAL (Taylor 1987)
• DIALIGN (1&2, Morgenstern 1996)
• PRRP (Gotoh 1996)
• ClustalW (Thompson et al 1994)
• PRALINE (Heringa 1999)
• T Coffee (Notredame 2000)
• POA (Lee 2002)
• MUSCLE (Edgar 2004)
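The first two progressive alignment steps (pairwise scores, then a guide tree) can be sketched as follows. This is an illustrative sketch only: for brevity it uses average-linkage (UPGMA-style) clustering in place of neighbour-joining, converts scores to distances by subtracting from the highest score, and uses made-up scores. The function name `guide_tree` and the sequence names are hypothetical. Step 3, aligning along the tree, is not implemented here.

```python
from itertools import combinations


def guide_tree(names, sim):
    """Build a guide tree (nested tuples) by average-linkage clustering.

    names: list of sequence names.
    sim: dict mapping (name, name) -> pairwise alignment score, one entry per pair.
    """
    top = max(sim.values())
    # Scores to distances: the higher the score, the smaller the distance.
    dist = {frozenset(pair): top - score for pair, score in sim.items()}

    def avg(ci, cj):
        # Average distance between all leaves of two clusters.
        pairs = [dist[frozenset((x, y))] for x in ci[1] for y in cj[1]]
        return sum(pairs) / len(pairs)

    # Each cluster is (subtree, list of leaf names).
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Made-up pairwise scores for four sequences.
sim = {("s1", "s2"): 90, ("s1", "s3"): 40, ("s1", "s4"): 30,
       ("s2", "s3"): 45, ("s2", "s4"): 35, ("s3", "s4"): 80}
tree = guide_tree(["s1", "s2", "s3", "s4"], sim)
```

The nested tuples give the order in which a progressive method would join sequences and sub-alignments: the most similar pair first, the most distant groups last.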

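Progressive multiple alignment

[Figure: pairwise alignment scores (Score 1-2, Score 1-3, ..., Score 4-5) fill a 5×5 similarity matrix; the scores are converted to distances, which give the guide tree; the guide tree leads to the multiple alignment, with iteration possibilities.]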

General progressive multiple alignment technique (follow generated tree)

[Figure: rooted guide tree (branch length d) over sequences 1, 3, 2 and 5, followed from the leaves towards the root.]

PRALINE progressive strategy

[Figure: the alignment grows step by step: 1 3, then 1 3 2, then 1 3 2 5, then 1 3 2 5 4.]

There are problems …

Accuracy is very important!

• Alignment errors made during the construction of the MSA cannot be repaired afterwards: they are propagated into the later progressive steps.
• The comparisons of sequences at early steps of a progressive alignment cannot make use of information from other sequences.
• Only later in the alignment progression is more information from other sequences (e.g. through profile representation) employed in the alignment steps.

“Once a gap, always a gap” (Feng & Doolittle, 1987)

Additional strategies for multiple sequence alignment

• Profile pre-processing
• Secondary structure-induced alignment
• Globalised local alignment
• Matrix extension

Objective: try to avoid (early) errors

Profile pre-processing

[Figure: for each key sequence, the pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5) lead to a pre-alignment, built as a master-slave (N-to-1) alignment, which is then converted into a pre-profile with one row per amino acid (A, C, D, ..., Y).]

Pre-profile generation

[Figure: the pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5) and a cut-off determine the pre-alignments for sequences 1 to 5; each pre-alignment is turned into a pre-profile with one row per amino acid (A, C, D, ..., Y).]

Pre-profile alignment

[Figure: the pre-profiles of sequences 1 to 5 are aligned progressively to give the final alignment.]

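Pre-profile alignment: alignment consistency

[Figure: alignment consistency for residue Ala131 of sequence 1 across pre-profiles 1 to 5; the matched residues shown are A131, A131, L133, C126 and A131.]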

PRALINE pre-profile generation

• Idea: use the information from all query sequences to make a pre-profile for each query sequence, so that each pre-profile contains information from the other sequences.
• You can use all sequences in each pre-profile, or only those sequences that will probably align ‘correctly’. Incorrectly aligned sequences in the pre-profiles increase the noise level.
• Select using the alignment score: only allow a sequence into a pre-profile if its alignment with the key sequence scores higher than a given threshold value. In PRALINE, this threshold is given as prepro=1500 (the alignment score threshold value is 1500; see the next two slides).
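The threshold-based selection described above can be sketched as follows. The function name `select_for_preprofiles`, the sequence names and all scores are made up for illustration; only the `prepro` threshold idea comes from the slides, and this is not PRALINE's actual code.

```python
def select_for_preprofiles(names, pair_scores, prepro=1500):
    """For each key sequence, list the other sequences allowed into its pre-profile.

    names: list of sequence names.
    pair_scores: dict mapping frozenset({name, name}) -> pairwise alignment score.
    prepro: score threshold; a sequence is kept only if its score is higher.
    """
    selected = {}
    for key in names:
        selected[key] = [
            other for other in names
            if other != key and pair_scores[frozenset((key, other))] > prepro
        ]
    return selected


# Hypothetical pairwise alignment scores.
names = ["seqA", "seqB", "seqC"]
pair_scores = {
    frozenset(("seqA", "seqB")): 2400,
    frozenset(("seqA", "seqC")): 300,
    frozenset(("seqB", "seqC")): 1600,
}
strict = select_for_preprofiles(names, pair_scores, prepro=1500)
loose = select_for_preprofiles(names, pair_scores, prepro=0)
```

With `prepro=0` every positively scoring sequence enters every pre-profile; with `prepro=1500` the low-scoring pair seqA/seqC is kept out of each other's pre-profiles, which is the noise-reduction effect the slide describes.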

Flavodoxin-cheY consistency scores

(PRALINE prepro=0)

[Figure: PRALINE alignment of flavodoxin sequences and cheY, shown in two blocks with per-position consistency scores. Sequences: 1fx1, FLAV_DESVH, FLAV_DESDE, FLAV_DESGI, FLAV_DESSA, 2fcr, 4fxn, FLAV_MEGEL, FLAV_ANASP, FLAV_ECOLI, FLAV_AZOVI, FLAV_CLOAB, FLAV_ENTAG, 3chy. Summary rows: Avrg Consist, Conservation.]

Flavodoxin-cheY consistency scores

(PRALINE prepro=1500)

[Figure: PRALINE alignment of flavodoxin sequences and cheY, shown in two blocks with per-position consistency scores. Sequences: 1fx1, FLAV_DESVH, FLAV_DESSA, FLAV_DESGI, FLAV_DESDE, 4fxn, FLAV_MEGEL, 2fcr, FLAV_ANASP, FLAV_AZOVI, FLAV_ENTAG, FLAV_ECOLI, FLAV_CLOAB, 3chy. Summary rows: Avrg Consist, Conservation.]

Iteration 0 SP= 136702.00 AvSP= 10.654 SId= 3955 AvSId= 0.308

Consistency values are scored from 0 to 10; the value 10 is represented by the corresponding amino acid (red)

Strategies for multiple sequence alignment

• Profile pre-processing
• Secondary structure-induced alignment
• Globalised local alignment
• Matrix extension

Objective: integrate secondary structure information to anchor alignments and avoid errors

Protein structure hierarchical levels

• PRIMARY STRUCTURE (amino acid sequence):
  VHLTPEEKSAVTALWGKVNVDE VGGEALGRLLVVYPWTQRFFE SFGDLSTPDAVMGNPKVKAHG KKVLGAFSDGLAHLDNLKGTFA TLSELHCDKLHVDPENFRLLGN VLVCVLAHHFGKEFTPPVQAAY QKVVAGVANALAHKYH
• SECONDARY STRUCTURE (helices, strands)
• TERTIARY STRUCTURE (fold)
• QUATERNARY STRUCTURE (oligomers)

Why use (predicted) structural information

• “Structure more conserved than sequence”: many structural protein families (e.g. globins) have family members with very low sequence similarities. For example, globin sequence identities can be as low as 10% while the proteins still have an identical fold.

• This means that you can still observe equivalent secondary structures in homologous proteins even if sequence similarities are extremely low.

• But you are dependent on the quality of prediction methods. For example, secondary structure prediction is currently at 76% correctness, so roughly 1 out of 4 residues is still predicted incorrectly.

[Figure: two superposed protein structures with two well-superposed helices. Red: well superposed; blue: low match quality. The human (PDB code 1kjs) and pig (1c5a) C5 anaphylatoxin proteins are superposed.]

How to combine ss and aa info

[Figure: dynamic programming search matrix for the sequences MDAASTILCGS and MDAGSTVILCFV, the latter shown with the secondary structure string HHHCCCEEEEEE. The cells are scored with amino acid substitution matrices labelled H, E, C and Default.]

In terms of scoring…

• So how would you score a profile using this extra information?

– Same formula as in lecture 6, but you can use sec. struct. specific substitution scores in various combinations.

• Where does it fit in?

– Very important: structure is generally more conserved than sequence, so it can help decide whether or not to insert gaps.
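One way to use secondary-structure-specific substitution scores is sketched below. The rule used here (take the H, E or C matrix when both residues have the same predicted state, otherwise fall back to the default matrix) is an assumption for illustration, as are the names `pick_matrix`, `residue_score`, `toy` and all toy score values.

```python
def pick_matrix(ss1, ss2, matrices):
    """Choose a substitution matrix from the secondary structure states of two residues.

    Assumed rule: use the state-specific matrix when both states agree,
    otherwise use the default matrix.
    """
    if ss1 == ss2 and ss1 in matrices:
        return matrices[ss1]
    return matrices["default"]


def residue_score(aa1, ss1, aa2, ss2, matrices):
    """Substitution score for a residue pair, given their secondary structure states."""
    return pick_matrix(ss1, ss2, matrices)[frozenset((aa1, aa2))]


def toy(same, diff):
    """Tiny two-letter 'matrix' over A and L with hypothetical values."""
    return {frozenset(("A",)): same, frozenset(("L",)): same,
            frozenset(("A", "L")): diff}


# Hypothetical matrices for helix (H), strand (E), coil (C) and the default case.
matrices = {"H": toy(5, 2), "E": toy(6, -1), "C": toy(3, 0), "default": toy(4, -1)}

helix_pair = residue_score("A", "H", "L", "H", matrices)   # both helix: H matrix
mixed_pair = residue_score("A", "H", "L", "E", matrices)   # states differ: default
strand_pair = residue_score("A", "E", "A", "E", matrices)  # both strand: E matrix
```

These per-residue scores can then replace M(aa_i, aa_j) in the profile scoring formula, so that the secondary structure prediction influences where matches and gaps are placed.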

1. Sequences to be aligned
2. Predict secondary structure, giving one secondary structure string per sequence:
   HHHHCCEEECCCEEECCHH
   CCCCCCEECCCEEEECCHH
   HHHCCCCEECCCEEHHH
   HHHHHCCEEEECCCEECCC
   HHHHHHHHHHHHHCCCEEEE
3. Align sequences using secondary structure
4. Multiple alignment

Using predicted secondary structure

[Figure: alignment of the flavodoxin sequences and cheY in two blocks, with a secondary structure string printed under each sequence. Sequences: 1fx1, FLAV_DESVH, FLAV_DESGI, FLAV_DESSA, FLAV_DESDE, 2fcr, FLAV_ANASP, FLAV_ECOLI, FLAV_AZOVI, FLAV_ENTAG, 4fxn, FLAV_MEGEL, FLAV_CLOAB, 3chy.]

Strategies for multiple sequence alignment

• Profile pre-processing
• Secondary structure-induced alignment
• Globalised local alignment
• Matrix extension

Objectives:

• Instead of single amino acid positions, focus on local alignments
• Consider the best local alignment through each cell in the DP matrix
• Try to avoid (early) errors

Globalised local alignment

1. Local (SW) alignment (M + P_{o,e})
2. Global (NW) alignment (no M or P_{o,e})

Double dynamic programming
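One possible reading of the two steps above, as a minimal sketch. It assumes that M denotes the substitution matrix and P_o,e the gap-opening and gap-extension penalties, and for brevity it uses a single linear gap penalty instead of separate opening and extension penalties. Step 1 computes, for every cell, the score of the best local (Smith-Waterman style) alignment passing through that residue pair; step 2 runs a global (Needleman-Wunsch style) pass over those cell scores without any further matrix or gap penalty. The function names and the toy scoring function are assumptions, not PRALINE's implementation.

```python
def match_end_scores(a, b, s, gap):
    """E[i][j]: best local alignment score that ends with a[i] matched to b[j]."""
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]  # standard Smith-Waterman matrix
    E = [[0.0] * m for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i - 1][j - 1] = H[i - 1][j - 1] + s(a[i - 1], b[j - 1])
            H[i][j] = max(0.0, E[i - 1][j - 1], H[i - 1][j] - gap, H[i][j - 1] - gap)
    return E


def through_scores(a, b, s, gap):
    """T[i][j]: best local alignment score passing through the pair (a[i], b[j])."""
    n, m = len(a), len(b)
    fwd = match_end_scores(a, b, s, gap)
    rev = match_end_scores(a[::-1], b[::-1], s, gap)  # alignments starting at (i, j)
    return [[fwd[i][j] + rev[n - 1 - i][m - 1 - j] - s(a[i], b[j])
             for j in range(m)] for i in range(n)]


def global_over_cells(T):
    """Global pass over the cell scores, with no extra matrix or gap penalties."""
    n, m = len(T), len(T[0])
    G = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i][j] = max(G[i - 1][j - 1] + T[i - 1][j - 1], G[i - 1][j], G[i][j - 1])
    return G[n][m]


def toy_score(x, y):
    """Hypothetical scoring: +2 for identical residues, -1 otherwise."""
    return 2.0 if x == y else -1.0


T = through_scores("ACD", "ACD", toy_score, gap=2.0)
total = global_over_cells(T)
```

For two identical three-residue sequences, every diagonal cell receives the score of the full local alignment, so the global pass follows the diagonal.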

Strategies for multiple sequence alignment

• Profile pre-processing
• Secondary structure-induced alignment
• Globalised local alignment
• Matrix extension

Objective: try to avoid (early) errors

Integrating alignment methods and alignment information with T-Coffee

• Integrating different pair-wise alignment techniques (NW, SW, ..)
• Combining different multiple alignment methods (consensus multiple alignment)
• Combining sequence alignment methods with structural alignment techniques
• Plug in user knowledge

Matrix extension: T-Coffee

Tree-based Consistency Objective Function For alignmEnt Evaluation

Cedric Notredame, Des Higgins, Jaap Heringa

J. Mol. Biol., 302, 205-217; 2000

Using different sources of alignment information

[Figure: alignment information from Clustal, Clustal, structure alignments, Dialign, Lalign and manual input is fed into T-Coffee.]

Search matrix extension: alignment transitivity (T-Coffee)

[Figure: the direct alignment of two sequences is supplemented with alignments that pass through the other sequences.]
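Alignment transitivity can be illustrated with a small sketch of weight extension: the weight of a residue pair is increased by the support it receives through residues of a third sequence, taking the minimum of the two connecting weights for each such path. The data layout (residues as `(sequence, position)` tuples), the function name `extend_library` and the weights are assumptions made for this example; it is a simplified illustration, not the T-Coffee source code.

```python
from collections import defaultdict


def extend_library(primary):
    """Extend pairwise residue weights with support via residues of other sequences.

    primary: dict mapping frozenset({residue, residue}) -> weight,
             where a residue is a (sequence name, position) tuple and the
             two residues of a pair come from different sequences.
    """
    neighbours = defaultdict(dict)
    for pair, w in primary.items():
        x, y = tuple(pair)
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = dict(primary)
    for x in neighbours:
        for z, w_xz in neighbours[x].items():
            for y, w_zy in neighbours[z].items():
                # Skip same-sequence pairs, and visit each unordered pair once.
                if y[0] == x[0] or not x < y:
                    continue
                key = frozenset((x, y))
                extended[key] = extended.get(key, 0) + min(w_xz, w_zy)
    return extended


# Hypothetical primary weights between residues of sequences A, B and C.
primary = {
    frozenset((("A", 0), ("B", 0))): 88,
    frozenset((("A", 0), ("C", 0))): 77,
    frozenset((("B", 0), ("C", 0))): 100,
    frozenset((("A", 1), ("C", 1))): 77,
    frozenset((("B", 2), ("C", 1))): 100,
}
ext = extend_library(primary)
direct = ext[frozenset((("A", 0), ("B", 0)))]      # 88 + min(77, 100)
transitive = ext[frozenset((("A", 1), ("B", 2)))]  # no direct weight, only via C
```

The pair A1/B2 has no direct alignment in the primary weights but still gains a score through sequence C, which is the point of extending the search matrix with information from the other sequences.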

but.....

T-COFFEE (V1.23) multiple sequence alignment: Flavodoxin-cheY

[Figure: T-COFFEE alignment, in two blocks, of 1fx1, FLAV_DESVH, FLAV_DESGI, FLAV_DESSA, FLAV_DESDE, 4fxn, FLAV_MEGEL, FLAV_CLOAB, 2fcr, FLAV_ENTAG, FLAV_ANASP, FLAV_AZOVI, FLAV_ECOLI and 3chy, with a conservation line under each block.]

Multiple alignment methods

• Multi-dimensional dynamic programming: extension of pairwise sequence alignment.
• Progressive alignment: incorporates phylogenetic information to guide the alignment process.
• Iterative alignment: corrects for problems with progressive alignment by repeatedly realigning subgroups of sequences.