Introduction to bioinformatics


CENTRE FOR INTEGRATIVE BIOINFORMATICS VU
Introduction to bioinformatics
2007
Lecture 10
Multiple Sequence Alignment (II)
Progressive multiple alignment
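[Figure: sequences 1-5 are aligned pairwise (Score 1-2, Score 1-3, ..., Score 4-5) to fill a 5×5 score matrix; the scores are converted to distances (similarity matrix), a guide tree is built from them, and the guide tree directs the multiple alignment, with optional iteration.]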
Progressive alignment strategy
1. Perform pair-wise alignments of all of the sequences (all against all, i.e. N(N-1)/2 alignments);
2. Use the alignment scores to make a similarity (or distance) matrix;
3. Use that matrix to produce a guide tree;
4. Align the sequences successively, guided by the order and relationships indicated by the tree (N-1 alignment steps).
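The four steps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the code of any tool listed in this lecture: the match/mismatch/gap values are made up, the score-to-distance conversion `1 - S_ij / min(S_ii, S_jj)` is just one possible choice, and the guide tree is built by average linkage (UPGMA) for simplicity, whereas ClustalW uses Neighbour Joining. All function names are hypothetical.

```python
from itertools import combinations

# Toy scoring scheme (assumption): real tools use substitution matrices
# such as BLOSUM62 together with affine gap penalties.
MATCH, MISMATCH, GAP = 2, -1, -2


def nw_score(a: str, b: str) -> int:
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            cur[j] = max(diag, prev[j] + GAP, cur[j - 1] + GAP)
        prev = cur
    return prev[-1]


def distance_matrix(seqs):
    """Steps 1-2: all N(N-1)/2 pairwise alignments, scores turned into distances."""
    n = len(seqs)
    self_score = [nw_score(s, s) for s in seqs]
    d = [[0.0] * n for _ in range(n)]
    pairs = list(combinations(range(n), 2))
    for i, j in pairs:
        s = nw_score(seqs[i], seqs[j])
        # Hypothetical conversion: identical sequences get distance 0.
        d[i][j] = d[j][i] = 1.0 - s / min(self_score[i], self_score[j])
    return d, len(pairs)


def guide_tree_merges(d):
    """Step 3: average-linkage (UPGMA) clustering; the merge order is the guide tree."""
    clusters = [(i,) for i in range(len(d))]
    merges = []

    def avg(c1, c2):
        return sum(d[x][y] for x in c1 for y in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))  # step 4 would align these two blocks here
    return merges


seqs = ["GATTACA", "GATTACA", "GATCACA", "TTTTGGG"]
d, n_pairs = distance_matrix(seqs)
print(n_pairs)               # 6 = 4*3/2 pairwise alignments
print(guide_tree_merges(d))  # N-1 = 3 merge (alignment) steps
```

Each merge would be a profile-profile alignment in a real progressive aligner; here the merge order alone stands in for step 4.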
Progressive alignment strategy
Methods:
Biopat (Hogeweg and Hesper 1984; first integrated method ever)
MULTAL (Taylor 1987)
DIALIGN (1&2, Morgenstern 1996)
PRRP (Gotoh 1996)
ClustalW (Thompson et al. 1994)
PRALINE (Heringa 1999)
T-Coffee (Notredame 2000)
POA (Lee 2002)
MUSCLE (Edgar 2004)
PROBCONS (Do et al. 2005)
Pair-wise alignment quality versus sequence identity
(Vogt et al., JMB 249, 816-831, 1995)
Flavodoxin fold: aligning 13 Flavodoxins + cheY
(βα)5 fold
Flavodoxin-cheY NJ tree
Flavodoxin fold: helix-beta-helix
Flavodoxin family - TOPS diagrams
The basic topology of the flavodoxin fold is given below; the other four TOPS diagrams show flavodoxin folds with local insertions of secondary structure elements.
[Figure: TOPS diagrams with numbered secondary structure elements; legend: α-helix, β-strand.]
Flavodoxin-cheY NJ tree
Flavodoxin-cheY: Pre-processing (prepro1500)
Protein structure hierarchical levels
PRIMARY STRUCTURE (amino acid sequence), e.g.:
VHLTPEEKSAVTALWGKVNVDE
VGGEALGRLLVVYPWTQRFFE
SFGDLSTPDAVMGNPKVKAHG
KKVLGAFSDGLAHLDNLKGTFA
TLSELHCDKLHVDPENFRLLGN
VLVCVLAHHFGKEFTPPVQAAY
QKVVAGVANALAHKYH
SECONDARY STRUCTURE (helices, strands)
TERTIARY STRUCTURE (fold)
QUATERNARY STRUCTURE (oligomers)
Clustal, ClustalW, ClustalX
• CLUSTAL W/X (Thompson et al., 1994) uses the Neighbour-Joining (NJ) algorithm (Saitou and Nei, 1984), widely used in phylogenetic analysis, to construct a guide tree (see lecture on phylogenetic methods).
• Sequence blocks are represented by a profile, in which the individual sequences are additionally weighted according to the branch lengths in the NJ tree.
• Further carefully crafted heuristics include:
– (i) local gap penalties
– (ii) automatic selection of the amino acid substitution matrix
– (iii) automatic gap penalty adjustment
– (iv) a mechanism to delay the alignment of sequences that appear to be distant at the time they are considered.
• CLUSTAL (W/X) does not allow iteration (for iterative approaches see Hogeweg and Hesper, 1984; Corpet, 1988; Gotoh, 1996; Heringa, 1999, 2002).
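A minimal sketch of the textbook Neighbour-Joining algorithm (Saitou and Nei, 1984), written for illustration. It is not ClustalW's implementation and does not compute ClustalW's branch-length-based sequence weights; the function name is hypothetical and the five-taxon distance matrix is a standard textbook example, not flavodoxin data.

```python
def neighbour_joining(d, names):
    """Textbook NJ on a symmetric distance matrix d (list of lists).

    Returns the joins (name_i, name_j, branch_i, branch_j) in the order
    they happen, plus the final edge connecting the last two nodes.
    """
    d = [row[:] for row in d]
    names = list(names)
    joins = []
    while len(names) > 2:
        n = len(names)
        r = [sum(row) for row in d]  # net divergence of each node
        # Pick the pair minimising the Q-criterion (n-2)*d_ij - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the two joined nodes to their new parent u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joins.append((names[i], names[j], li, lj))
        # Distance from u to every node k (only the kept ones are used).
        du = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in range(n)]
        keep = [k for k in range(n) if k not in (i, j)]
        d = [[d[a][b] for b in keep] + [du[a]] for a in keep] + \
            [[du[k] for k in keep] + [0.0]]
        names = [names[k] for k in keep] + [f"({names[i]},{names[j]})"]
    return joins, (names[0], names[1], d[0][1])


D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
joins, last_edge = neighbour_joining(D, "abcde")
for join in joins:
    print(join)
print(last_edge)
```

The join order gives the guide tree topology; ties in the Q-criterion are broken here by taking the first pair found, which is an arbitrary choice.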
ClustalW web-interface
CLUSTAL X (1.64b) multiple sequence alignment Flavodoxin-cheY
1fx1        -PKALIVYGSTTGNTEYTAETIARQLANAG-Y-EVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD-SLEETGAQGRK
FLAV_DESVH  MPKALIVYGSTTGNTEYTAETIARELADAG-Y-EVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD-SLEETGAQGRK
FLAV_DESGI  MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-M-ETTVVNVADVTAPGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPLYE-DLDRAGLKDKK
FLAV_DESSA  MSKSLIVYGSTTGNTETAAEYVAEAFENKE-I-DVELKNVTDVSVADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPLYD-SLENADLKGKK
FLAV_DESDE  MSKVLIVFGSSTGNTESIAQKLEELIAAGG-H-EVTLLNAADASAENLADGYDAVLFGCSAWGMEDLE------MQDDFLSLFE-EFNRFGLAGRK
FLAV_CLOAB  -MKISILYSSKTGKTERVAKLIEEGVKRSGNI-EVKTMNLDAVDKKFLQE-SEGIIFGTPTYYAN---------ISWEMKKWID-ESSEFNLEGKL
FLAV_MEGEL  --MVEIVYWSGTGNTEAMANEIEAAVKAAG-A-DVESVRFEDTNVDDVAS-KDVILLGCPAMGSE--E------LEDSVVEPFF-TDLAPKLKGKK
4fxn        ---MKIVYWSGTGNTEKMAELIAKGIIESG-K-DVNTINVSDVNIDELLN-EDILILGCSAMGDE--V------LEESEFEPFI-EEISTKISGKK
FLAV_ANASP  SKKIGLFYGTQTGKTESVAEIIRDEFGNDVVT----LHDVSQAEVTDLND-YQYLIIGCPTWNIGELQ---SD-----WEGLYS-ELDDVDFNGKL
FLAV_AZOVI  -AKIGLFFGSNTGKTRKVAKSIKKRFDDETMSD---ALNVNRVSAEDFAQ-YQFLILGTPTLGEGELPGLSSDCENESWEEFLP-KIEGLDFSGKT
2fcr        --KIGIFFSTSTGNTTEVADFIGKTLGAKADAP---IDVDDVTDPQALKD-YDLLFLGAPTWNTGADTERSGT----SWDEFLYDKLPEVDMKDLP
FLAV_ENTAG  MATIGIFFGSDTGQTRKVAKLIHQKLDGIADAP---LDVRRATREQFLS--YPVLLLGTPTLGDGELPGVEAGSQYDSWQEFTN-TLSEADLTGKT
FLAV_ECOLI  -AITGIFFGSDTGNTENIAKMIQKQLGKDVAD----VHDIAKSSKEDLEA-YDILLLGIPTWYYGEAQ-CD-------WDDFFP-TLEEIDFNGKL
3chy        --ADKELKFLVVDDFSTMRRIVRNLLKELG----FNNVEEAEDGVDALN------KLQAGGYGFV--I------SDWNMPNMDG-LELLKTIR--

1fx1        VACFGCGDSSYEYF--CGAVDAIEEKLKNLGAEIVQDG----------------LRIDGDPRAARDDIVGWAHDVRGAI--------------
FLAV_DESVH  VACFGCGDSSYEYF--CGAVDAIEEKLKNLGAEIVQDG----------------LRIDGDPRAARDDIVGWAHDVRGAI--------------
FLAV_DESGI  VGVFGCGDSSYTYF--CGAVDVIEKKAEELGATLVASS----------------LKIDGEPDSAE--VLDWAREVLARV--------------
FLAV_DESSA  VSVFGCGDSDYTYF--CGAVDAIEEKLEKMGAVVIGDS----------------LKIDGDPERDE--IVSWGSGIADKI--------------
FLAV_DESDE  VAAFASGDQEYEHF--CGAVPAIEERAKELGATIIAEG----------------LKMEGDASNDPEAVASFAEDVLKQL--------------
FLAV_CLOAB  GAAFSTANSIAGGS--DIALLTILNHLMVKGMLVYSGGVA----FGKPKTHLGYVHINEIQENEDENARIFGERIANKVKQIF----------
FLAV_MEGEL  VGLFGSYGWGSGE-----WMDAWKQRTEDTGATVIGTA----------------IVN-EMPDNAPECKE-LGEAAAKA---------------
4fxn        VALFGSYGWGDGK-----WMRDFEERMNGYGCVVVETP----------------LIVQNEPDEAEQDCIEFGKKIANI---------------
FLAV_ANASP  VAYFGTGDQIGYADNFQDAIGILEEKISQRGGKTVGYWSTDGYDFNDSKALR-NGKFVGLALDEDNQSDLTDDRIKSWVAQLKSEFGL-----
FLAV_AZOVI  VALFGLGDQVGYPENYLDALGELYSFFKDRGAKIVGSWSTDGYEFESSEAVV-DGKFVGLALDLDNQSGKTDERVAAWLAQIAPEFGLSL---
2fcr        VAIFGLGDAEGYPDNFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVR-DGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV-----
FLAV_ENTAG  VALFGLGDQLNYSKNFVSAMRILYDLVIARGACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSWLEKLKPAVL------
FLAV_ECOLI  VALFGCGDQEDYAEYFCDALGTIRDIIEPRGATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKWVKQISEELHLDEILNA
3chy        AD--GAMSALPVL-----MVTAEAKKENIIAAAQAGAS----------------GYV-VKPFTAATLEEKLNKIFEKLGM-------------
The secondary structures of 4 sequences are known and can be used to assess the alignment (red is β-strand, blue is α-helix).
There are problems…
Accuracy is very important!

Progressive multiple alignment is a greedy strategy:
Alignment errors during the construction of the MSA cannot
be repaired anymore and these errors are propagated
into later progressive steps.

Comparisons of sequences at early steps during progressive
alignment cannot make use of information from other
sequences.

It is only later during the alignment progression that more
information from other sequences (e.g. through profile
representation) becomes employed in the alignment steps.
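To make the profile idea concrete: once a block of sequences has been aligned, each column can be summarised as residue frequencies, and two columns can be compared by averaging substitution scores over all residue pairs. The sketch below assumes a toy identity score (1 for identical residues, 0 otherwise) and lets gaps contribute nothing; ClustalW's branch-length sequence weights are omitted. Function names are hypothetical.

```python
from collections import Counter


def profile(block):
    """Column-wise residue frequencies of an aligned block.

    Gaps are not counted as residues, but frequencies are still divided
    by the number of sequences, so gappy columns carry less weight."""
    n = len(block)
    return [{aa: c / n for aa, c in Counter(col).items() if aa != "-"}
            for col in zip(*block)]


def column_score(p, q, subst=lambda a, b: 1.0 if a == b else 0.0):
    """Expected substitution score between two profile columns."""
    return sum(fa * fb * subst(a, b) for a, fa in p.items() for b, fb in q.items())


left = profile(["AC", "AD"])  # an already aligned block of two sequences
right = profile(["AC"])       # a single sequence is a trivial profile
print(left)                   # [{'A': 1.0}, {'C': 0.5, 'D': 0.5}]
print([column_score(p, q) for p, q in zip(left, right)])  # [1.0, 0.5]
```

Because the block is treated as a unit, any gap already placed in it is carried along in every later step, which is the "once a gap, always a gap" behaviour.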
Progressive multiple alignment
“Once a gap, always a gap”
Feng & Doolittle, 1987
Additional strategies for multiple
sequence alignment
• Profile pre-processing
(Praline)
• Secondary structure-induced
alignment
• Globalised local alignment
• Matrix extension
Objective: try to avoid (early) errors
PRALINE web-interface
Profile pre-processing
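[Figure: for a key sequence, the other sequences are pre-aligned to it (master-slave, N-to-1 alignment, based on the pairwise scores 1-2, 1-3, ..., 4-5), and the pre-alignment is converted into a pre-profile of residue frequencies (rows A, C, D, ..., Y; columns Pi ... Px).]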
Pre-profile generation
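[Figure: the pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5) are compared against a cut-off; for each of sequences 1-5, a pre-alignment is built from the key sequence and the sequences passing the cut-off, and each pre-alignment is converted into a pre-profile (rows A, C, D, ..., Y).]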
Pre-profile alignment
[Figure: the pre-profiles of sequences 1-5 (rows A, C, D, ..., Y) are aligned progressively to give the final alignment of sequences 1-5.]
Pre-profile alignment
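[Figure: each pre-profile holds its key sequence followed by the other sequences (1: 2345, 2: 1345, 3: 1245, 4: 1235, 5: 1234); aligning the pre-profiles gives the final alignment of sequences 1-5.]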
Pre-profile alignment
Alignment consistency
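[Figure: pre-profiles of sequences 1-5 (each key sequence followed by the other four) are aligned into the final alignment; residue labels Ala131, A131, A131, L133, C126 and A131 mark corresponding positions used to illustrate alignment consistency.]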
PRALINE pre-profile generation
• Idea: use the information from all query sequences to make, for each query sequence, a pre-profile that contains information from the other sequences.
• You can use all sequences in each pre-profile, or only those sequences that will probably align ‘correctly’. Incorrectly aligned sequences in the pre-profiles will increase the noise level.
• Select using the alignment score: only allow a sequence into a pre-profile if its pairwise alignment score with the key sequence is higher than a given threshold value. In PRALINE, this threshold is given as prepro=1500 (alignment score threshold value is 1500; see next two slides).
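The threshold rule can be sketched as follows. This is an illustration only: `preprofile_members` is a hypothetical helper, not PRALINE's code, and the pairwise scores are invented. Building the actual pre-profile would additionally require the master-slave alignment of the selected sequences to the key sequence, which is not shown.

```python
def preprofile_members(scores, names, threshold):
    """For every key sequence, return the key followed by all other sequences
    whose pairwise alignment score with the key exceeds `threshold`,
    best-scoring first."""
    members = {}
    for key in names:
        others = [o for o in names
                  if o != key and scores[frozenset((key, o))] > threshold]
        others.sort(key=lambda o: scores[frozenset((key, o))], reverse=True)
        members[key] = [key] + others
    return members


# Made-up pairwise scores, for illustration only.
scores = {frozenset(("A", "B")): 2000,
          frozenset(("A", "C")): 900,
          frozenset(("B", "C")): 1600}
print(preprofile_members(scores, ["A", "B", "C"], threshold=1500))
# {'A': ['A', 'B'], 'B': ['B', 'A', 'C'], 'C': ['C', 'B']}
print(preprofile_members(scores, ["A", "B", "C"], threshold=0))
# every positively scoring pair is admitted
```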
Reliable sequences for pre-profiles
Each curve gives the number of pairwise alignments (y) scoring less than x. The range 1500 < x < 1800 shows a flat section of the curve that can serve as a natural cut-off point for admitting sequences into the pre-alignment blocks.
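A sketch of locating such a flat section automatically: on a grid of x values between the lowest and highest pairwise score, count the alignments scoring below x, and return the widest x-range over which that count does not change. The scores and the grid step of 100 are invented, and `flattest_section` is a hypothetical helper; the lecture chooses the cut-off by inspecting the curve.

```python
def flattest_section(scores, step=100):
    """Widest x-range (on a grid) over which #alignments scoring < x is constant.

    `scores` are integer pairwise alignment scores; the grid runs from the
    lowest to the highest score in steps of `step`."""
    xs = list(range(min(scores), max(scores) + 1, step))
    counts = [sum(s < x for s in scores) for x in xs]
    best = (xs[0], xs[0])
    start = 0
    for k in range(1, len(xs) + 1):
        # A flat run ends at the end of the grid or when the count changes.
        if k == len(xs) or counts[k] != counts[start]:
            if xs[k - 1] - xs[start] > best[1] - best[0]:
                best = (xs[start], xs[k - 1])
            start = k
    return best


made_up_scores = [200, 400, 800, 1200, 1400, 1900, 2100, 2300]
print(flattest_section(made_up_scores))  # (1500, 1900)
```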
Global pre-processing
(prepro0)
Preprocessed profile for sequence 2:
2fcr
1fx1
4fxn
FLAV_ANASP
FLAV_AZOVI
FLAV_CLOAB
FLAV_DESDE
FLAV_DESGI
FLAV_DESSA
FLAV_DESVH
FLAV_ECOLI
FLAV_ENTAG
FLAV_MEGEL
3chy
KIGIFFSTSTGNTTEVADFIGKTLGAKADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGADTERSGTSWDEFLYDKLPEVDMKDLPVAIFGLGDAEGYPD
KALIVYGSTTGNTEYTAETIARQL-ANAGYEVDSRDAASVEAFEGFDLVLLGCSTW--GDD---SIELQDDFLFDSLEETGAQGRKVACFGCGDS-SY-E
-MKIVYWSGTGNTEKMAELIAKGISGKDVNTINVSDVNIDELLNE-DILILGC---SAMGDEVLEESEFEPFIEEISTKISGKKVALGSYGWGDGKWMRD
KIGLFYGTQTGKTESVaEIIRDEFGNDVVTLHDVSEVTD---LNDYQYLIIgCPTWNIG---ELQ-SDW-EGLYSELDDVDFNGKLVAYfGTGDQIGYAD
KIGLFFGSNTGKTRKVaKSIKKRFDTMSDA-LNVNRVS-AEDFAQYQFLILgTPTLGPGLSSDCENESWEEFL-PKIEGLDFSGKTVALfGLGDQVGYPE
KISILYSSKTGKTERVaKLIEE--GVKRSGNIEVKDAVDKKFLQESEGIIFgTPTYYANISWEMK--KW----IDESSEFNLEGKLGAAfSTANAGGSDI
KVLIVFGSSTGNTESIaQKLEELIAA-GGHEVTLLNAADASALADYDAVLFgCSAWGM-EDLEMQ----DDFLFEEFNRFGLAGRKVAAfASGDQE-Y-E
KALIVYGSTTGNTEGVaEAIAKTLNSEGTTVVNVADVTAPGLAEGYDVVLLgCSTW--GDDEIELQEDFVP-LYEDLDRAGLKDKKVGVfGCGDS-SY-T
KSLIVYGSTTGNTETAaEYVAEAFENK-EIDVELKNVTDVSVANGYDIVLFgCSTW--G---EEEIELQDDFLYDSLENADLKGKKVSVfGCGDSD-Y-T
KALIVYGSTTGNTEYTaETIAREL-ADAGYEVDSRDAASVEAFEGFDLVLLgCSTW--GDD---SIELQDDFLFDSLEETGAQGRKVACfGCGDS-SY-E
AIGIFFGSDTGNTENIaKMIQKQLG--KDV-ADVHDISSKEDLEAYDILLLgIPTWYYG----EAQCDWDDF-FPTLEEIDFNGKLVALfGCGDQEDYAE
TIGIFFGSDTGQTRKVaKLIHQKLDGIADAPLDVRRATREQFL-SYPVLLLgTPTLGDGLPGVEAGSSWQEFT-NTLSEADLTGKTVALfGLGDQLNYSK
MVEIVYWSGTGNTEAMaNEIEAAVAAGADVSVRFED-TNVDDVASKDVILLgCPA--MGSE-ELEDSVVEPFFTDLAPK--LKGKKVGLfGYGWGSG--KELKFLVVDDFSTRRIVRNLLKELGFNEEAEDGVDALNKLQA-GGYGFVI---SDWNM---PNMDGL---ELLKTIRADGAMSALPVLMV---TAEAKKE
2fcr
1fx1
4fxn
FLAV_ANASP
FLAV_AZOVI
FLAV_CLOAB
FLAV_DESDE
FLAV_DESGI
FLAV_DESSA
FLAV_DESVH
FLAV_ECOLI
FLAV_ENTAG
FLAV_MEGEL
3chy
NFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV
YFCGAVDAIEEKLKNLGA----------------EIVQD----GLRID--GDPRAARDDIVGWAHDVRGAI--FEERMNG-YGCVVVE--TPLIVQNEPD----EAE---------------QDCIEFGKKIANI---------NFQDAIGILEEKISQRgGKTVGYWSTDGYDFNDSKALRNGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL
NYLDALGELYSFFKDRgAKIVGSWSTDGYEFESSEAVVDGKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGL
ALLTILNHVKgMLVYSGG--VAFGKPKTHGYVHINEIQENE------D-ENARI-fGERiANkVKQIF----HFCGAVPAI-----EERAKELg-----------ATIIAEG--LKMEGDASND--P--EAVASfAEDVLKQL-YFCGAVDVIEKKAEELgATLVA----------SSLKI-DGE-------------PDSAEVLDwAREVLARV-YFCGAVDAIEEKLEKMgAVVIGDSLKIDGDPERDEIVSwGS--G-----IADKI------------------YFCGAVDAIEEKLKNLgA----------------EIVQD----GLRID--GDPRAARDDIVGwAHDVRGAI-YFCDALGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDHFVGLAID--EDRQPTAERVEKwVKQISEELHL
NFVSAMRILYDLVIARgACVVGNWPREGYKFSFSAALENNEFVGLPLDQENQYDLTEERIDSwLEKL--KPAV
EWMDAWKQRTE---DTgATVIG-----------TAIVNE-----MP-----DNAP-ECKElG--EAAAKA--NIIAA--------AQAGAS--GY------------VVK--PFTAATLE--------EK-----LNKIFEKLGM
Iteration -1
SP= 127728.00
AvSP= 10.705
SId= 3764
AvSId= 0.315
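The statistics printed after each iteration (SP, AvSP, SId, AvSId) look like sum-of-pairs quantities. The sketch below assumes one plausible set of definitions, which PRALINE's may not match exactly: SP sums a substitution score over all non-gap residue pairs in every column, SId counts identical residue pairs, and the averages divide by the number of residue pairs scored. The identity score stands in for a real substitution matrix, and lowercase letters (which occur in the listings) are simply upper-cased.

```python
from itertools import combinations


def sp_statistics(alignment, subst):
    """Sum-of-pairs statistics over all residue pairs in each column.

    Pairs involving a gap are skipped (an assumption). Returns
    (SP, average SP per residue pair, identical pairs, fraction identical)."""
    sp = sid = npairs = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" or b == "-":
                continue
            a, b = a.upper(), b.upper()
            sp += subst(a, b)
            sid += a == b
            npairs += 1
    if npairs == 0:
        return 0, 0.0, 0, 0.0
    return sp, sp / npairs, sid, sid / npairs


def identity(a, b):
    return 1 if a == b else 0


print(sp_statistics(["AC-", "AD-", "ACE"], identity))
```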
Global pre-processing
(prepro0)
Preprocessed profile for sequence 3:
4fxn
1fx1
2fcr
FLAV_ANASP
FLAV_AZOVI
FLAV_CLOAB
FLAV_DESDE
FLAV_DESGI
FLAV_DESSA
FLAV_DESVH
FLAV_ECOLI
FLAV_ENTAG
FLAV_MEGEL
3chy
MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVALFGSYGWGDGKWMRDFE
ALIVYGSTTGNTEYTAETIARQLANAGYEVDSRDAASVEAGGLFEGDLVLLGCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACFGSYEYFCGA-VDAIE
IGIFFSTSTGNTTEVADFIGKTL--GAKADAPIDVDDVTDPQALKDDLLFLGANTGADTERSGTSWDEFLYDKLPEVDMKDLPV-AIFGLGDAEGYPDFC
IGLFYGTQTGKTESVaEIIRD---EFGNDVVTLDVSQAEVTDLNDYQYLIIgCPTWNIGEL-QSDWEGLYSELDVDFNGKLVAYfGTIGYADNDAIGILE
IGLFFGSNTGKTRKVaKSIKKRFDDETMS-DALNVNRVSAEDFAQYQFLILgTPTLGEGELENESWEEFLPKIGLDFSGKTVALfGQVGYPEGELYSFFK
MKILYSSKTGKTERVaKLIEEGVKRSGNEVKTMNLDAVDKKFLQESEGIIFgTPTYYANI--SWEMKKWIDESSENLEGKLGAAfSTAGGSDIALLTILN
VLIVFGSSTGNTESIaQKLEELIAAGGHEVTLLNAADASAENLADYDAVLFgCSAWGMEDLEQDDFLSLFEEFNRGLAGRKVAAfAS---GDQEYVPAIE
ALIVYGSTTGNTEGVaEAIAKTLNSEGMETTVVNVADVTAPGLAGYDVVLLgCSTWGDDEIEQEDFVPLYEDLDAGLKDKKVGVfGSYTYFCGA-VDVIE
MSIVYGSTTGNTETAaEYVAEAFENKEIDVELKNVTDVSVADLGNYDIVLFgCSTWGEEEIEQDDFIPLYDSLNADLKGKKVSVfGDYTYFCGA-VDAIE
ALIVYGSTTGNTEYTaETIARELADAGYEVDSRDAASVEAGGLFEGDLVLLgCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACfGSYEYFCGA-VDAIE
TGIFFGSDTGNTENIaKMIQK---QLGKDVADVDIAKSSKEDLEAYDILLLgIPTYGEAQCDWDDFFPTLEEID--FNGKLVALfGDYAFCDAGTIRDIE
IGIFFGSDTGQTRKVaKLIHQK-LDGIADA-PLDVRRATREQFLSYPVLLLgTPTLGDELVEASQYDSWQEFTNTDLTGKTVALfGNYSKNFVSAMRILY
VEIVYWSGTGNTEAMaNEIEAAVKAAGADVESVRFEDTNVDDVASKDVILLgCPAMGSEELEDSVVEPFFTDLAPKLKGKKVGLfGSYGWGSGEWMDAWK
DKELKFLVVDDFSTMRRIVRNLLKELG--FNNVEEAEDGVD-ALNK-LQAGGYGVISDWNMPNMDGLELLKTI--RADGAMSALPVLMVTAEAKKENIIA
4fxn
1fx1
2fcr
FLAV_ANASP
FLAV_AZOVI
FLAV_CLOAB
FLAV_DESDE
FLAV_DESGI
FLAV_DESSA
FLAV_DESVH
FLAV_ECOLI
FLAV_ENTAG
FLAV_MEGEL
3chy
ERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKKIANI
EKLKNLGAEIVQDGLRIDGDPRAARDDIVGWAHDVRGA
DAIEEHDCFAKQKPVGFSNPDDESKNDQIPMEKRVAGW
EKISGYGSKALRNGKFVGLALDEDNQDLTDDRIKVAQL
DRTDGYEAVVVGLALDLDNQSGKTDERVAAwLAQIAPE
HLMKgYGGVAFGKPYVHINEIQENEDENARfGERiANk
ERAKELgATIIAEGLKMEGDASNDPEAVASfAEDVLKQ
KKAEELgATLVASSLKIDGEPDSAE--VLDwAREVARV
EKLEKMgAVVIGDSLKIDGDPERDE--IVSwGSGIADI
EKLKNLgAEIVQDGLRIDGDPRAARDDIVGwAHDVRGA
PRTAGYGLAFVGLAIDEDRQPELTAERVEKwVKQISEE
DLVIARgCVVGNWPLLENNEPDQENQDLTELEKKPAVL
QRTEDTgATVIGT-AIVNEMPDNA-PECKElGEAAAKA
AAQAGASGYVVK-PFTAATLEEKLNKIFEKLGM-----
Iteration -1
SP= 121196.00
AvSP= 10.075
SId= 3288
AvSId= 0.273
Reliable sequences for pre-profiles
Pre-profiles (prepro1500)
1
2
Pre-profiles (prepro1500)
13
14
Local pre-processing
Local alignments are calculated from high to low scoring. Each time, the sequence parts corresponding to a selected local alignment are blocked, so that a next local alignment has to emerge before or after the earlier selected one. This preserves the colinearity of the local alignments and the associated sequence fragments in the pre-alignments.
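The blocking rule can be sketched as a greedy selection over candidate local alignments. This is an illustration, not PRALINE's code: the candidates, given as (score, range in sequence 1, range in sequence 2) with half-open ranges, are assumed to have been computed already, and `colinear_local_blocks` is a hypothetical name.

```python
def colinear_local_blocks(candidates):
    """Greedy selection of colinear local alignments.

    Each candidate is (score, (start1, end1), (start2, end2)) with half-open
    residue ranges in sequence 1 and sequence 2. Candidates are visited from
    high to low score; one is kept only if, relative to every block kept so
    far, it lies entirely before it in both sequences or entirely after it
    in both sequences."""
    kept = []
    for cand in sorted(candidates, key=lambda c: c[0], reverse=True):
        _, (s1, e1), (s2, e2) = cand
        if all((e1 <= t1 and e2 <= t2) or (s1 >= f1 and s2 >= f2)
               for _, (t1, f1), (t2, f2) in kept):
            kept.append(cand)
    return sorted(kept, key=lambda c: c[1][0])  # order along sequence 1


candidates = [(50, (10, 20), (12, 22)),
              (40, (0, 8), (0, 8)),
              (35, (25, 30), (5, 10)),   # would cross the first block
              (30, (22, 28), (25, 31))]
print([c[0] for c in colinear_local_blocks(candidates)])  # [40, 50, 30]
```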
Local pre-processing
(locprepro0)
Preprocessed profile for sequence 2: 2fcr
2fcr
1fx1
4fxn
FLAV_ANASP
FLAV_AZOVI
FLAV_CLOAB
FLAV_DESDE
FLAV_DESGI
FLAV_DESSA
FLAV_DESVH
FLAV_ECOLI
FLAV_ENTAG
FLAV_MEGEL
3chy
KIGIFFSTSTGNTTEVADFIGKTLGAKADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGADTERSGTSWDEFLYDKLPEVDMKDLPVAIFGLGDAEGYPD
...IVYGSTTGNTEYTAETIARQL---ANAGYEVDDAASVEAFEGFDLVLLGCSTW--GDDSELQ----DDFLFDSLEETGAQGRKVACFGCGDS-SY-E
KI-VYWS-GTGNTEKMAELIAKGIGKDVNT-INVSDVNIDELLNE-DILILGCSA--MGDEVEES--EFEPF----IEEISTKGKKVALFGWGDGKGYGKIGLFYGTQTGKTESVaEIIRDEFGNDVVTLHDVSEVTD---LNDYQYLIIgCPTWNIG---ELQ-SDW-EGLYSELDDVDFNGKLVAYfGTGDQIGYAD
KIGLFFGSNTGKTRKVaKSIKKTM---SDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGSDCENE--SWEEFL-PKIEGLDFSGKTVALfGLGDQVGYPE
KISILYSSKTGKTERVaKLIEE--GVKRSGNIEVKDAVDKKFLQESEGIIFgTPTY-------YANISWEKWI-DESSEFNLEGKLGAAfSTANSAGGSD
KVLIVFGSSTGNTESIaQKLEELIAAAADA--SAENLAD-----GYDAVLFgCSAWGM-EDLEMQ----DDFLFEEFNRFGLAGRKVAAfASGDQE-Y-E
...IVYGSTTGNTEGVaEAIAKTLNSEGTTVVNVADVTAPGLAEGYDVVLLgCSTW--GDDIELQ----EDFLYEDLDRAGLKDKKVGVfGCGDS-SY-T
...IVYGSTTGNTETAaEYVAEAFENK---EIDVENVTD-VSVADYDIVLFgCSTW--G---EEEIELQDDFLYDSLENADLKGKKVSVfGCGDSD-Y-T
...IVYGSTTGNTEYTaETIAREL---ADAGYEVDDAASVEAFEGFDLVLLgCSTW--GDDSELQ----DDFLFDSLEETGAQGRKVACfGCGDS-SY-E
..GIFFGSDTGNTENIaKMIQKQLG-K-----DVADVHDKEDLEAYDILLLgIPTWYYG----EAQCDWDDF-FPTLEEIDFNGKLVALfGCGDQEDYAE
.IGIFFGSDTGQTRKVaKLIHQKLDGIADAPLDVRRATREQFL-SYPVLLLgTPT--LG-DGELPGVSWQEFT-NTLSEADLTGKTVALfGLGDQLNYSK
.VEIVYWSGTGNTEAMaNEIEKAAGADVESDTNVDDV----ASK--DVILLgCPA--MGSE-ELEDSVVEPFFTDLAPK--LKGKKVGLfGYGWGSG--...........................................................ADKELKFLVVDDFIVRNL----LKEL-----GFNNVEEAED
2fcr
1fx1
4fxn
FLAV_ANASP
FLAV_AZOVI
FLAV_CLOAB
FLAV_DESDE
FLAV_DESGI
FLAV_DESSA
FLAV_DESVH
FLAV_ECOLI
FLAV_ENTAG
FLAV_MEGEL
3chy
NFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV
YFCDAIEE------K--LKNLG-----------AEIVQD----GLRID--GD--PRAARIVGWAHDV......
--CVVVE-----------TPLIVQNPDE---AEQDCIEFGK................................
NFQDAIGILEEKISQRgGKTVGYWSTDGYDFNDSKALRNGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL
NYLDALGELYSFFKDRgAKIVGSWSTDGYEFESSEAVVDGKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGL
---IALLTIH-LMVKSGG--VAFGKPKTHGYVHINEIQENE------D-ENARI-fGERiANkVKQI......
HFCGAVPAI-----EERAKELg-----------ATIIAEGKMEG---DASND--P--EAVASfAEDVLKQ...
YFCGAVDVIEKKAEELgATLVASSEPD------SAEVLD..................................
YFCGAVDAIEEKLEKMgAVVIGDSLKIDGDPERDEIVSwGS--G-----IADKI...................
YFCDAIEE------K--LKNLg-----------AEIVQD----GLRID--GD--PRAARIVGwAHDV......
YFCDALGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDHFVGLAID--EDRQPTAERVEKwVKQISEE...
NFVSAMRILYDLVIARgACVVG--NPEGYKFSFSAALENNEFVGLPLDQENQYDLTEERIDSwLEAVL.....
EWMDAWKQTED----TgATVIGTANPDN.............................................
G-VDALNKLQ-------AGGYGFSNMPNMDLELLKTIRDGAMSALPVLMVTAEAKKENIIAGYVAATLEE...
Local pre-processing
(locprepro0)
Preprocessed profile for sequence 3: 4fxn
4fxn
1fx1
2fcr
FLAV_ANASP
FLAV_AZOVI
FLAV_CLOAB
FLAV_DESDE
FLAV_DESGI
FLAV_DESSA
FLAV_DESVH
FLAV_ECOLI
FLAV_ENTAG
FLAV_MEGEL
3chy
MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVALFGSYGWGDGKWMRDFE
..IVYGSTTGNTEYTAETIARQLANAGYEVDSRDAASVEAGGLFEGDLVLLGCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACFGC---GDSSYVDAIE
.KIIFFSSTGNTTEVADFIGKTL---GAKADAIDVDDVTDPQALKDDLLFLGAPTTGADT-ERSSWDEFLPEVDMK--DLPVAIF---GLGDAE-----..LFYGTQTGKTESVaEIIRD---EFGNDVVTLDVSQAEVTDLNDYQYLIIgCPTIGE--L-QSDWEGLYSELDVDFNGKLVAYfGTIGYADGKWSTDFN
..LFFGSNTGKTRKVaKSIKKRFDETMSD--ALNVNRVSAEDFAQYQFLILgTPTLGEGELNESEFLPKIEGLD--FSGKTVALfGQVGYGEGSWSTD-MKILYSSKTGKTERVaKLIEEGVKRSGNEVKTMNLDAVD-KKFLQEEGIIFgTPTMKKWIDESSEFN--LEAfSTANSGSDIALLGGVAFGKPK-----..IVFGSSTGNTEKLEELIAAG----GHEVTLLNAADASAENLADYDAVLFgCSAWGMEDLEQDDFLSLFEEFNRGLAGRKVAAfAS---GDQEY-EHFE
..IVYGSTTGNTEGVaEAIAKTLNSEGMETTVVNVADVTAPGLAGYDVVLLgCSTWGDDEIEQEDFVPLYEDLDAGLKDKKVGVfGC---GDSSYTYDIE
..IVYGSTTGNTETAaEYVAEAFENKEIDVELKNVTDVSVADLGNYDIVLFgCSTWGEEEIEQDDFIPLYDSLNADLKGKKVSVfGC---GDS----DYE
..IVYGSTTGNTEYTaETIARELADAGYEVDSRDAASVEAGGLFEGDLVLLgCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACfGC---GDSSYVDAIE
..IFFGSDTGNTENIaKMIQK---QLGKDV--ADVHDISKEDLEAYDILLLgIPTYGEAQCDWDDFFPTLEEID--FNGKLVALfGC---GD---QEDYA
..IFFGSDTGQTRKVaKLIHQGIADAPLDVRR-----ATREQFLSYPVLLLgTPTLGDELVEASQYDSWQEFTNTDLTGKTVALf---GLGDQNYSKNFV
VEIVYWSGTGNTEAMaNEIEAAVKAAGADVESVRFEDTNVDDVASKDVILLgCPAMGSEELEDSVVEPFFTDLAPKLKGKKVGLfGSYGWGSGEWMDAWK
.RIV......N...LKEL---GFVEEAEDVDALNISDPNMDELLRADVLMVTAEAKKENIIAAAQVKPFLEEKLNKIFEK....................
4fxn
1fx1
2fcr
FLAV_ANASP
FLAV_AZOVI
FLAV_CLOAB
FLAV_DESDE
FLAV_DESGI
FLAV_DESSA
FLAV_DESVH
FLAV_ECOLI
FLAV_ENTAG
FLAV_MEGEL
3chy
ERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKKIANI
EKLKNLGAEIVQDGLRIDGDPRAARDDIV.........
----GYPCDAIEKPVGFSN-PDDEESKSVRDGK.....
DSRNGVGLALDE-----DNQSDLTD-DRIEFG......
----GYEAVVVGLALDLDNQTDELAQIAPEFG......
THL-GY----VHINEIQENEDENAR---I-fGERiAN.
ERAKELgATIIAEGLKMENDP-EAAEDVLK........
KKAEELgATLVASSLKIDGEPDSAE--VLDwAREVARV
EKLEKMgAVVIGDSLKIDGDPERDE--IVSwGSGIAD.
EKLKNLgAEIVQDGLRIDGDPRAARDDIV.........
E----YFCDALGTDII---EP.................
SAMRg-ACVVGNWPLLENNEPDQENQDLTE........
QRTEDTgATVIGTAIV--NEPDNA-PECKElGE.....
......................................
Flavodoxin-cheY: Pre-processing (prepro1500)
1fx1        -PKALIVYGSTTGNT-EYTAETIARQLANAG-YEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLF-DSLEETGAQGRKVACF
FLAV_DESDE  MSKVLIVFGSSTGNT-ESIaQKLEELIAAGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLF-EEFNRFGLAGRKVAAf
FLAV_DESVH  MPKALIVYGSTTGNT-EYTaETIARELADAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLF-DSLEETGAQGRKVACf
FLAV_DESSA  MSKSLIVYGSTTGNT-ETAaEYVAEAFENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLY-DSLENADLKGKKVSVf
FLAV_DESGI  MPKALIVYGSTTGNT-EGVaEAIAKTLNSEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLY-EDLDRAGLKDKKVGVf
2fcr        --KIGIFFSTSTGNT-TEVADFIGKTLGA---KADAPIDVDDVTDPQALKDYDLLFLGAPTWNTG----ADTERSGTSWDEFLYDKLPEVDMKDLPVAIF
FLAV_AZOVI  -AKIGLFFGSNTGKT-RKVaKSIKKRFDDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFL-PKIEGLDFSGKTVALf
FLAV_ENTAG  MATIGIFFGSDTGQT-RKVaKLIHQKLDG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFT-NTLSEADLTGKTVALf
FLAV_ANASP  SKKIGLFYGTQTGKT-ESVaEIIRDEFGN---DVVTLHDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLY-SELDDVDFNGKLVAYf
FLAV_ECOLI  -AITGIFFGSDTGNT-ENIaKMIQKQLGK---DVADVHDIAKSS-KEDLEAYDILLLgIPTWYYGE--------AQCDWDDFF-PTLEEIDFNGKLVALf
4fxn        -MK--IVYWSGTGNT-EKMAELIAKGIIESG-KDVNTINVSDVNIDELL-NEDILILGCSAMGDEVL-------EESEFEPFI-EEIS-TKISGKKVALF
FLAV_MEGEL  MVE--IVYWSGTGNT-EAMaNEIEAAVKAAG-ADVESVRFEDTNVDDVA-SKDVILLgCPAMGSEEL-------EDSVVEPFF-TDLA-PKLKGKKVGLf
FLAV_CLOAB  -MKISILYSSKTGKT-ERVaKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQESEGIIFgTPTYYAN---------ISWEMKKWI-DESSEFNLEGKLGAAf
3chy        ADKELKFLVVDDFSTMRRIVRNLLKELGFN--NVEEAEDGVDALNKLQAGGYGFVI---SDWNMPNM----------DGLELL-KTIRADGAMSALPVLM

1fx1        GCGDS-SY-EYFCGA-VDAIEEKLKNLGAEIVQD---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI-------
FLAV_DESDE  ASGDQ-EY-EHFCGA-VPAIEERAKELgATIIAE---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL-------
FLAV_DESVH  GCGDS-SY-EYFCGA-VDAIEEKLKNLgAEIVQD---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI-------
FLAV_DESSA  GCGDS-DY-TYFCGA-VDAIEEKLEKMgAVVIGD---------------------SLKIDGD--PE--RDEIVSwGSGIADKI-------
FLAV_DESGI  GCGDS-SY-TYFCGA-VDVIEKKAEELgATLVAS---------------------SLKIDGE--PD--SAEVLDwAREVLARV-------
2fcr        GLGDAEGYPDNFCDA-IEEIHDCFAKQGAKPVGFSNPDDYDYEESKS-VRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV-----
FLAV_AZOVI  GLGDQVGYPENYLDA-LGELYSFFKDRgAKIVGSWSTDGYEFESSEA-VVDGKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L-
FLAV_ENTAG  GLGDQLNYSKNFVSA-MRILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L-----
FLAV_ANASP  GTGDQIGYADNFQDA-IGILEEKISQRgGKTVGYWSTDGYDFNDSKA-LRNGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL-----
FLAV_ECOLI  GCGDQEDYAEYFCDA-LGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA
4fxn        G-----SY-GWGDGKWMRDFEERMNGYGCVVVET---------------------PLIVQNE--PDEAEQDCIEFGKKIANI--------
FLAV_MEGEL  G-----SY-GWGSGEWMDAWKQRTEDTgATVIGT----------------------AIVNEM--PDNA-PECKElGEAAAKA--------
FLAV_CLOAB  STANSIAGGSDIA---LLTILNHLMVKgMLVYSG----GVAFGKPKTHLGYVHINEIQENEDENARIfGERiANkVKQIF----------
3chy        VTAEAKK--ENIIAA---------AQAGAS-------------------------GYVV-----KPFTAATLEEKLNKIFEKLGM------
Iteration 0
SP= 136944.00
AvSP= 10.675
SId= 4009
AvSId= 0.313
Flavodoxin-cheY: Local Pre-processing
(locprepro300)
1fx1        --PKALIVYGSTTGNTEYTAETIARQLANAGYEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPL--FDSLEETGAQGRKVACF
FLAV_DESVH  -MPKALIVYGSTTGNTEYTaETIARELADAGYEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPL--FDSLEETGAQGRKVACf
FLAV_DESSA  -MSKSLIVYGSTTGNTETAaEYVAEAFENKEIDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPL--YDSLENADLKGKKVSVf
FLAV_DESGI  -MPKALIVYGSTTGNTEGVaEAIAKTLNSEGMETTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPL--YEDLDRAGLKDKKVGVf
FLAV_DESDE  -MSKVLIVFGSSTGNTESIaQKLEELIAAGGHEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSL--FEEFNRFGLAGRKVAAf
4fxn        --MK--IVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLN-EDILILGCSAMGDEVL------E-ESEFEPF--IEEIS-TKISGKKVALF
FLAV_MEGEL  -MVE--IVYWSGTGNTEAMaNEIEAAVKAAGADVESVRFEDTNVDDVAS-KDVILLgCPAMGSEEL------E-DSVVEPF--FTDLA-PKLKGKKVGLf
2fcr        ---KIGIFFSTSTGNTTEVADFIGKTLGAKADAPI--DVDDVTDPQALKDYDLLFLGAPTWNTGAD----TERSGTSWDEFL-YDKLPEVDMKDLPVAIF
FLAV_ANASP  -SKKIGLFYGTQTGKTESVaEIIRDEFGNDVVTLH--DVSQAEV-TDLNDYQYLIIgCPTWNIGEL--------QSDWEGL--YSELDDVDFNGKLVAYf
FLAV_AZOVI  --AKIGLFFGSNTGKTRKVaKSIKKRFDDETMSDA-LNVNRVSA-EDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEF--LPKIEGLDFSGKTVALf
FLAV_ENTAG  -MATIGIFFGSDTGQTRKVaKLIHQKLDG--IADAPLDVRRATR-EQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEF--TNTLSEADLTGKTVALf
FLAV_ECOLI  --AITGIFFGSDTGNTENIaKMIQKQLGKDVADVH--DIAKSSK-EDLEAYDILLLgIPTWYYGEA--------QCDWDDF--FPTLEEIDFNGKLVALf
FLAV_CLOAB  --MKISILYSSKTGKTERVaKLIEEGVKRSGNIEVKTMNLDAVDKKFLQESEGIIFgTPTYYA-----------NISWEMKKWIDESSEFNLEGKLGAAf
3chy        ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQ-AGGYGFVI---SDWNMPNM----------DGLEL--LKTIRADGAMSALPVLM

1fx1        GCGDS--SY-EYFCGA-VD--AIEEKLKNLGAEIVQD---------------------GLRID--GDPRAARDDIVGWAHDVRGAI-------
FLAV_DESVH  GCGDS--SY-EYFCGA-VD--AIEEKLKNLgAEIVQD---------------------GLRID--GDPRAARDDIVGwAHDVRGAI-------
FLAV_DESSA  GCGDS--DY-TYFCGA-VD--AIEEKLEKMgAVVIGD---------------------SLKID--GDPE--RDEIVSwGSGIADKI-------
FLAV_DESGI  GCGDS--SY-TYFCGA-VD--VIEKKAEELgATLVAS---------------------SLKID--GEPD--SAEVLDwAREVLARV-------
FLAV_DESDE  ASGDQ--EY-EHFCGA-VP--AIEERAKELgATIIAE---------------------GLKME--GDASNDPEAVASfAEDVLKQL-------
4fxn        GS------Y-GWGDGKWMR--DFEERMNGYGCVVVET---------------------PLIVQ--NEPDEAEQDCIEFGKKIANI--------
FLAV_MEGEL  GS------Y-GWGSGEWMD--AWKQRTEDTgATVIGT---------------------AI-VN--EMPDNA-PECKElGEAAAKA--------
2fcr        GLGDAE-GYPDNFCDA-IE--EIHDCFAKQGAKPVGFSNPDDYDYEESKSVRD-GKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV-----
FLAV_ANASP  GTGDQI-GYADNFQDA-IG--ILEEKISQRgGKTVGYWSTDGYDFNDSKALRN-GKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL-----
FLAV_AZOVI  GLGDQV-GYPENYLDA-LG--ELYSFFKDRgAKIVGSWSTDGYEFESSEAVVD-GKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L-
FLAV_ENTAG  GLGDQL-NYSKNFVSA-MR--ILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L-----
FLAV_ECOLI  GCGDQE-DYAEYFCDA-LG--TIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA
FLAV_CLOAB  STANSIAGGSDIALLTILNHLMVKgMLVYSGGVAFGKPKTHLGYVH----------INEIQENEDENARIfGERiANkVKQIF----------
3chy        VTAEA---KKENIIAA-----------AQAGAS-------------------------GYVVK-----PFTAATLEEKLNKIFEKLGM------
Strategies for multiple sequence
alignment
• Profile pre-processing
• Secondary structure-induced
alignment (Praline-SS)
• Globalised local alignment
• Matrix extension
Objective: integrate secondary structure
information to anchor alignments and avoid
errors
Why use (predicted) structural
information
• “Structure more conserved than sequence”
– Many structural protein families (e.g. globins) have family members with very low sequence similarities. For example, globin sequence identities can be as low as 10% while the fold is still identical.
• This means that you can still observe equivalent secondary structures in homologous proteins even if sequence similarities are extremely low.
• But you are dependent on the quality of prediction methods. For example, secondary structure prediction is currently at about 76% correctness, so roughly 1 out of 4 amino acids (24%) is still assigned the wrong secondary structure state.
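Accuracy figures such as the 76% above are typically Q3 values: the fraction of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The sketch assumes one common reduction of the eight DSSP states to three (H, G, I to H; E, B to E; everything else to C), although other conventions exist; the lowercase DSSP-style codes (h, e, g, b, t, s, with blanks for coil) used in the annotated alignments of this lecture are handled by upper-casing. Function names and example strings are hypothetical.

```python
# One common 8-to-3 state reduction (conventions vary between studies):
# DSSP H, G, I -> helix (H); E, B -> strand (E); everything else -> coil (C).
TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(states: str) -> str:
    """Map DSSP codes (upper or lower case; blanks mean coil) to H/E/C."""
    return "".join(TO_THREE.get(s.upper(), "C") for s in states)


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted 3-state label is correct."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


observed = reduce_dssp("hhhgg eeb tt")  # "HHHHHCEEECCC"
predicted = "HHHHCCEEECCC"
print(observed, q3(predicted, observed))  # one of 12 residues is wrong
```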
Two superposed protein structures with two well-superposed helices
Red: well superposed; Blue: low match quality.
The superposed structures lead to close pairs of Cα atoms that are taken as equivalent; this leads to a structural alignment in which the amino acids corresponding to equivalent Cα atom pairs are matched.
C5 anaphylatoxin: human (PDB code 1kjs) and pig (1c5a) proteins are superposed.
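A sketch of turning a superposition into a structural alignment, under stated assumptions: the two chains are already superposed (the fitting step is not shown), the 3.5 Å cutoff for "close" Cα pairs is an arbitrary illustrative value, and colinearity is enforced by a simple dynamic programme that maximises the number of close pairs. Real structural aligners iterate superposition and alignment; `equivalent_pairs` and the coordinates are hypothetical.

```python
import math


def equivalent_pairs(ca1, ca2, cutoff=3.5):
    """Colinear residue pairs whose C-alpha atoms lie within `cutoff`
    (angstrom) after superposition, maximising the number of pairs."""
    n, m = len(ca1), len(ca2)
    close = [[math.dist(p, q) < cutoff for q in ca2] for p in ca1]
    # best[i][j] = max number of colinear close pairs using ca1[i:], ca2[j:]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            take = best[i + 1][j + 1] + 1 if close[i][j] else 0
            best[i][j] = max(best[i + 1][j], best[i][j + 1], take)
    pairs, i, j = [], 0, 0
    while i < n and j < m:  # traceback
        if close[i][j] and best[i][j] == best[i + 1][j + 1] + 1:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif best[i + 1][j] >= best[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


# Made-up coordinates of two already superposed chains.
ca1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ca2 = [(0.5, 0.0, 0.0), (4.0, 0.0, 0.0), (20.0, 0.0, 0.0), (7.5, 0.0, 0.0)]
print(equivalent_pairs(ca1, ca2))  # [(0, 0), (1, 1), (2, 3)]
```

`math.dist` requires Python 3.8 or later.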
How to combine secondary structure
and amino acid information
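[Figure: amino acid substitution matrices feed the dynamic programming search matrix for MDAGSTVILCFV (secondary structure HHHCCCEEEEEE) against MDAASTILCGS; each cell is scored with a substitution matrix specific to the combination of secondary structure states (H, C, E) of the two residues, or with a default matrix.]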
In terms of scoring…
• So how would you score a profile using this extra information?
– Same way of scoring as before, but you can use secondary-structure-specific substitution scores in various combinations.
• Where does it fit in?
– Very important: structure is generally more conserved than sequence, so secondary structure elements can help anchor the alignments.
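A sketch of this scoring idea: dynamic programming as before, but the substitution score for each cell is looked up in a matrix chosen by the pair of (predicted) secondary structure states, with a default matrix for other combinations. The identity-based "matrices" and their numbers are toy assumptions, not the published secondary-structure-specific tables used by PRALINE, and all names are hypothetical.

```python
# Hypothetical secondary-structure-specific scores: a toy identity-based
# "matrix" per pair of states, NOT real published substitution tables.
def identity_matrix(match, mismatch):
    return lambda a, b: match if a == b else mismatch


SS_MATRICES = {("H", "H"): identity_matrix(3, -1),
               ("E", "E"): identity_matrix(3, -1),
               ("C", "C"): identity_matrix(2, -1)}
DEFAULT = identity_matrix(1, -2)  # used for mixed states, e.g. H against E


def ss_score(a, b, state_a, state_b):
    """Substitution score chosen by the secondary structure states of a and b."""
    return SS_MATRICES.get((state_a, state_b), DEFAULT)(a, b)


def nw_ss(seq1, ss1, seq2, ss2, gap=-2):
    """Global alignment score where each match cell uses ss_score."""
    n, m = len(seq1), len(seq2)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap
    for j in range(1, m + 1):
        f[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(
                f[i - 1][j - 1] + ss_score(seq1[i - 1], seq2[j - 1],
                                           ss1[i - 1], ss2[j - 1]),
                f[i - 1][j] + gap,
                f[i][j - 1] + gap)
    return f[n][m]


print(ss_score("A", "A", "H", "H"), ss_score("A", "A", "H", "E"))  # 3 1
print(nw_ss("MDAGSTVILCFV", "HHHCCCEEEEEE",
            "MDAGSTVILCFV", "HHHCCCEEEEEE"))  # 33
```

Identical residues in matching secondary structure score higher than the same residues in mismatched states, which is how secondary structure can anchor the alignment.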
Workflow: sequences to be aligned → predict secondary structure → align sequences using secondary structure → multiple alignment.
Example predicted secondary structure strings:
HHHHCCEEECCCEEECCHH
CCCCCCEECCCEEEECCHH
HHHCCCCEECCCEEHHH
HHHHHCCEEEECCCEECCC
HHHHHHHHHHHHHCCCEEEE
Using predicted secondary structure
[Figure: multiple alignment of the flavodoxin-cheY set (1fx1, FLAV_DESVH, FLAV_DESGI, FLAV_DESSA, FLAV_DESDE, 2fcr, FLAV_ANASP, FLAV_ECOLI, FLAV_AZOVI, FLAV_ENTAG, 4fxn, FLAV_MEGEL, FLAV_CLOAB, 3chy) built using secondary structure; each aligned sequence is shown together with its secondary structure string (h = helix, e = strand).]
Strategies for multiple sequence alignment
not for exam
• Profile pre-processing
• Secondary structure-induced alignment
• Globalised local alignment
• Matrix extension
Objectives:
• Instead of single amino acid positions, focus on local alignments
• Consider the best local alignment through each cell in the DP matrix
• Try to avoid (early) errors
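Globalised local alignment
not for exam
Double dynamic programming:
1. Local (SW) alignment, using the exchange matrix M and the gap penalties Po,e
2. Global (NW) alignment over the resulting local scores (no M or Po,e)
(M = amino acid exchange matrix; Po, Pe = gap-opening and gap-extension penalties)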
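A minimal sketch of the double dynamic programming idea, under simplifying assumptions: a single linear gap penalty instead of separate Po and Pe, and a hypothetical +2/-1 identity score standing in for BLOSUM62. Step 1 combines a forward and a backward Smith-Waterman pass so that each cell holds the score of the best local alignment in which that residue pair is matched; step 2 runs a Needleman-Wunsch pass over those cell scores with no exchange matrix and no gap penalties.

```python
def simple_score(x, y):
    """Hypothetical stand-in for BLOSUM62: +2 for an identity, -1 otherwise."""
    return 2 if x == y else -1


def sw_prefix(a, b, s, gap):
    """H[i][j] = best local alignment score ending at a[i-1], b[j-1] (0 if none)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0, H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          H[i - 1][j] - gap, H[i][j - 1] - gap)
    return H


def sw_suffix(a, b, s, gap):
    """R[i][j] = best local alignment score starting at a[i], b[j] (0 if none)."""
    n, m = len(a), len(b)
    R = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            R[i][j] = max(0, R[i + 1][j + 1] + s(a[i], b[j]),
                          R[i + 1][j] - gap, R[i][j + 1] - gap)
    return R


def globalised_local(a, b, s=simple_score, gap=2):
    n, m = len(a), len(b)
    H, R = sw_prefix(a, b, s, gap), sw_suffix(a, b, s, gap)
    # Step 1: score of the best local alignment that matches a[i] with b[j].
    T = [[H[i][j] + s(a[i], b[j]) + R[i + 1][j + 1] for j in range(m)]
         for i in range(n)]
    # Step 2: global DP over T with no exchange matrix and no gap penalties.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j - 1] + T[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Traceback, preferring matched pairs over gaps.
    i, j, top, bot = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + T[i - 1][j - 1]:
            top.append(a[i - 1])
            bot.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or D[i][j] == D[i - 1][j]):
            top.append(a[i - 1])
            bot.append("-")
            i -= 1
        else:
            top.append("-")
            bot.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bot)), D[n][m]
```

For two identical sequences every diagonal cell carries the full local score (8 for ACGT), so `globalised_local("ACGT", "ACGT")` returns the identity alignment with score 32.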
Globalised local alignment
not for exam
[Figure: search matrices after step 1 and step 2 for M = BLOSUM62 with Po = 0, Pe = 0; Po = 12, Pe = 1; and Po = 60, Pe = 5]
Strategies for multiple sequence alignment
• Profile pre-processing
• Secondary structure-induced alignment
• Globalised local alignment
• Matrix extension
Objective: try to avoid (early) errors
Integrating alignment methods and alignment information with T-Coffee
• Integrating different pair-wise alignment techniques (NW, SW, ...)
• Combining different multiple alignment methods (consensus multiple alignment)
• Combining sequence alignment methods with structural alignment techniques
• Plugging in user knowledge
Matrix extension
T-Coffee: Tree-based Consistency Objective Function For alignmEnt Evaluation
Cedric Notredame (“Bioinformatics for dummies”), Des Higgins, Jaap Heringa
J. Mol. Biol., 302, 205-217; 2000
Using different sources of alignment information
[Figure: alignments from Clustal, structure alignments, Dialign, Lalign and manual alignment are all combined by T-Coffee]
T-Coffee library system
Seq1   AA1   Seq2   AA2   Weight
3      V31   5      L33   10
3      V31   6      L34   14
5      L33   6      R35   21
5      L33   6      I36   35
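A sketch of how such a library might be assembled from pairwise alignments produced by different sources (for example global and local alignments). It follows the weighting described for T-Coffee as we understand it: each aligned residue pair gets the percent identity of its pairwise alignment as weight, and pairs found by more than one source have their weights added. The function names and the tuple layout are hypothetical, and residues are numbered from 1 as in the library table (e.g. V31).

```python
from collections import defaultdict


def aligned_pairs(row1, row2):
    """Yield 1-based residue numbers that face each other (no gap) in a pairwise alignment."""
    p1 = p2 = 0
    for c1, c2 in zip(row1, row2):
        if c1 != "-":
            p1 += 1
        if c2 != "-":
            p2 += 1
        if c1 != "-" and c2 != "-":
            yield p1, p2


def percent_identity(row1, row2):
    """Percent identity over the ungapped columns of a pairwise alignment."""
    pairs = [(c1, c2) for c1, c2 in zip(row1, row2) if c1 != "-" and c2 != "-"]
    if not pairs:
        return 0
    return round(100 * sum(c1 == c2 for c1, c2 in pairs) / len(pairs))


def build_library(alignments):
    """Merge pairwise alignments from any source into one weighted residue-pair library.

    alignments: iterable of (seq_id1, row1, seq_id2, row2) tuples.
    Returns {(seq_id1, res1, seq_id2, res2): weight}.
    """
    library = defaultdict(int)
    for s1, row1, s2, row2 in alignments:
        weight = percent_identity(row1, row2)
        for r1, r2 in aligned_pairs(row1, row2):
            library[(s1, r1, s2, r2)] += weight  # duplicate pairs accumulate weight
    return dict(library)


lib = build_library([("A", "ACD-E", "B", "AC-FE"),   # e.g. a local alignment, 100% identity
                     ("A", "ACDE", "B", "ACFE")])    # e.g. a global alignment, 75% identity
print(lib[("A", 1, "B", 1)])  # 175: supported by both sources
```

Pairs that several methods agree on end up with the largest weights, which is what lets the later progressive step favour them.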
Matrix extension
[Figure: matrix extension for four sequences, combining the pairwise comparisons 1-2, 1-3, 1-4, 2-3, 2-4 and 3-4]
Search matrix extension – alignment transitivity
[Figure: T-Coffee search matrix extension; the direct alignment of two sequences is extended with their alignments to the other sequences]
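Alignment transitivity can be sketched as a library extension step: the weight of a residue pair A_i/B_j is increased by min(w(A_i, C_k), w(C_k, B_j)) for every residue C_k of a third sequence that is aligned to both. This mirrors the triplet extension described for T-Coffee, but is simplified: only pairs already present in the library are rescored, and the dictionary format keyed by (seq, residue, seq, residue) is a hypothetical choice for this sketch.

```python
from collections import defaultdict


def neighbours(library):
    """Index the library symmetrically: (seq, res) -> {(seq, res): weight}."""
    nb = defaultdict(dict)
    for (s1, r1, s2, r2), w in library.items():
        nb[(s1, r1)][(s2, r2)] = nb[(s1, r1)].get((s2, r2), 0) + w
        nb[(s2, r2)][(s1, r1)] = nb[(s2, r2)].get((s1, r1), 0) + w
    return nb


def extend_library(library):
    """Direct weight plus min(w(A_i, C_k), w(C_k, B_j)) summed over third-sequence residues C_k."""
    nb = neighbours(library)
    extended = {}
    for (s1, r1, s2, r2), direct in library.items():
        score = direct
        for (s3, r3), w_13 in nb[(s1, r1)].items():
            if s3 in (s1, s2):
                continue  # only route through a third sequence
            w_32 = nb[(s3, r3)].get((s2, r2))
            if w_32 is not None:
                score += min(w_13, w_32)
        extended[(s1, r1, s2, r2)] = score
    return extended


lib = {("A", 1, "B", 1): 50, ("A", 1, "C", 1): 40, ("C", 1, "B", 1): 30}
print(extend_library(lib)[("A", 1, "B", 1)])  # 80 = 50 direct + min(40, 30) via C
```

A pair that is only weakly supported directly but consistently supported through other sequences gains weight, which is the sense in which the extended search matrix "looks at" the other sequences before any multiple alignment is built.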
T-COFFEE web-interface
3D-COFFEE
• Computes structure-based alignments
• Structures associated with the sequences are retrieved and the information is used to optimise the MSA
• More accurate … but for many (many) proteins we do not have the structure!
but...
T-COFFEE (V1.23) multiple sequence alignment
Flavodoxin-cheY
[Figure: T-Coffee alignment of 1fx1, FLAV_DESVH, FLAV_DESGI, FLAV_DESSA, FLAV_DESDE, 4fxn, FLAV_MEGEL, FLAV_CLOAB, 2fcr, FLAV_ENTAG, FLAV_ANASP, FLAV_AZOVI, FLAV_ECOLI and 3chy, with a conservation line below the alignment]
Multiple alignment methods
• Multi-dimensional dynamic programming
– extension of pairwise sequence alignment
• Progressive alignment
– incorporates phylogenetic information to guide the alignment process
• Iterative alignment
– corrects for problems with progressive alignment by repeatedly realigning subgroups of sequences
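Multi-dimensional dynamic programming can be illustrated for three sequences: the search matrix becomes a cube, and each cell has seven possible predecessors (every non-empty combination of advancing in the three sequences). This sketch scores columns by sum-of-pairs with a linear gap penalty and a hypothetical +2/-1 identity score; both choices are assumptions made to keep it short.

```python
from itertools import product


def match_score(x, y):
    """Hypothetical exchange score: +2 for an identity, -1 otherwise."""
    return 2 if x == y else -1


def sp_column(col, s=match_score, gap=2):
    """Sum-of-pairs score of one column; residue-gap pairs cost gap, gap-gap pairs 0."""
    total = 0
    for x in range(len(col)):
        for y in range(x + 1, len(col)):
            a, b = col[x], col[y]
            if a == "-" and b == "-":
                continue
            total += -gap if "-" in (a, b) else s(a, b)
    return total


def align3(a, b, c):
    """Exact sum-of-pairs alignment of three sequences by 3-D dynamic programming."""
    seqs = (a, b, c)
    end = (len(a), len(b), len(c))
    moves = [m for m in product((0, 1), repeat=3) if any(m)]  # 7 possible steps
    D, back = {(0, 0, 0): 0}, {}
    for i in range(end[0] + 1):
        for j in range(end[1] + 1):
            for k in range(end[2] + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                best = None
                for m in moves:
                    prev = (i - m[0], j - m[1], k - m[2])
                    if min(prev) < 0:
                        continue
                    # Consume a residue from each sequence that advances, a gap otherwise.
                    col = tuple(seqs[t][prev[t]] if m[t] else "-" for t in range(3))
                    value = D[prev] + sp_column(col)
                    if best is None or value > best[0]:
                        best = (value, prev, col)
                D[(i, j, k)], back[(i, j, k)] = best[0], best[1:]
    cols, pos = [], end
    while pos != (0, 0, 0):
        pos, col = back[pos]
        cols.append(col)
    rows = ["".join(col[t] for col in reversed(cols)) for t in range(3)]
    return rows, D[end]


print(align3("ACG", "AG", "ACG"))  # (['ACG', 'A-G', 'ACG'], 10)
```

Time and memory grow with the product of the sequence lengths, times 2^N - 1 moves per cell for N sequences, which is why exact multi-dimensional dynamic programming is only practical for a handful of short sequences and heuristics such as progressive alignment are used instead.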
Iteration
Iteration can help in cases where one can learn from the data produced in a preceding step, so that the next step can be taken in a ‘more informed’ way.
[Figure: possible outcomes of iteration: convergence, limit cycle, divergence]
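The three outcomes can be told apart by remembering the states already visited. A minimal sketch, assuming each iteration step is a deterministic function of a hashable state (for instance an alignment stored as a tuple of rows): a state that repeats immediately means convergence, a state that reappears after several steps means a limit cycle, and anything else within the iteration budget is reported as not converged, since true divergence cannot be proven by a finite run.

```python
def iterate(step, state, max_iter=50):
    """Apply step() until a state repeats or max_iter is reached.

    Returns (outcome, final_state, iterations_used).
    """
    seen = {state: 0}  # state -> iteration at which it first appeared
    for it in range(1, max_iter + 1):
        state = step(state)
        if state in seen:
            period = it - seen[state]
            return ("converged" if period == 1 else "limit cycle"), state, it
        seen[state] = it
    return "no convergence", state, max_iter


print(iterate(lambda x: x // 2, 10))       # ('converged', 0, 5)
print(iterate(lambda x: (x + 1) % 3, 0))   # ('limit cycle', 0, 3)
```

In an alignment setting, `step` would be one round of realignment; the toy integer functions only demonstrate the bookkeeping.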
Pre-profile alignment
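Alignment consistency
[Figure: a multiple alignment of sequences 1-5 and the pre-profile of each sequence (the sequence followed by the other four); residue Ala131 is compared with the residues it is matched to in the pre-profiles: A131, A131, L133, C126, A131]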
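A minimal sketch of a positional consistency score, under the assumption that a residue is consistent with another sequence when their pairwise alignment places it opposite the same residue (or gap) as the multiple alignment does. Plain pairwise alignments stand in for the pre-profiles here, and the fraction of agreeing sequences is scaled to the 0 to 10 range used in the consistency figures; the exact definition used by PRALINE may differ.

```python
def residue_index(row):
    """For each column of a gapped row, the 0-based residue index (None for a gap)."""
    out, k = [], -1
    for ch in row:
        if ch == "-":
            out.append(None)
        else:
            k += 1
            out.append(k)
    return out


def residue_map(row1, row2):
    """Map each residue of row1 to the residue of row2 it faces (None if a gap)."""
    mapping = {}
    for r1, r2 in zip(residue_index(row1), residue_index(row2)):
        if r1 is not None:
            mapping[r1] = r2
    return mapping


def consistency(msa, pairwise):
    """Consistency (0-10) of every residue in the MSA; None where the row has a gap.

    msa: list of gapped rows of equal length.
    pairwise: {(i, j): (row_i, row_j)} pairwise alignments for every i < j.
    """
    maps = {}
    for (i, j), (row_i, row_j) in pairwise.items():
        maps[(i, j)] = residue_map(row_i, row_j)
        maps[(j, i)] = residue_map(row_j, row_i)
    idx = [residue_index(row) for row in msa]
    n = len(msa)
    scores = []
    for i in range(n):
        row_scores = []
        for col, r in enumerate(idx[i]):
            if r is None:
                row_scores.append(None)
                continue
            # Count sequences whose pairwise alignment agrees with this MSA column.
            agree = sum(maps[(i, j)][r] == idx[j][col] for j in range(n) if j != i)
            row_scores.append(round(10 * agree / (n - 1)))
        scores.append(row_scores)
    return scores


msa = ["AC-G", "ACTG", "A-TG"]
pairwise = {(0, 1): ("AC-G", "ACTG"), (0, 2): ("ACG", "ATG"), (1, 2): ("ACTG", "A-TG")}
print(consistency(msa, pairwise)[0])  # [10, 5, None, 10]
```

The C of the first sequence scores only 5 because its pairwise alignment with the third sequence matches it to T, whereas the multiple alignment puts it opposite a gap.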
Flavodoxin-cheY consistency scores (PRALINE prepro=0)
[Figure: PRALINE alignment of 1fx1, FLAV_DESVH, FLAV_DESDE, FLAV_DESGI, FLAV_DESSA, 4fxn, FLAV_MEGEL, 2fcr, FLAV_ANASP, FLAV_ECOLI, FLAV_AZOVI, FLAV_ENTAG, FLAV_CLOAB and 3chy in which each position shows its consistency score; completely consistently aligned amino acids are shown as the residue itself. Two rows below the alignment give the average consistency ("Avrg Consist") and the conservation of each column.]
Iteration 0: SP = 135136.00, AvSP = 10.473, SId = 3838, AvSId = 0.297
Consistency values are scored from 0 to 10; the value 10 is represented by the corresponding amino acid (red)
Flavodoxin-cheY consistency scores (PRALINE prepro=1500)
[Figure: PRALINE alignment of 1fx1, FLAV_DESVH, FLAV_DESSA, FLAV_DESGI, FLAV_DESDE, 4fxn, FLAV_MEGEL, 2fcr, FLAV_ANASP, FLAV_AZOVI, FLAV_ENTAG, FLAV_ECOLI, FLAV_CLOAB and 3chy in which each position shows its consistency score; completely consistently aligned amino acids are shown as the residue itself. Two rows below the alignment give the average consistency ("Avrg Consist") and the conservation of each column.]
Iteration 0: SP = 136702.00, AvSP = 10.654, SId = 3955, AvSId = 0.308
Consistency values are scored from 0 to 10; the value 10 is represented by the corresponding amino acid (red)
Consistency iteration
[Figure: cycle of pre-profiles, multiple alignment and positional consistency scores]
Pre-profile update iteration
[Figure: cycle of pre-profiles and multiple alignment]
Iterate similarity matrix, guide tree and MSA
[Figure: sequences 1-5 → pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5) → 5×5 similarity matrix → guide tree → multiple alignment, which is fed back to recompute the similarity matrix]
This way of iterating was already implemented in 1984 by Hogeweg and Hesper.
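One pass of this iteration can be sketched as: recompute the distances from the current multiple alignment, then rebuild the guide tree for the next round of progressive alignment. The sketch uses 1 minus fractional identity as the distance and UPGMA clustering, a simpler stand-in for the Neighbour Joining used by ClustalW; the progressive alignment step itself is left out.

```python
def identity_distance(row1, row2):
    """1 - fraction of identical residues over columns where both rows have a residue."""
    pairs = [(x, y) for x, y in zip(row1, row2) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return 1.0 - sum(x == y for x, y in pairs) / len(pairs)


def distance_matrix(msa):
    """Recompute all pairwise distances from the current multiple alignment."""
    return [[identity_distance(r1, r2) for r2 in msa] for r1 in msa]


def upgma(dist, labels):
    """Build a guide tree (nested tuples) by repeatedly joining the closest clusters."""
    clusters = {k: (label, 1) for k, label in enumerate(labels)}  # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair; keys are always (smaller id, larger id)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            # Size-weighted average distance from the new cluster to cluster k.
            d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


msa = ["ACGTAC", "ACGTAA", "TTGTCC", "TTGTCA"]
print(upgma(distance_matrix(msa), ["s0", "s1", "s2", "s3"]))  # (('s0', 's1'), ('s2', 's3'))
```

Feeding the new tree back into progressive alignment and repeating until the tree (or the alignment) stops changing closes the loop shown in the figure.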
Secondary structure-induced alignment
PRALINE
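Using secondary structure for alignment
[Figure: dynamic programming search matrix for MDAGSTVILCFV (secondary structure HHHCCCEEEEEE) against MDAASTILCGS; each cell takes its score from one of several amino acid exchange weights matrices, chosen according to the secondary structure states (H, C, E) of the two residues, with a Default matrix as the alternative]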
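A minimal sketch of a dynamic programming alignment in which the exchange score of each cell depends on the secondary structure of the two residues. The match/mismatch tables, the fallback to a default table when the two states differ and the linear gap penalty are all illustrative assumptions, not PRALINE's actual matrices or rules.

```python
def make_table(match, mismatch):
    """Hypothetical exchange 'matrix': one score for identities, one otherwise."""
    return lambda a, b: match if a == b else mismatch


# Toy tables per secondary structure state; a real method would use full matrices.
TABLES = {"H": make_table(3, -1), "E": make_table(4, -2), "C": make_table(2, -1)}
DEFAULT = make_table(1, -1)


def exchange(a, b, ss_a, ss_b):
    """Pick the exchange table from the secondary structure states of both residues."""
    if ss_a == ss_b and ss_a in TABLES:
        return TABLES[ss_a](a, b)
    return DEFAULT(a, b)


def ss_align(seq1, ss1, seq2, ss2, gap=2):
    """Global (Needleman-Wunsch) alignment with secondary-structure-dependent scores."""
    n, m = len(seq1), len(seq2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = -gap * i
    for j in range(1, m + 1):
        D[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = D[i - 1][j - 1] + exchange(seq1[i - 1], seq2[j - 1], ss1[i - 1], ss2[j - 1])
            D[i][j] = max(diag, D[i - 1][j] - gap, D[i][j - 1] - gap)
    # Traceback from the bottom-right cell.
    i, j, top, bot = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + exchange(
                seq1[i - 1], seq2[j - 1], ss1[i - 1], ss2[j - 1]):
            top.append(seq1[i - 1])
            bot.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] - gap:
            top.append(seq1[i - 1])
            bot.append("-")
            i -= 1
        else:
            top.append("-")
            bot.append(seq2[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bot)), D[n][m]


seq, ss = "MDAGSTVILCFV", "HHHCCCEEEEEE"
print(ss_align(seq, ss, seq, ss))  # ('MDAGSTVILCFV', 'MDAGSTVILCFV', 39)
```

With these toy tables, identities inside strands (4) count more than those in helices (3) or coil (2), and matched residues in different states only get the default score, so alignments that superpose equivalent secondary structure elements are preferred.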
Flavodoxin-cheY multiple alignment/secondary structure iteration
cheY SSEs
[Figure: cheY (3chy) amino acid sequence ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMPNMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM with its PHD secondary structure predictions (H = helix, E = strand) after iterations 0 to 9]
Flavodoxin-cheY multiple alignment/
secondary structure iteration
cheY SSEs
3chy-AA SEQUENCE||
3chy-ITERATION-0||
3chy-ITERATION-1||
3chy-ITERATION-2||
3chy-ITERATION-3||
3chy-ITERATION-4||
3chy-ITERATION-5||
3chy-ITERATION-6||
3chy-ITERATION-7||
3chy-ITERATION-8||
3chy-ITERATION-9||
AA
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
|ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP|
| EEEEEEE
HHHHHHHHHHHHHHHHH
E HHHHHHHHHH HHHEEE
|
| EEEEEEEE
HHHHHHHHHHHHHHH
HHHHHHHH
EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHHH EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHH EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHH
EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHHHH
EEEEE
|
3chy-AA SEQUENCE||
3chy-ITERATION-0||
3chy-ITERATION-1||
3chy-ITERATION-2||
3chy-ITERATION-3||
3chy-ITERATION-4||
3chy-ITERATION-5||
3chy-ITERATION-6||
3chy-ITERATION-7||
3chy-ITERATION-8||
3chy-ITERATION-9||
AA
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
|NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM|
|
HHHHHHEEEEEE
HHHHHHHHHHHHHHHHH
HHHHHHHHHHHHHH
|
|
HHHHHHEEEEEE
HHH HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHEEEEEE
HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
| HHHHHHHHHHHH
HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHH
EEEEE HHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHH
EEEE
HHHHHHHHHHHHHH
|
Flavodoxin-cheY multiple alignment/
secondary structure iteration
cheY SSEs
3chy-AA SEQUENCE||
3chy-ITERATION-0||
3chy-ITERATION-1||
3chy-ITERATION-2||
3chy-ITERATION-3||
3chy-ITERATION-4||
3chy-ITERATION-5||
3chy-ITERATION-6||
3chy-ITERATION-7||
3chy-ITERATION-8||
3chy-ITERATION-9||
AA
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
|ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP|
| EEEEEEE
HHHHHHHHHHHHHHHHH
E HHHHHHHHHH HHHEEE
|
| EEEEEEEE
HHHHHHHHHHHHHHH
HHHHHHHH
EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHHH EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHH EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHH
EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHHHH
EEEEE
|
3chy-AA SEQUENCE||
3chy-ITERATION-0||
3chy-ITERATION-1||
3chy-ITERATION-2||
3chy-ITERATION-3||
3chy-ITERATION-4||
3chy-ITERATION-5||
3chy-ITERATION-6||
3chy-ITERATION-7||
3chy-ITERATION-8||
3chy-ITERATION-9||
AA
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
|NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM|
|
HHHHHHEEEEEE
HHHHHHHHHHHHHHHHH
HHHHHHHHHHHHHH
|
|
HHHHHHEEEEEE
HHH HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHEEEEEE
HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
| HHHHHHHHHHHH
HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHH
EEEEE HHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHH
EEEE
HHHHHHHHHHHHHH
|
Flavodoxin-cheY multiple alignment/
secondary structure iteration
cheY SSEs
3chy-AA SEQUENCE||
3chy-ITERATION-0||
3chy-ITERATION-1||
3chy-ITERATION-2||
3chy-ITERATION-3||
3chy-ITERATION-4||
3chy-ITERATION-5||
3chy-ITERATION-6||
3chy-ITERATION-7||
3chy-ITERATION-8||
3chy-ITERATION-9||
AA
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
|ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP|
| EEEEEEE
HHHHHHHHHHHHHHHHH
E HHHHHHHHHH HHHEEE
|
| EEEEEEEE
HHHHHHHHHHHHHHH
HHHHHHHH
EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHHH EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHH EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHH
EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHHHH
EEEEE
|
3chy-AA SEQUENCE||
3chy-ITERATION-0||
3chy-ITERATION-1||
3chy-ITERATION-2||
3chy-ITERATION-3||
3chy-ITERATION-4||
3chy-ITERATION-5||
3chy-ITERATION-6||
3chy-ITERATION-7||
3chy-ITERATION-8||
3chy-ITERATION-9||
AA
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
|NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM|
|
HHHHHHEEEEEE
HHHHHHHHHHHHHHHHH
HHHHHHHHHHHHHH
|
|
HHHHHHEEEEEE
HHH HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHEEEEEE
HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
| HHHHHHHHHHHH
HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHH
EEEEE HHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHH
EEEE
HHHHHHHHHHHHHH
|
Flavodoxin-cheY multiple alignment/
secondary structure iteration
cheY SSEs
3chy-AA SEQUENCE||
3chy-ITERATION-0||
3chy-ITERATION-1||
3chy-ITERATION-2||
3chy-ITERATION-3||
3chy-ITERATION-4||
3chy-ITERATION-5||
3chy-ITERATION-6||
3chy-ITERATION-7||
3chy-ITERATION-8||
3chy-ITERATION-9||
AA
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
|ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP|
| EEEEEEE
HHHHHHHHHHHHHHHHH
E HHHHHHHHHH HHHEEE
|
| EEEEEEEE
HHHHHHHHHHHHHHH
HHHHHHHH
EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHHH EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHH EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHH
EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHHHH
EEEEE
|
3chy-AA SEQUENCE||
3chy-ITERATION-0||
3chy-ITERATION-1||
3chy-ITERATION-2||
3chy-ITERATION-3||
3chy-ITERATION-4||
3chy-ITERATION-5||
3chy-ITERATION-6||
3chy-ITERATION-7||
3chy-ITERATION-8||
3chy-ITERATION-9||
AA
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
|NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM|
|
HHHHHHEEEEEE
HHHHHHHHHHHHHHHHH
HHHHHHHHHHHHHH
|
|
HHHHHHEEEEEE
HHH HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHEEEEEE
HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
| HHHHHHHHHHHH
HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHH
EEEEE HHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHH
EEEE
HHHHHHHHHHHHHH
|
Flavodoxin-cheY multiple alignment/
secondary structure iteration
cheY SSEs
3chy-AA SEQUENCE||
3chy-ITERATION-0||
3chy-ITERATION-1||
3chy-ITERATION-2||
3chy-ITERATION-3||
3chy-ITERATION-4||
3chy-ITERATION-5||
3chy-ITERATION-6||
3chy-ITERATION-7||
3chy-ITERATION-8||
3chy-ITERATION-9||
AA
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
|ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP|
| EEEEEEE
HHHHHHHHHHHHHHHHH
E HHHHHHHHHH HHHEEE
|
| EEEEEEEE
HHHHHHHHHHHHHHH
HHHHHHHH
EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHHH EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHH EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHH
EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHHHH
EEEEE
|
3chy-AA SEQUENCE||
3chy-ITERATION-0||
3chy-ITERATION-1||
3chy-ITERATION-2||
3chy-ITERATION-3||
3chy-ITERATION-4||
3chy-ITERATION-5||
3chy-ITERATION-6||
3chy-ITERATION-7||
3chy-ITERATION-8||
3chy-ITERATION-9||
AA
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
|NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM|
|
HHHHHHEEEEEE
HHHHHHHHHHHHHHHHH
HHHHHHHHHHHHHH
|
|
HHHHHHEEEEEE
HHH HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHEEEEEE
HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
| HHHHHHHHHHHH
HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHH
EEEEE HHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHH
EEEE
HHHHHHHHHHHHHH
|
Flavodoxin-cheY multiple alignment/
secondary structure iteration
cheY SSEs
3chy-AA SEQUENCE||
3chy-ITERATION-0||
3chy-ITERATION-1||
3chy-ITERATION-2||
3chy-ITERATION-3||
3chy-ITERATION-4||
3chy-ITERATION-5||
3chy-ITERATION-6||
3chy-ITERATION-7||
3chy-ITERATION-8||
3chy-ITERATION-9||
AA
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
|ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP|
| EEEEEEE
HHHHHHHHHHHHHHHHH
E HHHHHHHHHH HHHEEE
|
| EEEEEEEE
HHHHHHHHHHHHHHH
HHHHHHHH
EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHHH EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHH EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHH
EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHHHH
EEEEE
|
3chy-AA SEQUENCE||
3chy-ITERATION-0||
3chy-ITERATION-1||
3chy-ITERATION-2||
3chy-ITERATION-3||
3chy-ITERATION-4||
3chy-ITERATION-5||
3chy-ITERATION-6||
3chy-ITERATION-7||
3chy-ITERATION-8||
3chy-ITERATION-9||
AA
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
|NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM|
|
HHHHHHEEEEEE
HHHHHHHHHHHHHHHHH
HHHHHHHHHHHHHH
|
|
HHHHHHEEEEEE
HHH HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHEEEEEE
HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
| HHHHHHHHHHHH
HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHH
EEEEE HHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHH
EEEE
HHHHHHHHHHHHHH
|
Flavodoxin-cheY multiple alignment/
secondary structure iteration
cheY SSEs
3chy-AA SEQUENCE||
3chy-ITERATION-0||
3chy-ITERATION-1||
3chy-ITERATION-2||
3chy-ITERATION-3||
3chy-ITERATION-4||
3chy-ITERATION-5||
3chy-ITERATION-6||
3chy-ITERATION-7||
3chy-ITERATION-8||
3chy-ITERATION-9||
AA
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
|ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP|
| EEEEEEE
HHHHHHHHHHHHHHHHH
E HHHHHHHHHH HHHEEE
|
| EEEEEEEE
HHHHHHHHHHHHHHH
HHHHHHHH
EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHHH EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHH EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
EEE
HHHHHH
EEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHH
EEEEEE
|
| EEEEEEEE
HHHHHHHHHHHHHH
HHHHHHHHHH
EEEEE
|
3chy-AA SEQUENCE||
3chy-ITERATION-0||
3chy-ITERATION-1||
3chy-ITERATION-2||
3chy-ITERATION-3||
3chy-ITERATION-4||
3chy-ITERATION-5||
3chy-ITERATION-6||
3chy-ITERATION-7||
3chy-ITERATION-8||
3chy-ITERATION-9||
AA
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
PHD
|NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM|
|
HHHHHHEEEEEE
HHHHHHHHHHHHHHHHH
HHHHHHHHHHHHHH
|
|
HHHHHHEEEEEE
HHH HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHEEEEEE
HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
| HHHHHHHHHHHH
HHHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHH
EEEEE HHHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHHH
EEE
HHHHHHHHHHHHHH
|
|
HHHHHHHH
EEEEE
HHHHHHHHHHHHHHH
EEEE
HHHHHHHHHHHHHH
|
Flavodoxin-cheY multiple alignment/secondary structure iteration
[Figure: cheY SSEs. The cheY (3chy) amino-acid sequence (AA) ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMPNMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM, with the PHD secondary-structure predictions (H = helix, E = strand) obtained at iterations 0 to 9 shown beneath it.]
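The secondary structure iteration on this slide repeats two steps, prediction and alignment, until the predicted SSEs stop changing. The sketch below outlines that loop under stated assumptions: `predict` is a hypothetical stand-in for a PHD-style predictor (called without an alignment in round 0 and with the current alignment afterwards), and `align` is a hypothetical stand-in for an aligner that uses the predicted SSEs. Neither is PRALINE's actual interface; the default cap of 10 rounds only mirrors iterations 0 to 9 in the figure.

```python
def iterate_ss_alignment(seqs, predict, align, max_iter=10):
    """Hypothetical sketch of secondary structure-driven iteration.

    predict(seqs, alignment) -> dict name -> SS string ('H', 'E', other = coil);
        alignment is None in round 0 (no alignment exists yet).
    align(seqs, ss) -> an alignment built using the current SS assignments.
    Stops when a round's predictions equal the previous round's, or after
    at most max_iter prediction rounds (iterations 0 .. max_iter - 1).
    Returns the final alignment and the list of accepted SS assignments.
    """
    ss = predict(seqs, None)                 # iteration 0
    alignment = align(seqs, ss)
    history = [ss]
    for _ in range(max_iter - 1):
        new_ss = predict(seqs, alignment)    # re-predict using the current alignment
        if new_ss == ss:                     # converged: assignments unchanged
            break
        ss = new_ss
        history.append(ss)
        alignment = align(seqs, ss)          # realign with the updated SSEs
    return alignment, history
```

Passing the predictor and aligner in as callables keeps the sketch independent of any particular tool; `history` records one SS assignment per accepted iteration, like the rows of the figure.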
Is the initial SS prediction good enough?
[Figure: the cheY (3chy) sequence with its PHD secondary-structure predictions (H = helix, E = strand) from iteration 0 to iteration 9 stacked beneath it, comparing the initial prediction with those from later iterations.]
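One way to make this question quantitative is to count, residue by residue, how often two three-state assignments agree, for example the iteration-0 prediction against a later iteration. The sketch below is a minimal version assuming equal-length strings in which 'H' is helix, 'E' is strand and any other character is treated as coil; `ss_agreement` is a hypothetical helper name. When the second string is a reference assignment derived from the structure (reduced to three states), the result is the Q3 accuracy expressed as a fraction.

```python
def ss_agreement(a: str, b: str) -> float:
    """Fraction of residues assigned the same three-state SS class in a and b.

    'H' = helix, 'E' = strand; any other character (space, '-', 'C', 'L')
    is treated as coil. Against a reference assignment this is Q3 as a fraction.
    """
    if len(a) != len(b):
        raise ValueError("SS strings must cover the same residues")
    if not a:
        return 1.0

    def three_state(c: str) -> str:
        # Collapse everything that is not helix or strand to coil
        return c if c in ("H", "E") else "C"

    same = sum(three_state(x) == three_state(y) for x, y in zip(a, b))
    return same / len(a)
```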
MUSCLE
Edgar 2004
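PRALINE and MUSCLE method
• PRALINE and MUSCLE use almost the same formalism to compare two profiles (scoring a column x of one profile against a column y of the other):
• MUSCLE:
$$LE_{xy} = (1 - f_x^G)(1 - f_y^G)\,\log \sum_i \sum_j f_i^x f_j^y \frac{p_{ij}}{p_i p_j}$$
• PRALINE:
$$\mathrm{score}_{xy} = (1 - f_x^G)(1 - f_y^G) \sum_i \sum_j f_i^x f_j^y \log \frac{p_{ij}}{p_i p_j}$$
where $f_x^G$ is the fraction of gaps in column x, $f_i^x$ the frequency of residue type i in column x, $p_{ij}$ the probability of residues i and j being aligned in related proteins, and $p_i$ the background frequency of residue i.
The only difference is the position of the log: MUSCLE takes the log of the frequency-weighted odds ratio, whereas PRALINE takes the frequency-weighted sum of log-odds scores. Edgar calls the MUSCLE scoring scheme the “log-expectation” (LE) score.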
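The effect of moving the log can be checked numerically. The sketch below is a minimal illustration, assuming profile columns are given as residue-frequency dictionaries that sum to 1 over the non-gap residues, and using a toy two-letter alphabet with made-up values for $p_{ij}$ and $p_i$; `column_scores` is a hypothetical name, not code from MUSCLE or PRALINE.

```python
import math


def column_scores(fx, fy, gap_x, gap_y, p_pair, p_bg):
    """Score profile column x against column y with both formulas.

    fx, fy:       residue -> frequency in the column (f_i^x, f_j^y), assumed
                  here to sum to 1 over the non-gap residues
    gap_x, gap_y: gap fractions f_x^G and f_y^G
    p_pair:       (i, j) -> joint probability p_ij of i aligned with j
    p_bg:         residue -> background probability p_i
    Returns (LE score as in MUSCLE, PRALINE-style score).
    """
    occupancy = (1.0 - gap_x) * (1.0 - gap_y)
    expected_odds = 0.0       # sum_ij f_i^x f_j^y p_ij / (p_i p_j)
    expected_log_odds = 0.0   # sum_ij f_i^x f_j^y log(p_ij / (p_i p_j))
    for i, f_i in fx.items():
        for j, f_j in fy.items():
            odds = p_pair[(i, j)] / (p_bg[i] * p_bg[j])
            expected_odds += f_i * f_j * odds
            expected_log_odds += f_i * f_j * math.log(odds)
    le = occupancy * math.log(expected_odds)   # log outside the sums
    praline = occupancy * expected_log_odds    # log inside the sums
    return le, praline


# Toy two-letter alphabet (made-up numbers, not a real substitution matrix)
P_BG = {"A": 0.5, "B": 0.5}
P_PAIR = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
mixed = {"A": 0.5, "B": 0.5}
print(column_scores(mixed, mixed, 0.0, 0.0, P_PAIR, P_BG))  # LE ≈ 0.0, PRALINE ≈ -0.223
```

Because the log is concave, the log of a weighted average is never smaller than the weighted average of the logs (Jensen's inequality), so with frequencies summing to 1 the LE score is never below the PRALINE-style score. The two coincide, for example, when each column contains a single residue type; they diverge most for mixed columns such as the one above.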
So what do we do?
• For a good alignment in a single run, without parameter tuning: MUSCLE, T-Coffee, PROBCONS (maybe POA)
• If you want to experiment with making alignments for a given sequence set: PRALINE
– Profile pre-processing
– Iteration
– Secondary structure-induced alignment
– Globalised local alignment
• There is no single method that always generates the best alignment
• Therefore it is best to use more than one method, e.g. include DIALIGN2 (local)
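Since no single method always wins, a practical follow-up to running several programs (for example MUSCLE, T-Coffee, PROBCONS and DIALIGN2) is to measure how far their alignments agree. The sketch below is one minimal way to do that, assuming both alignments contain the same sequences in the same row order with '-' as the gap character: it collects the residue pairs placed in the same column and reports the fraction of one alignment's pairs reproduced by the other, in the spirit of the sum-of-pairs score used by alignment benchmarks. `aligned_pairs` and `pair_agreement` are hypothetical names.

```python
from itertools import combinations


def aligned_pairs(msa):
    """Return the set of residue pairs (seq_a, res_a, seq_b, res_b) sharing a column.

    msa: list of equal-length gapped strings, '-' marking gaps; res_a and
    res_b are 0-based residue indices in the ungapped sequences.
    """
    if len({len(row) for row in msa}) > 1:
        raise ValueError("all rows must have the same length")
    next_res = [0] * len(msa)            # next residue index in each sequence
    pairs = set()
    for col in range(len(msa[0]) if msa else 0):
        present = []
        for s, row in enumerate(msa):
            if row[col] != "-":
                present.append((s, next_res[s]))
                next_res[s] += 1
        # Every pair of residues in this column counts as aligned
        for (a, i), (b, j) in combinations(present, 2):
            pairs.add((a, i, b, j))
    return pairs


def pair_agreement(test, reference):
    """Fraction of the reference's aligned residue pairs reproduced by test."""
    ref = aligned_pairs(reference)
    if not ref:
        return 1.0
    return len(aligned_pairs(test) & ref) / len(ref)
```

The measure is asymmetric: `pair_agreement(a, b)` divides by the number of pairs in `b`, so swapping the arguments can give a different value.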
Recap
• Weighting schemes to use information from all sequences right from the start of the progressive MSA protocol:
– Profile pre-processing (global/local) (PRALINE)
– Matrix extension (well-balanced scheme) (T-Coffee)
• Smoothing alignment signals:
– Globalised local alignment (PRALINE)
– Consistency-based mixing of local and global alignment (T-Coffee)
• Using additional information:
– Secondary structure-driven alignment (PRALINE)
• Iterative schemes to alleviate the ‘greediness’ of the progressive MSA protocol:
– Profile pre-processing iteration (PRALINE)
– Secondary structure-driven iteration (PRALINE)
– ‘Classical’ distance matrix iteration
– Binary cutting of the guide tree and realignment of the two resulting groups (MUSCLE)
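Of the recap items, T-Coffee's matrix (library) extension is the easiest to show in a few lines. The sketch below assumes a primary library of residue pairs from different sequences, each with a weight (in the original T-Coffee description, derived from the percent identity of the pairwise alignment the pair came from). For every intermediate residue z in a third sequence, it adds the smaller of the weights x-z and z-y to the direct weight of x-y. `extend_library` and its dictionary layout are hypothetical, not T-Coffee's code or library file format.

```python
from collections import defaultdict


def extend_library(primary):
    """T-Coffee-style library extension (sketch).

    primary: dict mapping ((seq, pos), (seq, pos)) residue pairs from
    different sequences to a weight; list each unordered pair once.
    Returns the extended weight of every residue pair supported directly
    or through one intermediate residue z in a third sequence:
        W(x, y) = w(x, y) + sum over z of min(w(x, z), w(z, y))
    """
    nbrs = defaultdict(dict)             # residue -> {partner residue: weight}
    for (x, y), w in primary.items():
        nbrs[x][y] = w
        nbrs[y][x] = w

    extended = {}
    residues = sorted(nbrs)
    for x in residues:
        for y in residues:
            if x >= y or x[0] == y[0]:   # each unordered pair once, different sequences only
                continue
            score = nbrs[x].get(y, 0.0)  # direct (primary) support
            for z, w_xz in nbrs[x].items():
                if z[0] in (x[0], y[0]): # intermediate must come from a third sequence
                    continue
                w_zy = nbrs[z].get(y)
                if w_zy is not None:     # x-z and z-y are both in the library
                    score += min(w_xz, w_zy)
            if score > 0:
                extended[(x, y)] = score
    return extended
```

A pair that no pairwise alignment proposed can still gain weight when it is consistent through a third sequence, and pairs supported by many sequences are reinforced; the extended weights then play the role of a position-specific substitution matrix during progressive alignment.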
References
• Heringa, J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput. Chem. 23, 341-364.
• Notredame, C., Higgins, D.G. and Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205-217.
• Heringa, J. (2002) Local weighting schemes for protein multiple sequence alignment. Comput. Chem. 26(5), 459-477.
• Simossis, V.A., Kleinjung, J. and Heringa, J. (2005) Homology-extended sequence alignment. Nucleic Acids Res. 33(3), 816-824.