Scoring multiple sequence alignments


Multiple Sequence Alignment
Julie Thompson
Laboratory of Integrative Bioinformatics and Genomics
IGBMC, Strasbourg, France
Multiple Sequence Alignment

Introduction: what is a multiple alignment?
Multiple alignment construction
- Traditional approaches: optimal, progressive
- Alignment parameters
- Iterative and co-operative approaches
Multiple alignment analysis
- Quality analysis/error detection
- Conserved/homologous regions
Multiple alignment applications
Julie Thompson – IGBMC
What is a multiple alignment?

a representation of a set of sequences, where equivalent residues (e.g.
functional, structural) are aligned in rows or more usually columns
Example: part of an alignment of SH2 domains from 14 sequences
(lnk_rat, crk1_mouse, nck_human, ht16_hydat, pip5_human, fer_human, 1ab2, 1mil, 1blj, 1shd, 1lkkA, 1csy, 1bfi, 1gri)
[Figure: alignment rows not reproduced]
* conserved identical residues
: conserved similar residues
What is a multiple alignment?
[Figure: alignment annotated with conserved residues, conservation profile and secondary structure]
Multiple Alignment Construction

Optimal multiple alignment
example : MSA (Lipman et al. 1989, Gupta et al. 1995)
Optimal multiple alignment
Extension of dynamic programming for 2 sequences => N dimensions
Example : alignment of 3 sequences
Problem : calculation time and memory requirements
Time proportional to N^k for k sequences of length N => limited to less than 10 sequences
Alignment of 5 sulfate binding proteins, length 224-263 residues:
Program    Time
MSA        >12 hours
OMA        62.9 min
ClustalW   0.6 sec
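To make the N^k growth concrete, here is a minimal sketch that counts the cells of the full dynamic programming lattice. It assumes all k sequences have the same length N and that each cell is filled from its 2^k - 1 neighbouring predecessor cells; the function name is illustrative and not taken from MSA or OMA.

```python
def dp_lattice_cost(seq_len: int, n_seqs: int) -> tuple[int, int]:
    """Return (number of lattice cells, predecessors examined per cell)."""
    cells = (seq_len + 1) ** n_seqs      # one axis of length N+1 per sequence
    predecessors = 2 ** n_seqs - 1       # every non-empty subset of sequences can advance
    return cells, predecessors


for k in (2, 3, 5, 10):
    cells, preds = dp_lattice_cost(250, k)
    print(f"k={k:2d}  cells={cells:.3e}  predecessors per cell={preds}")
```

Real optimal aligners prune this lattice, so the numbers are an upper bound on the work rather than a prediction of run time.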
Multiple Alignment Construction

Optimal multiple alignment
MSA, OMA

Progressive multiple alignment
ClustalW (Thompson et al. NAR. 1994)
ClustalX (Thompson et al. NAR. 1997)
Progressive multiple alignment
Idea :
Progressively align pairs of sequences (or groups of sequences)
Problem :
Start with which sequences ? How to decide order of alignment ?
=> first align the most closely related sequences
How to measure the similarity of the sequences ?
=> align all the sequences pairwise
=> calculate the similarity between each pair from the alignment
Progressive multiple alignment
1) Pairwise alignments of all sequences
The alignment can be obtained by :
- local or global method
- dynamic programming or heuristic method (eg. K-tuple count)
Ex : local pairwise alignments of globin sequences

Hbb_human  3 LTPEEKSAVTALWGKV..NVDEVGGEALGRLLVVYPWTQRFFESFGDLST ...
Hba_human  2 LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF.DLS. ...

Hbb_human  1 VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLST ...
Hbb_horse  1 VQLSGEEKAAVLALWDKVNEEEVGGEALGRLLVVYPWTQRFFDSFGDLSN ...

Hba_human  2 LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF.DLSH ...
Hbb_horse  3 LSGEEKAAVLALWDKVNEE..EVGGEALGRLLVVYPWTQRFFDSFGDLSN ...

[Match lines between each pair of rows not reproduced]
Progressive multiple alignment
2) Construction of a distance matrix
Example in ClustalW/X :
distance between 2 sequences = 1 - (No. identical residues / No. aligned residues)
Ex : 7 globin sequences

                 1    2    3    4    5    6    7
1 Hbb_human      -
2 Hbb_horse     .17   -
3 Hba_human     .59  .60   -
4 Hba_horse     .59  .59  .13   -
5 Myg_phyca     .77  .77  .75  .75   -
6 Glb5_petma    .81  .82  .73  .74  .80   -
7 Lgb2_lupla    .87  .86  .86  .88  .93  .90   -
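The distance formula can be sketched as below. This is an illustration, not ClustalW's code: the function name is hypothetical, and it assumes that columns where either sequence has a gap do not count as aligned residues. The two rows used in the example are the short globin fragment that appears on the gap-penalty slide.

```python
def identity_distance(row_a: str, row_b: str, gap_chars: str = "-.") -> float:
    """Distance = 1 - identical / aligned, for two rows of one pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = 0
    aligned = 0
    for a, b in zip(row_a, row_b):
        if a in gap_chars or b in gap_chars:
            continue                     # assumption: gapped columns are ignored
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 1.0                       # nothing aligned: treat as maximally distant
    return 1.0 - identical / aligned


d = identity_distance("SFGDLSNPGAVMG", "HF-DLS-----HG")
print(f"distance = {d:.3f}")
```

Here 7 columns are gap-free and 5 of them are identical, so the distance is 1 - 5/7.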
Progressive multiple alignment
3) Decide order of alignment
• Sequential branching
• Construction of a ‘guide tree’
- Neighbor-Joining (NJ)
- UPGMA
- Maximum likelihood
Progressive alignment using sequential branching
[Figure: sequences listed in the order Hba_human, Hba_horse, Hbb_horse, Hbb_human, Glb5_petma, Myg_phyca, Lgb2_lupla]
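Progressive alignment following a guide tree
[Figure: guide tree for Hbb_human, Hbb_horse, Hba_human, Hba_horse, Myg_phyca, Glb5_petma and Lgb2_lupla, with nodes numbered 1 to 6. Branch lengths shown: .081, .084, .226, .055, .065, .219, .061, .015, .062, and .398 (Myg_phyca), .389 (Glb5_petma), .442 (Lgb2_lupla)]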
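As an illustration of how a distance matrix fixes the order of alignment, the sketch below runs UPGMA (one of the guide tree methods listed above) on the 7-globin distances. It is a minimal sketch: ClustalW itself builds its guide tree with Neighbor-Joining, and the function name and data layout here are assumptions made for the example.

```python
from itertools import combinations

NAMES = ["Hbb_human", "Hbb_horse", "Hba_human", "Hba_horse",
         "Myg_phyca", "Glb5_petma", "Lgb2_lupla"]

# Distances from the 7-globin example; row i holds d(i, i+1), d(i, i+2), ...
UPPER = [
    [0.17, 0.59, 0.59, 0.77, 0.81, 0.87],
    [0.60, 0.59, 0.77, 0.82, 0.86],
    [0.13, 0.75, 0.73, 0.86],
    [0.75, 0.74, 0.88],
    [0.80, 0.93],
    [0.90],
]


def upgma_merge_order(names, upper):
    """Return the (cluster_a, cluster_b, distance) merges made by UPGMA, in order."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (label, size)
    dist = {}
    for i, row in enumerate(upper):
        for offset, d in enumerate(row):
            dist[frozenset((i, i + 1 + offset))] = d

    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair among the current clusters.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: dist[frozenset(pair)])
        label_a, size_a = clusters.pop(a)
        label_b, size_b = clusters.pop(b)
        merges.append((label_a, label_b, dist[frozenset((a, b))]))
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            dist[frozenset((next_id, c))] = (
                size_a * dist[frozenset((a, c))] + size_b * dist[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[next_id] = (f"({label_a},{label_b})", size_a + size_b)
        next_id += 1
    return merges


for left, right, d in upgma_merge_order(NAMES, UPPER):
    print(f"{d:.3f}  {left} + {right}")
```

The two alpha chains are joined first and the two beta chains second, so these closely related pairs would be aligned before the more divergent globins are added.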
Progressive multiple alignment
4) Progressive multiple alignment
The sequences are aligned progressively (global or local algorithm) :
- alignment of 2 sequences
- alignment of 1 sequence and a profile (group of sequences)
- alignment of 2 profiles (groups of sequences)
Progressive multiple alignment
[Block 1: helices H1 to H4]
HBB_HUMAN   --------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDN
HBB_HORSE   --------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSNPGAVMGNPKVKAHGKKVLHSFGEGVHHLDN
HBA_HUMAN   ---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-----HGSAQVKGHGKKVADALTNAVAHVDD
HBA_HORSE   ---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-----HGSAQVKAHGKKVGDALTLAVGHLDD
MYG_PHYCA   ---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGH
GLB5_PETMA  PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTTADQLKKSADVRWHAERIINAVNDAVASMDD
LGB2_LUPLU  --------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSEVP--QNNPELQAHAGKVFKLVYEAAIQLQV

[Block 2: helices H5 to H7]
HBB_HUMAN   -----LKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH-----
HBB_HORSE   -----LKGTFAALSELHCDKLHVDPENFRLLGNVLVVVLARHFGKDFTPELQASYQKVVAGVANALAHKYH-----
HBA_HUMAN   -----MPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR-----
HBA_HORSE   -----LPGALSNLSDLHAHKLRVDPVNFKLLSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR-----
MYG_PHYCA   -----HEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
GLB5_PETMA  T--EKMSMKLRDLSGKHAKSFQVDPQYFKVLAAVIADTVAAG---------DAGFEKLMSMICILLRSAY------
LGB2_LUPLU  TGVVVTDATLKNLGSVHVSKG-VADAHFPVVKEAILKTIKEVVGAKWSEELNSAWTIAYDELAIVIKKEMNDAA--

[Conservation symbols (*, :, .) beneath each block not reproduced]
Progressive multiple alignment
Guide tree   Global              Local
SB           multal              SBpima
UPGMA        multalign, pileup
NJ           clustalx
ML                               MLpima
SB - sequential branching
UPGMA - Unweighted Pair Grouping Method
NJ - neighbor-joining
ML - maximum likelihood
Alignment parameters : similarity matrices
Dynamic programming methods score an alignment using residue similarity
matrices, containing a score for matching all pairs of residues
For nucleotide sequences:
     A   C   G   T
A    2  -2  -1  -2
C   -2   2  -2  -1
G   -1  -2   2  -2
T   -2  -1  -2   2
Transitions (A-G or C-T) are more frequent than transversions (e.g. A-T or C-G), so they are penalised less (-1 rather than -2)
More complex matrices exist where matches between ambiguous nucleotides are given values whenever there is any overlap in the sets of nucleotides represented
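A minimal sketch of how such a matrix is used to score aligned residues, with the nucleotide values from the slide. The function name is illustrative, and gaps are deliberately left out because they are handled by separate gap penalties.

```python
# Nucleotide similarity matrix from the slide: match 2, transition -1, transversion -2.
NUC_MATRIX = {
    "A": {"A": 2, "C": -2, "G": -1, "T": -2},
    "C": {"A": -2, "C": 2, "G": -2, "T": -1},
    "G": {"A": -1, "C": -2, "G": 2, "T": -2},
    "T": {"A": -2, "C": -1, "G": -2, "T": 2},
}


def score_ungapped(seq_a: str, seq_b: str) -> int:
    """Sum the matrix scores over the columns of an ungapped pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(NUC_MATRIX[a][b] for a, b in zip(seq_a, seq_b))


print(score_ungapped("ACGT", "ACGT"))   # four matches
print(score_ungapped("ACGT", "GCAT"))   # two matches, two transitions
```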
Alignment parameters : similarity matrices
For proteins, a wide variety of matrices exist:
Identity, PAM, Blosum, Gonnet etc.
Matrices are generally constructed by observing the mutations
in large sets of alignments, either sequence-based or
structure-based
Matrices range from strict ones for comparing closely related
sequences to soft ones for very divergent sequences.
e.g. PAM250 corresponds to an evolutionary distance of 250%, or
approximately 80% residue divergence
PAM1 corresponds to less than 1% divergence
Alignment parameters : similarity matrices
A single best matrix does not exist!
• Altschul, 1991 suggests PAM250 for related sequences, PAM120 when the sequences are not known to be related and PAM40 to search for short segments of highly similar sequences.
• Henikoff, Henikoff, 1993 suggest Blosum62 as a good all-round matrix, Blosum45 for more divergent sequences and Blosum100 for strongly related sequences
• ClustalW automatically selects a suitable matrix depending on the observed pairwise % identity. By default:
ID > 35%          Gonnet 80
35% > ID > 25%    Gonnet 250
ID < 25%          Gonnet 350
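The default selection rule can be written as a small lookup. This is a sketch of the thresholds as stated on the slide, not ClustalW's source code; the function name is hypothetical, and the treatment of identities of exactly 35% or 25% is an assumption because the slide does not say which side they fall on.

```python
def select_gonnet_matrix(percent_identity: float) -> str:
    """Pick a Gonnet series matrix from the observed pairwise % identity."""
    if percent_identity > 35:
        return "Gonnet 80"      # closely related sequences: strict matrix
    if percent_identity > 25:
        return "Gonnet 250"
    return "Gonnet 350"         # very divergent sequences: soft matrix (boundary case assumed)


for pid in (60, 30, 10):
    print(pid, select_gonnet_matrix(pid))
```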
Alignment parameters : gap penalties
A gap penalty is a cost for introducing gaps into the alignment,
corresponding to insertions or deletions in the sequences
SFGDLSNPGAVMG
HF-DLS-----HG
Proportional gap costs charge a fixed penalty for each residue aligned with a gap - the cost of a gap is proportional to its length:
GAP_COST = uk, where k is the length of the gap
Linear or ‘affine’ gap costs define a cost for introducing or ‘opening’ a gap, plus a length-dependent ‘extension’ cost:
GAP_COST = v + uk, where v is the gap opening cost and u is the gap extension cost
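The two gap cost schemes can be compared on the gapped row of the example above. This is a minimal sketch: the penalty values v=10 and u=1 are arbitrary illustrative numbers, not program defaults, and the helper names are hypothetical.

```python
from itertools import groupby


def gap_lengths(row: str, gap_char: str = "-") -> list[int]:
    """Lengths of the consecutive gap runs in one alignment row."""
    return [len(list(run)) for ch, run in groupby(row) if ch == gap_char]


def proportional_gap_cost(k: int, u: float) -> float:
    """GAP_COST = u*k."""
    return u * k


def affine_gap_cost(k: int, v: float, u: float) -> float:
    """GAP_COST = v + u*k for a gap of length k >= 1."""
    return v + u * k


row = "HF-DLS-----HG"                     # gaps of length 1 and 5
lengths = gap_lengths(row)
print(lengths)
print(sum(proportional_gap_cost(k, u=1) for k in lengths))
print(sum(affine_gap_cost(k, v=10, u=1) for k in lengths))
```

With an opening cost, two separate gaps cost more than one gap of the same total length, which favours a few long gaps over many scattered ones.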
Alignment parameters : gap penalties
ClustalW uses position-specific gap penalties to make gaps more or less
likely at different positions in the alignment
[Figure: plot of the gap penalty (axis from 0 to 30) at each position of the alignment below]
HLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDL
QLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDL
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS
VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLS
Gap penalties are lowered at existing gaps and increased near to existing gaps
Gap penalties are lowered in hydrophilic stretches
Otherwise, gap opening penalties are modified according to their observed
relative frequencies adjacent to gaps (Pascarella & Argos, 1992)
Goal is to introduce gaps in sequence segments
corresponding to flexible regions of the protein structure
Multiple Alignment Construction

Optimal multiple alignment
MSA, OMA

Progressive multiple alignment
ClustalW, ClustalX

Iterative multiple alignment
PRRP (Gotoh, 1993)
SAGA (Notredame et al. NAR. 1996)
DIALIGN (Morgenstern et al. 1999)
HMMER (Eddy 1998), SAM (Karplus et al. 2001)
Iterative refinement
PRRP (Gotoh, 1993) refines an initial progressive multiple alignment by
iteratively dividing the alignment into 2 profiles and realigning them.
[Flow chart: initial alignment (global progressive) -> divide sequences into 2 groups (profile 1, profile 2) -> pairwise profile alignment -> refined alignment -> converged? If no, divide the sequences again]
Genetic Algorithms
SAGA (Notredame et al.1996) evolves a population of alignments in a quasi
evolutionary manner, iteratively improving the fitness of the population
[Flow chart: population n -> select a number of individuals to be parents -> modify the parents by shuffling gaps, merging 2 alignments etc. -> population n+1 -> evaluation of the fitness using the objective function (OF: sum-of-pairs or COFFEE) -> next iteration or END]
Segment-to-segment alignment
Dialign (Morgenstern et al. 1996) compares segments of sequences instead of single residues
1. construct dot-plots of all possible pairs of sequences (axes: Sequence i, Sequence j)
2. find a maximal set of consistent diagonals in all the sequences
.......aeyVRALFDFngndeedlpfkKGDILRIrdkpeeq...............WWNAedsegkr.GMIPVPYVek..........
........nlFVALYDFvasgdntlsitKGEKLRVlgynhnge..............WCEAqtkngq..GWVPSNYItpvns.......
ieqvpqqptyVQALFDFdpqedgelgfrRGDFIHVmdnsdpn...............WWKGachgqt..GMFPRNYVtpvnrnv.....
gsmstselkkVVALYDYmpmnandlqlrKGDEYFIleesnlp...............WWRArdkngqe.GYIPSNYVteaeds......
.....tagkiFRAMYDYmaadadevsfkDGDAIINvqaideg...............WMYGtvqrtgrtGMLPANYVeai.........
..gsptfkcaVKALFDYkaqredeltfiKSAIIQNvekqegg...............WWRGdyggkkq.LWFPSNYVeemvnpegihrd
.......gyqYRALYDYkkereedidlhLGDILTVnkgslvalgfsdgqearpeeigWLNGynettgerGDFPGTYVeyigrkkisp..
3. Local alignment - residues between the diagonals are not aligned
Multiple alignment methods
Progressive
Guide tree   Global              Local
SB           multal              SBpima
UPGMA        multalign, pileup
NJ           clustalx
ML                               MLpima
Iterative
prrp (global), dialign (local)
Genetic Algo.: saga
HMM: hmmt
Comparison of programs
League table based on the BAliBASE benchmark database (Thompson et al. 1999)
Programs: multal, multalign, pileup, clustalx, prrp, saga, hmmt, MLpima, SBpima, dialign
Test sets:
- Reference 1: < 6 sequences (all; < 100 residues; > 400 residues)
- Reference 2: a family with an orphan
- Reference 3: several sub-families
- Reference 4: long N/C terminal extensions
- Reference 5: long insertions
[Table: ranking of each program on each test set not reproduced; some entries are marked N/A or iterative]
• Iterative algorithms can improve alignment quality, but can be slow
• Global algorithms work well when sequences are homologous over their full lengths, local algorithms are better for non-colinear sequences
Multiple Alignment Construction

Optimal multiple alignment
MSA, OMA

Progressive multiple alignment
ClustalW, ClustalX

Iterative multiple alignment
PRRP, SAGA, DIALIGN, HMMER, SAM

Co-operative multiple alignment
T-COFFEE (Notredame et al. 2000) http://igs-server.cnrs-mrs.fr/Tcoffee/
DbClustal (Thompson et al. 2000) http://www-igbmc.u-strasbg.fr/BioInfo/
MAFFT (Katoh et al. 2002) http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/
MUSCLE (Edgar, 2004) http://www.drive5.com/muscle
Probcons (Do et al. 2005)
Kalign (Lassmann et al. 2005)
DbClustal
http://bips.u-strasbg.fr/PipeAlign/
[Figure: Query Sequence -> Blast Database Search -> Database Hits -> Ballast Anchors (Domain A, Domain B, Domain C) -> DbClustal Alignment]
Comparison ClustalW / DbClustal
[Figure: ClustalW and DbClustal alignments compared]
MAFFT
• Local homologous segments detected using a Fast Fourier
Transform
• Pairwise alignments are performed using restricted global
dynamic programming
• Multiple alignment is built up using a progressive algorithm,
similar to ClustalW
• Multiple alignment is then iteratively refined by dividing
alignment into 2 parts and realigning
MAFFT
Pairwise alignments
1. Fast Fourier Transform to detect local conserved segments
[Figure: correlation c(k) plotted against the lag k, with k = -1 and k = 2 marked]
K=2
GLWGKAAAEEEGLWLFF--
--KGVFGAEQEGLFVFFGG
K=-1
-GLWGKAAAEEEGLWLFF
KGVFGAEQEGLFVFFGG-
2. Segment Level Dynamic Programming to select ‘consistent’ segments
3. Fix residues at the centre of each segment pair and realign between fixed points (white regions only)
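The idea of scoring every lag k between two sequences can be illustrated with a naive match count. This is a deliberately simplified stand-in and not MAFFT's method: MAFFT correlates numerical residue properties and computes all lags at once with a Fast Fourier Transform, whereas this sketch counts identical residues lag by lag. The function name is hypothetical.

```python
def lag_match_counts(seq_a: str, seq_b: str) -> dict[int, int]:
    """For each lag k, count positions i where seq_a[i] == seq_b[i - k]."""
    counts = {}
    for k in range(-(len(seq_b) - 1), len(seq_a)):
        counts[k] = sum(
            1
            for i, residue in enumerate(seq_a)
            if 0 <= i - k < len(seq_b) and residue == seq_b[i - k]
        )
    return counts


counts = lag_match_counts("GLWGKAAAEEEGLWLFF", "KGVFGAEQEGLFVFFGG")
print("k=2 :", counts[2])    # second sequence shifted right by 2
print("k=-1:", counts[-1])   # first sequence shifted right by 1
```

With identities alone the k=-1 lag scores low; a property-based correlation would also reward the hydrophobic substitutions (L/V, W/F, L/F) visible at that lag.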
State-of-the-art
Co-operative algorithms have led to significant improvements…
[Figure: BAliBASE 3 scores for ClustalW (1994), Dialign (1996), Mafft (2002) and Probcons (2005) on Ref 11 (<20% ID), Ref 12 (20-40% ID), Ref 2 (orphan), Ref 3 (sub-families), Ref 4 (extensions) and Ref 5 (insertions)]
… but none of the methods currently available are capable of producing high-quality alignments for all test cases
Thompson et al. 2005, 2006
RNA alignment methods
Comparison using ‘BRAliBASE’ RNA structure alignments (Gardner et al, 2005)

Above 60% identity, sequence and structure based approaches have similar scores
Algorithms incorporating structural information outperform pure sequence methods. However,
these algorithms are computationally demanding which severely limits their use in practice.



Some more recent methods:
Sequence: R-Coffee (Wilm, 2008), MAFFT (Katoh, 2008)
Structure: LARA (Bauer, 2007), FoldalignM (Torarinsson, 2007), SCARNA (Tabei, 2008)
DNA alignment methods

Complete genomes
- Local alignments (BlastZ, MultiZ, MUMmer,…)
- Global alignments (MGA, Multi-LAGAN, MAVID, MAUVE, MAP2, Mulan,…)
Reviewed in Dewey and Pachter, Human Molecular Genetics, 2006
Multiple alignment analysis

Are the sequences correctly aligned?
- Quality analysis: alignment objective functions (SP, NorMD)
- Error detection and correction (RASCAL, Refiner)
Are the sequences in the alignment homologous?
- Conserved/homologous regions (MCOFFEE, LEON)
- Conserved (functional) residues
Objective functions
Sum-of-pairs (Carrillo, Lipman, 1988) : Sum of scores for all pairs of sequences
Sequence 1   N N N
Sequence 2   N N N
Sequence 3   N N C
Sequence 4   N C C
Blosum62 scores: N-N = 6, N-C = -3, C-C = 9
Pair     Residue pairs                        Score
Seq1-2   3 pairs N-N                          3x6=18
Seq1-3   2 pairs N-N, 1 pair N-C              2x6+(-3)=9
Seq1-4   1 pair N-N, 2 pairs N-C              6+2x(-3)=0
Seq2-3   2 pairs N-N, 1 pair N-C              2x6+(-3)=9
Seq2-4   1 pair N-N, 2 pairs N-C              6+2x(-3)=0
Seq3-4   1 pair N-N, 1 pair N-C, 1 pair C-C   6+(-3)+9=12
Total                                         48
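The worked example can be reproduced with a few lines of code. This is a minimal sketch restricted to the three Blosum62 values quoted on the slide; the function name is illustrative, and gap columns are not handled.

```python
from itertools import combinations

# Only the Blosum62 entries needed for the example.
SCORES = {("N", "N"): 6, ("N", "C"): -3, ("C", "N"): -3, ("C", "C"): 9}


def sum_of_pairs(alignment: list[str], scores: dict) -> int:
    """Sum the residue scores over every pair of rows and every column."""
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        total += sum(scores[(a, b)] for a, b in zip(row_a, row_b))
    return total


alignment = ["NNN", "NNN", "NNC", "NCC"]   # Sequences 1 to 4
print(sum_of_pairs(alignment, SCORES))     # 48, as on the slide
```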
Information content (Hertz et al, 1999)
- Entropy column scores (between 0 and 1), sum for all columns in the alignment
norMD (Thompson et al, 2001)
- Column scores
- normalisation for sequence set to be aligned (number, length, similarity)
<0.3 bad alignment
0.3-0.7 some local errors
>0.7 good alignment
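An entropy-based column score between 0 and 1 can be sketched as follows. The normalisation used here (one minus the Shannon entropy divided by the log of the alphabet size) is one common choice and is an assumption, not necessarily the exact formula of Hertz et al.; gap characters would need separate handling and the function names are hypothetical.

```python
import math
from collections import Counter


def column_conservation(column: str, alphabet_size: int = 20) -> float:
    """Return 1 for a fully conserved column, lower values for variable columns."""
    n = len(column)
    counts = Counter(column)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(alphabet_size)


def alignment_score(rows: list[str]) -> float:
    """Sum of the column scores over all columns of the alignment."""
    return sum(column_conservation("".join(col)) for col in zip(*rows))


rows = ["NNN", "NNN", "NNC", "NCC"]
for index, col in enumerate(zip(*rows), start=1):
    print(index, "".join(col), round(column_conservation("".join(col)), 3))
print("total:", round(alignment_score(rows), 3))
```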
Objective functions: NorMD
[Figure: multiple alignment of glutamyl- and glutaminyl-tRNA synthetase sequences in two groups, Archaeal/Eukaryotic GluRS + GlnRS and Bacterial GluRS, with the labels ‘HIGH’, H8 and ‘KMSKS’ marked above the alignment. Sequences shown: 1gln, 1exd, syq_luplu, syq_human, syq_ecoli, syq_haein, sye_metja, sye_metth, sye_mettm, pyro_hori1, pyro_aby1, sye_arcfu, aero_perni, sye_sulso, syep_human, caeno_eleg, syep_drome, schizo_pom, syec_yeast, arab_thali, syem_yeast, pseudo_aer, sye_rhime, chlamy_psi, sye_mycge, sye_mycpn, sye_mycpu, sye_theth, sye_horvu, sye_tobac, thermo_mar, strepto_co, sye_lacde, sye_bacsu, sye_bacst, mycob_lepr, sye_borbu, sye_haein, sye_ecoli, heli_pylor, sye_syny3, sye_aquae, sye_helpy, ricket_pro, rhodo_spha, sye_azobr. Alignment rows not reproduced]
VDLPEAEIGKVKLRFAPEPSGYLHIGHAKAALLNKYFAERYQGEVIVRFDDTNP--AKESNEFVDNLVKDIGTLGIKYE----------KVTYTSDYFPELMDMAEKLMREGKAYVDDTPREQMQKER----------MDGIDSKCRNHSVEENLKLWKEMIA-GSERGLQCCVRGKFNMQDPNKAMR--------------------DPVYYRCNPMS---HHRIGDKYKIYPTYDFACPFVDSLEGITHALRSSEYHDRNAQYFKVLEDMGLR-----QVQLYEFSR-LNLV-FTLLSKRKLLWFVQTG--LVDGWDD
arab_thali ::: PRFPTVQGIVRR-GLKIEALIQFILEQGA---------------------------SKN-LNLMEWDKLWSINKRIIDP-VCPRHTAVVAERRVLFTLTDGPDEPFVRMIPKHKKFEGA---GEKATTFTK--SIWLEEADASAISV-----------GEEVTLMDWGNAIVKEITKDEEGRVTALSGVLNLQG---------SVKTTKLKLTWLPDTN----ELVNLTLTEFDYLITKKK----LEDDDEVADFVNPNTKKET-LALGDSNMRNLKCGDVIQLERKGYFRCDVP------FVKSSKPIVLFSIPDGRAA
arab_thali
arab_thali
PRFPTVQGIVRR-GLKIEALIQFILEQGA---------------------------SKN-LNLMEWDKLWSINKRIIDP-VCPRHTAVVAERRVLFTLTDGPDEPFVRMIPKHKKFEGA---GEKATTFTK--SIWLEEADASAISV-----------GEEVTLMDWGNAIVKEITKDEEGRVTALSGVLNLQG---------SVKTTKLKLTWLPDTN----ELVNLTLTEFDYLITKKK----LEDDDEVADFVNPNTKKET-LALGDSNMRNLKCGDVIQLERKGYFRCDVP------FVKSSKPIVLFSIPDGRAA
IKEDIHPSLPVRTRFAPSPTGFLHLGSLRTALYNYLLARNTNGQFLLRLEDTDQ--KRLIEGAEENIYEILKWCNINYDET---------PIKQSERKLIYDKYVKILLSSGKAYRCFCSKERLNDLRHSAMELKPPSMASYDRCCAHLGEEEIKSKLAQ--------GIPFTVRFKSP-ERYPTFTDLLHGQINLQPQVNFNDKRYDDLILVKSD---------------KLPTYHLANVVDDHLMGITHVIRGEEWLPSTPKHIALYNAFGWA-----CPKFIHIPLLTTVG-DKKLSKRKGD--------------syem_yeast ::: ---MSISDLKRQ-GVLPEALINFCALFGWSPPRDLASKKHECFSMEELETIFNLNGLTKGNAKVDDKKLWFFNKHFLQKRILNPSTLRELVDDIMPSLESIYNTSTISREKVAKILLNCGGSLSRINDF---HDEFYYFFEKPKYN-----------DNDAVTKFLSKNESRHIA--------HLLKKLGQFQEG------TDAQEVESMVETMYYEN-----GFSRKVTYQAMRFALA-------------------------------GCHPGAKIAAMIDILG-IKESNKRLSEGLQFLQREKK------------syem_yeast
syem_yeast
---MSISDLKRQ-GVLPEALINFCALFGWSPPRDLASKKHECFSMEELETIFNLNGLTKGNAKVDDKKLWFFNKHFLQKRILNPSTLRELVDDIMPSLESIYNTSTISREKVAKILLNCGGSLSRINDF---HDEFYYFFEKPKYN-----------DNDAVTKFLSKNESRHIA--------HLLKKLGQFQEG------TDAQEVESMVETMYYEN-----GFSRKVTYQAMRFALA-------------------------------GCHPGAKIAAMIDILG-IKESNKRLSEGLQFLQREKK-------------------MTTVRTRIAPSPTGDPHVGTAYIALFNLCFARQHGGQFILRIEDTDQ--LRSTRESEQQIYDALRWLGIEWDEGPDVGGP-HGPYRQSERGHIYKRYSDELVEKGHAFTCFCTPERLDAVRAEQMARK--ETPRYDGHCMHLPKDEVQRRLAA--------GESHVTRMKVPTEGVCVVPDMLRGDVEIPWDRMD------MQVLMKAD---------------GLPTYFLANVVDDHLMGITHVLRGEEWLPSAPKLIKLYEYFGWE-----QPQLCYMPLLRNPD-KSKLSKRKNP--------------pseudo_aer ::: ---TSITFYERM-GYLPQALLNYLGRMGWSMP-----DEREKFTLAEMIEHFDLSRVSLGGPIFDLEKLSWLNGQWIREQSV-EEFAREVQKWALNP------------EYLMKIAPHVQGRVENFSQIAP-LAGFFFSGGVPLDASLF--------EHKKLDPTQVRQVLQLVL--------WKLESLRQWE-----------KERITGCIQAVAEH----LQLKLRDVM-PLMFPAIT------------------------------GHASSVSVLDAMEILG-ADLSRYRLRQALELLGGASKKETKEWEKIRDAI
pseudo_aer
pseudo_aer
---TSITFYERM-GYLPQALLNYLGRMGWSMP-----DEREKFTLAEMIEHFDLSRVSLGGPIFDLEKLSWLNGQWIREQSV-EEFAREVQKWALNP------------EYLMKIAPHVQGRVENFSQIAP-LAGFFFSGGVPLDASLF--------EHKKLDPTQVRQVLQLVL--------WKLESLRQWE-----------KERITGCIQAVAEH----LQLKLRDVM-PLMFPAIT------------------------------GHASSVSVLDAMEILG-ADLSRYRLRQALELLGGASKKETKEWEKIRDAI
-----MADSAVRVRIAPSPTGEPHVGTAYIALFNYLFAKKHGGKFILRIEDTDA--TRSTPEFEKKVLDALKWCGLEWSEGPDIGGP-YGPYRQSDRKDIYKPYVEKIVANGHGFRCFCTPERLEQMREAQRAAG--KPPKYDGLCLSLSAEEVTSRVDA--------GEPHVVRMKIPTEGSCKFRDGVYGDVEIPWEAVD------MQVLLKAD---------------GMPTYHMANVVDDHLMKITHVARGEEWLASVPKHILIYQYLGLE-----PPVFMHLSLMRNAD-KSKLSKRKNP--------------sye_rhime ::: ---TSISYYTAL-GYLPEALMNFLGLFFIQIA-----EGEELLTMEELAEKFDPENLSKAGAIFDIQKLDWLNARWIREKLSEEEFAARVLAWAMDN------------ERLKEGLKLSQTRISKLGELPD-LAAFLFKSDLGLQPAAF--------AGVKASPEEMLKILNTVQ--------PDLEKILEWN-----------KDSIETELR-ASER----MGKKLKAVVAPLFVACS-------------------------------GSQRSLPLFDSMELLG-RSVVRQRLKVAAQVVASMAGSGKQ--------sye_rhime
sye_rhime
---TSISYYTAL-GYLPEALMNFLGLFFIQIA-----EGEELLTMEELAEKFDPENLSKAGAIFDIQKLDWLNARWIREKLSEEEFAARVLAWAMDN------------ERLKEGLKLSQTRISKLGELPD-LAAFLFKSDLGLQPAAF--------AGVKASPEEMLKILNTVQ--------PDLEKILEWN-----------KDSIETELR-ASER----MGKKLKAVVAPLFVACS-------------------------------GSQRSLPLFDSMELLG-RSVVRQRLKVAAQVVASMAGSGKQ-------------MAWENVRVRVAPSPTGDPHVGTAYMALFNEIFAKRFNGKMILRIEDTDQ--TRSRDDYEKNIFSALQWCGIQWDEGPDIGGP-HGPYRQSERTEIYREYAELLLKTDYAYKCFATPKELEEMRAVATTLG--YRGGYDRRYRYLSPEEIEARTQE--------GQPYTIRLKVPLTGECVLEDYCKGRVVFPWADVD------DQVLMKSD---------------GFPTYHFANVVDDHLMGITHVLRGEEWLSSTPKHLLLYEAFGWE-----PPIFLHMPLLLNPD-GTKLSKRKNP--------------chlamy_psi ::: ---TSIFYYRDA-GYIKEAFMNFLTLMGYSME-----GDEEVYSLEKLIANFDPKRIGKSGAVFDVRKLDWMNKHYLNHEGSPENLLARLKDWLVND------------EFLLKILPLCQSRMATLAEFVG-LSEFFFSVLPEYSKEEL--------LPAAISQEKAAILFYSYV--------KYLEKTDLWV-----------KDQFYLGSKWLSEA----FQVHHKKVVIPLLYVAIT------------------------------GKKQGLPLFDSMELLG-KPRTRMRMVHAQNLLGGVPKKIQTAIDKVLKEE
chlamy_psi
chlamy_psi
---TSIFYYRDA-GYIKEAFMNFLTLMGYSME-----GDEEVYSLEKLIANFDPKRIGKSGAVFDVRKLDWMNKHYLNHEGSPENLLARLKDWLVND------------EFLLKILPLCQSRMATLAEFVG-LSEFFFSVLPEYSKEEL--------LPAAISQEKAAILFYSYV--------KYLEKTDLWV-----------KDQFYLGSKWLSEA----FQVHHKKVVIPLLYVAIT------------------------------GKKQGLPLFDSMELLG-KPRTRMRMVHAQNLLGGVPKKIQTAIDKVLKEE
-------MEKIRTRYAPSPTGYLHVGGTRTAIFNFLLAKHFNGEFIIRIEDTDT--ERNIKEGINSQFDNLRWLGVIADESVYNPGN-YGPYLQSQKLAVYKKLAFDLIEKNLAYRCFCSKEKLESDRKQAINNH--KTPKYLGHCRNLHSKKITNHLEK--------NDPFTIRLKINNEAEYSWNDLVRGQITIPGSALT------DIVILKAN---------------GVATYNFAVVIDDYDMEITDVLRGAEHISNTAYQLAIYQALGFKR----IPRFGHLSVIVDES-GKKLSKRDEKTT------------sye_mycge ::: ---QFIEQFKQQ-GYLPEALLNFLALLGWHP-----QYNQEFFNLKQLIENFSLSRVVSAPAFFDIKKLQWINANYIKQ-LTDNAYFNFIDNYLDVKVDYLK-------DKNREISLLFKNQITHGVQINE-LIRESFATKIGVENLA---------KKSHILFKNIKLFLEQLA--------KSLQGLEEWK-----------AEQIKTTINKVGAV----FNLKGKQLFMPIRLIFT-------------------------------NKEHGPDLAHIIEIFD-KESAINLIKQFINATNLF--------------sye_mycge
sye_mycge
---QFIEQFKQQ-GYLPEALLNFLALLGWHP-----QYNQEFFNLKQLIENFSLSRVVSAPAFFDIKKLQWINANYIKQ-LTDNAYFNFIDNYLDVKVDYLK-------DKNREISLLFKNQITHGVQINE-LIRESFATKIGVENLA---------KKSHILFKNIKLFLEQLA--------KSLQGLEEWK-----------AEQIKTTINKVGAV----FNLKGKQLFMPIRLIFT-------------------------------NKEHGPDLAHIIEIFD-KESAINLIKQFINATNLF---------------------MEKIRTRYAPSPTGYLHVGGARTAIFNFLLAKHFNGEFIIRIEDTDT--ERNVEGGIESQLENLRWLGIIPDESIYNPGN-YGPYIQSQKLATYKKLAYELVGKGLAYRCFCTKEKLEHERQLALEHH--QTPKYLGTCRNLHSKHIQTNLDN--------QVPFTIRLKINQDAEFAWNDQVRGKITIPGNSLT------DIVLLKAN---------------GIATYNFAVVIDDHDMEITDVLRGAEHISNTAYQLAINQALGYQR----IPRFGHLSVIVDKS-GKKLSKRDTKTI------------sye_mycpn ::: ---QFIEQFKQE-GYLPEAVVNFLALLGWNS-----DFNREFFTINQLIESFTVNRVVGAPAFFDIKKLQWINAHYIKE-LSDNAYFNFIDNYLTIDFDYLK-------NKRKEVSLLFKNQLAFGIEINQ-LIKETFAPKLGVQHLS---------VKHRELFKELQSALQQLS--------EQLQALPDWT-----------KDNVKSTLTQIGEQ----FNLKGKKLFMPLRLIFT-------------------------------NKEHGPDLAGIMVLHG-KTQVLALLQEFIHATNLF--------------sye_mycpn
sye_mycpn
---QFIEQFKQE-GYLPEAVVNFLALLGWNS-----DFNREFFTINQLIESFTVNRVVGAPAFFDIKKLQWINAHYIKE-LSDNAYFNFIDNYLTIDFDYLK-------NKRKEVSLLFKNQLAFGIEINQ-LIKETFAPKLGVQHLS---------VKHRELFKELQSALQQLS--------EQLQALPDWT-----------KDNVKSTLTQIGEQ----FNLKGKKLFMPLRLIFT-------------------------------NKEHGPDLAGIMVLHG-KTQVLALLQEFIHATNLF---------------------MKKLRTRYAPSPTGYLHIGGARTALFNYLLAKHYNGDFIIRIEDTDV--KRNIADGEASQIENLKWLNIEANESPLKPNEKYGPYRQSQKLEKYLKIAHELIEKGYAYKAYDNSEELEEQKKHSEKLG-VASFRYQRDFLKISEEEKQKRDAS--------G-AYSIRVICPKNTTYQWDDLVRGNIAVNSNDIG------DWIIIKSD---------------DYPTYNFAVVIDDIDMEISHILRGEEHITNTPKQMMIYDYLNAP-----KPLFGHLTIITNME-GKKLSKRDLSLK------------sye_mycpu ::: ---QFIHEYKEE-GYNSQAIFNFLTLLGWTD-----EKARELMDHDEIIKSFLYTRLSKSPSKFDITKMQWFSKQYWKN-TPNEELIKILNLNDYDN------------DWINLFLDLYKENIYSLNQLKN-YLKIYKQANLNQ-------------EKDLDLNDAEKNVVKSFS--------SYIDYS-NFS-----------VNQIQEAINKTQEK----LSIKGKNLFLPIRKATT-------------------------------FQEHGPELAKAIYLFG-SEIIEKRMKKWK--------------------sye_mycpu
sye_mycpu
---QFIHEYKEE-GYNSQAIFNFLTLLGWTD-----EKARELMDHDEIIKSFLYTRLSKSPSKFDITKMQWFSKQYWKN-TPNEELIKILNLNDYDN------------DWINLFLDLYKENIYSLNQLKN-YLKIYKQANLNQ-------------EKDLDLNDAEKNVVKSFS--------SYIDYS-NFS-----------VNQIQEAINKTQEK----LSIKGKNLFLPIRKATT-------------------------------FQEHGPELAKAIYLFG-SEIIEKRMKKWK-----------------------------MVVTRIAPSPTGDPHVGTAYIALFNYAWARRNGGRFIVRIEDTDR--ARYVPGAEERILAALKWLGLSYDEGPDVAAP-TGPYRQSERLPLYQKYAEELLKRGWAYRAFETPEELEQIRKEK--------GGYDGRARNIPPEEAEERARR--------GEPHVIRLKVPRPGTTEVKDELRGVVVYDNQEIP------DVVLLKSD---------------GYPTYHLANVVDDHLMGVTDVIRAEEWLVSTPIHVLLYRAFGWE-----APRFYHMPLLRNPD-KTKISKRKSH--------------sye_theth ::: ---TSLDWYKAE-GFLPEALRNYLCLMGFSMP-----DGREIFTLEEFIQAFTWERVSLGGPVFDLEKLRWMNGKYIREVLSLEEVAERVKPFLREAGLSWESE-----AYLRRAVELMRPRFDTLKEFPE-KARYLFTEDYPVS------------EKAQRKLEEGLPLLKELY--------PRLRAQEEWT-----------EAALEALLRGFAAE----KGVKLGQVAQPLRAALT-------------------------------GSLETPGLFEILALLG-KERALRRLERALA-------------------sye_theth
sye_theth
---TSLDWYKAE-GFLPEALRNYLCLMGFSMP-----DGREIFTLEEFIQAFTWERVSLGGPVFDLEKLRWMNGKYIREVLSLEEVAERVKPFLREAGLSWESE-----AYLRRAVELMRPRFDTLKEFPE-KARYLFTEDYPVS------------EKAQRKLEEGLPLLKELY--------PRLRAQEEWT-----------EAALEALLRGFAAE----KGVKLGQVAQPLRAALT-------------------------------GSLETPGLFEILALLG-KERALRRLERALA-------------------ASADSGGSGPVRVRFAPSPTGNLHVGGARTALFNYLFARSRGGKFVLRVEDTDL--ERSTKKSEEAVLTDLSWLGLDWDEGPDIGGD-FGPYRQSERNALYKEHAQKLMESGAVYRCFCSNEELEKMKETANRMK--IPPVYMGKWATASDAEVQQELEK--------GTPYTYRFRVPKEGSLKINDLIRGEVSWNLNTLG------DFVIMRSN---------------GQPVYNFCVTVDDATMRISHVIRAEEHLPNTLRQALIYKALGFA-----MPLFAHVSLILAPD-KSKLSKRHGA--------------sye_horvu ::: ---TSVGQYKEM-GYLPQAMVNYLALLGWGD-----GTENEFFTIDDLVEKFTIDRVNKSGAVFDATKLKWMNGQHLRS-LPSDLLIKDFEDQWRSTGILLESES----GFAKEAAELLKEGIDLITDADAALCKLLSYPLHETLSSD---------EAKSVVEDKLSEVASGLI--------SAYDSG-ELD--------QALAEGHDGWKKWVKSFGKT-HKRKGKSLFMPLRVLLT-------------------------------GKLHGPAMDSTVILVH-KAGTSGAVAPQSGFVSLDERFKILKEVNWESLQ
sye_horvu
sye_horvu
---TSVGQYKEM-GYLPQAMVNYLALLGWGD-----GTENEFFTIDDLVEKFTIDRVNKSGAVFDATKLKWMNGQHLRS-LPSDLLIKDFEDQWRSTGILLESES----GFAKEAAELLKEGIDLITDADAALCKLLSYPLHETLSSD---------EAKSVVEDKLSEVASGLI--------SAYDSG-ELD--------QALAEGHDGWKKWVKSFGKT-HKRKGKSLFMPLRVLLT-------------------------------GKLHGPAMDSTVILVH-KAGTSGAVAPQSGFVSLDERFKILKEVNWESLQ
VYASAGDGGDVRVRFAPSPTGNLHVGGARTALFNYLYARAKGGKFILRIEDTDL--ERSTKESEEAVLRDLSWLGPAWDEGPGIGGE-YGPYRQSERNALYKQFAEKLLQSGHVYRCFCSNEELEKMKEIAKLKQ--LPPVYTGRWASATEEEVVEELAK--------GTPYTYRFRVPKEGSLKIDDLIRGEVSWNLDTLG------DFVIMRSN---------------GQPVYNFCVTVDDATMAISHVIRAEEHLPNTLRQALIYKALGFP-----MPHFAHVSLILAPD-RSKLSKRHGA--------------sye_tobac ::: ---TSVGQFRDM-GYLPQAMVNYLALLGWGD-----GTENEFFTLEQLVEKFTIERVNKSGAIFDSTKLRWMNGQHLRS-LPSEELNRIIGERWKDAGIATESQG----IFIQDAVLLLKDGIDLITDSEKALSSLLSYPLYETLASA---------EGKPILEDGVSEVAKSLL--------AAYDSG-ELS--------GALAEGQPGWQKWAKNFGKL-LKRKGKSLFMPLRVLLT-------------------------------GKLHGPDIGATTVLLY-KAGTSGSVVPQAGFVTFDERFKILREVQWESFS
sye_tobac
sye_tobac
---TSVGQFRDM-GYLPQAMVNYLALLGWGD-----GTENEFFTLEQLVEKFTIERVNKSGAIFDSTKLRWMNGQHLRS-LPSEELNRIIGERWKDAGIATESQG----IFIQDAVLLLKDGIDLITDSEKALSSLLSYPLYETLASA---------EGKPILEDGVSEVAKSLL--------AAYDSG-ELS--------GALAEGQPGWQKWAKNFGKL-LKRKGKSLFMPLRVLLT-------------------------------GKLHGPDIGATTVLLY-KAGTSGSVVPQAGFVTFDERFKILREVQWESFS
---------MVRVRFAPSPTGFLHVGGARTALFNFLFARKEKGKFILRIEDTDL--ERSEREYEEKLMESLRWLGLLWDEGPDVGGD-HGPYRQSERVEIYREHAERLVKEGKAYYVYAYPEEIEEMREKLLSEG--KAPHYSQEMFEKFDTPERRREYEEK------GLRPAVFFKMPR-KDYVLNDVVKGEVVFKTGAIG------DFVIMRSN---------------GLPTYNFACVVDDMLMEITHVIRGDDHLSNTLRQLALYEAFEKA-----PPVFAHVSTILGPD-GKKLSKRHGA--------------thermo_mar ::: ---TSVEAFRDM-GYLPEALVNYLALLGWSH-----PEGKELLTLEELISSFSLDRLSPNPAIFDPQKLKWMNGYYLRN-MPIEKLAELAKPFFEKAGIKIIDE-----EYFKKVLEITKERVEVLSEFPE-ESRFFFEDP-----------------APVEIPEEMKEVFSQLK--------EELQNV-RWT-----------MEEITPVFKKVLKQ----HGVKPKEFYMTLRRVLT-------------------------------GREEGPELVNIIPLLG-KEIFLRRIERSLGG------------------thermo_mar
thermo_mar
---TSVEAFRDM-GYLPEALVNYLALLGWSH-----PEGKELLTLEELISSFSLDRLSPNPAIFDPQKLKWMNGYYLRN-MPIEKLAELAKPFFEKAGIKIIDE-----EYFKKVLEITKERVEVLSEFPE-ESRFFFEDP-----------------APVEIPEEMKEVFSQLK--------EELQNV-RWT-----------MEEITPVFKKVLKQ----HGVKPKEFYMTLRRVLT-------------------------------GREEGPELVNIIPLLG-KEIFLRRIERSLGG--------------------MASASGSPVRVRFCPSPTGNPHVGLVRTALFNWAFARHHQGTLVFRIEDTDA--ARDSEESYDQLLDSMRWLGFDWDEGPEVGGP-HAPYRQSQRMDIYQDVAQKLLDAGHAYRCYCSQEELDTRREAARAAG--KPSGYDGHCRELTDAQVEEYTSQ--------GREPIVRFRMPDE-AITFTDLVRGEITYLPENVP------DYGIVRAN---------------GAPLYTLVNPVDDALMEITHVLRGEDLLSSTPRQIALYKALIELGVAKEIPAFGHLPYVMGEG-NKKLSKRDPQ--------------strepto_co ::: ---SSLNLYRER-GFLPEGLLNYLSLLGWSLS-----ADQDIFTIEEMVAAFDVSDVQPNPARFDLKKCEAINGDHIRL-LEVKDFTERCRPWLKA-PVAPWAPEDFDEAKWQAIAPHAQTRLKVLSEITD-NVDFLFLPEPVFDEA----------SWTKAMKEGSDALLTTAR--------EKLD-AADWTS----------PEALKEAVLAAGEA----HGLKLGKAQAPVRVAVT-------------------------------GRTVGLPLFESLEVLG-KEKALARIDAALARLAA---------------strepto_co
strepto_co
---SSLNLYRER-GFLPEGLLNYLSLLGWSLS-----ADQDIFTIEEMVAAFDVSDVQPNPARFDLKKCEAINGDHIRL-LEVKDFTERCRPWLKA-PVAPWAPEDFDEAKWQAIAPHAQTRLKVLSEITD-NVDFLFLPEPVFDEA----------SWTKAMKEGSDALLTTAR--------EKLD-AADWTS----------PEALKEAVLAAGEA----HGLKLGKAQAPVRVAVT-------------------------------GRTVGLPLFESLEVLG-KEKALARIDAALARLAA--------------------MANKKIRVRYAPSPTGHLHIGNARTALFNYLFARHNKGTLVLRIEDADT--ERNVEGGAESQIENLHWLGIDWDEGPDIGGD-YGPYKQSERKDIYQKYIDQLLEEGKAYYSFKTEEELEAQREEQRAMG--IAPHYVYEYEGMTTDEIKQAQAEARAK----GLKPVVRIHIPEGVTYEWDDIVKGHLSFESDTIG-----GDFVIQKRD---------------GMPTYNFAVVIDDHLMEISHVLRGDDHISNTPKQLCVYEALGWE-----APVFGHMTLIINSATGKKLSKRDESVL------------sye_lacde ::: ---QFIEQYREL-VSCQKPCSTSSSLLGWSP-----VGESEIFSKREFIKQFDPARLSKSPAAFDQKKLDWVNNQYMKT-ADRDELLDLALHNLQEAGLVEANPAPGKMEWVRQLVNMYANQMSYTKQIVD-LSKIFFTEAKYLTDE----------EVEEIKKDEARPAIEEFK--------KQLDKLDNFT-----------AKKIMGAIMATRRE----TGIKGRKLFMPIRIATT-------------------------------RSMVGPGIGEAMELMG-KDTVMKHLDLTLKQLSEAGIE-----------sye_lacde
sye_lacde
---QFIEQYREL-VSCQKPCSTSSSLLGWSP-----VGESEIFSKREFIKQFDPARLSKSPAAFDQKKLDWVNNQYMKT-ADRDELLDLALHNLQEAGLVEANPAPGKMEWVRQLVNMYANQMSYTKQIVD-LSKIFFTEAKYLTDE----------EVEEIKKDEARPAIEEFK--------KQLDKLDNFT-----------AKKIMGAIMATRRE----TGIKGRKLFMPIRIATT-------------------------------RSMVGPGIGEAMELMG-KDTVMKHLDLTLKQLSEAGIE-----------------MGNEVRVRYAPSPTGHLHIGNARTALFNYLFARNQGGKFIIRVEDTDK--KRNIEGGEQSQLNYLKWLGIDWDESVDVGGE-YGPYRQSERNDIYKVYYEELLEKGLAYKCYCTEEELEKEREEQIARG--EMPRYSGKHRDLTQEEQEKFIAE--------GRKPSIRFRVPEGKVIAFNDIVKGEISFESDGIG------DFVIVKKD---------------GTPTYNFAVAIDDYLMKMTHVLRGEDHISNTPKQIMIYQAFGWD-----IPQFGHMTLIVNES-RKKLSKRDESII------------sye_bacsu ::: ---QFIEQYKEL-GYLPEALFNFIGLLGWSP-----VGEEELFTKEQFIEIFDVNRLSKSPALFDMHKLKWVNNQYVKK-LDLDQVVELTLPHLQKAGKVGTELSAEEQEWVRKLISLYHEQLSYGAEIVE-LTDLFFTDEIEYNQE----------AKAVLEEEQVPEVLSTFA--------AKLEELEEFT-----------PDNIKASIKAVQKE----TGHKGKKLFMPIRVAVT-------------------------------GQTHGPELPQSIELIG-KETAIQRLKNI---------------------sye_bacsu
sye_bacsu
---QFIEQYKEL-GYLPEALFNFIGLLGWSP-----VGEEELFTKEQFIEIFDVNRLSKSPALFDMHKLKWVNNQYVKK-LDLDQVVELTLPHLQKAGKVGTELSAEEQEWVRKLISLYHEQLSYGAEIVE-LTDLFFTDEIEYNQE----------AKAVLEEEQVPEVLSTFA--------AKLEELEEFT-----------PDNIKASIKAVQKE----TGHKGKKLFMPIRVAVT-------------------------------GQTHGPELPQSIELIG-KETAIQRLKNI---------------------------MAKDVRVGYAPSPTGHLHIGGARTALFNYLFARHHGGKMIVRIEDTDI--ERNVEGGEQSQLENLQWLGIDYDESVDKDGG-YGPYRQTERLDIYRKYVDELLEQGHAYKCFCTPEELEREREEQRAAG-IAAPQYSGKCRRLTPEQVAELEAQ--------GKPYTIRLKVPEGKTYEVDDLVRGKVTFESKDIG------DWVIVKAN---------------GIPTYNFAVVIDDHLMEISHVFRGEEHLSNTPKQLMVYEYFGWE-----PPQFAHLTLIVNEQ-RKKLSKRDESII------------sye_bacst ::: ---QFVSQYKEL-GYLPEAMFNFFALLGWSP-----EGEEEIFSKDELIRIFDVSRLSKSPSMFDTKKLTWMNNQYIKK-LDLDRLVELALPHLVKAGRLPADMSDEQRQWARDLIALYQEQMSYGAEIVP-LSELFFKEEVEYEDE----------ARQVLAEEQVPDVLSAFL--------AHVRDLDPFT-----------ADEIKAAIKAVQKA----TGQKGKKLFMPIRAAVT-------------------------------GQTHGPELPFAIQLLG-KQKVIERLERALQEKF----------------sye_bacst
sye_bacst
---QFVSQYKEL-GYLPEAMFNFFALLGWSP-----EGEEEIFSKDELIRIFDVSRLSKSPSMFDTKKLTWMNNQYIKK-LDLDRLVELALPHLVKAGRLPADMSDEQRQWARDLIALYQEQMSYGAEIVP-LSELFFKEEVEYEDE----------ARQVLAEEQVPDVLSAFL--------AHVRDLDPFT-----------ADEIKAAIKAVQKA----TGQKGKKLFMPIRAAVT-------------------------------GQTHGPELPFAIQLLG-KQKVIERLERALQEKF----------------TSDGTPQAAKVRVRFCPSPTGVPHVGMVRTALFNWAYARHTGGTFVLRIEDTDA--DRDSEESYLALLDALRWLGLNWDEGPEVGGP-YGPYRQSQRTDIYREVVAKLLATGEAYYAFSTPEEVENRHLAAGRNP---KLGYDNFDRDLTDAQFSAYLAE--------GRKPVVRLRMPDE-DISWDDLVRGTTTFAVGTVP------DYVLTRAS---------------GDPLYTLVNPCDDALMKITHVLRGEDLLSSTPRQVALYQALIRIGMAERIPEFGHFPSVLGEG-TKKLSKREPQ--------------mycob_lepr
:
mycob_lepr
mycob_lepr :: ---SNLFAHRDR-GFIPEGLLNYLALLGWAIA-----DDHDLFSLDEMVAAFDVVDVNSNPARFDQKKADAVNAEHIRM-LDSEDFAGRLRDYFTTHGYHIALDPANYEAGFVAAAQLVQTRIVVLGDAWD-LLKFLNDDEYSIDSK----------AAAKELDADAGPVLDVAC--------AVLDSLVDWT-----------TASIEDVLKVALIE---GLGLKPRKVFGPIRVAAT-------------------------------GALVSPPLFESLELLG-RARSLQRLSAARARVTSA-----------------SNLFAHRDR-GFIPEGLLNYLALLGWAIA-----DDHDLFSLDEMVAAFDVVDVNSNPARFDQKKADAVNAEHIRM-LDSEDFAGRLRDYFTTHGYHIALDPANYEAGFVAAAQLVQTRIVVLGDAWD-LLKFLNDDEYSIDSK----------AAAKELDADAGPVLDVAC--------AVLDSLVDWT-----------TASIEDVLKVALIE---GLGLKPRKVFGPIRVAAT-------------------------------GALVSPPLFESLELLG-RARSLQRLSAARARVTSA----------------------MSTRVRYAPSPTGLQHIGGIRTALFNYFFAKSCGGKFLLRIEDTDQ--SRYSPEAENDLYSSLKWLGISFDEGPVVGGD-YAPYVQSQRSAIYKQYAKYLIESGHAYYCYCSPERLERIKKIQNINK--MPPGYDRHCRNLSNEEVENALIK--------KIKPVVRFKIPLEGDTSFDDILLGRITWANKDIS-----PDPVILKSD---------------GLPTYHLANVVDDYLMKITHVLRAQEWVSSGPLHVLLYKAFKWK-----PPIYCHLPMVMGND-GQKLSKRHGS--------------sye_borbu ::: ---TALRQFIED-GYLPEAIINYVTLLGWSYD-----DKREFFSKNDLEQFFSIEKINKSPAIFDYHKLDFFNSYYIRE-KKDEDLFNLLLPFFQKKGYVSKPSTLEENQKLKLLIPLIKSRIKKLSDALN-MTKFFYEDIKSWNLDEF--------LSRKKTAKEVCSILELIK--------PILEGFEKRS-----------SEENDKIFYDFAES----NGFKLGEILLPIRIAAL-------------------------------GSKVSPPLFDSLKLIG-KSKVFERIKLAQEFLRINE-------------sye_borbu
sye_borbu
---TALRQFIED-GYLPEAIINYVTLLGWSYD-----DKREFFSKNDLEQFFSIEKINKSPAIFDYHKLDFFNSYYIRE-KKDEDLFNLLLPFFQKKGYVSKPSTLEENQKLKLLIPLIKSRIKKLSDALN-MTKFFYEDIKSWNLDEF--------LSRKKTAKEVCSILELIK--------PILEGFEKRS-----------SEENDKIFYDFAES----NGFKLGEILLPIRIAAL-------------------------------GSKVSPPLFDSLKLIG-KSKVFERIKLAQEFLRINE-------------APFNLDPNVKVRTRFAPSPTGYLHVGGARTALYSWLYAKHNNGEFVLRIEDTDL--ERSTPEATAAIIEGMEWLNLPWEH---------GPYYQTKRFDRYNQVIDEMIEQGLAYRCYCTKEHLEELRHTQEQNK--EKPRYDRHCLHDH-NHSP-------------DEPHVVRFKNPTEGSVVFDDAVRGRIEISNSELD------DLIIRRTD---------------GSPTYNFCVVVDDWDMGITHVVRGEDHINNTPRQINILKAIGAP-----IPTYAHVSMINGDD-GQKLSKRHGA--------------sye_haein ::: ---VSVMQYRDD-GYLPEALINYLVRLGWGH------GDQEIFSREEMINYFELDHVSKSASAFNTEKLQWLNQHYIRE-LPPEYVAKHLEWHYKDQGIDTSNG-----PALTEIVTMLAERCKTLKEMAR-SSRYFFEEFETFDEA----------AAKKHFKGNAAEALAKVK--------EKLTALSSWD-----------LHSIHEAIEQTAAE----LEVGMGKVGMPLRVAVT-------------------------------GSGQSPSMDVTLVGIG-RDRVLARIQRAIDFIHAQNA------------sye_haein
sye_haein
---VSVMQYRDD-GYLPEALINYLVRLGWGH------GDQEIFSREEMINYFELDHVSKSASAFNTEKLQWLNQHYIRE-LPPEYVAKHLEWHYKDQGIDTSNG-----PALTEIVTMLAERCKTLKEMAR-SSRYFFEEFETFDEA----------AAKKHFKGNAAEALAKVK--------EKLTALSSWD-----------LHSIHEAIEQTAAE----LEVGMGKVGMPLRVAVT-------------------------------GSGQSPSMDVTLVGIG-RDRVLARIQRAIDFIHAQNA--------------------MKIKTRFAPSPTGYLHVGGARTALYSWLFARNHGGEFVLRIEDTDL--ERSTPEAIEAIMDGMNWLSLEWDE---------GPYYQTKRFDRYNAVIDQMLEEGTAYKCYCSKERLEALREEQMAKG--EKPRYDGRCRHSHEHHAD-------------DEPCVVRFANPQEGSVVFDDQIRGPIEFSNQELD------DLIIRRTD---------------GSPTYNFCVVVDDWDMEITHVIRGEDHINNTPRQINILKALKAP-----VPVYAHVSMINGDD-GKKLSKRHGA--------------sye_ecoli
:
sye_ecoli
sye_ecoli :: ---VSVMQYRDD-GYLPEALLNYLVRLGWSH------GDQEIFTREEMIKYFTLNAVSKSASAFNTDKLLWLNHHYINA-LPPEYVATHLQWHIEQENIDTRNG-----PQLADLVKLLGERCKTLKEMAQ-SCRYFYEDFAEFDAD----------AAKKHLRPVARQPLEVVR--------DKLAAITDWT-----------AENVHHAIQATADE----LEVGMGKVGMPLRVAVT-------------------------------GAGQSPALDVTVHAIG-KTRSIERINKALDFIAERENQQ-------------VSVMQYRDD-GYLPEALLNYLVRLGWSH------GDQEIFTREEMIKYFTLNAVSKSASAFNTDKLLWLNHHYINA-LPPEYVATHLQWHIEQENIDTRNG-----PQLADLVKLLGERCKTLKEMAQ-SCRYFYEDFAEFDAD----------AAKKHLRPVARQPLEVVR--------DKLAAITDWT-----------AENVHHAIQATADE----LEVGMGKVGMPLRVAVT-------------------------------GAGQSPALDVTVHAIG-KTRSIERINKALDFIAERENQQ---------------------MLRFAPSPTGDMHIGNLRAAIFNYIVAKQQYKPFLIRIEDTDK--ERNIEGKDQEILEILKLMGISWDKL----------VYQSHNIDYHREMAEKLLKENKAFYCYASAEFLEREKEKAKNEK--RPFRYSDEWATLEKDK---------------HHAPVVRLKAP-NHAVSFNDAIKKEVKFEPDELD------SFVLLRQD---------------KSPTYNFACACDDLLYKISLIIRGEDHVSNTPKQILIQQALGSND----PIVYAHLPIILDEVSGKKMSKRDEA--------------heli_pylor ::: ---SSVKWLLNQ-GFLPVAIANYLITIGN-------KVPKEVFSLDEAIEWFSLENLSSSPAHFNLKYLKHLNHEHLKL-LDDDKLLELTSIKD---------------KNLLGLLRLFIEECGTLLELRE-KISLFLEPKD----------------IVKTYENEDFKERCLAL--------FNALTSMDFQA----------YKDFESFKKEAMRL----SQLKGKDFFKPLRILLT-------------------------------GNSHGVELPLIFPYIQSHHQEVLRLKA----------------------heli_pylor
heli_pylor
---SSVKWLLNQ-GFLPVAIANYLITIGN-------KVPKEVFSLDEAIEWFSLENLSSSPAHFNLKYLKHLNHEHLKL-LDDDKLLELTSIKD---------------KNLLGLLRLFIEECGTLLELRE-KISLFLEPKD----------------IVKTYENEDFKERCLAL--------FNALTSMDFQA----------YKDFESFKKEAMRL----SQLKGKDFFKPLRILLT-------------------------------GNSHGVELPLIFPYIQSHHQEVLRLKA----------------------MKLTGFLKQNVRVRFAPSPTGHLHIGGLRTAFFNYLFAKKYGGDFILRIEDTDR--TRFIY-------SSLNFYNLLPDEGPREGGK-FGPYEQSKRLEIYRNAAYRLIDSGHAYRCFCSENRLDLLRKTAEKRG--EIPKYDRKCANLSSRDAVKMEQN--------GEKFVIRFKLD-KQNVQFHDEVFGSVNQFIDES-------DPVLLKSD---------------GFPTYHLANVIDDRKMEISHVIRGMEWLSSTGKHTILYKAFNWT-----PPKFVHLSLIMRSA-TKKLSKRDKD--------------caeno_eleg ::: ---AFVSYYSEQLGALPEAVLNLMIRNGAGIRN---FDAEHFYSLDEMIEQFDLSLLGRRNLLLDSDVLQKYSRMAFQK-SDFKELYPRIIDILNKKSNYSTSREDI--QKIVTFLKAKEENFGFLSSLST-EFSWFFTRPQ---------------SSQLLKESHPNVDLRNIL--------NSLLEIEVFN-----------SESLEYLAKNH--------QLNLAKAMGIVRISLI-------------------------------GSKKGPPISELVEFFG-MTECHRRI----RIMQELL-------------caeno_eleg
caeno_eleg
---AFVSYYSEQLGALPEAVLNLMIRNGAGIRN---FDAEHFYSLDEMIEQFDLSLLGRRNLLLDSDVLQKYSRMAFQK-SDFKELYPRIIDILNKKSNYSTSREDI--QKIVTFLKAKEENFGFLSSLST-EFSWFFTRPQ---------------SSQLLKESHPNVDLRNIL--------NSLLEIEVFN-----------SESLEYLAKNH--------QLNLAKAMGIVRISLI-------------------------------GSKKGPPISELVEFFG-MTECHRRI----RIMQELL---------------------MTVRVRIAPSPTGNLHIGTARTAVFNWLFARHTGGTFILRVEDTDL--ERSKAEYTENIQSGLQWLGLNWDEG---------PFFQTQRLDHYRKAIQQLLDQGLAYRCYCTSEELEQMREAQKAKN--QAPRYDNRHRNLTPDQEQALRAE--------GRQPVIRFRIDDDRQIVWQDQIRGQVVWQGSDLG-----GDMVIARAS--------ENPEEAFGQPLYNLAVVVDDIDMAITHVIRGEDHIANTAKQILLYEALGGA-----VPTFAHTPLILNQE-GKKLSKRDGV--------------sye_syny3
:
sye_syny3
sye_syny3 :: ---TSIDDFRAM-GFLPQAIANYMCLLGWTPP----DSTQEIFTLAEAAEQFSLERVNKAGAKFDWQKLDWINSQYLHA-LPAAELVPLLIPHLEAGGHQVDPDRDQ--AWLVGLATLIGPSLTRLTDAAT-ESQLLFGDRLELKED----------GQKQLAVEGAKAVLEAAL--------TFSQNTPELT-----------LDEAKGEINRLTKE----LGLKKGVVMKSLRAGLM-------------------------------GTVQGPDLLQSWLLLQQKGWATTRLTQAIAAE--------------------TSIDDFRAM-GFLPQAIANYMCLLGWTPP----DSTQEIFTLAEAAEQFSLERVNKAGAKFDWQKLDWINSQYLHA-LPAAELVPLLIPHLEAGGHQVDPDRDQ--AWLVGLATLIGPSLTRLTDAAT-ESQLLFGDRLELKED----------GQKQLAVEGAKAVLEAAL--------TFSQNTPELT-----------LDEAKGEINRLTKE----LGLKKGVVMKSLRAGLM-------------------------------GTVQGPDLLQSWLLLQQKGWATTRLTQAIAAE------------------------MSKVKTRFAPSPTGYLHLGNARTAIFSYLFARHNNGGFVLRIEDTDP--ERSKKEYEEMLIEDLKWLGIDWDEF----------YRQSERFDIYREYVNKLLESGHAYPCFCTPEELEKEREEARKKG--IPYRYSGKCRHLTPEEVEKFKKE--------GKPFAIRFKVPENRTVVFEDLIKGHIAINTDDFG------DFVIVRSD---------------GSPTYNFVVVVDDALMGITHVIRGEDHIPNTPKQILIYEALGFP-----VPKFAHLPVILGED-RSKLSKRHGA--------------sye_aquae ::: ---VSVRAYREE-GYMPEALFNYLCLLGWSPP----EEGREIFSKEELIKIFDLKDVNDSPAVFNKEKLKWMNGVYIREVLPLDVLLERAIPFLEKAG--YDTSDR---EYIKKVLEYTRDSFDTLSEMVD-RLRPFFVDEFEIPEE----------LWSFLDDEKAYQVLSAFL--------EKIREKKPET-----------PQEVKKLAKEIQKA----LKVKPPQVWKPLRIALT-------------------------------GELEGVGIDILIAVLP-KEKIEKRILRVLEKLS----------------sye_aquae
sye_aquae
---VSVRAYREE-GYMPEALFNYLCLLGWSPP----EEGREIFSKEELIKIFDLKDVNDSPAVFNKEKLKWMNGVYIREVLPLDVLLERAIPFLEKAG--YDTSDR---EYIKKVLEYTRDSFDTLSEMVD-RLRPFFVDEFEIPEE----------LWSFLDDEKAYQVLSAFL--------EKIREKKPET-----------PQEVKKLAKEIQKA----LKVKPPQVWKPLRIALT-------------------------------GELEGVGIDILIAVLP-KEKIEKRILRVLEKLS-----------------------MSLIVTRFAPSPTGYLHIGGLRTAIFNYLFARANQGKFFLRIEDTDL--SRNSIEAANAIIEAFKWVGLEYDG---------EILYQSKRFEIYKEYIQKLLDEDKAYYCYMSKEELDALREEQKARK--ETPRYDNRYRDFKGTPPK-------------GIEPVVRIKVPQNEVIGFNDGVKGEVKVNTNELD------DFIIARSD---------------GTPTYNFVVTIDDALMGITDVIRGDDHLSNTPKQIVLYKALNFK-----IPNFFHVPMILNEE-GQKLSKRHGA--------------sye_helpy ::: ---TNVMDYQEM-GYLKEALVNFLARLGWSY------QDKEVFSMQELLELFDPKDLNSSPSCFSWHKLNWLNAHYLKN-QSVQELLKLLKPFSFSDLSHLNP------TQLDRLLDALKERSQTLKELAL-KIDEVLIAPVEYEEK----------VFKKLNQALVMPLLEKFK--------LELNKANFND-----------ESALENAMRQIIEE----EKIKAGSFMQPLRLALL-------------------------------GKGGGIGLKEALFILG-KTESVKRIEDFLKN------------------sye_helpy
sye_helpy
---TNVMDYQEM-GYLKEALVNFLARLGWSY------QDKEVFSMQELLELFDPKDLNSSPSCFSWHKLNWLNAHYLKN-QSVQELLKLLKPFSFSDLSHLNP------TQLDRLLDALKERSQTLKELAL-KIDEVLIAPVEYEEK----------VFKKLNQALVMPLLEKFK--------LELNKANFND-----------ESALENAMRQIIEE----EKIKAGSFMQPLRLALL-------------------------------GKGGGIGLKEALFILG-KTESVKRIEDFLKN-------------------------MTNIITRFAPSPTGFLHIGSARTALFNYLFARHNNGKFFLRIEDTDK--KRSTKEAVEAIFSGLKWLGLNWDG---------EVIFQSKRNSLYKEAALKLLKEGKAYYCFTRQEEIAKQRQQALKDK--QHFIFNSEWRDKGPSTYPADIK------------PVIRLKVPREGSITIHDTLQGEIVIENSHID------DMILIRTD---------------GTATYMLAVIVDDHDMGITHIIRGDDHLTNAARQIAIYHAFGYE-----VPNMTHIPLIHGAD-GTKLSKRHGA--------------ricket_pro
:
ricket_pro
ricket_pro :: ---LGVEAYKDM-GYLPESLCNYLLRLGWSH------GDDEIISMNQAIEWFNLASLGKSPSKLDFAKMNSINSHYLRM-LDNDSLTSKTVEILKQNYKISEKEV----SYIKQAMPSLIVRSETLRDLAQ-LAYIYLVDSPMIYSQ----------DAKEVINNCDKDLIKQVI--------ENLSKLEQFN-----------KECVQNKFKEIAIY----NGLKLNDIMKPVRALIT-------------------------------GMTASPSVFEIAETLG-KENILKRLKIIYYNNLNF-----------------LGVEAYKDM-GYLPESLCNYLLRLGWSH------GDDEIISMNQAIEWFNLASLGKSPSKLDFAKMNSINSHYLRM-LDNDSLTSKTVEILKQNYKISEKEV----SYIKQAMPSLIVRSETLRDLAQ-LAYIYLVDSPMIYSQ----------DAKEVINNCDKDLIKQVI--------ENLSKLEQFN-----------KECVQNKFKEIAIY----NGLKLNDIMKPVRALIT-------------------------------GMTASPSVFEIAETLG-KENILKRLKIIYYNNLNF----------------MPAASDKPVVTRFAPSPTGYLHIGGGRTALFNWLYARGRKGTFLLRIEDTDR--ERSTPEATDAILRGLTWLGLDWDG---------EVVSQFARKDRHAEVAREMLERGAAYKCFSTQEEIEAFRESARAEG--RSTLFRSPWRDADPTSHPDA-------------PFVIRMKAPRSGETVIEDEVQGTVRFQNETLD------DMVVLRSD---------------GTPTYMLAVVVDDHDMGVTHVIRGDDHLNNAARQTMVYEAMGWE-----VPVWAHIPLIHGPD-GKKLSKRHGA--------------rhodo_spha ::: ---LGVEEYQAM-GYPAAGMRNYLARLGWSH------GDDEFFTSEQAMDWFDLGGIGRSPARLDFKKLESVCGQHIAV-MEDAELMREIAAYLAAARKPALTDLQA--ARLEKGLYALKDRAKTFPELLE-KARFALESRPIVADD----------AAAKALDPVSRGILRELT--------P-MLQAASWS-----------KQDLEAILTAFASE----KGMGFGKLAAPLRTALA-------------------------------GRTVTPSVYDMMLVIG-RDETIARLEDAAAA------------------rhodo_spha
rhodo_spha
---LGVEEYQAM-GYPAAGMRNYLARLGWSH------GDDEFFTSEQAMDWFDLGGIGRSPARLDFKKLESVCGQHIAV-MEDAELMREIAAYLAAARKPALTDLQA--ARLEKGLYALKDRAKTFPELLE-KARFALESRPIVADD----------AAAKALDPVSRGILRELT--------P-MLQAASWS-----------KQDLEAILTAFASE----KGMGFGKLAAPLRTALA-------------------------------GRTVTPSVYDMMLVIG-RDETIARLEDAAAA-------------------------MTKVITRFAPSPTGMLHVGNIRVALLNWLYAKKHNGKFILRFDDTDL--ERSKQKYKNDIERDLKFLNINWDQ----------TFNQLSRVSRYHEIKNLLINKKRLYACYETKEELELKRKLQLSKG--LPPIYDRASLNLTEKQIQKYIEQ--------GRKPHYRFFLSYE-PISWFDMIKGEIKYDGKTLS------DPIVIRAD---------------GSMTYMLCSVIDDIDYDITHIIRGEDHVSNTAIQIQMFEALNKI-----PPVFAHLSLIINKE--EKISKRVGG--------------ricket_pro ::: ---FEIAYLKKEVGLEAMTIASFFSLLGSSLH-----IF-PYKSIEKLVAQFEISSFSKSPTIYQQYDLERLNHKLLIS-LDFNEVKERLKEIDAD-------------YIDENFWLSVR---PNLQKLSD-IKDWWDICYQTPKIKNLN-------LDKEYLKQASKLLP-LKI--------TKDSWSIWT-------------KEITNIT-----------GRKGKELFLPLRLALT-------------------------------GRESGPEIAGILPLID-REEIIRRLISIA--------------------ricket_pro
ricket_pro
---FEIAYLKKEVGLEAMTIASFFSLLGSSLH-----IF-PYKSIEKLVAQFEISSFSKSPTIYQQYDLERLNHKLLIS-LDFNEVKERLKEIDAD-------------YIDENFWLSVR---PNLQKLSD-IKDWWDICYQTPKIKNLN-------LDKEYLKQASKLLP-LKI--------TKDSWSIWT-------------KEITNIT-----------GRKGKELFLPLRLALT-------------------------------GRESGPEIAGILPLID-REEIIRRLISIA----------------------------MSVAVPFAPSPTGLLHVGNVRLALVNWLFARKAGGNFLVRLDDTDE--ERSKPEYAEGIERDLTWLGLTWDR----------FARESDRYGATDEVAAALKASGRLYPCYETPEELNLKRASLSSQG--RPPIYDRAALRLGDADRARLEAE--------GRKPHWRFKLEHT-PVEWTDLVRGPVHFEGSALS------DPVLIAED---------------GRPLYTLTSVVDDADLAITHVIRGEDHLANTAVQIQIFEAVGGA-----VPVFAHLPLLTDAT-GQGLSKRLGS--------------sye_azobr
:
sye_azobr
sye_azobr :: ---LSVASLREEEGIEPMALASLLAKLGTSDA-----IE-PRLTLDELVAEFDIAKVSRATPKFDPEELLRLNARILHL-LPFERVAGELAASVWM-------------MPTPAFWEAV----PNLSRVAE-ARDWWAVTHAP--VARRR-------TIPLFLAEAATLLPKEPW--------DLSTWGTWT-------------GAVKAKT-----------GRKGKDLFLPLRRALT-------------------------------GRDHGGQLKNLLPLIG-RTRAHKRLAGETA----------------------LSVASLREEEGIEPMALASLLAKLGTSDA-----IE-PRLTLDELVAEFDIAKVSRATPKFDPEELLRLNARILHL-LPFERVAGELAASVWM-------------MPTPAFWEAV----PNLSRVAE-ARDWWAVTHAP--VARRR-------TIPLFLAEAATLLPKEPW--------DLSTWGTWT-------------GAVKAKT-----------GRKGKDLFLPLRRALT-------------------------------GRDHGGQLKNLLPLIG-RTRAHKRLAGETA--------------------
1gln
1.0
0.5
Window length = 8
Window length = 40
Error detection and correction

RASCAL (Thompson et al, 2003), Refiner (Chakrabarti et al, 2006)
RASCAL
Define sequence groups with the Secator program (Wicker N. et al., 2001).
Define core blocks: regions with average NorMD_sw above a specified threshold.
Calculate a Gribskov profile for each block in each group.
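The core-block step above (regions whose average sliding-window score exceeds a threshold) can be sketched as follows. This is a minimal illustration and not RASCAL's actual code: the per-column scores are assumed to be already computed (for example by a NorMD-like column score), and the names `sliding_window_mean`, `core_blocks`, `window`, `threshold` and `min_length` are hypothetical.

```python
from typing import List, Tuple


def sliding_window_mean(scores: List[float], window: int) -> List[float]:
    """Mean score in a window centred on each column (truncated at the ends).

    The window covers window // 2 columns on each side of the centre.
    """
    half = window // 2
    means = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        means.append(sum(scores[lo:hi]) / (hi - lo))
    return means


def core_blocks(scores: List[float], window: int, threshold: float,
                min_length: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, above the threshold."""
    smoothed = sliding_window_mean(scores, window)
    blocks = []
    start = None
    for i, value in enumerate(smoothed):
        if value > threshold and start is None:
            start = i                      # a block opens here
        elif value <= threshold and start is not None:
            if i - start >= min_length:
                blocks.append((start, i))  # the block closes before column i
            start = None
    if start is not None and len(smoothed) - start >= min_length:
        blocks.append((start, len(smoothed)))
    return blocks


if __name__ == "__main__":
    column_scores = [0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1]
    print(core_blocks(column_scores, window=3, threshold=0.5))
```

A larger window smooths over short dips in the score and tends to merge neighbouring blocks; a smaller window tracks local conservation more closely.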
Error detection and correction

RASCAL, errors within core blocks
Example: metalloprotease, HExxH motif
Error detection and correction

RASCAL, errors between core blocks
Example: methyltransferase, DxxxG[AST]GxF[ILV] motif
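Motifs such as HExxH and DxxxG[AST]GxF[ILV] can be searched for in an ungapped sequence with a regular expression. The sketch below assumes that lowercase `x` means "any residue" and that bracketed groups list the allowed residues; the helper names `motif_to_regex` and `find_motif` and the test sequences are made up for illustration.

```python
import re
from typing import List


def motif_to_regex(motif: str):
    """Translate a motif like 'HExxH' into a compiled regular expression.

    Lowercase 'x' becomes '.', bracket groups such as [AST] are already
    valid regex character classes.
    """
    return re.compile(motif.replace("x", "."))


def find_motif(motif: str, sequence: str) -> List[int]:
    """Return the 0-based start positions of every motif occurrence.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = motif_to_regex(motif).pattern
    return [m.start() for m in re.finditer("(?=" + pattern + ")", sequence)]


if __name__ == "__main__":
    print(find_motif("HExxH", "AAHEAAHAA"))
    print(find_motif("DxxxG[AST]GxF[ILV]", "MDAAAGSGAFIK"))
```

For aligned sequences, gap characters would need to be removed first and the match positions mapped back to alignment columns.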
Homology detection methods

Sequence percent identity:
>30% identity: sequences are homologous
15-30% identity: ‘twilight zone’
Local analysis of positional conservation:
AL2CO (Pei, Grishin, 2001), SEGID (Wang, Zu, 2003), NorMD
Conserved regions:
LEON (Thompson et al, 2004), MCOFFEE (Moretti et al, 2007)
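As an illustration of the percent identity thresholds on this slide, the sketch below computes pairwise identity from two aligned sequences and classifies the result. It assumes identity is counted over columns where neither sequence has a gap (other denominators are also in use), and the label returned below 15% is an assumption, since the slide does not name that range.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over the columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def classify(identity: float) -> str:
    """Apply the thresholds from the slide to a percent identity value."""
    if identity > 30.0:
        return "homologous"
    if identity >= 15.0:
        return "twilight zone"
    # Assumption: the slide gives no label for identities below 15%.
    return "no inference from identity alone"


if __name__ == "__main__":
    pid = percent_identity("ACDE-F", "ACDQGF")
    print(pid, classify(pid))
```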
Homology analysis with LEON





vertical analysis: sequence clustering, intermediate sequences
horizontal analysis: residue conservation, motif context information
composition analysis: prediction of compositionally biased segments
Homologous regions are delineated
Removal of sequences non-homologous to query
Homology analysis with LEON
BlastP results :
Query sequence: DKK1_HUMAN

Hit          Description                            E-value
DKK1_HUMAN   Dickkopf related protein-1 precursor   1e-151
DKK3_MOUSE   Dickkopf related protein-3 precursor   8e-07
TXCA_CAEEX   Neurotoxic peptide caeron precursor.   0.007
PRK1_RAT     Prokineticin 1 precursor               0.021
VPRA_DENPO   Intestinal toxin 1 _MIT                0.10
Q8BKK7       MEGF11 protein.                        0.10
COL_RABIT    Colipase precursor.                    0.13
PRK2_HUMAN   Prokineticin 2 precursor               0.17
Q7XZ34       Growth factor _Fragment_.              0.17
1imt_        VENOM. MAMBA INTESTINAL TOXIN 1,       0.23
Q863H5       Bv8/prokineticin 2-like protein.       0.30
VE6_RHPV1    E6 protein.                            1.1
COL_CANFA    Colipase precursor.                    3.3
Q9Y7V5       Conidiospore surface protein.          3.3
COLA_HORSE   Procolipase A precursor _Fragment_.    4.3
O00508       Latent TGF-beta binding protein-4.     5.6
1pco_        LIPASE PROTEIN COFACTOR.               7.3
Q8SRF4       GTP binding protein.                   7.3
NTC1_MOUSE   Neurogenic locus notch homolog         9.6

[Asterisks on the slide mark a subset of these hits.]
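To see what a fixed E-value cut-off alone would retain from this BlastP list, the first eight hits can be filtered as in the sketch below. The names and E-values are copied from the results above; the helper `hits_below` and the cut-off values are illustrative assumptions and do not reproduce what LEON does.

```python
from typing import List, Tuple

# First eight hits from the BlastP results (name, E-value).
HITS: List[Tuple[str, float]] = [
    ("DKK1_HUMAN", 1e-151),
    ("DKK3_MOUSE", 8e-07),
    ("TXCA_CAEEX", 0.007),
    ("PRK1_RAT", 0.021),
    ("VPRA_DENPO", 0.10),
    ("Q8BKK7", 0.10),
    ("COL_RABIT", 0.13),
    ("PRK2_HUMAN", 0.17),
]


def hits_below(hits: List[Tuple[str, float]], max_evalue: float) -> List[str]:
    """Names of hits whose E-value is strictly below the cut-off."""
    return [name for name, evalue in hits if evalue < max_evalue]


print(hits_below(HITS, 1e-4))
print(hits_below(HITS, 0.1))
```

With a strict cut-off only the Dickkopf proteins remain, while a looser one admits weaker hits that need further evidence before they can be accepted or rejected.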
Homology analysis with LEON
[Figure: sequence groups dkk1, dkk2, dkk3, Prokineticin/Intestinal toxin and Lipase protein cofactor]
Pfam : Dickkopf N-terminal domain, Colipase, Colipase C-terminal domain
Structural proteomics : target characterisation
Detection of structural homologs for targets in the SPINE
(Structural Proteomics in Europe) project
For a training set of 510 potential targets :

Method                     No. of targets with at least 1 PDB neighbour
BlastP (E<10^-7)           142 (28%)
BlastP (E<10^-4)           166 (33%)
PipeAlign (BlastP E<10)    196 (38%)
PipeAlign (PDB-Blast)      223 (44%)
Conserved residue analysis


Active site residues are under evolutionary pressure to maintain
their functional integrity and undergo fewer mutations than less
functionally important amino acids
Methods:

Evolutionary trace (Lichtarge et al, 1996): sequence conservation patterns in
homologous proteins are mapped onto the protein surface to generate clusters
identifying functional interfaces
Conserved residue analysis

Comparison of sequence-based methods
FRcons combines information :
• conservation at each site
• amino acid distribution
• predicted secondary structure (ss)
• predicted relative solvent accessibility (rsa)
FRcons: Fischer et al. Bioinformatics 2008
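Conservation at each site can be quantified in many ways. The sketch below uses a normalised Shannon entropy per alignment column, which is one common choice and is not the FRcons score; the function name and the normalisation by log2(20) are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List


def column_conservation(alignment: List[str]) -> List[float]:
    """Score each column from 0 (maximally variable) to 1 (fully conserved)
    as 1 - H / log2(20), where H is the Shannon entropy of the residues in
    the column. Gaps are ignored; an all-gap column scores 0."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        entropy = 0.0
        for count in Counter(residues).values():
            p = count / total
            entropy -= p * math.log2(p)
        scores.append(1.0 - entropy / math.log2(20))
    return scores


# Toy alignment, for illustration only.
print(column_conservation(["ACD", "ACE", "AQF"]))
```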
OrdAli : Ordered Alignment Analysis
color scheme

residues conserved in all sequences in family


structural or functional importance: characteristic motifs
residues conserved within a sub-group of sequences

discriminant residues
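The distinction between family-wide and sub-group conservation can be sketched as follows. The sketch assumes the sub-groups are already defined and labels a column "family" when one residue is shared by every sequence, "sub-group" when at least one group is internally conserved, and "variable" otherwise. The function names, labels and toy sequences are hypothetical and do not reproduce OrdAli.

```python
from typing import Dict, List, Optional


def conserved_residue(column: List[str]) -> Optional[str]:
    """Return the residue if the column holds a single non-gap residue type."""
    residues = set(column)
    if len(residues) == 1 and "-" not in residues:
        return next(iter(residues))
    return None


def classify_columns(groups: Dict[str, List[str]]) -> List[str]:
    """Label each alignment column as 'family', 'sub-group' or 'variable'."""
    all_seqs = [seq for seqs in groups.values() for seq in seqs]
    length = len(all_seqs[0])
    labels = []
    for col in range(length):
        if conserved_residue([seq[col] for seq in all_seqs]) is not None:
            labels.append("family")
        elif any(
            conserved_residue([seq[col] for seq in seqs]) is not None
            for seqs in groups.values()
            if len(seqs) > 1
        ):
            labels.append("sub-group")
        else:
            labels.append("variable")
    return labels


# Toy groups, for illustration only.
toy = {"euk": ["GKHL", "GKHM"], "bac": ["GRHA", "GRHV"]}
print(classify_columns(toy))
```

In the toy example the second column is conserved within each group but differs between groups, which is the pattern a discriminant residue would show.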
Schematic alignment of aspartyl-tRNA synthetases
• universal proteins that play a key role in translation
[Figure: schematic alignment of the Euc, Arc and Bac sequence groups over alignment positions 180 to 930, annotated with the anticodon binding domain, Motif I, flipping loop, Motif II, catalytic core I, insertion domain, Motif III and catalytic core II. Labelled residues: P; L Q PQ KQ; R; HG. Colour key: family conserved, Archaea+Bacteria, Archaea+Eukaryote.]
PipeAlign: automatic protein analysis
Query Sequence
BlastP search
Ballast Anchors: LMS (local maximum segments)
DbClustal Alignment
RASCALED MACS (Multiple Alignment of Complete Sequences)
Homologous regions
Plewniak et al. (2000) Bioinformatics.
Thompson et al. (2000) Nucl Acids Res.
Thompson et al. (2003) Bioinformatics.
Thompson et al. (2004) Nucl Acids Res.
Thompson et al. (2001) J Mol Biol.
• Secator/DPC : automatic clustering algorithms
Wicker et al. (2001) Mol Biol Evol.
Wicker et al. (2002) Nucl Acids Res.
Phylogeny
Conserved residues/domains
2D/3D structure prediction
Cellular location prediction
…
http://www-igbmc.u-strasbg.fr/PipeAlign/
Multiple sequence alignment editors
No automatic method is 100% reliable: manual verification and refinement are essential!
SeqLab GCG Wisconsin Package
SeaView (Galtier et al, 1996) http://pbil.univ-lyon1.fr/software/seaview.html
UNIX/Linux, Windows 95+, Mac OS 8, 9, X
WEB servers :
GeneAlign (Kurukawa) http://www.gen-info.osaka-u.ac.jp/geneweb2/genealign/
Jalview (Clamp, 1998) http://www.ebi.ac.uk/~michele/jalview/
CINEMA (Lord et al, 2002) http://www.bioinf.man.ac.uk/dbbrowser/cinema-mx
Multiple Sequence Alignment
Introduction: what is a multiple alignment?
Multiple alignment construction
  Traditional approaches: optimal, progressive
  Alignment parameters
  Iterative and co-operative approaches
Multiple alignment analysis
  Conserved/homologous regions
  Quality analysis/error detection
Multiple alignment applications
Central role of multiple alignments
[Figure: domain structure of the euk, bac and arc sequences, with conserved, functional sites]
Central role of multiple alignments
Comparative genomics
Phylogenetic studies
Hierarchical function annotation: homologs, domains, motifs
Gene identification, validation
Multiple alignment
Structure comparison, modelling
Interaction networks
RNA sequence, structure, function
Human genetics, SNPs
Therapeutics, drug design
[Figure labels: insertion domain, DBD, LBD, binding sites / mutations; Therapeutics, drug discovery]
Example: protein, RNA complexes : ASP tRNA synthetase
AspRS in complex with tRNAAsp (Cavarelli et al, 1993)
aspRS, tRNA interactions :
aspartate determinants are conserved in prokaryotes and eukaryotes (Becker et al, 1996)
[Figure: global alignment of euk, arc and bac sequences showing the eukaryotic extension, anticodon-binding domain, hinge region and catalytic domain]
[Figure: cloverleaf representation showing the amino acid acceptor stem and the anticodon loop and stem]
Westhof et al, 1988
Ruff et al, 1991
Example: Bardet Biedl Syndrome
Identification of new genes responsible for BBS
BBS : a rare autosomal recessive genetic disease, probably caused by a defect at the basal body of ciliated cells
Phenotypes : obesity, retinopathy, polydactyly, mental retardation, hypogonadism, renal failure
9 genes are known to be involved : BBS1 – BBS9
In a comparative genomics study, Li et al (2004) identified 688 genes implicated in cilia and flagella
Clinical studies have identified a candidate chromosomal region of 8 Mb with approx. 23 genes
• including 4 genes from the set of 688
Comparative genomics based analysis
Multiple alignment identified a new gene with a chaperonin-like fold (BBS10)
BBS10 shows a high frequency of mutation (~20% of patients)
[Figure: global alignment of BBS10, BBS6 and chaperonin, showing a deletion and insertions 1, 2 and 3]
J. Muller et al 2006