Macromolecular Structure Database group


EMBL-EBI
MSDfold (SSM)
Secondary Structure Matching
A web service for protein structure
comparison and structure searches
Eugene Krissinel
http://www.ebi.ac.uk/msd-srv/ssm/ssmstart.html
Structure alignment
Structure alignment may be defined as identification of residues occupying "equivalent" geometrical positions.
• Unlike in sequence alignment, residue type is neglected
• Used for
  • measuring the structural similarity
  • protein classification and functional analysis
  • database searches
Methods
• Many methods are known:
  • Distance matrix alignment (DALI, Holm & Sander, EBI)
  • Vector alignment (VAST, Bryant et al., NCBI)
  • Depth-first recursive search on SSEs (DEJAVU, Madsen & Kleywegt, Uppsala)
  • Combinatorial extension (CE, Shindyalov & Bourne, SDSC)
  • Dynamic programming on Cα (Gerstein & Levitt)
  • Dynamic programming on SSEs (SSA, Singh & Brutlag, Stanford University)
  • many others
• SSM employs a 2-step procedure:
  A. Initial structure alignment and superposition using SSE graph matching
  B. Cα-alignment
Graph representation of SSEs
E. M. Mitchell et al. (1990) J. Mol. Biol. 212:151
[Figure: a pair of SSE vectors forming a graph edge, with labels r1, r2, α1, α2, τ and L]
SSE graphs differ from conventional chemical graphs only in that
they are labelled by vectors of properties. In graph matching, the
labels are compared with tolerances chosen empirically.
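The tolerance-based label comparison described above can be sketched as follows. This is a minimal illustration, not the SSM implementation: the `Vertex` and `Edge` fields, the units and all three tolerance values are assumptions chosen for the example, since the slides only say that tolerances are chosen empirically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    kind: str      # "H" for helix, "S" for strand
    length: float  # SSE length (hypothetical unit: residues)


@dataclass(frozen=True)
class Edge:
    distance: float  # distance between the two SSEs (hypothetical unit: angstroms)
    angle: float     # angle between the two SSE vectors (hypothetical unit: degrees)


# Hypothetical tolerances; real values would be tuned empirically.
LENGTH_TOL = 0.3     # relative difference allowed in SSE length
DISTANCE_TOL = 2.0   # absolute difference allowed in distance
ANGLE_TOL = 30.0     # absolute difference allowed in angle


def vertices_match(a: Vertex, b: Vertex) -> bool:
    """Vertices match if they have the same SSE type and similar length."""
    if a.kind != b.kind:
        return False
    return abs(a.length - b.length) <= LENGTH_TOL * max(a.length, b.length)


def edges_match(a: Edge, b: Edge) -> bool:
    """Edges match if every label agrees within its tolerance."""
    return (abs(a.distance - b.distance) <= DISTANCE_TOL
            and abs(a.angle - b.angle) <= ANGLE_TOL)
```

A graph-matching routine would call these two predicates wherever a conventional chemical graph matcher compares atom and bond types for equality.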
SSE graph matching
[Figure: SSE graphs of structures A and B, built from helices (H1 to H6) and strands (S1 to S7), with the matched elements highlighted]
Matching the SSE graphs yields a correspondence between secondary structure elements, that is, groups of residues. The correspondence may be used as an initial guess for structure superposition and alignment of individual residues.
Cα-alignment
• SSE-alignment is used as an initial guess for Cα-alignment
• Cα-alignment is an iterative procedure based on the expansion of shortest contacts at best superposition of structures
[Figure: superposed chain A and chain B, with matched helices and matched strands highlighted]
• Cα-alignment is a compromise between the alignment length Nalign and r.m.s.d. Longest contacts are unmapped in order to maximise the Q-score:
$$Q = \frac{N_{\mathrm{align}}^{2}}{\left(1 + \left(\mathrm{r.m.s.d.}/R_0\right)^{2}\right) N_A N_B}$$
Multiple structure alignment
• More than 2 structures are aligned simultaneously
• Multiple alignment is not equal to the set of all-to-all pairwise alignments
• Helps to identify common structure motifs for a whole family of structures
Iterative removal of non-aligning SSEs
[Figure: best pairwise alignments between structures A, B and C]
Helices may be multiply aligned from pairwise relations.
Strands do not multiply align, but one can still try to align them by probing alternative (not best) alignments.
Iterative removal of non-aligning SSEs
4 alternative pairwise alignments make up to 4 multiple alignments:
A1 - B1 - C1
A1 - B2 - C1
A2 - B1 - C1
A2 - B2 - C1
[Figure: structures A, B and C with the alternative alignments numbered 1 and 2]
Complexity $O\left(\prod_{i=1}^{N} n_i\right)$ is prohibitive for $N > 15$ to $20$ structures
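The combinatorial growth above can be illustrated with a short sketch. The `alternatives` dictionary reproduces the slide's example (two alternatives each for A and B, one for C); the function names are hypothetical and chosen for illustration only.

```python
from itertools import product
from math import prod

# Alternative alignments available for each structure, as in the example above.
alternatives = {"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1"]}


def candidate_multiple_alignments(alts):
    """Enumerate every combination of one alternative per structure."""
    names = sorted(alts)
    return [" - ".join(combo) for combo in product(*(alts[n] for n in names))]


def candidate_count(alts):
    """Number of combinations: the product of n_i over all structures."""
    return prod(len(options) for options in alts.values())


candidates = candidate_multiple_alignments(alternatives)
print(candidates)

# With only 2 alternatives per structure, 20 structures already give 2**20 candidates.
print(candidate_count({i: [1, 2] for i in range(20)}))
```

Exhaustive probing is therefore practical only for a handful of structures, which motivates a heuristic.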
Iterative removal of non-aligning SSEs
Heuristics: remove the non-aligning SSE with the lowest alignment score $Q_i = \sum_j Q_{ij}$ and reiterate all alignments.
[Figure: structures A, B and C with alignments numbered 1 and 2]
Flowchart:
1. Start
2. Calculate all-to-all pairwise alignments
3. Are there non-aligning SSEs?
   Yes: remove one non-aligning SSE with the lowest score and return to step 2
   No: quit
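The removal loop above can be sketched on toy data. Everything specific here is an assumption made for illustration: `pairwise_align` is a toy stand-in (it greedily pairs SSEs of the same type by a 1-D position) and not SSM graph matching, an SSE is treated as "non-aligning" when it lacks a partner in at least one other structure, and Q_i is read as the sum of its pairwise scores over the other structures.

```python
def pairwise_align(a, b, tol=2.0):
    """Toy pairwise matcher: pair SSEs of equal type whose positions differ by <= tol.

    Returns {sse_name_in_a: (sse_name_in_b, score)}.
    """
    matches, used = {}, set()
    for name_a, (kind_a, pos_a) in a.items():
        best = None
        for name_b, (kind_b, pos_b) in b.items():
            d = abs(pos_a - pos_b)
            if kind_a == kind_b and name_b not in used and d <= tol:
                if best is None or d < best[1]:
                    best = (name_b, d)
        if best is not None:
            used.add(best[0])
            matches[name_a] = (best[0], 1.0 / (1.0 + best[1] ** 2))
    return matches


def sse_scores(structs):
    """For every SSE return [Q_i, number of structures in which it found a partner]."""
    scores = {}
    for s, sses in structs.items():
        for name in sses:
            scores[(s, name)] = [0.0, 0]
        for t, other in structs.items():
            if t == s:
                continue
            for name, (_, q) in pairwise_align(sses, other).items():
                scores[(s, name)][0] += q
                scores[(s, name)][1] += 1
    return scores


def iterative_removal(structs):
    """Remove the lowest-scoring non-aligning SSE until every SSE aligns everywhere."""
    structs = {s: dict(sses) for s, sses in structs.items()}  # work on a copy
    while True:
        scores = sse_scores(structs)  # recalculate all-to-all pairwise alignments
        n_others = len(structs) - 1
        non_aligning = [k for k, (q, n) in scores.items() if n < n_others]
        if not non_aligning:
            return structs
        s, name = min(non_aligning, key=lambda k: scores[k][0])
        del structs[s][name]


# Toy structures: SSE name -> (type, 1-D position). Helices align, strands do not.
structures = {
    "A": {"H1": ("H", 0.0), "S1": ("S", 5.0), "H2": ("H", 10.0)},
    "B": {"H1": ("H", 0.5), "H2": ("H", 10.5), "S1": ("S", 20.0)},
    "C": {"H1": ("H", 1.0), "H2": ("H", 9.0)},
}
result = iterative_removal(structures)
print(result)
```

Each pass removes exactly one SSE, so the loop runs at most as many times as there are SSEs, in contrast to the exhaustive enumeration of alternatives.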
Multiple Cα refinement
Central star & consensus
[Figure: structures A, B and C superposed around the consensus structure X]
Flowchart:
1. Multiple SSE alignment
2. Initial Cα alignment
3. Superpose structures and calculate consensus structure X
4. Choose the structure closest to X as the central star and align all the rest to it
5. Unmap groups of atoms with the highest distance score D in order to maximise the score
$$Q = \frac{N_{\mathrm{align}}^{2}}{\left(1 + \left(D/R_0\right)^{2}\right) N_1 N_2}$$
6. Score improved?
   Yes: return to step 3
   No: quit
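The consensus and central-star step can be sketched as below. This is an illustration under stated assumptions: the structures are taken to be already superposed and reduced to the same list of equivalent atoms, the consensus X is taken as the plain coordinate average, and "closest to X" is measured by r.m.s.d. The function names and coordinates are hypothetical.

```python
from math import sqrt


def consensus(structures):
    """Average the coordinates of equivalent atoms over all superposed structures."""
    names = list(structures)
    n_atoms = len(structures[names[0]])
    return [
        tuple(sum(structures[s][k][axis] for s in names) / len(names)
              for axis in range(3))
        for k in range(n_atoms)
    ]


def rmsd(a, b):
    """Root-mean-square distance between two equally long coordinate lists."""
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return sqrt(total / len(a))


def central_star(structures):
    """Name of the structure closest to the consensus structure X."""
    x = consensus(structures)
    return min(structures, key=lambda s: rmsd(structures[s], x))


# Toy coordinates: two equivalent atoms per structure, shifted along y.
structures = {
    "A": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "B": [(0.0, 0.3, 0.0), (1.0, 0.3, 0.0)],
    "C": [(0.0, 1.2, 0.0), (1.0, 1.2, 0.0)],
}
star = central_star(structures)
print(star)
```

In the refinement loop, the remaining structures would then be re-aligned to the chosen star before the score is re-evaluated.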
Pairwise Alignment vs. Multiple Alignment
Best pairwise alignment of 1SAR:A and 1D1F:B includes only the β-sheet.
Addition of 1MGW:A (a close neighbour of 1SAR:A) reveals a common motif of β-sheet and α-helix.
SSM server map
http://www.ebi.ac.uk/msd-srv/ssm
SSM output
• Table of matched Secondary Structure Elements
• Table of matched backbone Cα-atoms with distances between them at best structure superposition
• Rotation-translation matrix of best structure superposition
• Visualisation in Jmol and Rasmol
• r.m.s.d. of Cα-alignment
• Length of Cα-alignment Nalign
• Number of gaps in Cα-alignment
• Quality score Q
• Statistical significance scores P(S), Z
• Sequence identity
Scoring at low structural similarity: 1KNO:A vs SCOP 1.61

Criterion        Match (length)     Q-score  RMSD  Nalign  P
Maximal Q-score  d1di2a_ (69 res)   0.213    2.43  67/184  0.55
Lowest RMSD      d1emn_1 (43 res)   0.019    0.9   13/184  0.075
Highest Nalign   d1elxb_ (449 res)  0.02     5.82  89/184  ~1
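The three Q-scores above can be checked against the Q-score definition with a short calculation. Two things are assumptions rather than statements from the slides: R0 = 3 Å (the slides do not give its value, but this choice reproduces the reported scores to within rounding) and the reading of 184 as the length of the query chain 1KNO:A.

```python
def q_score(n_align, rmsd, n_a, n_b, r0=3.0):
    """Q = Nalign^2 / ((1 + (rmsd / R0)^2) * N_A * N_B); r0 = 3.0 is an assumed value."""
    return n_align ** 2 / ((1.0 + (rmsd / r0) ** 2) * n_a * n_b)


QUERY_LENGTH = 184  # assumed length of 1KNO:A, taken from the Nalign denominators

# (match, Nalign, RMSD, match length, Q-score reported on the slide)
rows = [
    ("d1di2a_", 67, 2.43, 69, 0.213),
    ("d1emn_1", 13, 0.9, 43, 0.019),
    ("d1elxb_", 89, 5.82, 449, 0.02),
]

for name, n_align, rmsd, length, reported in rows:
    q = q_score(n_align, rmsd, length, QUERY_LENGTH)
    print(f"{name}: computed Q = {q:.4f}, slide reports {reported}")
```

The comparison shows why Q is a useful compromise: the lowest-RMSD match aligns only 13 residues and the longest alignment has a poor RMSD, so both score roughly ten times lower than the match that balances the two.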
Performance data
[Figure: log-scale plots of Nqueries (total queries, jobs per query) and CPU used [secs] (delivery time per query, total CPU, CPU/delivery; a 50 s level is marked) versus day from June 17, 2002]