Transcript Lecture 8

8. Protein Docking
1
Prediction of protein-protein interactions
1. How do proteins interact?
2. Can we predict and manipulate those interactions?
• Prediction of structure: docking
• Prediction of binding
• Design: creation of new interactions
2
Docking vs. ab initio modeling
• De novo structure prediction (ROSETTA): sequence (ADEFFGKLSTKK…….) → structure
  – Building blocks: backbone & side chains
  – CASP
• Docking (ROSETTADOCK): monomers → complex
  – Rigid body degrees of freedom: 3 translation, 3 rotation
  – CAPRI
3
Protein-protein docking
Aim: predict the structure of a protein complex from its partners
Monomers → Complex
Rigid body degrees of freedom: 3 translation, 3 rotation
4
Monomers change structure upon binding to partner
Solution 1: Tolerate clashes
↑ Fast
↓ Weak discrimination of correct solution
Solution 2: Model changes
↑ Precise
↓ Slow
5
Protein-protein docking: sampling strategies
• Initial approaches: techniques for fast detection of shape complementarity
  1. Fast Fourier Transform (FFT)
  2. Geometric hashing
• Advanced high-resolution approaches: model changes explicitly
  3. RosettaDock
• Data-driven docking
  4. HADDOCK
6
Find shape complementarity: 1. Fast Fourier Transform (FFT)
Ephraim Katzir
7
Find shape complementarity - FFT
Ephraim Katzir
8
Find shape complementarity: Fast Fourier Transform (FFT)
Ephraim Katzir
Test all possible positions of ligand and receptor:
• For each rotation of ligand (R)
• evaluate all translations (T) of ligand grid over receptor grid
= correlation product: can be calculated by FFT
[Figure: correlation as a function of X and Y translation]
9
Find shape complementarity: Fast Fourier Transform (FFT)
Ephraim Katzir
1. Discretize the receptor R on a grid a: surface = 1, interior < 0
2. Rotate the ligand L and discretize it on a grid b: surface = 1, interior > 0
3. Fast Fourier Transform: A = DFT(a), B = DFT(b)
4. Correlation function: C = A*·B (A* is the complex conjugate of A)
5. S = iDFT(C)
Computational cost: N^3 log N^3 (instead of N^6)
10
From http://zlab.bu.edu/~rong/be703/
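The correlation scheme on this slide can be sketched in a few lines. This is a toy 2D illustration in pure Python, not the code of any docking program: the grid size, the interior value of -15 and the helper names (fft, fft2, correlate_fft, correlate_direct) are assumptions made for the example. It computes S = iDFT(A*·B) and compares it with the exhaustive sum over all (cyclic) translations.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def fft2(grid, inverse=False):
    """2D FFT: transform every row, then every column."""
    rows = [fft(r, inverse) for r in grid]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def correlate_fft(a, b):
    """S = iDFT(conj(DFT(a)) * DFT(b)) for square real grids a and b."""
    n = len(a)
    A, B = fft2(a), fft2(b)
    C = [[A[i][j].conjugate() * B[i][j] for j in range(n)] for i in range(n)]
    S = fft2(C, inverse=True)
    return [[S[i][j].real / (n * n) for j in range(n)] for i in range(n)]

def correlate_direct(a, b):
    """Same score by brute force: one full grid sum per translation (tx, ty)."""
    n = len(a)
    return [[sum(a[x][y] * b[(x + tx) % n][(y + ty) % n]
                 for x in range(n) for y in range(n))
             for ty in range(n)] for tx in range(n)]

# Toy grids (illustrative values): surface = 1, receptor interior < 0.
N = 8
receptor = [[0.0] * N for _ in range(N)]
for x in range(1, 5):
    for y in range(1, 5):
        interior = 2 <= x <= 3 and 2 <= y <= 3
        receptor[x][y] = -15.0 if interior else 1.0
ligand = [[0.0] * N for _ in range(N)]
ligand[0][0] = ligand[0][1] = 1.0  # a two-cell "ligand" made of surface only

scores = correlate_fft(receptor, ligand)
```

In practice one would use an optimized 3D FFT library. The point here is only that the FFT route returns the same score for every translation as the exhaustive loop, at much lower cost for large grids.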
Find shape complementarity: Fast Fourier Transform (FFT)
Correlation via FFT and IFFT: increases the speed by 10^7
[Figure: receptor (R) and ligand (L) grids with surface, interior and binding site marked; correlation as a function of X and Y translation]
11
From http://zlab.bu.edu/~rong/be703/
Some FFT-based docking protocols
• ZDOCK (Weng)
• ClusPro (Vajda, Camacho)
• PIPER (Vajda, Kozakov)
• MolFit (Eisenstein)
• DOT (TenEyck)
• HEX (Ritchie): FFT in rotation space
12
Shape complementarity: 2. Geometric hashing
(PatchDock, Wolfson & Nussinov)
Matching of puzzle pieces:
1. Define geometric patches (concave, convex, flat)
2. Surface patch matching
3. Filtering and scoring
13
From http://bioinfo3d.cs.tau.ac.il/PatchDock/patchdock.html
Hashing: alpha shapes
• Formalizes the idea of “shape”
• In 2D, an “edge” between two points is “alpha-exposed” if there exists a circle of radius alpha such that the two points lie on the boundary of the circle and the circle contains no other points from the point set
14
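The alpha-exposed test in 2D can be written out directly from the definition. A minimal sketch, assuming points are (x, y) tuples; the function name alpha_exposed and the small numerical tolerance are choices made for this example. Two circles of radius alpha pass through any pair of points closer than 2·alpha, and the edge is exposed if either circle is empty.

```python
import math

def alpha_exposed(p, q, points, alpha):
    """True if some circle of radius alpha through p and q holds no other point."""
    d = math.dist(p, q)
    if d == 0 or d > 2 * alpha:
        return False  # no circle of radius alpha passes through both points
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    # Distance from the midpoint of pq to each candidate circle center.
    h = math.sqrt(max(alpha * alpha - (d / 2) ** 2, 0.0))
    # Unit vector perpendicular to pq.
    ux, uy = -(q[1] - p[1]) / d, (q[0] - p[0]) / d
    for side in (1, -1):
        center = (mx + side * h * ux, my + side * h * uy)
        others = (r for r in points if r != p and r != q)
        if all(math.dist(center, r) >= alpha - 1e-9 for r in others):
            return True
    return False

# Unit square with one point in the middle.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
outer_edge = alpha_exposed(square[0], square[1], square, alpha=1.0)
diagonal = alpha_exposed(square[0], square[2], square, alpha=1.0)
```

With alpha = 1 the outer edge of the square is exposed, while the diagonal is not, because both candidate circles contain the middle point.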
Hashing – sparse surface representation
15
Slide from Jens Meiler
Docking with geometric hashing
PATCHDOCK
• Fast and versatile approach
• Speed allows easy extension to multiple protein docking, flexible hinge docking, etc.
• An extension of this protocol, FIREDOCK, includes side chain optimization (RosettaDock-like): a very flexible, fast and accurate protocol
16
High-resolution docking with Rosetta: RosettaDock
Random Start Position
→ Low-Resolution Monte Carlo Search
→ Filters
→ High-Resolution Refinement
→ Clustering
→ Predictions
(10^5 decoys)
17
Choosing starting orientations
1. Global search
• Random translation
• Random rotation (Euler angles):
  1. Tilt direction [0..360°]
  2. Tilt angle [0..90°]
  3. Spin angle [0..360°]
• Euler angles are independent and guarantee non-biased search
18
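A random starting rotation built from the three angles on this slide might look as follows. This is a sketch under stated assumptions, not the RosettaDock code: the ZYZ Euler convention and the choice to sample the cosine of the tilt angle uniformly (so that tilt axes cover the hemisphere evenly) are mine, and the function names are hypothetical.

```python
import math
import random

def random_orientation(rng):
    """Draw (tilt_direction, tilt_angle, spin) in degrees."""
    tilt_direction = rng.uniform(0.0, 360.0)
    # Assumption: uniform in cos(tilt), which gives tilt angles in [0, 90].
    tilt_angle = math.degrees(math.acos(rng.uniform(0.0, 1.0)))
    spin = rng.uniform(0.0, 360.0)
    return tilt_direction, tilt_angle, spin

def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_matrix(tilt_direction, tilt_angle, spin):
    """Assumed ZYZ convention: spin about z, tilt about y, then tilt direction about z."""
    return matmul(rot_z(tilt_direction), matmul(rot_y(tilt_angle), rot_z(spin)))

rng = random.Random(0)
angles = random_orientation(rng)
R = rotation_matrix(*angles)
```

Together with a random translation, R defines the six rigid body degrees of freedom of one starting position.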
Choosing starting orientations
2. Local refinement
• Translation: 3 Å normal, 8 Å parallel
• Rotation: 8°
  1. Tilt direction [0±8°]
  2. Tilt angle
  3. Spin angle
19
Overview of docking algorithm
Random Start Position
→ Low-Resolution Monte Carlo Search
→ Filters
→ High-Resolution Refinement
→ Clustering
→ Predictions
(10^5 decoys)
20
Low-resolution search
1. Perturbation
2. Monte Carlo search
3. Rigid body translations and rotations
4. Residue-scale interaction potentials
Protein representation: backbone atoms + average centroids
→ Mimics physical diffusion process
21
Residue-scale scoring

Score                 Representation                              Physical force
Contacts              r(centroid-centroid) < 6 Å                  Attractive van der Waals
Bumps                 (r – R_ij)^2                                Repulsive van der Waals
Residue environment   -ln(P_env)                                  Solvation
Residue pair          -ln(P_ij)                                   Hydrogen bonding, electrostatics, solvation
Alignment             -1 for interface residues in antibody CDR   (bioinformatic)
Constraints           varies                                      (biochemical)
22
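The first two rows of the scoring table can be illustrated on centroid coordinates. A minimal sketch, assuming each protein is given as a list of (x, y, z) residue centroids; the function names are hypothetical, and a single clash radius r_min stands in for the pair-specific R_ij of the table. Weights and signs of the real score terms are not given on the slide and are left out.

```python
import math

def contact_count(centroids_a, centroids_b, cutoff=6.0):
    """Number of centroid pairs across the interface closer than the cutoff (Å)."""
    return sum(1 for p in centroids_a for q in centroids_b
               if math.dist(p, q) < cutoff)

def bump_penalty(centroids_a, centroids_b, r_min=4.0):
    """Sum of (r - r_min)^2 over centroid pairs that are closer than r_min.

    r_min is an illustrative stand-in for the pair-specific R_ij.
    """
    total = 0.0
    for p in centroids_a:
        for q in centroids_b:
            r = math.dist(p, q)
            if r < r_min:
                total += (r - r_min) ** 2
    return total

receptor = [(0.0, 0.0, 0.0)]
ligand = [(3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
n_contacts = contact_count(receptor, ligand)
bumps = bump_penalty(receptor, ligand)
```

Here one ligand centroid is within 6 Å of the receptor centroid and also overlaps it by 1 Å, so the example yields one contact and a bump penalty of 1.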
High resolution optimization:
Monte Carlo with Minimization (MCM)
Cycles of iterative optimization, from START to FINISH:
1. Random perturbation
2. Side chain optimization
3. Rigid body minimization
4. MC (Monte Carlo acceptance test)
[Figure: energy as a function of rigid body orientations]
24
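The MCM cycle can be sketched on a toy energy function. This is an illustration only: the rugged test function, the crude coordinate-descent minimizer, and all parameter values and names are assumptions made for the example, and the side chain optimization step is represented only by a comment.

```python
import math
import random

def minimize(energy, x, step=0.5, n_rounds=40):
    """Crude local minimizer: greedy coordinate steps with a shrinking step size."""
    x = list(x)
    e = energy(x)
    for _ in range(n_rounds):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                e_trial = energy(trial)
                if e_trial < e:
                    x, e, improved = trial, e_trial, True
        if not improved:
            step *= 0.5
    return x, e

def mcm(energy, start, n_cycles=50, perturbation=1.0, kT=0.8, seed=0):
    """Monte Carlo with minimization: perturb, minimize, Metropolis test."""
    rng = random.Random(seed)
    current, e_current = minimize(energy, start)
    best, e_best = current, e_current
    for _ in range(n_cycles):
        trial = [v + rng.gauss(0.0, perturbation) for v in current]
        # A docking run would optimize side chains here before minimizing.
        trial, e_trial = minimize(energy, trial)
        accept = (e_trial <= e_current or
                  rng.random() < math.exp(-(e_trial - e_current) / kT))
        if accept:
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best

def rugged(x):
    """Toy landscape with many local minima (stands in for the docking energy)."""
    return sum(v * v + 2.0 * math.sin(5.0 * v) for v in x)

start = [3.0, -2.5]
best, e_best = mcm(rugged, start)
```

Because every trial is minimized before the acceptance test, the Monte Carlo walk moves between local minima rather than between arbitrary points.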
Filters
• Low resolution
  – Antibody profiles
  – Antigen binding residues at interface
  – Contact filters
  – Biological information
  – Interface residues
  – Interacting residue pair
• High resolution
  – Energy filters speed up creation of low energy models
[Flowchart: Random perturbation → Monte-Carlo (MC) optimization → Minimization of rigid body orientation → 5 cycles of MC optimization → 45 cycles of MC optimization → Final scoring, with Filter1, Filter2 and Filter3 applied along the way]
26
Clustering
• Compare all top-scoring decoys pairwise
• Cluster decoys hierarchically
• Decoys within e.g. 2.5 Å form a cluster
• Cluster size represents entropy
28
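A hierarchical clustering with a distance cutoff can be sketched as follows. The slide does not say which linkage is used, so this example assumes complete linkage (every pair inside a cluster is within the cutoff) and takes a precomputed pairwise RMSD matrix as input; the function name is hypothetical.

```python
def cluster_decoys(dist, cutoff=2.5):
    """Agglomerative clustering (complete linkage) of decoys.

    dist[i][j] is the pairwise RMSD in Å between decoys i and j.
    Returns lists of decoy indices, largest cluster first.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        return max(dist[i][j] for i in c1 for j in c2)

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > cutoff:
            break  # the closest pair of clusters is already too far apart
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(clusters, key=len, reverse=True)

# Four decoys: three similar ones and one outlier.
rmsd = [
    [0.0, 1.0, 2.0, 10.0],
    [1.0, 0.0, 1.5, 10.0],
    [2.0, 1.5, 0.0, 10.0],
    [10.0, 10.0, 10.0, 0.0],
]
clusters = cluster_decoys(rmsd)
```

The three similar decoys end up in one cluster and the outlier stays alone; the size of the largest cluster is then a rough measure of how broad the corresponding energy basin is.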
Assessment 1: Benchmark studies
Benchmark set contains 54 targets for which bound and unbound structures are known
http://zlab.bu.edu/zdock/benchmark.shtml
• Bound-Bound
  – Start with bound complex structure, but remove the side chain configurations so they must be predicted
• Unbound-Unbound
  – Start with the individually crystallized component proteins in their unbound conformation
• Bound-Unbound (Semibound)
Example targets: trypsin + inhibitor, barnase + barstar, lysozyme + antibodies
29
Assessment of method on benchmark
(54 proteins, Gray et al., 2003)
Funnel: at least 3 of the 5 top-scoring models within 5 Å rmsd

Overall performance
Bound docking, perturbation (1):     42/54
Unbound docking, perturbation (2):   32/54
Unbound docking, global (3):         28/32
……..

Criteria:
1. At least three of the top five decoys (by score) have rmsd less than 5 Å
2. At least three of the top five decoys (by score) predict more than 25% native residue contacts
3. The rank of the first cluster with >25% native residue contacts
30
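The funnel criterion used in this benchmark is easy to state in code. A minimal sketch, assuming each decoy is a (score, rmsd) pair with lower scores being better; the function name and the default thresholds (3 of the top 5 within 5 Å) follow the slide, but are exposed as parameters since the exact cutoffs are a convention.

```python
def has_funnel(decoys, n_top=5, min_hits=3, rmsd_cutoff=5.0):
    """True if at least min_hits of the n_top best-scoring decoys are near-native.

    decoys: iterable of (score, rmsd) pairs; lower score is better.
    """
    top = sorted(decoys, key=lambda d: d[0])[:n_top]
    hits = sum(1 for _, rmsd in top if rmsd < rmsd_cutoff)
    return hits >= min_hits

funnel_like = [(-10.0, 1.2), (-9.0, 2.0), (-8.0, 12.0),
               (-7.0, 4.9), (-6.0, 20.0), (-1.0, 0.5)]
no_funnel = [(-10.0, 15.0), (-9.0, 2.0), (-8.0, 12.0),
             (-7.0, 4.9), (-6.0, 20.0), (-1.0, 0.5)]
results = (has_funnel(funnel_like), has_funnel(no_funnel))
```

In the second set the near-native decoy with rmsd 0.5 Å does not help, because its score ranks it outside the top five.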
Score and performance are correlated with binding affinity
[Figure: Δ score (calculated) for bound backbone docking vs. -log Ka (experimental); targets with funnels and targets without funnels are marked separately]
31
Limitation of “rotamer-based” modeling
Near-native model with clash
Trp 172
Non-native model without clash
Trp 215
Orange and red: native complex; Blue: docking model.
PDB code: 1CHO
32
Improved side chain modeling at interface
Rtmin: rotamer trial with minimization
• Randomly pick one residue.
• Screen a list of rotamers.
• Minimize each of these rotamers.
• Accept the one that yields the lowest energy.
[Figure: minimization of rotamers Rot I and Rot II, compared with the native conformation]
Additional rotamers
• Include free side chain conformation in rotamer library
Wang, Schueler-Furman & Baker, 2005
33
RosettaDock simulation
• 1 model/simulation: energy vs RMSD
• Final model selected based on energy (and/or sample density)
[Figure: energy vs. rigid body orientations, given as RMSD to an arbitrary starting structure (Å), i.e. structural similarity to the starting model]
34
RosettaDock simulation
1. Initial search: energy vs. RMSD to arbitrary starting structure (Å)
2. Refinement: energy vs. RMSD to starting structure of refinement (Å)
35
Side chain flexibility is important
CAPRI Target 12: Cohesin-Dockerin
• 0.27 Å interface rmsd
• 87% native contacts
• 6% wrong contacts
• Overall rank 1
[Figure: Dockerin and Cohesin; red, orange: X-ray; blue: model; green: unbound]
Carvalho et al. (2003) PNAS
36
Details of T12 interface
[Figure: Dockerin-Cohesin interface with residues R53, S45, D39, L22, Y74, N37, L83 and E86 labeled; red, orange: X-ray; blue: model]
37
Similar landscapes for different Rosetta predictions
[Figure: energy landscape for docking and energy landscape for folding]
Energy function describes well the principles underlying the correct structure of monomers and complexes
Phil Bradley
38
Schueler-Furman et al. (2005) Science
A Challenging Target: RF1-HemK (T20)
Challenge:
• Large complex
• RF1 to be modeled from RF2
• Disordered Q-loop
Hope:
• Q235 methylated
• A Gln analog in HemK crystal
Strategy:
• Trimming – Docking – Loop Modeling – Refining
Keys to success:
• Location of interface with truncated protein
• Separate modeling of large conformational change in key loop
[Figure: RF1 and HemK, with the Q-loop, loop1, loop2, Q235 and Q252 labeled]
39
Prediction of large conformational change
Q-loop, Gln235
I_rmsd 2.34 Å
F_nat 34.2%
Gln235 Cα atom shift: 14.13 Å to 3.91 Å
Q-loop global Cα rmsd: 11.8 Å to 4.8 Å
Red, orange: bound; Green: unbound; Blue: model
40
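The contact-based quality measures quoted for the CAPRI targets (native contacts, wrong contacts, F_nat) can be computed from sets of interface residue pairs. A minimal sketch, assuming the contacts of the native complex and of the model have already been extracted as (receptor residue, ligand residue) pairs; the residue labels and function names are made up for the example.

```python
def fraction_native_contacts(native, model):
    """F_nat: share of native interface contacts that the model reproduces."""
    native, model = set(native), set(model)
    return len(native & model) / len(native)

def fraction_wrong_contacts(native, model):
    """Share of the model's contacts that are absent from the native complex."""
    native, model = set(native), set(model)
    return len(model - native) / len(model)

native_contacts = {("A10", "B5"), ("A11", "B5"), ("A12", "B7"), ("A15", "B9")}
model_contacts = {("A10", "B5"), ("A12", "B7"), ("A20", "B1")}

f_nat = fraction_native_contacts(native_contacts, model_contacts)
f_wrong = fraction_wrong_contacts(native_contacts, model_contacts)
```

In this toy case the model recovers two of the four native contacts and one of its three contacts is wrong.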
Docking with backbone minimization
Docking Monte Carlo Minimization (MCM), from START to FINISH: random perturbation → repack → rigid-body, backbone and side-chain minimization
[Figure: fold tree connecting the two partners (chains 1 and 1’, N to C)]
[Figure: interface energy vs. interface RMSD for 2SNI. Red: bound rigid; Green: unbound rigid; Blue: unbound flexible]
[Figure: # of “hits” in top 10 models (0 to 10) for 1DFJ, 1DQJ, 1FSS, 1GLA, 1UGH, 1WQ1 and 2SNI]
41
Docking with loop minimization
Minimize rigid-body and loop simultaneously
[Figure: fold tree connecting the two partners (segments 1, 2 and 1’, 2’, N to C) with a cut point (x) in the loop]
[Figure: flexible docking, all-atom energy vs. interface RMSD; correctly predicted loop conformation]
Red, orange: bound (1T6G, Sansen, S. et al., JBC (2004)); Blue: model; Green: unbound (1UKR, Krengel, U. et al., JMB (1996))
42
Docking with loop rebuilding
[Figure: all-atom energy vs. ligand RMSD for 1BTH; bound rigid, unbound rigid and unbound flexible loop]
43
Flexible backbone protein–protein docking using ensembles
• Incorporate backbone flexibility by using a set of different templates
• Generation of ensembles: with the Rosetta relax protocol, from NMR ensembles, etc.
44
Chaudhury & Gray, (2008)
Sampling among conformers during docking
• Exchange between templates during protocol
45
Evaluation of 4 different protocols
1. Key-lock (KL) model: rigid-backbone docking
2. Conformer selection (CS) model: ensemble docking algorithm
3. Induced fit (IF) model: energy-gradient-based backbone minimization
4. Combined conformer selection/induced fit (CS/IF) model
• Can teach us about the possible binding mechanism (e.g. induced fit vs key-lock)
[Figure legend: Brown: high-quality decoys; Orange: medium-quality decoys]
46
RosettaDock - summary
• First program to introduce general (side chain)
flexibility during docking
• Advanced the docking field towards unbiased
high-resolution modeling
• Many other protocols have since then
incorporated RosettaDock as a high-resolution
final step
• Targeted introduction of backbone flexibility
can improve modeling dramatically
47
4. Data-driven docking
• Challenges:
  – Large conformational space to sample
  – Conformational changes of proteins upon binding
• Approach: restrict search space using prior information
  – HADDOCK (High Ambiguity Driven protein-protein Docking)
48
Scheme of HADDOCK (Bonvin, JACS 2003)
• Information about the complex can be retrieved from several sources
49
http://www.nmr.chem.uu.nl/haddock/
HADDOCK computational scheme
1. Derive Ambiguous Interaction Restraints (AIRs):
   – Active residues: involved in interaction, and solvent accessible
   – Passive residues: neighbors of active residues
2. Create CNS restraints file (used in NMR structure determination)
Rationale:
• Include AIRs in energy function
• Find protein complex structure with minimum energy
Similar to
– solving a structure by NMR
– homology modeling with constraints (e.g. MODELLER)
50
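The slides do not give the functional form of an AIR. As far as I recall from the HADDOCK publication, the restraint acts on an effective distance that combines all atom-atom distances between an active residue and the candidate partner residues by 1/d^6 summation; treat that form, and the function name below, as assumptions of this sketch rather than as something stated in the lecture.

```python
def effective_distance(distances):
    """Effective distance d_eff = (sum of d^-6)^(-1/6) over all atom pairs (Å).

    Assumed form of an ambiguous restraint: because of the d^-6 weighting,
    the shortest distances dominate, so the restraint is satisfied as soon
    as any one of the candidate atom pairs comes close.
    """
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)

# One close pair among many distant ones is enough for a short d_eff.
single = effective_distance([3.0])
mixed = effective_distance([3.0, 50.0, 60.0])
pair = effective_distance([2.0, 2.0])
```

The effective distance never exceeds the shortest individual distance, which is what makes the restraint ambiguous: it does not matter which residue of the partner is contacted.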
Overview of HADDOCK
Start Position
→ Rigid body energy minimization:
  1. rotational minimization
  2. rotational & translational minimization
  • Align molecules if anisotropic data is available
  • Satisfy maximum number of AIRs
  • Retain top 200
→ Semi-flexible simulated annealing (SA)
  • High temperature rigid body search
  • Rigid body SA
  • Semi-flexible SA with flexible side-chains at the interface
  • Semi-flexible SA with fully flexible interface (both backbone and side-chains)
→ Flexible explicit solvent refinement
  • Improves energy ranking
→ Clustering
→ Predictions
51
Docking – Summary & Outlook
• Efficient search using
  – fast sampling techniques (e.g. FFT, geometric hashing), and/or
  – restraints to relevant region (e.g. biological constraints, etc.)
• Challenge: conformational changes in the partners
• Introduction of flexibility has improved modeling to high resolution
  – Full side chain flexibility (Rosetta)
  – Targeted introduction of backbone flexibility
• Larger changes can be incorporated using techniques such as Normal Mode Analysis
52