Structure of Cyclic Peptide LY2241108


EMBO-Course: “Methods for Protein Simulation & Drug
Design.” Shanghai, China, September 13-24, 2004.
Docking & Scoring
Qi Chen
Eli Lilly & Company
Indianapolis, Indiana
U.S.A.
Copyright © 2004 Eli Lilly and Company
Outline
• Introduction
• Docking Methods
  - Representation of receptor binding site and ligand
  - Sampling of configuration space of the ligand-receptor complex
• Scoring Methods
  - Free energy, binding affinity, and docking scores
  - Scoring functions, consensus scoring, and others
• Docking Software
  - Existing software
  - DOCK, FlexX, GOLD, AutoDock, LUDI, Glide, FRED, CDOCKER
• Accuracy, Applications, and Successes
September 21, 2004
What Are Docking & Scoring?
• Docking: to place a ligand (small molecule) into the binding site of a receptor in a manner appropriate for optimal interactions with the receptor.
• Scoring: to evaluate the ligand-receptor interactions in a way that may discriminate the experimentally observed binding mode from others and estimate the binding affinity.
[Figure labels: ligand, receptor, docking, complex, scoring, X-ray structure & ΔG, etc.]
Why Do We Do Docking?
• Drug discovery costs are too high: ~$800 million, 8-14 years, ~10,000 compounds (DiMasi et al. 2003; Dickson & Gagnon 2004).
• Drugs interact with their receptors in a highly specific and complementary manner.
• Docking is the core of target-based, structure-based drug design (SBDD) for lead generation and optimization.
A lead is a compound that
  - shows biological activity,
  - is novel, and
  - has the potential of being structurally modified for improved bioactivity, selectivity, and druggability.
Three Components of Docking
• Pre- and/or during docking: representation of receptor binding site and ligand
• During docking: sampling of configuration space of the ligand-receptor complex
• During docking and scoring: evaluation of ligand-receptor interactions
Receptor Structures & Binding Site Descriptions
• PDB (Protein Data Bank, www.rcsb.org/pdb/) containing proteins or enzymes:
  - X-ray crystal: >12,000 structures; 788 have ≤ 1.5 Å resolution, 9,390 between 1.5 and 2.5 Å
  - NMR: >450 structures; ensemble accuracy of 0.4-1 Å in the backbone region, 1.5 Å in average side-chain position (Billeter 1992; Clore et al. 1993)
  - (and high-quality homology models built from highly similar sequences)
• Limitations of experimental structures (Davis et al. 2003):
  - Locations of hydrogen atoms, water molecules, and metal ions
  - Identities and locations of some heavy atoms (e.g., ~1/6 of N/O of Asn & Gln, and N/C of His, incorrectly assigned in PDB; up to 0.5 Å uncertainty in position)
  - Conformational flexibility of proteins
• Binding site descriptions: atomic coordinates, surface, volume, points & distances, bond vectors, grid, and various properties such as electrostatic potential, hydrophobic moment, polar, nonpolar, atom types, etc.
[Figure label: DOCK]
Drug, Chemical & Structural Space
• Drug-like: MDDR (MDL Drug Data Report) >147,000 entries; CMC (Comprehensive Medicinal Chemistry) >8,600 entries
• Non-drug-like: ACD (Available Chemicals Directory) ~3 million entries
• Literature and databases: Beilstein (>8 million compounds), CAS & SciFinder
• CSD (Cambridge Structural Database, www.ccdc.cam.ac.uk): ~3 million X-ray crystal structures for >264,000 different compounds and >128,00 organic structures
• Available compounds
  - Available without exclusivity: various vendors (& ACD)
  - Available with limited exclusivity: Maybridge, Array, ChemDiv, WuXi Pharma, ChemExplorer, etc.
• Corporate databases: a few million compounds in large pharma companies
3D Structural Information & Ligand Descriptions
• 2D->3D software: CORINA, OMEGA, CONCORD, MM2/3, WIZARD, COBRA (reviewed by Robertson et al. 2001)
• CSD: <0.1 Å accuracy for small molecules, but the crystal conformation may not be the bound conformation in the receptor
• PDB: ~6,000 ligand-bound protein structures
• Atoms associated with inter-atom distances, physical and chemical properties, types, charges, pharmacophore, etc.
• Flexibility: conformation ensemble, fragment-based
Sampling of Configuration Space of the Ligand-Receptor Complex
• Descriptor matching: using pattern-recognizing geometric methods to match ligand and receptor site descriptors
  - geometric, chemical, and pharmacophore properties, such as distance pairs, triplets, volume, vectors, hydrogen-bond, hydrophobic, charged, etc.
• Molecular simulation: MD (molecular dynamics), MC (Monte Carlo)
• Others: GA (genetic algorithm), similarity, fragment-based
• Challenges
  - The complete conformation and configuration space of the ligand-receptor complex is too large.
  - Conformational flexibility of both ligand and receptor can't be ignored.
  - Shape matching: there is no 'best' method or general solution for describing and matching the molecular shape of irregular objects (Ullman 1976; Salomaa 1991). Shape alone is not a sufficient descriptor to identify low-energy conformations of a ligand-receptor complex (Jorgensen 1991).
Descriptor Matching Methods: DOCK
• Distance-compatibility graph in DOCK (Ewing and Kuntz 1997): distances between sphere centers and distances between ligand heavy atoms
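The distance-compatibility idea can be illustrated with a small sketch. This is not DOCK's implementation: DOCK searches a compatibility graph for cliques, whereas the code below simply enumerates every assignment of ligand atoms to sphere centers by brute force, which is only practical for tiny inputs. The function name `compatible_matches`, the tolerance `tol`, and the coordinates are all hypothetical.

```python
import itertools
import math


def compatible_matches(ligand_atoms, sphere_centers, n_nodes=3, tol=0.5):
    """Return every set of n_nodes (atom index, sphere index) pairs whose
    atom-atom distances agree with the sphere-sphere distances within tol (Å).

    Brute-force stand-in for clique detection in a distance-compatibility graph.
    """
    matches = []
    for atoms in itertools.combinations(range(len(ligand_atoms)), n_nodes):
        for spheres in itertools.permutations(range(len(sphere_centers)), n_nodes):
            consistent = all(
                abs(
                    math.dist(ligand_atoms[atoms[p]], ligand_atoms[atoms[q]])
                    - math.dist(sphere_centers[spheres[p]], sphere_centers[spheres[q]])
                )
                <= tol
                for p, q in itertools.combinations(range(n_nodes), 2)
            )
            if consistent:
                matches.append(list(zip(atoms, spheres)))
    return matches


# Hypothetical example: a 3-4-5 triangle of ligand atoms, and the same
# triangle translated into the site plus one unrelated sphere.
ligand = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
spheres = [(10.0, 10.0, 10.0), (13.0, 10.0, 10.0), (10.0, 14.0, 10.0), (50.0, 50.0, 50.0)]
matches = compatible_matches(ligand, spheres)
```

Each match fixes enough atom-to-sphere correspondences to define a rigid-body orientation of the ligand in the site, which is then passed to scoring.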
Descriptor Matching Methods
• Distance-compatibility graph in DOCK (Ewing and Kuntz 1997): distances between sphere centers and distances between ligand heavy atoms
• Interaction site matching in LUDI (Boehm 1992): HBA<->HBD, HYP<->HYP
• Pose clustering and triplet matching in FlexX (Rarey et al. 1996): HBA<->HBD, HYP<->HYP
• Shape matching in FRED (OpenEye, www.eyesopen.com)
• Vector matching in CAVEAT (Lauri and Bartlett 1994)
• Steric-effects matching in CLIX (Lawrence and Davis 1992)
• Shape and chemical complementarity in SANDOCK (Burkhard et al. 1998)
• Surface complementarity in LIGIN (Sobolev et al. 1996)
• H-bond matching in ADAM (Mizutani et al. 1994)
Fragment-based Methods
• Flexibility and/or de novo design
• Identification and placement of the base/anchor fragment are very important
• Energy optimization (during or post-docking) is important
• Examples
  - Incremental construction in FlexX, with triplet matching and pose clustering to maximize the number of favorable interactions
  - Growing and/or joining in LUDI from pre-built fragment and linker libraries to maximize H-bond and hydrophobic interactions
  - Anchor-based fragment joining in DOCK
Molecular Simulation: MD & MC
• Two major components:
  - The description of the degrees of freedom
  - The energy evaluation
• The local movement of the atoms is performed
  - according to the forces present at each step in MD (molecular dynamics)
  - randomly in MC (Monte Carlo)
• Usually time-consuming:
  - The search runs from a starting orientation to a low-energy configuration
  - Several simulations with different starting orientations must be performed to get a statistically significant result
• A grid is used for energy calculation. Larger steps or multiple starting poses are often used for speed and sampling coverage in MD:
  - Di Nola et al. 1994; Mangoni et al. 1999; Pak & Wang 2000; CDOCKER by Wu et al. 2003.
MC-based Docking
• Metropolis criterion: a random move from configuration A to a higher-energy configuration B is accepted with probability
  $P = \exp\left(-\frac{E(B) - E(A)}{k_B T}\right)$
  where T is reduced according to a so-called cooling schedule, and a grid can be used for energy calculation.
• An advantage of the MC technique compared with gradient-based methods (e.g., MD) is that a simple energy function can be used, which does not require derivative information, and that the search is able to step over energy barriers.
• AutoDock (Goodsell & Olson 1990), MCDOCK (Liu & Wang 1999), PRODOCK (Trosset & Scheraga 1999), ICM (Abagyan et al. 1994).
• Simulated annealing is used in DockVision (Hart & Read 1992) and Affinity (Accelrys Inc., San Diego, CA).
• Energy minimization is used in QXP (McMartin & Bohacek 1997).
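The acceptance rule and cooling schedule can be sketched on a toy problem. The code below is an illustration only, not taken from any of the programs named above: it anneals a single coordinate on a one-dimensional energy function, whereas a docking code would perturb the ligand's position, orientation, and torsions. The geometric cooling schedule, the step size, and the function names are assumptions.

```python
import math
import random


def metropolis_accept(e_old, e_new, kT, rng):
    """Accept downhill moves always, uphill moves with P = exp(-(E(B) - E(A)) / kT)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)


def anneal(energy, x0, kT_start=5.0, kT_end=0.05, n_steps=5000, step=0.5, seed=0):
    """Simulated annealing on one coordinate with an (assumed) geometric cooling schedule."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(n_steps):
        # Temperature decays geometrically from kT_start to kT_end.
        kT = kT_start * (kT_end / kT_start) ** (i / (n_steps - 1))
        x_new = x + rng.uniform(-step, step)  # random local move
        e_new = energy(x_new)
        if metropolis_accept(e, e_new, kT, rng):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


# Toy energy surface with its minimum at x = 2; only energies are needed, no gradients.
best_x, best_e = anneal(lambda x: (x - 2.0) ** 2, x0=-5.0)
```

Note that only energy values enter the loop, which is the advantage over gradient-based methods mentioned above.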
Genetic Algorithm Docking
• A fitness function is used to decide which individuals (configurations) survive and produce offspring for the next iteration of optimization. Degrees of freedom are encoded into genes or binary strings.
• The collection of genes (chromosome) is assigned a fitness based on a scoring function. There are three genetic operators:
  - mutation randomly changes the value of a gene;
  - crossover exchanges a set of genes between one parent chromosome and another;
  - migration moves individual genes from one sub-population to another.
• Requires the generation of an initial population, whereas conventional MC and MD require a single starting structure in their standard implementation.
• GOLD (Jones et al. 1997); AutoDock 3.0 (Morris et al. 1998); DIVALI (Clark & Ajay 1995).
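A minimal sketch of the mutation and crossover operators follows, assuming a chromosome that is simply a list of torsion angles in degrees. It is not the scheme used by GOLD, AutoDock, or DIVALI: migration between sub-populations is omitted, selection is plain truncation with elitism, and `toy_score` is a hypothetical stand-in for a docking score (lower is better).

```python
import math
import random


def mutate(chrom, rng, rate=0.2, width=30.0):
    """Randomly perturb each gene (a torsion angle in degrees) with probability `rate`."""
    return [
        (g + rng.uniform(-width, width)) % 360.0 if rng.random() < rate else g
        for g in chrom
    ]


def crossover(parent_a, parent_b, rng):
    """One-point crossover: head of parent_a joined to tail of parent_b."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def evolve(score, n_genes, pop_size=30, n_gen=50, seed=0):
    """Evolve a population; the better half survives unchanged (elitism)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(0.0, 360.0) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(n_gen):
        pop.sort(key=score)
        history.append(score(pop[0]))
        parents = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    pop.sort(key=score)
    history.append(score(pop[0]))
    return pop[0], history


def toy_score(chrom):
    """Hypothetical score: each torsion prefers 60 degrees."""
    return sum(1.0 - math.cos(math.radians(g - 60.0)) for g in chrom)


best, history = evolve(toy_score, n_genes=3)
```

Because the parents are carried over unchanged, the best score in `history` can never get worse from one generation to the next.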
Multiple Method Approach
• Systematic search of conformations -> rigid DOCK -> minimization -> MD/SA (Wang et al. 1999)
• Initial poses -> filters -> finer docking -> final scoring (FRED, GLIDE, DOCK)
• Similarity-guided MD simulated annealing to improve accuracy (Wu & Vieth 2004).
• Shape similarity & clustering to speed up conformational search in docking (Makino & Kuntz 1998).
Better input or constraints for the existing docking engines
Scoring Functions
• A fast and simplified estimation of binding energies: scores <-> ΔG_binding
  $\Delta G_{binding} = -RT \ln K_{affinity}$
  $\Delta G_{binding} = \Delta G_{complex/solv} - \Delta G_{ligand/solv} - \Delta G_{protein/solv} + \Delta G_{interaction} - T\Delta S + \Delta\ldots$
[Figure: -scores plotted against configurations of the complex, with the X-ray structure marked by "?"]
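As a worked instance of ΔG = -RT ln K, the sketch below converts a dissociation constant into a binding free energy. It assumes that K_affinity is the association constant (so ΔG = RT ln K_d), a 1 M standard state, and T = 298.15 K; the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temperature=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L.

    Uses dG = -RT ln(K_a) = RT ln(K_d), with K_a = 1 / K_d.
    """
    return R_KCAL * temperature * math.log(kd_molar)


def kd_from_delta_g(delta_g, temperature=298.15):
    """Inverse conversion: dissociation constant in mol/L from dG in kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temperature))


dg_nanomolar = delta_g_from_kd(1e-9)   # about -12.3 kcal/mol
dg_micromolar = delta_g_from_kd(1e-6)  # about -8.2 kcal/mol
```

Each factor of 10 in affinity corresponds to about 1.36 kcal/mol at room temperature, which is why a scoring error of a few kcal/mol spans orders of magnitude in predicted affinity.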
Types of Scoring Functions
• Force field based: nonbonded interaction terms as the score, sometimes in combination with solvation terms
• Empirical: multivariate regression methods fit the coefficients of physically motivated structural functions, using a training set of ligand-receptor complexes with measured binding affinities
• Knowledge-based: statistical atom-pair potentials derived from structural databases as the score
• Other: scores and/or filters based on chemical properties, pharmacophore, contact, shape complementarity
• Consensus scoring approach
Force Field Based Scoring Functions
$E = \sum_{i=1}^{lig} \sum_{j=1}^{rec} \left[ \frac{A_{ij}}{r_{ij}^{a}} - \frac{B_{ij}}{r_{ij}^{b}} + 332 \frac{q_i q_j}{D r_{ij}} \right]$
e.g., AMBER FF in DOCK
• Advantages
  - FF terms are well studied and have some physical basis
  - Transferable, and fast when used on a pre-computed grid
• Disadvantages
  - Only parts of the relevant energies are covered, i.e., potential energies, sometimes enhanced by solvation or entropy terms
  - Electrostatics are often overestimated, leading to systematic problems in ranking complexes
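The double sum over ligand and receptor atoms can be written out directly. The sketch below fixes the exponents at a = 12 and b = 6 and builds A_ij and B_ij from per-atom radii and well depths with an AMBER-style combining rule; that rule, the `Atom` record, the constant dielectric, and the parameter values are assumptions for illustration, not DOCK's parameter set.

```python
import math
from typing import NamedTuple, Sequence, Tuple

COULOMB_CONSTANT = 332.0  # converts e^2/Å to kcal/mol, as in the formula above


class Atom(NamedTuple):
    xyz: Tuple[float, float, float]  # coordinates in Å
    charge: float                    # partial charge in units of e
    radius: float                    # van der Waals radius R* in Å (assumed parameter)
    eps: float                       # well depth in kcal/mol (assumed parameter)


def ff_score(ligand: Sequence[Atom], receptor: Sequence[Atom], dielectric: float = 1.0) -> float:
    """Sum 12-6 Lennard-Jones and Coulomb terms over all ligand-receptor atom pairs."""
    total = 0.0
    for li in ligand:
        for rj in receptor:
            r = math.dist(li.xyz, rj.xyz)
            r_star = li.radius + rj.radius      # assumed combining rule
            eps = math.sqrt(li.eps * rj.eps)    # assumed combining rule
            a_ij = eps * r_star ** 12
            b_ij = 2.0 * eps * r_star ** 6
            total += a_ij / r ** 12 - b_ij / r ** 6
            total += COULOMB_CONSTANT * li.charge * rj.charge / (dielectric * r)
    return total


# Two neutral atoms at their optimal separation give E = -eps.
neutral = ff_score([Atom((0.0, 0.0, 0.0), 0.0, 1.9, 0.1)], [Atom((3.8, 0.0, 0.0), 0.0, 1.9, 0.1)])
# The same pair with opposite unit charges adds -332 / 3.8 kcal/mol in vacuo.
charged = ff_score([Atom((0.0, 0.0, 0.0), 1.0, 1.9, 0.1)], [Atom((3.8, 0.0, 0.0), -1.0, 1.9, 0.1)])
```

The second example shows how a single unscreened charge pair can outweigh the van der Waals term by orders of magnitude, which is the overestimation of electrostatics listed under the disadvantages.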
FF Scoring: Implementations
• AMBER FF: DOCK, FLOG, AutoDock
• CHARMm FF: CDOCK, MC approach (Caflisch et al. 1997)
• Potential grid: assumes a rigid receptor structure during docking. The grid-based score is interpolated from the eight surrounding grid points only, giving a 100-fold speed-up. Examples: DOCK, CDOCK, and many other docking programs.
• Softened VDW: a soft-core vdW potential is needed for the kinetic accessibility of the binding site (Vieth et al. 1998). FLOG: 6-9 Lennard-Jones function; GOLD: 4-8 vdW + H-bond and intraligand energy.
• Solvent effect on electrostatics: often approximated by rescaling the in vacuo Coulomb interactions by 1/D, where D = 1-80 or D = n*r, with n = 1-4 and r = distance.
• Solvation and entropy terms: solvation terms decomposed into nonpolar and electrostatic contributions (e.g., DOCK):
  $E_{bind} = E_{nonbond} + E_{solv,elec} + E_{solv,np}$
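Interpolating from the eight surrounding grid points is ordinary trilinear interpolation. The sketch below assumes a cubic grid stored as nested lists indexed `grid[ix][iy][iz]`, with a given origin and spacing; the storage layout and function name are illustrative and not those of any particular docking program.

```python
import math


def trilinear(grid, origin, spacing, point):
    """Interpolate a precomputed potential at `point` from its 8 surrounding grid nodes.

    The point must lie strictly inside the grid so that all 8 nodes exist.
    """
    # Fractional grid coordinates of the point along x, y, z.
    frac = [(p - o) / spacing for p, o in zip(point, origin)]
    base = [int(math.floor(f)) for f in frac]
    t = [f - b for f, b in zip(frac, base)]
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                weight = (
                    (t[0] if dx else 1.0 - t[0])
                    * (t[1] if dy else 1.0 - t[1])
                    * (t[2] if dz else 1.0 - t[2])
                )
                value += weight * grid[base[0] + dx][base[1] + dy][base[2] + dz]
    return value


# Toy 3x3x3 grid holding the linear function f = x + 2y + 3z at unit spacing.
toy_grid = [[[i + 2 * j + 3 * k for k in range(3)] for j in range(3)] for i in range(3)]
value = trilinear(toy_grid, origin=(0.0, 0.0, 0.0), spacing=1.0, point=(0.5, 1.25, 0.75))
```

The speed-up comes from replacing a sum over all receptor atoms with eight table look-ups per ligand atom, at the cost of freezing the receptor.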
Empirical Scoring Functions
$\Delta G = \Delta G_0 + \Delta G_{rot} N_{rot} + \Delta G_{HB} \sum_{\text{neutral H-bonds}} f(\Delta R, \Delta\alpha) + \Delta G_{io} \sum_{\text{ionic int.}} f(\Delta R, \Delta\alpha) + \Delta G_{aro} \sum_{\text{aro. int.}} f(\Delta R, \Delta\alpha) + \Delta G_{lipo} \sum_{\text{lipo. cont.}} f(\Delta R, \Delta\alpha)$
LUDI & FlexX (Boehm 1994)
• Goals: reproduce the experimental binding energies, with the global minimum of the function at the X-ray crystal structure
• Advantages: fast & direct estimation of binding affinity
• Disadvantages
  - Only a few complexes with both accurate structures & binding energies known
  - Discrepancy in the binding affinities measured by different labs
  - Heavy dependence on the placement of hydrogen atoms
  - Heavy dependence of transferability on the training set
  - No effective penalty term for bad structures
Empirical Scoring: Implementations
These mostly differ in which training set and how many parameters are used.
• Cerius2/Insight2000: LUDI, ChemScore, PLP, LigScore
• SYBYL: FlexX, F-Score
• Hammerhead: 17 parameters for hydrophobic, polar complementarity, entropy, and solvation terms. sLOO = 1.0 log K for 34 complexes
• VALIDATE: 8 parameters for VDW and Coulomb interactions, surface complementarity, lipophilicity, conformational entropy and enthalpy, lipophilic and hydrophilic complementarity between receptor and ligand surfaces
• PRO_LEADS: 5 coefficients for lipophilic, metal-binding, H-bond, and a flexibility penalty term. sLOO = 2 kcal/mol for 82 complexes
• SCORE (Tao & Lai 2001); ChemScore (GOLD)
Knowledge-based Potentials of Mean Force Scoring Functions (PMF)
• Assumptions
  - An observed crystallographic complex represents the optimum placement of the ligand atoms relative to the receptor atoms
  - The Boltzmann hypothesis converts the frequencies of finding atom A of the ligand at a distance r from atom B of the receptor into an effective interaction energy between A and B as a function of r
• Advantages
  - Similar to empirical, but more general (much more distance data than binding energy data)
• Disadvantages
  - The Boltzmann hypothesis originates from the statistics of a spatially uniform liquid, while a receptor-ligand complex is a two-component non-uniform medium
  - PMFs are typically pair-wise, while the probability of finding atoms A and B at a distance r is non-pairwise and also depends on surrounding atoms
PMF: Implementations
• Verkhivker et al. (1995): 12 atom pairs, 30 complexes (HIV-1 and simian immunodeficiency virus). Test on 7 other HIV-1 protease complexes
• Wallqvist et al. (1995): 38 complexes, 21 atom types (10 C, 5 O, 5 N, 1 S). Test on 8 complexes: sd = 1.5 kcal/mol; and 20 complexes: rmsd = 1.0 Å.
  [Equation not recoverable from the transcript: ΔG_pred expressed as a sum over atom pairs i, j of terms in ln P_ij]
• Muegge et al. (1999): 697 complexes, 16 atom types from receptor & 34 from ligand, 282 statistically significant PMF interactions. Test on 77 diverse compounds: sd = 1.8 log Ki. The PMF was combined with a vdW term to account for short-range interactions for DOCK4 docking:
  $PMF\_score = \sum_{kl,\ r < r^{ij}_{cutoff}} A_{ij}(r)$
  where $A_{ij}(r) = -k_B T \ln\left[ f^{j}_{Vol\_corr}(r) \frac{\rho^{ij}_{seg}(r)}{\rho^{ij}_{bulk}} \right]$
• DrugScore (Gohlke et al. 2000), FlexX, BLEEP
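The Boltzmann inversion behind A_ij(r) can be sketched for one atom-type pair. The code below is a simplified illustration: it takes hypothetical pair counts in distance bins, converts them into densities using spherical-shell volumes, and omits the ligand volume correction factor f_Vol_corr that appears in the formula above. The value kT = 0.593 kcal/mol (about 298 K) and all names are assumptions.

```python
import math


def pmf_from_counts(counts, r_edges, kT=0.593):
    """A(r) = -kT ln(rho_seg(r) / rho_bulk) per distance bin, for one atom-type pair.

    counts[k] is the number of pairs observed with r_edges[k] <= r < r_edges[k + 1].
    Empty bins get +inf (never observed). The volume correction is omitted.
    """
    shell_volumes = [
        4.0 / 3.0 * math.pi * (hi ** 3 - lo ** 3)
        for lo, hi in zip(r_edges[:-1], r_edges[1:])
    ]
    rho_bulk = sum(counts) / sum(shell_volumes)
    pmf = []
    for n, volume in zip(counts, shell_volumes):
        if n == 0:
            pmf.append(math.inf)
        else:
            pmf.append(-kT * math.log((n / volume) / rho_bulk))
    return pmf


def pmf_score(distances, pmf, r_edges):
    """Sum A(r) over observed pair distances; pairs beyond the last edge (cutoff) are ignored."""
    total = 0.0
    for r in distances:
        for k, value in enumerate(pmf):
            if r_edges[k] <= r < r_edges[k + 1]:
                total += value
    return total


edges = [1.0, 2.0, 3.0, 4.0]
uniform = pmf_from_counts([7, 19, 37], edges)    # counts proportional to shell volume
enriched = pmf_from_counts([14, 19, 37], edges)  # first shell over-populated
```

A distance bin that is populated more often than the bulk density predicts receives a negative (favorable) potential, and an under-populated bin a positive one.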
Consensus Scoring and Others
• Too many scoring functions; none prevails in terms of predictivity
• Combined approach: one scoring function to sample configuration space, the other(s) to optimize and/or score:
  - 2 docking methods & 13 scoring functions used together to significantly reduce the false positive rate (Charifson et al. 1999)
  - Postprocessing of docking results with a filter function followed by re-scoring (Stahl & Boehm 1998)
  - ADAM, FlexX, Hammerhead
  - SYBYL Cscore (Tripos): FlexX, PMF, DOCK energy, GOLD score
  - C2 (Accelrys): LigScore2, PLP, PMF, Ludi, Jain
  - FRED (OpenEye): ChemScore, PB-SA, ChemGauss, PLP, ScreenScore
  - DOCK: AMBER FF, PMF, contact scores, ChemScore
Reduce false positives!
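One simple way to combine scoring functions is rank-by-vote: a compound is kept only if several functions place it near the top of their lists. The sketch below is an assumed, generic scheme and not the protocol of any of the packages listed above. It assumes lower scores are better, and the scoring-function names, compound names, and score values are hypothetical.

```python
def consensus_hits(scores, top_fraction=0.5, min_votes=2):
    """Return compounds ranked in the top fraction by at least `min_votes` scoring functions.

    `scores` maps a scoring-function name to a {compound: score} dict (lower = better).
    """
    votes = {}
    for by_compound in scores.values():
        ranked = sorted(by_compound, key=by_compound.get)
        n_top = max(1, int(len(ranked) * top_fraction))
        for compound in ranked[:n_top]:
            votes[compound] = votes.get(compound, 0) + 1
    return sorted(c for c, v in votes.items() if v >= min_votes)


# Hypothetical scores for four compounds from three scoring functions.
example_scores = {
    "ff": {"a": -10.0, "b": -5.0, "c": -1.0, "d": 0.0},
    "empirical": {"a": -8.0, "b": -1.0, "c": -9.0, "d": 0.0},
    "pmf": {"a": -3.0, "b": -7.0, "c": -2.0, "d": -1.0},
}
two_votes = consensus_hits(example_scores, min_votes=2)
three_votes = consensus_hits(example_scores, min_votes=3)
```

Raising `min_votes` trades recall for fewer false positives: compound "c" is favored by only one function and drops out as soon as two votes are required.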
An Example of Combined Empirical and Knowledge-based Approach
• Procedure
  1. Knowledge-based potentials
  2. Optimize the ligand position with the scoring function $\Delta G = \sum_{i,j} F_{A,B}(r_{i,j})$
  3. Fit the scores to experimental values
  4. Re-optimize ligand positions iteratively until the ligand positions and calibrated parameters have converged.
• Scoring function: 7 atom types (1 C, 4 O, 2 N), cutoff 7 Å, 2000 complexes, rmsd < 2 Å, no metal ions, 164 binding energies, sd = 2.1 kcal/mol, rmsd = 0.49 Å
  $P_{A,B}(r) = \frac{n_{A,B}(r)}{r^2}$ and $P_{A,B}(r) \propto \exp\left(-\frac{F_{A,B}(r)}{T}\right)$
  $F_{A,B}(r) = -T \log(P_{A,B}(r)) + C$
• Validation: 36 rigid complexes, AlgoDock, FlexX, Gold, Dock, rmsd 0.74-1.68 Å; 25 known binding energies: sd = 2.0 kcal/mol
(Muryshev et al. 2003)
Docking Software
DOCK (Kuntz et al. 1982)
DOCK 4.0 (Ewing & Kuntz 1997)
AutoDock (Goodsell & Olson 1990)
AutoDock 3.0 (Morris et al. 1998)
GOLD (Jones et al. 1997)
FlexX (Rarey et al. 1996)
GLIDE (Friesner et al. 2004)
ADAM (Mizutani et al. 1994)
CDOCKER (Wu et al. 2003)
CombiDOCK (Sun et al. 1998)
DIVALI (Clark & Ajay 1995)
DockVision (Hart & Read 1992)
FLOG (Miller et al. 1994)
GEMDOCK (Yang & Chen 2004)
Hammerhead (Welch et al. 1996)
LIBDOCK (Diller & Merz 2001)
MCDOCK (Liu & Wang 1999)
PRO_LEADS (Baxter et al. 1998)
SDOCKER (Wu et al. 2004)
QXP (McMartin & Bohacek 1997)
Validate (Head et al. 1996)
• de novo design tools
LUDI (Boehm 1992)
BUILDER (Roe & Kuntz 1995)
SMOG (DeWitte et al. 1997)
CONCEPTS (Pearlman & Murcko 1996)
DLD/MCSS (Stultz & Karplus 2000)
Genstar (Rotstein & Murcko 1993)
Group-Build (Rotstein & Murcko 1993)
Grow (Moon & Howe 1991)
HOOK (Eisen et al. 1994)
Legend (Nishibata & Itai 1993)
MCDNLG (Gehlhaar et al. 1995)
SPROUT (Gillet et al. 1993)
Docking Software: Important Factors
• Sensitivity to, and transferability of, the parameters, including the starting conformation
• Adaptability to additional scoring functions, pre- and/or post-docking processing, and filters
• Ability to iteratively refine the docking parameters/protocol based on new results
• Design, components, and results of validation studies
• Speed, user interface & control, I/O, structural file formats
• User learning curve, customer support, and cost
• Code availability and upgrade possibility
DOCK (Kuntz, UCSF)
1. Receptor structure (X-ray crystal, NMR, homology) -> binding site -> molecular surface of binding site
2. Spheres describing the shape of the binding site and favorable locations of potential ligand atoms
3. Ligands: 3D structure, atomic charges, potentials, labeling
4. Matching heavy atoms of ligands to centers of spheres to generate thousands of binding orientations
5. Filters
6. Scoring orientations
   1. Energy scoring (vdW and electrostatic)
   2. Contact scoring (shape complementarity)
   3. Chemical scoring
   4. Solvation terms
7. Outputs
   • Binding mode analysis for lead optimization: binding orientations and scores for each ligand
   • Virtual screening for MTS/HTS and library design: ligands in the order of their best scores
DOCK: Conformational Flexibility
• Torsion-drive and anchor-based options (DOCK 4.0)
• GA to generate ligand conformations inside the binding site (Oshiro et al. 1995)
• A ligand anchor fragment is selected and placed in the receptor, followed by rigid-body simplex minimization (Makino & Kuntz 1997)
• Ensembles: ~300 conformations are created with the rigid part superimposed. DOCK is applied to the rigid part, and all conformations are tested for overlap and scored (Lorber & Shoichet 1998)
• Multiple random ligand conformations (Ewing et al. 2001)
• Ensemble of protein structures (Knegtel et al. 1997)
FlexX (Tripos/SYBYL)
• Fragment-based, descriptor matching, empirical scoring (Rarey et al. 1996)
• Procedure:
  - Select a small set of base fragments suitable for placement, using a simple scoring function.
  - Place base fragments with the pose clustering algorithm: rigid, triplet matching of H-bond & hydrophobic interactions, Boehm's scoring function
  - Build up the remainder of the ligand incrementally from other fragments
• Ligand conformations
  - MIMUMBA model with CSD-derived low-energy torsional angles for each rotatable bond, and rings from CORINA.
  - Multiple conformations for each fragment in the ligand building steps
• Other work: explicit waters are placed into the binding site during the docking procedure using pre-computed water positions (Rarey et al. 1999). Receptor flexibility using discrete alternative protein conformations (Claussen et al. 2001; Claussen & Hindle 2003)
GOLD
• GA method, H-bond matching, FF scoring (Jones et al. 1997)
  - A configuration is represented by two bit strings:
    1. the conformation of the ligand and the protein, defined by the torsions;
    2. a mapping between H-bond partners in the protein and the ligand.
  - For fitness evaluation, a 3D structure is created from the chromosome representation. The H-bond atoms are then superimposed onto H-bond site points in the receptor site.
  - Fitness (scoring) function: H-bond, the ligand internal energy, the protein-ligand van der Waals energy
  - Rotational flexibility for selected receptor hydrogens along with full ligand flexibility
• Highlights:
  - Validation test set: 100 complexes, 66 with rmsd < 2 Å.
  - The structure generation is biased towards intermolecular H-bonds.
  - Hydrophobic fitting points were added (GOLD 1.2, CCDC, Cambridge, UK 2001).
AutoDock & AutoDock 3.0
• Early implementation: MC simulated annealing, AMBER FF-based energy grid, flexible ligands (Goodsell & Olson 1990)
• AutoDock 3.0: GA as a global optimizer combined with energy minimization as a local search method, flexible ligand, rigid protein represented by a grid (Morris et al. 1998)
• The fitness function:
  - a Lennard-Jones 12-6 dispersion/repulsion term
  - a directional 12-10 hydrogen bond term
  - a coulombic electrostatic potential
  - a term proportional to the number of sp3 bonds in the ligand to represent unfavorable entropy of ligand binding
  - a desolvation term
LUDI: Matching polar and hydrophobic groups
• Calculate protein and ligand interaction sites (H-bond or hydrophobic), which are defined by centers and surface, from
  - non-bonded contact distributions based on a search through the CSD,
  - a set of geometric rules,
  - the output from the program GRID (Goodford 1985), which calculates binding energies for a given probe with a receptor molecule.
• Fit fragments onto the interaction sites, using
  - distances between interaction sites on the receptor,
  - an RMSD superposition algorithm,
  - a hashing scheme to access and match surface triangles onto a triangle query of a ligand interaction center,
  - a list-merging algorithm that creates all triangles based on lists of fitting triangle edges for two of the three query triangle edges.
• Join/grow fragments using the databases of fragments and the same fitting algorithm.
GLIDE (www.schrodinger.com)
• Funnel: site point search -> diameter test -> subset test -> greedy score -> refinement -> grid-based energy optimization -> GlideScore.
• Approximates a complete systematic search of the conformational, orientational, and positional space of the docked ligand.
• Hierarchical filters, including a rough scoring function that recognizes hydrophobic and polar contacts, dramatically narrow the search space.
• Torsionally flexible energy optimization on an OPLS-AA nonbonded potential grid for a few hundred surviving candidate poses.
• The very best candidates are further refined via MC sampling of pose conformation.
• A modified ChemScore (Eldridge et al. 1997) that combines empirical and force-field-based terms.
• Validation: 282 complexes, new ligand conformation, the top-ranked pose: 50% < 1 Å, ~33% > 2 Å.
FRED (OpenEye www.eyesopen.com)
• Systematic, nonstochastic docking
• Directed docking with SMARTS enclosures
• ChemScore, PB-SA, ChemGauss, PLP, ScreenScore
• Multiple active site comparisons
• Multiple simultaneous scoring functions and hit lists
• RMS clustering of hit lists
• Refinement of docked poses in the context of the active site using MMFF
• On-the-fly OMEGA conformer generation
• Robust reading and specification-compliant writing of SDF, MOL, MOL2, PDB, MacroModel, XYZ, and OEBinary file formats
• Distributed processing via PVM for most Unix platforms
CDOCKER & SDOCKER
• Randomly generate ligand seeds in the binding site
• High-temperature MD using a modified version of CHARMM
• Locate minima from all of the MD simulations
• Full minimization
• Cluster on position and geometry
• Rank by energy (interaction + ligand conformation)
• SDOCKER: X-ray structures of complexes as templates to guide docking
(Wu et al. 2003; Wu et al. 2004)
Matrix of Accuracy & Success
Drug <- Quality Novel Lead <- Active
• Reproduce binding mode (X-ray crystal structures)
• Predict binding affinity (free energies)
• Rank a diverse set of compounds (by binding affinity)
• Enhance hit rate for database mining
• Reduce false positives (N_selected - N_hits) and false negatives (N_all_hits - N_hits)
• Fast enough for iterative SBDD

            active   inactive
active      TRUE     FALSE
inactive    FALSE    TRUE

$EF = \frac{H_{VS}}{H_0} = \frac{\left(N_{hits} / N_{selected}\right)_{VS}}{\left(N_{all\_hits} / N_{all}\right)_0}$
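The enrichment factor and the false positive and false negative counts follow directly from the four numbers in the definitions above. The sketch below uses hypothetical screening numbers; the function names are illustrative.

```python
def enrichment_factor(n_hits, n_selected, n_all_hits, n_all):
    """EF = H_VS / H_0 = (N_hits / N_selected) / (N_all_hits / N_all)."""
    h_vs = n_hits / n_selected  # hit rate within the virtually screened selection
    h_0 = n_all_hits / n_all    # hit rate of the whole collection (random selection)
    return h_vs / h_0


def false_positives(n_selected, n_hits):
    """Selected compounds that turn out to be inactive."""
    return n_selected - n_hits


def false_negatives(n_all_hits, n_hits):
    """Active compounds that were not selected."""
    return n_all_hits - n_hits


# Hypothetical screen: 1,000 compounds containing 10 actives;
# the top 50 by docking score are selected and 5 of them are active.
ef = enrichment_factor(n_hits=5, n_selected=50, n_all_hits=10, n_all=1000)
fp = false_positives(n_selected=50, n_hits=5)
fn = false_negatives(n_all_hits=10, n_hits=5)
```

In this example a ten-fold enrichment still leaves 45 of the 50 selected compounds inactive and misses half of the actives, which is why enrichment is reported alongside the false positive and false negative counts.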
Accuracy of Docking
• Reality boundary
  - Experimental errors: 0.1-0.25 kcal/mol (18-53%), with MSR (maximum significant ratio) as much as 3-fold (0.65 kcal/mol)
  - Free energy calculation accuracy: ~1 kcal/mol (5.4-fold), starting with an accurate geometric model and full sampling
  - Entropy and solvation estimation needs a sufficiently long simulation run with an accurate force field, an ensemble of explicit water molecules, and full sampling
• Current
  - Reproduce X-ray structure with rmsd < 2 Å: 50-90% achievable
  - Binding affinity: 1.5-2 log units (32-100 fold, 2.05-2.73 kcal/mol)
  - Correlation between scores and affinities: r^2 < 0.3
  - Enthalpy ranking with minimization: ±5 kcal/mol
  - Hit rate enhancement: 2-50 fold with hit rate 1-20% (and a high false negative rate if 1-5% of total compounds are selected)
(Wang et al. 2003; Erickson et al. 2004; and others.)
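The paired numbers above (kcal/mol, fold error, log units) are all related through ΔΔG = RT ln(ratio). The sketch below reproduces the conversions, assuming T = 298.15 K; the function names are illustrative.

```python
import math

RT = 1.987204e-3 * 298.15  # kcal/mol at 298.15 K (assumed temperature)


def kcal_from_log_units(n_log_units):
    """Free energy error (kcal/mol) equivalent to an affinity error of n log10 units."""
    return n_log_units * RT * math.log(10.0)


def fold_from_kcal(ddg):
    """Fold error in the binding constant equivalent to a free energy error in kcal/mol."""
    return math.exp(ddg / RT)


log_1_5 = kcal_from_log_units(1.5)  # about 2.05 kcal/mol
log_2_0 = kcal_from_log_units(2.0)  # about 2.73 kcal/mol
fold_1kcal = fold_from_kcal(1.0)    # about 5.4-fold
fold_msr = fold_from_kcal(0.65)     # about 3-fold
```

The same functions give 1.18-fold and 1.52-fold for 0.1 and 0.25 kcal/mol, matching the 18-53% quoted for experimental error.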
Docking Accuracy: Examples
Typical HTS: 1,000,000 tested -> (0.3%) -> 3,000 actives -> (0.3%) -> 10 quality novel leads
To find 5 quality leads using docking, 1,500 actives are needed:
  - 15,000 compounds needed at a 10% hit rate (EF = 33)
  - if only 200-2,000 are selected, a 75-100% hit rate is needed (EF > 250)
• Example 1. Docking of a focused library of 55 PI3Kγ inhibitors which share a common chemotype, IC50 8-20000 nM
  - GLIDE docking, scored by LUDI, Ligscore, GScore, PMF, PLP.
  - r^2 = 0.02-0.15
  - Straight GLIDE docking: hit rate 0.34%.
  - Using additional knowledge (only poses with substructure rmsd < 2.5 Å vs. a co-crystal): hit rate 9.8% (J. Klicic, 2004)
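The arithmetic behind the HTS comparison can be checked in a few lines. The sketch below takes the slide's funnel (1,000,000 tested, 3,000 actives, 10 leads) as given and derives the numbers of actives and compounds needed; the function names are illustrative, and the lead-per-active ratio is assumed to carry over unchanged from HTS to docking.

```python
N_TESTED = 1_000_000
N_ACTIVES = 3_000
N_LEADS = 10

BASE_HIT_RATE = N_ACTIVES / N_TESTED   # 0.3% of tested compounds are active
LEADS_PER_ACTIVE = N_LEADS / N_ACTIVES  # roughly 0.3% of actives become quality leads


def actives_needed(leads_wanted):
    """Actives required to expect `leads_wanted` quality leads."""
    return leads_wanted / LEADS_PER_ACTIVE


def compounds_needed(leads_wanted, hit_rate):
    """Compounds to test at a given hit rate to obtain the required actives."""
    return actives_needed(leads_wanted) / hit_rate


def required_enrichment(leads_wanted, n_selected):
    """Enrichment over the HTS base rate needed when only n_selected compounds are tested."""
    return (actives_needed(leads_wanted) / n_selected) / BASE_HIT_RATE


actives = actives_needed(5)               # 1,500 actives
at_10_percent = compounds_needed(5, 0.10)  # 15,000 compounds, EF = 0.10 / 0.003 = 33
ef_for_2000 = required_enrichment(5, 2000)  # 75% hit rate, EF = 250
```

Selecting fewer than 1,500 compounds cannot yield 1,500 actives at all, which is why the slide caps the required hit rate at 100%.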
Docking Accuracy: More Examples
• Example 2. 800 PDB complexes, resolution < 2.5 Å, Ki or Kd known, MW < 1000, non-covalent bond, no cofactor, 200 different proteins
  - 13 scoring functions from SYBYL, Cerius2, GOLD, etc.
  - r^2 = 0.02 to 0.32, sd = 1.8 to 2.2 log units (2.5-3.0 kcal/mol)
  - Best from X-Score, DrugScore, Sybyl ChemScore, Cerius2 PLP (Wang et al. 2004)
• Example 3. Compared CDOCKER, DOCK, GOLD, FlexX for reproducing X-ray crystal structures with rmsd < 2 Å
  - The most important factors are flexibility of protein and ligand
  - Suggest applying VS only to compounds with < 8 rotatable bonds
  - Use CORINA for 3D conformation generation
  - Softer potentials in the beginning (Erickson et al. 2004)
Bottom line: current docking is almost always better than random, but still far too inaccurate to be a sole or dominant approach for lead generation. Multiple CADD & SBDD approaches should be used for any VS/MTS and lead optimization efforts.
Docking Applications
• Determine the lowest free energy structures for the receptor-ligand complex
• Search databases and rank hits for lead generation
• Calculate the differential binding of a ligand to two different macromolecular receptors
• Study the geometry of a particular complex
• Propose modifications of lead molecules to optimize potency or other properties
• de novo design for lead generation
• Library design
Docking of Combinatorial Libraries
• Combinatorial docking problem: given a library of ligands, calculate the docking score (and the geometry of the complex) for each molecule of the library
• R-group selection problem: given a library, select molecules for the individual R-groups in order to form a smaller sublibrary with an enriched number of hits
• de novo library design: given a catalog of available reagents, design a library (incl. the rules of synthesis) that will optimize the number of hits
• The incremental construction method: PRO_SELECT, CombiDOCK (Sun, Ewing et al. 1998), FlexXc
• Docking of the fully enumerated library followed by plate optimization or cherry-picking
Docking to Nucleic Acid Targets
• RNA and DNA as potential drug targets
  - Ribosomal RNA structures (Agalarov et al. 2000; Ban et al. 2000; Filikov et al. 2000; Nissen et al. 2000; Wimberly et al. 2000)
• Highly charged environments, well-defined binding pocket
• DOCK identified compounds that selectively bind to RNA duplexes or DNA quadruplexes (Chen et al. 1996; Chen et al. 1997). The portions of the DOCK suite that calculate electrostatics, including solvation, partial charges, and scoring function, were recently optimized for RNA targets (Downing et al. 2003; Kang et al. 2004).
• MC minimization and an empirical scoring function that accounts for solvation, isomerization free energy, and changes in conformational entropy were used to rank compounds (Hermann & Westhof 1999).
Challenges to Docking Approach
• Binding affinity is only one of many attributes of a drug
• Structures of most druggable targets are undetermined
• The identification of the binding site
• Dependence on protein and ligand structures
  - Source (apo, co-crystal, complex with another inhibitor, NMR, homology), treatment (hydrogen atoms, optimization), flexibility, starting conformation, structural diversity, protonation state
• Similar ligands may unexpectedly bind in quite different modes
  - MJ33 in phospholipase A2 (Sekar et al. 1997); BANA113 in influenza virus neuraminidase (Sudbeck et al. 1997).
• Scoring favors larger & more complicated molecules
  - But contributions to binding free energy from the heavy atoms of the ligand level off at ~15 atoms. Many interactions, including H-bonding, do not always lead to higher binding affinity (Kuntz et al. 1999).
Challenges to Docking Approach
• Large energies vs. small energy differences
• Finding weakly potent compounds in pools of nonbinders
• High false positive and false negative rates from in silico screens
• Explicit waters are needed for: volume, changing the shape of the binding site, bridging interactions
• A scoring function that always has its global optimum in agreement with the experiment
• Good affinity prediction does not necessarily lead to the correct binding mode
• Speed and accuracy
Successes of Docking & SBDD
• HIV protease inhibitor amprenavir (Agenerase) from Vertex & GSK (Kim et al. 1995)
• HIV: nelfinavir (Viracept) by Pfizer (& Agouron) (Greer et al. 1994)
• Influenza neuraminidase inhibitor zanamivir (Relenza) by GSK (Schindler 2000)
• Widely used & greatly appreciated. Identified many hits.
  - Review articles by Kuntz 1992; Kuntz et al. 1994; Kubinyi 1998; Muegge & Rarey 2001; Blundell 2002; Halperin et al. 2002; Shoichet et al. 2002; Taylor et al. 2002; Waszkowycz 2002; Davis et al. 2003; Schneidman-Duhovny et al. 2004.