Docking of small molecules
using Discovery Studio
Tel-Aviv University
Goal of the workshop:
Provide useful information required for using the Discovery Studio docking
algorithms.
Outline
- A brief review on docking algorithms
- LigandFit: the workflow
- CDocker: the workflow
- Hands-on session:
  - Visualization tools in Discovery Studio
  - Docking with LigandFit
  - Docking with CDOCKER
  - Post-docking analysis
The molecular docking problem
Given two molecules with 3D conformations in atomic details:
 Do the molecules bind to each other?
 What does the molecule-molecule complex look like?
 How strong is the binding affinity?
What do we dock?
The two molecules might be:
 A protein (enzyme, receptor) and a small
molecule (substrates, ligands)
 A protein and a DNA molecule
 Two proteins
Why do we dock?
 Drug discovery costs are too high: ~$800 million, 8-14 years,
~10,000 compounds (DiMasi et al. 2003; Dickson & Gagnon 2004)
 Drugs interact with their receptors in a highly specific and
complementary manner.
 Core of the target-based structure-based drug design (SBDD) for
lead generation and optimization.
 Lead is a compound that
shows biological activity,
is novel, and
has the potential of being structurally modified for improved
bioactivity and selectivity
Three components of docking
 Representation of the receptor binding site and ligand (pre- and/or during docking)
 Sampling of the configuration space of the ligand-receptor complex (during docking)
 Evaluation of ligand-receptor interactions (during docking and scoring)
Basic principles
 The association of molecules is based on interactions
 H-bonds
 salt bridges
 hydrophobic contacts
 electrostatic
 very strong repulsive (VdW) interactions at short distances
 Ligands are flexible
 Receptors are mostly rigid
Sampling of configuration space of the ligand-receptor
complex
 Descriptor-matching: using pattern-recognizing geometric methods to
match ligand and receptor site descriptors
 geometric, chemical, pharmacophore properties, such as distance
pairs, triplet, volume, vector, hydrogen-bond, hydrophobic, charged,
etc.
 Molecular simulation: MD (molecular dynamics), MC (Monte Carlo)
 Others: GA (genetic algorithm), similarity, fragment-based
 No “best” method
Molecular simulation: MD & MC
 Two major components:
The description of the degrees of freedom
The energy evaluation
 The local movement of the atoms is performed:
according to the forces present at each step in MD (molecular dynamics)
randomly in MC (Monte Carlo)
 Usually time-consuming:
Each simulation searches from a starting orientation to a low-energy configuration
Several simulations with different starting orientations must be
performed to get a statistically significant result
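A minimal sketch of the MC idea above, assuming a toy two-dimensional energy function in place of a real ligand-receptor energy. The names `metropolis_search`, `toy_energy` and `multi_start`, the step size, and the temperature are illustrative assumptions, not Discovery Studio settings.

```python
import math
import random

def metropolis_search(energy, start, n_steps=2000, step_size=0.5,
                      temperature=0.3, rng=None):
    """Random local moves accepted with the Metropolis criterion."""
    rng = rng or random.Random()
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(n_steps):
        # Random local move of every degree of freedom.
        trial = [x + rng.uniform(-step_size, step_size) for x in current]
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best

def toy_energy(p):
    """Hypothetical energy surface with two minima at (+1, 0) and (-1, 0)."""
    x, y = p
    return (x * x - 1.0) ** 2 + y * y

def multi_start(energy, n_starts=5, seed=0):
    """Run several simulations from different random starting points, keep the best."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        start = [rng.uniform(-3, 3), rng.uniform(-3, 3)]
        results.append(metropolis_search(energy, start, rng=rng))
    return min(results, key=lambda r: r[1])
```

Calling `multi_start(toy_energy)` returns the lowest-energy point found over all starts; in a docking setting the list of coordinates would be replaced by ligand position, orientation, and torsions.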
Genetic algorithm docking
 Requires the generation of an initial population, whereas conventional
MC and MD require a single starting structure in their standard
implementation.
 The collection of genes (chromosome) is assigned a fitness based
on a scoring function. There are three genetic operators:
mutation operator randomly changes the value of a gene;
crossover exchanges a set of genes from one parent
chromosome to another;
migration moves individual genes from one sub-population to
another.
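The three operators can be sketched as follows. This is a toy illustration under stated assumptions: the fitness function is hypothetical, chromosomes are plain lists of numbers, and migration is simplified to swapping whole individuals between two sub-populations.

```python
import random

def fitness(chrom):
    """Hypothetical scoring function: higher is better, optimum at all genes = 0.5."""
    return -sum((g - 0.5) ** 2 for g in chrom)

def mutate(chrom, rng, rate=0.1):
    """Mutation: randomly change the value of some genes."""
    return [rng.random() if rng.random() < rate else g for g in chrom]

def crossover(a, b, rng):
    """Crossover: combine the genes of two parents at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(pop, rng):
    """One generation: keep the fitter half, refill with mutated offspring."""
    pop = sorted(pop, key=fitness, reverse=True)
    parents = pop[: len(pop) // 2]
    children = [
        mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
        for _ in range(len(pop) - len(parents))
    ]
    return parents + children

def migrate(pop_a, pop_b, rng):
    """Migration (simplified): swap one individual between two sub-populations."""
    i, j = rng.randrange(len(pop_a)), rng.randrange(len(pop_b))
    pop_a[i], pop_b[j] = pop_b[j], pop_a[i]

def run_ga(n_genes=6, pop_size=30, generations=100, seed=1):
    rng = random.Random(seed)
    # Unlike MC/MD, the search starts from a whole population, here split in two.
    islands = [
        [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
        for _ in range(2)
    ]
    for gen in range(generations):
        islands = [evolve(pop, rng) for pop in islands]
        if gen % 10 == 9:
            migrate(islands[0], islands[1], rng)
    return max((c for pop in islands for c in pop), key=fitness)
```

In a docking program the genes would encode ligand translation, rotation, and torsion angles, and the fitness would come from the scoring function.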
Docking programs
 DOCK (UCSF)
 AutoDock (Scripps)
 Glide (Schrödinger)
 ICM (Molsoft)
 FRED (OpenEye)
 GOLD
 FlexX, etc.
Evaluation of docking programs
 Evaluation of library ranking efficacy in virtual screening. J
Comput Chem. 2005 Jan 15;26(1):11-22.
 Evaluation of docking performance: comparative data on
docking algorithms. J Med Chem. 2004 Jan 29;47(3):558-65.
 Impact of scoring functions on enrichment in docking-based
virtual screening: an application study on renin inhibitors. J
Chem Inf Comput Sci. 2004 May-Jun;44(3):1123-9.
 And more….
                               | LigandFit                                                           | CDOCKER
Methodology                    | Shape-based docking                                                 | CHARMm-based docking/refinement
Usage                          | Screening of medium-size libraries in well-defined binding cavities | Screening of small libraries & refinement of docking poses
Speed                          | Medium                                                              | Medium-Slow
Associated Tools and Utilities | Site definition by ligand or receptor; pose interaction filters     | Binding site sphere definition; forcefield typing
LigandFit
 Active-site finding
 Automatic active site location using flood filling algorithm
 Flexible docking of ligands
 Searches the ligand conformational space to find the best fit
into the protein active site
 1,000 conformations per sec
 Fast ligand scoring
 Initial scoring based on both internal energy of ligand and
interaction energy between ligand and protein
 With DS LigandScore, a variety of scoring functions are available
for final analysis
LigandFit workflow
1. Define binding site / site partition
2. For each of the specified number of Monte Carlo trials:
   a. Generate ligand conformation
   b. Ligand/site shape match: Fail → generate a new conformation; Pass → continue
   c. Position and orient ligand to site
   d. Is it better than the saved poses? Is it different from the saved poses?
      No → go to the next trial; Yes → save pose in Save List (replace the worst pose)
3. Rank the poses and apply scoring function(s)
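The Save List logic of this workflow can be sketched as below. This is a toy under stated assumptions, not the LigandFit implementation: a pose is a single number, and `shape_match`, `toy_score`, `max_poses` and `min_separation` are hypothetical stand-ins for the real shape comparison, DockScore, and settings.

```python
import random

def shape_match(pose):
    """Toy stand-in for the ligand/site shape comparison."""
    return abs(pose) < 5.0

def toy_score(pose):
    """Hypothetical score, higher is better, best pose at 2.0."""
    return -(pose - 2.0) ** 2

def update_save_list(save_list, pose, score, max_poses=3, min_separation=0.5):
    """Save the pose if it is different from, and better than, the saved poses."""
    # Is it different from the saved poses?
    if any(abs(pose - saved) < min_separation for saved, _ in save_list):
        return False
    if len(save_list) < max_poses:
        save_list.append((pose, score))
        return True
    # Is it better than the saved poses? If so, replace the worst one.
    worst = min(range(len(save_list)), key=lambda i: save_list[i][1])
    if score > save_list[worst][1]:
        save_list[worst] = (pose, score)
        return True
    return False

def dock(n_trials=500, seed=0):
    rng = random.Random(seed)
    save_list = []
    for _ in range(n_trials):
        pose = rng.uniform(-10.0, 10.0)   # generate a (toy) ligand conformation
        if not shape_match(pose):         # Fail: go to the next trial
            continue
        update_save_list(save_list, pose, toy_score(pose))
    # Rank the saved poses, best first.
    return sorted(save_list, key=lambda item: item[1], reverse=True)
```

`dock()` returns up to three mutually distinct poses ranked by score, which mirrors how the Save List keeps a small, diverse set of candidates for final scoring.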
Prepare your protein
 All hydrogens must be added
 All atom valencies must be satisfied for correct atom typing
 Use Tools → Protein Modeling → Clean to:
 check structural disorder
 fix connectivity and bond orders
 add H at a specified pH
 Use Preferences → Protein Utilities to set Clean tool options
Binding site identification
 Before beginning docking calculations…
 Where is the binding site?
Binding site characteristics
 Liang et al. 1998 found small
molecule binding sites to be:
 Indentations,
 Crevices, or
 Cavities
 And often the largest site is
the true binding site
 Laskowski et al. 1996 reported
an analysis of cleft volumes:
 Often the ligand is bound in
the largest cleft
 Usually the largest cleft is
considerably larger than the
others
(Figures: Abl tyrosine kinase; HSV-1 thymidine kinase)
Prepare your protein: define the active site using the Site Search Algorithm
1. Set up a grid around the protein
2. Use a probe to test for van der Waals clashes at each grid point
• Default resolution is 0.5 Å but can be adjusted by the user
(Figure legend: free point, protein point)
Site Search Algorithm
3. Clean free points with an “eraser”
• Clean “free” grid points
• Eraser size can be varied
(Figure legend: eraser, free point, protein point, erased point)
Site Search Algorithm
4. If the “eraser” is unable to enter a cavity, all grid points inside the cavity
are considered a site.
(Figure legend: site point, protein point, outside point)
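Steps 1 to 4 can be illustrated on a two-dimensional toy grid. This is a sketch under stated assumptions, not the Discovery Studio code: the grid is a list of strings (`#` for protein points, `.` for free points), the eraser is a square of half-width `eraser_radius`, and `find_site_points` is a hypothetical name.

```python
from collections import deque

def find_site_points(grid, eraser_radius):
    """Return the free grid points (row, col) that the eraser cannot reach."""
    r = eraser_radius
    pad = 2 * r + 1                      # free margin so the eraser can circle the protein
    rows, cols = len(grid), len(grid[0])
    height, width = rows + 2 * pad, cols + 2 * pad

    def is_protein(i, j):
        gi, gj = i - pad, j - pad
        return 0 <= gi < rows and 0 <= gj < cols and grid[gi][gj] == "#"

    def eraser_fits(i, j):
        # The eraser must stay inside the padded box and must not overlap protein points.
        if i - r < 0 or j - r < 0 or i + r >= height or j + r >= width:
            return False
        return not any(
            is_protein(a, b)
            for a in range(i - r, i + r + 1)
            for b in range(j - r, j + r + 1)
        )

    start = (r, r)                       # lies in the margin, so it always fits
    seen = {start}
    queue = deque([start])
    erased = set()
    while queue:                         # flood fill over reachable eraser positions
        i, j = queue.popleft()
        for a in range(i - r, i + r + 1):
            for b in range(j - r, j + r + 1):
                erased.add((a - pad, b - pad))
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (i + di, j + dj)
            if nxt not in seen and eraser_fits(*nxt):
                seen.add(nxt)
                queue.append(nxt)

    return {
        (i, j)
        for i in range(rows)
        for j in range(cols)
        if grid[i][j] == "." and (i, j) not in erased
    }

pocket = [
    "#######",
    "#.....#",
    "#.....#",
    "#.....#",
    "###.###",
]
```

With `eraser_radius=1` the 3x3 eraser cannot pass the one-point opening, so the cavity points of `pocket` are reported as a site; with `eraser_radius=0` the eraser enters the cavity and nothing is left. This is why the eraser size controls which openings count as binding sites.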
Site editing
 A site definition can be modified
 Site Editing links in Binding Site Tool Panel
 Contraction/Expansion
 Site points are objects
 Manually selected and deleted
 Recommended:
 manually remove “tails”
 expand 2-3 times
 Preferences → Binding sites → site opening.
Changes the eraser size (recommended size is 5)
If the ligand is smaller than the site, use the partition site option
(Figure labels: Align, Site, Ligand, Site Partition)
Site search by protein shape
 Flood-filling algorithm identifies
possible binding sites
 Fast (a few seconds)
 Will work on any protein shape
 Not sensitive to the orientation
of the protein in the grid
Prepare your protein: Interaction Filters
 If you know that certain residues or atoms promote your ligand-receptor
interaction, you can define an interaction site
 Select protein atom(s) as interaction sites
Hydrogens for defining a donor on the protein
Heavy atoms (such as O, N) for an acceptor
Carbons for hydrophobic sites
 Attributes and type can be edited
Accessed via the Edit | Attributes… menu,
or right-click a selected object and select Attributes of
Energy grid parameter
 Select a forcefield and partial charge calculation method to be used
in the evaluation of ligand pose-receptor interaction energies
during docking
 Dreiding - default
 PLP1 – good for many (mainly hydrophobic) interactions
 CFF – more accurate than Dreiding, but time-consuming
 Click on the arrow symbol to reveal advanced parameters
LigandFit conformational search
 Required for flexible fit of the ligand
 Monte Carlo search in torsional space
 Bond lengths and bond angles fixed
 Multiple torsion changes simultaneously
 Rings are not varied
 Upper limit of random dihedral perturbation is 180°
 Lower limit depends on the number of rotating atoms
(Figure: coarse search step and fine search step)
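A single Monte Carlo move in torsional space might look like the sketch below. The function name, the per-torsion lower limits, and the probability of changing a torsion are assumptions for illustration; the source only states that the upper limit is 180° and that the lower limit depends on the number of rotating atoms.

```python
import random

def perturb_torsions(torsions, lower_limits, rng, max_change=180.0, p_change=0.5):
    """Randomly perturb several torsion angles (degrees) in one move.

    Bond lengths, bond angles, and rings are untouched because only the
    torsion angles are varied. lower_limits holds one hypothetical minimum
    perturbation per torsion.
    """
    new_torsions = []
    for angle, lower in zip(torsions, lower_limits):
        if rng.random() < p_change:
            magnitude = rng.uniform(lower, max_change)
            angle += magnitude * rng.choice((-1.0, 1.0))
        # Wrap the angle into the range [-180, 180).
        new_torsions.append((angle + 180.0) % 360.0 - 180.0)
    return new_torsions
```

For example, `perturb_torsions([0.0, 60.0, -120.0], [5.0, 10.0, 20.0], random.Random(0))` returns three wrapped angles, each either unchanged or moved by at least its lower limit.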
Monte Carlo (MC) trials dialog
 Perform Rigid Docking
 A docking mode that treats the
ligand as a rigid body. The ligand
conformation is not changed during
docking
 Use a fixed number of MC steps
 Specify a fixed number of iterations
for the Monte Carlo conformer
generation which is employed for all
input ligands
 Use variable number of MC steps from table
This table allows you to adjust the number of iterations and consecutive
failures based on the number of ligand torsions
Docking mode
 Docking or Rigid-Body Minimization only
 Docking
 places ligand into the binding site
 shape matching and refinements done
 Rigid-Body Minimization
 position of input ligands specified by the starting coordinates
 rigid-body minimization of the ligand-protein interaction energy
 No attempt is made to place the ligand into the binding site, so the
input file should be "pre-docked" for meaningful results
Evaluating the ligand position
 Once fit is completed…
how good is it?
 Ligand position initially evaluated using DockScore
 Energy-based
 Grid-based
 Higher scores indicate better fit
 Choice of forcefields
 Dreiding - default
 PLP1 – good for many (mainly hydrophobic) interactions
 CFF – more accurate than Dreiding, but time-consuming
Protein-ligand interaction filters
 Features may be used as a “filter” for docked poses
 Does not affect how a ligand is positioned or optimized
 Once a ligand is docked, its pose is examined to find how many
features are matched between the receptor and the docked pose
 The number of matched features influences whether the pose will
be saved to the Save List
Scoring functions
 Used for final evaluation of positions
after the DockScore is computed
 Used during LigandFit Docking
Protocol
 Or evaluated for a completed run
in Score Ligand Poses Protocol
 Choice of Scores:
LigScore1
LigScore2
PLP1
PLP2
Jain
PMF
Ludi
Types of scoring functions
 Force field based: nonbonded interaction terms as the score,
sometimes in combination with solvation terms
 Empirical: multivariate regression methods to fit coefficients of
physically motivated structural functions by using a training set of
ligand-receptor complexes with measured binding affinity
 Knowledge-based: statistical atom pair potentials derived from
structural databases as the score
 Other: scores and/or filters based on chemical properties,
pharmacophore, contact, shape complementarity
 Consensus scoring functions approach
Force field based scoring functions
$$
E = \sum_{i=1}^{\mathrm{lig}} \sum_{j=1}^{\mathrm{rec}} \left[ \frac{A_{ij}}{r_{ij}^{a}} - \frac{B_{ij}}{r_{ij}^{b}} + 332 \, \frac{q_i q_j}{D \, r_{ij}} \right]
$$
e.g. CHARMm in CDOCKER
 Advantages
FF terms are well studied and have some physical basis
Transferable, and fast when used on a pre-computed grid
 Disadvantages
Cover only part of the relevant energies, i.e., potential energies,
sometimes supplemented by solvation or entropy terms
Electrostatics often overestimated, leading to systematic
problems in ranking complexes
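A direct, grid-free evaluation of a force-field score of this kind (van der Waals plus Coulomb terms summed over ligand-receptor atom pairs) can be sketched as follows. The atom tuple layout, the geometric-mean combination of per-atom parameters into A_ij and B_ij, the 12-6 exponents, and the constant dielectric are assumptions for illustration.

```python
import math

def force_field_score(ligand, receptor, a=12, b=6, dielectric=1.0):
    """Sum van der Waals and Coulomb terms over all ligand-receptor atom pairs.

    Each atom is a tuple (x, y, z, charge, sqrt_A, sqrt_B); A_ij and B_ij are
    taken as products of the per-atom square-root parameters (an assumption).
    """
    energy = 0.0
    for xi, yi, zi, qi, sa_i, sb_i in ligand:
        for xj, yj, zj, qj, sa_j, sb_j in receptor:
            r = math.dist((xi, yi, zi), (xj, yj, zj))
            a_ij = sa_i * sa_j
            b_ij = sb_i * sb_j
            energy += a_ij / r**a - b_ij / r**b            # repulsion and attraction
            energy += 332.0 * qi * qj / (dielectric * r)   # electrostatics
    return energy
```

The double loop is what a pre-computed grid avoids: the receptor contribution is tabulated once per grid point, so scoring a pose only needs a lookup per ligand atom.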
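Empirical scoring functions
$$
\Delta G = \Delta G_0 + \Delta G_{rot} N_{rot}
 + \Delta G_{HB} \sum_{\text{neutral H-bonds}} f(\Delta R, \Delta\alpha)
 + \Delta G_{io} \sum_{\text{ionic int.}} f(\Delta R, \Delta\alpha)
 + \Delta G_{aro} \sum_{\text{aro. int.}} f(\Delta R, \Delta\alpha)
 + \Delta G_{lipo} \sum_{\text{lipo. cont.}} f(\Delta R, \Delta\alpha)
$$
Examples: LUDI, PLP, LigScore, Jain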
 Count the number of interactions and assign a score based on the
number of occurrences
 H-bonds, ionic interactions (easy to quantify)
 Hydrophobic interactions (more difficult to assess and quantify)
 Number of rotatable bonds frozen (linked to the entropic cost of
binding, quite difficult to estimate)
 Advantages: fast & direct estimation of binding affinity
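An empirical score of this additive form can be sketched as below. All coefficients and tolerance windows are placeholder values chosen for illustration, not the fitted parameters of LUDI, PLP, LigScore, or Jain; the names `ramp`, `penalty` and `empirical_score` are likewise hypothetical.

```python
def ramp(deviation, ideal_tol, max_tol):
    """1 inside the tolerance, linear fall-off to 0 at max_tol."""
    if deviation <= ideal_tol:
        return 1.0
    if deviation >= max_tol:
        return 0.0
    return 1.0 - (deviation - ideal_tol) / (max_tol - ideal_tol)

def penalty(delta_r, delta_alpha):
    """f(dR, dAlpha): scales a contact by its deviation from ideal geometry."""
    return ramp(delta_r, 0.2, 0.6) * ramp(delta_alpha, 30.0, 80.0)

# Placeholder regression coefficients (assumed values, arbitrary units).
COEFF = {"G0": 5.4, "rot": 1.4, "HB": -4.7, "io": -8.3, "aro": -0.7, "lipo": -0.17}

def empirical_score(n_rot, contacts, coeff=COEFF):
    """contacts maps "HB", "io", "aro", "lipo" to lists of (delta_r, delta_alpha)."""
    dg = coeff["G0"] + coeff["rot"] * n_rot
    for kind in ("HB", "io", "aro", "lipo"):
        dg += coeff[kind] * sum(penalty(dr, da) for dr, da in contacts.get(kind, []))
    return dg
```

For example, `empirical_score(2, {"HB": [(0.0, 0.0)]})` adds the constant term, two frozen rotatable bonds, and one ideal hydrogen bond. In a real function the coefficients are fitted by regression to complexes with measured binding affinities.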
Knowledge-based potentials of mean force scoring
functions (PMF)
 Assumptions
An observed crystallographic complex represents the optimum
placement of the ligand atoms relative to the receptor atoms
 Advantages
Similar to empirical, but more general (much more distance data
than binding energy data)
 Disadvantages
PMFs are typically pair-wise, while the probability of finding atoms A and
B at a distance r is non-pairwise and also depends on the surrounding
atoms
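For orientation, knowledge-based potentials are commonly written in an inverse-Boltzmann form. The expression below is given as general background, not as the exact functional form of the PMF score in Discovery Studio; the reference density and the summation limits are assumptions of this sketch.

$$
A_{ij}(r) = -k_B T \ln \frac{\rho_{ij}(r)}{\rho_{ij}^{\mathrm{ref}}},
\qquad
\mathrm{Score} = \sum_{k \in \mathrm{lig}} \sum_{l \in \mathrm{rec}} A_{t(k)\,t(l)}(r_{kl})
$$

Here \rho_{ij}(r) is the density of atom-type pairs (i, j) observed at distance r in the structural database, \rho_{ij}^{\mathrm{ref}} is a reference density, and t(k) is the atom type of atom k. Because each term depends on a single pair distance, the score is pair-wise by construction.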
Consensus Scoring
 Combination of several scoring functions
 The common top rankers get a higher consensus rank than single
outliers
 False positives can be detected more easily than with a single scoring
function
 Advisable to use 2-4 well-suited scoring functions for the consensus
score
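One simple way to build a consensus is rank-by-vote: each scoring function votes for the ligands in its top fraction, and ligands are ordered by vote count. The sketch below assumes that higher scores are better for every function; the function names `score_1` to `score_3` and the ligand names are hypothetical.

```python
def consensus_rank(scores, top_fraction=0.5):
    """Rank ligands by how many scoring functions place them in their top fraction.

    scores: {function_name: {ligand_name: score}}, higher score = better.
    Returns a list of (ligand, votes), most votes first, ties broken by name.
    """
    votes = {}
    for per_ligand in scores.values():
        ranked = sorted(per_ligand, key=per_ligand.get, reverse=True)
        n_top = max(1, round(len(ranked) * top_fraction))
        for ligand in per_ligand:
            votes.setdefault(ligand, 0)
        for ligand in ranked[:n_top]:
            votes[ligand] += 1
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))

example_scores = {
    "score_1": {"A": 9, "B": 8, "C": 1, "D": 2},
    "score_2": {"A": 7, "B": 3, "C": 2, "D": 9},
    "score_3": {"A": 5, "B": 6, "C": 1, "D": 2},
}
```

In `example_scores`, ligand D is the top hit of `score_2` only, so it ends up below A and B, which several functions agree on. This is how a consensus demotes single-function outliers.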
Take home message
There is no best method!
 Try different methods, force-fields, scoring functions
 Refer to your results as a suggestion
 Use the experimental data
CDOCKER
 CDOCKER is a CHARMm-based docking algorithm
1. Generate ligand conformations through high-temperature molecular dynamics
2. Random (rigid-body) rotation
3. Grid-based simulated annealing
4. Full minimization
5. Output of refined ligand poses
CDOCKER
 CHARMm-based docking/refinement algorithm
 Uses soft-core potentials and an optional grid representation to
dock ligands into the receptor active site
 High temperature MD to generate (10) starting conformations
 Take each conformation and perform random rigid body
rotations (10)
 Minimise resulting structures (<=50)
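The random rigid-body rotation step can be sketched as follows. This is a toy illustration, not the CDOCKER code: the rotation is drawn from a normalized four-dimensional Gaussian (a random unit quaternion) and applied about the ligand centroid, and all function names are hypothetical.

```python
import math
import random

def random_rotation_matrix(rng):
    """Rotation matrix from a random unit quaternion (w, x, y, z)."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

def rotate_about_centroid(coords, rot):
    """Rotate all atoms as one rigid body around the centroid."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    rotated = []
    for p in coords:
        d = [p[k] - center[k] for k in range(3)]
        rotated.append(tuple(
            center[i] + sum(rot[i][k] * d[k] for k in range(3)) for i in range(3)
        ))
    return rotated

def random_orientations(coords, n_orientations=10, seed=0):
    """Generate several randomly rotated copies of one conformation."""
    rng = random.Random(seed)
    return [
        rotate_about_centroid(coords, random_rotation_matrix(rng))
        for _ in range(n_orientations)
    ]
```

Applying `random_orientations` to each conformation from the high-temperature MD step gives the set of starting poses that are then annealed and minimized; interatomic distances within each pose are unchanged by the rotation.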
Prepare your protein
 All hydrogens must be added
 All atom valencies must be satisfied for correct atom typing
 Use Tools → Protein Modeling → Clean to:
 check structural disorder
 fix connectivity and bond orders
 add H at a specified pH
 Use Preferences → Protein Utilities to set Clean tool options
Prepare your protein: define your binding site
If you know the residues involved in the interaction with your ligand you
can define your binding site
 Enlarge your site using attributes of the site-sphere
Advanced parameters
 Advanced parameters for:
 Forcefield (CHARMm, cff)
 Use Full Potential
 Grid extension
 Ligand partial charge method (MMFF/CHARMm)
 Final minimization (Grid-Based, Full potential)
Post-docking tools
Score your poses
Consensus score
Analyze your poses