Docking of small molecules
using Discovery Studio
Tel-Aviv University
Goal of the workshop:
Provide the information needed to use the Discovery Studio docking
algorithms.
Outline
- A brief review on docking algorithms
- LigandFit: the workflow
- CDocker: the workflow
- Hands-on session:
  - Visualization tools in Discovery Studio
  - Docking with LigandFit
  - Docking with CDOCKER
  - Post-docking analysis
The molecular docking problem
Given two molecules with 3D conformations in atomic details:
Do the molecules bind to each other?
What does the molecule-molecule complex look like?
How strong is the binding affinity?
What do we dock?
The two molecules might be:
A protein (enzyme, receptor) and a small
molecule (substrates, ligands)
A protein and a DNA molecule
Two proteins
Why do we dock?
Drug discovery costs are too high: ~$800 million, 8-14 years,
~10,000 compounds (DiMasi et al. 2003; Dickson & Gagnon 2004)
Drugs interact with their receptors in a highly specific and
complementary manner.
Docking is at the core of target-based, structure-based drug design (SBDD) for
lead generation and optimization.
A lead is a compound that
shows biological activity,
is novel, and
has the potential of being structurally modified for improved
bioactivity and selectivity
Three components of docking
Representation of the receptor binding site and ligand (pre- and/or during docking)
Sampling of the configuration space of the ligand-receptor complex (during docking)
Evaluation of ligand-receptor interactions (during docking and scoring)
Basic principles
The association of molecules is based on interactions
H-bonds
salt bridges
hydrophobic contacts
electrostatic interactions
very strong repulsive van der Waals (VdW) interactions at short distances
Ligands are flexible
Receptors are mostly rigid
Sampling of configuration space of the ligand-receptor
complex
Descriptor matching: pattern-recognition geometric methods that match
ligand and receptor-site descriptors, i.e., geometric, chemical and
pharmacophore properties such as distance pairs, triplets, volumes,
vectors, and hydrogen-bond, hydrophobic and charged features
Molecular simulation: MD (molecular dynamics), MC (Monte Carlo)
Others: GA (genetic algorithm), similarity, fragment-based
No “best” method
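To make descriptor matching concrete, here is a minimal sketch of distance-pair matching: every ligand atom pair is compared with every pair of receptor site points, and pairs with similar distances become candidate placements. The function name `matching_pairs`, the 0.5 Å tolerance and the toy coordinates are illustrative assumptions, not code from any docking program; real implementations use richer descriptors (triplets, chemistry) and then superimpose the matched points. `math.dist` requires Python 3.8 or later.

```python
import itertools
import math


def matching_pairs(lig_points, site_points, tol=0.5):
    """Distance-pair descriptor matching.

    Returns every (ligand pair, site pair) combination whose internal distances
    agree within `tol` angstroms; each match is a candidate ligand placement.
    """
    matches = []
    lig_pairs = itertools.combinations(range(len(lig_points)), 2)
    site_pairs = list(itertools.combinations(range(len(site_points)), 2))
    for i, j in lig_pairs:
        d_lig = math.dist(lig_points[i], lig_points[j])
        for k, l in site_pairs:
            d_site = math.dist(site_points[k], site_points[l])
            if abs(d_lig - d_site) <= tol:
                matches.append(((i, j), (k, l)))
    return matches


# Usage: one ligand atom pair 3.0 A apart and three site points.
ligand = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
site = [(0.0, 0.0, 0.0), (0.0, 3.2, 0.0), (0.0, 0.0, 10.0)]
print(matching_pairs(ligand, site))
```

Only the site pair at 3.2 Å matches the 3.0 Å ligand pair; tightening `tol` below 0.2 Å removes that match as well.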
Molecular simulation: MD & MC
Two major components:
The description of the degrees of freedom
The energy evaluation
Local movement of the atoms is performed:
due to the forces present at each step in MD (molecular dynamics)
randomly in MC (Monte Carlo)
Usually time-consuming:
the search proceeds from a starting orientation to a low-energy configuration
several simulations with different starting orientations must be
performed to get a statistically significant result
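The MC ingredients above (random local moves, an energy evaluation, and several runs from different starting points) can be illustrated with the Metropolis acceptance criterion on a hypothetical one-dimensional energy. This is a minimal sketch, not the sampler of any docking program; `mc_search`, `toy_energy`, the step size and `kT` are assumptions.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion: accept downhill moves, accept uphill ones with prob. exp(-dE/kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def mc_search(energy, x0, n_steps=5000, step=0.5, kT=0.6, seed=1):
    """Random-move Monte Carlo from one starting point; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)  # random local move
        trial_e = energy(trial)
        if metropolis_accept(trial_e - e, kT, rng):
            x, e = trial, trial_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def toy_energy(x):
    # Hypothetical 1-D energy with its minimum at x = 1.
    return (x - 1.0) ** 2


# Several independent runs from different starting points.
results = [mc_search(toy_energy, start) for start in (-4.0, 0.0, 4.0)]
```

On a real complex each run can end in a different local minimum, which is why the runs are repeated and compared.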
Genetic algorithm docking
Requires the generation of an initial population, whereas conventional
MC and MD require a single starting structure in their standard
implementation.
The collection of genes (chromosome) is assigned a fitness based
on a scoring function. There are three genetic operators:
mutation operator randomly changes the value of a gene;
crossover exchanges a set of genes from one parent
chromosome to another;
migration moves individual genes from one sub-population to
another.
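The three genetic operators can be sketched as below. Genes are treated as torsion angles in degrees, fitness is a hypothetical toy function, and migration is modeled with the common island-model variant that copies whole chromosomes between sub-populations. All names and parameters are assumptions for illustration, not the operators of a specific docking program.

```python
import random


def mutate(chrom, rate, rng, low=-180.0, high=180.0):
    """Mutation: each gene (here a torsion angle) is reset to a random value with probability `rate`."""
    return [rng.uniform(low, high) if rng.random() < rate else g for g in chrom]


def crossover(parent_a, parent_b, rng):
    """One-point crossover: genes before a random cut come from one parent, the rest from the other."""
    cut = rng.randint(1, len(parent_a) - 1)
    return parent_a[:cut] + parent_b[cut:]


def migrate(islands, n_migrants, fitness):
    """Migration (island model): copies of the best chromosomes of each sub-population
    replace the worst chromosomes of the next sub-population."""
    best = [sorted(pop, key=fitness, reverse=True)[:n_migrants] for pop in islands]
    for k, pop in enumerate(islands):
        pop.sort(key=fitness)  # worst first; higher fitness is better
        pop[:n_migrants] = [list(c) for c in best[k - 1]]


def evolve(fitness, n_genes=4, n_islands=2, pop_size=20, generations=60, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.uniform(-180.0, 180.0) for _ in range(n_genes)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(generations):
        for k, pop in enumerate(islands):
            pop.sort(key=fitness, reverse=True)
            parents = pop[: pop_size // 2]  # the fitter half survives unchanged
            children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), 0.1, rng)
                        for _ in range(pop_size - len(parents))]
            islands[k] = parents + children
        if gen % 10 == 9:
            migrate(islands, 2, fitness)
    return max((c for pop in islands for c in pop), key=fitness)


def toy_fitness(chrom):
    # Hypothetical fitness: best when every torsion is 60 degrees.
    return -sum((g - 60.0) ** 2 for g in chrom)


best = evolve(toy_fitness)
```

Keeping the fitter half unchanged means the best chromosome found so far is never lost, and migration only overwrites the weakest members of each sub-population.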
Docking programs
Dock (UCSF)
Autodock (Scripps)
Glide (Schrodinger)
ICM (Molsoft)
FRED (OpenEye)
Gold
FlexX, etc.
Evaluation of docking programs
Evaluation of library ranking efficacy in virtual screening. J
Comput Chem. 2005 Jan 15;26(1):11-22.
Evaluation of docking performance: comparative data on
docking algorithms. J Med Chem. 2004 Jan 29;47(3):558-65.
Impact of scoring functions on enrichment in docking-based
virtual screening: an application study on renin inhibitors. J
Chem Inf Comput Sci. 2004 May-Jun;44(3):1123-9.
And more.
                      LigandFit                                 CDOCKER
Methodology           Shape-based docking                       CHARMm-based docking/refinement
Usage                 Screening of medium-size libraries in     Screening of small libraries &
                      well-defined binding cavities             refinement of docking poses
Speed                 Medium                                    Medium-Slow
Associated tools      Site definition by ligand or receptor;    Binding site sphere definition;
and utilities         pose interaction filters                  forcefield typing
LigandFit
Active-site finding
Automatic active-site location using a flood-filling algorithm
Flexible docking of ligands
Searches the ligand conformational space to find the best fit
into the protein active site
1,000 conformations per second
Fast ligand scoring
Initial scoring based on both the internal energy of the ligand and the
interaction energy between ligand and protein
With DS LigandScore, a variety of scoring functions is available
for final analysis
LigandFit workflow
1. Define the binding site / site partition
2. Repeat for the specified number of Monte Carlo trials:
   a. Generate a ligand conformation
   b. Ligand/site shape match: Fail → generate a new conformation; Pass → continue
   c. Position and orient the ligand to the site
   d. Is the pose better than the saved poses, and is it different from them?
      Yes → save the pose in the Save List, replacing the worst pose
      No → do not save the pose
3. Apply the scoring function(s) and rank the poses
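The Save List logic of this workflow can be sketched as follows. This is a simplified illustration under stated assumptions: "different" means an RMSD of at least `min_rmsd` from every saved pose, "better" means a higher score than the worst saved pose once the list is full, and `generate`, `shape_match`, `place` and `score_fn` are hypothetical stand-ins for the real LigandFit steps.

```python
import math
import random


def rmsd(a, b):
    """Root-mean-square deviation between two poses stored as flat coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def update_save_list(save_list, pose, score, capacity=10, min_rmsd=1.0):
    """Store (score, pose) if the pose differs from every saved pose and, once the list
    is full, is better than the worst saved pose, which it then replaces.
    Higher score = better, as with DockScore."""
    if any(rmsd(pose, p) < min_rmsd for _, p in save_list):
        return False  # not different from the saved poses
    if len(save_list) < capacity:
        save_list.append((score, pose))
        return True
    worst = min(range(len(save_list)), key=lambda i: save_list[i][0])
    if score <= save_list[worst][0]:
        return False  # not better than the saved poses
    save_list[worst] = (score, pose)  # replace the worst pose
    return True


def ligandfit_like(n_trials, generate, shape_match, place, score_fn, seed=0, **kw):
    rng = random.Random(seed)
    save_list = []
    for _ in range(n_trials):  # number of Monte Carlo trials
        conf = generate(rng)  # generate a ligand conformation
        if not shape_match(conf):  # shape match failed: try the next conformation
            continue
        pose = place(conf)  # position and orient the ligand in the site
        update_save_list(save_list, pose, score_fn(pose), **kw)
    return sorted(save_list, key=lambda entry: entry[0], reverse=True)  # rank the poses


# Toy usage with hypothetical stand-in functions (a "pose" is just three numbers).
poses = ligandfit_like(
    500,
    generate=lambda rng: [rng.uniform(-3.0, 3.0) for _ in range(3)],
    shape_match=lambda conf: sum(abs(x) for x in conf) < 6.0,
    place=lambda conf: conf,
    score_fn=lambda pose: -sum(x * x for x in pose),
)
```

The diversity check keeps the Save List from filling up with near-identical copies of one good pose.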
Prepare your protein
All hydrogens must be added
All atom valencies must be satisfied for correct atom typing
Use Tools → Protein Modeling → Clean to:
check structural disorder
fix connectivity and bond orders
add H at a specified pH
Use Preferences → Protein Utilities to set Clean tool options
Binding site identification
Before beginning docking calculations…
Where is the binding site?
Binding site characteristics
Liang et al. 1998 found small
molecule binding sites to be:
Indentations,
Crevices, or
Cavities
Often, the largest site is
the true binding site
Laskowski et al. 1996 reported
an analysis of cleft volumes:
Often the ligand is bound in
the largest cleft
Usually the largest cleft is
considerably larger than the
others
[Figure: binding sites of Abl tyrosine kinase and HSV-1 thymidine kinase]
Prepare your protein: define the active site using the Site
Search Algorithm
1. Set up a grid around the protein
2. Use a probe to test for van der Waals clashes at each grid point
[Figure legend: free point, protein point]
• The default grid resolution is 0.5 Å but can be adjusted by the user
Site Search Algorithm
3. Clean the free points with an “eraser”
[Figure legend: eraser, free point, protein point, erased point]
• The eraser removes the “free” grid points it can reach
• The eraser size can be varied
Site Search Algorithm
4. If the “eraser” is unable to enter a cavity, all grid points inside the cavity
are considered a site.
[Figure legend: site point, protein point, outside point]
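The four Site Search steps can be mimicked on a 2-D grid to show why a narrow cavity becomes a site. This sketch assumes a square eraser that is slid in from the grid border and erases every free point it can cover; free points it never reaches are flood-filled into sites and sorted by size, largest first. The grid encoding and `find_sites` are illustrative assumptions, not the Discovery Studio implementation.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def find_sites(grid, eraser=3):
    """2-D sketch of an eraser-based site search.

    grid: list of equal-length strings, '#' = protein point, '.' = free point.
    Returns the sites (sets of (row, col) site points), largest first.
    """
    rows, cols = len(grid), len(grid[0])
    free = [[ch == "." for ch in row] for row in grid]

    def fits(r, c):
        # A square eraser with its corner at (r, c) fits if it lies inside the grid
        # and covers only free points.
        if r < 0 or c < 0 or r + eraser > rows or c + eraser > cols:
            return False
        return all(free[r + i][c + j] for i in range(eraser) for j in range(eraser))

    # Start the eraser at every position on the grid border, i.e. outside the protein.
    border = [(r, c) for r in range(rows - eraser + 1) for c in range(cols - eraser + 1)
              if r in (0, rows - eraser) or c in (0, cols - eraser)]
    queue = deque(p for p in border if fits(*p))
    seen = set(queue)
    erased = set()
    while queue:  # slide the eraser wherever it fits and erase the points it covers
        r, c = queue.popleft()
        erased.update((r + i, c + j) for i in range(eraser) for j in range(eraser))
        for dr, dc in NEIGHBOURS:
            nxt = (r + dr, c + dc)
            if nxt not in seen and fits(*nxt):
                seen.add(nxt)
                queue.append(nxt)

    # Free points the eraser could not reach are site points; flood-fill them into sites.
    remaining = {(r, c) for r in range(rows) for c in range(cols)
                 if free[r][c] and (r, c) not in erased}
    sites = []
    while remaining:
        seed = remaining.pop()
        site, stack = {seed}, [seed]
        while stack:
            r, c = stack.pop()
            for dr, dc in NEIGHBOURS:
                nb = (r + dr, c + dc)
                if nb in remaining:
                    remaining.remove(nb)
                    site.add(nb)
                    stack.append(nb)
        sites.append(site)
    return sorted(sites, key=len, reverse=True)


# A toy "protein" with a pocket whose mouth is one grid point wide.
protein = [
    "..........",
    "..........",
    "..........",
    "...#.##...",
    "...#..#...",
    "...#..#...",
    "...####...",
    "..........",
    "..........",
    "..........",
]
print(find_sites(protein, eraser=3))  # the pocket plus its mouth: 5 site points
print(find_sites(protein, eraser=1))  # a 1-point eraser enters the pocket: no site
```

A larger eraser fails to enter more grooves, so more free points end up as site points; this is why the eraser size changes which sites are found.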
Site editing
A site definition can be modified:
Site Editing links in the Binding Site Tool Panel
Contraction/Expansion
Site points are objects that can be manually selected and deleted
Recommended:
manually remove “tails”
expand 2-3 times
Preferences → Binding sites → site opening
changes the eraser size (recommended size is 5)
If the ligand is smaller than the site, use the partition site
option
[Figure: aligning the ligand to the whole site or to a site partition]
Site search by protein shape
Flood-filling algorithm identifies
possible binding sites
Fast (a few seconds)
Will work on any protein shape
Not sensitive to the orientation
of the protein in the grid
Prepare your protein: Interaction Filters
If you know that certain residues/atoms promote your ligand-receptor
interaction, you can define an interaction site
Select protein atom(s) as interaction sites:
hydrogens to define a donor on the protein
heavy atoms (such as O, N) for an acceptor
carbons for hydrophobic interactions
Attributes and type can be edited
Accessed via the Edit | Attributes…
menu
Right-click a selected object
and select Attributes of
Energy grid parameter
Select a forcefield and partial charge calculation method to be used
in the evaluation of ligand pose-receptor interaction energies
during docking
Dreiding – default
PLP1 – good for many (mainly hydrophobic) interactions
CFF – more accurate than Dreiding, though time-consuming
Click the arrow symbol to reveal advanced parameters
LigandFit conformational search
Required for flexible fit of the ligand
Monte Carlo search in torsional space
Bond lengths and bond angles fixed
Multiple torsion changes simultaneously
Rings are not varied
Upper limit of random dihedral perturbation is 180°
Lower limit depends on the number of rotating atoms
[Figure: coarse search step vs. fine search step on an example ligand]
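The torsional moves described on this slide can be sketched as below: several rotatable torsions change at once, ring torsions are simply excluded from the rotatable list, and the step size switches from the coarse limit (180°) to a smaller fine limit. The actual rule linking the lower limit to the number of rotating atoms is not given here, so `step_limits` uses a hypothetical formula; all names are assumptions.

```python
import random


def perturb_torsions(torsions, rotatable, max_step, rng):
    """Change several rotatable torsions (degrees) at once by random amounts in
    [-max_step, max_step]. Bond lengths and angles are not represented, and ring
    torsions are simply left out of `rotatable`, so they never change."""
    new = list(torsions)
    n_change = rng.randint(1, len(rotatable))  # multiple torsions per move
    for idx in rng.sample(rotatable, n_change):
        delta = rng.uniform(-max_step, max_step)
        new[idx] = (new[idx] + delta + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    return new


def step_limits(n_rotating_atoms, upper=180.0):
    """Hypothetical rule: the upper limit is 180 degrees and the lower limit shrinks
    as more atoms move, since the same angle then displaces more of the ligand."""
    return upper / max(1, n_rotating_atoms), upper


rng = random.Random(0)
start = [10.0, 20.0, 30.0, 40.0]  # indices 1 and 3 play the role of ring torsions here
lower, upper = step_limits(6)
coarse = perturb_torsions(start, [0, 2], upper, rng)  # coarse search step
fine = perturb_torsions(coarse, [0, 2], lower, rng)   # fine search step
```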
Monte Carlo (MC) trials dialog
Perform Rigid Docking
A docking mode that treats the
ligand as a rigid body. The ligand
conformation is not changed during
docking
Use a fixed number of MC steps
Specify a fixed number of iterations
for the Monte Carlo conformer
generation, applied to all
input ligands
Use variable number of MC steps from table
This table allows you to adjust the number of iterations and consecutive
failures based on the number of ligand torsions
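A variable-step table can be sketched as a lookup keyed on the number of torsions, combined with a stop on consecutive failures. The table values below are placeholders, not Discovery Studio defaults, and treating "consecutive failures" as an early-stopping limit is an assumption about how the two columns interact.

```python
# Hypothetical table rows: (up to N torsions, MC trials, max consecutive failures).
# The numbers are placeholders, not Discovery Studio defaults.
MC_TABLE = [
    (5, 500, 120),
    (10, 1500, 300),
    (15, 2000, 350),
    (20, 3000, 500),
]


def mc_steps_for(n_torsions, table=MC_TABLE):
    """Look up the trial budget for a ligand; ligands beyond the table use the last row."""
    for max_torsions, trials, failures in table:
        if n_torsions <= max_torsions:
            return trials, failures
    return table[-1][1], table[-1][2]


def run_trials(n_torsions, trial, table=MC_TABLE):
    """Run MC trials until the budget is spent or too many trials fail in a row.
    `trial` is any callable returning True for a successful (e.g. saved) pose."""
    max_trials, max_failures = mc_steps_for(n_torsions, table)
    done = failures = 0
    while done < max_trials and failures < max_failures:
        done += 1
        failures = 0 if trial() else failures + 1
    return done
```

The fixed-steps option corresponds to a table with a single row that applies to every ligand.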
Docking mode
Docking or Rigid-Body Minimization only
Docking
places the ligand into the binding site
performs shape matching and refinement
Rigid-Body Minimization
position of input ligands specified by the starting coordinates
rigid-body minimization of the ligand-protein interaction energy
No attempt is made to place the ligand into the binding site, so the
input file should be "pre-docked" for meaningful results
Evaluating the ligand position
Once fit is completed…
how good is it?
Ligand position initially evaluated using DockScore
Energy-based
Grid-based
Higher scores indicate better fit
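A grid-based energy evaluation can be sketched with trilinear interpolation of a precomputed grid. The sketch assumes DockScore = -(ligand/receptor interaction energy + ligand internal energy), consistent with the LigandFit slide stating that initial scoring uses both terms and with higher scores meaning a better fit. Real grids are computed per atom type with the chosen forcefield; here a single toy grid stands in for them, and all names are hypothetical.

```python
import math


def trilinear(grid, spacing, origin, point):
    """Interpolate a precomputed energy grid (nested lists grid[i][j][k]) at `point`.
    The point must lie inside the grid, away from its last plane in each direction."""
    frac = [(point[d] - origin[d]) / spacing for d in range(3)]
    base = [math.floor(f) for f in frac]
    t = [frac[d] - base[d] for d in range(3)]
    energy = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((t[0] if di else 1 - t[0])
                          * (t[1] if dj else 1 - t[1])
                          * (t[2] if dk else 1 - t[2]))
                energy += weight * grid[base[0] + di][base[1] + dj][base[2] + dk]
    return energy


def dock_score(atom_positions, grid, spacing, origin, internal_energy):
    """Assumed form: DockScore = -(interaction energy + ligand internal energy),
    so a higher score means a better fit."""
    interaction = sum(trilinear(grid, spacing, origin, p) for p in atom_positions)
    return -(interaction + internal_energy)


# Toy 2x2x2 grid whose values grow linearly with the indices (1 A spacing).
grid = [[[i + j + k for k in range(2)] for j in range(2)] for i in range(2)]
score = dock_score([(0.5, 0.5, 0.5)], grid, 1.0, (0.0, 0.0, 0.0), internal_energy=-2.0)
```

Precomputing the grid once per receptor is what makes each pose evaluation cheap: scoring a pose costs eight lookups per ligand atom instead of a sum over all receptor atoms.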
Choice of forcefields
Dreiding – default
PLP1 – good for many (mainly hydrophobic) interactions
CFF – more accurate than Dreiding, though time-consuming
Protein-ligand interaction filters
Features may be used as a “filter” for docked poses
Does not affect how a ligand is positioned or optimized
Once a ligand is docked, its pose is examined to find how many
features are matched between the receptor and the docked pose
The number of matched features influences whether the pose will
be saved to the Save List
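Counting matched features can be sketched as below. The ligand atom roles, the compatibility table and the 3.5 Å cutoff are hypothetical; the point is only that the filter inspects a finished pose and never moves it.

```python
import math

# Which ligand atom roles can satisfy each receptor feature (hypothetical typing scheme).
COMPATIBLE = {"donor": {"acceptor"}, "acceptor": {"donor"}, "hydrophobic": {"hydrophobic"}}


def matched_features(pose_atoms, features, cutoff=3.5):
    """Count receptor features matched by a docked pose.

    pose_atoms: list of (role, (x, y, z)) for ligand atoms.
    features:   list of (feature_type, (x, y, z)) for the selected protein atoms.
    A feature is matched when a compatible ligand atom lies within `cutoff` angstroms.
    """
    return sum(
        any(role in COMPATIBLE[f_type] and math.dist(xyz, f_xyz) <= cutoff
            for role, xyz in pose_atoms)
        for f_type, f_xyz in features
    )


def passes_filter(pose_atoms, features, min_matched):
    """The filter only decides whether a pose is saved; it never changes the pose."""
    return matched_features(pose_atoms, features) >= min_matched


features = [("donor", (0.0, 0.0, 0.0)), ("hydrophobic", (10.0, 0.0, 0.0))]
pose = [("acceptor", (0.0, 0.0, 2.8)), ("hydrophobic", (20.0, 0.0, 0.0))]
```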
Scoring functions
Used for final evaluation of positions
after the DockScore is computed
Used during LigandFit Docking
Protocol
Or evaluated for a completed run
in Score Ligand Poses Protocol
Choice of Scores:
LigScore1
LigScore2
PLP1
PLP2
Jain
PMF
Ludi
Types of scoring functions
Force field based: nonbonded interaction terms as the score,
sometimes in combination with solvation terms
Empirical: multivariate regression methods to fit coefficients of
physically motivated structural functions by using a training set of
ligand-receptor complexes with measured binding affinity
Knowledge-based: statistical atom pair potentials derived from
structural databases as the score
Other: scores and/or filters based on chemical properties,
pharmacophore, contact, shape complementarity
Consensus scoring functions approach
Force field based scoring functions
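E = \sum_{i=1}^{lig} \sum_{j=1}^{rec} \left( \frac{A_{ij}}{r_{ij}^{a}} - \frac{B_{ij}}{r_{ij}^{b}} + 332\,\frac{q_i q_j}{D\,r_{ij}} \right)
where r_ij is the distance between ligand atom i and receptor atom j, A_ij and B_ij are van der Waals parameters, q_i and q_j are partial charges, D is the dielectric constant, and 332 converts the electrostatic term to kcal/mol
e.g., CHARMm in CDOCKER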
Advantages
FF terms are well studied and have some physical basis
Transferable, and fast when used on a pre-computed grid
Disadvantages
Capture only part of the relevant energy, i.e., potential energies,
sometimes supplemented by solvation or entropy terms
Electrostatics are often overestimated, leading to systematic
problems in ranking complexes
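The force-field energy E defined on this slide can be evaluated directly over all ligand-receptor atom pairs. This sketch assumes a 12-6 form (a = 12, b = 6) and a constant dielectric D = 4, and takes A and B from hypothetical lookup tables; it illustrates the formula rather than CHARMm's actual energy terms.

```python
import math


def ff_score(lig_atoms, rec_atoms, A, B, a=12, b=6, dielectric=4.0):
    """E = sum_i sum_j [A_ij / r_ij**a - B_ij / r_ij**b + 332 * q_i * q_j / (D * r_ij)].

    lig_atoms, rec_atoms: lists of (atom_type, partial_charge, (x, y, z)).
    A, B: dicts of van der Waals parameters keyed by (ligand_type, receptor_type).
    Distances in angstroms, charges in electron units, energy in kcal/mol.
    """
    energy = 0.0
    for t_i, q_i, x_i in lig_atoms:
        for t_j, q_j, x_j in rec_atoms:
            r = math.dist(x_i, x_j)
            energy += (A[t_i, t_j] / r ** a
                       - B[t_i, t_j] / r ** b
                       + 332.0 * q_i * q_j / (dielectric * r))
    return energy
```

For example, a +1/-1 charge pair 2 Å apart with A = B = 0 gives 332 × (-1) / (4 × 2) = -41.5 kcal/mol, which also shows how easily the electrostatic term dominates the total.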
Empirical scoring functions
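\Delta G = \Delta G_0 + \Delta G_{rot} N_{rot} + \Delta G_{hb} \sum_{\text{neutral H-bonds}} f(\Delta R, \Delta\alpha) + \Delta G_{io} \sum_{\text{ionic int.}} f(\Delta R, \Delta\alpha) + \Delta G_{aro} \sum_{\text{aro int.}} f(\Delta R, \Delta\alpha) + \Delta G_{lipo} \sum_{\text{lipo cont.}} f(\Delta R, \Delta\alpha)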
LUDI
PLP
LigScore
Jain
Counts the number of interactions and assigns a score based on the
number of occurrences:
H-bonds, ionic interactions (easy to quantify)
Hydrophobic interactions (more difficult to assess and quantify)
Number of rotatable bonds frozen (linked to the entropic cost of
binding, quite difficult to estimate)
Advantages: fast & direct estimation of binding affinity
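A counting score of this kind can be sketched as below, with a piecewise-linear geometric quality factor f for each interaction. The coefficients and tolerances are hypothetical placeholders, not the published LUDI values, and the lipophilic term is simplified to a count of contacts rather than a contact area.

```python
# Hypothetical coefficients (kJ/mol) for illustration only; not published LUDI values.
COEFFS = {"g0": 5.0, "rot": 1.5, "hb": -5.0, "ionic": -8.0, "aro": -1.0, "lipo": -0.2}


def penalty(deviation, tolerance, width):
    """1 within `tolerance`, falling linearly to 0 over a further `width`."""
    if deviation <= tolerance:
        return 1.0
    if deviation >= tolerance + width:
        return 0.0
    return 1.0 - (deviation - tolerance) / width


def f_geom(d_r, d_alpha, r_tol=0.2, r_width=0.4, a_tol=30.0, a_width=50.0):
    """Quality factor f(dR, d_alpha) from the deviation of an interaction's length (angstroms)
    and angle (degrees) from ideal geometry; tolerances are assumed values."""
    return penalty(d_r, r_tol, r_width) * penalty(d_alpha, a_tol, a_width)


def empirical_score(n_rot, hbonds, ionic, aro, lipo, c=COEFFS):
    """dG = dG0 + dG_rot*N_rot + sum over interaction classes of dG_class * sum(f).
    Each class argument is a list of f values, one per interaction found in the pose."""
    return (c["g0"] + c["rot"] * n_rot
            + c["hb"] * sum(hbonds) + c["ionic"] * sum(ionic)
            + c["aro"] * sum(aro) + c["lipo"] * sum(lipo))


# Two H-bonds (one ideal, one stretched), one salt bridge, ten lipophilic contacts.
dg = empirical_score(2, [f_geom(0.1, 10.0), f_geom(0.4, 10.0)], [1.0], [], [1.0] * 10)
```

Because each term is a simple count times a fitted coefficient, the score is fast, and its quality depends entirely on the training set used to fit the coefficients.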
Knowledge-based potentials of mean force scoring
functions (PMF)
Assumptions
An observed crystallographic complex represents the optimum
placement of the ligand atoms relative to the receptor atoms
Advantages
Similar to empirical, but more general (much more distance data
than binding energy data)
Disadvantages
PMFs are typically pairwise, whereas the probability of finding atoms A
and B at a distance r is not pairwise: it also depends on the
surrounding atoms
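The idea behind knowledge-based potentials, turning observed distance statistics into an energy by inverting the Boltzmann relation A(r) = -kT ln g(r), can be sketched as below. The bin counts, the reference distribution, kT = 0.596 kcal/mol (about 300 K) and the pseudo-count are assumptions; summing independent pair terms is exactly the pairwise approximation listed under disadvantages.

```python
import math

KT = 0.596  # kcal/mol at about 300 K (assumed temperature)


def pmf_from_counts(obs_counts, ref_counts, kT=KT, pseudo=1e-6):
    """Inverse-Boltzmann potential per distance bin for one atom-pair type:
    A(r) = -kT * ln(g(r)), where g(r) is the observed frequency in the bin divided by
    the frequency expected from a reference state. `pseudo` avoids log(0) for empty bins."""
    n_obs, n_ref = sum(obs_counts), sum(ref_counts)
    potential = []
    for obs, ref in zip(obs_counts, ref_counts):
        g = (obs / n_obs + pseudo) / (ref / n_ref + pseudo)
        potential.append(-kT * math.log(g))
    return potential


def pmf_score(pair_bins, potentials):
    """Pairwise sum over ligand-receptor atom pairs given as (pair_type, distance_bin)."""
    return sum(potentials[pair_type][b] for pair_type, b in pair_bins)


# Toy counts for a hypothetical C-O pair type in three distance bins.
potentials = {"C-O": pmf_from_counts([10, 30, 10], [20, 20, 10])}
score = pmf_score([("C-O", 1), ("C-O", 1)], potentials)
```

Distances seen more often than the reference predicts get a favorable (negative) value, rarer ones an unfavorable value.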
Consensus Scoring
Combination of several scoring functions
Common top rankers get a higher consensus rank than single
outliers
False positives are easier to detect than with a single scoring
function
It is advisable to use 2-4 well-suited scoring functions for the
consensus score
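One simple consensus scheme, rank-by-vote, can be sketched as below: a compound gets one vote from each scoring function that places it in its top N. This is one of several consensus strategies (others average ranks or normalized scores); the function names and toy scores are illustrative.

```python
def ranks(scores, higher_is_better=True):
    """Map each compound name to its rank (1 = best) under one scoring function."""
    order = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: i + 1 for i, name in enumerate(order)}


def consensus(score_tables, top_n):
    """Rank-by-vote: count in how many scoring functions each compound is a top-N ranker.
    score_tables: list of (dict name -> score, higher_is_better)."""
    votes = {}
    for scores, higher_is_better in score_tables:
        for name, r in ranks(scores, higher_is_better).items():
            votes[name] = votes.get(name, 0) + (1 if r <= top_n else 0)
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Three toy scoring functions; the second one is an energy, so lower is better.
tables = [
    ({"a": 10, "b": 8, "c": 1}, True),
    ({"a": -9.0, "b": -2.0, "c": -8.0}, False),
    ({"a": 5, "b": 6, "c": 1}, True),
]
result = consensus(tables, top_n=2)
```

Compound "c" is favored by only one function, so it drops to the bottom of the consensus list, which is how a single-function false positive gets exposed.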
Take home message
There is no best method!
Try different methods, force-fields, scoring functions
Treat your results as suggestions
Use the experimental data
CDOCKER
CDOCKER is a CHARMm-based docking algorithm:
1. Generate ligand conformations through high-temperature molecular dynamics
2. Random (rigid-body) rotation
3. Grid-based simulated annealing
4. Full minimization
5. Output of refined ligand poses
CDOCKER
CHARMm-based docking/refinement algorithm
Uses soft-core potentials and an optional grid representation to
dock ligands into the receptor active site
High-temperature MD generates (10) starting conformations
Each conformation undergoes random rigid-body
rotations (10)
The resulting structures are minimized (<=50)
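The CDOCKER stages can be mirrored on a toy problem to show how the counts combine: random starting points stand in for high-temperature MD conformations, random perturbations for rigid-body rotations, and a Metropolis simulated-annealing run on a hypothetical 2-D energy for the grid-based annealing; the final full minimization is omitted. Everything here, including `toy_energy`, the temperatures, the cooling schedule and reading "(<=50)" as "keep at most 50 poses", is an assumption for illustration, not CHARMm or CDOCKER code.

```python
import math
import random


def toy_energy(x, y):
    # Hypothetical double-well landscape standing in for a grid-based interaction energy.
    return (x * x - 1.0) ** 2 + y * y


def anneal(x, y, rng, t_hot=5.0, t_cold=0.01, steps=2000, step_size=0.3):
    """Simulated annealing: Metropolis moves while the temperature drops geometrically
    from t_hot to t_cold. Returns (energy, x, y) of the best state visited."""
    e = toy_energy(x, y)
    best = (e, x, y)
    cooling = (t_cold / t_hot) ** (1.0 / (steps - 1))
    t = t_hot
    for _ in range(steps):
        nx = x + rng.uniform(-step_size, step_size)
        ny = y + rng.uniform(-step_size, step_size)
        ne = toy_energy(nx, ny)
        if ne <= e or rng.random() < math.exp(-(ne - e) / t):
            x, y, e = nx, ny, ne
            if e < best[0]:
                best = (e, x, y)
        t *= cooling
    return best


def cdocker_like(n_conf=10, n_rot=10, max_poses=50, seed=0):
    rng = random.Random(seed)
    poses = []
    for _ in range(n_conf):
        # Stand-in for one high-temperature MD conformation: a random start.
        cx, cy = rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)
        for _ in range(n_rot):
            # Stand-in for a random rigid-body rotation: a random perturbation.
            x, y = cx + rng.gauss(0.0, 1.0), cy + rng.gauss(0.0, 1.0)
            poses.append(anneal(x, y, rng))
    poses.sort()  # lowest energy first
    return poses[:max_poses]


poses = cdocker_like()
```

With 10 conformations and 10 rotations each, 100 candidates are annealed and only the best 50 are kept, which is where most of CDOCKER's extra cost relative to LigandFit comes from.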
Prepare your protein: define your binding site
If you know the residues involved in the interaction with your ligand, you
can use them to define your binding site
Enlarge the site by editing the attributes of the site sphere
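A binding-site sphere built from known residues can be sketched as the centroid of their atoms plus a radius that covers them; enlarging the site then amounts to increasing the radius. `site_sphere`, the 2 Å margin and the toy coordinates are hypothetical and do not reproduce how Discovery Studio computes its site sphere.

```python
import math


def site_sphere(atom_coords, margin=2.0):
    """Hypothetical site sphere: centre = centroid of the selected residues' atoms,
    radius = distance to the farthest atom plus `margin` angstroms."""
    n = len(atom_coords)
    center = tuple(sum(atom[d] for atom in atom_coords) / n for d in range(3))
    radius = max(math.dist(center, atom) for atom in atom_coords) + margin
    return center, radius


def in_sphere(point, center, radius):
    return math.dist(point, center) <= radius


atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # toy binding-site residue atoms
center, radius = site_sphere(atoms)
bigger = site_sphere(atoms, margin=4.0)  # "enlarging" the site: a larger radius
```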
Advanced parameters
Advanced parameters for:
Forcefield
CHARMm
cff
Use Full Potential
Grid extension
Ligand partial charge method
(MMFF/CHARMm)
Final minimization
Grid-Based
Full potential
Post-docking tools
Score your poses
Consensus score
Analyze your poses