Transcript Slide 1
Docking of small molecules using Discovery Studio
Tel-Aviv University

Goal of the workshop: Provide the information required for using the Discovery Studio docking algorithms.

Outline
- A brief review of docking algorithms
- LigandFit: the workflow
- CDOCKER: the workflow
- Hands-on session:
  - Visualization tools in Discovery Studio
  - Docking with LigandFit
  - Docking with CDOCKER
  - Post-docking analysis

The molecular docking problem
Given two molecules with 3D conformations in atomic detail:
- Do the molecules bind to each other?
- What does the molecule-molecule complex look like?
- How strong is the binding affinity?

What do we dock?
The two molecules might be:
- A protein (enzyme, receptor) and a small molecule (substrate, ligand)
- A protein and a DNA molecule
- Two proteins

Why do we dock?
- Drug discovery costs are too high: ~$800 million, 8-14 years, ~10,000 compounds (DiMasi et al. 2003; Dickson & Gagnon 2004)
- Drugs interact with their receptors in a highly specific and complementary manner.
- Docking is the core of target-based, structure-based drug design (SBDD) for lead generation and optimization.
A lead is a compound that shows biological activity, is novel, and has the potential of being structurally modified for improved bioactivity and selectivity.

Three components of docking
- Representation of receptor binding site and ligand (pre- and/or during docking)
- Sampling of configuration space of the ligand-receptor complex (during docking)
- Evaluation of ligand-receptor interactions (during docking and scoring)

Basic principles
- The association of molecules is based on interactions:
  - H-bonds
  - salt bridges
  - hydrophobic contacts
  - electrostatic
  - very strong repulsive (VdW) interactions at short distances
- Ligands are flexible
- Receptors are mostly rigid

Sampling of configuration space of the ligand-receptor complex
- Descriptor-matching: using pattern-recognizing geometric methods to match ligand and receptor site descriptors (geometric, chemical, pharmacophore properties, such as distance pairs, triplets, volume, vector, hydrogen-bond, hydrophobic, charged, etc.)
- Molecular simulation: MD (molecular dynamics), MC (Monte Carlo)
- Others: GA (genetic algorithm), similarity, fragment-based
- No "best" method

Molecular simulation: MD & MC
- Two major components:
  - The description of the degrees of freedom
  - The energy evaluation
- The local movement of the atoms is performed:
  - Due to the forces present at each step in MD (Molecular Dynamics)
  - Randomly in MC (Monte Carlo)
- Usually time consuming:
  - Search from a starting orientation to a low-energy configuration
  - Several simulations with different starting orientations must be performed to get a statistically significant result

Genetic algorithm docking
- Requires the generation of an initial population, whereas conventional MC and MD require a single starting structure in their standard implementation.
- The collection of genes (chromosome) is assigned a fitness based on a scoring function.
- There are three genetic operators: the mutation operator randomly changes the value of a gene; crossover exchanges a set of genes from one parent chromosome to another; migration moves individual genes from one sub-population to another.

Docking programs
- DOCK (UCSF)
- AutoDock (Scripps)
- Glide (Schrödinger)
- ICM (Molsoft)
- FRED (OpenEye)
- Gold
- FlexX, etc.

Evaluation of docking programs
- Evaluation of library ranking efficacy in virtual screening. J Comput Chem. 2005 Jan 15;26(1):11-22.
- Evaluation of docking performance: comparative data on docking algorithms. J Med Chem. 2004 Jan 29;47(3):558-65.
- Impact of scoring functions on enrichment in docking-based virtual screening: an application study on renin inhibitors. J Chem Inf Comput Sci. 2004 May-Jun;44(3):1123-9.
- And more...

LigandFit | CDOCKER
- Methodology: Shape-based docking | CHARMm-based docking/refinement
- Usage: Screening of medium-size libraries in well defined binding cavities | Screening of small libraries & refinement of docking poses
- Speed: Medium | Medium-Slow
- Associated Tools and Utilities: Site definition by ligand or receptor; pose interaction filters | Binding site sphere definition; forcefield typing

LigandFit
- Active-site finding: automatic active site location using a flood-filling algorithm
- Flexible docking of ligands: searches the ligand conformational space to find the best fit into the protein active site; 1,000 conformations per sec
- Fast ligand scoring: initial scoring based on both the internal energy of the ligand and the interaction energy between ligand and protein; with DS LigandScore, a variety of scoring functions are available for final analysis

LigandFit workflow
- Define binding site / site partition
- No. of Monte Carlo trials
- Generate ligand conformation
- Ligand/site shape match (Fail / Pass)
- Position and orient ligand to site
- Is it better than saved poses? Is it different from saved poses? (No / Yes)
- Rank the poses
- Apply scoring function(s)
- Save pose in Save List (Yes: replace the worst pose)

Prepare your protein
- All hydrogens must be added
- All atom valencies must be satisfied for correct atom typing
- Use Tools → Protein Modeling → Clean to:
  - check structural disorder
  - fix connectivity and bond orders
  - add H at a specified pH
- Use Preferences → Protein Utilities to set Clean tool options

Binding site identification
Before beginning docking calculations... Where is the binding site?

Binding site characteristics
- Liang et al. 1998 found small molecule binding sites to be indentations, crevices, or cavities, and often the largest site is the true binding site
- Laskowski et al. 1996 reported an analysis of cleft volumes:
  - Often the ligand is bound in the largest cleft
  - Usually the largest cleft is considerably larger than the others
[Figure: Abl tyrosine kinase; HSV-1 thymidine kinase]

Prepare your protein: define the active site using Site Search

Site Search Algorithm
1. Set up a grid around the protein
2. Use a probe to test for van der Waals clashes at each grid point
   - Default resolution is 0.5 Å but can be adjusted by the user
   [Figure legend: free point, protein point]
3. Clean free points with an "eraser"
   - Clean "free" grid points
   - Eraser size can be varied
   [Figure legend: eraser, free point, protein point, erased point]
4. If the "eraser" is unable to enter a cavity, all grid points inside the cavity are considered a site.
   [Figure legend: site point, protein point, outside point]

Site editing
- A site definition can be modified
- Site Editing links in Binding Site Tool Panel
- Contraction/Expansion
- Site points are objects: they can be manually selected and deleted
- Recommended: manually remove "tails", expand 2-3 times
- Preferences → Binding sites → site opening.
  Changes the eraser size (recommended size is 5)
- If the ligand is smaller than the site, use the partition site option
[Figure: site alignment and ligand site partition]

Site search by protein shape
- Flood-filling algorithm identifies possible binding sites
- Fast (a few seconds)
- Will work on any protein shape
- Not sensitive to the orientation of the protein in the grid

Prepare your protein: Interaction Filters
- If you know that certain residue atoms promote your ligand-receptor interaction, you can define an interaction site
- Select protein atom(s) as interaction sites:
  - Hydrogens for defining a donor on the protein
  - Heavy atoms (such as O, N) for an acceptor
  - Carbons for hydrophobic
- Attributes and type can be edited:
  - Accessed by the Edit | Attributes... menu
  - Or right-click a selected object and select "Attributes of"

Energy grid parameter
- Select a forcefield and partial charge calculation method to be used in the evaluation of ligand pose-receptor interaction energies during docking:
  - Dreiding: default
  - PLP1: good for many (mainly hydrophobic) interactions
  - CFF: more accurate than Dreiding, though time-consuming
- Click on the arrow symbol to reveal advanced parameters

LigandFit conformational search
- Required for flexible fit of the ligand
- Monte Carlo search in torsional space
- Bond lengths and bond angles fixed
- Multiple torsion changes simultaneously
- Rings are not varied
- Upper limit of random dihedral perturbation is 180°
- Lower limit depends on the number of rotating atoms
[Figure: coarse search step; fine search step]

Monte Carlo (MC) trials dialog
- Perform Rigid Docking: a docking mode that treats the ligand as a rigid body.
  The ligand conformation is not changed during docking
- Use a fixed number of MC steps: specify a fixed number of iterations for the Monte Carlo conformer generation, which is employed for all input ligands
- Use variable number of MC steps from table: this table allows you to adjust the number of iterations and consecutive failures based on the number of ligand torsions

Docking mode
Docking or Rigid-Body Minimization only
- Docking:
  - places the ligand into the binding site
  - shape matching and refinements are done
- Rigid-Body Minimization:
  - position of input ligands specified by the starting coordinates
  - rigid-body minimization of the ligand-protein interaction energy
  - no attempt is made to place the ligand into the binding site, so the input file should be "pre-docked" for meaningful results

Evaluating the ligand position
Once fit is completed... how good is it?
- Ligand position initially evaluated using DockScore:
  - Energy-based
  - Grid-based
  - Higher scores indicate better fit
- Choice of forcefields:
  - Dreiding: default
  - PLP1: good for many (mainly hydrophobic) interactions
  - CFF: more accurate than Dreiding, though time-consuming

Protein-ligand interaction filters
- Features may be used as a "filter" for docked poses
- Does not affect how a ligand is positioned or optimized
- Once a ligand is docked, its pose is examined to find how many features are matched between the receptor and the docked pose
- The number of matched features influences whether the pose will be saved to the Save List

Scoring functions
- Used for final evaluation of positions after the DockScore is computed
- Used during the LigandFit Docking Protocol
- Or evaluated for a completed run in the Score Ligand Poses Protocol
- Choice of scores: LigScore1, LigScore2, PLP1, PLP2, Jain, PMF, Ludi

Types of scoring functions
- Force field based: nonbonded interaction terms as the score, sometimes in combination with solvation terms
- Empirical: multivariate regression methods to fit coefficients of physically motivated structural functions by using a training set of
  ligand-receptor complexes with measured binding affinity
- Knowledge-based: statistical atom pair potentials derived from structural databases as the score
- Other: scores and/or filters based on chemical properties, pharmacophore, contact, shape complementarity
- Consensus scoring functions approach

Force field based scoring functions
$$E = \sum_{i=1}^{\mathrm{lig}} \sum_{j=1}^{\mathrm{rec}} \left( \frac{A_{ij}}{r_{ij}^{a}} - \frac{B_{ij}}{r_{ij}^{b}} + 332\,\frac{q_i q_j}{D\,r_{ij}} \right)$$
e.g. CHARMm in CDOCKER
- Advantages:
  - FF terms are well studied and have some physical basis
  - Transferable, and fast when used on a pre-computed grid
- Disadvantages:
  - Only part of the relevant energies is covered, i.e., potential energies, sometimes enhanced by solvation or entropy terms
  - Electrostatics are often overestimated, leading to systematic problems in ranking complexes

Empirical scoring functions
$$\Delta G = \Delta G_0 + \Delta G_{rot} N_{rot} + \Delta G_{HB} \sum_{\text{neutral H-bonds}} f(\Delta R, \Delta\alpha) + \Delta G_{io} \sum_{\text{ionic int.}} f(\Delta R, \Delta\alpha) + \Delta G_{aro} \sum_{\text{aro. int.}} f(\Delta R, \Delta\alpha) + \Delta G_{lipo} \sum_{\text{lipo. cont.}} f(\Delta R, \Delta\alpha)$$
e.g. LUDI, PLP, LigScore, Jain
- Count the number of interactions and assign a score based on the number of occurrences:
  - H-bonds, ionic interactions (easy to quantify)
  - Hydrophobic interactions (more difficult to assess and quantify)
  - Number of rotatable bonds frozen (linked to the entropic cost of binding, quite difficult to estimate)
- Advantages: fast and direct estimation of binding affinity

Knowledge-based potentials of mean force scoring functions (PMF)
- Assumptions: an observed crystallographic complex represents the optimum placement of the ligand atoms relative to the receptor atoms
- Advantages: similar to empirical, but more general (much more distance data than binding energy data)
- Disadvantages: PMF are typically pair-wise, while the probability of finding atoms A and B at a distance r is non-pairwise and also depends on surrounding atoms

Consensus Scoring
- Combination of several scoring functions
- The common top rankers get a higher consensus rank than single outliers
- False positives can be detected more easily than with a single scoring function
- Advisable to use 2-4 well-suited scoring functions for the consensus score

Take home message
- There is no best method!
- Try different methods, force-fields, scoring functions
- Refer to your results as a suggestion
- Use the experimental data

CDOCKER
CDOCKER is a CHARMm-based docking algorithm:
1. Generate ligand conformations through high temperature molecular dynamics
2. Random (rigid-body) rotation
3. Grid-based simulated annealing
4. Full minimization
5. Output of refined ligand poses

CDOCKER
- CHARMm-based docking/refinement algorithm
- Uses soft-core potentials and an optional grid representation to dock ligands into the receptor active site
- High temperature MD to generate (10) starting conformations
- Take each conformation and perform random rigid body rotations (10)
- Minimise resulting structures (<=50)

Prepare your protein
- All hydrogens must be added
- All atom valencies must be satisfied for correct atom typing
- Use Tools → Protein Modeling → Clean to:
  - check structural disorder
  - fix connectivity and bond orders
  - add H at a specified pH
- Use Preferences → Protein Utilities to set Clean tool options

Prepare your protein: define your binding site
- If you know the residues involved in the interaction with your ligand, you can define your binding site
- Enlarge your site using the attributes of the site-sphere

Advanced parameters
Advanced parameters for:
- Forcefield: CHARMm, cff
- Use Full Potential
- Grid extension
- Ligand partial charge method (MMFF/CHARMm)
- Final minimization: Grid-Based, Full potential

Post-docking tools
- Score your poses
- Consensus score
- Analyze your poses
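
The consensus score step can be illustrated with a small vote-counting sketch. This is not the Discovery Studio implementation: the function name consensus_votes, the top_fraction parameter, and all score values are made up for illustration, and every score is assumed to be oriented so that a higher value means a better pose. Each pose gets one vote for every scoring function that ranks it in that function's top fraction.

```python
import math


def consensus_votes(scores, top_fraction=0.2):
    """Count, per pose, how many scoring functions rank it in their top fraction.

    scores: dict mapping a scoring-function name to a list of scores,
            one per pose, where a higher score is assumed to be better.
    """
    n_poses = len(next(iter(scores.values())))
    n_top = max(1, math.ceil(top_fraction * n_poses))
    votes = [0] * n_poses
    for name, values in scores.items():
        if len(values) != n_poses:
            raise ValueError(f"{name} has {len(values)} scores, expected {n_poses}")
        # Pose indices sorted from best to worst for this scoring function.
        ranked = sorted(range(n_poses), key=lambda i: values[i], reverse=True)
        for i in ranked[:n_top]:
            votes[i] += 1
    return votes


# Hypothetical scores for five poses (values invented for the example).
example_scores = {
    "LigScore1": [4.1, 3.2, 5.0, 1.0, 2.2],
    "PLP1": [80.0, 95.0, 90.0, 40.0, 60.0],
    "PMF": [120.0, 60.0, 150.0, 155.0, 70.0],
}

votes = consensus_votes(example_scores, top_fraction=0.4)
# Pose indices ordered by number of votes, most votes first.
best_first = sorted(range(len(votes)), key=lambda i: votes[i], reverse=True)
print(votes, best_first)
```

In this toy data the pose at index 2 is a top ranker for all three functions, while the pose at index 3 scores best in only one of them, so it collects a single vote and does not dominate the consensus ranking.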