Transcript Slide 1
Chemical Data and Computer-Aided Drug Discovery
Mike Gilson, School of Pharmacy, [email protected], 2-0622
Outline
Overview of drug discovery
Structure-based computational methods: when we know the structure of the targeted protein
Ligand-based computational methods: when we don’t know the protein’s structure
What is a drug?
Small-Molecule Drugs: Aspirin, Sildenafil (Viagra), Darunavir, Taxol, Glipizide (Glucotrol), Digoxin
Nanoparticles (e.g., packaged small-molecule drugs): Doxil (liposome package; extended circulation time, milder toxicity) http://www.doxil.com/about_doxil.html
Abraxane (albumin-packaged taxol) http://www.abraxane.com/professional/nab-technology.aspx
Biopharmaceuticals: Erythropoietin (EPO), a stabilized variant of a natural protein hormone; Etanercept (Enbrel), a protein combining the TNF receptor with an antibody (Ab) Fc domain, which scavenges TNF and diminishes inflammation. http://www.ganfyd.org/index.php?title=Erythropoietin_beta http://en.wikipedia.org/wiki/File:Enbrel.jpg
How are drugs discovered?
Natural Products: Aspirin (willow), Digoxin (foxglove), Taxol (Pacific yew)
How Aspirin Works [Figure: inflammation; platelet activation; platelet inactivation]
Biomolecular Pathways and Target Selection [Figure: e.g., signaling pathways, with a target protein] http://www.isys.uni-stuttgart.de/forschung/sysbio/insulin/index.html
Empirical Path to Ligand Discovery: compound library (commercial, in-house, synthetic, natural) → high-throughput screening (HTS) → hit confirmation → lead compounds (e.g., µM $K_d$) → lead optimization (medicinal chemistry) → potent drug candidates (nM $K_d$) → animal and clinical evaluation
Compound Libraries: commercial (also in-house pharma), government (NIH), academia
Computer-Aided Ligand Design: aims to reduce the number of compounds synthesized and assayed, giving lower costs, less chemical waste, and faster progress.
Scenario 1
Structure of Targeted Protein Known: Structure-Based Drug Discovery [Figure: HIV protease/KNI-272 complex]
Protein-Ligand Docking (Structure-Based Ligand Design): docking software searches for the structure of lowest energy, using a potential function that gives energy as a function of structure.
[Figure: potential-function terms: van der Waals (VDW), screened Coulombic (+/-), dihedral]
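A minimal Python sketch of the three kinds of terms named in the figure follows. The functional forms (a 12-6 Lennard-Jones van der Waals term, Coulomb's law with a constant dielectric as a crude stand-in for screening, and a cosine dihedral term), the default dielectric of 4, and all function names are assumptions for illustration, not the potential of any particular docking program.

```python
import math

COULOMB_KCAL = 332.0636  # kcal*Angstrom/(mol*e^2): converts q1*q2/r to kcal/mol


def van_der_waals(r: float, epsilon: float, r_min: float) -> float:
    """12-6 Lennard-Jones term (assumed form); its minimum, -epsilon, lies at r = r_min."""
    x = (r_min / r) ** 6
    return epsilon * (x * x - 2.0 * x)


def screened_coulomb(r: float, q1: float, q2: float, dielectric: float = 4.0) -> float:
    """Coulomb energy damped by a constant dielectric (an assumed, crude screening model)."""
    return COULOMB_KCAL * q1 * q2 / (dielectric * r)


def dihedral(phi: float, k: float, n: int, delta: float) -> float:
    """Cosine torsion energy for dihedral angle phi (radians)."""
    return k * (1.0 + math.cos(n * phi - delta))


def pair_energy(r: float, epsilon: float, r_min: float, q1: float, q2: float) -> float:
    """Nonbonded energy of one protein-ligand atom pair at distance r (Angstroms)."""
    return van_der_waals(r, epsilon, r_min) + screened_coulomb(r, q1, q2)
```

A docking search would sum such terms over all protein-ligand atom pairs and ligand torsions, then look for the pose with the lowest total.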
Energy Determines Probability (Stability)
Boltzmann distribution: $p(x) \propto e^{-E(x)/RT}$, where $x$ is the structure
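To make the Boltzmann relation concrete, the sketch below converts a list of state energies into normalized probabilities. It assumes energies in kcal/mol; the function name `boltzmann_probabilities` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(energies, temperature=298.15):
    """Normalized populations p_i = exp(-E_i/RT) / sum_j exp(-E_j/RT).

    Energies are in kcal/mol. Subtracting the minimum energy before
    exponentiating avoids overflow and cancels out on normalization.
    """
    rt = R_KCAL * temperature
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]
```

At 298 K, $RT \approx 0.59$ kcal/mol, so a structure $RT \ln 10 \approx 1.36$ kcal/mol higher in energy than another is about ten times less probable.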
Structure-Based Virtual Screening: compound database + 3D structure of target (crystallography, NMR, modeling) → virtual screening (e.g., computational docking) → candidate ligands → experimental assay → ligands → ligand optimization (med chem, crystallography, modeling) → drug candidates
Structure-Based Fragment Screening: “fragment” library + 3D structure of target (crystallography, NMR, modeling) → fragment docking → compound design → experimental assay and ligand optimization (med chem, crystallography, modeling) → drug candidates http://www.beilstein-institut.de/bozen2002/proceedings/Jhoti/jhoti.html
Potential Functions for Structure-Based Design: energy as a function of structure, either physics-based or knowledge-based
Physics-Based Potentials: energy terms from physical theory, including van der Waals interactions (shape fitting), bonded interactions (shape and flexibility), Coulombic interactions (charge-charge complementarity), and hydrogen bonding
Common Simplifications Used in Physics-Based Docking: quantum effects approximated classically; protein often held rigid; configurational entropy neglected; influence of water treated crudely
Proteins and Ligands Are Flexible
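Binding Energy and Entropy
Protein + Ligand ⇌ Complex, $\Delta G^\circ$
[Figure: unbound states and bound states]
$K = \frac{2\, e^{-E_{\mathrm{bound}}/RT}}{6\, e^{-E_{\mathrm{free}}/RT}}$
$\Delta G^\circ = -RT \ln K = (E_{\mathrm{bound}} - E_{\mathrm{free}}) - RT \ln \frac{2}{6}$
Energy part: $E_{\mathrm{bound}} - E_{\mathrm{free}}$; entropy part: $-RT \ln \frac{2}{6}$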
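The same bookkeeping can be checked numerically. This sketch assumes discrete states with energies in kcal/mol and uses hypothetical values $E_{\mathrm{bound}} = -10$ and $E_{\mathrm{free}} = 0$ for a case with 2 bound states and 6 unbound states; it computes $\Delta G^\circ = -RT \ln K$ from Boltzmann sums and compares the result with the energy part plus the entropy part.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def log_partition(energies, rt):
    """ln sum_i exp(-E_i/RT), computed stably (log-sum-exp)."""
    e_min = min(energies)
    return -e_min / rt + math.log(sum(math.exp(-(e - e_min) / rt) for e in energies))


def binding_free_energy(bound_energies, free_energies, temperature=298.15):
    """Delta G = -RT ln K, with K = Z_bound / Z_free summed over discrete states."""
    rt = R_KCAL * temperature
    ln_k = log_partition(bound_energies, rt) - log_partition(free_energies, rt)
    return -rt * ln_k


# 2 bound states at E_bound and 6 unbound states at E_free (hypothetical energies).
rt = R_KCAL * 298.15
e_bound, e_free = -10.0, 0.0
dg = binding_free_energy([e_bound] * 2, [e_free] * 6)
energy_part = e_bound - e_free
entropy_part = -rt * math.log(2 / 6)
print(f"dG = {dg:.3f} = {energy_part:.3f} (energy) + {entropy_part:.3f} (entropy)")
```

Because there are fewer bound states than unbound ones, the entropy part comes out positive and opposes binding.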
Structure-Based Discovery: Physics-Oriented Approaches
Weaknesses: fully physical detail becomes computationally intractable; approximations are unavoidable; parameterization is still required.
Strengths: interpretable, providing guides to design; broadly applicable, at least in principle; clear pathways to improving accuracy.
Status: useful, but far from perfect. Multiple groups are working on fewer, better approximations (force fields and quantum effects, flexibility and entropy, water effects), and hardware keeps improving (Moore’s law).
Knowledge-Based Docking Potentials
[Figure: probability and energy versus distance, e.g., for a ligand carboxylate and for aromatic stacking]
Boltzmann: $p(r) \propto e^{-E(r)/RT}$; Inverse Boltzmann: $E(r) = -RT \ln p(r)$
Example: ligand carboxylate O to protein histidine N
1. Find all protein-ligand structures in the PDB with a ligand carboxylate O.
2. For each structure, histogram the distances from O to every histidine N.
3. Sum the histograms over all structures to obtain $p(r_{ON})$.
4. Compute $E(r_{ON})$ from $p(r_{ON})$.
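The four steps can be sketched in Python as below. Assumptions: the O-N distances have already been collected and pooled (steps 1-3); the bin width, cutoff, and $RT = 0.593$ kcal/mol (about 298 K) are illustrative choices; and, following the slide, $E(r) = -RT \ln p(r)$ is taken directly, whereas published PMF potentials such as Muegge and Martin's also normalize by a reference distribution.

```python
import math
from collections import Counter


def inverse_boltzmann(distances, bin_width=0.5, r_max=8.0, rt=0.593):
    """Knowledge-based pair potential E(r) = -RT ln p(r) from observed distances.

    distances: O...N distances (Angstroms) pooled over all structures (steps 1-3).
    Returns {bin_center: energy in kcal/mol}. Bins with no observations are
    omitted because ln(0) is undefined.
    """
    counts = Counter(int(d // bin_width) for d in distances if 0 <= d < r_max)
    total = sum(counts.values())
    potential = {}
    for b in sorted(counts):
        p = counts[b] / total  # step 4: probability of this distance bin
        potential[(b + 0.5) * bin_width] = -rt * math.log(p)
    return potential
```

Frequently observed distances get low (favorable) energies, and rarely observed ones get high energies.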
Knowledge-Based Docking Potentials: “PMF”, Muegge & Martin, J. Med. Chem. 42:791, 1999
[Figure: energy $E$ versus atom-atom distance $r_{ij}$ (Angstroms) for a few of the several hundred atom-pair types (nitrogen+/oxygen, aromatic carbons, aliphatic carbons); also labeled: $E_{vdw}$, $E(r_{ij})$]
Structure-Based Discovery: Knowledge-Based Potentials
Weaknesses: accuracy is limited by the availability of data, and may also be limited by the overall approach.
Strengths: relatively easy to implement; computationally fast.
Status: useful, but far from perfect; may be at the point of diminishing returns.
Limitations of Knowledge-Based Potentials
1. Statistical limitations (e.g., restriction to pairwise potentials): a histogram of O-N distances ($r_{ON}$) needs 10 bins ($r_1$, $r_2$ … $r_{10}$), but a joint histogram of O-N and O-C distances ($r_{ON}$, $r_{OC}$) already needs 100 bins, so each added distance multiplies the data required by 10.
2. Even if we had infinite statistics, would the results be accurate? (Is inverse Boltzmann quite right? Where is entropy?)
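The arithmetic behind point 1 can be sketched as below. The 10,000-observation figure is hypothetical and serves only to show how quickly the average count per bin shrinks as distances are added to the histogram.

```python
def histogram_bins(bins_per_distance: int, n_distances: int) -> int:
    """Bins needed for a joint histogram over n_distances distances."""
    return bins_per_distance ** n_distances


def mean_counts_per_bin(n_observations: int, bins_per_distance: int, n_distances: int) -> float:
    """Average observations per bin if the data were spread evenly over all bins."""
    return n_observations / histogram_bins(bins_per_distance, n_distances)


# Hypothetical data set of 10,000 observations, 10 bins per distance.
for n in (1, 2, 3, 4):
    print(n, histogram_bins(10, n), mean_counts_per_bin(10_000, 10, n))
```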
Scenario 2
Structure of Targeted Protein Unknown: Ligand-Based Drug Discovery
e.g., MAP kinase inhibitors: using knowledge of existing inhibitors to discover more
Why Look for Another Ligand if You Already Have Some?
Experimental screening generated some ligands, but they don’t bind tightly.
A company wants to work around another company’s chemical patents.
A high-affinity ligand is toxic, is not well absorbed, etc.
Ligand-Based Virtual Screening: known ligands + compound library → molecular similarity, machine learning, etc. → candidate ligands → assay → actives → optimization (med chem, crystallography, modeling) → potent drug candidates
Sources of Data on Known Ligands: journals, e.g., J. Med. Chem.
Some Binding and Chemical Activity Databases:
PubChem (NIH): pubchem.ncbi.nlm.nih.gov
ChEMBL (EMBL): www.ebi.ac.uk/chembl
BindingDB (UCSD): www.bindingdb.org
BindingDB www.bindingdb.org
Finding Protein-Ligand Data in BindingDB: e.g., by name of the protein “target”; e.g., by ligand (draw, then search)
Sample Query Results: BindingDB to PDB; PDB to BindingDB
Sample Query Results: download data in machine-readable format
Machine-Readable Chemical Format: Structure-Data File (SDF)
The SDF format defines chemical bonds; the PDB format lacks chemical bonding information.
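As a minimal sketch of why SDF is convenient, the parser below reads atoms and the bond block (including bond orders) from a V2000 record. It assumes the V2000 layout (three header lines, a counts line, then atom and bond blocks) and ignores property lines and data fields; the acetaldehyde record is hand-written for illustration. For real work, a toolkit such as Open Babel or RDKit is the safer choice.

```python
# Hand-written example record (hypothetical coordinates) in V2000 SDF layout.
SAMPLE_SDF = """acetaldehyde
  hypothetical example record

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1000    1.0400    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
M  END
$$$$
"""


def parse_sdf(text):
    """Return (name, atoms, bonds) for each record of a V2000 SDF string.

    atoms: list of (element, x, y, z); bonds: list of (atom1, atom2, order),
    with 1-based atom indices as in the file.
    """
    records = []
    for block in text.split("$$$$"):
        if block.startswith("\n"):
            block = block[1:]  # drop the newline that follows each $$$$
        if not block.strip():
            continue  # nothing after the final $$$$
        lines = block.splitlines()
        # Counts line: fixed-width fields, atoms in columns 1-3, bonds in 4-6.
        n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
        atoms = []
        for line in lines[4:4 + n_atoms]:
            x, y, z, element = line.split()[:4]
            atoms.append((element, float(x), float(y), float(z)))
        bonds = []
        for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
            bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
        records.append((lines[0].strip(), atoms, bonds))
    return records


print(parse_sdf(SAMPLE_SDF)[0][2])  # bond list, including the C=O double bond
```

The bond block is exactly the information a PDB ATOM/HETATM listing does not carry, which is why SDF is preferred for ligands.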
There Are Many Other Chemical File Formats: interconvert them with Babel
Chemical Similarity
Ligand-Based Drug Discovery: compare compounds (available/synthesizable) with known ligands; if similar, test experimentally; if not, don’t bother.
Chemical Fingerprints
Binary Structure Keys …
[Figure: fingerprints of Molecule 1 and Molecule 2, with their intersection and union]
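Chemical Similarity from Fingerprints
Tanimoto similarity or Jaccard index: $T = N_I / N_U$, where $N_I$ is the number of bits set in both fingerprints (intersection) and $N_U$ the number set in either (union).
Example (Molecule 1 vs. Molecule 2): $N_I = 2$, $N_U = 8$, so $T = 2/8 = 0.25$.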
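A minimal sketch of the Tanimoto calculation, storing each fingerprint as a Python integer whose set bits mark the features present. The 8-bit fingerprints are hypothetical, chosen to reproduce the slide’s $N_I = 2$, $N_U = 8$.

```python
def tanimoto(fp1: int, fp2: int) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints stored as integer bit sets."""
    n_union = bin(fp1 | fp2).count("1")
    if n_union == 0:
        return 0.0  # convention assumed here for two all-zero fingerprints
    n_intersection = bin(fp1 & fp2).count("1")
    return n_intersection / n_union


molecule_1 = 0b00011111  # hypothetical 8-bit fingerprints
molecule_2 = 0b11111000
print(tanimoto(molecule_1, molecule_2))  # 2 shared bits / 8 bits set in either -> 0.25
```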
Hashed Chemical Fingerprints
Based upon paths in the chemical graph, e.g.:
1-atom paths: C, F, N, H, S, O
2-atom paths: F-C, C-C, C-N, N-H, C-S, S-O, C-H
3-atom paths: F-C-C, C-C-N, C-N-H, C-S-O
etc.
Each path sets a pseudo-random bit pattern in a very long molecular fingerprint.
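The path-hashing idea can be sketched as follows. This is an illustration under stated assumptions, not a production fingerprint: paths are labeled by element only (real implementations also encode bond types), SHA-256 serves as the pseudo-random bit generator, and the 1024-bit length, 2 bits per path, and example molecule are hypothetical choices.

```python
import hashlib


def linear_paths(symbols, bonds, max_atoms=3):
    """All linear (non-repeating) paths of 1..max_atoms atoms, as element strings.

    A path and its reverse describe the same substructure, so each path is
    stored as the lexicographically smaller of its two spellings.
    """
    neighbors = {i: set() for i in range(len(symbols))}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    paths = set()

    def extend(path):
        forward = "-".join(symbols[i] for i in path)
        backward = "-".join(symbols[i] for i in reversed(path))
        paths.add(min(forward, backward))
        if len(path) < max_atoms:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(symbols)):
        extend([start])
    return paths


def hashed_fingerprint(paths, n_bits=1024, bits_per_path=2):
    """Each path sets bits_per_path pseudo-random bits (at most 8 with SHA-256)."""
    fp = 0
    for path in paths:
        digest = hashlib.sha256(path.encode()).digest()
        for k in range(bits_per_path):
            index = int.from_bytes(digest[4 * k:4 * k + 4], "big") % n_bits
            fp |= 1 << index
    return fp


# Hypothetical molecule F-C-C-O (heavy atoms of 2-fluoroethanol), atoms indexed 0..3.
paths = linear_paths(["F", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
print(sorted(paths))
print(bin(hashed_fingerprint(paths)).count("1"), "bits set")
```

Different paths can occasionally set the same bit (a hash collision), which is the price paid for a fixed-length fingerprint.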
Maximum Common Substructure
[Figure: two molecules sharing a maximum common substructure; $N_{\mathrm{common}} = 34$]
Potential Drawbacks of Plain Chemical Similarity: it may miss good ligands by being overly conservative, and it may put too much weight on irrelevant details.
Scaffold Hopping: e.g., identification of synthetic statins by scaffold hopping (Zhao, Drug Discovery Today 12:149, 2007)
Abstraction and Identification of Relevant Compound Features: ligand shape, pharmacophore models, chemical descriptors, statistics and machine learning
Pharmacophore Models: Φάρμακο (drug) + Φορά (carry)
A 3-point pharmacophore [Figure: positive charge (+), bulky hydrophobe, and aromatic group; one labeled distance of 3.2 ± 0.4 Å]
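A pharmacophore query can be sketched as a set of pairwise distance constraints between feature types. Only the 3.2 ± 0.4 Å distance is legible on the slide; assigning it to the cation-aromatic pair is an assumption, and the other two constraints and all coordinates below are hypothetical. A real search would also generate many conformers per ligand and detect features automatically.

```python
import math
from itertools import product


def matches_pharmacophore(features, constraints):
    """True if some choice of one point per feature type satisfies every distance.

    features: {feature_type: [(x, y, z), ...]} for one ligand conformation.
    constraints: {(type_a, type_b): (distance, tolerance)} in Angstroms.
    """
    types = sorted({t for pair in constraints for t in pair})
    if any(t not in features for t in types):
        return False
    # Try every assignment of one feature point per type.
    for choice in product(*(features[t] for t in types)):
        position = dict(zip(types, choice))
        if all(abs(math.dist(position[a], position[b]) - d) <= tol
               for (a, b), (d, tol) in constraints.items()):
            return True
    return False


constraints = {
    ("cation", "aromatic"): (3.2, 0.4),    # slide value; pair assignment assumed
    ("cation", "hydrophobe"): (4.0, 0.5),  # hypothetical
    ("aromatic", "hydrophobe"): (5.0, 0.5),  # hypothetical
}
ligand = {  # hypothetical feature coordinates (Angstroms)
    "cation": [(0.0, 0.0, 0.0)],
    "aromatic": [(3.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    "hydrophobe": [(0.0, 4.0, 0.0)],
}
print(matches_pharmacophore(ligand, constraints))
```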
Molecular Descriptors
More abstract than chemical fingerprints.
Physical descriptors: molecular weight, charge, dipole moment, number of H-bond donors/acceptors, number of rotatable bonds, hydrophobicity (log P and clogP)
Topological descriptors: branching index, measures of linearity vs. interconnectedness
Etc.
[Figure: rotatable bonds]
A High-Dimensional “Chemical Space”: each compound is a point in an n-dimensional descriptor space, and compounds with similar properties lie near each other.
[Figure: points representing compounds, plotted against descriptor axes (e.g., Descriptor 2)]
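A minimal sketch of “nearness” in chemical space: standardize each descriptor, then use Euclidean distance. The three compounds and their descriptor values (molecular weight, H-bond donors, rotatable bonds) are hypothetical, and Euclidean distance on z-scores is just one common choice of metric.

```python
import math
from statistics import mean, pstdev


def standardize(vectors):
    """Z-score each descriptor column so no single descriptor dominates distances."""
    columns = list(zip(*vectors))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(vec, means, sds)] for vec in vectors]


def nearest_neighbor(query, compounds):
    """Name of the compound closest to `query` in standardized descriptor space."""
    names = list(compounds)
    z = dict(zip(names, standardize([compounds[n] for n in names])))
    return min((n for n in names if n != query), key=lambda n: math.dist(z[query], z[n]))


# Hypothetical descriptor vectors: [molecular weight, H-bond donors, rotatable bonds]
compounds = {"A": [300.0, 2, 4], "B": [310.0, 2, 5], "C": [500.0, 6, 12]}
print(nearest_neighbor("A", compounds))
```

Standardizing matters because raw molecular weights (hundreds) would otherwise swamp small counts such as H-bond donors.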
Statistics and Machine Learning: some examples include partial least squares, support vector machines, and genetic algorithms for descriptor selection.
Summary
Overview of drug discovery
Computer-aided methods: structure-based; ligand-based
Interaction potentials: physics-based; knowledge-based (data-driven)
Ligand-protein databases, machine-readable chemical formats
Ligand similarity and beyond
Activities and Discussion Topics
BindingDB: Advil (machine-readable format, binding activities)
PDB/BindingDB: 2ONY at PDB; BindingDB substructure search, similarity search, related data
Combined computational approaches: (physics + knowledge)-based docking potentials; (ligand + structure)-based computational discovery
Other data-driven methods where it may be hard to get enough statistics
Validation of computational methods
Protein-ligand databases: getting data and assessing data quality
Drug Discovery Pipeline (One Model): target identification → target validation → assay development → lead compound (ligand) discovery → lead optimization → animal pharmacokinetics and toxicity → Phase I clinical (safety, metabolism, PK) → Phase II clinical (efficacy) → Phase III clinical (comparison with existing therapy)
Updated Knowledge-Based PMF Potential (Muegge, J. Med. Chem. 49:5895, 2006)