Transcript Document
Presentation of a Structurally Diverse and Commercially Available Drug Data Set for Correlation and Benchmarking Studies

Anders Karlén
Uppsala University

Aim of study
• Derive a "benchmark data set"
  - Drug-like
  - Physicochemically diverse
  - Commercially available and inexpensive
  - Amenable to analytical measurements
• Start the generation of benchmark data
  - Derive good-quality data from the same lab

Possible use of the data set
• General description of drugs
• Developing ADME/TOX filters (permeability, solubility, plasma protein binding, etc.)
• Validating novel experimental techniques

Generation of a "benchmark" data set based on the list of drugs in Sweden (FASS 2001)
• Remove compounds
  - Molecular weight > 900
  - Polymers, polypeptides
  - Inorganic and metal-containing compounds
• Select only drugs administered by the oral, nasal, pulmonary, ocular, parenteral or rectal route
• Select commercially available compounds (< $800/g)
• Remove "odd" ATC classes, e.g. A01 (Mouth and teeth), A05 (Bile acids), A06 (Laxative)…
• Experimental design, giving the 24-compound data set
Compound counts shown along the flow chart, in order: 799, 691, 450, 370, 332, 284.

Cost and availability of the 691-compound data set
• 450 of the 691 compounds can be bought
• Price range: $0.03/gram to $3,228,000/gram (2001)
[Figure: histogram of the number of compounds (N) per binned price/gram ($); bins 0.03 to 24.9, 24.9 to 50.2, 50.2 to 79.6, 79.6 to 100, 100 to 995, and 995 to 3,228,000. Structures shown: methenamine and calcitriol.]

Principal component analysis
• General descriptors
• General hydrogen bonding descriptors
• Hydrogen bond donor descriptors
• Hydrogen bond acceptor descriptors
Σ 28 molecular descriptors
[Figure: PCA score plot, t[1] vs t[2] (SIMCA-P 11).]

Principal component analysis
[Figure: PCA score plots, t[1] vs t[2], labeled Size, Lipophilicity and Polarity. Points are colored by MOL_WEIGHT (bins from 0-200 to 800-1000), MLOGP (bins from -7 to 8 in steps of 3) and PSASAVOL (bins from 0-100 to 300-400).]

The factorial design
"A face-centered central composite design"
[Figure: cube spanned by PC1, PC2 and PC3, with its eight corners labeled by sign combinations: ---, +--, -+-, ++-, --+, +-+, -++, +++.]

24-compound data set
[Figure: chemical structures of the 24 compounds, arranged along PC1, PC2 and PC3.]
Thiamazole, Prednisone, Flupenthixol, Fenofibrate, Folic acid, Meclizine(a), Carisoprodol(a), Terfenadine(b), Glipizide, Tetracycline, Bendroflumethiazide(a), Metoclopramide, Tinidazole, Hydrochlorothiazide, Chlorzoxazone, Chlorprothixene, Carbamazepine, Amiloride, Sulindac, Amantadine, Levodopa, Captopril, Levothyroxine, Erythromycin
• 20 protolytes, 4 nonprotolytes
• The cost of buying the entire data set (at least 1 gram of each compound) is less than $1,500

Comparison of the data sets with respect to some common molecular descriptors

               691-compound data set     24-compound data set
Descriptor     Min      Max     Mean     Min     Max     Mean
MW             60       854     347      114     777     349
PSA            0        373     93       8       246     99
logPMor        -6.4     7.6     1.9      -2.0    5.3     1.9
logDACD_6.5    -10.6    12.3    0.74     -5.0    4.8     0.94
HBD            0        19      2.4      0       8       2.7
HBA            0        19      4.9      1       14      4.7

Structures shown: Neomycin (HBD = 19) and Candesartan cilexetil (logPMor = 7.6).

Comparison of the data sets with respect to functional groups
[Figure: bar chart of the percent of compounds containing each functional group (aliphatic and aromatic amines among others), for the 24-set and the 691-set (drug-like set, FASS only); y-axis from 0% to 75%.]

Comparison of the data sets with respect to ATC classes
Distribution in ATC

                         Number of substances    Percent of data set
ATC  Description         24-set     691-set      24-set     691-set
A    GI                  1          69           4.2%       9.99%
B    Blood               0          21           0.0%       3.04%
C    Cardio              2          89           8.3%       12.88%
D    Topical             0          36           0.0%       5.21%
G    Gen. hormones       1          38           4.2%       5.50%
H    Hormones            3          14           12.5%      2.03%
J    Infection           5          89           20.8%      12.88%
L    Tum., immuno        1          53           4.2%       7.67%
M    Muscle, mov.        3          37           12.5%      5.35%
N    Nervous             6          134          25.0%      19.39%
P    Antiparasite        0          13           0.0%       1.88%
R    Respiration         1          52           4.2%       7.53%
S    Eye, ear            1          24           4.2%       3.47%
V    Various             0          22           0.0%       3.18%

The Anatomical Therapeutic Chemical (ATC) classification system is the most commonly used classification system for drug substances.

Start the generation of benchmark data
Derive good-quality data from the same lab
1. Measurement of pKa by pH-metric or pH-UV technique (n=20)
2. Measurement of lipophilicity
   (a) pH-metric logP (n=18)
   (b) Capacity factors by RP-HPLC (n=21)
3. Measurement of intrinsic and kinetic solubility
   pH-metric solubility (CheqSol technique) or shake-plate solubility (n=17)
4. Measurement of permeability across Caco-2 cells, A to B direction (n=22)

2. Lipophilicity
pH-metric measurement of logP and logD
[Figure: bar chart of logP (neutral) and logD (pH 7.4) per compound; y-axis from -3.00 to 7.00.]
logP missing for:
• Folic acid
• Carbamazepine
• Prednisone
• Carisoprodol

2. Lipophilicity
Experimental logP vs calculated logP
[Figure: four scatter plots of calculated logP against logPexp, one each for Crippen logP, ACD/LogP, Moriguchi logP and ClogP (BioByte). R2 values shown on the panels: 0.70, 0.88, 0.89 and 0.80.]

2. Lipophilicity
Correlation between the measured HPLC capacity factor (k) and pH-metric logD (pH 6.8)
R2 = 0.92
• Compounds from the 8 corner points have different colors
• The 2 compounds at each corner point have the same color
• The axis points are colored black
• The center point is pink

3. Solubility
Measurement of intrinsic solubility using CheqSol (24-compound data set)
[Figure: bar chart of log solubility (mg/mL) per compound; y-axis from -3.0 to 4.0. Compounds on the x-axis: Terfenadine, Meclizine, Chlorprothixene, Fenofibrate, Glipizide, Folic acid, Bendroflumethiazide, Sulindac, Levothyroxine, Flupenthixol, Metoclopramide, Carbamazepine, Prednisone, Tetracycline, Hydrochlorothiazide, Chlorzoxazone, Amantadine.]
Solubility ranges from 0.009 mg/mL to 2119 mg/mL.

3. Solubility
Table columns on the slide: Name; equilibrium solubility (CheqSol, Shake-Flask, Literature); kinetic solubility (chaser, non-chaser). All results in µg/mL.
Column alignment is not preserved in this transcript. Values are listed per row in transcript order; for rows 1-4 and 15-17 the row assignment is also uncertain.

No.     Name                                         Values (µg/mL)
1-2     Phthalic Acid, Quinine                       5330, 5950, 363, 201
3       Trazodone                                    134.6, 138.0
4       Nitrofurantoin                               8462, 491, 391, 435, 112.5, 109.5, 78.9
5       Nortriptyline                                27.0, 49.3, 20.0
6       Verapamil                                    48.5, 48.5, 9.7
7       Niflumic Acid                                9.53, 29.5
8       Imipramine                                   17.2, 21.7
9       Flumequine                                   34.2, 20.7
10      Furosemide                                   19.7, 20.4, 5.9
11      Maprotiline                                  5.80, 8.05, 3.49, 77
12      Piroxicam                                    5.92, 5.95, 3.16, 233
13      Warfarin                                     5.30, 5.25, 5.60, 120
14      Chlorpromazine                               2.70, 2.41
15-17   Lidocaine, Famotidine, Hydrochlorothiazide   3500, 3810, 4600, 740, 1100, 5900, 2400, 319, 27.3, 47.8, 59, 18.1, 17.3, 121, 96, 1.71, 2.70, 630, 700
18      Chlorpheniramine                             608.3, 615.2
19      Sulfamerazine                                200.3, 203.0, 701
20      Ketoprofen                                   130.6, 178.0, 336
21      Propranolol                                  81.0, 70.0, 340
22      Ibuprofen                                    50.0, 49.0, 180
23      Pindolol                                     41.7, 32.7, 1424
24      Miconazole                                   1.00, 0.67
25      Diclofenac                                   0.90, 0.80
26      Amodiaquin                                   0.41, 8.8
27      Pamoic acid                                  0.0003, 0.019
        (unassigned)                                 668

• 19 of the compounds studied are also present in the 691-compound data set (compounds not present in the 691 data set are marked on the slide)
• CheqSol solubility ranges from 0.9 mg/mL to 3500 mg/mL in these 19 compounds
• In the 24-compound data set the solubility ranges from
0.009 mg/mL to 2119 mg/mL
http://www.cheqsol.com/download%20files/download01.pdf

24-compound data set is structurally diverse
[Figure: PCA score plot, t[1] vs t[2] (SIMCA-P+ 11), labeled Polarity, Lipophilicity and Size. Legend: No class; Class 1 (19-data set); Class 2 (24-data set).]

4. Permeability/absorption
[Figure: log-log scatter plot of human jejunum permeability (x 10^-4 cm/s) at pH 6.5 (Y) against Caco-2 permeability (x 10^-6 cm/s) at pH 6.5 (X). Labeled compounds: Furosemide, Hydrochlorothiazide, Atenolol, Cimetidine, Mannitol, Terbutaline, Amoxicillin (C), Lisinopril (C), Metoprolol, Cephalexin (C), Enalapril (C), Propranolol, Phenylalanine (C), Desipramine, Antipyrine, Piroxicam, Verapamil (C), Ketoprofen, Naproxen, D-Glucose (C).]
Regression lines shown on the plot:
• log Y = 0.6532 log X - 0.3036, R2 = 0.7276 (all drugs)
• log Y = 0.7524 log X - 0.5441, R2 = 0.8492 (passively diffusive)
• log Y = 0.542 log X + 0.06, R2 = 0.7854 (carrier-mediated)
Sun, D. et al. Comparison of Human and Caco-2 Gene Expression Profiles for 12,000 Genes and the Permeabilities of 26 Drugs in the Human Intestine and Caco-2 Cells. Pharm Res 2002, 19, 1398-1413.

4. Permeability/absorption
In vitro Papp values in human Caco-2 cells
[Figure: bar chart of Papp/(10^-6 cm s^-1) on a logarithmic axis from 0.01 to 100, with regions marked High, Medium and Low. Compounds shown: Erythromycin, Captopril, Levodopa, Tetracycline, Hydrochlorothiazide, Amiloride, Folic acid, Bendroflumethiazide, Levothyroxine, Sulindac, Terfenadine, Flupenthixol, Metoclopramide, Chlorprothixene, Glipizide, Carisoprodol, Amantadine, Prednisone, Tinidazole, Carbamazepine, Thiamazole, Chlorzoxazone.]

Suggestions on the "Uppsala diverse data set" usage
• The 24 compounds can be used
  - as a test set for testing already derived models of permeability, lipophilicity, solubility, etc.
  - as a validation set for new experimental techniques
  - on its own for building and validating models by dividing it into a training set and a test set
We hope that other groups are willing to help us supplement the characterization started here.
"Benchmark data set": J. Med. Chem. (ASAP) 2006, 49(23), 6660-6671

Acknowledgements
Faculty of Pharmacy, Uppsala University
• Christian Sköld
• Torbjörn Lundstedt
• Anders Hallberg
• Hans Lennernäs
Sirius Analytical Instruments Ltd
• John Comer
• Karl Box
• Ruth Allen
• Jon Mole
AstraZeneca R&D Mölndal
• Susanne Winiwarter
• Anna-Lena Ungell
• Johan Wernevik
• Fredrik Bergström
• Leif Engström
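
Worked example (not part of the original slides): the three regression lines quoted from Sun et al. on the permeability slide relate Caco-2 permeability X (x 10^-6 cm/s) to human jejunum permeability Y (x 10^-4 cm/s), both at pH 6.5. The sketch below only evaluates those published lines. The names REGRESSIONS and predict_jejunum_permeability are hypothetical, chosen for illustration, and the slide does not say that the authors used such a function.

```python
import math

# (slope, intercept) of log10(Y) = slope * log10(X) + intercept,
# copied from the regression lines quoted on the permeability slide.
# X: Caco-2 permeability in 1e-6 cm/s; Y: human jejunum permeability
# in 1e-4 cm/s; both at pH 6.5. The dictionary keys are illustrative labels.
REGRESSIONS = {
    "all": (0.6532, -0.3036),      # all drugs
    "passive": (0.7524, -0.5441),  # passively diffusive drugs
    "carrier": (0.542, 0.06),      # carrier-mediated drugs
}


def predict_jejunum_permeability(caco2_papp, model="all"):
    """Estimate human jejunum permeability (1e-4 cm/s) from Caco-2 Papp (1e-6 cm/s)."""
    if caco2_papp <= 0:
        raise ValueError("Caco-2 permeability must be positive to take its logarithm")
    slope, intercept = REGRESSIONS[model]
    return 10 ** (slope * math.log10(caco2_papp) + intercept)


if __name__ == "__main__":
    # Evaluate the passive-diffusion line over the Caco-2 range of the bar chart.
    for papp in (0.01, 0.1, 1.0, 10.0, 100.0):
        estimate = predict_jejunum_permeability(papp, model="passive")
        print(f"Caco-2 Papp {papp:g} -> estimated jejunum permeability {estimate:.3f}")
```

Such an estimate is only as good as the underlying fit (R2 between 0.73 and 0.85 on the slide), so it is better read as a rough ranking aid than as a measured value.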