
Presentation of a Structurally Diverse and
Commercially Available Drug Data Set for
Correlation and Benchmarking Studies
Anders Karlén
Uppsala University
Aim of study
• Derive a “benchmark data set”
– Drug-like
– Physicochemically diverse
– Commercially available and inexpensive
– Amenable to analytical measurements
• Start the generation of benchmark data
– Derive good-quality data from the same lab
Possible use of the data set
• General description of drugs
• Developing ADME/TOX filters
(permeability, solubility, plasma protein
binding etc.)
• To validate novel experimental techniques
Generation of a “benchmark” data set based on the
list of drugs in Sweden (FASS 2001)
Selection steps:
• Remove compounds
– Molecular weight >900
– Polymers, polypeptides
– Inorganic and metal-containing
• Select only oral, nasal, pulmonary, ocular, parenteral and rectal administered drugs
• Select commercially available, < $800/g
• Remove “odd” ATC classes, e.g. A01 (Mouth and teeth), A05 (Bile acids), A06 (Laxative)…
• Experimental design
Compound counts shown along the flowchart: 799 cpds, 691 cpds, 450, 370 cpds, 332 cpds, 284 cpds, 24-compound data set
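The selection steps above amount to a chain of filters over a drug list. The following is a minimal sketch only: the `Drug` record layout, the example records and the `ODD_ATC` set (limited to the three classes named on the slide, whose list ends with "…") are assumptions for illustration and are not taken from the FASS list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Drug:
    # Hypothetical record layout, not the authors' data model.
    name: str
    mol_weight: float
    is_polymer_or_peptide: bool
    is_inorganic_or_metal: bool
    route: str                       # administration route, e.g. "oral"
    price_per_gram: Optional[float]  # None means not commercially available
    atc: str                         # ATC code, e.g. "C03AA03"


ALLOWED_ROUTES = {"oral", "nasal", "pulmonary", "ocular", "parenteral", "rectal"}
ODD_ATC = {"A01", "A05", "A06"}  # only the examples named on the slide
MAX_PRICE = 800.0                # USD per gram


def select_candidates(drugs: List[Drug]) -> List[Drug]:
    """Apply the slide's filters in order and return the surviving drugs."""
    kept = []
    for d in drugs:
        if d.mol_weight > 900 or d.is_polymer_or_peptide or d.is_inorganic_or_metal:
            continue  # structural exclusions
        if d.route not in ALLOWED_ROUTES:
            continue  # administration route filter
        if d.price_per_gram is None or d.price_per_gram >= MAX_PRICE:
            continue  # availability and price filter
        if d.atc[:3] in ODD_ATC:
            continue  # "odd" ATC classes
        kept.append(d)
    return kept


# Made-up records that exercise each filter once.
EXAMPLES = [
    Drug("drug_a", 300.0, False, False, "oral", 12.0, "C03AA03"),
    Drug("drug_b", 1200.0, False, False, "oral", 12.0, "J01FA01"),
    Drug("drug_c", 300.0, False, False, "topical", 12.0, "D01AC02"),
    Drug("drug_d", 300.0, False, False, "oral", None, "N05AF01"),
    Drug("drug_e", 300.0, False, False, "oral", 1500.0, "N03AF01"),
    Drug("drug_f", 300.0, False, False, "oral", 5.0, "A06AB02"),
]

selected = select_candidates(EXAMPLES)
```

The final step on the slide, picking 24 compounds by experimental design, is not covered by this sketch.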
Cost and availability of the
691-compound data set
450 of the 691 compounds can be bought
Price range $0.03/gram - $3,228,000/gram (2001)
[Figure: histogram of the number of compounds (N, axis 0 to 200) per binned price/gram ($); bins 0.03-24.9, 24.9-50.2, 50.2-79.6, 79.6-100, 100-995, 995-3,228,000; structures shown for methenamine and calcitriol]
Principal component analysis
• General descriptors
• General hydrogen bonding descriptors
• Hydrogen bond donor descriptors
• Hydrogen bond acceptor descriptors
Σ 28 molecular descriptors
[Figure: PCA score plot; SIMCA-P 11, 2006-11-01 16:08:45]
Principal component analysis
[Figure: three PCA score plots, t[1] versus t[2], with the compounds colored by a single variable in each panel]
• Size: variable MOL_WEIGHT, bins 0-200, 200-400, 400-600, 600-800, 800-1000
• Lipophilicity: variable MLOGP, bins -7 to -4, -4 to -1, -1 to 2, 2 to 5, 5 to 8
• Polarity: variable PSASAVOL, bins 0-100, 100-200, 200-300, 300-400
(SIMCA-P+ 11, 2006-11-10)
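For readers who want to reproduce this kind of score plot outside SIMCA-P, the following is a minimal sketch of a principal component analysis on an autoscaled descriptor matrix. It uses a singular value decomposition from NumPy as a stand-in for whatever algorithm SIMCA-P applies, and the random 50 x 28 matrix is a placeholder, not the descriptor data from the study.

```python
import numpy as np


def pca_scores(X, n_components=3):
    """Return scores t, loadings p and explained-variance fractions.

    Columns are mean-centered and scaled to unit variance before the SVD.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    std = Xc.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # leave constant columns unscaled
    Z = Xc / std
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # t[1], t[2], ...
    loadings = Vt[:n_components].T                    # one column per component
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


# Placeholder data: 50 hypothetical compounds x 28 descriptors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 28))
scores, loadings, explained = pca_scores(X_demo)
```

Plotting `scores[:, 0]` against `scores[:, 1]` and coloring the points by a descriptor such as molecular weight gives a plot of the same kind as the panels described above.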
The factorial design
“A face-centered central composite design”
[Figure: cube in PC1, PC2, PC3 space with the eight corner points labelled ---, +--, -+-, ++-, --+, +-+, -++ and +++]
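The design points of a face-centered central composite design in three coded factors can be listed as follows. This is a minimal sketch: the factors are taken to be PC1, PC2 and PC3 coded as -1, 0 and +1, and how real compounds were matched to the design points is not part of it.

```python
from itertools import product


def face_centered_ccd(n_factors=3):
    """Return (corners, axial, center) points in coded units -1, 0, +1."""
    # Full two-level factorial: every combination of low and high levels.
    corners = [tuple(p) for p in product((-1, 1), repeat=n_factors)]
    # Axial points sit on the face centers: one factor at +/-1, the rest at 0.
    axial = []
    for i in range(n_factors):
        for level in (-1, 1):
            point = [0] * n_factors
            point[i] = level
            axial.append(tuple(point))
    center = [tuple([0] * n_factors)]
    return corners, axial, center


corners, axial, center = face_centered_ccd(3)
```

For three factors this gives 8 corner points, 6 axial points and 1 center point.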
24-compound data set
[Figure: structures of the 24 compounds arranged in the PC1, PC2, PC3 design space]
Compounds: Thiamazole, Prednisone, Flupenthixol, Fenofibrate, Folic acid, Meclizine (a), Carisoprodol (a), Terfenadine (b), Glipizide, Tetracycline, Bendroflumethiazide (a), Metoclopramide, Tinidazole, Hydrochlorothiazide, Chlorzoxazone, Chlorprothixene, Carbamazepine, Amiloride, Sulindac, Amantadine, Levodopa, Captopril, Levothyroxine, Erythromycin
20 protolytes
4 nonprotolytes
The cost of buying the entire data set (at least 1 gram of each compound) is less than $1,500
Comparison of the data sets with respect to
some common molecular descriptors

                691-compound data set     24-compound data set
Descriptor      Min      Max     Mean     Min      Max     Mean
MW              60       854     347      114      777     349
PSA             0        373     93       8        246     99
logPMor         -6.4     7.6     1.9      -2.0     5.3     1.9
logDACD_6.5     -10.6    12.3    0.74     -5.0     4.8     0.94
HBD             0        19      2.4      0        8       2.7
HBA             0        19      4.9      1        14      4.7

Structures shown as examples: Neomycin (HBD = 19) and Candesartan cilexetil (logPMor = 7.6)
Comparison of the data sets with respect to
functional groups
[Figure: bar chart of the percent of compounds containing each functional group (y-axis 0% to 75%), comparing the 24-set with the 691-set (druglike FASS only); x-axis: functional group]
Comparison of the data sets with respect to
ATC classes
Distribution in ATC

                        Number of substances    Percent of dataset
ATC  Description        24-set    691-set       24-set    691-set
A    GI                 1         69            4.2%      9.99%
B    Blood              0         21            0.0%      3.04%
C    Cardio             2         89            8.3%      12.88%
D    Topical            0         36            0.0%      5.21%
G    Gen. hormones      1         38            4.2%      5.50%
H    Hormones           3         14            12.5%     2.03%
J    Infection          5         89            20.8%     12.88%
L    Tum., immuno       1         53            4.2%      7.67%
M    Muscle, mov.       3         37            12.5%     5.35%
N    Nervous            6         134           25.0%     19.39%
P    Antiparasite       0         13            0.0%      1.88%
R    Respiration        1         52            4.2%      7.53%
S    Eye, ear           1         24            4.2%      3.47%
V    Various            0         22            0.0%      3.18%

The Anatomical Therapeutic Chemical (ATC) classification system is the most
commonly used classification system for drug substances
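The percentages in the ATC comparison follow directly from the substance counts. A small sketch that recomputes them from the counts given on the slide is shown below; the dictionary layout and function name are illustrative choices.

```python
# ATC level-1 class -> (count in 24-set, count in 691-set), as listed on the slide.
ATC_COUNTS = {
    "A": (1, 69), "B": (0, 21), "C": (2, 89), "D": (0, 36), "G": (1, 38),
    "H": (3, 14), "J": (5, 89), "L": (1, 53), "M": (3, 37), "N": (6, 134),
    "P": (0, 13), "R": (1, 52), "S": (1, 24), "V": (0, 22),
}


def percent_distribution(counts):
    """Convert a mapping of class -> count into class -> percent of total."""
    total = sum(counts.values())
    return {atc: 100.0 * n / total for atc, n in counts.items()}


pct_24 = percent_distribution({k: v[0] for k, v in ATC_COUNTS.items()})
pct_691 = percent_distribution({k: v[1] for k, v in ATC_COUNTS.items()})

for atc in ATC_COUNTS:
    print(f"{atc}: {pct_24[atc]:5.1f}%  {pct_691[atc]:6.2f}%")
```

The counts sum to 24 and 691 respectively, so the two columns can be compared class by class.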
Start the generation of benchmark data.
Derive good-quality data from the same lab
1. Measurement of pKa by pH-metric or pH-UV technique (n=20)
2. Measurement of lipophilicity
(a) pH-metric logP (n=18)
(b) capacity factors by RP-HPLC (n=21)
3. Measurement of intrinsic and kinetic solubility
pH-metric solubility (CheqSol technique) or shake-plate solubility (n=17)
4. Measurement of permeability across Caco-2 cells, A to B direction (n=22)
2. Lipophilicity
pH-metric measurement of logP and logD
[Figure: bar chart of logP (neutral) and logD (pH 7.4) per compound, y-axis from -3 to 7; compounds shown: Amantadine, Amiloride, Bendroflumethiazide, Captopril, Chlorprothixene, Chlorzoxazone, Erythromycin, Fenofibrate, Flupenthixol, Glipizide, Hydrochlorothiazide, Levodopa, Levothyroxine, Meclizine, Metoclopramide, Sulindac, Terfenadine, Tetracycline, Thiamazole, Tinidazole]
logP missing for:
• Folic acid
• Carbamazepine
• Prednisone
• Carisoprodol
2. Lipophilicity
Experimental logP vs calculated logP
[Figure: four scatter plots of calculated logP against experimental logP (logPexp), one panel each for ACD/LogP (logPACD), Crippen logP (logPcrip), Moriguchi logP (logPMor) and ClogP (BioByte); R2 values shown on the panels: 0.70, 0.88, 0.89 and 0.80]
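An R2 value of the kind reported for each calculated-versus-experimental logP panel can be computed as the squared Pearson correlation coefficient. The sketch below assumes that this is how the slide's R2 values were obtained, which the slide does not state, and the two short lists of logP values are made up for illustration.

```python
import numpy as np


def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    return float(r ** 2)


# Hypothetical values, not measurements from the study.
logp_exp = [0.5, 1.2, 2.8, 3.9, 5.1]
logp_calc = [0.9, 1.0, 3.1, 3.5, 5.6]

r2_demo = r_squared(logp_exp, logp_calc)
```

For a simple linear least-squares fit with an intercept, this value equals the coefficient of determination of the fit.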
2. Lipophilicity
Correlation between the measured HPLC capacity
factor (k) and pH-metric logD (pH 6.8)
R2 = 0.92
• Compounds from the 8 corner points have different colors
• The 2 compounds at each corner point have the same color
• The axis points are colored black
• Center point pink
3. Solubility
Measurement of intrinsic solubility using CheqSol
(24-compound data set)
[Figure: bar chart of log solubility (mg/mL) per compound, y-axis from -3 to 4; compounds shown: Terfenadine, Meclizine, Chlorprothixene, Fenofibrate, Glipizide, Folic acid, Sulindac, Bendroflumethiazide, Levothyroxine, Flupenthixol, Metoclopramide, Carbamazepine, Prednisone, Tetracycline, Hydrochlorothiazide, Chlorzoxazone, Amantadine]
Solubility ranges from 0.009 mg/mL to 2119 mg/mL
3. Solubility
Equilibrium solubility (CheqSol, Shake-Flask, Literature) and kinetic solubility (chaser, non-chaser)
All results in µg/mL
Name
1 Phthalic Acid
2 Quinine
   5330  5950  363  201
3 Trazodone  134.6  138.0
4 Nitrofurantoin
   8462  491  391  435  112.5  109.5  78.9
5 Nortriptyline  27.0  49.3  20.0
6 Verapamil  48.5  48.5  9.7
7 Niflumic Acid  9.53  29.5
8 Imipramine  17.2  21.7
9 Flumequine  34.2  20.7
10 Furosemide  19.7  20.4  5.9
11 Maprotiline  5.80  8.05  3.49  77
12 Piroxicam  5.92  5.95  3.16  233
13 Warfarin  5.30  5.25  5.60  120
14 Chlorpromazine  2.70  2.41
15 Lidocaine
   3500  3810  4600  740  1100  5900  2400
16 Famotidine
17 Hydrochlorothiazide
   319  27.3  47.8  59  18.1  17.3  121  96  1.71  2.70  630  700
18 Chlorpheniramine  608.3  615.2
19 Sulfamerazine  200.3  203.0  701
20 Ketoprofen  130.6  178.0  336
21 Propranolol  81.0  70.0  340
22 Ibuprofen  50.0  49.0  180
23 Pindolol  41.7  32.7  1424
24 Miconazole  1.00  0.67
25 Diclofenac  0.90  0.80
26 Amodiaquin  0.41  8.8
27 Pamoic acid  0.0003  0.019
   668
   45
Compound not present in the 691 data set
19 of the compounds studied are also present in the 691-compound data set
CheqSol solubility ranges from 0.9 mg/mL to 3500 mg/mL in these 19 compounds
In the 24-compound data set the solubility ranges from 0.009 mg/mL to 2119 mg/mL
http://www.cheqsol.com/download%20files/download01.pdf
24-compound data set is
structurally diverse
[Figure: PCA score plot, t[1] versus t[2], with directions labelled Polarity, Lipophilicity and Size; legend: No class, 19-data set (Class 1), 24-data set (Class 2); SIMCA-P+ 11, 2006-11-10 14:05:50]
4. Permeability/absorption
[Figure: log-log plot of human jejunum permeability (x 10^-4 cm/s) at pH 6.5 (Y) against Caco-2 permeability (x 10^-6 cm/s) at pH 6.5 (X); compounds labelled: Furosemide, Hydrochlorothiazide, Atenolol, Cimetidine, Mannitol, Terbutaline, Amoxicillin (C), Lisinopril (C), Metoprolol, Cephalexin (C), Enalapril (C), Propranolol, Phenylalanine (C), Desipramine, Antipyrine, Piroxicam, Verapamil (C), Ketoprofen, Naproxen, D-Glucose (C)]
log Y = 0.6532 log X - 0.3036, R2 = 0.7276 (all drugs)
log Y = 0.7524 log X - 0.5441, R2 = 0.8492 (passively diffusive)
log Y = 0.542 log X + 0.06, R2 = 0.7854 (carrier-mediated)
Sun, D. et al. Comparison of Human and Caco-2 Gene Expression Profiles for 12,000 Genes and the
Permeabilities of 26 Drugs in the Human Intestine and Caco-2 Cells. Pharm Res 2002, 19, 1398-1413
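The regression lines quoted from Sun et al. can be turned into a simple estimator. The sketch below assumes, based on the axis labels, that X is the Caco-2 permeability in units of 10^-6 cm/s and Y is the human jejunum permeability in units of 10^-4 cm/s; the function name and the dictionary keys are illustrative.

```python
import math

# (slope, intercept) of log10(Y) = slope * log10(X) + intercept, from the slide.
FITS = {
    "all": (0.6532, -0.3036),
    "passive": (0.7524, -0.5441),
    "carrier": (0.542, 0.06),
}


def predict_human_permeability(caco2_papp, fit="all"):
    """Estimate human jejunum permeability (x 10^-4 cm/s).

    caco2_papp is the Caco-2 permeability in units of 10^-6 cm/s and
    must be positive, because the fit is defined on a log scale.
    """
    if caco2_papp <= 0:
        raise ValueError("caco2_papp must be positive")
    slope, intercept = FITS[fit]
    return 10 ** (slope * math.log10(caco2_papp) + intercept)


estimate = predict_human_permeability(10.0, fit="passive")
```

With R2 values between about 0.73 and 0.85, such estimates are rough and are best used for ranking compounds rather than for exact prediction.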
4. Permeability/absorption
In vitro Papp values in human Caco-2 cells
[Figure: bar chart of Papp (10^-6 cm s^-1) on a log scale from 0.01 to 100, with bars classed as High, Medium or Low; compounds shown: Erythromycin, Captopril, Levodopa, Tetracycline, Hydrochlorothiazide, Amiloride, Folic acid, Bendroflumethiazide, Levothyroxine, Sulindac, Terfenadine, Flupenthixol, Metoclopramide, Chlorprothixene, Glipizide, Carisoprodol, Amantadine, Prednisone, Tinidazole, Carbamazepine, Thiamazole, Chlorzoxazone]
Suggestions on the ”Uppsala
diverse data set” usage
• The 24 compounds can be used
– as a test set for testing already derived models of permeability,
lipophilicity, solubility etc.
– as a validation set for new experimental techniques
– on its own for building and validating models by dividing it into a
training set and a test set
We hope that other groups are willing to help us to supplement
the herein-started characterization
”Bench mark data set”
J. Med. Chem.; (ASAP); 2006; 49(23); 6660-6671
Acknowledgements
Faculty of Pharmacy
Uppsala University
Christian Sköld
Torbjörn Lundstedt
Anders Hallberg
Hans Lennernäs
Sirius Analytical Instruments Ltd
John Comer
Karl Box
Ruth Allen
Jon Mole
AstraZeneca R&D Mölndal
Susanne Winiwarter
Anna-Lena Ungell
Johan Wernevik
Fredrik Bergström
Leif Engström