Topological, pH-dependent Fuzzy Pharmacophore Triplet Fingerprints 2D-FPT Dragos Horvath, Fanny Bonachera UMR 8576 CNRS Université des Sciences & Technologies de Lille [email protected].

Topological, pH-dependent Fuzzy
Pharmacophore Triplet Fingerprints
2D-FPT
Dragos Horvath, Fanny Bonachera
UMR 8576 CNRS
Université des Sciences & Technologies de Lille
[email protected]
Pharmacophore Patterns
• Ligand-site affinity ~ functional group complementarity
• Functional groups of similar physicochemical behavior
represent pharmacophore types:
– Hydrophobic, Aromatic, Hydrogen Bond (HB) donors, Cations,
HB Acceptors, Anions.
• The pharmacophore pattern of a molecule characterizes
the relative arrangement of all its pharmacophore types
– What pharmacophore types are represented?
– How are they arranged (spatially, topologically) with respect to
each other?
– How can these aspects be captured numerically to yield
molecular descriptors of the pharmacophore pattern?
Exploiting pharmacophore patterns…
• N-dimensional vector D(M)=[D1(M), D2(M), …,DN(M)];
each Di encodes an element of the pharmacophore pattern
– Allows meaningful quantitative definitions of molecular
similarity:
• Neighborhood Behavior: Similar molecules - characterized by covariant
vectors - are likely to display similar biological properties
• As chemists do not easily perceive the pharmacophore pattern, such
covariance may reveal hidden but real molecular relatedness…
– May serve as starting point for searching a binding
pharmacophore – the subset of features that really participate in
binding to a receptor
• Machine learning to select those elements Di that are systematically
present in actives, but not in inactives of a molecular learning set!
Tricentric Pharmacophore Fingerprints:
monitoring feature arrangement
• Topological: the distance between two features equals the
(minimal) number of chemical bonds between them
[Figure: example molecule with topological distances of 9, 4 and 11 bonds between its features]
• Spatial: if stable conformers are known, use the distance in
Å between two features
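The topological distance defined above is a shortest-path length in the molecular graph. A minimal Python sketch, assuming the molecule is available as an adjacency list of heavy atoms; the graph and the function name `topological_distances` are hypothetical illustrations, not the ShortestPath class used by the authors:

```python
from collections import deque

def topological_distances(adjacency, source):
    """Return {atom: minimal number of bonds from source} by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:          # first visit gives the shortest path
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist

# Hypothetical graph: a six-membered ring (atoms 0-5) carrying a
# three-atom chain (6-7-8) on ring atom 0.
adjacency = {
    0: [1, 5, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0],
    6: [0, 7], 7: [6, 8], 8: [7],
}
print(topological_distances(adjacency, 8)[3])  # prints 6
```

Breadth-first search is enough here because every bond counts as one step, whatever its order.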
Example: Binary Pharmacophore Triplets
Basis Triplets:
• all possible feature combinations
• at a given series of distances…
[Figure: basis triplets with their edge lengths and the binary fingerprint of an example molecule]
Pickett, Mason & McLay, J. Chem. Inf. Comp. Sci. 36:1214-1223 (1996)
First key improvement: Fuzzy mapping of
atom triplets onto basis triplets in 2D-FPT
[Figure: an atom triplet increments the occupancy of several neighboring basis triplets, e.g. by +6 and +3]
Di(m) = total occupancy of basis triplet i in molecule m.
Combinatorial enumeration of basis triplets
• Example: there are 36796 basis triplets satisfying the triangle
inequalities when considering 6 pharmacophore types and
11 edge lengths from Emin=3 to Emax=13 with an
increment of Estep=1: (3, 4, 5, …, 13)
– Canonical representation: T1d23-T2d13-T3d12 with T3≥T2≥T1
(alphabetically).

[Figure: triangle with edge lengths 4, 7 and 6]
Hp7-Ar4-PC6 → Ar4-Hp7-PC6
– Out of two corners of a same type, priority is given to the one
opposed to the shorter edge.

[Figure: triangle with edge lengths 4, 7 and 6]
Ar4-Hp7-Hp6
Ar5-Hp6-Hp7
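The enumeration and the canonical labeling rule can be sketched in a few lines of Python. This is an illustration under two assumptions: the triangle inequality is taken as strict (degenerate triangles excluded), and each corner is represented as a (type, opposite edge length) pair, so that sorting the pairs applies both the alphabetical rule and the shorter-edge tie-break. The function names are hypothetical.

```python
from itertools import combinations_with_replacement

# Two-letter type codes as they appear on the slides.
TYPES = ("Ar", "HA", "HD", "Hp", "NC", "PC")

def canonical_label(corners):
    """corners: three (type, opposite_edge_length) pairs, in any order."""
    return "-".join(f"{t}{d}" for t, d in sorted(corners))

def enumerate_basis_triplets(types, e_min, e_max, e_step):
    edges = range(e_min, e_max + 1, e_step)
    corners = [(t, d) for t in types for d in edges]
    labels = []
    # Each unordered choice of three corners is one triangle up to symmetry.
    for triplet in combinations_with_replacement(corners, 3):
        a, b, c = sorted(d for _, d in triplet)
        if a + b > c:  # strict triangle inequality (assumption)
            labels.append(canonical_label(triplet))
    return labels

basis = enumerate_basis_triplets(TYPES, 3, 13, 1)
print(len(basis))                                          # prints 36796
print(canonical_label([("Hp", 7), ("Ar", 4), ("PC", 6)]))  # prints Ar4-Hp7-PC6
```

With the strict inequality the count matches the 36796 quoted above; allowing degenerate triangles (a + b = c) would give a larger number.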
Triplet matching procedure
• The triplet matching score represents the optimal degree of
pharmacophore field overlap:
– if corner k of the triplet is of pharmacophore type T, i.e. F(k,T)=1,
then it contributes to the total pharmacophore field of type T,
observed at a point P of the plane:
$$\Phi_T(P) = \sum_{k=1}^{3} F(k,T)\,\exp\left(-\rho_T\, d_{k,P}^{2}\right)$$
where d_{k,P} is the distance from corner k to P and \rho_T is the Gaussian fuzziness parameter of type T.
Horvath, D. ComPharm pp. 395-439; in "QSPR /QSAR Studies by Molecular Descriptors", Diudea, M.,
Editor, Nova Science Publishers, Inc., New York, 2001
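As an illustration of the field contribution described above, the following Python sketch evaluates the field of one pharmacophore type at a point of the plane. It assumes a Gaussian that decays with the squared corner-to-point distance; the corner coordinates, the function name and the data layout are hypothetical, and the search for the optimal overlay of two triangles is not shown.

```python
import math

def pharmacophore_field(corners, ptype, rho, point):
    """Field of type `ptype` at `point`, summed over the triplet corners.

    corners: list of ((x, y), set_of_types), one entry per corner
    rho:     Gaussian fuzziness parameter for `ptype` (assumed decay constant)
    """
    total = 0.0
    for (x, y), types in corners:
        flag = 1.0 if ptype in types else 0.0               # F(k, T)
        d2 = (x - point[0]) ** 2 + (y - point[1]) ** 2      # squared distance
        total += flag * math.exp(-rho * d2)
    return total

# Hypothetical triangle with one aromatic, one hydrophobic and one cationic corner.
triangle = [((0.0, 0.0), {"Ar"}), ((4.0, 0.0), {"Hp"}), ((0.0, 3.0), {"PC"})]
print(pharmacophore_field(triangle, "Hp", 0.6, (4.0, 0.0)))  # prints 1.0
```

A larger fuzziness parameter makes the field fall off faster, so a corner then tolerates less displacement before it stops contributing.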
Control parameters for triplet enumeration &
matching in two 2D-FPT versions.
Parameter     Description                                               FPT-1   FPT-2
Emin          Minimal edge length of basis triangles (number of         2       4
              bonds between two pharmacophore types)
Emax          Maximal edge length of basis triangles                    12      15
Estep         Edge length increment for enumeration of basis            2       2
              triangles
e             Edge length excess parameter: in a molecule, triplets     0       2
              with edge length > Emax+e are ignored
Δ             Maximal edge length discrepancy tolerated when            2       2
              attempting to overlay a molecular triplet atop a
              basis triangle
ρHp = ρAr     Gaussian fuzziness parameter for apolar (Hydrophobic      0.6     0.9
              and Aromatic) types
ρPC = ρNC     Gaussian fuzziness parameter for charged (Positive        0.6     0.8
              and Negative Charge) types
ρHA = ρHD     Gaussian fuzziness parameter for polar (Hydrogen bond     0.6     0.7
              Donor and Acceptor) types
l             Aromatic-Hydrophobic interchangeability level             0.6     0.5
              Number of basis triplets at given setup                   4494    7155
Second key improvement: Proteolytic
equilibrium dependence of 2D-FPT
[Figure: a compound shown as two microspecies, populated at 12% and 88%]
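One way to make a fingerprint depend on such an equilibrium is to average the fingerprints of the microspecies, weighted by their populations. The Python sketch below assumes exactly that; the slides do not spell out the combination rule, and the triplet labels, occupancies and function name are hypothetical.

```python
def population_weighted_fingerprint(microspecies):
    """microspecies: list of (fraction, {basis_triplet_label: occupancy})."""
    total = sum(fraction for fraction, _ in microspecies)
    averaged = {}
    for fraction, fingerprint in microspecies:
        for label, occupancy in fingerprint.items():
            # Each microspecies contributes in proportion to its population.
            averaged[label] = averaged.get(label, 0.0) + occupancy * fraction / total
    return averaged

# Hypothetical occupancies for a neutral and a cationic form of one compound.
neutral = {"Ar4-HA6-Hp4": 6.0}
cationic = {"Ar4-Hp4-PC6": 6.0}
fingerprint = population_weighted_fingerprint([(0.12, neutral), (0.88, cationic)])
```

In this sketch the minor species still populates its own triplets, only with a proportionally smaller occupancy, instead of being discarded by a single-state rule.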
Third key improvement: a novel similarity
scoring scheme for 2D-FPT
• Classical Euclidean and Hamming distances increase
whenever dk(m,M)=|Dk(M)-Dk(m)| > 0…
– pairs of small & simple molecules (m,m’), with
Dk(m)=Dk(m’)=0 for almost all the triplets k, have few non-zero
contributions
– large & complex compounds (M,M’) with common but slightly
differently populated triplets (Dk(M) and Dk(M’) both non-zero but
unequal) have many small contributions that may nevertheless sum
up to higher Euclidean scores!
• With correlation coefficients, the importance of common
triplets, contributing to the cross-product Dk(m)×Dk(M),
may be overemphasized…
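The size bias of the Euclidean distance is easy to reproduce with made-up numbers. In the Python sketch below, all fingerprints are hypothetical: the small pair shares no triplet at all, while the large pair shares every one of its 50 triplets with slightly different occupancies.

```python
import math

def euclidean(fp_a, fp_b):
    """Euclidean distance between two sparse fingerprints stored as dicts."""
    keys = set(fp_a) | set(fp_b)
    return math.sqrt(sum((fp_a.get(k, 0.0) - fp_b.get(k, 0.0)) ** 2 for k in keys))

# Small pair: one populated triplet each, nothing in common.
small_a = {"t1": 3.0}
small_b = {"t2": 3.0}

# Large pair: the same 50 triplets, occupancies differing by 1 each.
large_a = {f"t{i}": 5.0 for i in range(50)}
large_b = {f"t{i}": 6.0 for i in range(50)}

print(euclidean(small_a, small_b) < euclidean(large_a, large_b))  # prints True
```

The large pair comes out as the more dissimilar one although it shares its whole pattern, which is the behavior the bullet points describe.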
Piecewise monitoring of the differences in the
fingerprint…
• A triplet k may, with respect to a pair of molecules, be
shared (++), null (--) or exclusive (+-)
– fuzzy levels of association t to each category
c={(++),(--),(+-)}, such that
t++(M,m) + t+-(M,m) + t--(M,m) = 1
• Specifically calculate, for each category c:
– fractions of triplets fc in that category,
– weighted, normed partial Hamming distances PWc
• The FPT-specific dissimilarity score SFPT(M,m):
– the linear combination of fractions and partial Hamming
distances with optimal Neighborhood Behavior with respect to a
subset of training data
[Equations: fc and PWc are sums over the k = 1…NT basis triplets, the latter weighted by Wk; the fitted coefficients of SFPT shown on the slide are 0.1323, 0.6357 and 0.27951]
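The per-category bookkeeping can be sketched as follows. This Python example simplifies in two ways that should be read as assumptions: category membership is crisp (a triplet is either populated or not) where the slide describes fuzzy association levels, and the triplet weights default to 1. The function name is hypothetical, and the final linear combination is left out because the assignment of the fitted coefficients to terms cannot be read from the slide.

```python
def piecewise_terms(fp_a, fp_b, labels, weights=None):
    """Return per-category triplet fractions and weighted partial Hamming distances."""
    weights = weights or {}
    n = len(labels)
    total_weight = sum(weights.get(k, 1.0) for k in labels)
    fractions = {"++": 0.0, "+-": 0.0, "--": 0.0}
    partial = {"++": 0.0, "+-": 0.0, "--": 0.0}
    for k in labels:
        a, b = fp_a.get(k, 0.0), fp_b.get(k, 0.0)
        if a > 0 and b > 0:
            category = "++"    # shared by both molecules
        elif a == 0 and b == 0:
            category = "--"    # absent from both
        else:
            category = "+-"    # present in only one of them
        fractions[category] += 1.0 / n
        partial[category] += weights.get(k, 1.0) * abs(a - b) / total_weight
    return fractions, partial

labels = ["a", "b", "c", "d"]
fractions, partial = piecewise_terms({"a": 2.0, "b": 1.0}, {"a": 3.0, "c": 4.0}, labels)
print(fractions)  # prints {'++': 0.25, '+-': 0.5, '--': 0.25}
```

Keeping the three categories separate lets a fitted score treat a small difference in a shared triplet differently from a triplet that one molecule lacks entirely.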
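Neighborhood behavior: to what extent does
structural similarity guarantee similar activities?
• Activity dissimilarity: BioPrint® activity profile differences L(m,M)
• Each pair (M,m) is classified by its structural dissimilarity S(M,m),
against a threshold s, and its activity difference L(M,m), against a
threshold l:

                 L(M,m) ≤ l                  L(M,m) > l
S(M,m) ≤ s       True Positives (TP)         False Positives (FP)
S(M,m) > s       False (?) Negatives (FN)    True Negatives (TN)

[Equations: the optimality criterion W(s) compares NFP and NFN at threshold s with their values for a random selection (rand); the consistency criterion is defined from averages of L(M,m) over the pairs with S ≤ s]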
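The pair classification and an optimality-style ratio can be sketched in Python as below. The exact criterion on the slide is not recoverable from the transcript, so this sketch makes two assumptions: false positives and false negatives are counted with equal weight, and the random baseline is the expected error count when the same number of pairs is selected at random. All names are hypothetical.

```python
def neighborhood_counts(pairs, s_max, l_max):
    """pairs: iterable of (structural_dissimilarity, activity_dissimilarity)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for s_val, l_val in pairs:
        if s_val <= s_max:   # pair selected as structurally similar
            counts["TP" if l_val <= l_max else "FP"] += 1
        else:                # pair rejected
            counts["FN" if l_val <= l_max else "TN"] += 1
    return counts

def optimality(pairs, s_max, l_max):
    """Errors at threshold s_max relative to random selection (undefined if the baseline is 0)."""
    pairs = list(pairs)
    c = neighborhood_counts(pairs, s_max, l_max)
    n = len(pairs)
    n_selected = c["TP"] + c["FP"]
    n_similar_activity = c["TP"] + c["FN"]
    # Expected errors if n_selected pairs were drawn at random (assumption).
    fp_rand = n_selected * (n - n_similar_activity) / n
    fn_rand = (n - n_selected) * n_similar_activity / n
    return (c["FP"] + c["FN"]) / (fp_rand + fn_rand)

random_like = [(0.1, 0.2), (0.2, 0.9), (0.6, 0.1), (0.8, 0.8)]
perfect = [(0.1, 0.2), (0.2, 0.1), (0.6, 0.9), (0.8, 0.8)]
print(optimality(random_like, 0.3, 0.5))  # prints 1.0
print(optimality(perfect, 0.3, 0.5))      # prints 0.0
```

Under these assumptions a value of 1 means the metric does no better than random selection, and lower values mean fewer misclassified pairs.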
Specific metric significantly improves the
Neighborhood Behavior of 2D-FPT (v1)
[Plot: Optimality W versus Consistency for Sum of Heavy Atoms in Pair, Dice-N, Dice, Dice-W and FPT-1]
Consistency inversion of specific FPT metric
may be due to top ranking of complex pairs!
[Plot: Optimality W versus Consistency for Dice and FPT-1]
Proteolytic equilibrium dependence significantly
improves the NB of 2D-FPT
[Plot: Optimality W versus Consistency for FPT-1 and for 2D-FPT using a rule-based pharmacophore flagging strategy]
Some ‘activity cliffs’ in rule-based descriptor
space are smoothed out in 2D-FPT-space
Neighborhood Behavior of 2D-FPT compares
favorably to the one of other descriptors/metrics
[Plot: Optimality W versus Consistency for Sum of Heavy Atoms in Pair, CF, FBPA, PFR, PF, FPT-1 and FPT-2]
Successful Virtual Screening Simulations
[Plots: % Retrieved Seed Compounds versus Selection Size (0 to 200) for Confirmed Actives and Confirmed Inactives, comparing PF with FPT-2 (OPT3); panels labeled D2 and TK]
Successful QSAR model construction with 2D-FPT: predicting c-Met TK activity
[Plot: Experimental pIC50 versus Calculated pIC50 for Learning Set Compounds and Validation Set Compounds]
• 25 variables entering nonlinear model
• 153 molecules for training: RMSE=0.4 (log units), R2=0.82
• 40 molecules for validation: RMSE=0.8 (log units), R2=0.53
• 8 validation molecules out of 40 mispredicted by more than 1 log
ChemAxon Tools used for development…
• Software written in Java, based on the ChemAxon API:
– molecule input and standardization tools
– ShortestPath class used to calculate topological distances
– pKaPlugin used to enumerate all microspecies and their
relative concentrations at given pH value
– PMapper used to set pharmacophore flag in each microspecies –
using a customized .xml setup file that relies on the actual formal
charges seen in the microspecies to set flags
– JChem used for 2D-FPT storage
– Marvin visualizer adapted to display actual occurrences of triplets
in molecules
In progress & on the wishlist…
• 3D FPT version under study
– does it pay off to generate conformers? How many would you
need to get better results than with 2D-FPT? What’s the best
conformational sampler to use?
• Accessibility-weighted fingerprints?
– class to return (topological and/or 3D) estimate of the solvent-accessible fraction of an atom?
• Tautomer-dependent fingerprints?
– if tautomers and their percentage were enumerated like any other
microspecies…