Topological, pH-dependent Fuzzy Pharmacophore Triplet Fingerprints (2D-FPT)
Dragos Horvath, Fanny Bonachera
UMR 8576 CNRS, Université des Sciences & Technologies de Lille
Pharmacophore Patterns
• Ligand-site affinity ~ functional group complementarity
• Functional groups of similar physicochemical behavior represent pharmacophore types:
– Hydrophobic (Hp), Aromatic (Ar), Hydrogen Bond (HB) donors (HD), Cations (PC), HB Acceptors (HA), Anions (NC).
• The pharmacophore pattern of a molecule characterizes the relative arrangement of all its pharmacophore types
– What pharmacophore types are represented?
– How are they arranged (spatially, topologically) with respect to each other?
– How can these aspects be captured numerically to yield molecular descriptors of the pharmacophore pattern?

Exploiting pharmacophore patterns…
• N-dimensional vector D(M) = [D1(M), D2(M), …, DN(M)]; each Di encodes an element of the pharmacophore pattern
– Allows meaningful quantitative definitions of molecular similarity:
• Neighborhood Behavior: similar molecules, characterized by covariant vectors, are likely to display similar biological properties
• As chemists do not easily perceive the pharmacophore pattern, such covariance may reveal hidden but real molecular relatedness…
– May serve as a starting point in the search for a binding pharmacophore: the subset of features that actually participate in binding to a receptor
• Machine learning to select those elements Di that are systematically present in actives, but not in inactives, of a molecular learning set!

Tricentric Pharmacophore Fingerprints: monitoring feature arrangement
• Topological: the distance between two features equals the (minimal) number of chemical bonds between them
[Figure: a feature triangle with topological edge lengths of 9, 4 and 11 bonds.]
• Spatial: if stable conformers are known, use the distance in Å between two features

Example: Binary Pharmacophore Triplets
Basis triplets:
• all possible feature combinations
• at a given series of distances…
[Figure: basis triplets listed with their three edge lengths, each flagged 1 or 0 depending on whether the molecule contains it.]
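The basis-triplet idea and the topological distance definition can be illustrated with a short Python sketch. It is not the authors' Java/ChemAxon code: breadth-first search over a hypothetical bond adjacency list stands in for the ShortestPath class, each atom carries a single type, basis triplets are enumerated as multisets of (type, opposite edge) corners under the strict triangle inequality, and a molecule's binary fingerprint is simply the set of basis triplets that its feature triplets hit exactly. Under these assumptions the enumeration returns 36796 basis triplets for 6 types and edges 3 to 13, and 4494 for edges 2, 4, …, 12, the counts given in this deck for the example setup and for FPT-1.

```python
from collections import deque
from itertools import combinations, combinations_with_replacement

# Pharmacophore type codes as abbreviated in the slides.
TYPES = ("Ar", "HA", "HD", "Hp", "NC", "PC")


def topological_distances(adjacency):
    """All-pairs bond counts via one breadth-first search per atom.

    adjacency[i] lists the atoms bonded to atom i; unreachable pairs stay None.
    """
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if dist[start][neighbor] is None:
                    dist[start][neighbor] = dist[start][atom] + 1
                    queue.append(neighbor)
    return dist


def is_triangle(a, b, c):
    """Strict triangle inequality on three edge lengths."""
    return a < b + c and b < a + c and c < a + b


def canonical_key(corners):
    """Canonical form of a triplet given as three (type, opposite_edge) corners.

    Corners are ordered by type, then by shorter opposite edge,
    so Hp7-Ar4-PC6 becomes Ar4-Hp7-PC6.
    """
    return tuple(sorted(corners))


def enumerate_basis(types, emin, emax, estep):
    """All canonical basis triplets for edge lengths emin, emin+estep, ..., <= emax."""
    edges = range(emin, emax + 1, estep)
    corners = sorted((t, e) for t in types for e in edges)
    # combinations_with_replacement yields each multiset of corners exactly once,
    # already in sorted (canonical) order.
    return [
        triplet
        for triplet in combinations_with_replacement(corners, 3)
        if is_triangle(*(edge for _, edge in triplet))
    ]


def binary_fingerprint(adjacency, features, basis_keys):
    """Set of basis triplets hit exactly by the molecule's feature triplets.

    features maps atom index -> one type (a simplification: real atoms may carry
    several types). Triplets whose distances do not exactly match a basis
    triplet, including degenerate collinear ones, are simply dropped here.
    """
    dist = topological_distances(adjacency)
    hits = set()
    for a, b, c in combinations(sorted(features), 3):
        d_ab, d_ac, d_bc = dist[a][b], dist[a][c], dist[b][c]
        if None in (d_ab, d_ac, d_bc):
            continue  # atoms in disconnected fragments
        key = canonical_key(
            [(features[a], d_bc), (features[b], d_ac), (features[c], d_ab)]
        )
        if key in basis_keys:
            hits.add(key)
    return hits


if __name__ == "__main__":
    print(len(enumerate_basis(TYPES, 3, 13, 1)))  # 36796 (example setup)
    print(len(enumerate_basis(TYPES, 2, 12, 2)))  # 4494 (FPT-1 setup)
    # Hypothetical branched graph: arms of 2, 3 and 4 bonds hanging off atom 0.
    adjacency = [[1, 3, 6], [0, 2], [1], [0, 4], [3, 5], [4],
                 [0, 7], [6, 8], [7, 9], [8]]
    basis = set(enumerate_basis(TYPES, 3, 13, 1))
    print(binary_fingerprint(adjacency, {2: "Hp", 5: "Ar", 9: "PC"}, basis))
    # {(('Ar', 6), ('Hp', 7), ('PC', 5))}
```

Sorting (type, edge) tuples orders corners by type and then by the shorter opposite edge, which matches the canonical rule described for 2D-FPT. Allowing degenerate triangles (non-strict inequalities) would raise the count for the example setup to 40756, so the strict form is the one consistent with the quoted figures.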
(Binary pharmacophore triplets: Pickett, Mason & McLay, J. Chem. Inf. Comput. Sci. 36:1214-1223 (1996).)

First key improvement: fuzzy mapping of atom triplets onto basis triplets in 2D-FPT
[Figure: an atom triplet of the molecule is mapped onto several neighboring basis triplets, each receiving a partial occupancy increment.]
Di(m) = total occupancy of basis triplet i in molecule m.

Combinatorial enumeration of basis triplets
• Example: there are 36796 basis triplets verifying the triangle inequalities when considering 6 pharmacophore types and 11 edge lengths from Emin = 3 to Emax = 13 with an increment of Estep = 1 (3, 4, 5, …, 13).
– Canonical representation: T1d23-T2d13-T3d12 with T3 ≥ T2 ≥ T1 (alphabetically), where dij is the edge joining corners i and j, i.e. the edge opposite the remaining corner. Example: Hp7-Ar4-PC6 → Ar4-Hp7-PC6 (edges 4, 7, 6).
– Of two corners of the same type, priority is given to the one opposite the shorter edge. Example: Ar4-Hp7-Hp6 → Ar4-Hp6-Hp7.

Triplet matching procedure
• The triplet matching score represents the optimal degree of pharmacophore field overlap:
– if corner k of the triplet is of pharmacophore type T, i.e. F(k,T) = 1, it contributes to the total pharmacophore field of type T observed at a point P of the plane:
$$\Phi_T(P) = \sum_{k=1}^{3} F(k,T)\,\exp\left(-\gamma_T\, d_{k,P}^{2}\right)$$
where d_{k,P} is the distance between corner k and P, and γ_T is the Gaussian fuzziness parameter of type T.
Horvath, D. ComPharm, pp. 395-439, in "QSPR/QSAR Studies by Molecular Descriptors", Diudea, M., Editor, Nova Science Publishers, Inc., New York, 2001.

Control parameters for triplet enumeration & matching in two 2D-FPT versions:
Parameter | Description | FPT-1 | FPT-2
Emin | Minimal edge length of basis triangles (number of bonds between two pharmacophore types) | 2 | 4
Emax | Maximal edge length of basis triangles | 12 | 15
Estep | Edge length increment for enumeration of basis triangles | 2 | 2
ε | Edge length excess parameter: in a molecule, triplets with an edge length > Emax + ε are ignored | 0 | 2
- | Maximal edge length discrepancy tolerated when attempting to overlay a molecular triplet on a basis triangle | 2 | 2
γ_Hp = γ_Ar | Gaussian fuzziness parameter for apolar (Hydrophobic and Aromatic) types | 0.6 | 0.9
γ_PC = γ_NC | Gaussian fuzziness parameter for charged (Positive and Negative Charge) types | 0.6 | 0.8
γ_HA = γ_HD | Gaussian fuzziness parameter for polar (Hydrogen bond Donor and Acceptor) types | 0.6 | 0.7
- | Aromatic-Hydrophobic interchangeability level | 0.6 | 0.5
- | Number of basis triplets at given setup | 4494 | 7155

Second key improvement: protolytic-equilibrium dependence of 2D-FPT
[Figure: microspecies of a compound with relative populations of 12% and 88%.]

Third key improvement: a novel similarity scoring scheme for 2D-FPT
• Classical Euclidean and Hamming distances increase whenever dk(m,M) = |Dk(M) − Dk(m)| > 0…
– pairs of small & simple molecules (m,m'), with Dk(m) = Dk(m') = 0 for almost all triplets k, have few non-zero contributions
– large & complex compounds (M,M') with common but slightly differently populated triplets, Dk(M) ≈ Dk(M'), have many small contributions that may nevertheless sum up to higher Euclidean scores!
• With correlation coefficients, the importance of common triplets, which contribute to the cross-product Dk(m) × Dk(M), may be overemphasized…

Piecewise monitoring of the differences in the fingerprint…
• With respect to a pair of molecules, a triplet k may be shared (++), null (--) or exclusive (+-)
– fuzzy levels of dissimilarity association are used to score each category c ∈ {(++), (--), (+-)}
• The FPT-specific score:
$$S_{FPT}(M,m) = \Psi_{++}(M,m) + \tau_{+-}\,\Psi_{+-}(M,m) + \tau_{--}\,\Psi_{--}(M,m)$$
• Specifically, for each category c, calculate with respect to the pair (M,m):
– the fraction f_c(M,m) of triplets in that category
– the weighted, normed partial Hamming distance
$$P_c(M,m) = \frac{\sum_{k \in c} W_k\, d_k(m,M)}{\sum_{k=1}^{N_T} W_k}$$
• Each Ψ_c is a linear combination of the fraction f_c and the partial Hamming distance P_c; the coefficients are fitted for optimal Neighborhood Behavior with respect to a subset of the training data (fitted coefficients include 0.1323, 0.6357 and 0.27951).

Neighborhood behavior: to what extent does structural similarity guarantee similar activities?
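To make this question measurable, the quadrant scheme used in this section (TP, FP, FN, TN) can be sketched in a few lines. Several details are assumptions rather than facts taken from the slides: S(M,m) is read as a dissimilarity (pairs with S ≤ σ count as structural neighbors), false positives and false negatives are weighted equally, and the random reference is the expected error count when the same number of pairs is declared "neighbors" at random. The (S, L) values below are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Quadrants:
    tp: int = 0
    fp: int = 0
    fn: int = 0
    tn: int = 0


def quadrant_counts(pairs, sigma, lam):
    """Sort (S, L) pairs into the four quadrants.

    Assumption: S is a dissimilarity, so S <= sigma marks structural neighbors;
    L <= lam marks pairs with similar activity profiles.
    """
    q = Quadrants()
    for s_score, l_diff in pairs:
        neighbors = s_score <= sigma
        similar = l_diff <= lam
        if neighbors and similar:
            q.tp += 1
        elif neighbors:
            q.fp += 1
        elif similar:
            q.fn += 1
        else:
            q.tn += 1
    return q


def optimality(pairs, sigma, lam):
    """(N_FP + N_FN) divided by its expectation for a random pick of neighbors."""
    pairs = list(pairs)
    q = quadrant_counts(pairs, sigma, lam)
    n = len(pairs)
    k = q.tp + q.fp                   # pairs declared neighbors at this cutoff
    p_dissimilar = (q.fp + q.tn) / n  # share of pairs with L > lam
    expected = k * p_dissimilar + (n - k) * (1 - p_dissimilar)
    return (q.fp + q.fn) / expected if expected > 0 else math.nan


def best_cutoff(pairs, lam):
    """Try every observed S value as sigma; return (lowest Omega, sigma)."""
    pairs = list(pairs)
    scored = [(optimality(pairs, s, lam), s) for s in sorted({s for s, _ in pairs})]
    return min((w, s) for w, s in scored if not math.isnan(w))


if __name__ == "__main__":
    # Hypothetical (S, L) values for four molecule pairs.
    informative = [(0.1, 0.0), (0.2, 0.1), (0.8, 2.0), (0.9, 3.0)]
    print(quadrant_counts(informative, 0.5, 0.5))  # Quadrants(tp=2, fp=0, fn=0, tn=2)
    print(best_cutoff(informative, 0.5))           # (0.0, 0.2)
```

With this normalization a descriptor that ranks pairs no better than chance scores Ω ≈ 1.0, and a perfectly "neighborly" one reaches 0 at its best cutoff.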
[Figure: BioPrint® activity-profile differences L(M,m) plotted against S(M,m) for molecule pairs, with a cutoff λ on L and a cutoff σ on S.]
• Pairs with S(M,m) ≤ σ: True Positives (TP) if L(M,m) ≤ λ, False Positives (FP) if L(M,m) > λ
• Pairs with S(M,m) > σ: False Negatives (FN) if L(M,m) ≤ λ, True Negatives (TN) otherwise
• Optimality relates the observed false positives and false negatives to those expected at random:
$$\Omega(\sigma) = \frac{N_{FP}(\sigma) + N_{FN}(\sigma)}{N_{FP}^{rand}(\sigma) + N_{FN}^{rand}(\sigma)}$$
Ω = 1.0 corresponds to random behavior; Ω_opt is the optimal (lowest) value over σ.

Specific metric significantly improves the Neighborhood Behavior of 2D-FPT (v1)
[Plot: Optimality Ω vs. Consistency for Dice-N, Dice, Dice-W and the FPT-1 specific metric; label: Sum of Heavy Atoms in Pair.]

The consistency inversion of the specific FPT metric may be due to the top ranking of complex pairs!
[Plot: Optimality Ω vs. Consistency for Dice and FPT-1.]

Protolytic-equilibrium dependence significantly improves the NB of 2D-FPT
[Plot: Optimality Ω vs. Consistency for 2D-FPT using a rule-based pharmacophore flagging strategy and for FPT-1.]
Some 'activity cliffs' in rule-based descriptor space are smoothed out in 2D-FPT space.

The Neighborhood Behavior of 2D-FPT compares favorably to that of other descriptors/metrics
[Plot: Optimality Ω vs. Consistency for CF, FBPA, PFR, PF, FPT-1 and FPT-2; label: Sum of Heavy Atoms in Pair.]
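These comparisons pit the FPT-specific score against Dice-type metrics, and the third key improvement motivates it by the way Euclidean distances treat small versus large molecules. The sketch below reproduces that argument on hypothetical triplet keys. It computes a Euclidean distance, one common continuous form of the Dice coefficient (not necessarily the Dice, Dice-N or Dice-W variants plotted), and the per-category ingredients f_c and P_c with crisp (non-fuzzy) categories and unit weights W_k by default. How these ingredients are combined into S_FPT, and the fuzzy category levels, are not reproduced here.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

Fingerprint = Dict[str, float]  # basis-triplet key -> occupancy (absent = 0)


def euclidean(a: Fingerprint, b: Fingerprint) -> float:
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def continuous_dice(a: Fingerprint, b: Fingerprint) -> float:
    """2*sum(a_k*b_k) / (sum(a_k^2) + sum(b_k^2)); two empty prints count as identical."""
    cross = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = sum(v * v for v in a.values()) + sum(v * v for v in b.values())
    return 2.0 * cross / norm if norm else 1.0


def category_terms(
    a: Fingerprint,
    b: Fingerprint,
    basis: Iterable[str],
    weights: Optional[Dict[str, float]] = None,
) -> Dict[str, Tuple[float, float]]:
    """Per category c: (fraction f_c, weighted normed partial Hamming distance P_c).

    Crisp categories: '++' both > 0, '--' both 0, '+-' exactly one > 0.
    weights maps key -> W_k (1.0 for every triplet if omitted).
    """
    weights = weights or {}
    basis = list(basis)
    total_w = sum(weights.get(k, 1.0) for k in basis)
    counts = {"++": 0, "+-": 0, "--": 0}
    partial = {"++": 0.0, "+-": 0.0, "--": 0.0}
    for k in basis:
        x, y = a.get(k, 0.0), b.get(k, 0.0)
        if x > 0 and y > 0:
            c = "++"
        elif x == 0 and y == 0:
            c = "--"
        else:
            c = "+-"
        counts[c] += 1
        partial[c] += weights.get(k, 1.0) * abs(x - y)
    return {c: (counts[c] / len(basis), partial[c] / total_w) for c in counts}


if __name__ == "__main__":
    # Hypothetical: two small molecules sharing no triplet...
    small_a, small_b = {"t1": 1.0}, {"t2": 1.0}
    # ...and two large ones sharing 50 triplets with slightly different occupancies.
    large_a = {f"t{i}": 2.0 for i in range(3, 53)}
    large_b = {f"t{i}": 2.5 for i in range(3, 53)}
    basis = [f"t{i}" for i in range(1, 53)]
    print(round(euclidean(small_a, small_b), 3), round(euclidean(large_a, large_b), 3))  # 1.414 3.536
    print(continuous_dice(small_a, small_b), round(continuous_dice(large_a, large_b), 3))  # 0.0 0.976
    print(category_terms(small_a, small_b, basis))
    print(category_terms(large_a, large_b, basis))
```

The Euclidean distance ranks the unrelated small pair as closer than the large pair that shares every triplet, while the category split makes the difference explicit: the small pair lives entirely in (+-) and (--), the large pair in (++).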
Successful Virtual Screening Simulations
[Plots for targets D2 and TK: % retrieved seed compounds vs. selection size (0-200); series: Confirmed Actives (PF), Confirmed Actives (FPT-2) (OPT3), Confirmed Inactives (PF), Confirmed Inactives (FPT-2) (OPT3).]

Successful QSAR model construction with 2D-FPT: predicting c-Met TK activity
[Plot: experimental vs. calculated pIC50 (4-9) for learning-set and validation-set compounds.]
• 25 variables entering a nonlinear model
• 153 molecules for training: RMSE = 0.4 (log units), R² = 0.82
• 40 molecules for validation: RMSE = 0.8 (log units), R² = 0.53
• 8 validation molecules out of 40 mispredicted by more than 1 log unit

ChemAxon Tools used for development…
• Software written in Java, based on the ChemAxon API:
– molecule input and standardization tools
– ShortestPath class used to calculate topological distances
– pKaPlugin used to enumerate all microspecies and their relative concentrations at a given pH value
– PMapper used to set pharmacophore flags in each microspecies, using a customized .xml setup file that relies on the actual formal charges seen in the microspecies
– JChem used for 2D-FPT storage
– Marvin visualizer adapted to display actual occurrences of triplets in molecules

In progress & on the wishlist…
• 3D FPT version under study
– does it pay off to generate conformers? How many would you need to get better results than with 2D-FPT?
– What’s the best conformational sampler to use?
• Accessibility-weighted fingerprints?
– class to return a (topological and/or 3D) estimate of the solvent-accessible fraction of an atom?
• Tautomer-dependent fingerprints?
– if tautomers and their percentages were enumerated like any other microspecies…
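The pH dependence of 2D-FPT rests on microspecies and their relative concentrations from pKaPlugin, and the tautomer item on the wishlist would treat tautomers the same way. The slides do not spell out how per-species fingerprints are merged, so the sketch below assumes a population-weighted average of triplet occupancies. The triplet keys and the 12% / 88% populations (echoing the second key improvement) are hypothetical, and the function is not a ChemAxon API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Fingerprint = Dict[str, float]  # triplet key -> occupancy


def population_weighted_fingerprint(
    species: List[Tuple[float, Fingerprint]]
) -> Fingerprint:
    """Average per-species fingerprints, weighting each by its population.

    Populations may be fractions or percentages; they are renormalized to sum to 1.
    """
    total = sum(population for population, _ in species)
    if total <= 0:
        raise ValueError("species populations must sum to a positive value")
    combined: Dict[str, float] = defaultdict(float)
    for population, fingerprint in species:
        for key, occupancy in fingerprint.items():
            combined[key] += population * occupancy / total
    return dict(combined)


if __name__ == "__main__":
    # Hypothetical microspecies at a given pH: 12 % neutral base (N as HA),
    # 88 % protonated form (N as PC), same topology otherwise.
    neutral = {"Ar4-HA7-Hp6": 1.0}
    cation = {"Ar4-Hp6-PC7": 1.0}
    print(population_weighted_fingerprint([(12.0, neutral), (88.0, cation)]))
```

Averaging keeps each fingerprint on the same occupancy scale as a single-species one, so a tautomer list with percentages could be fed through the same function unchanged.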