
Ligand-Based Virtual Screening: Extraction of Knowledge from Experimentally Confirmed Ligands, and the Quest for other Candidates Matching the Known.

Dragos Horvath

Laboratoire d’InfoChimie, UMR 7177 CNRS – Université de Strasbourg, 67000 Strasbourg, France


1. « Ceci n’est pas une molécule » Molecular Models and Descriptors in Chemoinformatics: Numerical Encoding of Structural Information

Molecular Descriptors or Fingerprints

• Need to represent a structure by a characteristic bunch (vector) of numbers (descriptors).

– Example: (Molecular Mass, Number of N Atoms, Total Charge, Number of Aromatic Rings, Radius of Gyration)

• Should include property-relevant aspects:

– the “nature” of atoms, including information on their neighborhood-induced properties, and their relative arrangement.

– Number of N Atoms → (Primary Amino Groups, Secondary Amino Groups, …, Amide, …, Pyridine N, …) – unless being a H bond acceptor is the key (O or N alike)!

– Arrangement in the molecular graph (2D, topological distance = separating bond count) or in space (3D, conformation-dependent distances in Å)
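As a minimal sketch of the 2D case, the snippet below turns a toy molecular graph into a small descriptor vector, with the topological distance computed as the separating bond count by breadth-first search. The molecule (heavy atoms of 3-aminopropan-1-ol), the function names and the choice of descriptors are illustrative assumptions, not the descriptors used in the talk.

```python
from collections import deque

# Hypothetical toy molecule: heavy atoms of 3-aminopropan-1-ol (O-C-C-C-N).
ATOMS = ["O", "C", "C", "C", "N"]
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4)]


def adjacency(n_atoms, bonds):
    """Build an adjacency list from bonded atom index pairs."""
    neighbors = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors


def topological_distance(neighbors, start, end):
    """Separating bond count between two atoms (breadth-first search)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == end:
            return seen[atom]
        for nxt in neighbors[atom]:
            if nxt not in seen:
                seen[nxt] = seen[atom] + 1
                queue.append(nxt)
    return None  # the two atoms are not connected


def simple_descriptors(atoms, bonds):
    """Illustrative vector: (number of N, number of O, O...N bond count)."""
    nbrs = adjacency(len(atoms), bonds)
    o_to_n = topological_distance(nbrs, atoms.index("O"), atoms.index("N"))
    return (atoms.count("N"), atoms.count("O"), o_to_n)


DESCRIPTORS = simple_descriptors(ATOMS, BONDS)
print(DESCRIPTORS)
```

A 3D analogue would replace the bond count by a conformation-dependent distance in Å, which requires coordinates and is not sketched here.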

Example 1: ISIDA Sequence Counts

[Figure: sequence counts per molecule, e.g. O-C*C*C*C-N: 1, O-C*N*C*C-N: 1, …, giving descriptor vectors such as (1,1,…) and (2,0,…).]

Example 2: Fuzzy mapping of pharmacophore triplets (2D-FPT)

Atoms labeled by their pharmacophore types:

• Hydrophobic, Aromatic
• Hydrogen Bond Donor, Cation
• Hydrogen Bond Acceptor, Anion

[Figure: a molecular triplet, with its edge lengths in bonds, mapped fuzzily onto basis triplets; the matching fingerprint elements are incremented (e.g. +6, +3), the others stay at 0.]

D_i(m) = total occupancy of basis triplet i in molecule m.

Chemically Relevant Typing: Proteolytic equilibrium dependence of 2D-FPT?

[Figure: two species of the same molecule, populated at 12% and 88%.]

Example 3: A 3D Example: Overlay-Based ComPharm Pharmacophore Field Intensities

Pharmacophoric features seen by each reference atom:

Reference atom   Alk.   Aro.   HBA    HBD    (+)    (-)
1                X11    X12    X13    X14    X15    X16
2                X21    X22    X23    X24    X25    X26
3                X31    X32    X33    X34    X35    X36
4                X41    X42    X43    X44    X45    X46
5                X51    X52    X53    X54    X55    X56

• A descriptor of the nature of the molecule’s pharmacophoric neighborhood “seen” by every reference atom, assuming an optimal overlay of the molecule on the reference...

2. Computer-Aided Ligand-Based Design: the « Medicinal Chemistry » of Ligand Fingerprints

« Similar molecules have similar properties » → « Molecules with similar fingerprints have similar properties »

« Structure-Activity Relationships » → « Fingerprint-Activity Relationships » (or Quantitative Structure-Activity Relationships, QSAR)

2.1 Molecular Similarity in Chemoinformatics Molecular Similarity Expressed by Fingerprint Similarity

The Similarity Principle – Neighborhood Behavior

[Figure: scatter plot of molecule pairs (M, m) against their calculated similarity S(m,M), marking the true positives (TP) and the potentially (!) false negatives (FN).]

Similarity-Based Virtual Screening…

[Figure: screening workflow. Labels: Active Reference; Reference Fingerprint; Ligand Candidate Fingerprint Library; Automated Fingerprint Matching (Nearest Neighbors); Superposition-based Similarity Scoring; Best Matching Candidates.]
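The fingerprint-matching step can be sketched as a nearest-neighbor search. The example below assumes count-based fingerprints and the Tanimoto coefficient as similarity score; the slides do not say which metric was used, and the fingerprints and candidate names are made up for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two non-negative count vectors of equal length."""
    dot = sum(a * b for a, b in zip(fp_a, fp_b))
    norm_a = sum(a * a for a in fp_a)
    norm_b = sum(b * b for b in fp_b)
    denom = norm_a + norm_b - dot
    # Two all-zero fingerprints are treated as identical (an assumption).
    return dot / denom if denom else 1.0


def screen(reference, library, top_n=2):
    """Rank library fingerprints by similarity to the active reference."""
    scored = [(tanimoto(reference, fp), name) for name, fp in library.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, round(score, 3)) for score, name in scored[:top_n]]


# Hypothetical fingerprints (e.g. fragment or triplet counts).
REFERENCE = [1, 1, 0, 2]
LIBRARY = {
    "cand_A": [1, 1, 0, 2],
    "cand_B": [2, 0, 0, 2],
    "cand_C": [0, 0, 3, 0],
}

HITS = screen(REFERENCE, LIBRARY)
print(HITS)
```

Every fingerprint element weighs equally in this score, so the ranking depends entirely on the descriptor choice and carries no knowledge of which features actually control activity.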

Strengths & Limitations of Similarity-based VS

(+) Only need ONE active ligand to seek for more like it…

(+) With appropriate descriptors, calculated similarity may be complementary to the scaffold-based similarity perceived by medicinal chemists → « Scaffold Hopping »: bypassing synthetic bottlenecks and/or pharmacokinetic property problems, patent space evasion, etc.

(--) Within the reference ligand, « all groups are equal, but some are more equal than others » when it comes to controlling activity… so what if we mismatch the latter??

2.2: So, we need to LEARN the features that really matter – building QSARs

Mol    M1     M2     M3     M4     M…     Mm
Act    A1     A2     A3     A4     A…     Am
D1     d11    d12    d13    d14    d1…    d1m
D2     d21    d22    d23    d24    d2…    d2m
D3     d31    d32    d33    d34    d3…    d3m
…      …      …      …      …      …      …
Dn     dn1    dn2    dn3    dn4    dn…    dnm

Model Fitting

A QSAR model expresses observed correlations between certain descriptors and activity, A = f(D_1, D_2, D_3, …, D_n):

– linear: \( A = \sum_i a_i \times D_i \)
– neural net
– decision tree
– neighborhood model (M active / M inactive, according to its neighbors in descriptor space)
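For the linear case, model fitting amounts to finding the coefficients a_i that best reproduce the activity column. The sketch below is an ordinary least-squares fit through the normal equations, written in plain Python. The data are made up and stored with molecules as rows (the transpose of the table layout), and the helper names are illustrative; no intercept is fitted, matching A = Σ a_i × D_i.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear(descriptors, activities):
    """Least-squares coefficients a_i for A = sum_i a_i * D_i."""
    n = len(descriptors[0])
    xtx = [[sum(row[i] * row[j] for row in descriptors) for j in range(n)]
           for i in range(n)]
    xty = [sum(row[i] * act for row, act in zip(descriptors, activities))
           for i in range(n)]
    return solve(xtx, xty)


def predict(coeffs, row):
    """Predicted activity of one molecule from its descriptor row."""
    return sum(a * d for a, d in zip(coeffs, row))


# Made-up training set: one row per molecule, columns are D1 and D2.
D = [[1, 0], [0, 1], [1, 1], [2, 1]]
A = [2.0, 3.0, 5.0, 7.0]  # generated from A = 2*D1 + 3*D2

COEFFS = fit_linear(D, A)
print(COEFFS)
```

The fit fails (division by zero) when two descriptor columns are perfectly collinear, which is the numerical face of the coincidental correlations discussed in the following slides.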

Correlations: The Cornerstone of QSAR Philosophy (or, perhaps, Religion?)

[Cartoon: “I always end up in this deplorable state, Gin-Soda, Whisky-Soda… therefore, as of tomorrow, I decided to…”]

[Table: molecules (MolID 1 to 10) with the columns Class, Phe Ring Count, HB Acceptor Count, and Count of C-N pairs at 5 bonds.]

Correlation is not Causality - an Obvious, but Inconvenient Truth…

SAR sets are always limited in diversity and therefore may (and always will) accommodate coincidental relationships between different features:

Diverse library of 16x6x10=960 compounds… with N_PC = N_HD
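To illustrate why such a coincidence is invisible to the learner, the sketch below builds a tiny made-up data set in which the positive charge count N_PC always equals the hydrogen bond donor count N_HD. Both descriptors then correlate identically with activity, so no statistic can tell which of them is causal. All numbers are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical library in which every cation is also the only H-bond donor.
N_PC = [0, 1, 1, 2, 0, 2]
N_HD = list(N_PC)
ACTIVITY = [4.1, 5.0, 5.2, 6.1, 3.9, 5.9]

R_DESCRIPTORS = pearson(N_PC, N_HD)
R_PC = pearson(N_PC, ACTIVITY)
R_HD = pearson(N_HD, ACTIVITY)
print(R_DESCRIPTORS, R_PC, R_HD)
```

Only a compound that breaks the coincidence, a cation without donors or a neutral donor, could discriminate between the two hypotheses.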

Why Lucky Correlations may Outperform more Rigorous Modeling… • Rule-based pharmacophore feature assignment has a hard time with the imine group =N–. In rule-based triplets of benzodiazepine receptor ligands, it was flagged as cation.

• Proper pH-sensitive flagging corrected this error… and dramatically reduced model quality!

• Labelling as ‘cation’ was a way to ‘highlight’ that N group – and, since preferentially seen in actives, highlighting made QSAR learning easier

CAUSAL QSAR… but not in the way you’d expect it! A Psychedelic µ Receptor Model…

• Training set: small combinatorial carbamate library of 240 compounds obtained by robotized synthesis, LC/MS purity control and µ receptor affinity (pIC50) measurement (proof-of-concept study, CEREP 1997)

• A successful ComPharm QSAR model (R² ≈ 0.8) was built to explain the measured pIC50 values (between µM and mM)

− HB-acceptor (-OR, -NR2) in para of benzyl alcohol enhances µ receptor affinity

…based on wrong experimental data!

• The most “active” carbamates of the training set turned out to be contaminated with ‰ traces of decarboxylation product, featuring the opioid ligand specific tertiary amine and having nanomolar potencies!

• Our QSAR actually explained… the decarboxylation mechanism: p-OR or –NR2 stabilizes the intermediate carbocation… thus rendering contamination possible!

Not seeing the Scaffold because of the Molecules: why Water is a Thrombin Inhibitor.

• Non-linear Thrombin affinity model, with R²(train) = 0.92, Q² = 0.84 and R²(validation) = 0.61.

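– The model equation, in which each triplet name stands for the population level of that triplet:

\[
\begin{aligned}
pK_i ={} & 2.2\times10^{-4}\,(\mathrm{Ar4Hp14PC12})^2 - 3.8\times10^{-5}\,(\mathrm{Ar4Ar12HA10})^2 \\
& + \frac{4.36}{1+\exp[-(\mathrm{Ar10Ar12Hp8}-71.3)/119]} - 1.4\,(\mathrm{HD8Hp6PC4})^2 \\
& + 2.45\exp[-(\mathrm{Ar6Ar10HD14}-19.8)^2/136^2] + 0.77\exp[-3(\mathrm{Ar12HD8Hp12}-13.3)^2/104^2] \\
& - \frac{1.05}{1+\exp[(\mathrm{Ar12Hp4Hp10}-185.1)/327]} + 7.3\times10^{-4}\,\mathrm{HA4Hp4Hp6} \\
& - \frac{2.26}{1+\exp[-(\mathrm{Ar12Hp6NC8}-3.2)/13]} - 2.2\times10^{-3}\,\mathrm{HA10HD4Hp8} \\
& - 4.71\exp[-(\mathrm{HA4HD4Hp4}-43.4)^2/122^2] + 5.94\exp[-(\mathrm{HA12Hp14PC12}-1.2)^2/15^2]
\end{aligned}
\]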

• If all population levels are zero, the calculated pK i is of 5.9, mostly due to contributions from the highlighted Gaussians.

– Absence of pharmacophore triplets automatically qualifies any small molecule as micromolar thrombin inhibitor!

• The model has learned that, for benzamidines – the chemical class represented in the training set, the presence of underlined triplets coincides with a loss of activity.

Beware of “Antipharmacophores”!

• Ar12-HD8-Hp12 is an “antipharmacophore” of the Thrombin model:

– Absence of this pharmacophore triplet means an enhanced activity, its presence correlates with an observed affinity loss

• The undesirable presence, in the above context, implicitly means presence in specific points of the ubiquitous benzamidine scaffold

– Fragments or pharmacophore elements that are genuinely “bad” for activity, no matter where they are located, are rare.

An “antipharmacophore” rather reflects poor training set diversity!

[Figure legend: Actives; Inactives; Actives, but not available for training.]

The Phantom Scaffold… materializes when adding diverse inactives to the training set

• All the training molecules – both actives and inactives – being benzamidines, the QSAR model cannot possibly learn the importance of the benzamidine moiety!

• After enriching the thrombin data set with inactives, one model out of ~2000 was able to predict the activity of unrelated thrombin ligands of known binding geometries.

– Scaffold Hopping: Yes, we can!

• Triplets entering the model successfully corroborate some of the ligand features involved in binding. These include features entering the pockets P1 and P2, but not the aromatic moiety binding in pocket P3.

Enhanced Training Set Diversity leads to Models with “Scaffold Hopping” abilities

Missing P3: deleting a Phe does not lead to ‘the same molecule, but with one phenyl less’.

• The training set included both examples of actives, with the required aromatic P3 substituent, and inactives, without it. So why was the importance of P3 not learned?

• In all training examples, however, the removal of the P3 moiety was always done by deacylating the phenylalanine…

– thus, compounds missing P3 substitution were systematically compounds having one excess protonable group

– the model actually chose to learn the ‘bogus’ rule that one more cation causes an activity loss.

“Let s be a representative sample of the set S…”

• It takes a sample of ~10^4 individuals to extrapolate the voting intentions of a population of ~10^7. What’s the representative subset size of 10^25 drug-like compounds?

– If we ever dared to publish QSARs trained on fewer compounds, shame on us!

• If given N=3 coordinate pairs (Y,X), not even Carl Friedrich Gauss could come up with a model more sophisticated than Y = aX² + bX + c

– Don’t listen when they say that Support Vector Machines have very few “tunable parameters”!

• May your model apply to one million and one molecules – it may still fail for the one million and second!

One cannot validate QSAR – but just fail to invalidate it!
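The N=3 remark can be checked numerically: a parabola has three parameters, so it passes exactly through any three points with distinct X values, and a perfect fit therefore says nothing about how the model behaves elsewhere. The sketch below uses Newton's divided differences; the three data points are arbitrary.

```python
def parabola_through(points):
    """Coefficients (a, b, c) of Y = a*X^2 + b*X + c through three points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    d01 = (y1 - y0) / (x1 - x0)  # first divided differences
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)  # second divided difference
    b = d01 - a * (x0 + x1)
    c = y0 - a * x0 ** 2 - b * x0
    return a, b, c


def evaluate(coeffs, x):
    """Value of the fitted parabola at x."""
    a, b, c = coeffs
    return a * x ** 2 + b * x + c


# Arbitrary "training set" of N=3 coordinate pairs (X, Y).
POINTS = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0)]

COEFFS = parabola_through(POINTS)
MAX_RESIDUAL = max(abs(evaluate(COEFFS, x) - y) for x, y in POINTS)
print(COEFFS, MAX_RESIDUAL)
```

The residuals vanish whatever the three Y values are, so the fit quality carries no information; only a fourth, independent point can begin to challenge the model.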

We are Medicinal Chemists – tell us about Pharmacophore Models, forget QSAR!!

No knowledge of the active site – need alternative overlay hypotheses!

• Bad news: Pharmacophore models are just a peculiar type of 3D-QSAR:

– use overlay models to “bind” descriptors to specific spots in space

– Pharmacophore hot spots are defined by the consensual presence of groups of similar type, throughout the series of known actives

– Descriptors are occupancy levels of these spots

Kama Sutra with Ligands: Match As Many Equivalent Pharmacophore Features As You May!

[Figure: alternative overlays of methotrexate and dihydrofolate, with matched (+) and questionable (?) feature pairs. Böhm, Klebe, Kubinyi, “Wirkstoffdesign” (1999)]

The Cox-2 Scenario: An Ideal Pharmacophore Model

• As exhaustive and diverse as possible a set of CycloOxygenase-II (Cox2) inhibitors, including:

– A set of 1914 marketed drugs of the U.S. Pharmacopeia, tested on Cox2 by the Cerep laboratories (BioPrint™).

– A set of 326 inhibitors compiled from literature by N. Baurin (thanks!), including the co-crystallized selective Cox2 ligand SC-558 and related medicinal chemistry series.

– Potencies, expressed as pIC50 = -log IC50 [mol/l], can be directly compared (cross-check on compounds present in both subsets)

Minimalistic Overlay-based Model

– Training RMS=0.712, R²=0.712

– Validation RMS=0.698, Q²=0.724

Descriptor        Coeff.
Ar@Atom#2         0.191
Hp@Atom#11        0.179
HA@Atom#20        0.430
HD@Atom#20       -0.428
zexp(Ar-HA5)      1.414
zsig3(#PC)        1.414
Intercept         0.000

[Figure: overlay model annotated with “Hydrophobe Requested”, “Aromatic Requested”, “Donor Forbidden”, “Acceptor Requested” and “HipHop Acceptor!”.]

Furthermore, it supports « Scaffold Hopping »!

• it manages to explain the Cox2 activities of the apparently unrelated nonspecific Cox1/Cox2 inhibitors

• This is an ideal scenario – scaffold-independent model trained on thousands of compounds: so maybe the overlay models are mechanistically relevant!

… or maybe not!

[Figure: SC-558 in the Cox2 binding site, with labeled residues Val 116, Arg 120, Val 349, Trp 387, Arg 513, Tyr 355 and His 90.]

Predictive? Yes! Enlightening? No!

Overlay-based models correctly explained the behavior of the two distinct Cox2 binder classes… on the basis of an erroneous working hypothesis!!

– The ‘correct’ overlay asks an ‘anion’ (flurbiprofen –COO⁻) to be aligned atop of a ‘hydrophobe’ (CF3) – a heresy in pharmacophore matching! (Well, is CF3 a hydrophobe??)

– QSAR building is never safe from correlation artifacts, not even in models with 6 variables versus 2200 observables and excellent statistics!

– Such a model may be very successful in selecting database subsets enriched in new Actives - but QSAR alone would never have elucidated the binding mechanism to Cox2!

QSAR – a Bookkeeping Tool !?

• “Bookkeeping” QSAR: a quantitative way to wrap up the information contained in your training set compounds

– Models are scaffold-bound, heavily populated by scores of bogus antipharmacophores

– It will typically tell you things you’d notice by simply looking at the molecules

– Sampling all the possible models fitting the observations makes it possible to enumerate all the alternative working hypotheses that still await to be discarded…

– …or validated. The model might highlight not yet tried combinations of known features with better activities!

Think positive: this provides a rational plan to challenge these hypotheses, and thus learn more from better planned experiments.

3. Did we forget something? Each Model has its Limited Applicability Domain…

… even General Relativity and Quantum Physics!

Defining the Conditions of Applicability of QSAR Models – and respecting them – might help!

The Applicability Domain (AD) – A Compromise…

• Restrict the applicability of a QSAR model to a well defined subset of the chemical space – the one populated by the training molecules. Insufficient sampling of chemotypes outside this AD is then irrelevant.

– How do we define this subset of chemical space to be as large as possible, while nevertheless densely enough populated by training molecules?

Example: the Feature Control Approach

[Figure: training molecules plotted by Feature count 1 versus Feature count 2.]
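One simple reading of feature control is a bounding box: a candidate is inside the AD if each of its feature counts lies within the range seen in the training set. The sketch below assumes exactly that; the slides do not spell out the actual criterion, and the training counts are invented.

```python
def feature_ranges(training):
    """Per-feature (min, max) over the training set feature-count vectors."""
    return [(min(column), max(column)) for column in zip(*training)]


def in_domain(ranges, candidate):
    """True if every feature count lies within its training range."""
    return all(low <= value <= high
               for (low, high), value in zip(ranges, candidate))


# Hypothetical training set: (feature count 1, feature count 2) per molecule.
TRAIN = [(0, 1), (1, 0), (1, 2), (2, 1)]
RANGES = feature_ranges(TRAIN)

INSIDE = in_domain(RANGES, (1, 1))
OUTSIDE = in_domain(RANGES, (3, 1))
EMPTY = in_domain(RANGES, (0, 0))
print(RANGES, INSIDE, OUTSIDE, EMPTY)
```

Note that (0, 0) passes the test although no training molecule lacks both features at once: checking each feature separately does not control their combinations.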

… but no miraculous solution! A real-life inspired (Gedanken)Experiment

• Modeling of metal ligation propensities, with a training set composed of three subfamilies, R being alkyl chains:

• \( pK_{bind} = a_{Anil/Acid} N_{Anil} + a_{Pyr/Acid} N_{Prd} + g\,\mathrm{sizeF}(R) + C_{Acid} \)

• AD requirements: the molecule should contain

– N_Anil = 0 or 1 aniline fragment

– N_Acid = 0 or 1 benzoic acid fragment

– N_Prd = 0 or 1 pyridine fragment

– Alkyl chains of size as seen in training set

Contributions of a good programmer, but lousy chemist, to the understanding of QSAR!

• Would you like to know whether propane is a potent metal binder?

– Yes, it is: \( pK_{bind} = g\,\mathrm{sizeF}(C3) + C_{Acid} \) (same as for p-propyl benzoic acid)

– But it can’t possibly be within the AD, can it? It does pass every requirement of the Feature Control Approach listed above.

Building Up Trust from Consensus

• \( pK_{bind} = a_{Anil/Acid} N_{Anil} + a_{Pyr/Acid} N_{Prd} + g\,\mathrm{sizeF}(R) + C_{Acid} \)

• \( pK_{bind} = a_{Anil/Prd} N_{Anil} + a_{Acid/Prd} N_{Acid} + g\,\mathrm{sizeF}(R) + C_{Prd} \)

• \( pK_{bind} = a_{Acid/Anil} N_{Acid} + a_{Prd/Anil} N_{Prd} + g\,\mathrm{sizeF}(R) + C_{Anil} \)

• These three alternative models are perfectly equivalent – or “redundant” – as far as training set molecules are concerned

– Identical prediction for each training molecule, identical statistical parameters

• However, they cease to be redundant when it comes to propane: C_Acid ≠ C_Anil ≠ C_Prd

– Divergent prediction by allegedly “redundant” models is a clear signal of AD violation!
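The consensus argument can be reproduced with made-up numbers. The sketch below assumes that every training molecule carries exactly one of the three fragments, that sizeF(R) is simply the chain length, and it invents the family baselines and the slope g. Each model uses one family as its implicit reference, so the three agree on all training molecules and diverge on propane.

```python
G = 0.5  # hypothetical slope of the alkyl size term
FAMILY_BASE = {"acid": 6.0, "anil": 4.0, "prd": 5.0}  # made-up baselines


def make_model(ref):
    """Model with intercept C_ref and increments a_{X/ref} for the other two."""
    c_ref = FAMILY_BASE[ref]
    incr = {fam: base - c_ref for fam, base in FAMILY_BASE.items() if fam != ref}

    def model(counts, size):
        fragment_term = sum(incr[fam] * counts.get(fam, 0) for fam in incr)
        return fragment_term + G * size + c_ref

    return model


MODELS = {ref: make_model(ref) for ref in FAMILY_BASE}


def consensus_spread(counts, size):
    """Largest disagreement between the three 'redundant' models."""
    preds = [model(counts, size) for model in MODELS.values()]
    return max(preds) - min(preds)


# Training molecules: exactly one fragment each, alkyl sizes 1 to 4.
TRAIN_SPREAD = max(consensus_spread({fam: 1}, size)
                   for fam in FAMILY_BASE for size in (1, 2, 3, 4))

# Propane: no fragment at all, chain size 3.
PROPANE_SPREAD = consensus_spread({}, 3)
print(TRAIN_SPREAD, PROPANE_SPREAD)
```

A non-zero spread is the warning sign: the models were only ever constrained on molecules carrying one fragment, so their agreement cannot be expected anywhere else.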

Conclusions…

• In Ligand-based knowledge extraction, the single most important piece of hardware is a BRAIN

• Correlation is not causality… it’s correlation!

– So, if correlations observed within the training set do apply to other molecules, forget metaphysical afterthoughts and exploit them, in successful virtual screening

– However, an in-depth analysis of the model – if feasible – may reveal intrinsic limitations and pitfalls, and help to better delimit the AD.

• Training set diversity is the key!

– Do not hesitate to add bogus inactives, in order to teach your model proper “border conditions” such as “cosmic vacuum is inactive”. Absence of features cannot provide activity…

Conclusions…

• If a big pharma manager asks you “So, is QSAR useful?”, please reply “Compared to what?”

– A wrong QSAR model may nevertheless ring a bell in a medicinal chemist’s brain, and help to make right decisions

– There are moments in life when one should rely on the accumulated knowledge, and use QSAR to discover new combinations of known features: new actives, new scaffold porting the known binding pharmacophore (lead hopping), and sensibly decrease synthesis/testing effort

– There are moments when one should put known things aside, and venture out for random search of new paradigm-breaking ligands – new scaffold, new binding mode, new action mechanism, but...

Planck’s constant – a flavor of fundamental science (and exotic resonance of the name, evoking some oriental wisdom from high-tech Japan and fittable coefficient n conveniently small)