
Screening a Virtual Compound Space
Szabolcs Csepregi
Ferenc Csizmadia
Szilárd Dóránt
Nóra Máté
György Pirok
Zsuzsanna Szabó
Jenő Varga
Miklós Vargyas
ChemAxon Ltd.
Máramaros köz 3/a
1037 Budapest
Hungary
www.chemaxon.com
Drug research
Finding or making a needle in the haystack?
virtual screening: JChem Screen
advantages
• fast
• hits are readily available for in vitro screening
disadvantages
• limited number of available compounds
de novo design: JChem AnalogMaker
advantages
• practically unlimited virtual compound space
• structural novelty
disadvantages
• synthetic accessibility of virtual hits is a problem
Virtual Screening
Find something similar to a fistful of needles
[Figure: known actives are used as queries against a corporate database, and similar structures are found.]
Molecular similarity
How to tackle it?
Quantitative assessment of similarity/dissimilarity of structures
• need a numerically tractable form
• molecular descriptors, fingerprints, structural keys
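Sequences/vectors of bits or numeric values that can be compared by distance functions or similarity metrics, e.g. the Euclidean distance E and the Tanimoto coefficient T:
$$E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
$$T(x, y) = \frac{B(x \,\&\, y)}{B(x) + B(y) - B(x \,\&\, y)}$$
where x & y is the bitwise AND of two fingerprints and B(·) counts the bits set to 1.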
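Both measures can be computed directly. The sketch below is a minimal Python illustration, assuming fingerprints are stored as Python integers (one bit per feature) for the Tanimoto coefficient and as plain lists of numbers for the Euclidean distance; the function names are illustrative and not part of any JChem API.

```python
import math


def euclidean(x, y):
    """E(x, y) = sqrt(sum_i (x_i - y_i)^2) for two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def bit_count(v):
    """B(v): number of bits set to 1 in an integer-encoded fingerprint."""
    return bin(v).count("1")


def tanimoto(x, y):
    """T(x, y) = B(x & y) / (B(x) + B(y) - B(x & y)) for integer-encoded fingerprints."""
    common = bit_count(x & y)
    union = bit_count(x) + bit_count(y) - common
    # Assumed convention: two empty fingerprints count as identical.
    return 1.0 if union == 0 else common / union


# Usage: fingerprints written as bit strings, as on the slides.
fp_a = int("1100", 2)
fp_b = int("1010", 2)
print(tanimoto(fp_a, fp_b))                    # 1 shared bit out of 3 set bits
print(euclidean([1, 1, 0, 0], [1, 0, 1, 0]))   # sqrt(2)
```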
Virtual screening using fingerprints
Multiple query structures
[Figure: the fingerprints of several query structures are condensed into a hypothesis fingerprint; the hypothesis is compared with the target fingerprints using a metric, and the closest targets are returned as hits.]
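A minimal sketch of the scheme in the figure, under two assumptions the slide does not state: the hypothesis fingerprint is taken to be the per-bit mean of the query fingerprints, and targets are ranked by plain Euclidean distance to it with a hypothetical cut-off `max_dist`. The consensus rule used by JChem Screen may differ.

```python
import math


def to_bits(s):
    """Convert a fingerprint string such as '0101' into a list of 0/1 integers."""
    return [int(c) for c in s]


def hypothesis(queries):
    """Assumed consensus rule: the per-bit mean of the query fingerprints."""
    n = len(queries)
    return [sum(column) / n for column in zip(*queries)]


def screen(queries, targets, max_dist):
    """Return (target index, distance) pairs within max_dist of the hypothesis,
    closest first, using the plain Euclidean distance as the metric."""
    h = hypothesis(queries)
    hits = []
    for idx, t in enumerate(targets):
        d = math.sqrt(sum((hi - ti) ** 2 for hi, ti in zip(h, t)))
        if d <= max_dist:
            hits.append((idx, d))
    return sorted(hits, key=lambda pair: pair[1])


queries = [to_bits("1100"), to_bits("1110")]
targets = [to_bits("1100"), to_bits("0011"), to_bits("1110")]
print(screen(queries, targets, max_dist=1.0))   # [(0, 0.5), (2, 0.5)]
```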
Optimized virtual screening
Parameterized metrics
scaled, asymmetric Tanimoto
$$D_{\mathrm{Tanimoto}}(x, y) = 1 - \frac{\sum_i s_i \min(x_i, y_i)}{\alpha \sum_i \bigl(x_i - s_i \min(x_i, y_i)\bigr) + (1 - \alpha) \sum_i \bigl(y_i - s_i \min(x_i, y_i)\bigr) + \sum_i s_i \min(x_i, y_i)}$$
α ∈ [0, 1]: asymmetry factor
s_i ∈ ℕ: scaling factors
weighted, asymmetric Euclidean
$$D_{\mathrm{Euclidean}}(x, y) = \sqrt{\alpha \sum_{i:\, x_i > y_i} w_i (x_i - y_i)^2 + (1 - \alpha) \sum_{i:\, x_i < y_i} w_i (x_i - y_i)^2}$$
α ∈ [0, 1]: asymmetry factor
w_i ∈ [0, 1]: weights
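Both parameterized metrics translate directly into code. The sketch below follows the two formulas term by term; the function names are hypothetical, and returning 0 for an all-zero pair in the Tanimoto variant is an assumed convention to avoid division by zero.

```python
import math


def scaled_asymmetric_tanimoto(x, y, s, alpha):
    """D(x, y) = 1 - M / (alpha*sum(x_i - m_i) + (1 - alpha)*sum(y_i - m_i) + M),
    with m_i = s_i * min(x_i, y_i) and M = sum_i m_i."""
    m = [si * min(xi, yi) for xi, yi, si in zip(x, y, s)]
    big_m = sum(m)
    denom = (alpha * sum(xi - mi for xi, mi in zip(x, m))
             + (1 - alpha) * sum(yi - mi for yi, mi in zip(y, m))
             + big_m)
    # Assumed convention: two all-zero vectors are at distance 0.
    return 0.0 if denom == 0 else 1.0 - big_m / denom


def weighted_asymmetric_euclidean(x, y, w, alpha):
    """Excess of x over y is weighted by alpha, excess of y over x by (1 - alpha)."""
    total = 0.0
    for xi, yi, wi in zip(x, y, w):
        d2 = wi * (xi - yi) ** 2
        if xi > yi:
            total += alpha * d2
        elif xi < yi:
            total += (1 - alpha) * d2
    return math.sqrt(total)


x, y = [3, 0], [1, 1]
print(scaled_asymmetric_tanimoto(x, y, s=[1, 1], alpha=1.0))     # penalizes x's excess only
print(weighted_asymmetric_euclidean(x, y, w=[1, 1], alpha=0.0))  # counts y's excess only
```

Setting α away from 1/2 makes the metric asymmetric: features present in the query but missing from the target can be penalized more, or less, than the reverse.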
How good is optimized virtual screening?
β2-adrenoceptor antagonist
[Figure: number of hits (log scale, 1 to 10,000) versus number of active hits (1 to 18) for Tanimoto, Euclidean, optimized and ideal screening.]
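One way to read such a plot is as the number of top-ranked hits that must be retrieved before k actives have been found; the ideal curve then needs exactly k hits for k actives. The sketch below computes that curve from screening distances, assuming smaller distances rank higher; the function names are illustrative.

```python
def hits_needed(distances, is_active):
    """For k = 1, 2, ..., return the number of top-ranked hits needed to include
    k actives, i.e. the 1-based rank of the k-th active (closest structures first)."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    needed = []
    for rank, i in enumerate(order, start=1):
        if is_active[i]:
            needed.append(rank)
    return needed


def ideal_curve(n_actives):
    """Ideal screening: every retrieved hit is active, so k actives need k hits."""
    return list(range(1, n_actives + 1))


distances = [0.1, 0.5, 0.2, 0.9]
is_active = [True, True, False, True]
print(hits_needed(distances, is_active))   # [1, 3, 4]
print(ideal_curve(3))                      # [1, 2, 3]
```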
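JChem AnalogMaker
Workflow
[Figure: workflow from a lead structure to candidate structures.]
Fragmentation
Examples
[Figure: fragmentation rules for amide and ester bonds applied to an original molecule; the generated fragments 1, 2 and 3 carry attachment points labeled amide 1, amide 2, ester 1 and ester 2.]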
Fragmentation
RECAP rules
1 = amide
2 = ester
3 = amine
4 = urea
5 = ether
6 = olefin
7 = quaternary nitrogen
8 = aromatic N carbon
9 = lactam N carbon
10 = aromatic carbon – aromatic carbon
11 = sulphonamide
Xiao Qing Lewell, Duncan B. Judd, Stephen P. Watson, Michael M. Hann; RECAP – retrosynthetic combinatorial analysis procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J. Chem. Inf. Comput. Sci. 1998, 38, 511–522
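RECAP-style fragmentation amounts to cutting every bond that matches one of the rule types and keeping the connected pieces. The toy sketch below assumes each bond has already been labeled with the RECAP rule it matches (real implementations do this with substructure patterns); the atom indices and labels are hypothetical.

```python
from collections import defaultdict

# Bond types cleaved by the eleven RECAP rules listed above.
RECAP_BOND_TYPES = {
    "amide", "ester", "amine", "urea", "ether", "olefin",
    "quaternary nitrogen", "aromatic N carbon", "lactam N carbon",
    "aromatic carbon - aromatic carbon", "sulphonamide",
}


def fragment(n_atoms, bonds, cleavable=RECAP_BOND_TYPES):
    """Cut every bond whose label is cleavable and return the fragments as
    sorted lists of atom indices. `bonds` holds (atom_a, atom_b, label) tuples,
    where label is the RECAP rule the bond matches, or None."""
    neighbours = defaultdict(list)
    for a, b, label in bonds:
        if label not in cleavable:          # keep the bond
            neighbours[a].append(b)
            neighbours[b].append(a)
    seen, fragments = set(), []
    for start in range(n_atoms):            # depth-first search per component
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            atom = stack.pop()
            component.append(atom)
            for nxt in neighbours[atom]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        fragments.append(sorted(component))
    return fragments


# Hypothetical 7-atom chain with one amide and one ester bond.
chain = [(0, 1, None), (1, 2, None), (2, 3, "amide"),
         (3, 4, None), (4, 5, "ester"), (5, 6, None)]
print(fragment(7, chain))   # [[0, 1, 2], [3, 4], [5, 6]]
```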
JChem AnalogMaker
General algorithm
start
create building block library
generate pharmacophore hypothesis of active compounds
create several starting compounds by random combination of some building blocks
select parent structure
generate variants of parent
convergence or end of optimization? (otherwise return to parent selection)
stop
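The loop above can be sketched as a simple evolutionary search. This is an illustrative elitist variant, not the AnalogMaker or TOPAS implementation: compounds are tuples of building-block names, `fitness` stands in for similarity to the pharmacophore hypothesis, a variant replaces one randomly chosen building block, and convergence means `patience` generations without improvement. All names and parameters are hypothetical.

```python
import random


def analog_search(blocks, fitness, n_start=5, length=3, n_variants=10,
                  max_gen=200, patience=20, seed=0):
    rng = random.Random(seed)
    # Starting compounds: random combinations of building blocks.
    starts = [tuple(rng.choice(blocks) for _ in range(length)) for _ in range(n_start)]
    parent = max(starts, key=fitness)
    best = fitness(parent)
    stale = 0
    for _ in range(max_gen):                 # end of optimization after max_gen rounds
        # Variant generation: swap one randomly chosen building block.
        variants = []
        for _ in range(n_variants):
            v = list(parent)
            v[rng.randrange(length)] = rng.choice(blocks)
            variants.append(tuple(v))
        candidate = max(variants, key=fitness)
        score = fitness(candidate)
        if score > best:                     # select the new parent structure
            parent, best, stale = candidate, score, 0
        else:
            stale += 1
            if stale >= patience:            # convergence
                break
    return parent, best


# Toy fitness: number of positions that match a hypothetical target compound.
target = ("C", "A", "B")


def toy_fitness(compound):
    return sum(a == b for a, b in zip(compound, target))


print(analog_search(list("ABCDE"), toy_fitness, max_gen=500, patience=200))
```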
Variant generation
Example: TOPAS modifier
G. Schneider et al., J. Comput.-Aided Mol. Des. 14 (2000) 487–494
G. Schneider et al., Angew. Chem. Int. Ed. 39 (2000) 4130–4133
Drug research
Screening a virtual compound space
virtual screening: JChem Screen
advantages
• fast
• hits are readily available for in vitro screening
disadvantages
• limited number of available compounds
random virtual synthesis: JChem Synthesizer
advantages
• fast
• virtual molecules are likely to be synthetically available
• practically infinite virtual compound space
• structural novelty
disadvantages: none listed
de novo design: JChem AnalogMaker
advantages
• practically unlimited virtual compound space
• structural novelty
disadvantages
• synthetic accessibility of virtual hits is a problem
Screening a virtual compound space
Smart reactions
Generic (simple)
• the equation describes the transformation only
• a few hundred generic reactions can form the basic armory of a preparative chemist
Specific (complex)
• chemoselective: recognizes reactive and inactive functional groups
• regioselective: "knows" the directing rules
• stereoselective: inversion or retention of configuration
Customizable
• to improve reaction model quality
Smart reactions
Chemoselectivity
REACTIVITY:
!match(ratom(3), "[#6][N,O,S:1][N,O,S]", 1)
Smart reactions
Regioselectivity
SELECTIVITY: -charge(ratom(1))
TOLERANCE: 0.0045
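The three rule types can be combined as a filter followed by a ranking. The sketch below assumes reaction sites are precomputed records: a boolean flag stands in for the SMARTS-based reactivity rule, the selectivity rule is -charge as on the slide, and the tolerance is treated as an absolute difference from the best selectivity value. The site names and charges are made-up illustration values, and the way JChem applies the tolerance may differ.

```python
def select_sites(sites, reactivity, selectivity, tolerance):
    """Keep reactive sites whose selectivity is within `tolerance` of the best one."""
    scored = [(selectivity(s), s) for s in sites if reactivity(s)]
    if not scored:
        return []
    best = max(score for score, _ in scored)
    return [s for score, s in scored if score >= best - tolerance]


# Made-up candidate sites; "hetero_neighbour" stands in for the SMARTS-based
# reactivity rule, "charge" for a precomputed partial charge.
sites = [
    {"name": "C2", "charge": -0.120, "hetero_neighbour": False},
    {"name": "C4", "charge": -0.118, "hetero_neighbour": False},
    {"name": "C6", "charge": -0.050, "hetero_neighbour": False},
    {"name": "N1", "charge": -0.300, "hetero_neighbour": True},
]
chosen = select_sites(
    sites,
    reactivity=lambda s: not s["hetero_neighbour"],   # REACTIVITY rule
    selectivity=lambda s: -s["charge"],               # SELECTIVITY: -charge(...)
    tolerance=0.0045,                                 # TOLERANCE
)
print([s["name"] for s in chosen])   # ['C2', 'C4']
```

A tolerance of 0 keeps only the single best site; a larger tolerance admits competing sites and therefore several regioisomeric products.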
Smart reaction library
Example
Baeyer-Villiger ketone oxidation
SELECTIVITY: charge(ratom(2), "sigma")
JChem Synthesizer
Workflow
[Figure: the Synthesizer applies a smart reaction library to available chemicals to build a virtual compound space; available chemicals and virtual compounds are screened against active sets (set 1 to set n) to yield hits.]
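The Synthesizer workflow can be sketched as repeated random reaction steps whose products join the reactant pool, followed by screening. In this toy sketch, which is an assumption rather than the JChem Synthesizer API, molecules are strings, each reaction is a hypothetical (name, arity, apply) triple whose `apply` returns None when the reaction does not apply, and screening is a simple score threshold.

```python
import random


def random_virtual_synthesis(chemicals, reactions, n_steps, seed=0):
    rng = random.Random(seed)
    pool = list(chemicals)       # available chemicals plus products made so far
    products = []
    for _ in range(n_steps):
        _name, arity, apply = rng.choice(reactions)
        reactants = [rng.choice(pool) for _ in range(arity)]
        product = apply(*reactants)
        # Keep only new structures; products can react again in later steps,
        # which yields multi-step synthesis paths.
        if product is not None and product not in pool:
            pool.append(product)
            products.append(product)
    return products


def screen(products, score, threshold):
    """Keep virtual products whose score reaches the threshold."""
    return [p for p in products if score(p) >= threshold]


# Toy usage: "molecules" are strings and the only "reaction" joins two of them.
join = ("join", 2, lambda a, b: a + b if len(a + b) <= 4 else None)
virtual = random_virtual_synthesis(["A", "B"], [join], n_steps=50)
print(screen(virtual, score=lambda m: m.count("A"), threshold=2))
```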
JChem Synthesizer example
Dopamine D2 actives
Active sets were kindly provided by Aureus Pharma within a research collaboration between Aureus and ChemAxon.
Virtual hits
similarity: 2D pharmacophore fingerprint, weighted Euclidean metric optimized for 20 random D2 actives
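Metric optimization can be sketched as a search over the parameters of one parameterized metric. Here the objective (training actives among the k structures closest to a mean-fingerprint hypothesis, mixed with decoys) and the random-search strategy are assumptions for illustration; the slides do not describe the procedure JChem uses, and all names are hypothetical.

```python
import math
import random


def wa_euclidean(x, y, w, alpha):
    """Weighted, asymmetric Euclidean distance in the form given earlier."""
    total = 0.0
    for xi, yi, wi in zip(x, y, w):
        d2 = wi * (xi - yi) ** 2
        total += alpha * d2 if xi > yi else (1 - alpha) * d2
    return math.sqrt(total)


def actives_in_top_k(h, actives, decoys, w, alpha, k):
    """Count training actives among the k structures closest to hypothesis h.
    Ties are broken in favour of decoys, so the count is not optimistic."""
    scored = [(wa_euclidean(h, a, w, alpha), 1) for a in actives]
    scored += [(wa_euclidean(h, d, w, alpha), 0) for d in decoys]
    scored.sort()
    return sum(flag for _, flag in scored[:k])


def optimize_metric(actives, decoys, k, n_trials=200, seed=0):
    rng = random.Random(seed)
    n = len(actives[0])
    h = [sum(col) / len(actives) for col in zip(*actives)]   # mean-fingerprint hypothesis
    best_w, best_alpha = [1.0] * n, 0.5                       # unweighted, symmetric start
    best_score = actives_in_top_k(h, actives, decoys, best_w, best_alpha, k)
    for _ in range(n_trials):                                 # plain random search
        w = [rng.random() for _ in range(n)]
        alpha = rng.random()
        score = actives_in_top_k(h, actives, decoys, w, alpha, k)
        if score > best_score:
            best_w, best_alpha, best_score = w, alpha, score
    return (best_w, best_alpha), best_score


actives = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0]]
decoys = [[0, 0, 1, 1], [0, 1, 1, 1], [1, 1, 1, 1]]
(weights, alpha), score = optimize_metric(actives, decoys, k=3)
print(score, alpha)
```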
Best virtual hits
[Figure: best virtual hits, annotated with the values 9.88, 9.53, 9.82 and 9.73.]
JChem Synthesizer example
Synthesis path
step 1: Knoevenagel-Doebner condensation
step 2: Baylis-Hillman vinyl alkylation
step 3: Lawesson thiacarbonylation
step 4: Dess-Martin alcohol oxidation
Software and performance data
• virtual reactions: 500–1,000 reactions/s
• random synthesis: 10–20 structures/s
• pharmacophore fingerprint generation: 100 structures/s (includes pharmacophore point perception)
• metric optimization: 57 s (13 parameterized metrics, 20 structures in training set, 50 spikes)
• virtual screening: 7,500 structures/s
• pure Java
client: Pentium 4 1.6 GHz, Red Hat Linux, Java 1.4.2
database server: Pentium 4 2.4 GHz, Windows XP, MySQL
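The quoted rates make it easy to estimate where the time goes. The sketch below assumes a hypothetical library of one million structures processed stage by stage at the slide's rates: random synthesis then takes roughly 14 to 28 hours, fingerprint generation about 2.8 hours, and screening a little over two minutes.

```python
# Rates quoted on the slide, in structures per second.
RATES = {
    "random synthesis (20/s)": 20,
    "random synthesis (10/s)": 10,
    "pharmacophore fingerprints": 100,
    "virtual screening": 7500,
}


def seconds_needed(n_structures, rate):
    """Time for one stage, assuming a constant throughput of `rate` structures/s."""
    return n_structures / rate


n = 1_000_000   # hypothetical library size
for stage, rate in RATES.items():
    s = seconds_needed(n, rate)
    print(f"{stage:28s} {s:9.0f} s  ({s / 3600:.1f} h)")
```

Under these assumptions synthesis and fingerprint generation, not the similarity search itself, dominate the total run time.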
Acknowledgements
Jean-Michael Drancourt
François Petitet
(Axovan is now part of Actelion.)
Modest von Korff, Matthias Steger
Alex Allardyce
ChemAxon
Contact
Miklós Vargyas
[email protected]
office: +36 1 453 2661
mobile: +36 70 381 3205