MultiY Recursive Partitioning
– Method and Applications
Robert Brown, Shashidhar Rao, Tom Stockfisch,
Accelrys Inc
David Roush, Litai Zhang, FMC
UK-QSAR meeting – June 2002
Outline
• Introduction
• PUMP-RP Methodology
• Selectivity Study – COX2 inhibitors
• HTS study – FMC
• Summary
Introduction
• High-throughput chemistry and biology are creating a wealth of data
that can lead to knowledge to expedite the drug-discovery process
• High-throughput methods are required to model HTS data for in silico screening
  – HTS data is characterized by a huge number of observations, low hit rates, and lots of noise
  – Need high-speed methods for prediction
  – Recursive Partitioning (CART, FIRM), Linear Discriminant Analysis, Neural Nets, Binary QSAR, etc.
• Would like to understand trends and selectivity across assays
  – Mine the HTS data matrix
Standard CART RP
• Input
–
Multiple descriptors (X) - continuous or categorical
–
Single screening result (Y) - categorical (e.g. yes/no)
• Decision tree aims to separate different types of observation into
different leaves of the tree
• Two-step procedure: (1) overgrow, then (2) prune
  1. Growth phase
     • decrease impurity during the growth phase
     • choose the split with the greatest drop in impurity
     • stepwise procedure with no look-ahead; examines only a small fraction of possible trees
     • over-grows the tree
  2. Pruning phase
     • decrease R during pruning
     • $R_\alpha = R_0 + \alpha N_{\text{terminal}}$, where $\alpha$ penalizes each terminal node
     • stepwise procedure finds the optimum R over all possible subtrees of the overgrown tree
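The growth and pruning criteria above can be illustrated with a small sketch. It assumes Gini impurity as the impurity measure (the slide does not say which one is used) and the standard CART cost-complexity form R = R0 + alpha * N_terminal, where the name `alpha` is an assumption. The function names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, impurity_drop) for the best 'x <= t' split on one descriptor."""
    n = len(ys)
    parent = gini(ys)
    best_t, best_drop = None, 0.0
    # The largest value is skipped: it would leave the right child empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        drop = parent - child
        if drop > best_drop:
            best_t, best_drop = t, drop
    return best_t, best_drop


def cost_complexity(r0, alpha, n_terminal):
    """Pruning criterion R = R0 + alpha * N_terminal (alpha is an assumed name)."""
    return r0 + alpha * n_terminal


# Toy data (made up): one descriptor, labels A (active) / I (inactive).
x = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
y = ["I", "I", "I", "A", "A", "A"]
threshold, drop = best_split(x, y)
```

A greedy grower would call `best_split` for every descriptor at every node and keep the split with the largest drop, which is why it explores only a small fraction of possible trees.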
Understanding Selectivity?
[Figure: two separate decision trees, one for Target 1 and one for Target 2, with leaves labeled A (active) or I (inactive)]
• Hard or impossible to compare trees to see what produces selectivity
• Requires enough data to determine two separate trees
PUMP-RP
• One tree combines both responses
  – Easy to see what makes a molecule selective
  – Easy to see what the targets have in common
  – Twice the activity data available to determine the generic portion
• Use for specificity (e.g., Y1, Y2 different targets)
• Use for multi-physical models (e.g., Y1 = activity, Y2 = toxicity)
[Figure: PUMP-RP tree with Yk-generic splits at the top, activity-type splits in the middle, and Yk-specific splits below; leaves labeled I1, A1, I2, A2]
Partially Unified Multiple Property Recursive Partitioning: A New Method for Predicting and Understanding Drug Selectivity, Thomas Stockfisch, in preparation for J. Chem. Inf. Comput. Sci.
New algorithm
• Obtain a balance between a single general tree and a series of unrelated specific trees
• Procedure
  – 1. Map data to a single Y variable
  – 2. Grow a pure specific tree (k node at level 1)
  – 3. Regrow a k-branch: save the k split and replace it with a non-k split
  – 4. Recursively repeat step 3, moving the k-nodes "down" until arriving at the maximally generic tree
  – 5. Prune the generic tree: replace some generic branches with specifics
  – 6. Find the optimal tree to balance specificity and generality
[Figure: worked example of the procedure. Panel labels: Multi-Y table (X1, X2, Y1, Y2); Single-Y plus K column; Yk = 1; K-split wins; separate Y1 model; separate Y2 model]
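Step 1 of the procedure (mapping the data to a single Y variable) can be sketched as a simple stacking operation. This is a minimal illustration, not the authors' implementation: it assumes each compound row is copied once per response, with a K column recording which response the label belongs to, and that unmeasured responses are skipped. The function name, column names, and data values are hypothetical.

```python
def stack_multi_y(rows, y_names):
    """Turn one row per compound (several Y columns) into one row per
    (compound, response) pair, with a K column naming the response."""
    stacked = []
    for row in rows:
        for k, name in enumerate(y_names, start=1):
            label = row.get(name)
            if label is None:  # assumption: unmeasured responses are dropped
                continue
            new_row = {key: val for key, val in row.items() if key not in y_names}
            new_row["K"] = k
            new_row["Y"] = label
            stacked.append(new_row)
    return stacked


# Made-up Multi-Y table: two descriptors, two responses (A = active, I = inactive).
multi_y = [
    {"X1": 0.1, "X2": 0.41, "Y1": "I", "Y2": "I"},
    {"X1": 0.2, "X2": 0.91, "Y1": "A", "Y2": "I"},
    {"X1": 0.6, "X2": 0.11, "Y1": "A", "Y2": None},  # Y2 unknown
]
single_y = stack_multi_y(multi_y, ["Y1", "Y2"])
```

With the data in this form, a split on K is just another candidate split, so a single tree can mix descriptor splits (generic or specific) with the activity-type split.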
Selectivity Study: COX-2 selectivity
• Cyclooxygenase (COX) is a key enzyme in prostaglandin biosynthesis, which proceeds via the breakdown of arachidonic acid.
• Two isoforms, COX-1 (constitutive) and COX-2 (triggered by inflammatory insults), are known and characterized.
• COX-2 inhibitors are anti-inflammatory agents with minimal GI side effects.
  – Celebrex and Vioxx
• Inhibition of COX-1 can lead to gastric damage, hemorrhage, or ulceration
  – NSAIDs, e.g., ibuprofen, aspirin, etc.
Partially Unified Multiple Property Recursive Partitioning (PUMP-RP) Analyses of Cyclooxygenase (COX) Inhibitors, Shashidhar N. Rao & Thomas P. Stockfisch, in preparation for J. Chem. Inf. Comput. Sci.
Study Input
• 454 diaryl heterocycle cyclooxygenase (COX) inhibitors with phenyl sulfones and phenyl sulfonamides from published literature.
  – Inhibitory activities (IC50) against the COX-1 and COX-2 isoforms of the enzyme.
  – Divided into 2 classes for each target:
    • COX-1: IC50 > 5 µM (Class 0), IC50 <= 5 µM (Class 1)
    • COX-2: IC50 > 0.5 µM (Class 0), IC50 <= 0.5 µM (Class 1)
  – Divided into
    • Test set (TE) of 50 compounds: 17 COX-2 selective
    • Training set (TR) of 404 compounds: 181 COX-2 selective
• External validation sets
  – 25 Merck cyclooxygenase inhibitors
    • represent a different class of chemistry than that covered by the training and test sets
  – 8 NSAIDs (aspirin, ketoprofen, naproxen, desmethylnaproxen, ibuprofen, indomethacin, phenytoin, and diclofenac)
    • all active and non-selective
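The class assignment above can be written as a short sketch. It assumes the cutoffs are in micromolar, and that "COX-2 selective" means Class 1 for COX-2 together with Class 0 for COX-1; the slide does not define selectivity explicitly, so that rule is an assumption. All names are hypothetical.

```python
COX1_CUTOFF_UM = 5.0  # COX-1: Class 1 if IC50 <= 5 (assumed micromolar)
COX2_CUTOFF_UM = 0.5  # COX-2: Class 1 if IC50 <= 0.5 (assumed micromolar)


def activity_class(ic50_um, cutoff_um):
    """Return 1 (active) if IC50 is at or below the cutoff, else 0 (inactive)."""
    return 1 if ic50_um <= cutoff_um else 0


def is_cox2_selective(cox1_ic50_um, cox2_ic50_um):
    """Assumed rule: active against COX-2 and inactive against COX-1."""
    cox1 = activity_class(cox1_ic50_um, COX1_CUTOFF_UM)
    cox2 = activity_class(cox2_ic50_um, COX2_CUTOFF_UM)
    return cox2 == 1 and cox1 == 0


# Made-up example: weak against COX-1, potent against COX-2.
example = is_cox2_selective(cox1_ic50_um=20.0, cox2_ic50_um=0.1)
```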
Example Tree
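[Figure: example PUMP-RP tree for the COX data. Annotated split types: generic split, Yk = 1 split, specific split. Split variables shown: HB Donor <= 1, Jurs-FNSA-3 <= -0.2, AlogP98 <= 3.1, ISIS_key59 (TRUE/FALSE), JY <= 2.083. Leaves are labeled A1/I1 (COX-1 active/inactive) and A2/I2 (COX-2 active/inactive) with compound counts; one branch is marked COX-2 selective]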
Why not just calculate two trees?
[Figure: two separate single-target trees. COX-2 Inhibition tree: splits on FH2O <= -30.1, AlogP98 <= 2.6, Apol <= 14051.8, JX <= 2.01, ISIS Key #75; leaves labeled A2/I2 with counts. COX-1 Inhibition tree: splits on Dipole Mom. <= 5.87, JX <= 1.79, ISIS Key #94, Shdw-XZ fract <= 0.7, ISIS Key #66, Shdw-nu <= 2, AlogP98 <= 3.1; leaves labeled A1/I1 with counts]
Prediction of Selectivity
• Percentage of actives correctly predicted by RP trees compared to experiment

        Both COX-1 & COX-2    COX-1 Only    COX-2 Only
  TR    42% to 67%            71% to 85%    64% to 80%
  TE    52% to 68%            64% to 84%    66% to 90%

• Enrichment in COX-2 selectives
  – 1.56 to 1.86 in the training set (TR)
  – 1.60 to 2.29 in the test set (TE)
  – Remember: about 45% of TR (181 of 404) is COX-2 selective, so the best possible enrichment in TR would be ~2.2
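The upper bound on enrichment follows from the set compositions given earlier (181 of 404 in TR, 17 of 50 in TE). A minimal sketch, assuming enrichment is the hit rate in the selected subset divided by the base rate in the whole set; the function name is hypothetical.

```python
def max_enrichment(n_selective, n_total):
    """Upper bound on enrichment: a perfect selection contains only selectives
    (hit rate 1.0), compared with the base rate n_selective / n_total."""
    return n_total / n_selective


tr_bound = max_enrichment(181, 404)  # training set
te_bound = max_enrichment(17, 50)    # test set
print(round(tr_bound, 2), round(te_bound, 2))  # prints: 2.23 2.94
```

Under this assumption, the reported test-set maximum of 2.29 sits below the test-set bound of about 2.94, and the training-set maximum of 1.86 sits below the bound of about 2.23.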
False positive and negative selectivity rates

          Training set (TR)    Test set (TE)
  SRfp    16.9% to 26.8%       25% to 45.5%
  SRfn    21.1% to 27.9%       27.1% to 40.2%

External Validation Sets
• 25 Merck compounds: 21 actives (including 13 COX-2 selective) and 4 inactives
  – 21 correctly predicted COX-2 active, 8 correctly predicted COX-1 active
  – 8 correctly predicted COX-2 selective
  – Correctly predicted that none are COX-1 selective
• 8 NSAIDs: aspirin, ketoprofen, naproxen, desmethylnaproxen, ibuprofen, indomethacin, phenytoin, and diclofenac
  – All predicted to be non-selective
  – Five of them (ketoprofen, naproxen, ibuprofen, indomethacin, and diclofenac) are predicted to be active
  – Three, including aspirin, are predicted inactive
    • Aspirin is a weak inhibitor of both COX-1 and COX-2 (IC50 ~ 150-300 nM)
Assay Enrichment Study
• A library of 66,000 FMC compounds was screened in two functional assays (I and II), each returning two classes of activity (0 and 1)
  – Assay I has two follow-up assays [I(1); I(2); I(3)]
    • 60, 33, and 24 actives, respectively
  – Assay II has one follow-up assay [II(1); II(2)]
    • 109 and 12 actives, respectively
  – X(1) is a primary assay, whilst X(2) and X(3) are related to specific mechanisms
• Goal
  – Combine data from multiple assays for endpoint X to
    • Explain factors causing activity
    • Use maximum data to get the best predictive model
Computational Protocol
• The 66,000 compounds were divided in half into training and test sets
with even distributions of actives/inactives for both assays
• Six sets of descriptors
  – BCUTs (8)
  – Cerius2 Fast descriptors (199)
  – Jurs descriptors (30)
  – ISIS keys (166)
  – 3D atom pairs (825)
  – CCG-2D (145)
Mining Large Databases Using Multiple Y Recursive Partitioning, David Roush, Litai Zhang, Thomas Stockfisch and Shashidhar Rao, in preparation for J. Chem. Inf. Comput. Sci.
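The half-and-half division with even distributions of actives and inactives can be sketched as a stratified split. This is an illustration under assumptions, not the protocol actually used: it stratifies on the joint (Assay I, Assay II) label so that both assays stay balanced, and the function name, seed, and toy counts are hypothetical.

```python
import random


def stratified_halves(labels, seed=0):
    """Split indices into two halves so that each label class is divided evenly.
    Labels can be any hashable, sortable values, e.g. (assay_I, assay_II) tuples."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        half = len(idx) // 2
        train.extend(idx[:half])
        test.extend(idx[half:])
    return sorted(train), sorted(test)


# Toy library (made-up counts): joint labels (active in I, active in II).
labels = [(1, 0)] * 60 + [(0, 1)] * 100 + [(0, 0)] * 840
train_idx, test_idx = stratified_halves(labels)
```

Stratifying matters here because the hit rates are far below 1%, so an unstratified random split could easily leave one half with very few actives.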
Single Y vs Multi Y – Cerius2 Descriptors
Test Set Results (each pair is Single Y / Multi Y)

  Assay    Actual Hit Rate (%)    False Negative (%)    False Positive (%)    Enrichment Factor
  I(1)     0.17                   30 / 38               99.1 / 95.3           5x / 26x
  I(2)     0.09                   31 / 37               99.2 / 94.4           8x / 55x
  I(3)     0.07                   46 / 42               99.5 / 99.5           8x / 6x

Single Y vs Multi Y – ISIS Keys
Test Set Results (each pair is Single Y / Multi Y)

  Assay    Actual Hit Rate (%)    False Negative (%)    False Positive (%)    Enrichment Factor
  I(1)     0.17                   41 / 55               97 / 97               16x / 16x
  I(2)     0.09                   35 / 55               98 / 97               22x / 34x
  I(3)     0.07                   59 / 59               99.7 / 99.1           4x / 15x

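The enrichment factors in the two result tables above appear roughly consistent with a simple relation between hit rate and false positive rate. The sketch below assumes that "False Positive (%)" is the share of predicted actives that are experimentally inactive, and that enrichment is the hit rate among predicted actives divided by the overall hit rate. Neither definition is stated on the slides, so treat this as one plausible reading; the function name is hypothetical.

```python
def enrichment_factor(hit_rate_pct, false_positive_pct):
    """Enrichment = (hit rate among predicted actives) / (overall hit rate).
    Assumes false_positive_pct is the percentage of predicted actives that
    are experimentally inactive."""
    precision_pct = 100.0 - false_positive_pct
    return precision_pct / hit_rate_pct


# Assay I(1), Cerius2 descriptors: hit rate 0.17%, false positives 99.1% and 95.3%.
ef_first = enrichment_factor(0.17, 99.1)   # about 5.3 (reported: 5x)
ef_second = enrichment_factor(0.17, 95.3)  # about 27.6 (reported: 26x)
```

The computed values are close to, but not identical with, the reported factors, which is what rounding of the published percentages would produce if this reading is right.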
Single Y vs Multiple Y
• Multi Y produces better enrichments with better false positive rates
• Single Y produces better false negative rates
• => More information has produced a more selective screen
• Logistically, only one experiment to run
• Multi Y allows the factors/descriptors important to all assays to be
identified
[Figures: PUMP-RP trees for Assay I(1), Assay I(2), Assay I(3), and all Assay I data combined]
Summary
• The PUMP-RP procedure creates a tree with target-generic splits near the root and target-specific splits near the leaves, separated by splits on the activity type.
  – The generic splits benefit from being determined by a larger amount of data than if separate models were made
  – Easy to interpret which splits determine specificity and which show commonality of target
• Prediction and understanding of COX-2 selective molecules
• Large-scale experiments with FMC show the use of multiple-assay data to enhance understanding of activity
• Commercially released in Cerius2 4.6
Forthcoming Publications
• Methodology
  – Partially Unified Multiple Property Recursive Partitioning: A New Method for Predicting and Understanding Drug Selectivity, Thomas Stockfisch, in preparation for J. Chem. Inf. Comput. Sci.
• COX Selectivity Study
  – Partially Unified Multiple Property Recursive Partitioning (PUMP-RP) Analyses of Cyclooxygenase (COX) Inhibitors, Shashidhar N. Rao & Thomas P. Stockfisch, in preparation for J. Chem. Inf. Comput. Sci.
• FMC HTS Study
  – Mining Large Databases Using Multiple Y Recursive Partitioning, David Roush, Litai Zhang, Thomas Stockfisch and Shashidhar Rao, in preparation for J. Chem. Inf. Comput. Sci.