Molecular Field Topology Analysis


QSAR/QSPR: the Universal Approach
to the Prediction of Properties of
Chemical Compounds and Materials
V.A.Palyulin, I.I.Baskin, N.S.Zefirov
Department of Chemistry
Moscow State University
"Every attempt to employ mathematical methods
in the study of chemical questions must be
considered profoundly irrational and contrary to
the spirit of chemistry. If mathematical analysis
should ever hold a prominent place in chemistry -
an aberration which is happily almost impossible
- it would occasion a rapid and widespread
degeneration of that science."
A. Comte, 1798-1857
Fundamental Problem
in Chemistry:
Evaluation of relationships
between the structures
of chemical compounds and
their properties or biological activity
QSAR/QSPR: General Approach
[Scheme]
Training set: Structure → Descriptors → Model F: A = F(S)
Test set: Predictivity (ΔA)
New structures: Prediction
PROPERTIES
Physico-chemical properties:
Boiling points, melting points, density,
viscosity, surface tension, solubility in
various solvents, lipophilicity, magnetic
susceptibility, retention indices, dipole
moments, enthalpy of formation, etc.
Biological activity:
IC50, EC50, LD50, MEC, ILS, etc.
Structural formula, Molecular graph, Connectivity

C2H6O

        H2   H4
        |    |
   H1 - C1 - C2 - O - H6
        |    |
        H3   H5

Adjacency (connectivity) matrix of the full molecular graph:

      C1  C2  O   H1  H2  H3  H4  H5  H6
C1    0   1   0   1   1   1   0   0   0
C2    1   0   1   0   0   0   1   1   0
O     0   1   0   0   0   0   0   0   1
H1    1   0   0   0   0   0   0   0   0
H2    1   0   0   0   0   0   0   0   0
H3    1   0   0   0   0   0   0   0   0
H4    0   1   0   0   0   0   0   0   0
H5    0   1   0   0   0   0   0   0   0
H6    0   0   1   0   0   0   0   0   0

Hydrogen-suppressed graph:

      C1  C2  O
C1    0   1   0
C2    1   0   1
O     0   1   0
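The connectivity table of ethanol can be generated from a plain bond list. The following is a minimal Python sketch; the atom labels follow the slide, while the helper name `adjacency_matrix` is only illustrative.

```python
# Atom labels as on the slide; the bond list encodes the structural formula of ethanol.
ATOMS = ["C1", "C2", "O", "H1", "H2", "H3", "H4", "H5", "H6"]
BONDS = [("C1", "C2"), ("C2", "O"),
         ("C1", "H1"), ("C1", "H2"), ("C1", "H3"),
         ("C2", "H4"), ("C2", "H5"), ("O", "H6")]

def adjacency_matrix(atoms, bonds):
    """Return a symmetric 0/1 matrix; entry [i][j] is 1 if atoms i and j are bonded."""
    index = {a: k for k, a in enumerate(atoms)}
    n = len(atoms)
    matrix = [[0] * n for _ in range(n)]
    for a, b in bonds:
        if a in index and b in index:  # skip bonds to atoms left out of the graph
            i, j = index[a], index[b]
            matrix[i][j] = matrix[j][i] = 1
    return matrix

full = adjacency_matrix(ATOMS, BONDS)               # all nine atoms
heavy = adjacency_matrix(["C1", "C2", "O"], BONDS)  # hydrogen-suppressed graph

for label, row in zip(["C1", "C2", "O"], heavy):
    print(label, row)
```

Passing only the non-hydrogen atoms gives the hydrogen-suppressed graph, which is the form most topological indices work on.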
DESCRIPTORS
Topological indices:
Connectivity indices (Randić, χ;
Kier-Hall, mχv; solvation indices, mχs), Wiener W and expanded
Wiener, Balaban J, Gutman indices, Hosoya, Merrifield-Simmons
indices, indices based on local invariants, informational
indices, …
Fragmental descriptors: The number of fragments of
various size (chains, cycles, branched fragments) in a molecule
with several levels of classification of atoms
Physico-chemical descriptors: Indices based on
atomic charges and electronegativities, atomic inductive
constants, VdW volume and surface, H-bond descriptors,
Lipophilicity (Log P), …
Quantum-mechanical
3D
Usp.Khim. (Russ.Chem.Rev.), 57 (3), 337-366 (1988)
Randić Index (χ)

χ = Σ(bonds) 1/√(vi·vj), where vi is the degree of vertex i (number of non-hydrogen neighbours)

Example: CH3-CH(CH3)-CH2-CH3

        C5
        |
   C1 - C2 - C3 - C4

Adjacency matrix:

      C1  C2  C3  C4  C5
C1    0   1   0   0   0
C2    1   0   1   0   1
C3    0   1   0   1   0
C4    0   0   1   0   0
C5    0   1   0   0   0

χ = 1/√3 + 1/√3 + 1/√6 + 1/√2 = 2.27
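The worked value of 2.27 can be checked with a few lines of Python. This is a minimal sketch: the atom numbering (a C1-C2-C3-C4 chain with the methyl branch C5 on C2) is an assumption read off the adjacency matrix, and `randic_index` is an illustrative name.

```python
import math

# Hydrogen-suppressed graph of 2-methylbutane (numbering assumed, see lead-in).
BONDS = [(1, 2), (2, 3), (3, 4), (2, 5)]

def randic_index(bonds):
    """Sum 1/sqrt(v_i * v_j) over all bonds, where v is the vertex degree."""
    degree = {}
    for i, j in bonds:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in bonds)

chi = randic_index(BONDS)
print(f"{chi:.2f}")  # prints 2.27
```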
Prediction of Non-Specific Solvation
Enthalpy of Organic Compounds

Solvation enthalpy (kJ/mol)
ΔHsolv(A/Y) = 4.52 [?] 9.04·1χS

Vaporization enthalpy (kJ/mol)
ΔHvap(A) = 4.13 [?] 9.52·1χS [?] 0.827·μ²

n = 141, R = 0.985, s = 2.1
n = 528, R = 0.989, s = 2.0

μ – dipole moment
1χS – first-order solvation topological index:
1χS = (1/4) Σ(bonds) Zi·Zj / √(δi·δj)
Zi – period number (measure of atom size)
δi – number of non-hydrogen neighbors
Dokl. Akad. Nauk, 1993, 331(2), 173-176
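A minimal sketch of the first-order solvation index, assuming it has the form 1χS = (1/4) Σ(bonds) Zi·Zj/√(δi·δj) with Zi and δi as defined above. The period table and the function name are illustrative, and the two test molecules are chosen here for demonstration only. For a skeleton made only of carbon (period 2) the factor Zi·Zj/4 equals 1, so the value coincides with the Randić index.

```python
import math

# Period numbers of common elements (used as the atom-size measure Z).
PERIOD = {"C": 2, "N": 2, "O": 2, "F": 2, "Si": 3, "P": 3,
          "S": 3, "Cl": 3, "Br": 4, "I": 5}

def solvation_index(elements, bonds):
    """Assumed form: (1/4) * sum over bonds of Z_i * Z_j / sqrt(delta_i * delta_j)."""
    delta = {atom: 0 for atom in elements}
    for i, j in bonds:          # delta = number of non-hydrogen neighbours
        delta[i] += 1
        delta[j] += 1
    total = 0.0
    for i, j in bonds:
        z_product = PERIOD[elements[i]] * PERIOD[elements[j]]
        total += z_product / math.sqrt(delta[i] * delta[j])
    return total / 4.0

# 2-Methylbutane: carbon only, so the result equals the Randić index.
isopentane = solvation_index({1: "C", 2: "C", 3: "C", 4: "C", 5: "C"},
                             [(1, 2), (2, 3), (3, 4), (2, 5)])
# Chloroethane: the third-period chlorine atom raises the C-Cl bond term.
chloroethane = solvation_index({1: "C", 2: "C", 3: "Cl"}, [(1, 2), (2, 3)])

print(round(isopentane, 2), round(chloroethane, 2))
```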
The scheme of the design of
new topological indices (TIs)
Selection of fragments
Construction of graph matrices and their storage
Selection of functions
Construction of topological indices:
a) using matrices
b) using already constructed TIs
The set of constructed TIs for QSAR/QSPR studies
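The scheme can be illustrated with a well-known index rather than a new one: the Wiener index W is obtained by constructing a graph matrix (the distance matrix) and applying a function to it (half the sum of all entries). The sketch below is only an illustration of that matrix-plus-function pattern; function names are hypothetical and atoms are numbered from 0.

```python
from collections import deque

def distance_matrix(n_atoms, bonds):
    """Topological distances (number of bonds on the shortest path) via BFS."""
    neighbours = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dist = [[0] * n_atoms for _ in range(n_atoms)]
    for source in range(n_atoms):
        seen = {source}
        queue = deque([(source, 0)])
        while queue:
            node, d = queue.popleft()
            dist[source][node] = d
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return dist

def wiener_index(dist):
    """Function applied to the matrix: half the sum of all distances."""
    return sum(sum(row) for row in dist) // 2

# 2-Methylbutane, hydrogen-suppressed: chain 0-1-2-3 with a branch 4 on atom 1.
D = distance_matrix(5, [(0, 1), (1, 2), (2, 3), (1, 4)])
W = wiener_index(D)
print(W)  # prints 18
```

Swapping in a different matrix or a different function yields a different index, which is the combinatorial idea behind the scheme.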
Prediction of Diffusion of
Small Molecules in Polymers
log D = 11.25 [?] 9.16 (min ρHOMO)² [?] 4.13 [ln(1 [?] W~)]/Nat [?] 1.82 [ln(1 [?] W~~)]/Nat
n = 14, R = 0.989, s = 0.103, F = 145
D – diffusion coefficient (cm2/s)
Nat – number of non-hydrogen atoms
min ρHOMO – minimal HOMO π-electron density
W~, W~~ – extended and inverted extended Wiener indices (W with one and two tildes)
Dokl. Akad. Nauk, 1994, 337(2), 211-214
[Plot: log D pred. vs. log D exp.]
Sulfenamide Vulcanization
Accelerators
Resistance to preliminary vulcanization (min)
[ln(min5)]/N = 0.33 [?] 0.022 CSI([?]')
n = 12, R = 0.989, s = 0.004, F = 444
Vulcanization rate constant (min-1)
k2 = 0.439 [?] 5.87 [?]max/N [?] 16.3 (max CLUMO)²
n = 12, R = 0.990, s = 0.15, F = 213
Maximum torque increase (Nm)
R/N = 8.06 [?] 5.6 ln Sm [?] 447 (max CLUMO)²
n = 12, R = 0.989, s = 0.054, F = 134
N – number of non-hydrogen atoms
max CLUMO – maximum carbon LUMO π-electron density
Sm – molecular electronegativity
CSI ( ), max – indices based on atomic induction effect parameters
Dokl. Akad. nauk. 1993 333(2) 189-192
Prediction of Mutagenicity of
Substituted Biphenyls
[Structure: biphenyl scaffold with substituents R1 and R2; fragments Fr1, Fr2 and Fr3 shown on the slide]
ln(Nhis+) = 1.18 Fr1 [?] 0.72 Fr2 [?] 1.03 Fr3 [?] 3.59
n = 19, R = 0.94, s = 0.75, F = 39.3
ln(Nhis+) = 111 d1 [?] 2.66 d2 [?] 36.7 d3 [?] 10.4 d4 [?] 0.967
n = 19, R = 0.95, s = 0.69, F = 35
[Plots: ln(Nhis+) pred. vs. ln(Nhis+) exp. for each model]
Nhis+ – number of revertants
Fr1-3 – number of fragments
d1 – minimum squared C-atom LUMO contribution
d2 – minimum squared N-atom LUMO contribution
d3 – maximum C-atom free valence index
d4 – average O-atom free valence index
Dokl. Akad. nauk. 1993 332(5) 587-589
Fragmental Descriptors
The numbers of fragments of various kinds and
various size (chains, cycles, branched fragments) in a
molecule with several levels of classification of atoms.
For each molecule hundreds of fragmental descriptors
can be computed.
If a structure-property data set is sufficiently large to
allow building statistically significant models, then any
topological index can be replaced with a set of
substructural (or fragmental) descriptors.
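As an illustration of what a fragmental descriptor is, the sketch below counts linear chain fragments of one to three atoms in a hydrogen-suppressed graph, keyed by the element sequence. It is a simplified assumption, not the fragment generator used by the authors: cycles, branched fragments and finer atom classification levels are left out, and all names are hypothetical.

```python
from collections import Counter

def path_fragments(atoms, bonds, max_len=3):
    """Count chain fragments of 1..max_len atoms, keyed by atom-type sequence."""
    neighbours = {a: set() for a in atoms}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    counts = Counter()

    def extend(path):
        labels = tuple(atoms[a] for a in path)
        # A chain and its reverse are the same fragment: keep one canonical key.
        counts[min(labels, labels[::-1])] += 1
        if len(path) < max_len:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in atoms:
        extend([start])

    # Every chain of two or more atoms was walked from both ends: halve it.
    return Counter({k: (v if len(k) == 1 else v // 2) for k, v in counts.items()})

# Ethanol, hydrogen-suppressed: C1-C2-O.
ethanol = path_fragments({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)])
for fragment, n in sorted(ethanol.items()):
    print("-".join(fragment), n)
```

Each count becomes one column of the descriptor table, so even small molecules yield many descriptors once longer chains and more detailed atom types are allowed.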
NEURAL NETWORK SOFTWARE: NASAWIN
Fragmental descriptors in QSPR

Predicted properties (labels as listed on the slide): Boiling point, °C; log(viscosity), Pa·s; d20, g/cm3; log(VP), Pa; Log P

Parameters of models                     1        2        3        4        5        6
Number of compounds                      509      531      367      803      352      7805
Average number of selected descriptors   46       54       46       69       56       741

Neural network model
Rav                                      0.9920   0.9960   0.9885   0.9980   0.9981   0.9827
RMStrain                                 8.7      3.7      0.084    0.021    0.090    0.3233
RMSval                                   14.2     4.5      0.104    0.046    0.122    0.3936
RMSpred                                  16.6     5.4      0.141    0.051    0.0152   0.3968

MLR
Rav                                      0.9814   0.9946   0.9794   0.9885   0.9902   0.9702
RMStrain                                 12.9     4.3      0.111    0.038    0.198    0.4171
RMSval                                   16.7     5.0      0.195    0.055    0.248    0.4541
RMSpred                                  18.6     5.5      0.212    0.067    0.258    0.4324

Fragment types (as listed on the slide): p1, p2, p3; p1, p2, p3, p4, p5, c4, c5, s4, s5; p1, p2; px, cx, sx, bx, tx; p1, p2, p3; p1, p2, p3, p4, p5, p8, c3, c4, c5, c6, c7, c8, c9, s4, b0, b1, b4, b5
Water Solubility
Boiling point
[1]
(diverse set of 885 compounds)
fragment types
p1, p2, p3, p4, p5, p6, c3, c4, c5, c6, s4, s5, s6
Boiling point
(2)
Anticoccidial Activity of Triazinediones
[Structure: triazinedione scaffold with substituents R1-R5 and X]
Glass Transition Temperature of Polymers
Molar Heat Capacity of Polymers in the Liquid State
Architecture of the Neural Device for Direct QSAR
Neural device in application to the propane molecule :
Baskin, I. I.; Palyulin, V. A.; Zefirov, N. S.,
J. Chem. Inf. Comput. Sci., 37, 715 (1997)
BRAIN
EYE 1 ("looks" at atoms): (1), (2), (3)
EYE 2 ("looks" at bonds): (1,2), (2,1), (2,3), (3,2)
SENSOR FIELD (each sensor detects the number of attached hydrogen atoms):
CH3 (atom 1) - CH2 (atom 2) - CH3 (atom 3)
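The inputs seen by the two "eyes" for propane can be enumerated as follows. This sketch covers only the sensor field and the receptive fields named on the slide; how the signals are combined in the "brain" is not shown here, and the function names are hypothetical.

```python
# Propane, hydrogen-suppressed: atom number -> attached hydrogens (sensor readings).
HYDROGENS = {1: 3, 2: 2, 3: 3}
BONDS = [(1, 2), (2, 3)]

def eye1_inputs(hydrogens):
    """One receptive field per atom: (atom,) -> (sensor reading,)."""
    return {(atom,): (h,) for atom, h in sorted(hydrogens.items())}

def eye2_inputs(hydrogens, bonds):
    """One receptive field per bond and direction: (i, j) and (j, i)."""
    fields = {}
    for i, j in bonds:
        fields[(i, j)] = (hydrogens[i], hydrogens[j])
        fields[(j, i)] = (hydrogens[j], hydrogens[i])
    return fields

atoms_seen = eye1_inputs(HYDROGENS)
bonds_seen = eye2_inputs(HYDROGENS, BONDS)

for field, reading in atoms_seen.items():
    print("EYE 1", field, reading)
for field, reading in sorted(bonds_seen.items()):
    print("EYE 2", field, reading)
```

Because every bond is presented in both directions, the result does not depend on how the atoms happen to be numbered.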
EXAMPLES OF THE DIRECT STRUCTURE-PROPERTY CORRELATIONS
Baskin, I. I.; Palyulin, V. A.; Zefirov, N. S., J. Chem. Inf. Comput. Sci., 37, 715 (1997)

PROPERTY                            Class of compounds                          Correlation coefficient
boiling point                       alkanes                                     0.999
viscosity                           hydrocarbons                                0.996
heat of evaporation                 hydrocarbons                                0.996
density                             hydrocarbons                                0.971
heat of solvation in cyclohexane    mixed set of compounds                      0.985
polarizability                      mixed set of compounds                      0.995
anesthetic pressure of gases        mixed set of organic and inorganic gases    0.990
New approach in QSAR:
Neural Quantitative Structure-Conditions-Property Relationships

Investigated property                                  Number of entries   R        St       Sv
Boiling point of hydrocarbons under
different pressures (°C)                               14346               0.9996   2.8      2.8
Dynamic viscosity of hydrocarbons under
different temperatures (ln units)                      3426                0.9949   0.14     0.16
Density of hydrocarbons under
different temperatures (g/ml)                          3056                0.9977   0.0063   0.0063
Acid hydrolysis rate constants for carboxylic
acid esters under different temperatures and
different 2-component solvent composition (log units)  2092                0.9669   0.27     0.34

R – correlation coefficient;
St and Sv – RMSE for the training and validation sets
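For reference, the two quality measures quoted in the table can be computed as below. This is a sketch under the assumption that R is the Pearson correlation between experimental and predicted values; the slide does not spell out the formula, and the numbers used here are toy values, not data from the study.

```python
import math

def correlation_coefficient(y_true, y_pred):
    """Pearson correlation between experimental and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def rmse(y_true, y_pred):
    """Root-mean-square error of the predictions."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Toy values for illustration only.
experimental = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r_value = correlation_coefficient(experimental, predicted)
s_value = rmse(experimental, predicted)
print(round(r_value, 3), round(s_value, 3))
```

Computing the RMSE separately on the training and validation subsets gives the St and Sv columns; a validation error close to the training error suggests the model is not overfitted.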
Molecular Field Topology Analysis (MFTA)
Construction of Molecular Supergraph
Local descriptors:
- Electrostatic
- Steric
- Lipophilic
- Hydrogen bonding
- Stereochemical
- Topological
Construction of Descriptor Matrix (columns Q1 ... QN, R1 ... RN)
Model building
Generation of novel promising structures
Palyulin, V. A.; Radchenko, E. V.; Zefirov, N. S., J. Chem. Inf. Comput. Sci., 40, 659 (2000)
Molecular Supergraph
Construction
[Figure: molecular supergraph construction shown for structures 1) to 5) and n)]
Local Descriptors
Sufficient coverage of major interaction types
Easy extension of the descriptor set
Electrostatic
Gasteiger's atomic charge Q (electronegativity equalization)
Absolute atomic charge Qa = abs(Q)
Sanderson's electronegativity χ
Electrotopological state ETS (Hall, Mohney, Kier)
Steric
Bondi's van der Waals radius R
Atomic contribution to the molecular van der Waals surface S
Relative steric accessibility A=S/Sfree
Lipophilic
Atomic lipophilicity contribution La (environment-dependent - Ghose, Crippen)
Group lipophilicity Lg (atom and attached hydrogens)
Hydrogen bonding
Hydrogen bond donor (Hd) and acceptor (Ha) ability of an atom (Abraham)
Stereochemical
Local stereochemical indicator variables
Topological
Site occupancy factors for atoms Pa and bonds Pb (1 if a feature is present)
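A rough sketch of how local descriptors could be arranged into one row per molecule once each structure has been mapped onto supergraph positions. Everything here is an assumption made for illustration: the function name, the toy charge values, the choice of only Q, Qa, R and Pa, and the convention of filling unoccupied positions with neutral values.

```python
# Hypothetical illustration; values and the neutral-value convention are assumptions.
NEUTRAL = {"Q": 0.0, "R": 0.0}  # assumed descriptor values for unoccupied positions

def descriptor_row(mapping, n_positions, neutral=NEUTRAL):
    """Flatten per-atom local descriptors into one row ordered by supergraph position.

    For each position the row holds: Q, Qa = abs(Q), R, and the occupancy factor Pa.
    """
    row = []
    for position in range(n_positions):
        atom = mapping.get(position)
        if atom is None:
            q, r, occupied = neutral["Q"], neutral["R"], 0
        else:
            q, r, occupied = atom["Q"], atom["R"], 1
        row.extend([q, abs(q), r, occupied])
    return row

# Two toy molecules mapped onto a 3-position supergraph (position -> atom descriptors).
mol_a = {0: {"Q": -0.30, "R": 1.55}, 1: {"Q": 0.05, "R": 1.70}}
mol_b = {0: {"Q": -0.28, "R": 1.55}, 1: {"Q": 0.04, "R": 1.70},
         2: {"Q": -0.10, "R": 1.75}}

matrix = [descriptor_row(m, 3) for m in (mol_a, mol_b)]
for row in matrix:
    print(row)
```

Since every molecule is described on the same set of positions, the rows have equal length and can be passed directly to a regression method.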
Affinity of substituted
2,5-diazabicyclo[2.2.1]heptanes to
nicotinic acetylcholine receptor
Training set: 31 compounds
R1 = H, Me, CH2CN
[Structure: scaffold with substituents R1 and R2]
R2 = [substituent structures carrying R, shown on the slide]
R = H, Me, F, Cl, Br, OH, NH2, OMe,
CN, CH2NH2, CONH2, NO2, PhCOO
Affinity of substituted
2,5-diazabicyclo[2.2.1]heptanes to
nicotinic acetylcholine receptor
Ki – inhibition of competitive binding
MED – minimum effective dose (hot plate test)
Model        Descriptors        F   R       Q2
lg(1/Ki)     Q, R, Ha, Hd, Lg   7   0.960   0.850
lg(1/MED)    Q, R               4   0.977   0.918
[Plot: predicted vs. experimental lg(1/Ki) with fit line; both axes run from -5 to 3]
Affinity of substituted
2,5-diazabicyclo[2.2.1]heptanes to
nicotinic acetylcholine receptor
Ki – inhibition of competitive binding
[Figure panels: Q, Ha, R, Lg]
Affinity of substituted
2,5-diazabicyclo[2.2.1]heptanes to
nicotinic acetylcholine receptor
Construction of novel potentially active structures
Total generated structures: 171
R1 = Me, Et, CN, Pr, i-Pr, t-Bu, Ph, [further substituent structures], where R = CH3, Cl, Br, NO2
R2 = Me, Et, Pr, CN, i-Pr, t-Bu
5 best structures wrt lg(1/Ki), values: 4.01, 3.69, 3.44, 3.66, 3.69 [structures shown on the slide]
Activity range in training set -3.41 ... 2.05
Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes
[Structure: 3,7-diazabicyclo[3.3.1]nonane scaffold with substituents R1-R4]
Training set: 26 compounds
R1, R2 = Me, Pr, i-Pr, Bu, i-Bu,
C5H11, C6H13, C10H21, CH2-c-Pr,
CH2-c-C6H11, CH=CH2,
CH2CH2CH=CH2
R3, R4 = Me, Et, Pr, Bu,
-(CH2)3-, -(CH2)4-, -(CH2)5-
Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes
SR75 – ability to decrease pacemaker pulse frequency (target effect)
F75 – ability to decrease myocardium contraction force (side effect)
SelF – selectivity wrt F
FRP75 – ability to increase refractory period (side effect)
SelFRP – selectivity wrt FRP
Model          Descriptors     F   R       Q2
lg(1/SR75)     Q, R, Ha, Hd    5   0.976   0.830
lg(1/F75)      Q, R, Ha, Hd    3   0.932   0.800
lg(1/FRP75)    Q, R, Ha, Hd    7   0.952   0.510
SelF           Q, R, Ha, Hd    6   0.972   0.819
SelFRP         Q, R, Ha, Hd    1   0.310   0.022
Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes
SR75 – ability to decrease pacemaker pulse frequency (target effect)
[Plot: predicted vs. experimental values with fit line; both axes run from -1.5 to 0.9]
[Figure panels: Q, R]
Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes
SelF – selectivity of antiarrhythmic activity wrt
myocardium contraction force
[Plot: predicted vs. experimental SelF with fit line; both axes run from -10 to 190]
[Figure panels: Q, R, Ha]
Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes
Construction of novel potentially active structures
Total generated structures: 105
R1, R3 = Me, Et, Pr, i-Pr, t-Bu,
R2 = Me, Et, Pr, i-Pr, t-Bu
5 best structures wrt SelF, values: 63.83, 70.75, 70.74, 63.82, 63.12 [structures shown on the slide]
Activity range in training set 0.4 ... 177
Conclusions
QSAR/QSPR (Quantitative structure-activity/property
relationships) approaches can be considered as
universal techniques for the modeling and prediction
of nearly any properties of chemical compounds and
many properties of materials.
Some properties of materials can be predicted as
dependent on the structure of small molecules used
as additives (e.g. antioxidants, etc.).
A number of properties of polymers have been
modelled as dependent on the chemical structure of
the monomeric unit (e.g. glass transition temperature,
molar heat capacity for liquid and solid state,
dielectric constant, refractive index).
AMPA–receptor modulators
(“ampakines”)
The group of molecular design
Academician N. S. Zefirov – Head of Organic Chemistry Division
Dr. V.A. Palyulin – Head of Group
Dr. I.I. Baskin
Dr. A.A.Oliferenko
Dr. E.V.Radchenko
Dr. M.I.Skvortsova
Dr. I.G.Tikhonova
Dr. M.S.Belenikin
Dr. A.A.Ivanov
Dr. A.Yu.Zotov
S.A.Pisarev
A.A.Ivanova
A.A.Melnikov