Quantitative Structure-Activity
Relationships (QSAR)
Comparative Molecular Field
Analysis (CoMFA)
Gijs Schaftenaar
Outline
• Introduction
• Structures and activities
• Analysis techniques:
Free-Wilson, Hansch
• Regression techniques:
PCA, PLS
• Comparative Molecular Field Analysis
QSAR: The Setting
Quantitative structure-activity relationships are used
when there is little or no receptor information, but
there are measured activities of (many) compounds
From Structure to Property
[Figure: chemical structures of a compound series, with a chart of their EC50 values]
From Structure to Property
[Figure: chemical structures of a compound series, related to their LD50 values]
From Structure to Property
[Figure: chemical structures of a compound series]
QSAR: Which Relationship?
Quantitative structure-activity relationships
correlate chemical/biological activities
with structural features or atomic, group or
molecular properties
within a range of structurally similar compounds.
Free Energy of Binding and
Equilibrium Constants
The free energy of binding is related to the
reaction constants of ligand-receptor complex
formation:
$\Delta G_{binding} = -2.303\,RT \log K = -2.303\,RT \log (k_{on} / k_{off})$
K: equilibrium constant
$k_{on}$, $k_{off}$: rate constants of association and dissociation
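As a rough numerical illustration of the relation above, the sketch below converts a pair of rate constants into a free energy of binding. The rate constants and the temperature of 298.15 K are hypothetical values chosen for the example, not data from the slides.

```python
import math

R = 8.314  # gas constant in J/(mol*K)

def binding_free_energy(k_on, k_off, temperature=298.15):
    """Return Delta G of binding in kJ/mol from the two rate constants."""
    K = k_on / k_off  # equilibrium (association) constant
    return -2.303 * R * temperature * math.log10(K) / 1000.0

# Hypothetical ligand: k_on = 1e6 1/(M*s), k_off = 1e-3 1/s, so K = 1e9 1/M
dg = binding_free_energy(1e6, 1e-3)
print(f"Delta G = {dg:.1f} kJ/mol")  # roughly -51.4 kJ/mol
```

At this temperature every tenfold increase in K lowers the free energy by about 5.7 kJ/mol.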
Concentration as Activity Measure
• A critical molar concentration C
that produces the biological effect
is related to the equilibrium constant K
• Usually log (1/C) is used (cf. pH)
• For meaningful QSARs, activities need
to be spread out over at least 3 log units
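The conversion to log (1/C) can be sketched as follows. The example assumes that concentrations are entered in micromolar and converted to mol/L; under that assumption the three EC50 values taken from the capsaicin example reproduce the log(1/EC50) values tabulated there.

```python
import math

def p_activity(conc_micromolar):
    """Convert a concentration in micromolar to log(1/C), with C in mol/L."""
    return math.log10(1.0 / (conc_micromolar * 1e-6))

# EC50 values (assumed micromolar) of three capsaicin analogs from the example
ec50 = {"H": 11.80, "Cl": 1.24, "C6H5": 0.24}
for substituent, conc in ec50.items():
    print(substituent, round(p_activity(conc), 2))
# H 4.93
# Cl 5.91
# C6H5 6.62
```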
Free Energy of Binding
$\Delta G_{binding} = \Delta G_0 + \Delta G_{hb} + \Delta G_{ionic} + \Delta G_{lipo} + \Delta G_{rot}$

Term                  Meaning                               Energy
$\Delta G_0$          entropy loss (translat. + rotat.)     +5.4
$\Delta G_{hb}$       ideal hydrogen bond                   –4.7
$\Delta G_{ionic}$    ideal ionic interaction               –8.3
$\Delta G_{lipo}$     lipophilic contact                    –0.17
$\Delta G_{rot}$      entropy loss (rotat. bonds)           +1.4
(Energies in kJ/mol per unit feature)
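A minimal sketch of how the additive scheme could be applied: each contribution is multiplied by the number of times the feature occurs. The feature counts of the example ligand are hypothetical, and the unit in which lipophilic contact is counted (often surface area in Å²) is an assumption here, since the slide only says "per unit feature".

```python
# Per-feature contributions in kJ/mol, as listed on the slide
DG0, DG_HB, DG_IONIC, DG_LIPO, DG_ROT = 5.4, -4.7, -8.3, -0.17, 1.4

def estimate_binding_energy(n_hbonds, n_ionic, lipo_contact, n_rot_bonds):
    """Sum the additive free-energy terms for one ligand (kJ/mol)."""
    return (DG0
            + DG_HB * n_hbonds
            + DG_IONIC * n_ionic
            + DG_LIPO * lipo_contact   # unit of contact is an assumption
            + DG_ROT * n_rot_bonds)

# Hypothetical ligand: 2 H-bonds, 1 ionic interaction,
# 100 units of lipophilic contact, 3 rotatable bonds
dg = estimate_binding_energy(2, 1, 100, 3)
print(f"Estimated Delta G = {dg:.1f} kJ/mol")  # -25.1 kJ/mol
```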
Basic Assumption in QSAR
The structural properties of a compound contribute
in a linearly additive way to its biological activity
provided there are no non-linear dependencies of transport or
binding on some properties
An Example: Capsaicin Analogs
[Figure: capsaicin analog scaffold with variable substituent X]

X        EC50 (µM)   log(1/EC50)
H        11.80       4.93
Cl       1.24        5.91
NO2      4.58        5.34
CN       26.50       4.58
C6H5     0.24        6.62
NMe2     4.39        5.36
I        0.35        6.46
NHCHO    ?           ?
An Example: Capsaicin Analogs
X        log(1/EC50)   MR      π       σ       Es
H        4.93          1.03    0.00    0.00    0.00
Cl       5.91          6.03    0.71    0.23    -0.97
NO2      5.34          7.36    -0.28   0.78    -2.52
CN       4.58          6.33    -0.57   0.66    -0.51
C6H5     6.62          25.36   1.96    -0.01   -3.82
NMe2     5.36          15.55   0.18    -0.83   -2.90
I        6.46          13.94   1.12    0.18    -1.40
NHCHO    ?             10.31   -0.98   0.00    -0.98
MR = molar refractivity (polarizability) parameter; π = hydrophobicity parameter;
σ = electronic sigma constant (para position); Es = Taft size parameter
An Example: Capsaicin Analogs
$\log(1/EC_{50}) = -0.89 + 0.019\,MR + 0.23\,\pi - 0.31\,\sigma - 0.14\,E_s$
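An equation of this form can be obtained by multiple linear regression of the activities on the descriptors. The sketch below runs an ordinary least-squares fit with NumPy on the seven compounds with known activity and then applies the fitted model to NHCHO. It is only an illustration of the procedure: with seven compounds and five parameters the fit is heavily over-parameterised, and the coefficients it yields need not match those quoted on the slide.

```python
import numpy as np

# Descriptors per substituent X: MR, pi, sigma, Es (values from the slides)
descriptors = {
    "H":     [1.03, 0.00, 0.00, 0.00],
    "Cl":    [6.03, 0.71, 0.23, -0.97],
    "NO2":   [7.36, -0.28, 0.78, -2.52],
    "CN":    [6.33, -0.57, 0.66, -0.51],
    "C6H5":  [25.36, 1.96, -0.01, -3.82],
    "NMe2":  [15.55, 0.18, -0.83, -2.90],
    "I":     [13.94, 1.12, 0.18, -1.40],
    "NHCHO": [10.31, -0.98, 0.00, -0.98],
}
# Measured log(1/EC50); NHCHO is the unknown to be predicted
activity = {"H": 4.93, "Cl": 5.91, "NO2": 5.34, "CN": 4.58,
            "C6H5": 6.62, "NMe2": 5.36, "I": 6.46}

names = list(activity)
X = np.array([descriptors[n] for n in names])
y = np.array([activity[n] for n in names])

# Design matrix with a leading column of ones for the intercept
A = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

fitted = A @ coef
r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
pred = coef[0] + np.dot(coef[1:], descriptors["NHCHO"])

print("intercept, MR, pi, sigma, Es:", np.round(coef, 3))
print("R^2 of the fit:", round(float(r2), 3))
print("predicted log(1/EC50) for NHCHO:", round(float(pred), 2))
```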
First Approaches: The Early Days
• Free-Wilson Analysis
• Hansch Analysis
Free-Wilson Analysis
$\log(1/C) = \sum_i a_i x_i + \mu$
$x_i$: presence of group i (0 or 1)
$a_i$: activity contribution of group i
$\mu$: activity value of the unsubstituted compound
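The Free-Wilson model can be fitted by least squares on a matrix of 0/1 indicators. The sketch below uses a small invented data set (three hypothetical substituent groups, six compounds) whose activities were constructed from known contributions, so the fit recovers them exactly; real data would of course contain noise.

```python
import numpy as np

# Hypothetical indicator matrix: one row per compound, one column per group
# (1 = group present, 0 = absent). Row 0 is the unsubstituted compound.
x = np.array([
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
], dtype=float)

# Hypothetical log(1/C) values, built from mu = 4.0 and a = (0.5, -0.3, 1.0)
log_inv_c = np.array([4.0, 4.5, 3.7, 5.0, 4.2, 5.5])

# Append a column of ones so that the last fitted parameter is mu
design = np.column_stack([x, np.ones(len(log_inv_c))])
params, *_ = np.linalg.lstsq(design, log_inv_c, rcond=None)
a, mu = params[:-1], params[-1]

print("group contributions a_i:", np.round(a, 2))
print("mu (unsubstituted compound):", round(float(mu), 2))

# Prediction for a new combination of groups that are already in the model
new_compound = np.array([0.0, 1.0, 1.0])
print("predicted log(1/C):", round(float(new_compound @ a + mu), 2))
```

The prediction step also shows the limitation named on the next slide: only combinations of groups that occur in the training set can be evaluated.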
Free-Wilson Analysis
+ Computationally straightforward
– Predictions only for substituents already included
– Requires large number of compounds
Hansch Analysis
Drug transport and binding affinity
depend nonlinearly on lipophilicity:
$\log(1/C) = a\,(\log P)^2 + b \log P + c \sum \sigma + k$
P: n-octanol/water partition coefficient
σ: Hammett electronic parameter
a, b, c: regression coefficients
k: constant term
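The parabolic dependence on log P implies an optimum lipophilicity when a is negative: setting the derivative with respect to log P to zero gives log P = -b / (2a). The sketch below evaluates the Hansch equation and this optimum; all coefficient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def hansch_activity(log_p, sigma_sum, a, b, c, k):
    """Evaluate log(1/C) = a*(log P)^2 + b*log P + c*sum(sigma) + k."""
    return a * log_p ** 2 + b * log_p + c * sigma_sum + k

def optimal_log_p(a, b):
    """Return the log P at which activity peaks; requires a < 0 (a maximum)."""
    if a >= 0:
        raise ValueError("a must be negative for the parabola to have a maximum")
    return -b / (2.0 * a)

# Hypothetical regression coefficients
a, b, c, k = -0.5, 2.0, 1.0, 3.0

best = optimal_log_p(a, b)
print("optimal log P:", best)                                        # 2.0
print("activity at optimum:", hansch_activity(best, 0.0, a, b, c, k))  # 5.0
```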
Hansch Analysis
+ Fewer regression coefficients needed for
correlation
+ Interpretation in physicochemical terms
+ Predictions for other substituents possible
Molecular Descriptors
• Simple counts of features, e.g. of atoms, rings,
H-bond donors, molecular weight
• Physicochemical properties, e.g. polarisability,
hydrophobicity (logP), water-solubility
• Group properties, e.g. Hammett and Taft constants,
volume
• 2D Fingerprints based on fragments
• 3D Screens based on fragments
2D Fingerprints
[Figure: example molecule and its fragment-based fingerprint]

Key  C   N   O   P   S   X   F   Cl  Br  I   Ph  CO  NH  OH  Me  Et  Py  CHO SO  C=C C≡C C=N Am  Im
Bit  1   1   1   0   0   1   0   0   1   0   1   1   1   1   1   0   0   0   0   1   0   0   1   0
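A fingerprint of this kind is simply a fixed-order bit vector over a fragment dictionary. The sketch below rebuilds the bit string of the slide from the set of fragments marked as present. Detecting the fragments in a real structure would need a cheminformatics toolkit and is not attempted here; the set of present fragments is typed in by hand.

```python
# Fragment keys in the order used on the slide ("C#C" denotes the triple bond)
KEYS = ["C", "N", "O", "P", "S", "X", "F", "Cl", "Br", "I",
        "Ph", "CO", "NH", "OH", "Me", "Et", "Py", "CHO", "SO",
        "C=C", "C#C", "C=N", "Am", "Im"]

def fingerprint(present_fragments):
    """Return a list of 0/1 bits, one per key, in the fixed key order."""
    return [1 if key in present_fragments else 0 for key in KEYS]

# Fragments set to 1 in the slide's example molecule
example = {"C", "N", "O", "X", "Br", "Ph", "CO", "NH", "OH", "Me", "C=C", "Am"}

bits = fingerprint(example)
bit_string = "".join(str(b) for b in bits)
print(bit_string)  # 111001001011111000010010
```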
Regression Techniques
• Principal Component Analysis (PCA)
• Partial Least Squares (PLS)
Principal Component Analysis (PCA)
• Many (>3) variables to describe objects
= high dimensionality of descriptor data
• PCA is used to reduce dimensionality
• PCA extracts the most important factors
(principal components or PCs) from the data
• Useful when correlations exist between descriptors
• The result is a new, small set of variables (PCs)
which explain most of the data variation
PCA – From 2D to 1D
PCA – From 3D to 3D-
Different Views on PCA
• Statistically, PCA is a multivariate analysis technique
closely related to eigenvector analysis
• In matrix terms, PCA is a decomposition of matrix X
into two smaller matrices plus a set of residuals:
$X = T P^{T} + R$
• Geometrically, PCA is a projection technique in which
X is projected onto a subspace of reduced dimensions
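The matrix view can be made concrete with a singular value decomposition, which is one common way to compute a PCA. In the sketch below T holds the scores, P the loadings and R the residuals left after keeping a chosen number of components. The descriptor data are randomly generated for illustration, and X is mean-centred first, so the decomposition applies to the centred matrix.

```python
import numpy as np

def pca_decompose(X, n_components):
    """Decompose mean-centred X into scores T, loadings P and residuals R."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]  # scores (projected objects)
    P = Vt[:n_components].T                     # loadings (PC directions)
    R = Xc - T @ P.T                            # what the kept PCs miss
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return Xc, T, P, R, explained

# Hypothetical data: 50 objects, 3 descriptors, two of them strongly correlated
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
X = np.hstack([base,
               2.0 * base + 0.05 * rng.normal(size=(50, 1)),
               rng.normal(size=(50, 1))])

Xc, T, P, R, explained = pca_decompose(X, n_components=2)
print("fraction of variance explained per PC:", np.round(explained, 3))
print("largest residual:", float(np.abs(R).max()))
```

Because two of the three descriptors are nearly collinear, two components capture almost all of the variation and the residual matrix is small.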
Partial Least Squares (PLS)
$y_1 = a_0 + a_1 x_{11} + a_2 x_{12} + a_3 x_{13} + \dots + e_1$   (compound 1)
$y_2 = a_0 + a_1 x_{21} + a_2 x_{22} + a_3 x_{23} + \dots + e_2$   (compound 2)
$y_3 = a_0 + a_1 x_{31} + a_2 x_{32} + a_3 x_{33} + \dots + e_3$   (compound 3)
…
$y_n = a_0 + a_1 x_{n1} + a_2 x_{n2} + a_3 x_{n3} + \dots + e_n$   (compound n)
$Y = XA + E$
X = independent variables
Y = dependent variables
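For a single activity column, the coefficients A can be estimated with the PLS1 algorithm. The sketch below is a compact NumPy version using the usual deflation scheme (weights from the covariance of X with y, then scores, loadings and deflation of both blocks). The function name and the random test data are illustrative assumptions, not part of the slides.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1; predictions are y_mean + (X_new - x_mean) @ B."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))  # weight vectors
    P = np.zeros((n_features, n_components))  # X loadings
    q = np.zeros(n_components)                # y loadings
    for k in range(n_components):
        w = Xk.T @ yk                  # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # scores (latent variable)
        tt = t @ t
        P[:, k] = Xk.T @ t / tt
        q[k] = yk @ t / tt
        W[:, k] = w
        Xk = Xk - np.outer(t, P[:, k])  # deflate X
        yk = yk - q[k] * t              # deflate y
    B = W @ np.linalg.solve(P.T @ W, q)  # coefficients in descriptor space
    return x_mean, y_mean, B

# Hypothetical data: 30 compounds, 3 descriptors
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=30)

x_mean, y_mean, B = pls1_fit(X, y, n_components=2)
y_hat = y_mean + (X - x_mean) @ B
print("coefficients with 2 latent variables:", np.round(B, 3))
```

When as many latent variables are used as there are (linearly independent) descriptors, the result coincides with ordinary least squares; the benefit of PLS lies in stopping earlier.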
PLS – Cross-validation
• Squared correlation coefficient R²
• Value between 0 and 1 (> 0.9 is desirable)
• Indicates the explanatory power of the regression equation
With cross-validation:
• Squared correlation coefficient Q²
• Value between 0 and 1 (> 0.5 is desirable)
• Indicates the predictive power of the regression equation
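The difference between the two statistics can be sketched with leave-one-out cross-validation: each compound is predicted by a model fitted without it, and Q² is computed from the resulting prediction errors (PRESS). For brevity the example uses ordinary least squares in place of PLS, which is a simplification; the data are randomly generated.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns [a0, a1, ..., ap]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def r2_and_q2(X, y):
    """Return (R^2 of the full fit, leave-one-out Q^2)."""
    ss_tot = np.sum((y - y.mean()) ** 2)
    residuals = y - predict_linear(fit_linear(X, y), X)
    r2 = 1.0 - np.sum(residuals ** 2) / ss_tot
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i           # leave compound i out
        coef = fit_linear(X[keep], y[keep])
        press += (y[i] - predict_linear(coef, X[i:i + 1])[0]) ** 2
    q2 = 1.0 - press / ss_tot
    return r2, q2

# Hypothetical data: 20 compounds, 2 descriptors, noisy linear activity
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 2))
y = 5.0 + X @ np.array([1.0, -0.5]) + 0.3 * rng.normal(size=20)

r2, q2 = r2_and_q2(X, y)
print(f"R^2 = {r2:.3f}, Q^2 = {q2:.3f}")
```

Q² cannot exceed R² for such a fit, and a large gap between the two is a common sign of over-fitting.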
PCA vs PLS
• PCA: The principal components describe the variance
in the independent variables (descriptors)
• PLS: The principal components describe the variance
in both the independent variables (descriptors)
and the dependent variable (activity)
Comparative Molecular Field Analysis
(CoMFA)
• Set of chemically related compounds
• Common substructure required
• 3D structures needed (e.g., Corina-generated)
• Bioactive conformations of the active
compounds are to be aligned
CoMFA Alignment
[Figure: several molecules aligned on a common "pharmacophore", with the distances d1, d2 and d3 between its points marked]
CoMFA Grid and Field Probe
(Only one molecule shown for clarity)
Electrostatic Potential Contour Lines
CoMFA Model Derivation
• Molecules are positioned in a regular grid
according to alignment
• Probes are used to determine the molecular field:
Electrostatic field (probe is charged atom):
$E_c = \sum q_i q_j / (D\, r_{ij})$
Van der Waals field (probe is neutral carbon):
$E_{vdw} = \sum (A_i\, r_{ij}^{-12} - B_i\, r_{ij}^{-6})$
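The field calculation can be sketched as a sum over atoms at every grid point. The example below places a hypothetical two-atom "molecule" in a regular grid and evaluates both expressions with NumPy. Charges, coordinates, the probe charge, the dielectric constant D and the single A and B parameters are all invented values in arbitrary units; a real CoMFA setup would use force-field parameters and typically truncate very large energies near the atoms.

```python
import numpy as np

def comfa_fields(atom_xyz, atom_charges, grid_xyz,
                 probe_charge=1.0, dielectric=1.0, A=1.0, B=1.0):
    """Return (electrostatic, van der Waals) energies at each grid point."""
    # r[i, j] = distance between grid point i (probe position) and atom j
    r = np.linalg.norm(grid_xyz[:, None, :] - atom_xyz[None, :, :], axis=2)
    e_coulomb = np.sum(probe_charge * atom_charges[None, :] / (dielectric * r),
                       axis=1)
    e_vdw = np.sum(A * r ** -12 - B * r ** -6, axis=1)
    return e_coulomb, e_vdw

# Regular grid from -4 to 4 in steps of 2 along each axis (125 points)
axis = np.arange(-4.0, 4.1, 2.0)
gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
grid = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])

# Hypothetical molecule: two atoms with opposite partial charges,
# positioned so that no atom coincides with a grid point
atoms = np.array([[0.5, 0.0, 0.0], [-0.5, 0.0, 0.0]])
charges = np.array([0.4, -0.4])

e_c, e_vdw = comfa_fields(atoms, charges, grid)
print("grid points:", len(grid))
print("electrostatic field range:", float(e_c.min()), float(e_c.max()))
```

The two resulting vectors, one value per grid point, are the descriptor columns that a subsequent PLS analysis would correlate with activity.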
3D Contour Map for Electronegativity
CoMFA Pros and Cons
+ Suitable to describe receptor-ligand interactions
+ 3D visualization of important features
+ Good correlation within related set
+ Predictive power within scanned space
– Alignment is often difficult
– Training required