Receptor-Based Design
An Introduction to QSAR
Dr. Bahram Hemmateenejad
Chemistry Department
Shiraz University
Computer-Aided Molecular
Design (CAMD)
Computer-Aided Ligand Design (CALD)
Computer-Aided Drug Design (CADD)
Approaches in CAMD
Receptor-based design
Known receptor
Protein binding site
Supramolecular host
Ligand-based design
Known set of ligands
Unknown receptor
Receptor-Based
Design
Build or Find the key that fits the lock
Receptor-Based
Design
Docking
Interaction
energy
Molecular alignment
Pharmacophore modeling
Ligand-Based
Design
Quantitative structure-activity relationship (QSAR)
Quantitative structure-property relationship (QSPR)
Quantitative structure-toxicity relationship (QSTR)
Quantitative structure-retention relationship (QSRR)
Quantitative structure-migration relationship (QSMR)
Quantitative structure-electrochemistry relationship (QSER)
Quantitative structure-function relationship (QSFR)
QSAR/QSPR
Definition
Prediction of biological activities or chemical
property of organic compounds from their
molecular structures using mathematical
equations
Y = f (Xi)
Y = observed biological activity, Xi = molecular descriptors
Prediction
Ligand-Based
Molecular Design
Infer Binding Pocket
QSAR
What to achieve:
estimate the value of unknown physical/chemical/biological properties of compounds based on known or computationally accessible properties
How to achieve:
determine the value of the response variable y as a function of descriptor variables xi
yik=F(Ai,xk) + Bik
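As a minimal sketch of determining y as a function of a descriptor, the snippet below fits a straight line by ordinary least squares and uses it to estimate the response of a compound that was not measured. The function name fit_line and all data values are hypothetical and chosen only for illustration; a real QSAR model would normally use several descriptors.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: one descriptor value and one measured
# response per compound (not taken from the lecture).
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [2.1, 3.9, 6.1, 7.9]

slope, intercept = fit_line(descriptor, activity)

# Estimate the response of an untested compound with descriptor value 2.5.
predicted = slope * 2.5 + intercept
```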
What we need for QSAR
models?
Dataset including compounds with known
biological activity
Descriptors that are accessible for all
members of the dataset
Algorithms for the development of a QSAR
model
Validation protocol for the evaluation of the
model
Requirements for QSAR
datasets
Compounds should
belong to a congeneric series (more
important in 2D)
have same mechanism of action
have comparable binding mode
have biological activity that correlates to
binding affinity
Requirements for QSAR
datasets
Compounds should
have enumerated biological response measured
in the same organism/tissue/cell/protein
using the same type of measurement (binding/functional/IC50/Ki etc.)
using the same protocol (radioligand, activator, cofactor, pH, buffer etc.)
QSAR Origin
Linear Free-Energy Relationships (LFER)
Hammett equation
log (K/K0) = ρσ
Free-Wilson analysis
log 1/C = Σ ai + μ
ai = contributions of the substituents (R1, R2, etc.)
μ = activity contribution of the reference compound
[Figure: scaffold with substituent positions R1, R2, R3]
Free-Wilson analysis
[Table: 0/1 indicator matrix for compounds 1-5 showing which substituent (Cl, OH or Me) occupies positions R1, R2 and R3]
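The Free-Wilson equation log 1/C = Σ ai + μ can be sketched as a least-squares fit on a 0/1 indicator matrix. The example below is only an illustration under stated assumptions: the four compounds, their activities, the two substituent positions and the helper names (solve_linear_system, least_squares) are hypothetical, and the normal equations are solved by plain Gaussian elimination to keep the code dependency-free.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(X, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(XtX, Xty)


# Hypothetical series: the reference compound carries H at both positions.
compounds = [
    ({"R1": "H", "R2": "H"}, 5.0),
    ({"R1": "Cl", "R2": "H"}, 5.5),
    ({"R1": "H", "R2": "Me"}, 5.3),
    ({"R1": "Cl", "R2": "Me"}, 5.8),
]
features = [("R1", "Cl"), ("R2", "Me")]

# First column of ones estimates mu; the others are 0/1 indicator variables.
X = [[1.0] + [1.0 if subs[pos] == grp else 0.0 for pos, grp in features]
     for subs, _ in compounds]
y = [act for _, act in compounds]

mu, a_cl, a_me = least_squares(X, y)
```

With these made-up activities the contributions are exactly additive, so the fit returns μ = 5.0 with a(Cl at R1) = 0.5 and a(Me at R2) = 0.3 up to rounding.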
Hansch Analysis
Official Birth
log 1/C = a (log P)^2 + b log P + c σ + ... + k
C = concentration producing the biological effect
P = partition coefficient
σ = electronic Hammett constant
Linear Hansch model
log 1/C = a log P + b σ + c MR + ... + k
Nonlinear Hansch models
log 1/C = a (log P)^2 + b log P + c σ + ... + k
log 1/C = a log P - b log (βP + 1) + c σ + ... + k
Mixed Hansch/Free-Wilson model
log 1/C = a (log P)^2 + b log P + c σ + ... + Σ ai + k
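A short sketch of how the parabolic Hansch model is used in practice: when the coefficient of the squared log P term is negative, activity passes through a maximum at log P = -b / (2a). The coefficient values and the function names hansch_parabolic and optimal_log_p are hypothetical; they are not fitted to any data set.

```python
def hansch_parabolic(log_p, sigma, a, b, c, k):
    """Evaluate log 1/C = a (log P)^2 + b log P + c sigma + k."""
    return a * log_p ** 2 + b * log_p + c * sigma + k


def optimal_log_p(a, b):
    """Vertex of the parabola in log P; it is a maximum only when a < 0."""
    if a >= 0:
        raise ValueError("the model has an activity maximum only when a < 0")
    return -b / (2 * a)


# Hypothetical coefficients chosen for illustration.
a, b, c, k = -0.2, 1.0, 0.5, 3.0

best_log_p = optimal_log_p(a, b)
best_activity = hansch_parabolic(best_log_p, 0.0, a, b, c, k)
```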
Complementarity principles in binding
molecules to macromolecular targets
Interaction      Property           Descriptor
Steric           Topology           Distance, volume, surface
Electrostatic    Electron density   σ, partial charges, quantum chemical
Hydrophobic      Lipophilicity      logP, π
van der Waals    Polarizability     MR, parachor
Descriptor types for QSAR
Substituent variables:
Property of substituents only
Molecule variables:
Property of the whole
molecule
Interaction variables:
Property of a given
interaction
Descriptors for QSAR
Constitutional
MW, N_heteroatoms, N_atoms
Topological
Connectivity, Wiener index, E-state indices
Electrostatic
Polarity, dipole moments, partial charges
Geometrical Descriptors
Distances, molecular volume, PSA
Quantum chemical
HOMO and LUMO energies
Vibrational frequencies
Bond orders
Energies, enthalpies, entropy
Descriptors for QSAR
3D descriptors
MEP – Molecular Electrostatic Potential
MLP – Molecular Lipophilicity Potential
GRID – total energy of interaction: the sum of
steric (Lennard-Jones), H-bonding and
electrostatics
CoMFA – standard: steric and electrostatic,
additional: H-bonding, indicator, parabolic and
others.
Conditions for applicability of
QSAR
Selection of compounds
The same mechanism of action
Homogeneity
Representativity
Experimental design
Biological data
High quality and reliable
Same protocol and same laboratory
The level of experimental error
Conditions for applicability of
QSAR
Type of data
Continuous
Discrete
Data scaling and transformation
Logarithmic transformation
Normalization
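The two operations named on this slide can be sketched as follows. The example assumes the response is an IC50 expressed in mol/L and that normalization means autoscaling (zero mean, unit standard deviation); both are assumptions, since the slide does not specify them, and the function names are hypothetical.

```python
import math


def pic50(ic50_molar):
    """Logarithmic transformation: pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_molar)


def autoscale(values):
    """Normalization to zero mean and unit (sample) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Hypothetical IC50 values spanning three orders of magnitude.
ic50_values = [1e-6, 1e-7, 1e-9]
responses = [pic50(v) for v in ic50_values]

# Hypothetical descriptor column before and after scaling.
scaled = autoscale([1.0, 2.0, 3.0])
```

The log transform turns a range covering several orders of magnitude into a roughly evenly spaced scale, and autoscaling puts descriptors with different units on a comparable footing.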
Conditions for applicability of
QSAR
Descriptors
As meaningful as possible
Interpretable
Calculation simplicity
Calculation uncertainty
Software reliability
QSAR Steps
1. Formulation of classes of similar compounds
Ideal situation: Classes of chemically and
biologically similar compounds
All the compounds should be structurally similar
and function according to the same mode of
action
Compounds must be dissimilar enough to cause
some systematic change in biological activity
QSAR Steps
2. Quantitative description of structural
variations (descriptor calculation)
Usually several descriptors are required
It is difficult to predict which descriptors
will be useful
It is convenient to have a set of
independent descriptors
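One simple way to approach a set of independent descriptors is to drop any descriptor that is strongly correlated with one already kept. The sketch below is an illustration only: the greedy rule, the 0.95 threshold, the function names and the descriptor values are all assumptions, not a method prescribed by the lecture.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(descriptors, threshold=0.95):
    """Keep a descriptor only if |r| with every kept descriptor is below threshold."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor table: MR is almost a multiple of logP here.
descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0],
    "MR": [2.0, 4.0, 6.0, 8.1],
    "sigma": [0.3, -0.1, 0.2, -0.2],
}

independent = drop_correlated(descriptors)
```

The result depends on the order in which descriptors are visited, so in practice the more interpretable descriptor of a correlated pair is usually listed first.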
QSAR Steps
3. Selection of training set compounds
Training set: used to develop and optimize the model
Calibration set: used to calculate the model coefficients
Validation set: used to validate the constructed model
External test set
makes no contribution to the model development step
measures the overall prediction ability of the proposed model
Selection criteria
Random
Experimental design
Classification methods
PCA
SIMCA
Classification and regression tree (CART)
…
Example of a PCA-based selection method
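Of the selection criteria listed above, random selection is the simplest to sketch. The snippet below sets aside an external test set that takes no part in model development. The function name random_split, the 25 % test fraction and the fixed seed are illustrative assumptions.

```python
import random


def random_split(compound_ids, test_fraction=0.2, seed=0):
    """Randomly divide compounds into a training set and an external test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(compound_ids)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data set of 20 compounds identified by the numbers 1..20.
training, external_test = random_split(range(1, 21), test_fraction=0.25, seed=42)
```

A purely random split can leave regions of descriptor space unrepresented in one of the sets, which is the motivation for the design-based and PCA-based alternatives.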
QSAR steps
4. QSAR model development (data analysis)
Regression method
Variable selection method
Regression methods
Linear regression
Preferred for simplicity and ease of
calculation
More descriptive
Non-linear regression
Usually are complex
Higher prediction ability
High risk of over-fitting
Linear regression
Multiple linear regression (MLR)
The simplest and most widely used method
More interpretable
Collinearity
Number of variables considered in the model
Factor analysis based methods
Principal component regression (PCR)
Partial least squares (PLS)
PCR and PLS
Both overcome collinearity by producing orthogonal
variables
PLS lies on a continuum between MLR and PCR
PLS is more predictive
PCR is more descriptive
PLS generates latent variables
Two-step model building
Variable selection
Factor selection
Higher risk of over-fitting with respect to MLR
Nonlinear regression
Artificial neural networks (ANN)
Feed-forward
Counter propagation
Kohonen networks
Wavelet neural network
Neuro fuzzy
Nonlinear PCR and PLS
Quadratic PCR or PLS
PC-ANN
PLS-ANN
Support vector machine (SVM)
…
Variable selection
Search strategy
Searching different subsets of descriptors
Scoring function
Evaluating the performance of the variable combination
Regression methods are used for scoring
Variable selection is always coupled with a
regression method
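The coupling of a search strategy with a regression-based scoring function can be sketched with an exhaustive search over descriptor subsets of a fixed size, each scored by the residual sum of squares of an MLR fit. Everything here is a hypothetical illustration: the data, the helper names and the choice of exhaustive search (stepwise or genetic algorithms replace it when the descriptor pool is large). Scoring on the fitted data is only reasonable because the subset size is fixed; comparing subsets of different sizes would call for a cross-validated score.

```python
from itertools import combinations


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(X, y):
    """MLR coefficients from the normal equations (fails if X^T X is singular)."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(XtX, Xty)


def residual_sum_of_squares(columns, y):
    """Scoring function: RSS of an MLR model with intercept on the given columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    beta = least_squares(X, y)
    return sum((yi - sum(b * xi for b, xi in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def best_subset(descriptors, y, size):
    """Search strategy: try every subset of the requested size."""
    best_names, best_rss = None, float("inf")
    for names in combinations(descriptors, size):
        rss = residual_sum_of_squares([descriptors[n] for n in names], y)
        if rss < best_rss:
            best_names, best_rss = names, rss
    return best_names, best_rss


# Hypothetical data in which y = 2*x1 + 3*x3 + 1 and x2 is irrelevant.
descriptors = {
    "x1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    "x3": [0.0, 0.0, 1.0, 1.0, 0.0, 2.0],
}
y = [1.0, 3.0, 8.0, 10.0, 9.0, 17.0]

selected, score = best_subset(descriptors, y, size=2)
```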
Variable selection
Feature selection
Different variable selection methods
Stepwise
Genetic algorithm
Ant colony optimization
…
Feature extraction
PCA scores
Kohonen scores
SVM scores
Wavelet coefficients
Combined feature selection-feature extraction
QSAR steps
5. Model validation
The essential part of a QSAR study
Internal validation
Cross-validation
External validation
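Internal validation by leave-one-out cross-validation can be sketched as below: each compound is left out in turn, the model is refitted on the rest, and the squared prediction errors are summed (PRESS) to give Q² = 1 - PRESS / SS. The single-descriptor model, the data and the function names are hypothetical, and SS is taken about the mean of all responses, which is one common convention among several.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(x, y):
    """Leave-one-out cross-validated Q^2 for the one-descriptor model."""
    n = len(x)
    mean_y = sum(y) / n
    press = 0.0
    for i in range(n):
        # Refit without compound i, then predict compound i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_total


# Hypothetical data: a nearly linear relationship with small scatter.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]

q2 = loo_q2(x, y)
```

For these made-up numbers Q² is about 0.98. External validation follows the same idea, except that the predicted compounds come from a test set that was never used for fitting.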
Some advice
QSAR models should be
Simple
Transparent
Mechanistically comprehensible
Not over-fitted
Use as few variables as possible
Some advice
Be associated with a biological end point
Take the form of unambiguous and easily applicable
algorithm
Ideally have a clear mechanistic basis
Be accompanied by a definition of applicability domain
Be associated with a measure of goodness of fit
Be assessed in terms of its predictive capacity
The last advice
Use experimental design and QSAR to
increase the rate at which new compounds
are proposed
[Figure: medicinal chemists or drug designers choosing compounds with good versus poor diversity in descriptor space (molecular volume, rotatable bonds, dipole), followed by synthesis -> biological testing -> QSPR model, with a plot of predicted value against actual value]
Models are not real but sometimes are helpful