Milano Chemometrics and QSAR Research Group
Roberto Todeschini
Viviana Consonni
Manuela Pavan
Andrea Mauri
Davide Ballabio
Alberto Manganaro
chemometrics
molecular descriptors
QSAR
multicriteria decision making
environmetrics
experimental design
artificial neural networks
statistical process control
Department of Environmental Sciences
University of Milano - Bicocca
P.za della Scienza, 1 - 20126 Milano (Italy)
Website: michem.unimib.it/chm/
Roberto Todeschini
Milano Chemometrics and QSAR Research Group
Molecular descriptors
Autocorrelations, eigenvalue-based
and information indices
Iran - February 2009
Contents
- Autocorrelation descriptors
- Molecule representation by matrices
- Eigenvalue-based descriptors
- Information content
- Information indices
Autocorrelation on a molecular graph
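w is the vector collecting the weights of each atom; A is the number of atoms.

- quadratic molecular property (I is the identity matrix)

$$P = \mathbf{w}^{T} \cdot \mathbf{I} \cdot \mathbf{w} = \sum_{i=1}^{A} w_i^2$$

Dimensions: 1 = (1, A) (A, A) (A, 1)

- quadratic molecular property with interaction terms (U is the matrix with all elements equal to one)

$$ATS = \mathbf{w}^{T} \cdot \mathbf{U} \cdot \mathbf{w} = \left( \sum_{i=1}^{A} w_i \right)^{2} = \sum_{i=1}^{A} w_i^2 + 2 \cdot \sum_{i=1}^{A-1} \sum_{j>i} w_i \cdot w_j$$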
Autocorrelation on a molecular graph
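Moreau - Broto autocorrelation of a topological structure
1984

$$ATS_k = \sum_{i=1}^{A-1} \sum_{j>i} w_i \cdot w_j \cdot \delta(k; d_{ij})$$

where k is the lag, d_ij is the topological distance between atoms i and j, and

$$\delta(k; d_{ij}) = \begin{cases} 1 & \text{if } d_{ij} = k \\ 0 & \text{if } d_{ij} \neq k \end{cases}$$

For lag zero:

$$ATS_0 = \sum_{i=1}^{A} w_i^2$$

Summing over all lags up to the largest distance d recovers the quadratic property with interaction terms:

$$ATS = \sum_{i=1}^{A} w_i^2 + 2 \cdot \sum_{i=1}^{A-1} \sum_{j>i} w_i \cdot w_j = ATS_0 + 2 \cdot \sum_{k=1}^{d} ATS_k$$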
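The lag-based definition can be checked numerically. The following is a minimal sketch, assuming a hypothetical 4-atom chain with arbitrary weights (neither is taken from the slides); the helper names `topological_distances` and `ats` are illustrative only. It computes ATS_k from breadth-first-search distances and compares ATS_0 plus twice the sum of the higher lags with the squared sum of the weights.

```python
from collections import deque

def topological_distances(adjacency):
    """All-pairs shortest path lengths (in bonds) by breadth-first search."""
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist

def ats(weights, dist, k):
    """Moreau-Broto autocorrelation at lag k (k = 0 gives the sum of squares)."""
    n = len(weights)
    if k == 0:
        return sum(w * w for w in weights)
    return sum(weights[i] * weights[j]
               for i in range(n - 1)
               for j in range(i + 1, n)
               if dist[i][j] == k)

# Hypothetical chain 0-1-2-3 with arbitrary atomic weights (assumption)
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
weights = [1.0, 2.0, 3.0, 4.0]

dist = topological_distances(adjacency)
max_lag = max(max(row) for row in dist)
total = ats(weights, dist, 0) + 2 * sum(ats(weights, dist, k)
                                        for k in range(1, max_lag + 1))
print(total, sum(weights) ** 2)
```

Both printed numbers should agree, since every atom pair contributes to exactly one lag.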
Autocorrelation on a molecular graph
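Example: 4-hydroxy-2-butanone

[Figure: hydrogen-depleted molecular graph with atoms numbered C1, C2, C3, C4, O5, O6]

$$ATS_0 = w_1^2 + w_2^2 + w_3^2 + w_4^2 + w_5^2 + w_6^2$$
$$ATS_1 = w_1 \cdot w_2 + w_2 \cdot w_3 + w_3 \cdot w_4 + w_4 \cdot w_5 + w_2 \cdot w_6$$
$$ATS_2 = w_1 \cdot w_3 + w_2 \cdot w_4 + w_3 \cdot w_5 + w_3 \cdot w_6 + w_1 \cdot w_6$$
$$ATS_3 = w_1 \cdot w_4 + w_2 \cdot w_5 + w_4 \cdot w_6$$

With atomic masses as weights (w_i = m_i), and each barred value averaged over the number of terms in the sum:

$$ATS_0 = 12^2 + 12^2 + 12^2 + 12^2 + 16^2 + 16^2 = 1088 \qquad \overline{ATS}_0 = 1088 / 6 = 181.3$$
$$ATS_1 = 12 \cdot 12 + 12 \cdot 12 + 12 \cdot 12 + 12 \cdot 16 + 12 \cdot 16 = 816 \qquad \overline{ATS}_1 = 816 / 5 = 163.2$$
$$ATS_2 = 12 \cdot 12 + 12 \cdot 12 + 12 \cdot 16 + 12 \cdot 16 + 12 \cdot 16 = 864 \qquad \overline{ATS}_2 = 864 / 5 = 172.8$$
$$ATS_3 = 12 \cdot 12 + 12 \cdot 16 + 12 \cdot 16 = 528 \qquad \overline{ATS}_3 = 528 / 3 = 176$$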
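The worked example can be reproduced with a few lines of code. This is a sketch under the assumption that the bonds are 1-2, 2-3, 3-4, 4-5 and 2-6, which is the bond list implied by the ATS_1 terms; the rounded masses 12 and 16 are those used on the slide. Distances are obtained with the Floyd-Warshall algorithm, chosen here only for brevity.

```python
from itertools import combinations

# Hydrogen-depleted graph of 4-hydroxy-2-butanone: C1-C2(=O6)-C3-C4-O5
bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6)]
mass = {1: 12, 2: 12, 3: 12, 4: 12, 5: 16, 6: 16}

atoms = sorted(mass)
INF = float("inf")
dist = {(i, j): (0 if i == j else INF) for i in atoms for j in atoms}
for i, j in bonds:
    dist[i, j] = dist[j, i] = 1
for m in atoms:  # Floyd-Warshall: allow paths through intermediate atom m
    for i in atoms:
        for j in atoms:
            if dist[i, m] + dist[m, j] < dist[i, j]:
                dist[i, j] = dist[i, m] + dist[m, j]

def ats(k):
    """Return (ATS_k, number of terms) using atomic masses as weights."""
    if k == 0:
        terms = [mass[i] ** 2 for i in atoms]
    else:
        terms = [mass[i] * mass[j]
                 for i, j in combinations(atoms, 2)
                 if dist[i, j] == k]
    return sum(terms), len(terms)

for k in range(4):
    total, count = ats(k)
    print(f"ATS{k} = {total}, mean = {total / count:.1f}")
```

Dividing by the number of terms gives the averaged values, which makes lags with different numbers of atom pairs comparable.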
Eigenvalue-based descriptors
Eigenvalue descriptors are obtained by diagonalizing symmetric matrices derived from a molecular graph, such as:
- Adjacency matrix
- Vertex distance matrix
- Edge adjacency matrix
- Edge distance matrix
- Detour matrix
- Geometrical distance matrix
- Covariance matrix
... and any weighted symmetric matrix
Eigenvalue-based descriptors
Lovasz - Pelikan index (or leading eigenvalue)
1973
The largest eigenvalue derived from the adjacency matrix:

$$\lambda_1^{LP}$$
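As an illustration, the leading eigenvalue can be obtained with NumPy. The sketch below assumes a hypothetical 4-atom chain (not a molecule from the slides) and uses `numpy.linalg.eigvalsh`, which returns the eigenvalues of a symmetric matrix in ascending order.

```python
import numpy as np

# Adjacency matrix of a hypothetical 4-atom chain 0-1-2-3 (assumption)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

eigenvalues = np.linalg.eigvalsh(A)  # ascending order
lp_index = eigenvalues[-1]           # largest eigenvalue
print(lp_index)
```

For a simple chain of four vertices the leading eigenvalue is known in closed form, 2 cos(pi/5), which gives a convenient check.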
Eigenvalue-based descriptors
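General functions of eigenvalues

$$SpSum^{k}(\mathbf{M}, w) = \sum_{i=1}^{n} |\lambda_i|^{k}$$
$$SpSum^{k}_{+}(\mathbf{M}, w) = \sum_{i=1}^{n^{+}} \lambda_i^{k}$$
$$SpSum^{k}_{-}(\mathbf{M}, w) = \sum_{i=1}^{n^{-}} |\lambda_i|^{k}$$
$$SpAD(\mathbf{M}, w) = \sum_{i=1}^{n} |\lambda_i - \bar{\lambda}|$$
$$SpMAD(\mathbf{M}, w) = \sum_{i=1}^{n} |\lambda_i - \bar{\lambda}| \, / \, n$$
$$MinSp(\mathbf{M}, w) = \min_i \{ \lambda_i \}$$
$$MaxSp(\mathbf{M}, w) = \max_i \{ \lambda_i \}$$
$$MaxSpA(\mathbf{M}, w) = \max_i \{ |\lambda_i| \}$$
$$SpDiam(\mathbf{M}, w) = MaxSp - MinSp$$

where λ_i are the n eigenvalues of the matrix M computed with the weighting scheme w, n+ and n- are the numbers of positive and negative eigenvalues, and λ̄ is the mean eigenvalue.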
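The spectral functions are straightforward to compute once the eigenvalues are available. The sketch below rests on several assumptions: that the three SpSum variants sum over all, positive and negative eigenvalues respectively (with absolute values where signs could cancel), that SpAD and SpMAD are the absolute and mean absolute deviations from the mean eigenvalue, and that the weighting scheme w is already built into M. The function name `spectral_indices` and the dictionary keys are illustrative, not an established API.

```python
import numpy as np

def spectral_indices(M, k=1):
    """Spectral functions of a symmetric matrix M (weights assumed built in)."""
    lam = np.linalg.eigvalsh(np.asarray(M, dtype=float))
    pos = lam[lam > 0]
    neg = lam[lam < 0]
    sp_ad = np.sum(np.abs(lam - lam.mean()))
    return {
        "SpSum": np.sum(np.abs(lam) ** k),
        "SpSum+": np.sum(pos ** k),
        "SpSum-": np.sum(np.abs(neg) ** k),
        "SpAD": sp_ad,
        "SpMAD": sp_ad / lam.size,
        "MinSp": lam.min(),
        "MaxSp": lam.max(),
        "MaxSpA": np.abs(lam).max(),
        "SpDiam": lam.max() - lam.min(),
    }

# Adjacency matrix of a hypothetical 3-atom chain (assumption)
A3 = [[0, 1, 0],
      [1, 0, 1],
      [0, 1, 0]]
indices = spectral_indices(A3)
for name, value in indices.items():
    print(name, round(float(value), 4))
```

Eigenvalues that are zero in exact arithmetic may come out as tiny positive or negative numbers, so a tolerance would be advisable if the counts of positive and negative eigenvalues matter.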
Eigenvalue-based descriptors
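The trace of the adjacency matrix (and of the distance matrix) is equal to zero, so the positive and negative eigenvalues cancel:

$$\operatorname{trace}(\mathbf{A}) = \sum_j \lambda_j^{+} + \sum_j \lambda_j^{-} = 0$$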
Eigenvalue-based descriptors
VAA indices (from adjacency matrix)
Balaban et al., 1991

$$VAA1 = \sum_j \lambda_j^{+}$$
$$VAA2 = \frac{\sum_j \lambda_j^{+}}{A}$$
$$VAA3 = \frac{A}{10} \cdot \log(VAA1)$$
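A minimal sketch of the three VAA indices follows. It assumes that VAA1 is the sum of the positive eigenvalues of the adjacency matrix, that VAA2 divides it by the number of atoms A, and that the logarithm in VAA3 is base 10; the base is not stated on the slide, so treat it as an assumption. The function name `vaa_indices` is illustrative.

```python
import math
import numpy as np

def vaa_indices(adjacency_matrix):
    """Return (VAA1, VAA2, VAA3) for a symmetric adjacency matrix."""
    A = np.asarray(adjacency_matrix, dtype=float)
    n_atoms = A.shape[0]
    lam = np.linalg.eigvalsh(A)
    vaa1 = float(lam[lam > 0].sum())          # sum of positive eigenvalues
    vaa2 = vaa1 / n_atoms
    vaa3 = n_atoms / 10 * math.log10(vaa1)    # log base 10 is an assumption
    return vaa1, vaa2, vaa3

# Hypothetical 3-atom chain (assumption)
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
vaa1, vaa2, vaa3 = vaa_indices(chain)
print(vaa1, vaa2, vaa3)
```

Because the trace is zero, the sum of the positive eigenvalues equals the absolute sum of the negative ones, so VAA1 is half the sum of the absolute eigenvalues.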
Eigenvector-based descriptors
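VEA indices (from adjacency matrix)
Balaban et al., 1991

$$VEA1 = \sum_{i=1}^{A} \ell_{iA}$$
$$VEA2 = \frac{VEA1}{A}$$
$$VEA3 = \frac{A}{10} \cdot \log(VEA1)$$

where ℓ_iA are the coefficients of the eigenvector associated with λ_A, the largest negative eigenvalue derived from the adjacency matrix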
Eigenvalue-based descriptors
VAD, VED and VRD indices
(from distance matrix)
Balaban et al., 1991
The same indices defined above are calculated
on the topological distance matrix
Molecular geometry
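The geometry matrix G (or geometric distance matrix) is a square symmetric matrix whose entry r_st is the geometric distance, calculated as the Euclidean distance between the atoms s and t:

$$\mathbf{G} = \begin{bmatrix} 0 & r_{12} & \cdots & r_{1A} \\ r_{21} & 0 & \cdots & r_{2A} \\ \vdots & \vdots & \ddots & \vdots \\ r_{A1} & r_{A2} & \cdots & 0 \end{bmatrix}$$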
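Building the geometry matrix from Cartesian coordinates takes one line with the standard library. The sketch below uses three hypothetical atom positions (chosen so the distances are easy to verify by hand) and `math.dist`, available from Python 3.8.

```python
import math

def geometry_matrix(coords):
    """Square symmetric matrix of Euclidean interatomic distances."""
    return [[math.dist(p, q) for q in coords] for p in coords]

# Hypothetical coordinates forming a 3-4-5 right triangle (assumption)
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
G = geometry_matrix(coords)
for row in G:
    print(row)
```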
Distance / distance matrix (DD)
Randic et al., 1994

$$[\mathbf{DD}]_{ij} = \frac{[\mathbf{G}]_{ij}}{[\mathbf{D}]_{ij}} = \frac{r_{ij}}{d_{ij}}$$

G: geometry matrix
D: distance matrix
Eigenvalue-based descriptors
Folding degree index
Randic et al., 1994
The largest eigenvalue derived from the distance/distance matrix, divided by the number of atoms A:

$$\phi = \frac{\lambda_1^{DD}}{A}$$

This quantity tends to 1 for linear molecules (of infinite length) and decreases as the molecule becomes more folded.
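The behaviour of the folding degree index can be illustrated on a toy chain. The sketch below assumes unit bond lengths, hypothetical coordinates for a straight and a bent 5-atom chain, and a zero diagonal for the distance/distance matrix (the ratio is undefined there); `folding_degree` is an illustrative name.

```python
import numpy as np

def folding_degree(coords, D):
    """Largest eigenvalue of the distance/distance matrix divided by A."""
    coords = np.asarray(coords, dtype=float)
    D = np.asarray(D, dtype=float)
    n_atoms = len(coords)
    # Geometry matrix: pairwise Euclidean distances
    G = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    DD = np.zeros_like(G)
    off_diag = ~np.eye(n_atoms, dtype=bool)
    DD[off_diag] = G[off_diag] / D[off_diag]  # diagonal left at zero (assumption)
    return np.linalg.eigvalsh(DD)[-1] / n_atoms

# Topological distances of a 5-atom chain: |i - j|
idx = np.arange(5)
D = np.abs(idx[:, None] - idx[None, :])

straight = [(i, 0.0, 0.0) for i in range(5)]
bent = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 2, 0)]

phi_straight = folding_degree(straight, D)
phi_bent = folding_degree(bent, D)
print(phi_straight, phi_bent)
```

For the straight chain every off-diagonal ratio equals 1, so the index is (A - 1) / A, which approaches 1 as the chain grows; bending the chain shortens the geometric distances and lowers the value.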
Conventional bond order
- single bond: π* = 1
- double bond: π* = 2
- triple bond: π* = 3
- conjugated bond: π* = 1.5
Eigenvalue-based descriptors
BCUT descriptors
Burden - CAS - University of Texas eigenvalues
1997
The largest absolute eigenvalues λ_1, λ_2, λ_3, ..., λ_L, derived from the following B matrix:

$$B_{ii} = w_i$$
$$B_{ij} = \begin{cases} \pi^{*}_{ij} & \text{if } i, j \text{ bonded} \\ 0 & \text{otherwise} \end{cases}$$

w: atomic properties
π*: conventional bond order
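A sketch of the B matrix and its eigenvalues is given below, following the definition on the slide literally: atomic properties on the diagonal, conventional bond orders for bonded pairs, zero elsewhere. The molecule is 4-hydroxy-2-butanone with rounded atomic masses as the property; the bond list, the zero-based atom indexing and the function name `burden_matrix` are assumptions made for illustration.

```python
import numpy as np

def burden_matrix(weights, bonds):
    """Build B: weights on the diagonal, bond orders for bonded atom pairs."""
    B = np.diag(np.asarray(weights, dtype=float))
    for (i, j), order in bonds.items():
        B[i, j] = B[j, i] = order
    return B

# Atoms C1, C2, C3, C4, O5, O6 mapped to indices 0..5 (assumption)
weights = [12, 12, 12, 12, 16, 16]
bonds = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1, (1, 5): 2}  # C2=O6 is double

B = burden_matrix(weights, bonds)
lam = np.linalg.eigvalsh(B)
bcut = lam[np.argsort(-np.abs(lam))]  # eigenvalues by decreasing absolute value
print(bcut)
```

Taking the first L entries of `bcut` gives the L largest absolute eigenvalues; how many to keep is a modelling choice.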
Topological information indices
Indices based on the information content and
entropy measures derived from the molecular
graphs.
Information content
The information content of a system having n elements is a measure of the degree of diversity of the elements in the set.

$$IC = \sum_{g=1}^{G} n_g \cdot \log_2 n_g$$

where G is the number of different equivalence classes and n_g is the number of elements in the g-th class and

$$n = \sum_{g=1}^{G} n_g$$
Information content
Maximum information content

$$I_{MAX} = n \cdot \log_2 n$$

Total information content

$$I_T = n \cdot \log_2 n - \sum_{g=1}^{G} n_g \cdot \log_2 n_g$$
Information content
The Shannon entropy of a system having n elements is the mean information content of a set of elements

$$H = - \sum_{g=1}^{G} p_g \cdot \log_2 p_g$$

where G is the number of different equivalence classes and p_g is the probability of the g-th class and

$$p_g = \frac{n_g}{n} \qquad \sum_{g=1}^{G} p_g = 1$$
Information content
Maximum entropy

$$H_{MAX} = \log_2 n$$

Standardized entropy

$$H^{*} = \frac{- \sum_{g=1}^{G} p_g \cdot \log_2 p_g}{\log_2 n} \qquad 0 \leq H^{*} \leq 1$$
Information content
... on atoms

[Figure: two hydrogen-depleted structures with n = 9 atoms each; the first carries Me, F and F substituents, the second Me, F and Br]

For both molecules:
IMAX = 9 log2 9 = 28.529
HMAX = log2 9 = 3.170

First molecule: n = 9, C = 7, F = 2
IC = 7 log2 7 + 2 log2 2 = 19.651 + 2.000 = 21.651
IT = 28.529 - 21.651 = 6.878
H = -(7/9) log2 (7/9) - (2/9) log2 (2/9) = 0.282 + 0.482 = 0.764
H* = 0.764 / 3.170 = 0.241

Second molecule: n = 9, C = 7, F = 1, Br = 1
IC = 7 log2 7 + 2 (1 log2 1) = 19.651 + 0 = 19.651
IT = 28.529 - 19.651 = 8.878
H = -(7/9) log2 (7/9) - 2 (1/9) log2 (1/9) = 0.282 + 2 x 0.352 = 0.986
H* = 0.986 / 3.170 = 0.311
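The arithmetic of the atom-based example can be reproduced from the class sizes alone. The sketch below implements the definitions of IC, IMAX, IT, H, HMAX and H* given on the preceding slides; the function name `information_indices` and the dictionary keys are illustrative. It assumes at least two elements, since HMAX is zero for n = 1.

```python
import math

def information_indices(class_sizes):
    """Information content and entropy measures from equivalence class sizes."""
    n = sum(class_sizes)
    ic = sum(ng * math.log2(ng) for ng in class_sizes)
    i_max = n * math.log2(n)
    h = -sum(ng / n * math.log2(ng / n) for ng in class_sizes)
    h_max = math.log2(n)
    return {"IC": ic, "IMAX": i_max, "IT": i_max - ic,
            "H": h, "HMAX": h_max, "H*": h / h_max}

first = information_indices([7, 2])      # C = 7, F = 2
second = information_indices([7, 1, 1])  # C = 7, F = 1, Br = 1

for label, result in (("first", first), ("second", second)):
    print(label, {key: round(value, 3) for key, value in result.items()})
```

Note that IT equals n times H, so the total information content and the Shannon entropy carry the same information up to the factor n.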
Information content
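... on vertex degrees

[Figure: hydrogen-depleted structure with Me, F and F substituents; each atom is labelled with its vertex degree (three atoms each of degree 1, 2 and 3)]

n = 9, V1 = 3, V2 = 3, V3 = 3
H = 3 x [-(3/9) log2 (3/9)] = xxx

... on vertex degree magnitudes

SV1 = 3, SV2 = 6, SV3 = 9
n = 18, V1 = 3, V2 = 6, V3 = 9
H = -(3/18) log2 (3/18) - (6/18) log2 (6/18) - (9/18) log2 (9/18) = xxxx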
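The two partitions differ only in how the classes are sized: by the number of vertices with a given degree, or by the sum of the degrees in each class. The following sketch evaluates both entropies, assuming the nine vertex degrees are three each of 1, 2 and 3 as stated in the class counts; `shannon_entropy` is an illustrative helper.

```python
import math
from collections import Counter

def shannon_entropy(class_sizes):
    """Shannon entropy (base 2) of a partition given its class sizes."""
    sizes = list(class_sizes)
    n = sum(sizes)
    return -sum(ng / n * math.log2(ng / n) for ng in sizes)

degrees = [1, 1, 1, 2, 2, 2, 3, 3, 3]  # vertex degrees of the 9 atoms

# Partition on vertex degrees: class size = number of vertices of that degree
by_degree = Counter(degrees)
h_degree = shannon_entropy(by_degree.values())

# Partition on vertex degree magnitudes: class size = sum of degrees in the class
by_magnitude = Counter()
for d in degrees:
    by_magnitude[d] += d
h_magnitude = shannon_entropy(by_magnitude.values())

print(h_degree, h_magnitude)
```

The degree-based partition is uniform over three classes, so its entropy is the maximum possible for three classes; weighting by magnitude skews the distribution and lowers the entropy.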
Website: michem.disat.unimib.it/chm/
THANK YOU