QSAR PREDICTION OF AQUATIC TOXICITY OF ESTERS
Gramatica, P., Battaini, F., Papa, E.
QSAR and Environmental Chemistry Research Unit, University of Insubria, Varese (Italy).
Web: http://dipbsf.uninsubria.it/qsar/
e-mail: [email protected]
ABSTRACT
Esters are an important class of industrial chemicals, for which the EU Directive “White Paper on a strategy for a future Community Policy for Chemicals” requires toxicity data by the end of 2005 at the latest. The object of the study was to develop QSAR models to rapidly predict the aquatic toxicity of esters. Unfortunately, experimental toxicity data are not known for a large number of these compounds or, where known, are not all homogeneous, which hinders an accurate and comparable evaluation of the toxicological behaviour of the compounds considered.
Different theoretical molecular descriptors (1D-constitutional, 2D-topological, and different 3D-descriptors) were calculated by the DRAGON software. The Genetic Algorithm (GA-Variable Subset Selection) was used to select the most relevant molecular descriptors for modelling by Ordinary Least Squares (OLS) regression. The studied end-points are: LC50 in Pimephales promelas, EC50 in Daphnia magna and in seaweed, IGC50 in Entosiphon sulcatum and chronic toxicity in Daphnia magna. The best models were validated for their predictive performance using leave-one-out (Q2LOO = 70-90%), leave-many-out (30% of perturbation, Q2LMO = 70-90%) and the scrambling of the responses.
The models were not all externally validated owing to the small size (14-30 compounds) of the studied sets. The reliability of the predictions was always checked by the leverage approach in order to verify the chemical domain of the models. A PCA model, based on four acute toxicity end-points, is proposed to evaluate the trend of aquatic toxicity for the studied esters. The PC1 score is also modelled by theoretical molecular descriptors (Q2LOO = 89%, Q2LMO = 88%): this last model can be used as an evaluative method for screening esters according to their aquatic toxicity, starting only from their molecular structure.
INTRODUCTION
More than 100,000 compounds are currently in common use, and about 2,000 new ones appear each year. No data are available for the majority of these compounds, so their environmental fate, behaviour and effects are not understood [1]. This general lack of knowledge led the European Commission to adopt a “White Paper on a strategy for a future Community Policy for Chemicals” [2]. This Directive requires, by the end of 2005 at the latest, physico-chemical and toxicity data for HPV (High Production Volume) compounds, i.e. those with a production volume above 1,000 tonnes/year. Among the HPV compounds the class of esters is one of the largest and environmentally most “interesting”. Some esters, e.g. phthalates, are known for their weak carcinogenic and estrogenic effects [3]; there is thus a need to identify these compounds and to assess their potential health hazard and their impact on the environment.
The aim of our research was to develop “local” QSAR (Quantitative Structure-Activity Relationship) models to rapidly predict the toxicity of esters. As this prediction is based solely on molecular structure, the approach could usefully be applied to new chemicals, even those not yet synthesised, provided they belong to the chemical domain of the training set. In this way it is possible to reduce the cost and time needed to obtain experimental data.
MATERIALS & METHODS
EXPERIMENTAL DATA
The studied end-points are: LC50 in Pimephales promelas; EC50 in Daphnia magna, in Pseudomonas and in seaweed; IGC50 in Entosiphon sulcatum, in Scenedesmus and in Pseudomonas. The chronic toxicity of phthalates in Daphnia magna was also studied. The experimental data were taken from the literature [4-7], reported in mmol/L and converted to logarithmic units.
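The conversion to logarithmic units can be sketched as follows. This is a minimal illustration and assumes the transform is the base-10 Log(1/C) used for the responses in the results, with C in mmol/L; the function name and the concentration values are hypothetical and are not data from the study.

```python
import math


def to_log_inverse(conc_mmol_per_l: float) -> float:
    """Return log10(1/C) for a concentration C given in mmol/L (assumed base 10)."""
    if conc_mmol_per_l <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(conc_mmol_per_l)


# Hypothetical LC50 values in mmol/L, for illustration only.
for lc50 in (0.01, 1.0, 10.0):
    print(f"LC50 = {lc50} mmol/L -> Log(1/LC50) = {to_log_inverse(lc50):.2f}")
```

On this scale a larger value corresponds to a lower effect concentration, i.e. to a more toxic compound.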
MOLECULAR DESCRIPTORS
The molecular structure of the studied compounds was described using several molecular descriptors calculated by the DRAGON software [8]:
• 0D descriptors – constitutional descriptors (atom and group counts)
• 1D descriptors – functional groups, atom-centred fragments and empirical descriptors
• 2D descriptors – BCUTs, Galvez indices from the adjacency matrix, walk counts, various autocorrelations from the molecular graph and topological descriptors
• 3D descriptors – Randic molecular profiles from the geometry matrix, WHIMs, GETAWAY and geometrical descriptors
CHEMOMETRIC METHODS
Multiple Linear Regression analysis and variable selection were performed with the software MOBY DIGS [9], using the Ordinary Least Squares (OLS) regression method and GA-VSS (Genetic Algorithm-Variable Subset Selection) [10]. All calculations used the leave-one-out (LOO) and leave-many-out (LMO) procedures and response scrambling for the internal validation of the models.
External validation [11-12] was performed on a validation set obtained by splitting the original data set at 75% with an Experimental Design procedure, using the software DOLPHIN of Todeschini et al. [13].
Regression diagnostic tools such as residual plots and Williams plots were used to check the quality of the best models and to define their applicability with regard to the chemical domain, using the chemometric package SCAN [14]. RMS (residual mean squares) values are also reported for model comparison with ECOSAR [15].
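The internal validation described above can be illustrated with a short sketch. It is not the MOBY DIGS implementation: the function names (`fit_ols`, `q2_loo`, `y_scrambling`) and the synthetic data are hypothetical, and Q2 is assumed here to be 1 - PRESS/TSS with the total sum of squares taken around the mean of the full response vector.

```python
import numpy as np


def fit_ols(X, y):
    """Return OLS coefficients (intercept first) for y ~ X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def r2(X, y):
    """Fitted R2 of the OLS model on the full data set."""
    A = np.column_stack([np.ones(len(y)), X])
    resid = y - A @ fit_ols(X, y)
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def q2_loo(X, y):
    """Leave-one-out Q2: refit without each object, predict it, sum up PRESS."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_ols(X[keep], y[keep])
        pred = coef[0] + X[i] @ coef[1:]
        press += (y[i] - pred) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def y_scrambling(X, y, n_rounds=50, seed=0):
    """Q2 values obtained after randomly reordering the responses."""
    rng = np.random.default_rng(seed)
    return np.array([q2_loo(X, rng.permutation(y)) for _ in range(n_rounds)])


# Synthetic data standing in for 30 compounds and 2 descriptors (not real data).
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 2))
y = 1.0 + X @ np.array([0.7, -1.1]) + rng.normal(scale=0.3, size=30)

r2_val = r2(X, y)
q2_val = q2_loo(X, y)
scrambled = y_scrambling(X, y)
print(f"R2 = {r2_val:.3f}, Q2(LOO) = {q2_val:.3f}")
print(f"mean Q2 after Y scrambling = {scrambled.mean():.3f}")
```

A model that is not a chance correlation should show a Q2 close to its R2 and clearly above the Q2 values obtained with scrambled responses. A leave-many-out variant would follow the same pattern, leaving out a group of objects at each step instead of a single one.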
RESULTS AND DISCUSSION
The most relevant molecular descriptors, calculated by the DRAGON software, were selected by Genetic Algorithm (GA – Variable Subset Selection). For each end-point the best model was validated with several validation techniques:
• Leave-one-out, using the QUIK rule (Q Under Influence of K (18)) to avoid chance correlation.
• Y scrambling (permutation testing by recalculating models for randomly reordered responses).
• Strongest validation using the leave-many-out procedure (15-30%).
The models were not all validated externally owing to the small sets studied (14-30 obj.). The reliability of the predictions was always checked by the leverage approach in order to verify the chemical domain of the models. The regression lines of the fish and Daphnia models are reported (outliers and influential chemicals are highlighted). Table 1 shows the performance of the best models for each end-point.
Log (1/LC50) in Fish
Log(1/LC50) = - 2.4 + 0.7 DP02 + 1.1 n=CH2
[Figure: Experimental Log(1/LC50) vs. Predicted Log(1/LC50)]
Log (1/EC50) in Daphnia magna
Log(1/EC50) = 14.4 - 0.03 TI1 - 1.4 Jhetv - 7.1 GATS1v
[Figure: Experimental Log(1/EC50) vs. Predicted Log(1/EC50)]
Compounds labelled in the plots: butyl benzyl phthalate, glycerol trienanthate, methyl acrylate, diethyl phthalate.
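As an illustration of how the fish model and the leverage approach fit together, the sketch below applies the reported fish equation to a new compound and checks whether that compound falls inside the chemical domain of a training set. The training descriptor values are invented for the example, the helper names are hypothetical, and the warning threshold h* = 3p'/n (p' = number of model parameters, n = number of training objects) is a commonly used Williams-plot convention assumed here, not a value stated on the poster.

```python
import numpy as np


def predict_fish(dp02, n_ch2):
    """Log(1/LC50) in fish from the reported model: -2.4 + 0.7 DP02 + 1.1 n=CH2."""
    return -2.4 + 0.7 * dp02 + 1.1 * n_ch2


def leverages(X_train, X_query):
    """Hat values h = x (A'A)^-1 x' of the query rows w.r.t. the training design matrix."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    core = np.linalg.inv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, core, Q)


# Hypothetical training descriptors (DP02, n=CH2) for 30 compounds, not real data.
X_train = np.column_stack([np.linspace(2.0, 6.0, 30), np.tile([0.0, 1.0], 15)])

# Assumed warning leverage: 3 * (number of descriptors + intercept) / n.
h_star = 3 * (X_train.shape[1] + 1) / len(X_train)

# Two hypothetical query compounds: one similar to the training set, one far from it.
X_new = np.array([[4.0, 1.0], [15.0, 0.0]])
h_new = leverages(X_train, X_new)

for (dp02, n_ch2), h in zip(X_new, h_new):
    status = "inside" if h <= h_star else "outside"
    print(f"DP02={dp02}, n=CH2={n_ch2}: predicted Log(1/LC50) = "
          f"{predict_fish(dp02, n_ch2):.2f}, leverage = {h:.2f} ({status} the domain)")
```

A prediction for a compound whose leverage exceeds h* is an extrapolation of the model and should be treated as unreliable, which is the purpose of the domain check mentioned above.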
Tab.1 – Model Performances (R2 and Q2 values in %)
End-point  Species      N.obj.  Variables            R2    Q2    Q2LMO15%  Q2LMO30%
LC50       Fish         30      DP02, n=CH2          82.5  79.2  79.7      78.8
EC50       Daphnia      30      TI1, Jhetv, GATS1v   85.1  80.8  80.2      78.2
EC50       Seaweed      12      DIPp, H8u            96    93.5  92.9      88.1
EC50       Pseudomonas  13      GATS5e, R2v+         92.5  86.5  85.9      83.7
IGC        Entosiphon   18      Me, Xindex           91.5  87.7  88.1      87.9
IGC        Scenedesmus  17      AAC, Jhetm           89.6  81.9  82.4      81.1
IGC        Pseudomonas  15      GATS1e, R5u+         83.4  74.3  73.7      71.6
LOEC       Daphnia      13      BELm4                94    90.1  91.2      90.1
NOEL       Daphnia      14      BELm4                91.4  86.9  87.3      85.5

For comparison purposes the RMS (Residual Mean Squares) values are reported only for LC50 in fish and EC50 in seaweed, as the other end-points are not included in the ECOSAR software. The ECOSAR model for LC50 in fish and our new model show similar performance, but the EPA model for EC50 in seaweed has the biggest RMS (Tab.2). This result appears particularly satisfactory considering that the EPIWIN model was obtained on a training set bigger than our data set.

Tab.2 – Comparison of models
End-point  Species  Obj. training  Variables    RMS from our model  RMS from ECOSAR
LC50       Fish     30             DP02, n=CH2  0.31                0.38
EC50       Seaweed  12             DIPp, H8u    0.13                3.47

Principal Component Analysis
The aquatic toxicity values of 57 esters, with experimental and predicted LC50 in fish, EC50 in Daphnia and seaweed and IGC in Entosiphon sulcatum, were studied in the Principal Component space. The first component was found to be the most important, with 61.8% of explained variance, and can be considered a general index of aquatic toxicity. In order to have a fast method to rank the esters according to their aquatic toxicity, PC1 was modelled by theoretical molecular descriptors. The best model, selected by Genetic Algorithm, was verified for stability and predictivity by internal and external validation:
AQUATIC TOXICITY = - 5.76 - 0.39 TI2 + 4.99 GATS1v - 2.74 DISPp
n.obj = 43, R2 = 91.5%, Q2 = 89.9%, Q2LMO30% = 89.9%, Q2EXT = 95.6%
[Figure: PCA plot, PC1 vs. PC2, with the end-point labels Fish, Daphnia, Seaweed and Entosiphon and an "Aquatic Toxicity" label; Cum. E.V.% = 82% (PC1 = 61.8%)]
[Figure: AQUATIC TOXICITY calculated vs. AQUATIC TOXICITY from PCA; training (43 obj.) and test (14 obj.) sets]

CONCLUSIONS
• New predictive “local” models for ecotoxicity end-points of esters are proposed.
• These models are based only on theoretical molecular descriptors selected by Genetic Algorithm.
• All models have good predictive power, verified by internal validation techniques.
• Principal Component Analysis has been used to propose an ester ranking for global aquatic toxicity based on 4 acute toxicity end-points (LC50 in fish, EC50 in Daphnia magna and in seaweed, IGC in Entosiphon sulcatum).
• The PC1 score highlights the global trend of aquatic toxicity and is modelled by theoretical molecular descriptors. This model can be used for the screening and ranking of esters according to their global toxicity, starting only from their structure.
The application of these models reduces animal testing and minimises the time and money needed for experimental data.
REFERENCES
[1] Gramatica P., Fine Chemicals and Intermediates technologies (Chemistry Today), 1991, 18-24;
[2] http://europa.eu.int/comm/environmental/chemicals/whitepaper.htm;
[3] Thomsen M. et al., Chemosphere, 1999, 38, 2613-2624;
[4] Cash G.G. and Clements R.G., SAR and QSAR in Environmental Research, 1996, 5, 113-124;
[5] European Commission – Joint Research Centre, IUCLID CD-ROM, 2000;
[6] Verschueren K., Handbook of Environmental Data on Organic Chemicals, 1983, 2nd Edition, Van Nostrand Reinhold;
[7] Rhodes J.E. et al., Environmental Toxicology and Chemistry, 1995, 14, 1967-1976;
[8] Todeschini R., Consonni V. and Pavan E., 2002. DRAGON – Software for the calculation of molecular descriptors, rel. 1.12 for Windows. Free download available at http://www.disat.unimib/chm.;
[9] Todeschini R., 2001. Moby Digs – Software for multilinear regression analysis and variable subset selection by Genetic Algorithm, rel. 2.3 for Windows, Talete srl, Milan (Italy);
[10] Leardi R., Boggia R. and Terrile M., J. Chemom., 1992, 6, 267-281;
[11] Wold S. and Eriksson L., Chemometric Methods in Molecular Design, 1995, VCH, Germany, 309-318;
[12] Golbraikh A. and Tropsha A., J. Mol. Graph. and Mod., 2002, 20, 269-276;
[13] Todeschini R. and Mauri A., 2000. DOLPHIN – Software for Optimal Distance-based Experimental Design, rel. 1.1 for Windows, Talete srl, Milan (Italy);
[14] SCAN – Software for Chemometric Analysis, rel. 1.1 for Windows, Jerll. Inc., Standard, CA, 1992;
[15] ECOSAR in EPIWIN-EPI Suite 2001, Ver. 3.10, Environmental Protection Agency (http://www.epa.gov).