pKaData Poster EuroCUP 2011


IUPAC pKa Compilations Converted to Substructure Searchable Databases

Tony Slater § and Joe Corkery ǂ

Abstract

OpenEye’s partner, pKaData Limited, has converted all four aqueous pKa compilations of organic acids and bases sponsored by the International Union of Pure and Applied Chemistry (IUPAC) from book form into fully curated, computer-readable data, searchable by substructure.

The 13697 molecules, with 30415 experimental pKa values, can be searched very flexibly because each piece of information has been carefully assigned to a defined data field. Simple examples of such searches are provided below.

Introduction

pKaData Limited has kindly been granted exclusive permission by IUPAC to convert their extensive compilations of experimental pKa values of organic acids and bases from book form into fully curated, computer-readable data, searchable by substructure.

The project was completed in mid-2011, providing researchers with access to 30415 experimental pKa values in aqueous solution.

Data Source

The four books of pKa compilations sponsored by IUPAC are:

1. Dissociation Constants of Organic Bases in Aqueous Solution, by D. D. Perrin
2. Dissociation Constants of Organic Bases in Aqueous Solution, Supplement 1972, by D. D. Perrin
3. Dissociation Constants of Organic Acids in Aqueous Solution, by G. Kortum, W. Vogel, and K. Andrussow
4. Ionisation Constants of Organic Acids in Aqueous Solution, by E. P. Serjeant and Boyd Dempsey

Conversion

Figure 1 below illustrates how the data were extracted from the books and assigned to defined fields. For each molecule, a SMILES description was derived from the molecule name and/or molecular diagram. pKb values were converted into pKa values using the method of Bandura and Lvov (reference 1).

Figure 1 annotations:

• assign data and text to relevant fields
• supply details of method
• supply full reference
• convert pKb into pKa
• convert names to SMILES
• translate quality assessment into confidence limits
• assign non-numeric text (e.g. <) to a separate field, and also assign the ion group
• assign temperature to the correct field and put other text (e.g. ~ or >) in a separate field

Figure 1: Example of data extraction from book into database.
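The pKb-to-pKa step rests on the relation pKa + pKb = pKw for a conjugate acid-base pair in water. The short Python sketch below illustrates that arithmetic only. The function name and the default pKw of 14.00 (an approximate value for 25 °C) are assumptions made for illustration; the database conversion itself used the method of Bandura and Lvov, which is not reproduced here.

```python
def pkb_to_pka(pkb: float, pkw: float = 14.00) -> float:
    """Convert a base's pKb to the pKa of its conjugate acid.

    Uses pKa = pKw - pKb. The default pKw is an approximate value for
    water at 25 °C (an assumption); pass a temperature-appropriate pKw
    for measurements made at other temperatures.
    """
    return pkw - pkb


if __name__ == "__main__":
    # Hypothetical input: a base reported with pKb = 9.40 at 25 °C.
    print(round(pkb_to_pka(9.40), 2))
```

Because pKw varies with temperature, passing it as a parameter keeps the conversion tied to the temperature field of each record rather than to a single fixed constant.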


Features

• Molecule names and structures converted to SMILES
• IUPAC critical data quality assessment
• Data assigned to separate fields (e.g. ionic strength, concentration and temperature)
• Associated alphabetic data placed in a separate field from the numerical data (e.g. <5.3 assigned to two data fields) for enhanced search capabilities
• Full reference and method description for each record
• Ionisation assignment for logD calculations
• Very flexible searching due to careful field assignment:
    • Substructure
    • Search for basic pKa with 6.5 < pKa < 7.5
    • Use only the highest quality data
    • Search for where 35 °C ≤ temp ≤ 40 °C
    • Any combination of the above and much more
• Database can be merged with existing in-house data, with the IUPAC-sourced data clearly identified
• Tautomers were enumerated using OpenEye’s QuacPac program, with the ability to display just a single representative tautomer
• Total of 75232 records and 79 columns
• Good range of organic chemistry with applicability to pharmaceutical, agrochemical and specialty chemicals research, as well as pKa prediction research
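To make the field-based searches listed above concrete, here is a minimal Python sketch of how separating a qualifier such as "<" from its number, and keeping temperature, ion type and quality in their own fields, allows simple combined filters. Everything in it is hypothetical: the record layout, the field names, the `split_value` and `search` helpers and the sample records are invented for illustration and do not reflect the actual database schema. Substructure matching needs a cheminformatics toolkit and is left out.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PkaRecord:
    """One measurement; all field names here are hypothetical."""
    name: str
    pka: float
    qualifier: str            # non-numeric text such as "<", ">" or "~" ("" if none)
    temp_c: Optional[float]   # measurement temperature in °C, if reported
    ion_type: str             # "acid" or "base"
    top_quality: bool         # True for the highest data quality grade


_VALUE = re.compile(r"^\s*([<>~]?)\s*(-?\d+(?:\.\d+)?)\s*$")


def split_value(text: str) -> Tuple[str, float]:
    """Split a printed value such as "<5.3" into (qualifier, number)."""
    match = _VALUE.match(text)
    if match is None:
        raise ValueError(f"unrecognised pKa text: {text!r}")
    return match.group(1), float(match.group(2))


def search(
    records: List[PkaRecord],
    ion_type: Optional[str] = None,
    pka_range: Optional[Tuple[float, float]] = None,    # exclusive bounds
    temp_range: Optional[Tuple[float, float]] = None,   # inclusive bounds
    top_quality_only: bool = False,
) -> List[PkaRecord]:
    """Return the records that satisfy every criterion that was supplied."""
    hits = []
    for rec in records:
        if ion_type is not None and rec.ion_type != ion_type:
            continue
        if pka_range is not None and not (pka_range[0] < rec.pka < pka_range[1]):
            continue
        if temp_range is not None:
            if rec.temp_c is None:
                continue
            if not (temp_range[0] <= rec.temp_c <= temp_range[1]):
                continue
        if top_quality_only and not rec.top_quality:
            continue
        hits.append(rec)
    return hits


if __name__ == "__main__":
    qualifier, value = split_value("<5.3")
    # Made-up records for illustration only, not taken from the database.
    records = [
        PkaRecord("base A", 7.1, "", 37.0, "base", True),
        PkaRecord("base B", value, qualifier, 25.0, "base", False),
        PkaRecord("acid C", 7.0, "", 37.0, "acid", True),
    ]
    hits = search(records, ion_type="base", pka_range=(6.5, 7.5),
                  temp_range=(35.0, 40.0), top_quality_only=True)
    print([rec.name for rec in hits])
```

Storing the qualifier separately means the numeric field can be compared directly, while a search can still exclude or include "less than" values as needed.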

Searches

Search 1

We start with a simple search. Suppose we are interested in the effect of a para halogen substituent on the pKa of aniline, as in Figure 2. We want only pKa values measured at 25 °C, and only the most reliable data as defined by the IUPAC data quality assessment. The results are shown in Table 1.

Figure 2: Search for pKa of aniline with para-substituted halogen (Hal) measured at 25 °C.

Hal    pKa range (# of obs)    most reliable value
F      4.53 - 4.65 (5)         4.64
Cl     3.82 - 4.15 (9)         3.982
Br     3.8 - 3.95 (6)          3.888
I      3.75 - 3.84 (6)         3.812

Table 1: Results for search for pKa of aniline with para-substituted halogen measured at 25 °C.

Search 2

In this search, we have an enzyme active site that we think will accept an ionised phenol, but can we bring the phenolic pKa down enough to be in the range 6.5 < pKa < 7.5 using a para substituent as in Figure 3? Will nitro suffice and are there any “off the wall” substituents that will do the job? The results are shown in Table 2.

Figure 3: Search for para-substituted phenols (substituent X) in the pKa range 6.5 < pKa < 7.5

Substituent X                     pKa
(structure diagram not shown)     7.152
(structure diagram not shown)     7.42
(structure diagram not shown)     7.3

Table 2: Results for search for para-substituted phenols in the pKa range 6.5 < pKa < 7.5
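The pKa window in Search 2 matters because it determines how much of the phenol is ionised near neutral pH. As a rough illustration, the Python sketch below applies the Henderson-Hasselbalch relation for a monoprotic acid to the three pKa values in Table 2. The pH of 7.4 and the function name are assumptions; the poster does not state the pH of interest.

```python
def fraction_ionised_acid(pka: float, ph: float) -> float:
    """Fraction of a monoprotic acid present as its anion at the given pH.

    From Henderson-Hasselbalch: [A-]/[HA] = 10**(pH - pKa), so the
    ionised fraction is 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    ph = 7.4  # assumed pH, not stated in the poster
    for pka in (7.152, 7.42, 7.3):  # pKa values from Table 2
        fraction = fraction_ionised_acid(pka, ph)
        print(f"pKa {pka}: {fraction:.0%} ionised at pH {ph}")
```

At a pH equal to the pKa the fraction is exactly one half, so under this assumption a pKa within the 6.5 to 7.5 window keeps a substantial share of the phenol in its ionised form.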

Products

pKa Databases created by conversion of the following IUPAC books:

Base 1 (3775 molecules, 8766 pKas)

Dissociation Constants of Organic Bases in Aqueous Solution, by D. D. Perrin

Acid 1 (1063 molecules, 2893 pKas)

Dissociation Constants of Organic Acids in Aqueous Solution, by G. Kortum, W. Vogel and K. Andrussow

Base 2 (4275 molecules, 7844 pKas)

Dissociation Constants of Organic Bases in Aqueous Solution, Supplement 1972, by D. D. Perrin

Acid 2 (4584 molecules, 10912 pKas)

Ionisation Constants of Organic Acids in Aqueous Solution, by E. P. Serjeant and Boyd Dempsey

Complete Database (13697 molecules, 30415 pKas)

Conclusions

In association with OpenEye, pKaData Limited has converted all four IUPAC compilations of aqueous pKa data from book form into computer-readable, substructure-searchable form. The databases are fully curated, and the complete database gives researchers access to 30415 experimental pKa determinations. The IUPAC critical data quality assessment provides confidence limits for the measurements. Ionisation assignments have been provided for logD calculations.

References

1) Bandura, A. V., and S. N. Lvov, "The Ionization Constant of Water over a Wide Range of Temperatures and Densities." J. Phys. Chem. Ref. Data, Vol. 35, 2006, pp. 15 – 30.

§ pKaData Limited, 116 Wood Road, RD9 Maungatapere, Whangarei 0179, New Zealand
+6494346197
[email protected]
www.pKaData.com

ǂ OpenEye, 9 Bisbee Ct, Suite D, Santa Fe, NM 87508
505.473.7385
[email protected]
www.eyesopen.com