Standardizer: Molecular Cosmetics for Chemoinformatics. György Pirok, Nóra Máté, István Cseh, Szilárd Dóránt, Péter Kovács, Szabolcs Csepregi, Ferenc Csizmadia.


Standardizer
Molecular Cosmetics for Chemoinformatics
György Pirok
Nóra Máté
István Cseh
Szilárd Dóránt
Péter Kovács
Szabolcs Csepregi
Ferenc Csizmadia
Why standardize structures?

Canonicalisation
Uniformization of structures without changing the chemical content, to recognize duplicates and functional groups (aromatization, mesomers, tautomers, ...)

Beautification
Making the structures visually more attractive (dearomatization, cleaning coordinates, wedge orientation, ...)

Modification
Conversion of structures by modifying their original content as a preparation step for further chemoinformatics tasks (transformations, removing stereo, removing R-groups, ...)

It is often difficult to categorize the standardization actions.
Canonicalisation

Hydrogens
making hydrogens explicit
making hydrogens implicit

Tautomers
converting to canonical tautomer form
transforming to user defined tautomer form

Resonant structures
aromatizing Kekulé rings
converting to canonical mesomer form
transforming to user defined mesomer form

Other
removing small fragments
removing user defined fragments
expanding stoichiometry
setting the chiral flag

Mesomers
Tautomers
oxo-enol, enamine-imine
Tautomers
pyridone-pyridol
Fragment removal
Specific counterion removal
Solvent removal
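
The fragment removal idea on these slides can be illustrated with a minimal, self-contained sketch. It does not use the Standardizer API: it assumes the structure is given as a SMILES string in which fragments are separated by dots, and it keeps the fragment with the largest estimated heavy-atom count. The class and method names (FragmentRemovalSketch, estimateHeavyAtoms, keepLargestFragment) are hypothetical, and the atom count is a deliberately crude estimate, not a SMILES parser.

```java
class FragmentRemovalSketch {

    // Very rough heavy-atom estimate for one SMILES fragment (an assumption
    // of this sketch, not a real SMILES parser): every uppercase letter
    // except H starts an element symbol, and the lowercase letters b, c, n,
    // o, p, s are aromatic atoms. Two-letter symbols such as Cl or Na
    // therefore count once; a few (for example Co or Sn) are overcounted.
    static int estimateHeavyAtoms(String fragment) {
        int count = 0;
        for (char ch : fragment.toCharArray()) {
            boolean element = Character.isUpperCase(ch) && ch != 'H';
            boolean aromatic = "bcnops".indexOf(ch) >= 0;
            if (element || aromatic) {
                count++;
            }
        }
        return count;
    }

    // Splits a dot-separated SMILES into fragments and returns the one with
    // the largest estimated size. On a tie the first such fragment is kept.
    static String keepLargestFragment(String smiles) {
        String best = "";
        int bestSize = -1;
        for (String fragment : smiles.split("\\.")) {
            int size = estimateHeavyAtoms(fragment);
            if (size > bestSize) {
                best = fragment;
                bestSize = size;
            }
        }
        return best;
    }
}
```

For example, keepLargestFragment("CC(=O)[O-].[Na+]") keeps the acetate fragment and drops the sodium counterion. A size-based rule like this corresponds to removing small fragments; removing specific counterions or solvents would instead need a list of fragments to match.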
Stoichiometry expansion
expanding salt stoichiometry
Stoichiometry expansion
expanding reaction stoichiometry
Beautification

Hydrogens
making hydrogens implicit

Resonant structures
converting aromatic rings to Kekulé format

Groups
contracting/expanding/ungrouping abbreviated and multiple groups

Cleaning
calculating 2D coordinates
reallocating wedge bonds
template based cleaning
3D geometry optimization
Template-based Cleaning
2D-coordinate calculation of macrocycles or bridged systems
Template-based Cleaning
aligning search results to the query
Canonicalization During Database Import
[Diagram labels: client, server; input structures; Standardizer with canonicalization configuration; JChem Base / Cartridge; canonicalized structures and original structures; relational database.]
Sending Query to the Database
[Diagram labels: client, server; query structure; Standardizer with canonicalization configuration; JChem Base / Cartridge; canonicalized query; relational database.]
The query is compared to the canonicalized structures.
Displaying Result Structures
[Diagram labels: client, server; beautified structures; Standardizer with beautification configuration; JChem Base / Cartridge; original structures; relational database.]
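
The import and query slides above describe one idea: the same canonicalization is applied to stored structures and to the query, so that different drawings of the same compound meet under one key, while the original structures remain available for display. A minimal in-memory sketch of that idea follows. It is not JChem Base or the Cartridge: the class CanonicalIndexSketch and its methods are hypothetical, a plain map stands in for the relational database, and firstFragment is only a toy stand-in for a real canonicalization configuration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class CanonicalIndexSketch {

    // The canonicalizer plays the role of Standardizer with a
    // canonicalization configuration (an assumption of this sketch).
    private final UnaryOperator<String> canonicalizer;

    // Canonical form -> original structures that map to it.
    private final Map<String, List<String>> originalsByCanonical = new HashMap<>();

    CanonicalIndexSketch(UnaryOperator<String> canonicalizer) {
        this.canonicalizer = canonicalizer;
    }

    // Import: store the original under its canonicalized form.
    void importStructure(String original) {
        String key = canonicalizer.apply(original);
        originalsByCanonical.computeIfAbsent(key, k -> new ArrayList<>()).add(original);
    }

    // Query: canonicalize the query the same way, then compare it to the
    // canonicalized stored structures. The originals are returned.
    List<String> findDuplicates(String query) {
        List<String> hits = originalsByCanonical.get(canonicalizer.apply(query));
        return hits == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(hits);
    }

    // Toy canonicalizer: keep only the text before the first dot of a
    // dot-separated SMILES string. Real canonicalization does far more.
    static String firstFragment(String smiles) {
        int dot = smiles.indexOf('.');
        return dot < 0 ? smiles : smiles.substring(0, dot);
    }
}
```

With new CanonicalIndexSketch(CanonicalIndexSketch::firstFragment), importing "CC(=O)[O-].[Na+]" and "CC(=O)[O-].[K+]" and then querying "CC(=O)[O-]" returns both original salts. Keeping the originals next to the canonical key mirrors the slides, where beautification for display starts from the original structures rather than from the canonicalized ones.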
Modification
+ custom transformations
API and command line interface
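Java API (mol is a chemaxon.struc.Molecule):
Standardizer st = new Standardizer(new File("standardize.xml")); // load the configuration
st.standardize(mol); // apply the configured actions to mol
Command line:
standardize input.sdf -c config.xml -o output.smiles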
Live Demonstration
Applications: Virtual Synthesis
Applications: Structure Databases
How can ChemAxon help?

Free for non-commercial websites

Free for academic teaching and research: “Academic Package”

Free Academic Package to be extended to cover academic networks (campus-wide roll-out)
Acknowledgments

Ferenc Csizmadia
Nóra Máté
István Cseh
Szabó Attila
Szilárd Dóránt
Péter Kovács
Szabolcs Csepregi