ChemAxon Presentation

Scientific & technical presentation
JChem Cartridge for Oracle
version 5.3, January 2010
Contents
• Purpose of JChem Cartridge
• Features of JChem Cartridge
• Constituents of the JChem Cartridge API
• Normal Tables vs. JChem Tables
• Architecture of JChem Cartridge
Purpose of JChem Cartridge
• Access JChem functionality using SQL:
SELECT count(*) FROM nci WHERE jc_contains(structure, 'Brc1cnc2ccccc12') = 1
Access JChem from any programming environment that offers Oracle connectivity (.NET, Java, Perl, PHP, Python, Apache mod_plsql, ...)
• Execute SQL queries efficiently using extensible indexes
Precompute chemical information on structures by creating jc_idxtype indexes:
CREATE INDEX jcxnci ON nci(structure) INDEXTYPE IS jc_idxtype
The jc_idxtype implementation scans the indexed column for eligible structures in a single performance-optimized operation, the domain index scan.
Features of JChem Cartridge
• Adds chemistry knowledge into the SQL language of Oracle
(SELECT, INSERT, UPDATE, ...)
• Substructure, superstructure, full structure, similarity searching
• Complex chemical expressions using the Chemical Terms
language that includes logP, pKa, ...
• Automatic property calculation during registration
• Standardization (canonicalization) during registration
• Structure format conversions (MRV, Molfile, SDfile, RDfile,
SMILES, CML, etc.); 2D, 3D image generation
• Structure enumeration using reaction rules
• User-defined fingerprint columns
• Custom similarity search through molecular descriptors
• Interaction with Oracle optimizer
Structure search features
• Wide range of query atoms
• Query properties
• R-group queries
• Full SMARTS support
• Coordination compounds
• Link nodes
• Pseudo atoms, lone pairs
• Relative stereo
• Reaction search features
• Hit coloring, position variation
• Polymers
See detailed information on structure search:
www.chemaxon.com/conf/Structural_Search.ppt
Search options
• Stereo on/off
• Ignore charge/isotope/radical/valence/mixture brackets
• Vague bond matching options
• Chemical Terms filter
• Tautomer search
• Inverse hit list
• Maximum search time / number of hits
• Combine with non-structure conditions
• Ordering of results
• etc.
Searching in Markush structures
Combinatorial Markush structure registration and search
• Markush features handled in search &
enumeration:
• R-groups (nesting to any depth)
• Atom lists, bond lists
• Position variation bond
• Link nodes and repeating units
• Homology groups
• Compatible Markush enumeration plugin
Detailed description:
http://www.chemaxon.com/jchem/doc/user/Query.html#combinatorialMarkush
Standardization
• Default standardization includes:
– Hydrogen removal
– Aromatization
• Custom standardization can be specified for each table or JChem index
Detailed description: http://www.chemaxon.com/conf/Standardizer.ppt
Custom Standardization Example
[Figure: example structure before and after custom standardization]
Compatibility and integration
File formats:
• SMILES
• MDL molfile (v2000 and v3000)
• MDL SDF
• RXN
• RDF
• MRV
• IUPAC name, InChI
• Markush DARC
• CDX
Operating systems:
• Windows
• Linux
• Solaris
• HP-UX
• etc.
DB engines:
• Oracle versions 9i R2 or above
• For alternative RDBMS systems, see the JChem Base presentation: http://www.chemaxon.com/JChem_Base.ppt
Elements of the JChem Cartridge API
• Operators (jc_...) for SQL and their
functional forms (jcf package) for PL/SQL
• Parameters for index creation
• DML operators for JChem tables
• Support functions for user defined operators
Operators and functions I.
Typical operator:
jc_<some-operation>(<target-structure-column>, <some-operand>)
Operator for substructure search:
jc_contains(<target-structure-column>, <query-structure>)
“Swiss-army-knife” search operator:
jc_compare(<target-structure-column>, <query-structure>, <options>)
Operators and functions II.
• Chemical Terms
– Over 100 built-in functions, including
- elemental analysis
- topological descriptors
- property predictions (logP/D, pKa, PSA, H bond donors/acceptors, charge, etc.)
- tautomers, protonation forms
– User-defined functions
– Example: Lipinski's rule of five expressed in Chemical Terms
SELECT count(*) FROM nci_3m WHERE jc_compare(structure,
'O=C1ONC(N1c2ccccc2)-c3ccccc3',
'sep=! t:s!ctFilter:(mass() <= 500) && (logP() <= 5) && (donorCount() <= 5) && (acceptorCount() <= 10)') = 1
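The options argument in the example above packs several settings into one string: `sep=!` declares the separator, and the individual `name:value` options follow, joined by that separator. A minimal Python sketch of assembling such a string on the client side is given below. The helper name `build_compare_options` is hypothetical and not part of JChem, and the string layout is inferred only from the examples in this presentation.

```python
def build_compare_options(options, sep="!"):
    """Join (name, value) pairs into a jc_compare-style option string.

    Hypothetical helper: the layout 'sep=<sep> name:value<sep>name:value'
    is inferred from the slides, not taken from the JChem documentation.
    """
    for name, value in options:
        # The separator must not occur inside an option, or the string
        # could not be split unambiguously.
        if sep in name or sep in value:
            raise ValueError(f"separator {sep!r} occurs in option {name!r}")
    body = sep.join(f"{name}:{value}" for name, value in options)
    return f"sep={sep} {body}"


# Lipinski's rule of five as a Chemical Terms filter expression.
lipinski = " && ".join([
    "(mass() <= 500)",
    "(logP() <= 5)",
    "(donorCount() <= 5)",
    "(acceptorCount() <= 10)",
])

options = build_compare_options([("t", "s"), ("ctFilter", lipinski)])
print(options)
```

The resulting string can then be passed as the third argument of jc_compare, ideally through a bind variable rather than by string concatenation into the SQL text.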
Operators and functions III.
• jc_compare: substructure/similarity/exact searching combined with Chemical Terms expressions
• jc_matchcount: number of occurrences of the query structure in the target
• jc_evaluate: Chemical Terms evaluation
• jc_molweight: molecular weight
• jc_formula: molecular formula
• jc_react: structure enumeration based on virtual reactions
• jc_standardize: structure canonicalization
• jc_molconvert: conversion to different formats (image generation is supported)
• jc_tanimoto: similarity search
• jcf.hitColorAndAlign: substructure coloring and alignment
Operators and functions IV.
Similarity search example displaying ID, SMILES code, and
molweight:
SELECT cd_id, cd_smiles, cd_molweight FROM my_structures
WHERE jc_tanimoto(cd_smiles, 'CC(=O)Oc1ccccc1C(O)=O') >= 0.8;
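The threshold 0.8 in the query above is a Tanimoto coefficient. For binary fingerprints it is the number of bits set in both fingerprints divided by the number of bits set in either. The Python sketch below shows that arithmetic on fingerprints represented as sets of on-bit positions. The bit positions are made up for illustration, and how JChem actually derives fingerprints from structures is not shown here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Assumption made for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Made-up fingerprints; real ones come from the structures themselves.
query_fp = {1, 4, 7, 9, 12}
candidates = {
    "close": {1, 4, 7, 9, 12, 15},   # 5 shared bits, 6 bits in the union
    "distant": {1, 4, 20, 21},       # 2 shared bits, 7 bits in the union
}

# Keep only candidates at or above the similarity threshold.
hits = [name for name, fp in candidates.items() if tanimoto(query_fp, fp) >= 0.8]
print(hits)
```

With these values only the "close" fingerprint (5/6) passes the 0.8 threshold, while "distant" (2/7) is rejected.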
Chemical Terms and Query Prefiltering:
SELECT id, purchase_date FROM compounds_instock WHERE
jc_compare(structure, 'C(=S)([N][N])[S]',
'sep=! t:t!simThreshold:0.9!ctFilter:logp()>1!filterQuery:SELECT rowid FROM compounds_instock WHERE purchase_date > DATE ''2002-01-01''') = 1
Prefiltering restricts the structure search to a subset of rows, so the search executes more efficiently.
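The reasoning behind the filterQuery option above is that a cheap relational condition can shrink the candidate set before the costly structure comparison runs. The toy Python sketch below illustrates that idea only. The in-memory rows, the substring-based matcher, and the call counter are all hypothetical stand-ins and say nothing about how the cartridge implements prefiltering.

```python
from datetime import date

# Hypothetical in-memory stand-in for the compounds_instock table.
ROWS = [
    {"rowid": 1, "structure": "CCO", "purchase_date": date(2001, 5, 1)},
    {"rowid": 2, "structure": "CCN", "purchase_date": date(2003, 2, 1)},
    {"rowid": 3, "structure": "CCOC", "purchase_date": date(2004, 7, 9)},
    {"rowid": 4, "structure": "c1ccccc1", "purchase_date": date(1999, 1, 1)},
]


def make_matcher():
    """Return a toy matcher plus a counter of how often it was called.

    The substring test only stands in for a costly structure comparison;
    it is not a chemical search.
    """
    counter = {"calls": 0}

    def match(structure, query):
        counter["calls"] += 1
        return query in structure

    return match, counter


def search(rows, query, match, prefilter=None):
    """Run the costly match only on rows that pass the cheap prefilter."""
    candidates = rows if prefilter is None else [r for r in rows if prefilter(r)]
    return [r["rowid"] for r in candidates if match(r["structure"], query)]


def is_recent(row):
    """Cheap relational condition, mirroring purchase_date > 2002-01-01."""
    return row["purchase_date"] > date(2002, 1, 1)


match, counter = make_matcher()
hits = search(ROWS, "CCO", match, prefilter=is_recent)
print(hits, counter["calls"])
```

In this toy data set the costly comparison runs on two of the four rows, because the other two are excluded by the date condition first.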
Dynamic generation of static images:
SELECT jc_molconvertb(structure, 'png -2') FROM nci WHERE id = :1
Available image formats: png, jpeg, svg, ...
[Figure: example of a generated PNG image]
Operators and functions V.
Calculate logp:
SELECT jc_evaluate('OC(=O)c1c2ccccc2nc3ccccc13', 'logp')
FROM dual;
Generate tautomers:
SELECT jc_evaluate_x('NC1=C(CC=O)C=CCC1',
'chemTerms:tautomers() outFormat:smiles') FROM dual;
Generate resonants:
SELECT jc_evaluate_x('NC1=C(CC=O)C=CCC1',
'chemTerms:resonants() outFormat:smiles') FROM dual
Index parameters
Index parameters affect:
• Fingerprint attributes
• Standardizer configuration
• Table space and storage options of the index table
Examples:
• Standardization by stripping hydrogens and using basic aromatization:
CREATE INDEX jcxnci ON nci(structure) INDEXTYPE IS jc_idxtype
PARAMETERS('STD_CONFIG=removeexplicitH..aromatize:b')
• Add structural keys to the fingerprint for more efficient substructure searching (structural keys are defined in table stfp_keys):
CREATE INDEX jcxnci ON nci(structure) INDEXTYPE IS jc_idxtype
PARAMETERS('STRUCTURALFP_CONFIG=select structure from stfp_keys')
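Structural keys can speed up substructure searching because of the usual fingerprint screening argument: if the query has a feature bit set that a target lacks, the target cannot contain the query, so the expensive atom-by-atom match can be skipped for it. The Python sketch below shows that screening step with integer bit masks. The key names and bit assignments are invented for illustration, and the sketch describes the general technique rather than JChem's internal implementation.

```python
# Hypothetical structural keys, one bit each.
KEYS = {
    "aromatic_ring": 1 << 0,
    "carbonyl": 1 << 1,
    "halogen": 1 << 2,
    "amine": 1 << 3,
}


def fingerprint(features):
    """Combine the bits of all features present into one integer mask."""
    fp = 0
    for feature in features:
        fp |= KEYS[feature]
    return fp


def may_contain(target_fp, query_fp):
    """Screening test: every bit set in the query must also be set in the target."""
    return (target_fp & query_fp) == query_fp


targets = {
    "t1": fingerprint(["aromatic_ring", "halogen"]),
    "t2": fingerprint(["carbonyl", "amine"]),
    "t3": fingerprint(["aromatic_ring", "carbonyl", "halogen"]),
}
query = fingerprint(["aromatic_ring", "halogen"])

# Targets that survive the screen still need a full structure match.
candidates = sorted(name for name, fp in targets.items() if may_contain(fp, query))
print(candidates)
```

Passing the screen is necessary but not sufficient: the surviving candidates (here t1 and t3) would still go through the full substructure match, while t2 is rejected without it.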
Calls Not Using Indexes
Using SQL statements for calling JChem
operators on structures not stored in a table
Sample SQL statement without index information:
SELECT jc_contains('O=C1C=CNC=C1', 'n1ccccc1') FROM dual
Setting default properties for calls not using indexes:
CALL jc_set_default_property('standardizerConfig',
'aromatize:b')
Supported Column Types
• VARCHAR2: typically for short formats, e.g. SMILES
• CLOB, BLOB: for longer formats, e.g. MDL molfile, Marvin (mrv)
Supported Structure Table Types
• Regular Table: nci_1k
CREATE INDEX jcxnci_1k...
Index table: jcxnci_1k_jcx (refers to the rowid of the base table, nci_1k)
• JChem Table (generated by jcman or API): jc_nci_1k
CREATE INDEX jcxjc_nci_1k...
Regular Tables vs. JChem Tables
• Regular structure tables
– base table and index table are physically distinct
– index properties are specified as index parameters
• JChem structure tables
– base table and index table are physically the same
– most of the “index” properties are specified during table creation (jcman or Java API)
• Pros & Cons:
– inserts from outside the database are faster with JChem tables
– JChem tables require Java API or the jcman command line tool (for table creation) and Java API or special cartridge functions for INSERTs, UPDATEs and DELETEs; standard SQL can be used with regular tables in all cases.
JChem Cartridge Architecture
Computation-intensive operations are performed in a separate Sun JVM.
[Figure: architecture diagram. Labels shown: Oracle, JChem Cartridge, RMI, JChem Server, JChem Streams, JChem Base, JChem Core, Update, Search, Cache, Cache, JDBC]
Advantages:
• fast execution (optimized native code)
• flexibility in deployment
Performance
Table containing 19,528,372 structures from PubChem; desktop PC with Intel Quad CPU Q6600 2.40GHz and 8GB memory
Substructure search results (JChem 5.2):
Query Structure | Hit Count | Time (ms)
C1CN1c2cnnc3c(cncc23)C4=CSC=C4 | 0 | 1,487
O=C1ONC(N1c2ccccc2)c3ccccc3 | 129 | 823
Oc1c(N=N)c(cc2cc(ccc12)S(O)(=O)=O)S(O)(=O)=O | 93 | 764
C(Sc1ncnc2ncnc12)c3ccccc3 | 489 | 786
NC1=CC=NC2=C1C=CC(Cl)=C2 | 6,001 | 1,189
c1ncc2ncnc2n1 | 146,256 | 6,665
Clc1ccccc1 | 2,975,285 | 82,646
Future plans
• Flexible 3D pharmacophore search
• R-Group decomposition
• Clustering
• Maximum common substructure search type
• Extended-connectivity fingerprints (ECFP)
Summary
JChem Cartridge for Oracle provides access to the
rich functionality of JChem Base in a flexible and
efficient manner.
JChem Cartridge for Oracle uses creative solutions
to broaden the applicability of JChem's core
functions while preserving key benefits of the Java
platform.
Links
• Documentation
– www.jchem.com/doc/admin/cartridge.html
– www.jchem.com/doc/guide/cartridge/index.html
• Forum
– www.chemaxon.com/forum/
• Brochure
– www.chemaxon.com/brochures/JChem_Cartridge.pdf
Visit other technical presentations
MarvinSketch/View
http://www.chemaxon.com/MarvinSketch_View.ppt
MarvinSpace
http://www.chemaxon.com/MarvinSpace.ppt
Calculator Plugins
http://www.chemaxon.com/Calculator_Plugins.ppt
JChem Base
http://www.chemaxon.com/JChem_Base.ppt
JChem Cartridge
http://www.chemaxon.com/JChem_Cartridge.ppt
Standardizer
http://www.chemaxon.com/Standardizer.ppt
Screen
http://www.chemaxon.com/Screen.ppt
JKlustor
http://www.chemaxon.com/JKlustor.ppt
Fragmenter
http://www.chemaxon.com/Fragmenter.ppt
Reactor
http://www.chemaxon.com/Reactor.ppt