Transcript Slide 1
Interactive Datamining of Large-Scale Screening Datasets
Frank Oellien, Wolf D. Ihlenfeldt
Computer-Chemie-Centrum, University of Erlangen-Nuremberg
Klaus Engel, Thomas Ertl
Visualization and Interactive Systems Group, University of Stuttgart
C 3
© Oellien, Ihlenfeldt, Engel, Ertl MMWS 2002
Overview Multi-variate and multi-dimensional datasets • Motivation • Information Visualization Techniques • Examples (ChemCodes Inc., NCI) • Demo
Chemical data
[Bar chart: sizes of current datasets (Merck Katalog, Synopsys PG, ACX, NCI DTP, ChemInform, Spresi, Beilstein, CAS); vertical axis from 0 to 18,000,000]
Multi-Variate and Multi-Dimensional Numeric Datasets Today: Change in chemical synthesis technology • new technologies (HTS, combinatorial synthesis): experiments generate terabytes of data per year • development of data mining and visualization tools has not kept pace • this is the most critical bottleneck in R&D today
Tools for interactive mining and information visualization are needed.
Tools for Interactive Visualization of Multi-Variate and Multi-Dimensional Data Standard applications • bar charts, 2D and pseudo-3D scatter plots, molecular spreadsheets • limited to small subsets • platform-dependent
Our goal: applications that • are simple to use • allow straightforward interpretation of results • give generalized access to tabular numeric data • are platform-independent
Overview Multi-variate and multi-dimensional datasets • Motivation • Information Visualization Techniques • Examples (ChemCodes Inc., NCI) • Demo
3D Tools for Interactive Information Visualization: information visualization applications that use the 3D capabilities of modern clients • Glyph-based InfVis approaches • Volume-based InfVis approaches
Glyph-based InfVis Tools • 3 orthogonal axes • color • shape • size • transparency • surface effects • animation • up to ~100 Glyphs
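As a rough illustration of the glyph idea on this slide, the sketch below maps one tabular record onto the listed visual channels (three axes, color, shape, size, transparency). It is plain Python rather than the applet's Java/Java3D, and the `Glyph` class, the `record_to_glyph` helper, the shape palette, and the blue-to-red color ramp are all hypothetical; the talk does not show its actual mapping code.

```python
from dataclasses import dataclass

SHAPES = ("sphere", "cube", "cone", "cylinder")  # hypothetical shape palette


@dataclass
class Glyph:
    x: float
    y: float
    z: float
    color: tuple         # (r, g, b), each in 0..1
    shape: str
    size: float
    transparency: float  # 0 = opaque, 1 = fully transparent


def normalize(value, lo, hi):
    """Scale value from [lo, hi] to [0, 1], clamping out-of-range input."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def record_to_glyph(record, ranges, mapping):
    """Map one data record (a dict) to a glyph.

    mapping: visual channel -> column name (assumed to be chosen by the user).
    ranges:  column name -> (min, max), used for normalization.
    """
    def norm(channel):
        col = mapping[channel]
        return normalize(record[col], *ranges[col])

    heat = norm("color")
    category = int(record[mapping["shape"]])
    return Glyph(
        x=norm("x"), y=norm("y"), z=norm("z"),
        color=(heat, 0.0, 1.0 - heat),            # blue (low) to red (high)
        shape=SHAPES[category % len(SHAPES)],     # categorical column picks the shape
        size=0.5 + norm("size"),                  # keep glyphs visible at the minimum
        transparency=1.0 - norm("transparency"),  # low values fade out
    )
```

With seven channels per glyph, a record with seven numeric columns can be shown without any projection, which is what makes the approach attractive for small subsets.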
Java/Java3D InfVis Applet • Java3D Canvas • Tool Panel (filters, selection tools, details) • Control Panel
Java/Java3D InfVis Applet 3D Render Panel • 3D Glyphs • 3D Barchart
Java/Java3D InfVis Applet 3D Tool Panel • Dynamic Filter Tools • Selection Tools • Detail Tools
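The dynamic filter tools can be pictured as one range per column, with a record staying visible only while it lies inside every active range. The Python sketch below illustrates that idea only; `apply_range_filters` and the dict-based record layout are assumptions, not the applet's actual Java code.

```python
def apply_range_filters(records, filters):
    """Return the records whose values fall inside every active range.

    records: list of dicts, one per data row.
    filters: column name -> (low, high), inclusive; each entry stands in
             for one pair of slider positions in the tool panel.
    """
    def passes(record):
        return all(lo <= record[col] <= hi for col, (lo, hi) in filters.items())

    return [record for record in records if passes(record)]
```

Re-running the filter on every slider movement is what makes the view "dynamic"; with no active filters, every record passes.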
Java/Java3D InfVis Applet 3D Control Panel
Advantages of Volume-based InfVis Tools Databases with millions of data points – Glyph-based InfVis approaches • produce millions of geometric primitives • interactive visualization not possible – Volume-based InfVis approaches • can handle large number of data points • interactive visualization using low-cost graphics hardware is possible
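One common way to turn millions of records into a volume is to bin them into a fixed voxel grid, so that rendering cost depends on the grid resolution rather than on the number of data points. The talk does not say how its volumes are built, so the following Python sketch is an assumption-laden illustration: `bin_points_to_volume`, the flat-list layout, and the count-per-voxel scheme are all hypothetical.

```python
def bin_points_to_volume(points, bounds, resolution):
    """Accumulate 3D points into a resolution^3 grid of counts.

    points:     iterable of (x, y, z) tuples.
    bounds:     ((xmin, xmax), (ymin, ymax), (zmin, zmax)), each with max > min.
    resolution: number of voxels per axis.
    Returns a flat list of counts, indexed as ix + n * (iy + n * iz).
    """
    n = resolution
    volume = [0] * (n * n * n)
    for point in points:
        index = []
        for value, (lo, hi) in zip(point, bounds):
            if not lo <= value <= hi:
                break  # point lies outside the volume: skip it
            # Map [lo, hi] onto voxel indices 0..n-1; value == hi lands in the last voxel.
            cell = int((value - lo) / (hi - lo) * n)
            index.append(min(cell, n - 1))
        else:
            ix, iy, iz = index
            volume[ix + n * (iy + n * iz)] += 1
    return volume
```

However many points are fed in, the result always has `resolution ** 3` entries, which is the property that keeps interaction feasible on modest graphics hardware.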
Overview Multi-variate and multi-dimensional datasets • Motivation • Information Visualization Techniques • Examples (ChemCodes Inc., NCI) • Demo
ChemCodes Reaction Database • 100 most important functional groups (FGs), ~75% of chemistry • 100 standard reactions • Limits of standard reactions • Functional Group Compatibility • Generating Rules
Goal: Analysis of the reaction space
ChemCodes - Reaction Optimization I • Goal: Reaction Optimization: > 95% Yield • 7 Dimensions: reagent, solvent, time, temperature, stoichiometry, reagent order, FG-compatibility
ChemCodes - Reaction Optimization II
ChemCodes - Reaction Planning • Functional Group Compatibility Check [structure diagram]
Example 2: NCI Anti-tumor / Anti-viral Database • Initiated in April 1990 (modified 1994) • ~250,000 compounds • ~30,000 with anti-tumor screening data
Enhanced NCI Database Browser • > 30 different molecular properties • up to 23 3D conformers per compound
Lead Compound Discovery II
Lead Compound Discovery II
Overview Multi-variate and multi-dimensional datasets • Motivation • Information Visualization Techniques • Examples (ChemCodes Inc., NCI) • Demo
Acknowledgment • Prof. Johann Gasteiger, Computer-Chemie-Centrum, University of Erlangen-Nuremberg • Prof. Thomas Ertl, Dipl. Inf. Klaus Engel, Visualization and Interactive Systems Group, University of Stuttgart • Dr. Patrick Kiser, Dr. Gary Eichenbaum, ChemCodes Inc. • Marc Nicklaus, Laboratory of Medicinal Chemistry, NCI, NIH • Deutsche Forschungsgemeinschaft