Transcript Slide 1

Interactive Datamining of Large-Scale Screening Datasets

Frank Oellien, Wolf D. Ihlenfeldt

Computer-Chemie-Centrum, University of Erlangen-Nuremberg

Klaus Engel, Thomas Ertl

Visualization and Interactive Systems Group, University of Stuttgart

C 3

© Oellien, Ihlenfeldt, Engel, Ertl MMWS 2002

Overview Multi-variate and multi-dimensional datasets • Motivation • Information Visualization Techniques • Examples (ChemCodes Inc., NCI) • Demo

Chemical data

[Bar chart "Current datasets": vertical axis from 0 to 18,000,000; datasets shown: Merck Katalog, Synopsys PG, ACX, NCI DTP, ChemInform, Spresi, Beilstein, CAS]

Multi-Variate and Multi-Dimensional Numeric Datasets Today
Change in chemical synthesis technology
• new technologies (HTS, combinatorial synthesis) → experiments generate terabytes of data per year
• development of data mining and visualization tools could not keep pace
• most critical bottleneck in R&D today!
→ tools for interactive mining and information visualization are needed

Tools for Interactive Visualization of Multi-Variate and Multi-Dimensional Data
Standard applications
• bar charts, 2D and pseudo-3D scatter plots, molecular spreadsheets
• limited to small subsets
• platform-dependent
Our goal: applications that
• are simple to use
• allow straightforward interpretation of results
• offer generalized access to tabular numeric data
• are platform-independent

3D Tools for Interactive Information Visualization
Information visualization applications that use the 3D capabilities of modern clients
• Glyph-based InfVis approaches
• Volume-based InfVis approaches

Glyph-based InfVis Tools
• 3 orthogonal axes
• color
• shape
• size
• transparency
• surface effects
• animation
• up to ~100 glyphs
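The slide lists the visual channels a glyph offers but not how data columns are assigned to them. The following is a minimal sketch of one possible mapping, not the applet's Java3D code: numeric columns are scaled linearly to [0, 1] and bound to position, size, and transparency, and one categorical column selects the shape. The names Glyph, SHAPES, normalize, and to_glyphs, the column names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass

SHAPES = ("sphere", "cube", "cone")  # hypothetical shape palette


@dataclass
class Glyph:
    x: float
    y: float
    z: float
    size: float
    alpha: float  # 0.0 = fully transparent, 1.0 = opaque
    shape: str


def normalize(value, lo, hi):
    """Scale value linearly into [0, 1]; a constant column maps to 0.0."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def to_glyphs(records, numeric, shape_column):
    """Build one glyph per record.

    numeric must map each of the channels "x", "y", "z", "size", "alpha"
    to the name of a numeric column; shape_column names a categorical column.
    """
    # Value range of every numeric column, used for scaling.
    ranges = {}
    for column in numeric.values():
        values = [r[column] for r in records]
        ranges[column] = (min(values), max(values))
    # Sorted distinct categories give a stable category-to-shape assignment.
    categories = sorted({r[shape_column] for r in records})
    glyphs = []
    for r in records:
        scaled = {ch: normalize(r[col], *ranges[col]) for ch, col in numeric.items()}
        shape = SHAPES[categories.index(r[shape_column]) % len(SHAPES)]
        glyphs.append(Glyph(shape=shape, **scaled))
    return glyphs


# Made-up sample data, for illustration only.
records = [
    {"temperature": 20.0, "time": 1.0, "stoichiometry": 1.0,
     "yield": 40.0, "confidence": 0.5, "solvent": "THF"},
    {"temperature": 80.0, "time": 5.0, "stoichiometry": 2.0,
     "yield": 90.0, "confidence": 1.0, "solvent": "DMF"},
]
numeric = {"x": "temperature", "y": "time", "z": "stoichiometry",
           "size": "yield", "alpha": "confidence"}
glyphs = to_glyphs(records, numeric, "solvent")
```

Color, surface effects, and animation could be bound in the same way; they are left out to keep the sketch short.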

Java/Java3D InfVis Applet
[Screenshot with labeled regions: Java3D Canvas, Tool Panel (filters, selection tools, details), Control Panel]

Java/Java3D InfVis Applet 3D Render Panel
[Screenshots: 3D Glyphs, 3D Barchart]

Java/Java3D InfVis Applet 3D Tool Panel
[Screenshots: Dynamic Filter Tools, Selection Tools, Detail Tools]

Java/Java3D InfVis Applet 3D Control Panel

Advantages of Volume-based InfVis Tools
Databases with millions of data points
– Glyph-based InfVis approaches
  • produce millions of geometric primitives
  • interactive visualization not possible
– Volume-based InfVis approaches
  • can handle large numbers of data points
  • interactive visualization using low-cost graphics hardware is possible
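The slides do not say how the volume is built from the data. One common way to get there, shown here only as an assumed sketch and not as the authors' method, is to bin the data points into a fixed voxel grid and render the per-voxel counts. The size of the grid then depends on the chosen resolution rather than on the number of data points. The function name bin_points and its parameters are hypothetical.

```python
def bin_points(points, bounds, resolution):
    """Count points per voxel of a resolution x resolution x resolution grid.

    points: iterable of (x, y, z) tuples.
    bounds: ((xmin, xmax), (ymin, ymax), (zmin, zmax)), with min < max.
    Returns a flat list of counts; voxel (ix, iy, iz) is stored at
    index (ix * resolution + iy) * resolution + iz.
    """
    grid = [0] * resolution ** 3
    for point in points:
        index = 0
        for value, (lo, hi) in zip(point, bounds):
            cell = int((value - lo) / (hi - lo) * resolution)
            # Clamp so that value == hi (or an outlier) lands in an edge cell.
            cell = min(max(cell, 0), resolution - 1)
            index = index * resolution + cell
        grid[index] += 1
    return grid


# Made-up points in the unit cube, for illustration only.
unit_cube = ((0.0, 1.0), (0.0, 1.0), (0.0, 1.0))
sample = [(0.0, 0.0, 0.0), (0.1, 0.1, 0.1), (1.0, 1.0, 1.0)]
counts = bin_points(sample, unit_cube, resolution=2)
```

With a resolution of 2 the grid has 8 voxels no matter how many points are binned; a renderer would map each count to color and opacity.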

ChemCodes Reaction Database
• 100 most important FGs (~75% of chemistry)
• 100 standard reactions
• Limits of standard reactions
• Functional Group Compatibility
• Generating Rules
Goal: Analysis of the reaction space

ChemCodes - Reaction Optimization I
• Goal: reaction optimization, > 95% yield
• 7 dimensions: reagent, solvent, time, temperature, stoichiometry, reagent order, FG-compatibility
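The optimization goal above amounts to a query over the seven dimensions: keep the experiments whose yield exceeds 95% and, optionally, fix some dimensions to given values. A minimal sketch of such a filter follows. The record layout, the names DIMENSIONS and high_yield, and the sample values are assumptions made for illustration; they do not come from the ChemCodes database.

```python
# The seven dimensions named on the slide, as hypothetical column names.
DIMENSIONS = ("reagent", "solvent", "time", "temperature",
              "stoichiometry", "reagent_order", "fg_compatibility")


def high_yield(experiments, threshold=95.0, **fixed):
    """Return experiments with yield above threshold that match all fixed values."""
    for name in fixed:
        if name not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {name}")
    return [
        e for e in experiments
        if e["yield"] > threshold
        and all(e[name] == value for name, value in fixed.items())
    ]


# Made-up experiments, for illustration only.
experiments = [
    {"reagent": "A", "solvent": "THF", "time": 2.0, "temperature": 25.0,
     "stoichiometry": 1.0, "reagent_order": "A-then-B",
     "fg_compatibility": True, "yield": 97.0},
    {"reagent": "A", "solvent": "DMF", "time": 2.0, "temperature": 25.0,
     "stoichiometry": 1.0, "reagent_order": "A-then-B",
     "fg_compatibility": True, "yield": 80.0},
    {"reagent": "B", "solvent": "THF", "time": 4.0, "temperature": 60.0,
     "stoichiometry": 1.5, "reagent_order": "B-then-A",
     "fg_compatibility": False, "yield": 96.0},
]
all_hits = high_yield(experiments)
thf_hits = high_yield(experiments, solvent="THF", reagent="A")
```

In an interactive tool the fixed values would come from filter widgets, and the result set would be re-rendered after each change.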

ChemCodes - Reaction Optimization II

ChemCodes - Reaction Planning
Functional Group Compatibility Check
[Chemical structure drawing]

Example 2: NCI Anti-tumor / Anti-viral Database
• Initiated in April 1990 (modified 1994)
• ~250,000 compounds
• ~30,000 with anti-tumor screening data
Enhanced NCI Database Browser
• > 30 different molecular properties
• up to 23 3D conformers per compound

Lead Compound Discovery II

Acknowledgment
• Prof. Johann Gasteiger, Computer-Chemie-Centrum, University of Erlangen-Nuremberg
• Prof. Thomas Ertl, Dipl. Inf. Klaus Engel, Visualization and Interactive Systems Group, University of Stuttgart
• Dr. Patrick Kiser, Dr. Gary Eichenbaum, ChemCodes Inc.
• Marc Nicklaus, Laboratory of Medicinal Chemistry, NCI, NIH
• Deutsche Forschungsgemeinschaft
