What’s new in LibraryMCS
Miklós Vargyas*, Judit Vaskó-Szedlár
UGM 2007
Talk Overview
• Introduction to LibraryMCS
  – Concepts, motivation
  – Main features
  – GUI
• 2006 Roadmap accomplishment
• New features in detail
  – Performance
  – Iterative clustering
  – Additive clustering
• Current roadmap and wishlist

Introduction – Concept of MCS
Maximum Common Substructure
Looks simple, yet hard to compute!

Introduction – Motivations
• MCS-based clustering
  – More intuitive than similarity-based clustering
  – Closer to chemists’ gold standard
• Initial requirements
  – Focused set analysis
    • screens: 2000 – 10000 structures
    • lead optimization: 3000 – 5000 structures
  – Should be hierarchical (outliers)
  – Ultimate goal: cluster 5000 compounds in 5 seconds
• Further application areas
  – Library profiling
  – Compound acquisition

Introduction – Main features
• MCS-based hierarchical clustering
• Flexible search options
• No theoretical size limitation
• Fast operation
• Filtering by chemical properties
• Cluster statistics
• Hierarchy browser

GUI – Dendrogram view
• Interactive navigation, selection
• Zoom & move

GUI – Molecule view

GUI – SAR-table
• Cluster statistics, structure filtering by properties

GUI – R-table

2006 Roadmap accomplishment
...
Preserving rings

Iterative clustering
• Outliers
  – Singletons
  – Large blobby clusters
• Aim
  – Minimise number of singletons
  – Maintain high quality

Additive clustering
[Diagram labels: Pre-clustering, stored; Corporate database registration; Cluster diversity enrichment; new set]

Performance
• Depends on various factors
  – average structure size
  – diversity
  – minimal required MCS size
  – atom/bond constraints
[Chart: Normal, Fast and Fastest modes compared on CombiLib, MixedLib and Maybridge]

Performance
• Scales linearly
[Chart: Running time (sec) against Structure count; series: 2006, 2007, Linear (2007)]

Performance
• Maximum speed achieved: 1 000 structures/s
[Chart: run time (s) against library size; series: Ward 512, Jarp 512, LibMCS 6]
• Memory requirements
  – scalable
  – 50 000 structures occupy <100 MB

In the pipeline
• Multi-stage clustering
• Additive clustering
• Disconnected MCS (Maximum Overlapping Set)
• Enhanced R-group decomposition
• Markush export
• Further clustering criteria
  – Ring count
• Performance tuning
  – Easier control of memory usage

Current roadmap and wishlist
• Simpler table view
• IJC integration
• Multi-cluster members
• Clustering million compound libraries
• Integrate Chemical Terms
• Stereo care MCS

Acknowledgements
• Co-workers
  – Péter Vadász
  – Judit Vaskó-Szedlár
• Ideas
  – Ferenc Csizmadia, Szabolcs Csepregi, Ákos Papp, György Pirok
• Partners, early adopters
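
Note on the quoted performance figures
The numbers on the Motivations and Performance slides can be cross-checked with simple arithmetic. The sketch below is not part of the talk and does not use LibraryMCS; the constant and function names are hypothetical, and the only inputs are the figures quoted in the slides. It treats the quoted maximum speed as a best case, so the run-time estimate is a lower bound rather than a prediction.

```python
# Back-of-envelope check of the figures quoted in the slides.
# All names here are illustrative assumptions, not LibraryMCS API.

GOAL_COMPOUNDS = 5_000      # "cluster 5000 compounds in 5 seconds"
GOAL_SECONDS = 5
MAX_SPEED = 1_000           # "Maximum speed achieved: 1 000 structures/s"
MEM_STRUCTURES = 50_000     # "50 000 structures occupy <100 MB"
MEM_LIMIT_MB = 100


def goal_throughput() -> float:
    """Structures per second needed to meet the stated goal."""
    return GOAL_COMPOUNDS / GOAL_SECONDS


def memory_per_structure_kb() -> float:
    """Upper bound on memory per structure, in KB (1 MB = 1024 KB)."""
    return MEM_LIMIT_MB * 1024 / MEM_STRUCTURES


def best_case_runtime(structure_count: int, rate: float = MAX_SPEED) -> float:
    """Lower bound on run time in seconds, assuming the peak rate
    is sustained and run time grows linearly with library size."""
    return structure_count / rate


if __name__ == "__main__":
    print("goal throughput (structures/s):", goal_throughput())
    print("memory per structure (KB), upper bound:", memory_per_structure_kb())
    print("best-case run time for 35 000 structures (s):", best_case_runtime(35_000))
```

Under these assumptions the quoted maximum speed equals the throughput that the "5000 compounds in 5 seconds" goal requires, and the memory figure works out to roughly 2 KB per structure at most.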