What’s new in LibraryMCS
Miklós Vargyas*, Judit Vaskó-Szedlár
UGM 2007
Talk Overview
• Introduction to LibraryMCS
– Concepts, motivation
– Main features
– GUI
• 2006 Roadmap accomplishment
• New features in detail
– Performance
– Iterative clustering
– Additive clustering
• Current roadmap and wishlist
Introduction – Concept of MCS
Maximum Common Substructure
Looks simple, yet hard to compute!
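To make the concept concrete, here is a minimal brute-force sketch of a maximum common substructure search on tiny, hand-written molecular graphs. It is not LibraryMCS's algorithm, which the slides do not describe. The function names (`make_bonds`, `mcs_size`), the graph encoding (a list of element symbols plus a set of bonded atom-index pairs) and the simplifications (heavy atoms only, bond orders ignored, the common part must be connected) are all assumptions made for illustration. The search explores every consistent atom mapping, which hints at why the problem gets expensive as structures grow.

```python
from itertools import product


def make_bonds(*pairs):
    """Build a bond set from (i, j) atom-index pairs (hypothetical helper)."""
    return {frozenset(p) for p in pairs}


def bonded(bonds, i, j):
    return frozenset((i, j)) in bonds


def mcs_size(atoms_a, bonds_a, atoms_b, bonds_b):
    """Atom count of the largest connected common induced subgraph.

    Toy exhaustive search: exponential in the worst case, so only
    suitable for very small graphs.
    """
    best = 0
    seen = set()  # partial mappings already explored

    def consistent(mapping, a, b):
        # Same element, and the same bonded/not-bonded pattern
        # towards every atom pair that is already mapped.
        if atoms_a[a] != atoms_b[b]:
            return False
        return all(
            bonded(bonds_a, a, x) == bonded(bonds_b, b, y)
            for x, y in mapping.items()
        )

    def extend(mapping):
        nonlocal best
        key = frozenset(mapping.items())
        if key in seen:
            return
        seen.add(key)
        best = max(best, len(mapping))
        used_b = set(mapping.values())
        for a in range(len(atoms_a)):
            if a in mapping:
                continue
            # Keep the common part connected: the new atom must be
            # bonded to an atom that is already mapped.
            if not any(bonded(bonds_a, a, x) for x in mapping):
                continue
            for b in range(len(atoms_b)):
                if b not in used_b and consistent(mapping, a, b):
                    mapping[a] = b
                    extend(mapping)
                    del mapping[a]

    # Try every compatible atom pair as a starting point.
    for a, b in product(range(len(atoms_a)), range(len(atoms_b))):
        if atoms_a[a] == atoms_b[b]:
            extend({a: b})
    return best


# Heavy-atom skeletons only; hydrogens and bond orders are left out.
ethanol = (["C", "C", "O"], make_bonds((0, 1), (1, 2)))              # C-C-O
propanol = (["C", "C", "C", "O"], make_bonds((0, 1), (1, 2), (2, 3)))  # C-C-C-O
dimethyl_ether = (["C", "O", "C"], make_bonds((0, 1), (1, 2)))       # C-O-C

size_ethanol_propanol = mcs_size(*ethanol, *propanol)
size_ethanol_ether = mcs_size(*ethanol, *dimethyl_ether)
print(size_ethanol_propanol)  # 3: the whole C-C-O chain is shared
print(size_ethanol_ether)     # 2: only a C-O fragment is shared
```

Ethanol and 1-propanol share the full C-C-O chain, while ethanol and dimethyl ether share only a C-O fragment, because the carbon on the far side of the oxygen has no counterpart in ethanol.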
Introduction – Motivations
• MCS based clustering
– More intuitive than similarity based
– Closer to chemists’ gold standard
• Initial requirements
– Focused set analysis
• screens: 2000 – 10000 structures
• lead optimization: 3000 – 5000 structures
– Should be hierarchical (outliers)
– Ultimate goal: cluster 5000 compounds in 5 seconds
• Further application areas
– Library profiling
– Compound acquisition
Introduction – Main features
• MCS based hierarchical clustering
• Flexible search options
• No theoretical size limitation
• Fast operation
• Filtering by chemical properties
• Cluster statistics
• Hierarchy browser
GUI – Dendrogram view
• Interactive navigation, selection
• Zoom & move
GUI – Molecule view
GUI – SAR-table
• Cluster statistics, structure filtering by properties
GUI – R-table
2006 Roadmap accomplishment
...
Preserving rings
Iterative clustering
• Outliers
– Singletons
– Large blobby clusters
• Aim
– Minimise number of singletons
– Maintain high quality
Additive clustering
[Diagram labels: Pre-clustering, stored; Corporate database; registration; Cluster diversity enrichment; new set]
Performance
• Depends on various factors
– average structure size
– diversity
– minimal required MCS size
– atom/bond constraints
[Chart: series Normal, Fast and Fastest for the CombiLib, MixedLib and Maybridge libraries; value axis 0 to 16, axis label not recoverable]
Performance
• Scales linearly
[Chart: running time (sec) against structure count, 0 to 35000; series 2006, 2007 and Linear (2007)]
Performance
• Maximum speed achieved: 1 000 structures/s
[Chart: run time (s) against library size, 100 to 100000; series Ward 512, Jarp 512 and LibMCS 6]
• Memory requirements
– scalable
– 50 000 structures occupy <100MB
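As a quick check of how the reported figures relate to the initial requirement, the arithmetic can be sketched as below. All numbers come from the slides; the function names are hypothetical, and treating 1 MB as 1 000 000 bytes is an assumption, so the per-structure figure is only an approximate upper bound.

```python
# Figures quoted on the slides.
GOAL_STRUCTURES = 5000       # "cluster 5000 compounds in 5 seconds"
GOAL_SECONDS = 5
ACHIEVED_RATE = 1000         # maximum speed achieved, structures per second
MEMORY_STRUCTURES = 50_000   # "50 000 structures occupy <100MB"
MEMORY_LIMIT_MB = 100


def required_rate(structures, seconds):
    """Throughput needed to meet the stated goal, in structures per second."""
    return structures / seconds


def memory_per_structure_bytes(total_mb, structures, bytes_per_mb=1_000_000):
    """Upper bound on memory per structure (assumes 1 MB = 1 000 000 bytes)."""
    return total_mb * bytes_per_mb / structures


goal_rate = required_rate(GOAL_STRUCTURES, GOAL_SECONDS)
per_structure = memory_per_structure_bytes(MEMORY_LIMIT_MB, MEMORY_STRUCTURES)

print(goal_rate)      # 1000.0 structures/s, equal to the maximum speed reported
print(per_structure)  # 2000.0 bytes, i.e. under about 2 kB per structure
```

The goal of 5000 compounds in 5 seconds corresponds to the reported maximum speed. The slides report that speed as a maximum, so it does not imply the goal is met for every library.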
In the pipeline
• Multi-stage clustering
• Additive clustering
• Disconnected MCS (Maximum Overlapping Set)
• Enhanced R-group decomposition
• Markush export
• Further clustering criteria
– Ring count
• Performance tuning
– Easier control of memory usage
Current roadmap and wishlist
• Simpler table view
• IJC integration
• Multi-cluster members
• Clustering million compound libraries
• Integrate Chemical Terms
• Stereo care MCS
Acknowledgements
• Co-workers
– Péter Vadász
– Judit Vaskó-Szedlár
• Ideas
– Ferenc Csizmadia, Szabolcs Csepregi, Ákos Papp, György Pirok
• Partners, early adopters