CCDC_Intro proposal for Industry day

CCDC Tools for Mining Structural Databases
Or – Building Solid Foundations for a
Structure Based Design Campaign
John Liebeschuetz, Peter Carlqvist, Simon Bowden
Cambridge Crystallographic Data Centre
12 Union Rd., Cambridge, UK
www.ccdc.cam.ac.uk
Assessment and Comparison of
Ligand – Protein Structural Models
• For the Crystallographer
– What is wrong with my model?
– What interesting features or differences with related
structures can I highlight in my publication?
• For the Molecular Modeller
– What is wrong with the Crystallographer’s model?
– What interesting features or differences with related
structures can I use to inform my structure-based drug
design campaign?
– Are there non-homologous structures with similar features
that I need to watch out for?
Why can’t I take a structure from
the PDB and just use it?
• Validation of ligand structures bound to proteins
15% of 100 recent PDB entries have ligand geometries that are
almost certainly in significant error (in-house analysis using
Relibase+/Mogul)
[Figure: two pie charts. "Evaluation of PDB ligand dataset from 1990's with Mogul and Relibase" (Pre 2000) and "Evaluation of most recent PDB ligand dataset with Mogul and Relibase" (2006). Segments: correct, not unusual, wrong. Values shown: correct 29% and 34%; not unusual 40% and 55%; wrong 16% and 26%.]
How much ligand strain is
accommodated by the protein?
• Accepted View – Many ligands adopt strained
conformations when bound to proteins; some
(60%) do not bind even in a local minimum
conformation. (Perola & Charifson, J. Med. Chem.
2004, 47, 2499-2510)
• Alternative view – Ligands usually (but not
always) bind in a local minimum. Many ‘strained’
structures found in the PDB are imperfectly
refined. (OpenEye, B. Kelley and G. Warren,
EuroCYP)
CCDC Tools that can help you
• Relibase/Relibase+ - Web-based database system for
searching, retrieving and analysing 3D structures of protein-ligand
complexes in the Brookhaven Protein Data Bank (PDB)
– Relibase is freely available for academics
– Relibase+ has extra features (some of these will be used in this
workshop)
• The Cambridge Structural Database System - Database of > 400,000 small-molecule crystallographic structures,
and associated query software
– Mogul and IsoStar - knowledge bases of molecular geometry and intermolecular interactions
– Directly linked access from Relibase+
The Workshop
Part 1: Validation of models and structural analysis
• Analysing a protein structure for errors and interesting features
• Comparing a structure with structures related by homology or by functionality
Part 2: Probing the Protein-Ligand Interface
• Substructure searching in Relibase/Relibase+
• Comparing the interactions of different ligands with the same target
• Validating an unusual interaction using substructure searching in Relibase+
Relibase+
• Relibase+
– Web-based database system for searching, retrieving and
analysing 3D structures of protein-ligand complexes in the
Brookhaven Protein Data Bank (PDB)
– Successor to ReLiBase, developed by Manfred Hendlich et al.
(Merck, Marburg U.)
M. Hendlich, Acta Cryst. D54, 1178-1182, 1998
• Relibase: free on WWW for academics
– http://relibase.ccdc.cam.ac.uk/
– http://relibase.rutgers.edu/
Relibase+
Basic Functionality
• Keyword searching
• FASTA protein sequence searching
• 2D substructure searching
• 3D protein-ligand interaction searching
• Protein-protein interaction searching
• Similarity searching for ligands
• SMILES substructure matching
• Automatic superposition of related binding sites to
compare ligand binding modes, water positions, etc.
• 3D visualisation with AstexViewer and ReliView (Hermes)
Relibase+
Advanced Functionality
• Functionality for generation and search of proprietary
databases of protein-ligand complexes alongside the PDB
• Links to the Mogul and IsoStar modules of the CSDS for
geometry validation
• Additional modules: Crystal packing, WaterBase,
CavBase
• Detailed analysis of superimposed binding sites
• Enhanced treatment of hitlists
• Reliscript: Command-line access via a Python-based
toolkit
• Coming Soon: SecBase including Turn Classification
CavBase
• Detect unexpected similarities amongst protein cavities
(e.g. active sites) that share little or no sequence
homology.
• Similarity judged by matching 3D property descriptors
(pseudocentres) that encode the shape and chemical
characteristics of each cavity
• No sequence information used, can detect similar cavities
even if they have no obvious secondary-structure
relationship
• Developed by S. Schmitt et al., J. Mol. Biol. (2002)
Cambridge Structural Database
• Repository for the world’s small organic and metal-organic
crystal structures (up to 500 non-H atoms)
• Experimentally determined 3D structures via X-ray and
neutron diffraction methods
• 2007 release contains 423,798 entries
– approximately 32,000 entries added per year
• Derived from around 1200 published sources
– official depository for >80 major journals
– majority of data directly deposited electronically (CIF)
• Increasing number of Private Communications
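The figures on this slide allow a rough check of when the CSD passes 500,000 entries. The sketch below is only a linear extrapolation: it assumes the 423,798 count applies at the start of 2007 and that additions stay constant at about 32,000 per year. Both are simplifying assumptions, not CCDC's forecasting method, and the function name is illustrative.

```python
import math

ENTRIES_2007 = 423_798      # size of the 2007 release, as quoted on the slide
ENTRIES_PER_YEAR = 32_000   # approximate annual additions, as quoted on the slide
TARGET = 500_000


def years_to_reach(target, current, per_year):
    """Years of constant linear growth needed to go from current to target."""
    return (target - current) / per_year


years = years_to_reach(TARGET, ENTRIES_2007, ENTRIES_PER_YEAR)
# Assumption: the release count is taken as the position at the start of 2007.
crossing_year = 2007 + math.floor(years)
print(f"{years:.2f} years of growth needed, reached during {crossing_year}")
```

Under these assumptions the threshold is crossed a little over two years after the 2007 release, which is in line with the growth slide's estimate of more than 500,000 entries during 2009.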
How much Data is Available?
Growth of the CSD
[Figure: chart of CSD growth 1970-2006 with predicted growth to 2010; vertical axis 0 to 600,000 entries; horizontal axis labelled 2001 to 2009]
419,768 entries June 2007
>500,000 entries during 2009
CSD Information content
Crystal structure data
Atomic coordinates, unit-cell, space-group symmetry (fully
validated)
CSD Information content
Bibliographic and Chemical Information
• Bibliographic and chemical text
and properties (all searchable)
• Chemical diagram and chemical
connectivity to enable 2D and
3D searching for substructures,
pharmacophores and
intermolecular interactions
• Cross-referencing between
entries
4-Oxonicotinamide-1(1’-beta-D-2’,3’,5’-tri-O-acetyl-ribofuranoside)
Source: Rothmannia longiflora
Colour: pale yellow
Habit: acicular
Polymorph: Form IV
C17 H20 N2 O9
G. Bringmann, M. Ochse, K. Wolf,
J. Kraus, K. Peters, E-M. Peters,
M. Herderich, L. Ake, F. Tayman
Phytochemistry 51 (1999), p271
R-factor: .0506
Cambridge Structural Database System
• PreQuest: database production
• Cambridge Structural Database
• ConQuest: database search
• VISTA: statistical analysis
• Mercury: graphical display, packing analysis
• Knowledge Bases
– IsoStar: library of intermolecular interactions
– Mogul: library of molecular geometry
Mogul
A Knowledge Base of Molecular Geometries
Bruno et al., J. Chem. Inf. Comput. Sci., 44, 2133-2144, 2004
Mogul
Rapid access to CSD information
• Incorporates pre-computed libraries of bond lengths,
valence angles and torsion angles, derived entirely from
the CSD
• Sketch or import molecule, then click on feature of
interest to view distribution, mean values and statistics
• Very fast search speeds, with hyperlinks to the CSD to
view specific structures
• Complete geometry: retrieve distributions for all bonds,
angles and torsions in the molecule
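To illustrate how a retrieved distribution can be used to judge a ligand geometry, the sketch below compares one observed bond length with a reference sample using a z-score. This is a generic statistical check, not Mogul's own scoring. The function names, the cut-off of 2.0 and the bond lengths are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def z_score(observed, reference_values):
    """Sample standard deviations between the observed value and the reference mean."""
    mu = mean(reference_values)
    sigma = stdev(reference_values)
    return (observed - mu) / sigma


def classify(observed, reference_values, threshold=2.0):
    """Label a parameter 'unusual' or 'not unusual' (hypothetical cut-off)."""
    if abs(z_score(observed, reference_values)) > threshold:
        return "unusual"
    return "not unusual"


# Made-up bond lengths in angstroms, standing in for a retrieved distribution.
reference = [1.52, 1.53, 1.51, 1.54, 1.52, 1.53, 1.50, 1.52]

print(classify(1.53, reference))  # value close to the reference mean
print(classify(1.65, reference))  # value far outside the reference spread
```

A check like this only suits roughly unimodal distributions such as bond lengths; torsion angles are often multimodal and would need a different treatment.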
IsoStar
A Knowledge Base of Intermolecular Interactions
• Experimental data from:
– Cambridge Structural Database
– Protein Data Bank (protein-ligand complexes only)
– Theoretical potential energy minima (DMA, IMPT)
• Interaction distributions displayed immediately as
scatterplots or contour surfaces
• >20,000 CSD scatterplots, >5,500 PDB, 1,500 energy minima
IsoStar Methodology
central group: -CONH2
contact group: NH
Search CSD or PDB for structures containing
desired contact
Superimpose hits and display as scatterplots
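The superposition step can be pictured with a toy 2D example: each hit is moved into a common frame defined by its central group, and the contact-atom positions are then collected for plotting. This is a minimal sketch under strong simplifications (two dimensions, two reference atoms per central group, no least-squares fitting). It is not IsoStar's procedure, and all names and coordinates are hypothetical.

```python
import math


def to_central_frame(anchor, axis_atom, contact):
    """Express contact in a frame with anchor at the origin and axis_atom on +x (2D)."""
    ax, ay = axis_atom[0] - anchor[0], axis_atom[1] - anchor[1]
    cx, cy = contact[0] - anchor[0], contact[1] - anchor[1]
    angle = math.atan2(ay, ax)                       # orientation of the central group
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    # Rotate by -angle so that the central group's axis lies along +x.
    return (cx * cos_a - cy * sin_a, cx * sin_a + cy * cos_a)


# Hypothetical hits: (anchor atom, second central-group atom, contact atom).
hits = [
    ((0.0, 0.0), (1.0, 0.0), (0.0, 3.0)),
    ((5.0, 5.0), (5.0, 6.0), (2.0, 5.0)),  # same geometry, rotated and shifted
]

scatter = [to_central_frame(*hit) for hit in hits]
for x, y in scatter:
    print(f"{x:.2f} {y:.2f}")
```

Both hits describe the same relative geometry, so after the transformation their contact atoms fall on the same point of the scatterplot.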
Density Maps
Can also represent distribution as density maps
The Workshop
Part 1: Validation of models and structural analysis
• Analysing a protein structure for errors and interesting features
• Comparing a structure with structures related by homology or by
functionality
Part 2: Probing the Protein-Ligand Interface
• Substructure searching in Relibase/Relibase+
• Comparing the interactions of different ligands with the same target
• Validating an unusual interaction using substructure searching in
Relibase+
How to access the workshop
Webpage
http://relibase.ccdc.cam.ac.uk/
Email address
[email protected]
Password
s1mple
Cavity Detection
[Figure: schematic of a protein cavity lined with N and O atoms]
Based on the LIGSITE Program
M.Hendlich et al., J.Mol.Graph. (1997).
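The grid-scanning idea behind this kind of cavity detection can be sketched on a toy 2D grid: a solvent point counts as buried along a scan direction when protein is met on both sides of it, and points buried along enough directions are kept as cavity points. This is a simplified illustration, not the published LIGSITE implementation. The grid, the four scan directions and the threshold of 2 are assumptions made for the example.

```python
# '#' marks protein cells and '.' marks solvent cells on a toy 2D grid.
GRID = [
    ".........",
    ".##...##.",
    ".##...##.",
    ".##...##.",
    ".#######.",
    ".#######.",
    ".........",
]

# (row step, column step): horizontal, vertical and the two diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def hits_protein(grid, r, c, dr, dc):
    """True if walking from (r, c) in steps of (dr, dc) meets a protein cell."""
    r, c = r + dr, c + dc
    while 0 <= r < len(grid) and 0 <= c < len(grid[0]):
        if grid[r][c] == "#":
            return True
        r, c = r + dr, c + dc
    return False


def buriedness(grid, r, c):
    """Number of scan directions along which the point is enclosed on both sides."""
    return sum(
        hits_protein(grid, r, c, dr, dc) and hits_protein(grid, r, c, -dr, -dc)
        for dr, dc in DIRECTIONS
    )


def cavity_points(grid, threshold=2):
    """Solvent points whose buriedness reaches the (hypothetical) threshold."""
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, cell in enumerate(row)
        if cell == "." and buriedness(grid, r, c) >= threshold
    ]


print(buriedness(GRID, 3, 4))  # point at the bottom of the pocket
print(buriedness(GRID, 0, 4))  # point in open solvent above the pocket
print(cavity_points(GRID))
```

Raising the threshold keeps only the most deeply buried points; a 3D version would scan more directions on a 3D grid.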
The pseudo-centre concept
Coding Molecular Recognition into Simple Descriptors
[Figure: protein groups lining a cavity, each labelled with a pseudo-centre type: donor, acceptor, aliphatic, pi/aromatic]
3D Property Description
Similarity Search
Clique detection
Bron-Kerbosch
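A minimal sketch of clique-based matching is given here. Pseudo-centres of the same type from two cavities are paired into nodes, two nodes are linked when the corresponding distances agree within a tolerance, and the basic Bron-Kerbosch recursion (without pivoting) lists the maximal cliques. The association-graph construction, the 0.5 tolerance and the toy coordinates are assumptions for illustration. The slides name only the algorithm and do not give CavBase's parameters or variant.

```python
from itertools import combinations
from math import dist


def bron_kerbosch(r, p, x, adj, out):
    """Basic Bron-Kerbosch (no pivoting): append every maximal clique to out."""
    if not p and not x:
        out.append(r)
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}   # v has been fully explored
        x = x | {v}   # exclude v from later cliques at this level


def association_graph(site_a, site_b, tol=0.5):
    """Nodes pair same-type centres; edges join pairs with compatible distances."""
    nodes = [
        (i, j)
        for i, (type_a, _) in enumerate(site_a)
        for j, (type_b, _) in enumerate(site_b)
        if type_a == type_b
    ]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i != k and j != l:
            d_a = dist(site_a[i][1], site_a[k][1])
            d_b = dist(site_b[j][1], site_b[l][1])
            if abs(d_a - d_b) <= tol:
                adj[(i, j)].add((k, l))
                adj[(k, l)].add((i, j))
    return adj


# Hypothetical pseudo-centres: (type, coordinates in angstroms).
site_a = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aliphatic", (0, 4, 0))]
site_b = [("donor", (10, 10, 10)), ("acceptor", (13, 10, 10)),
          ("aliphatic", (10, 14, 10)), ("donor", (30, 30, 30))]

adj = association_graph(site_a, site_b)
cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
best = max(cliques, key=len)
print(sorted(best))  # index pairs (centre in site_a, centre in site_b)
```

The largest clique gives the biggest set of pseudo-centre pairings that are mutually consistent in distance, which is the kind of match a scoring step could then rank.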
Similarity Analysis
Scoring based on matching pseudocentres and the
associated surface patches
An Example
1OXO/1F2D
• Overlay of PLP ligands
• Matching pseudo-centres and surface patches shown
Crystal Packing
Important e.g. when docking ligands
Concanavalin A (1cjp)
Binding site in Relibase+
1mtw
reference ligand, no packing
reference in green, first-rank solution atom-coloured
1mtw, Packing Included
reference ligand, no packing
GOLD's first-rank solution including neighbouring chains