Transcript ChEMBL

Overview of ChEMBL Database
Gareth Owen, ChEBI group, EMBL-EBI
Northwestern University
16th October 2012
EBI is an Outstation of the European Molecular Biology Laboratory.
What is ChEMBL?
• Open access database for drug discovery
• Freely available (searchable and downloadable)
• Content:
• 2D structures & calculated properties (logP, MW, Lipinski, etc.)
• Associated bioactivity data extracted from the primary medicinal
chemistry journals such as J. Med. Chem.
• Deposited data from neglected disease screening (e.g. malaria)
• Subset of data from PubChem
• Covers ~30 years of compound synthesis and testing
• Annotated FDA-approved drugs
• Secure searching (https://www.ebi.ac.uk/chembldb)
ChEMBL Database
• Content (ChEMBL 14):
Targets: 9,003
Compounds: 1,376,469
Activities: 10,129,256*
Publications: 46,133
* Includes:
~5,900,000 (PubChem)
~100,000 (Deposited malaria screening sets)
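As a quick check on the ChEMBL 14 counts above, the activities not attributed to the two listed sources can be worked out directly. The PubChem and malaria figures are approximate (~), so the results are approximate too. The slide does not say where the remainder comes from, so the sketch below only labels it "other":

```python
# ChEMBL 14 activity counts quoted on the slide; the two source figures are approximate (~)
total_activities = 10_129_256
pubchem_approx = 5_900_000
malaria_deposited_approx = 100_000

# Activities not attributed to the two listed sources
other = total_activities - pubchem_approx - malaria_deposited_approx

def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole

print(f"Other sources: ~{other:,} activities")                               # ~4,129,256
print(f"PubChem share: ~{share(pubchem_approx, total_activities):.0f}%")     # ~58%
print(f"Other share:   ~{share(other, total_activities):.0f}%")              # ~41%
```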
Assays are classified as:
• Binding measurements
• Functional assays
• ADME/toxicity data
[Pie chart: ~60% proteins, ~20% organisms, ~20% cell lines]
ChEMBL Assays – Binding, Functional, ADMET
Binding Assays
• Assays that directly measure the binding of a
compound to a particular target
• E.g., competition binding assays with a radioligand
• Various endpoints are measured, but the most commonly
reported are:
• IC50 (half maximal inhibitory concentration)
• Ki (inhibition constant, a measure of binding affinity)
• MIC (minimum inhibitory concentration)
• % Inhibition (of activity)
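Concentration endpoints such as IC50 and Ki are reported in a range of units, so they are often compared on a negative log scale (pIC50, pKi). The sketch below assumes the value has already been converted to nanomolar. The helper name `to_p_value` is illustrative and is not a ChEMBL function. The conversion does not apply to % inhibition. MIC values reported in mass units (e.g., µg/mL) would first need the molecular weight.

```python
import math

def to_p_value(value_nm: float) -> float:
    """Convert a concentration in nM (e.g. an IC50 or Ki) to -log10(molar).

    1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm).
    """
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    return 9.0 - math.log10(value_nm)

# 1 nM -> 9.0, 100 nM -> 7.0, 1 uM (1000 nM) -> 6.0
for ki_nm in (1.0, 100.0, 1000.0):
    print(f"Ki = {ki_nm:g} nM -> pKi = {to_p_value(ki_nm):.1f}")
```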
Functional Assays
Assay types, ranging from disease association to target association:
• Whole organism assays (e.g., anti-infectives/antiparasitics)
• Disease-derived cell line (e.g., human ovarian cancer cell line cytotoxicity)
• Tissue or cell-based disease model (e.g., glucose uptake by adipocytes)
• Tissue or cell-based assay for target effect (e.g., contraction of guinea-pig ileum)
• Cell-based assay over-expressing target (e.g., GPCR calcium mobilisation)
ADMET Assays
• Assays measuring the absorption, distribution, metabolism,
excretion and toxicity (ADMET) properties of compounds
• Examples include:
• Half-life of a compound in rats
• Tissue distribution of a compound
• Levels of metabolites
ChEMBL Targets:
• Protein (e.g., PDE5)
• Protein complex (e.g., nicotinic acetylcholine receptor)
• Cell line (e.g., HEK293 cells)
• Tissue (e.g., nervous)
• Protein family (e.g., muscarinic receptors)
• Sub-cellular fraction (e.g., mitochondria)
• Nucleic acid (e.g., DNA)
• Organism (e.g., Drosophila)
Protein Targets
• Each protein target is linked to a sequence in UniProt
• Information from UniProt is used in ChEMBL to allow
searching by:
• Protein name/description
• Synonyms and gene names
• Organism (and NCBI Tax ID)
• Proteins in ChEMBL are also classified by family
(e.g., receptor, kinase, protease, transporter)
• This classification is used for searching by target tree (Browse Targets)
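Target searching can be sketched as a query matched against UniProt-derived text fields, with an optional organism (NCBI Tax ID) filter. A family path can then be used for tree-style browsing. The record layout, field names, family paths and example entries below are hypothetical. They do not reflect ChEMBL's actual schema or classification.

```python
from dataclasses import dataclass, field

@dataclass
class ProteinTarget:
    # Hypothetical record: UniProt-derived fields plus a family path for tree browsing
    name: str
    synonyms: list[str] = field(default_factory=list)
    gene_names: list[str] = field(default_factory=list)
    organism: str = ""
    tax_id: int = 0
    family: tuple[str, ...] = ()  # e.g. ("Receptor", "GPCR"), illustrative only

def search_targets(targets, query, tax_id=None):
    """Case-insensitive substring match on name, synonyms and gene names,
    optionally restricted to one NCBI taxonomy ID."""
    q = query.lower()
    hits = []
    for t in targets:
        texts = [t.name, *t.synonyms, *t.gene_names]
        if any(q in s.lower() for s in texts) and (tax_id is None or t.tax_id == tax_id):
            hits.append(t)
    return hits

def browse_family(targets, path):
    """Return targets whose family path starts with `path` (like opening a tree node)."""
    return [t for t in targets if t.family[:len(path)] == tuple(path)]

targets = [
    ProteinTarget("Adenosine receptor A2a", ["A2a receptor"], ["ADORA2A"],
                  "Homo sapiens", 9606, ("Receptor", "GPCR")),
    ProteinTarget("Adenosine receptor A2a", [], ["Adora2a"],
                  "Rattus norvegicus", 10116, ("Receptor", "GPCR")),
    ProteinTarget("Phosphodiesterase 5A", ["PDE5"], ["PDE5A"],
                  "Homo sapiens", 9606, ("Enzyme", "Phosphodiesterase")),
]

print([t.organism for t in search_targets(targets, "adora2a")])       # ['Homo sapiens', 'Rattus norvegicus']
print([t.name for t in search_targets(targets, "pde5", tax_id=9606)]) # ['Phosphodiesterase 5A']
print(len(browse_family(targets, ["Receptor"])))                      # 2
```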
ChEMBL Compounds
• Chemical structures are stored as molfiles
• If the stereochemistry is known, the compound is drawn as a
specific enantiomer
• Tautomers of the same compound are treated as the same
compound; the form shown is the one drawn in the source paper
• Unique compounds are identified using Standard InChIs
• Salts and parent molecules are grouped together when
displaying bioactivity data, although activity data are recorded
against the specific salt form tested
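Salt forms can be grouped under their parent for display while each activity stays attached to the form actually tested. The sketch below shows this with a crude "largest fragment" heuristic on SMILES strings. That heuristic is a stand-in for proper parent standardisation and Standard InChI comparison, which need a cheminformatics toolkit. Two parents are grouped only if their SMILES strings happen to be written identically. The helper names and activity labels are hypothetical.

```python
import re
from collections import defaultdict

# Rough atom tokens in SMILES: bracket atoms, two-letter halogens, organic-subset atoms
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnops]")

def largest_fragment(smiles: str) -> str:
    """Crude parent guess: keep the dot-separated fragment with the most atom tokens."""
    return max(smiles.split("."), key=lambda frag: len(ATOM.findall(frag)))

def group_by_parent(records):
    """Map parent SMILES -> list of (tested form, activity); each activity keeps its tested form."""
    groups = defaultdict(list)
    for tested_smiles, activity in records:
        groups[largest_fragment(tested_smiles)].append((tested_smiles, activity))
    return dict(groups)

records = [
    ("CN1CCC[C@H]1c1cccnc1", "act-1"),      # free base (placeholder activity label)
    ("CN1CCC[C@H]1c1cccnc1.Cl", "act-2"),   # hydrochloride salt of the same parent
]
for parent, forms in group_by_parent(records).items():
    print(parent, forms)
```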
ChEMBL Home Page
https://www.ebi.ac.uk/chembldb
ChEMBL Main Search Page
Drug information
Clickable structure
Parent and salt forms
Small molecule resources at the EBI
Click to display data
ChEBI Link:
This will take you back to ChEMBL
ChemSpider Links:
The links work both ways: ChEMBL links to ChemSpider,
and ChemSpider links back to ChEMBL. The links are
based on the Standard InChI.
Wikipedia Links:
ChEMBL is also linked with Wikipedia. These links also use
the Standard InChI as the common identifier and point to the
Compound Report Card in ChEMBL.
The links are added by a bot (ChemoBot) and can be updated
with each release, if required.
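Because these cross-links are keyed on the Standard InChI, building them amounts to a join on that string. A minimal sketch follows. The record IDs and InChI strings are placeholders, not real records; real Standard InChIs are much longer.

```python
def link_on_inchi(source_a: dict[str, str], source_b: dict[str, str]) -> list[tuple[str, str]]:
    """Join two {record_id: standard_inchi} maps, returning (id_a, id_b) pairs
    for every pair of records that share an identical Standard InChI."""
    by_inchi: dict[str, list[str]] = {}
    for id_b, inchi in source_b.items():
        by_inchi.setdefault(inchi, []).append(id_b)
    return [(id_a, id_b)
            for id_a, inchi in source_a.items()
            for id_b in by_inchi.get(inchi, [])]

# Placeholder identifiers and InChI strings, for illustration only
chembl = {"CHEMBL_X": "InChI=1S/placeholder-1", "CHEMBL_Y": "InChI=1S/placeholder-2"}
chemspider = {"CS_101": "InChI=1S/placeholder-2", "CS_102": "InChI=1S/placeholder-3"}
print(link_on_inchi(chembl, chemspider))  # [('CHEMBL_Y', 'CS_101')]
```

Swapping the arguments gives the same links in the other direction, mirroring the two-way ChEMBL/ChemSpider linking.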
Use Case 1 - Searching by Target
• What is known about chemical structures that bind to a
specific protein (adenosine A2a)?
• What is known about their potency, selectivity and ADMET
properties?
• Are there any protein structure data?
Use Case 1 – Searching by Target in ChEMBL
Choose sources to include in search
Retrieving Bioactivity Data - Single Target
3D structures
Bioactivity data for target
Assay data for target
Display all bioactivity data for target
Click pie chart to retrieve particular end-points
Filtering Bioactivities
• Select targets of interest
• Select required activity types and define cut-offs (e.g., Ki < 100 nM)
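The filtering step (select targets, select activity types, apply a cut-off such as Ki < 100 nM) can be sketched over a list of activity records. The record fields, the small unit-conversion table and the example values are hypothetical and are used for illustration only. Records in units that are not concentrations are simply skipped.

```python
from dataclasses import dataclass

# Conversion factors to nM for a few concentration units (assumed set, not exhaustive)
TO_NM = {"pM": 0.001, "nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0}

@dataclass
class Activity:
    compound_id: str
    target: str
    activity_type: str  # e.g. "Ki", "IC50"
    value: float
    units: str

def filter_activities(activities, targets, activity_types, max_nm):
    """Keep records for the chosen targets and activity types whose value is below max_nm."""
    kept = []
    for a in activities:
        if a.target not in targets or a.activity_type not in activity_types:
            continue
        factor = TO_NM.get(a.units)
        if factor is None:
            continue  # not a concentration unit (e.g. % inhibition)
        if a.value * factor < max_nm:
            kept.append(a)
    return kept

data = [
    Activity("cpd-1", "Adenosine A2a", "Ki", 12.0, "nM"),
    Activity("cpd-2", "Adenosine A2a", "Ki", 0.5, "uM"),    # 500 nM, above cut-off
    Activity("cpd-3", "Adenosine A2a", "IC50", 30.0, "nM"), # wrong activity type
    Activity("cpd-4", "Adenosine A1", "Ki", 5.0, "nM"),     # wrong target
]
hits = filter_activities(data, {"Adenosine A2a"}, {"Ki"}, max_nm=100.0)
print([a.compound_id for a in hits])  # ['cpd-1']
```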
Bioactivity Results
Compound structures
Activity values
Assay details
Target details
References
Selectivity Data
For example, ChEMBL can be searched for all data on compounds that have
adenosine A2a Ki values < 100 nM.
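A selectivity search like this takes two steps. First, find the compounds that meet the potency criterion at the primary target. Then retrieve every activity recorded for those compounds, whatever the target. The sketch below works over hypothetical (compound, target, type, value in nM) tuples; the data are invented for illustration.

```python
from collections import defaultdict

def selectivity_profile(rows, primary_target, activity_type="Ki", max_nm=100.0):
    """rows: iterable of (compound_id, target, activity_type, value_nm) tuples.
    Returns {compound_id: [(target, type, value_nm), ...]} for compounds that are
    potent (value < max_nm) at primary_target."""
    rows = list(rows)
    # Step 1: compounds passing the cut-off at the primary target
    potent = {c for c, t, typ, v in rows
              if t == primary_target and typ == activity_type and v < max_nm}
    # Step 2: every recorded activity for those compounds, across all targets
    profile = defaultdict(list)
    for c, t, typ, v in rows:
        if c in potent:
            profile[c].append((t, typ, v))
    return dict(profile)

rows = [
    ("cpd-1", "Adenosine A2a", "Ki", 8.0),
    ("cpd-1", "Adenosine A1", "Ki", 900.0),
    ("cpd-2", "Adenosine A2a", "Ki", 450.0),
    ("cpd-2", "Adenosine A1", "Ki", 20.0),
]
print(selectivity_profile(rows, "Adenosine A2a"))
# {'cpd-1': [('Adenosine A2a', 'Ki', 8.0), ('Adenosine A1', 'Ki', 900.0)]}
```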
ADMET Data
Summary of ChEMBL bioavailability data for compounds with A2a Ki
values < 100 nM
Example of bioavailability data
Use Case 2 – Searching by Structure
• What compounds contain a particular substructure?
• What is known about their bioactivities?
• Are any of them known drugs or in clinical trials?
Name or lists of identifiers
Types of synonyms:
• Research codes
• Trade names
• INN, USAN
Different sketchers
Similarity and Substructure Searching
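Similarity searching ranks compounds by how much their structural fingerprints overlap with the query's. The sketch below uses the Tanimoto coefficient on sets of "on" fingerprint bits, a common choice. The slides do not say which fingerprint or coefficient ChEMBL uses, so both are assumptions here. Generating the fingerprints is also assumed (normally done by a cheminformatics toolkit), and the bit sets are invented. The 0.7 default threshold is arbitrary. Substructure searching needs graph matching and is not shown.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| between two sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similarity_search(query: set[int], library: dict[str, set[int]], threshold: float = 0.7):
    """Return (compound_id, score) pairs with score >= threshold, best first."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in library.items()]
    return sorted([s for s in scored if s[1] >= threshold], key=lambda s: s[1], reverse=True)

# Invented fingerprints: each set lists the bit positions that are switched on
query = {1, 4, 7, 9, 12}
library = {
    "cpd-A": {1, 4, 7, 9, 12},  # identical bits -> 1.0
    "cpd-B": {1, 4, 7, 9, 15},  # 4 shared out of 6 distinct -> ~0.67
    "cpd-C": {2, 3, 5},         # nothing shared -> 0.0
}
print(similarity_search(query, library, threshold=0.6))
```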
Display/Download Bioactivity Data
Filtering Data on Lipinski Properties etc
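Lipinski's rule of five flags compounds that are more likely to have poor oral absorption. The criteria are molecular weight > 500, calculated logP > 5, more than 5 H-bond donors and more than 10 H-bond acceptors. Compounds with more than one violation are the ones usually flagged. The sketch below filters on precomputed properties. The property names and values are hypothetical (ChEMBL supplies its own calculated properties), and `max_violations` is a parameter because some workflows require zero violations.

```python
from typing import NamedTuple

class Props(NamedTuple):
    mw: float   # molecular weight (Da)
    logp: float # calculated logP
    hbd: int    # H-bond donors
    hba: int    # H-bond acceptors

def ro5_violations(p: Props) -> int:
    """Count rule-of-five violations (each criterion exceeded counts once)."""
    return sum([p.mw > 500, p.logp > 5, p.hbd > 5, p.hba > 10])

def passes_ro5(p: Props, max_violations: int = 1) -> bool:
    return ro5_violations(p) <= max_violations

compounds = {
    "cpd-1": Props(mw=350.4, logp=2.1, hbd=2, hba=5),  # no violations
    "cpd-2": Props(mw=620.7, logp=6.3, hbd=1, hba=8),  # two violations
    "cpd-3": Props(mw=512.0, logp=4.0, hbd=3, hba=7),  # one violation
}
print({cid: ro5_violations(p) for cid, p in compounds.items()})  # {'cpd-1': 0, 'cpd-2': 2, 'cpd-3': 1}
print([cid for cid, p in compounds.items() if passes_ro5(p)])    # ['cpd-1', 'cpd-3']
```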
Display Bioactivities of Subset
Bioactivities
Names
Structure
Bioactivities
Properties
Cross-references
Clinical Trials
Links to Other Resources
Links to Other Resources
PDBe - http://www.ebi.ac.uk/pdbe
Marketed Drugs
Select set of interest
Export to Excel or SDF
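An SD file is a sequence of molfile blocks. Each block is followed by optional data items (a `> <FIELD>` header line, the value, then a blank line) and terminated by a `$$$$` line. The minimal writer below assumes the molblocks are already available as strings. A placeholder stands in for a real molfile, and the field names and value are hypothetical.

```python
def write_sdf(records) -> str:
    """records: iterable of (molblock, {field: value}) pairs.
    Returns SD-file text: molblock, data items, then the $$$$ record delimiter."""
    out = []
    for molblock, fields in records:
        out.append(molblock.rstrip("\n"))
        for name, value in fields.items():
            out.append(f"> <{name}>")
            out.append(str(value))
            out.append("")  # blank line ends each data item
        out.append("$$$$")
    return "\n".join(out) + "\n"

# Placeholder only: a real molblock has a 3-line header, counts line, atom/bond blocks and "M  END"
placeholder_molblock = "mol-1\n  sketch\n\n(atom and bond blocks omitted)\nM  END\n"
sdf_text = write_sdf([(placeholder_molblock, {"compound_id": "cpd-1", "Ki_nM": 12.0})])
print(sdf_text)
```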
Use Case 3 – Similar Targets
• Are there any available data on compounds that bind to
proteins similar to IRAK2?
• For these compounds, what bioactivity data exist for
compounds with related substructures?
• Are there any crystal structure data on these proteins?
Protein Sequence Search
• A more precise method for identifying targets
• Input is a protein sequence of interest
• Uses the BLAST* algorithm to perform pairwise comparisons
between the input sequence and all proteins in the Target
Dictionary, to find the most closely related matches
• Results are scored according to similarity to the input
sequence (determined by the number of amino acids that are
identical or have similar properties)
*Altschul SF et al., J. Mol. Biol. 215(3), 403-410 (1990)
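BLAST itself uses heuristic local alignment with substitution matrices, which is beyond a short example. The sketch below illustrates only the scoring idea in the last bullet, and it is not BLAST's scoring. It takes two sequences already aligned without gaps. It then counts identical positions and positions whose residues fall in the same property group. The residue grouping is a simplified assumption, and the sequences are made up.

```python
# Simplified residue groups by side-chain property (an assumption, not a BLAST substitution matrix)
GROUPS = [set("AVLIM"), set("FWY"), set("STNQ"), set("DE"), set("KRH"), set("GP"), set("C")]

def similar(a: str, b: str) -> bool:
    """True if both residues fall in the same property group."""
    return any(a in g and b in g for g in GROUPS)

def score_aligned(seq1: str, seq2: str) -> tuple[float, float]:
    """Percent identity and percent 'positives' for two equal-length, gapless aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(x == y for x, y in zip(seq1, seq2))
    positives = sum(x == y or similar(x, y) for x, y in zip(seq1, seq2))
    n = len(seq1)
    return 100.0 * identical / n, 100.0 * positives / n

# Toy 10-residue fragments: 6 identical positions, plus 3 similar pairs (V/I, D/E, S/T)
ident, pos = score_aligned("MKVLAEDRSG", "MKILSEERTG")
print(f"identity {ident:.0f}%, positives {pos:.0f}%")  # identity 60%, positives 90%
```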
Use Case 3 – Similar Targets
Protein sequence of interest, e.g., from UniProt
(http://www.uniprot.org)
Data on IRAK1, IRAK3 and IRAK4, but not IRAK2
IRAK1, IRAK3 and IRAK4 data
Identify substructure of interest
What other data are available on compounds with this
substructure?
Use Case 4 - Assay Keyword Search
• Some ChEMBL data (e.g., functional assays) may not be
mapped to molecular targets
• You may want to perform a more general search (e.g., for a
disease process, animal model or cell type of interest)
• Examples:
1. What compounds have been tested in disease models (e.g., cholesterol
lowering)?
2. What data are available for brain penetration (brain to plasma ratio)?
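An assay keyword search matches the free-text assay descriptions rather than the target mappings. The minimal sketch below requires every query word to appear somewhere in a description, ignoring case. The assay IDs and descriptions are invented for illustration.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lower-case alphanumeric words in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def assay_keyword_search(assays: dict[str, str], query: str) -> list[str]:
    """Return IDs of assays whose description contains every word in the query."""
    terms = tokenize(query)
    return [aid for aid, desc in assays.items() if terms <= tokenize(desc)]

assays = {
    "assay-1": "Cholesterol lowering activity in hamster after oral dosing",
    "assay-2": "Brain to plasma ratio in rat after i.v. administration",
    "assay-3": "Inhibition of HMG-CoA reductase in rat liver microsomes",
}
print(assay_keyword_search(assays, "cholesterol lowering"))  # ['assay-1']
print(assay_keyword_search(assays, "Brain to Plasma"))       # ['assay-2']
```

Exact word matching misses variants such as "lowers" or "brain/plasma". A real search would also need stemming, synonyms or phrase handling.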
Assay Search for “Cholesterol Lowering”
Assay Search for “Brain to Plasma”
Accessing ChEMBL Data
Useful Links
ChEMBL Blog:
http://chembl.blogspot.com
If you would like help:
[email protected]
For ChEMBL news and data releases subscribe to:
http://listserver.ebi.ac.uk/mailman/listinfo/chembl-announce
Acknowledgements
ChEMBL Group
John Overington
Anne Hersey
Anna Gaulton
Mark Davies
Jon Chambers
Louisa Bellis
Kazuyoshi Ikeda
Patricia Bento
Shaun McGlinchey
Yvonne Light
Felix Krueger
Ben Stauch
Ruth Akhtar
Francis Atkinson
Rita Santos
EMBL-EBI
Samuel Kerrien, Sandra Orchard, Bruno
Aranda, Rafael Jimenez, Reactome,
UniProt and ChEBI teams
Collaborators
Imperial Cancer Research, University of
Dundee, University of Cambridge,
Sanger Centre, University of Maryland,
NCBI, TDR, IUPHAR, Bayer-Schering,
Pfizer, GSK, Schering-Plough, MMV,
Novartis, St Jude Children’s Research
Hospital
Former Inpharmatica colleagues
Exercises!