Daylight and Discovery

How do I impress the boss when I get back?
7/20/2015
1
What is Discovery?
• A constant fight against the hedgehogs!!
What have I learned this week?
• Above all you have learned new languages that
allow you to communicate chemical concepts to,
and between, machines.
• These languages also allow you to communicate
these concepts via machines to your colleagues.
• You have also learned about other descriptions of
a molecular structure, such as fingerprints.
Language recap
• SMILES
• SMARTS
• SMIRKS
• (FINGERPRINTS)
SMILES
• SMILES contains the same information as might be found in an
extended connection table.
• The primary reason SMILES is more useful than a connection table is
that it is a linguistic construct, rather than a computer data structure.
• SMILES is a true language, albeit with a simple vocabulary (atom and
bond symbols) and only a few grammar rules.
• SMILES can be canonicalised, i.e. there is a unique, universal “name”
for a structure.
• SMILES representations of structure can in turn be used as “words” in
the vocabulary of other languages designed for storage and retrieval of
chemical information, e.g. HTML, XML or query languages such as
SQL.
SMILES syntax
[atom]bond[atom] etc.
atom : '[' <mass> symbol <chiral> <hcount> <sign<charge>> <':'class> ']' ;
bond : <empty> | '-' | '=' | '#' | ':' | '.' ;
Common elements, in the organic subset B,C,N,O,P,S,F,Cl,Br,I, in
their lowest common valence state(s), can be written without
brackets. If bonds are omitted, they default to single or aromatic, as
appropriate, for juxtaposed atoms.
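The bracket-atom production above can be sketched as a regular expression. This is only an illustration of the grammar on this slide, not a full SMILES parser: the names `BRACKET_ATOM` and `parse_bracket_atom` are hypothetical, and it assumes an omitted count after `H` means one hydrogen and a bare sign means a charge of one.

```python
import re

# One named group per field of the slide's production:
#   '[' <mass> symbol <chiral> <hcount> <sign<charge>> <':'class> ']'
BRACKET_ATOM = re.compile(
    r"\[(?P<mass>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|[a-z]{1,2}|\*)"
    r"(?P<chiral>@{1,2})?"
    r"(?P<hcount>H\d*)?"
    r"(?P<charge>[+-]+\d*)?"
    r"(?::(?P<cls>\d+))?\]"
)


def parse_bracket_atom(token):
    """Split one bracket atom such as '[OH3+]' into its fields."""
    m = BRACKET_ATOM.fullmatch(token)
    if m is None:
        raise ValueError(f"not a bracket atom: {token!r}")

    # 'H' alone means one hydrogen; 'H3' means three; absent means zero.
    h = m.group("hcount")
    hcount = 0 if h is None else (int(h[1:]) if len(h) > 1 else 1)

    # '+' -> 1, '++' -> 2, '+2' -> 2, '-' -> -1; absent -> 0.
    c = m.group("charge")
    if c is None:
        charge = 0
    else:
        sign = 1 if c[0] == "+" else -1
        digits = c.lstrip("+-")
        charge = sign * (int(digits) if digits else len(c))

    return {
        "mass": int(m.group("mass")) if m.group("mass") else None,
        "symbol": m.group("symbol"),
        "chiral": m.group("chiral"),
        "hcount": hcount,
        "charge": charge,
        "cls": int(m.group("cls")) if m.group("cls") else None,
    }


hydronium = parse_bracket_atom("[OH3+]")
deuterium = parse_bracket_atom("[2H]")
alanine_centre = parse_bracket_atom("[C@@H]")
```

Here `hydronium` carries symbol `O`, three hydrogens and charge +1, `deuterium` carries mass 2, and `alanine_centre` carries the `@@` chirality mark with one hydrogen.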
Example SMILES
CC                  ethane
[OH3+]              hydronium
O=C=O               carbon dioxide
[2H]O[2H]           deuterium oxide
C#N                 hydrogen cyanide
CC(=O)O             acetic acid
C1CCCCC1            cyclohexane
c1ccccc1            benzene
F/C=C/F             E-difluoroethene
F/C=C\F             Z-difluoroethene
N[C@@H](C)C(=O)O    L-alanine
N[C@H](C)C(=O)O     D-alanine
SMARTS
• In the SMILES language, there are two fundamental types of symbols:
atoms and bonds. Using these SMILES symbols, one can specify a
molecule's graph (its "nodes" and "edges") and assign "labels" to the
components of the graph (that is, say what type of atom each node
represents, and what type of bond each edge represents).
• The same is true in SMARTS: One uses atomic and bond symbols to
specify a graph. However, in SMARTS the labels for the graph's nodes
and edges (its "atoms" and "bonds") are extended to include "logical
operators" and special atomic and bond symbols; these allow
SMARTS atoms and bonds to be more general. For example, the
SMARTS atomic symbol [C,N] is an atom that can be aliphatic C or
aliphatic N; the SMARTS bond symbol "~" (tilde) matches any bond.
Example SMARTS
cc             any pair of attached aromatic carbons
c:c            aromatic carbons joined by an aromatic bond
c-c            aromatic carbons joined by a single bond (e.g. biphenyl)
[O;H1]         simple hydroxy oxygen
[O;D1]         1-connected (hydroxy or hydroxide) oxygen
[F,Cl,Br,I]    the first four halogens
[N;R]          must be aliphatic nitrogen AND in a ring
*@;!:*         two atoms connected by a non-aromatic ring bond
Useful SMARTS
Heavy atom          [!$([#6,#7,#8,#9,#15,#16,#17,#35,#53])]
Rotatable bonds     [!$(*#*)&!D1]-&!@[!$(*#*)&!D1]
Secondary amides    [N&H1&D2]-&!@[#6&X3]
H-donors            [!#6;!H0]
H-acceptors         [$([!#6;+0]);!$([F,Cl,Br,I]);!$([o,s,nX3]);!$([Nv5,Pv5,Sv4,Sv6])]
Isolating carbons   [#6;!$(C(F)(F)F);!$(c(:[!c]):[!c]);!$([#6]=,#[!#6]);!$([#6;!+0])]
Stereo atoms        [$([X4&!v6&!v5;H0,H1]),$([SX3]([#6])([#6])~O)]
Stereo bonds        [CX3;!H2]=[CX3;!H2]
Stereo allenes      [CX3;H0]=C=[CX3;H0,H1]
Rotatable bonds
[!$(*#*)&!D1]-&!@[!$(*#*)&!D1]
• An atom which is
– NOT triply bonded to another atom
– AND NOT 1-connected (i.e. not terminal)
• Bonded by
– a single bond
– AND NOT a ring bond
• To another atom meeting the same conditions
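The reading of the pattern above can be checked on a small hand-built graph. The sketch below is an assumption-laden illustration, not the SMARTS matcher itself: `Bond` and `count_rotatable_bonds` are hypothetical names, the graph is hydrogen-suppressed, and ring membership is supplied by the caller instead of being perceived.

```python
from collections import namedtuple

# a, b: atom indices; order: 1, 2 or 3; in_ring: True for ring bonds.
Bond = namedtuple("Bond", "a b order in_ring")


def count_rotatable_bonds(n_atoms, bonds):
    """Count bonds that satisfy [!$(*#*)&!D1]-&!@[!$(*#*)&!D1]."""
    degree = [0] * n_atoms          # D: number of explicit connections
    has_triple = [False] * n_atoms  # True if the atom has any triple bond
    for bond in bonds:
        for atom in (bond.a, bond.b):
            degree[atom] += 1
            if bond.order == 3:
                has_triple[atom] = True

    def atom_ok(atom):
        # NOT triply bonded AND NOT 1-connected
        return not has_triple[atom] and degree[atom] != 1

    return sum(
        1
        for bond in bonds
        if bond.order == 1          # a single bond
        and not bond.in_ring        # AND NOT a ring bond
        and atom_ok(bond.a)
        and atom_ok(bond.b)
    )


# Butane, CCCC: only the central bond joins two non-terminal atoms.
butane = [Bond(0, 1, 1, False), Bond(1, 2, 1, False), Bond(2, 3, 1, False)]
# But-1-yne, C#CCC: one end of each single bond is triply bonded or terminal.
butyne = [Bond(0, 1, 3, False), Bond(1, 2, 1, False), Bond(2, 3, 1, False)]

n_butane = count_rotatable_bonds(4, butane)
n_butyne = count_rotatable_bonds(4, butyne)
```

Under these assumptions butane has one rotatable bond and but-1-yne has none, which follows directly from the three conditions in the pattern.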
Chemical Information Concepts in Discovery
• Matching
– Total
– Partial
• Similarity
– Qualitative
– Quantitative
• Both matching and similarity are opinions
as they depend on descriptors.
Filtering
• Quite often you may wish to eliminate
compounds which are inappropriate for
some activity or test.
– E.g. Delete any molecule from a list which
contains a “heavy metal” i.e. a non-common
element
– > $CONTRIB/smarts_filter -v \
'[!$([#6,#7,#8,#9,#15,#16,#17,#35,#53])]'
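For readers without the toolkit to hand, the intent of this filter can be approximated in plain Python. This is a rough sketch and not a substitute for SMARTS matching: it only scans SMILES strings for element symbols, and `COMMON`, `element_symbols` and `keep_common_only` are hypothetical names. Like the pattern above, it treats explicit hydrogen and boron as non-common because they are absent from the list.

```python
import re

# The nine elements named in the SMARTS: C, N, O, F, P, S, Cl, Br, I.
COMMON = {"C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}

# An atom is either a whole bracket expression or an organic-subset symbol.
ATOM_TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Inside brackets: optional isotope mass, then the element symbol.
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def element_symbols(smiles):
    """Return the element symbol of every atom written in a SMILES string."""
    symbols = []
    for token in ATOM_TOKEN.findall(smiles):
        if token.startswith("["):
            m = BRACKET_SYMBOL.match(token)
            symbol = m.group(1) if m else "*"  # unknown atoms never pass
        else:
            symbol = token
        symbols.append(symbol.capitalize())  # aromatic 'c' -> 'C'
    return symbols


def keep_common_only(smiles_list):
    """Drop every molecule containing an atom outside COMMON."""
    return [
        s for s in smiles_list
        if all(symbol in COMMON for symbol in element_symbols(s))
    ]


candidates = ["CCO", "CC(=O)[O-].[Na+]", "c1ccccc1Cl", "C[Hg]C"]
kept = keep_common_only(candidates)
```

With this list, the sodium salt and the mercury compound are dropped and the other two molecules are kept.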
Counting things
• Count matches to patterns defined in SMARTS
– Molecular formula
– H-donors
– H-acceptors
– Rotatable bonds
– Chiral centres
– Rings
– Fragments
Example
• Molecular formula C13H22N4O3S
• H-donors 2
• H-acceptors 6
• Rotatable bonds 8
• Chiral centres 1
• Rings 1
• Fragments 6
(Figure: structure diagram of the example molecule)
Estimating Measured Properties
• Any property which is an additive
constitutive property of a molecule can be
calculated by
– counting the matches of the constituent patterns
– looking up the weight for each pattern
– summing the products of the counts and the
individual pattern weights
– applying any correction factors
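The four steps above amount to a weighted sum plus corrections. The sketch below uses a hypothetical function name, `additive_property`, and takes its numbers from the CLOGP example slide later in this deck; the per-unit weights are back-calculated from the contributions listed there (for instance 1.880 / 2 for Cl), which is an assumption about how that table was built.

```python
def additive_property(counts, weights, corrections=()):
    """Sum count * weight over all patterns, then add correction factors."""
    missing = [pattern for pattern in counts if pattern not in weights]
    if missing:
        raise KeyError(f"no weight for pattern(s): {missing}")
    total = sum(n * weights[pattern] for pattern, n in counts.items())
    return total + sum(corrections)


# Step 1: match counts per pattern (from the CLOGP example slide).
counts = {"Cl": 2, "guanidyl": 1, "C": 2, "c": 6, "H": 7}
# Step 2: per-unit weights, back-calculated from that slide (assumption).
weights = {"Cl": 0.940, "guanidyl": -1.930, "C": 0.195, "c": 0.130, "H": 0.227}
# Step 4: the proximity correction quoted on that slide.
corrections = [-0.984]

# Step 3 and 4 together.
estimate = additive_property(counts, weights, corrections)
```

Summing the listed contributions this way gives about 1.725. The slide quotes a total of +1.727; the small difference presumably comes from rounding in the individual contributions.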
Examples of properties to calculate
• Molecular Weight
• logP
• Parachor
• Molar Volume
• Molar Refractivity
• ……….
Molecular weight: a simple example
• Molecular weight
– Molecular formula
• sum over i of count(atom(i)) * atomic_weight(atom(i))
• Accuracy depends on accuracy of atomic
weights (IUPAC)
– C13H22N4O3S
• 314.45 (average molecular weight)
• 314.141235 (accurate mass using the commonest isotope of each element)
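The sum over the molecular formula can be sketched in a few lines. The atomic weights and isotope masses below are rounded standard values supplied for illustration, not the table behind the slide, and the function names are hypothetical. The parser handles only flat formulae such as C13H22N4O3S (no parentheses or charges).

```python
import re

# Rounded standard atomic weights (assumption: not the slide's own table).
AVERAGE_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}
# Rounded masses of the most abundant isotope of each element.
COMMONEST_ISOTOPE = {
    "C": 12.0, "H": 1.00782503, "N": 14.0030740,
    "O": 15.9949146, "S": 31.9720712,
}


def parse_formula(formula):
    """Turn 'C13H22N4O3S' into {'C': 13, 'H': 22, 'N': 4, 'O': 3, 'S': 1}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def formula_mass(formula, table):
    """Sum count(atom) * weight(atom) over the molecular formula."""
    return sum(n * table[symbol] for symbol, n in parse_formula(formula).items())


average_mw = formula_mass("C13H22N4O3S", AVERAGE_WEIGHT)
accurate_mass = formula_mass("C13H22N4O3S", COMMONEST_ISOTOPE)
```

With these tables the results come out near 314.40 and 314.1413, slightly different from the figures quoted on the slide, which illustrates the point that the answer depends on the atomic weights used.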
CLOGP: A more complicated example
• Algorithmic definition of fragment
– Pattern = NOT an isolating carbon
– Match the pattern to find all the fragments
• Look up the fragment value(s), if they exist,
using the unique string(s) from the match.
• Accumulate the values for fragments and
non-fragments (isolating carbons).
• Correct for proximity
CLOGP example
• 2 * Cl       +1.880
• guanidyl     –1.930
• 2 * C        +0.390
• 6 * c        +0.780
• 7 * H        +1.589
• Proximity    –0.984
Total          +1.727
Estimating values for concepts
• Flexibility
– Ratio of number of rotatable bonds to total number of
bonds
• Rigidity
– Molecular similarity between original molecule and
molecules formed by breaking all rotatable bonds
• Difficulty of synthesis
– Ratio of number of potential chiral centres weighted for
rings to total number of heavy atoms in a molecule
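The flexibility ratio can be tried on the example molecule from the counting slide (C13H22N4O3S, 8 rotatable bonds, 1 ring). The sketch below assumes, without confirmation from the slides, that "total number of bonds" means bonds between heavy atoms, and derives that total from the graph relation bonds = atoms + rings - components. Both function names are hypothetical.

```python
def heavy_atom_bond_count(n_heavy_atoms, n_rings, n_components=1):
    """Bonds in a hydrogen-suppressed graph: atoms + rings - components."""
    return n_heavy_atoms + n_rings - n_components


def flexibility(n_rotatable, n_bonds):
    """Ratio of the number of rotatable bonds to the total number of bonds."""
    if n_bonds <= 0:
        raise ValueError("a molecule needs at least one bond")
    return n_rotatable / n_bonds


# C13H22N4O3S has 13 + 4 + 3 + 1 = 21 heavy atoms and one ring.
n_bonds = heavy_atom_bond_count(n_heavy_atoms=21, n_rings=1)
example_flexibility = flexibility(n_rotatable=8, n_bonds=n_bonds)
```

Under this assumption the ratio is 8 / 21, which rounds to the 0.38 quoted for flexibility on the example slide.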
Example
• Flexibility 0.38
• Rigidity 0.3819
• Difficulty of synthesis 0.05
(Figure: structure diagram of the example molecule)
Example
• Flexibility 0.38 (0.00)
• Rigidity 0.3819 (1.00)
• Difficulty of synthesis 0.05 (0.85)
• Figures in parentheses are for morphine
(Figure: structure diagram of the example molecule)
Relationships between compounds
• Compound sets
– Molecular descriptors
• Fingerprints etc
– Similarity measures
• Tanimoto etc
– Clustering
• Jarvis-Patrick etc
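As a concrete instance of a similarity measure over fingerprints, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number set in either. The sketch below represents a fingerprint as a set of "on" bit positions; the bit values are made up for illustration, and returning 1.0 for two empty fingerprints is a convention chosen here, not something the slides specify.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: |A and B| / |A or B| over sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # assumption: two empty fingerprints count as identical
    return len(a & b) / union


# Hypothetical fingerprints given as positions of the bits that are set.
fp_1 = {1, 2, 3, 4}
fp_2 = {3, 4, 5, 6}

similarity = tanimoto(fp_1, fp_2)   # 2 shared bits out of 6 distinct bits
identical = tanimoto(fp_1, fp_1)
```

The value depends entirely on which features the fingerprint encodes, which is the sense in which similarity is an "opinion" that depends on descriptors.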
Relationships between compounds
• Mixtures
– Molecular descriptors
• Modal Fingerprints etc
– Similarity measures
• Tanimoto etc
– Prototypes
– Family Resemblance
Relationships between compounds
• Reactions
– Molecular descriptors
• Fingerprints
– Rôles
– Schemes/pathways
– Similarity and clustering
Examples
• Creating a spreadsheet of properties.
• Non-standard fingerprinting and similarity.
Don’t let the hedgehogs take over…..