Transcript GPCR

Functional 3-D modelling of
G protein coupled receptors
Uğur Sezerman
Central Dogma
DNA
Transcription
mRNA
Translation
PROTEINS
Motivation
• Knowing the structure of a molecule
enables us to understand its mechanism
of function
• Current experimental techniques
– X-ray crystallography
– NMR
X-Ray Crystallography
• crystallize and
immobilize single,
perfect protein
• bombard with X-rays,
record scattering
diffraction patterns
• determine electron
density map from
scattering and phase
via Fourier transform:
• use electron density
and biochemical
knowledge of the
protein to refine and
determine a model
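The Fourier synthesis referred to in the third bullet is usually written in the following textbook form. The symbols are standard notation rather than anything taken from the slide: V is the unit-cell volume, |F_hkl| the structure-factor amplitude of reflection (h, k, l), α_hkl its phase, and (x, y, z) fractional coordinates in the unit cell.

```latex
\rho(x,y,z) = \frac{1}{V} \sum_{h}\sum_{k}\sum_{l}
  |F_{hkl}| \, e^{i\alpha_{hkl}} \, e^{-2\pi i (hx + ky + lz)}
```

The diffraction experiment records intensities, which are proportional to |F_hkl|^2, so the phases have to be estimated separately. That is why the bullet names scattering and phase as two distinct inputs.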
"All crystallographic models are not equal. ... The brightly colored stereo views
of a protein model, which are in fact more akin to cartoons than to
molecules, endow the model with a concreteness that exceeds the
intentions of the thoughtful crystallographer. It is impossible for the
crystallographer, with vivid recall of the massive labor that produced the
model, to forget its shortcomings. It is all too easy for users of the model to
be unaware of them. It is also all too easy for the user to be unaware that,
through temperature factors, occupancies, undetected parts of the protein,
and unexplained density, crystallography reveals more than a single
molecular model shows."
- Rhodes, “Crystallography Made Crystal Clear” p. 183.
NMR Spectroscopy
determining constraints
using constraints to determine
secondary structure
• protein in aqueous solution,
motile and tumbles/vibrates
with thermal motion
• NMR detects chemical shifts of
atomic nuclei with non-zero
spin, shifts due to electronic
environment nearby
• determine distances between
specific pairs of atoms based
on shifts, “constraints”
• use constraints and
biochemical knowledge of the
protein to determine an
ensemble of models
Biology/Chemistry of Protein Structure
STRUCTURE    PROCESS
Primary      Assembly
Secondary    Folding
Tertiary     Packing
Quaternary   Interaction
Protein Assembly
• occurs at the ribosome
• involves dehydration
synthesis and
polymerization of amino
acids attached to tRNA:
$\mathrm{NH_3^+} - \{A + B \rightarrow A\text{-}B + \mathrm{H_2O}\}_n - \mathrm{COO^-}$
• yields primary structure
Amino Acids
Forces driving protein folding
• It is believed that hydrophobic collapse is
a key driving force for protein folding
– Hydrophobic core
– Polar surface interacting with solvent
• Minimum volume (no cavities): van der
Waals packing
• Disulfide bond formation stabilizes
• Hydrogen bonds
• Polar and electrostatic interactions
PROTEIN FOLDING PROBLEM
• STARTING FROM AMINO ACID SEQUENCE
FINDING THE STRUCTURE OF PROTEINS IS
CALLED THE PROTEIN FOLDING PROBLEM
Secondary Structure
• non-linear
• 3 dimensional
• localized to regions of an
amino acid chain
• formed and stabilized by
hydrogen bonding,
electrostatic and van der
Waals interactions
The α-helix
Ramachandran Plot
• Pauling built models based on the following
principles, codified by Ramachandran:
(1) bond lengths and angles – should be
similar to those found in individual
amino acids and small peptides
(2) peptide bond – should be planar
(3) overlaps – not permitted, pairs of atoms
no closer than sum of their covalent radii
(4) stabilization – have sterics that permit
hydrogen bonding
• Two degrees of freedom:
(1) φ (phi) angle = rotation about N – Cα
(2) ψ (psi) angle = rotation about Cα – C
• A linear amino acid polymer with some folds
is better, but still neither functional nor
completely energetically favorable →
packing!
Chou-Fasman Parameters
Name           Abbrv  P(a)  P(b)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Alanine        A      142   83    66       0.060  0.076   0.035   0.058
Arginine       R      98    93    95       0.070  0.106   0.099   0.085
Aspartic Acid  D      101   54    146      0.147  0.110   0.179   0.081
Asparagine     N      67    89    156      0.161  0.083   0.191   0.091
Cysteine       C      70    119   119      0.149  0.050   0.117   0.128
Glutamic Acid  E      151   37    74       0.056  0.060   0.077   0.064
Glutamine      Q      111   110   98       0.074  0.098   0.037   0.098
Glycine        G      57    75    156      0.102  0.085   0.190   0.152
Histidine      H      100   87    95       0.140  0.047   0.093   0.054
Isoleucine     I      108   160   47       0.043  0.034   0.013   0.056
Leucine        L      121   130   59       0.061  0.025   0.036   0.070
Lysine         K      114   74    101      0.055  0.115   0.072   0.095
Methionine     M      145   105   60       0.068  0.082   0.014   0.055
Phenylalanine  F      113   138   60       0.059  0.041   0.065   0.065
Proline        P      57    55    152      0.102  0.301   0.034   0.068
Serine         S      77    75    143      0.120  0.139   0.125   0.106
Threonine      T      83    119   96       0.086  0.108   0.065   0.079
Tryptophan     W      108   137   96       0.077  0.013   0.064   0.167
Tyrosine       Y      69    147   114      0.082  0.065   0.114   0.125
Valine         V      106   170   50       0.062  0.048   0.028   0.053
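As a rough illustration of how the P(a) and P(b) columns are typically used, the sketch below averages a propensity over a sliding window and reports windows whose mean exceeds 100. This is a deliberately simplified reading of the Chou-Fasman idea, not the full published procedure. The window widths, the threshold of 100, and the function names are assumptions made for this example.

```python
# P(a) and P(b) values copied from the Chou-Fasman table above.
P_ALPHA = {
    "A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
    "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
    "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106,
}
P_BETA = {
    "A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110,
    "G": 75, "H": 87, "I": 160, "L": 130, "K": 74, "M": 105, "F": 138,
    "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170,
}


def window_averages(seq, table, width):
    """Mean propensity of every window of `width` residues along `seq`."""
    return [
        sum(table[res] for res in seq[i:i + width]) / width
        for i in range(len(seq) - width + 1)
    ]


def candidate_windows(seq, table, width, threshold=100):
    """Start indices of windows whose mean propensity exceeds `threshold`.

    The threshold and width are illustrative assumptions, not slide content.
    """
    averages = window_averages(seq, table, width)
    return [i for i, avg in enumerate(averages) if avg > threshold]
```

For example, `candidate_windows("EAMLEA", P_ALPHA, width=6)` flags the single window starting at index 0, because every residue in that hypothetical peptide has a high helix propensity.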
HOMOLOGY MODELLING
• Using database search algorithms, find the
sequence with known structure that best
matches the query sequence
• Assign the structure of the core regions
obtained from the structure database to
the query sequence
• Find the structure of the intervening loops
using loop closure algorithms
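The first step above, choosing the known structure whose sequence best matches the query, can be sketched as a ranking by percent identity. This is only a toy stand-in for a real database search: it assumes the query has already been aligned to each candidate, and the function names and gap character are assumptions made for this example.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def best_template(alignments):
    """Return the template id with the highest identity to the query.

    `alignments` maps a template id to a (query_aligned, template_aligned)
    pair; the ids are whatever labels the caller chooses.
    """
    return max(alignments, key=lambda tid: percent_identity(*alignments[tid]))
```

Identity is a crude criterion; coverage of the query and the quality of the template structure would normally be weighed as well.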
Homology Modeling: How it works
o Find template
o Align target sequence
with template
o Generate model:
- add loops
- add sidechains
o Refine model
1esr
TURALIGN: Constrained
Structural Alignment Tool For
Structure Prediction
Motif Alignment Using Dynamic
Algorithm
RESULTS
• For all the experiments done, our algorithm perfectly matched
functional sites and motifs given as input to the program.
– 1csh vs 1iomA :
• RMSD = 2.50
– 1csh vs 1k3pA
• RMSD = 2.12
– 1k3pA vs 1iomA
• RMSD = 3.03
– 1b6a vs 1xgsA
• RMSD = 2.23
– 1fp2A vs 1fp1D
• RMSD = 2.98
• On average, the best results over the 5 experiments were:
• RMSD = 2.57 with ac:0.4,sc:0.4,tc:0.2,cc:0
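For readers unfamiliar with the metric, the sketch below computes an RMSD between two sets of matched coordinates and then re-derives the reported average from the five values listed above. It assumes the structures are already superposed and that atoms are paired one to one; the slides do not say which atoms were compared, and the function name is an assumption.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points, already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# RMSD values reported on the slide, keyed by structure pair.
reported = {
    ("1csh", "1iomA"): 2.50,
    ("1csh", "1k3pA"): 2.12,
    ("1k3pA", "1iomA"): 3.03,
    ("1b6a", "1xgsA"): 2.23,
    ("1fp2A", "1fp1D"): 2.98,
}
mean_rmsd = sum(reported.values()) / len(reported)
```

Rounded to two decimals, `mean_rmsd` agrees with the 2.57 quoted in the last bullet.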
Thanks to
• Tural Aksel
Why Functional Classification?
• A huge amount of data has accumulated via
genome sequencing projects.
• Experimental structure determination
methods (X-ray & NMR) are costly and
take months to years.
• Computational structure
prediction methods are also not accurate
enough.
G-protein coupled receptors
(GPCRs)
• Vital protein bundles with
versatile functions.
• Play a key role in cellular
signaling and the regulation of basic
physiological processes; more
than 50% of prescription drugs
act on them.
• Therefore excellent potential
therapeutic targets for drug
design and a focus of current
pharmaceutical research.
GPCR Functional Classification
Problem
• Although thousands of GPCR
sequences are known, a
crystal structure has been solved for
only one GPCR sequence, at
medium resolution, to date.
• For many of them, the activating ligand is unknown.
• Functional classification methods for automated
characterization of such GPCRs are imperative.
Relationship between specific binding
of GPCRs to their ligands and their
functional classification
• Subfamily classifications in GPCRDB are
defined according to which ligands the
receptor binds (based on chemical
interactions rather than sequence
homology).
• According to the binding of
GPCRs with different ligand
types, GPCRs are classified
into at least six families.
• The correlation between sub-family classification and
the specific binding of GPCRs to their ligands can be
computationally explored for Level 2 subfamily
classification of Amine Level 1 subfamily.
Benchmark Dataset
• Dataset
– 352 amines, 595 peptides, 1898 olfactory, 355
rhodopsin, 56 prostanoid
• Derive GPCR proteins from GPCRDB & SWISSPROT over the internet
– Group the proteins according to their ligand specificity
(i.e. amines, peptides, olfactory, rhodopsin, prostanoid)
– Separate proteins into train and test groups with a 2:1
ratio, respectively
– Derive the ecto-domains using TMHMM (i.e. N-terminal, loop1, loop2, loop3)
– Rewrite the sequences using an 11-letter alphabet
Classification of Amino acids
Class  Amino Acids    Class  Amino Acids
a      I,V,L,M        g      G
b      R,K,H          h      W
c      D,E            i      C
d      Q,N            j      Y,F
e      S,T            k      P
f      A
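The rewriting step can be sketched directly from the class table above. The dictionary is copied from the table; the function name is an assumption made for this example, and non-standard residue codes (such as X) are deliberately left unhandled and raise a `KeyError`.

```python
# Class letter -> member amino acids, copied from the classification table.
CLASSES = {
    "a": "IVLM", "b": "RKH", "c": "DE", "d": "QN", "e": "ST", "f": "A",
    "g": "G", "h": "W", "i": "C", "j": "YF", "k": "P",
}

# Invert the table: amino acid -> class letter.
AA_TO_CLASS = {aa: cls for cls, members in CLASSES.items() for aa in members}


def encode(sequence):
    """Rewrite a one-letter amino acid sequence in the 11-letter alphabet."""
    return "".join(AA_TO_CLASS[aa] for aa in sequence.upper())
```

Encoding the fragment `MDVFSF` gives `acajej`, which matches the start of the n-term string listed for 5H1A_RAT in the protein database slide.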
Snake plot of the human beta-2
adrenoceptor
PROTEIN DATABASE
Train proteins; Ligand group: amines
ID  NAME        Sequence    n-term           Loop1           ...
1   5H1A_RAT    MDVFSF...   acajejgdgd...    jdaadbhe...     ...
2   5H1A_FUGRU  MDLRATS...  bekccbec...      aakjiceeiba..   ...
3   5H1A_HUMAN  MDVLSPG..   bdfbfcccaa...    aibcfihjbaf...  ...
4   5H1B_PANTR  MEEPGAQ..   acckgfdifk       kaibcfihj       ...
5   5H1B_RABIT  MEEPGAQ..   acckgfdifkka...  ibcfihjbd ...
FINDING MOST COMMON PATTERNS
FOR EACH LIGAND GROUP
• Form triplets for N-terminal, loop1, loop2 and loop3
separately
– With an 11-letter alphabet there are 11^3 = 1331 different triplets
• For each triplet, find the proteins in a given ligand group
that contain the triplet at a given location,
and keep the data in vectors
• Find the ratio of occurrence of each triplet in a given
GPCR protein type (e.g. amines) at a given location (e.g.
loop1)
• Insert the triplets into an SQL database with their ratios
• Sort the triplets according to their ratios
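The counting steps above might look like the sketch below. It assumes that the "ratio of occurrence" means the fraction of proteins in a ligand group whose region contains the triplet at least once; the slides do not define the ratio precisely, so that reading, along with the function names, is an assumption. The SQL storage step is left out.

```python
from collections import Counter
from itertools import product

ALPHABET = "abcdefghijk"  # the 11 class letters


def triplets(encoded):
    """Set of overlapping 3-letter words found in one encoded region."""
    return {encoded[i:i + 3] for i in range(len(encoded) - 2)}


def triplet_ratios(regions):
    """Fraction of sequences that contain each of the 1331 possible triplets.

    `regions` holds the same region (e.g. loop1) from every protein in one
    ligand group, already rewritten in the 11-letter alphabet.
    """
    if not regions:
        raise ValueError("need at least one encoded region")
    counts = Counter()
    for encoded in regions:
        counts.update(triplets(encoded))  # each protein counts once per triplet
    words = ("".join(letters) for letters in product(ALPHABET, repeat=3))
    return {word: counts[word] / len(regions) for word in words}
```

Sorting is then a one-liner, for instance `sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)`.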
VECTORS
ID    WORD  PROTEINS
1     aaa   5H1A_RAT, 5H1A_FUGRU, ...
2     aab   5HT1_APLCA, 5HTA_DROME, ...
3     aac   5HT1_APLCA, 5HTA_DROME, 5H1A_PONPY
4     aad   none
...   ...   ...
1328  kkh   5H1B_FUGRU, 5HTA_DROME...
1329  kki   none
1330  kkj   5H1F_RAT
1331  kkk   none
FINDING DISTINGUISHING
MOTIFS I
• Compare the ratio of each triplet in a given
ligand group with the occurrence of the same
triplet in each of the other ligand groups, one by
one (e.g. aaa in amines = 0.5; in peptides = 0.1;
r = 0.5/0.1 = 5)
• Keep the n (150) motifs with the highest “r” values for
each pair of ligand groups. These are the
motifs that distinguish the given group from
the other groups
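A minimal sketch of this ranking for one pair of ligand groups follows. The small `pseudo` term is an assumption added here to avoid division by zero when a triplet never occurs in the second group; the slides do not say how that case was handled. The function name is also hypothetical.

```python
def distinguishing_motifs(ratios_a, ratios_b, n=150, pseudo=1e-6):
    """Top-`n` triplets ranked by r = ratio in group A / ratio in group B.

    `ratios_a` and `ratios_b` map each triplet to its ratio of occurrence in
    the two groups. Triplets absent from group A are skipped because they
    cannot characterize it. `pseudo` is an illustrative assumption.
    """
    scores = {
        word: ratio / (ratios_b.get(word, 0.0) + pseudo)
        for word, ratio in ratios_a.items()
        if ratio > 0
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

With the slide's own numbers (aaa at 0.5 in amines and 0.1 in peptides), the triplet scores r of about 5 and ranks above any triplet that is equally common in both groups.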
RESULTS
• Success rates for Information theory
CART RESULTS
The classification table showing the only patterns determining amines from all
others
Index  Triplet  Family
1      CAA      Amine
2      AIB      Amine
3      HIJ      Prostanoid
4      AEA      Hormone-protein
5      JAA      Hormone-protein
6      AAD      TRH
7      ADA      TRH
8      JCK      Melatonin
Variable importance of the
amine-determining patterns
Patterns        Relative Importance
Loop 1 ‘caa’    100
Loop 1 ‘gbh’    97.46
Loop 3 ‘iak’    83.767
Loop 1 ‘gjh’    64.62
Loop 1 ‘gda’    51.101
Loop 2 ‘aed’    44.942
Loop 1 ‘agj’    43.636
Loop 1 ‘aag’    31.099
Loop 1 ‘dca’    22.736
Loop 3 ‘akc’    17.737
Loop 1 ‘hjj’    16.511
N-term ‘afa’    12.811
N-term ‘eea’    0
Occurrence of EIG in Loop2 in
Rhodopsin Family
Triplet JJI at exo-loop 2 in
olfactory sub-family.
Conclusion
• Exploiting the fact that there is a non-promiscuous
relationship between the specific binding of GPCRs to their
ligands and their functional classification, our method
classifies Level 1 subfamilies of GPCRs with a high
predictive accuracy of 98%.
• The presented machine learning approach bridges the gulf
between the large amount of GPCR sequence data and
its poor functional characterization.
• The method also finds the motifs by which GPCRs bind their
specific ligands, which can be exploited in drug design to
block these sites.
• With such an accurate and automated GPCR classification
method, we hope to accelerate the pace of identifying
proper GPCRs and their ligand binding scheme to facilitate
drug discovery, especially for neurological diseases.
• Ligand binding motifs and their site
information can be used as constraints to
build better models.
• Highly conserved sites from alignment of
GPCR families can also be used as
constraints
Thanks to
• Murat Can Çobanoğlu
Class A Rhodopsin like
• The largest and most diverse family of
GPCRs
• Conserved sequence motifs
• Unique signal-transduction activities
• Important members:
– Adrenergic Receptors
– Adenosine Receptors
– Chemokine Receptors
– Dopamine Receptors
– Histamine Receptors
– Opsins
Highlighted 4 GPCRs for Structure
Comparison
Species  GPCR               Ligand
human    β2AR (Adrenergic)  inverse agonist carazolol
avian    β1AR (Adrenergic)  antagonist cyanopindolol
human    A2A (Adenosine)    antagonist ZM241385
bovine   Rhodopsin          inverse agonist 11-cis retinal
Extracellular surfaces
• The most significant structural divergences lie in the extracellular
loops and ligand-binding region
β2AR/β1AR
- contain a short α-helix that is stabilized by intra- and inter-loop
disulphide bonds
- N-terminal regions are disordered
A2A
- lacks a predominant secondary structure and exposes the ligand-binding
cavity to extracellular bulk solvent
rhodopsin
- forms a short β-sheet that caps the ligand, shielding the
chromophore from bulk solvent and preventing Schiff base hydrolysis
- amino terminus glycosylated
Ligand-Binding Pockets
• For both adrenergic
receptors and rhodopsin,
ligand binding is
mediated by polar and
hydrophobic contact
residues from TM3, TM5,
TM6 and TM7.
• The superposed ligands
partly overlap for
β2AR, β1AR and
rhodopsin; however, the
β2AR and β1AR ligands sit
slightly more toward the
extracellular side than in
rhodopsin.
• This difference results in
a significant in key
rotamer conformational
Ligand-Binding Pockets
• In contrast to the β2AR, β1AR
and rhodopsin, the ligand of the A2A
(adenosine) receptor binds in an
orientation that is roughly
perpendicular to the bilayer
plane, and its packing
interactions with the protein are
mostly with TM6 and TM7.
Ligand-Binding Pockets
• Despite the highly conserved seven transmembrane
architecture, GPCRs can support a wide variety of
ligand-binding modes
• High conservation in the ligand-binding pocket is
also observed, as in other subfamilies of GPCRs
→ this probably explains some of the difficulty in obtaining
potent subtype-selective compounds in
pharmaceutical discovery programs
Cytoplasmic surfaces of the GPCR
structures
• A major structural difference between the ligand-activated GPCRs and
rhodopsin lies in the ‘ionic lock’ between the highly conserved
E/DRY motif on TM3 and a glutamate residue on TM6.
• Conserved among all family A GPCRs, these amino acids form a
network of polar interactions that bridges the two transmembrane
helices, stabilizing the inactive-state conformation.
Cytoplasmic surfaces of the GPCR
structures
• One common feature is the chemical environment surrounding
residues of the highly conserved NPXXY motif. The cytoplasmic end
of TM7, in which this motif is located, participates in key
conformational changes associated with GPCR activation.
• The proline in this motif causes a distortion in the α-helical structure,
and the tyrosine faces into a pocket lined by TM2, TM3, TM6 and
TM7.
Mechanism for Activation
• Structures of opsin provide clues to the transmembrane helix
rearrangements that can be expected as a result of agonist
binding
• Most importantly, the side chain of Trp 265 (the toggle switch)
moves into space previously occupied by the ionone ring of
retinal
• The cytoplasmic end of TM6 is shifted more than 6 Å outwards
from the centre of the bundle