Diversity and Plasticity of RNA
Beyond the One-Sequence-One-Structure Paradigm
Peter Schuster
Institut für Theoretische Chemie und Molekulare
Strukturbiologie der Universität Wien
Chemistry towards Biology
Portorož, 8.–12.09.2002
[Figure: The chemical formula of RNA from the 5'-end to the 3'-end, with nucleotides N1 to N4.]
The chemical formula of RNA, consisting of nucleobases, ribose rings, phosphate groups, and sodium counterions; Nk = A, U, G, C
Magnesium ions play a special role and act as coordination centers, which are indispensable for the formation of full three-dimensional structures
[Figure: Sequence, secondary structure, and three-dimensional structure of phenylalanyl-tRNA, 5'-end to 3'-end.]
Sequence: GCGGAUUUAGCUCAGDDGGGAGAGCMCCAGACUGAAYAUCUGGAGMUCCUGUGTPCGAUCCACAGAAUUCGCACCA
Biochemical and chemical probing, crystallography, structure prediction, NMR, FRET, ...
The one sequence – one structure paradigm
One day, when biomolecular structures were understood in sufficient
detail, we would be able to design molecules with predefined structures
and for a priori given purposes.
Biomolecular structures are not fully understood yet, but the lack of
knowledge of structure and function can be compensated for by applying
selection methods.
[Figure: Sequence of the tobramycin-binding RNA aptamer, 5'-end to 3'-end.]
Combinatorial diversity of sequences: N = 4^27 = 1.801 × 10^16 possible different sequences
A = adenylate, U = uridylate, C = cytidylate, G = guanylate
Number of (different) sequences created by common-scale random synthesis: 10^15 – 10^16.
Combinatorial diversity of heteropolymers illustrated by means of an
RNA aptamer that binds to the antibiotic tobramycin
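The numbers on this slide can be checked directly. A minimal sketch, assuming only the four-letter alphabet A, U, G, C and counting every sequence of length n once; the helper name `sequence_space_size` is ours, not taken from the source.

```python
# Combinatorial diversity of RNA: there are 4^n different sequences of chain length n
def sequence_space_size(n: int, alphabet_size: int = 4) -> int:
    """Number of different sequences of length n over an alphabet of the given size."""
    return alphabet_size ** n


N = sequence_space_size(27)   # chain length implied by N = 4^27 on the slide
print(N)                      # 18014398509481984
print(f"{N:.3e}")             # 1.801e+16

# A random synthesis of 10^15 to 10^16 molecules contains at most that many
# distinct sequences, so it samples only part of sequence space.
for pool in (10**15, 10**16):
    print(f"pool of {pool:.0e} molecules covers at most {pool / N:.1%} of all sequences")
```

Even the largest synthetic pools therefore cover at most about half of the sequence space of a 27-mer, and a vanishing fraction for longer chains.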
Taming of sequence diversity through selection and evolutionary
design of RNA molecules
A.D.Ellington, J.W.Szostak, In vitro selection of RNA molecules that bind specific ligands.
Nature 346 (1990), 818-822
C.Tuerk, L.Gold, SELEX - Systematic evolution of ligands by exponential enrichment:
RNA ligands to bacteriophage T4 DNA polymerase. Science 249 (1990), 505-510
D.P.Bartel, J.W.Szostak, Isolation of new ribozymes from a large pool of random
sequences. Science 261 (1993), 1411-1418
R.D.Jenison, S.C.Gill, A.Pardi, B.Polisky, High-resolution molecular discrimination by
RNA. Science 263 (1994), 1425-1429
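[Figure: Selection cycle. Starting from genetic diversity, selection is followed by a test for the desired properties; if they are not yet reached (no), amplification and diversification precede the next round of selection; if they are reached (yes), the cycle ends.]
Selection cycle used in applied molecular evolution to design molecules with predefined properties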
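The logic of the cycle can be mimicked in silico. This is a toy sketch under loudly stated assumptions: "binding" is replaced by a hypothetical score, the number of positions matching an arbitrary target string (here simply the 15-nt sequence I1 from the folding examples of this talk, not a real ligand), and all parameter values (`pool_size`, `keep`, `rate`) are illustrative, not SELEX conditions.

```python
import random

ALPHABET = "AUGC"


def random_sequence(n, rng):
    return "".join(rng.choice(ALPHABET) for _ in range(n))


def diversify(seq, rate, rng):
    """Diversification: each position is redrawn at random with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else base for base in seq)


def affinity(seq, target):
    """Hypothetical stand-in for binding strength: number of positions matching `target`."""
    return sum(a == b for a, b in zip(seq, target))


def selection_cycle(target, pool_size=200, keep=20, rate=0.02, max_rounds=500, seed=1):
    """Toy selection cycle; returns (rounds used, best sequence found)."""
    rng = random.Random(seed)
    pool = [random_sequence(len(target), rng) for _ in range(pool_size)]
    for rnd in range(1, max_rounds + 1):
        # Selection: rank the pool and keep the best "binders"
        pool.sort(key=lambda s: affinity(s, target), reverse=True)
        if pool[0] == target:                 # desired properties reached: yes
            return rnd, pool[0]
        survivors = pool[:keep]
        # no: amplification (copying survivors) and diversification (mutation)
        pool = [diversify(rng.choice(survivors), rate, rng) for _ in range(pool_size)]
    return max_rounds, max(pool, key=lambda s: affinity(s, target))


rounds, best = selection_cycle("ACUGAUCGUAGUCAC")
print(rounds, best)
```

Truncation selection of the best `keep` sequences is only one possible selection scheme; it is chosen here because it makes the yes/no decision of the cycle explicit.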
[Figure: Chromatographic column with retention of binders and elution of binders.]
The SELEX technique for the evolutionary design of aptamers
[Figure: Formation of secondary structure of the tobramycin-binding RNA aptamer, 5'-end to 3'-end.]
L. Jiang, A. K. Suri, R. Fiala, D. J. Patel, Chemistry & Biology 4:35-50 (1997)
The three-dimensional structure of the
tobramycin aptamer complex
L. Jiang, A. K. Suri, R. Fiala, D. J. Patel,
Chemistry & Biology 4:35-50 (1997)
Mapping RNA sequences onto RNA structures
Investigating this mapping means searching for the relations between all 4^n possible
sequences of chain length n and all thermodynamically stable structures, i.e. the
structures of minimum free energy. Sequence-structure mappings of RNA molecules have
been studied by a variety of experimental and in silico techniques.
[Figure: Sequence, secondary structure, tertiary structure, and symbolic notation of phenylalanyl-tRNA, 5'-end to 3'-end.]
What is an RNA structure?
The secondary structure is a listing of base pairs; it contrasts with the full 3D structure, which is given by
atomic coordinates. An intermediate level of structural detail is provided by RNA threading or other toy models.
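We assume that the symbolic notation shown with the tRNA figure is the dot-bracket string used by the Vienna RNA Package: a dot marks an unpaired base and a matching pair of parentheses marks a base pair. Converting between that string and the "listing of base pairs" is a simple stack operation. The sketch below uses 0-based positions; the function names are ours.

```python
def pairs_from_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Convert dot-bracket notation into a sorted list of base pairs (i, j), 0-based."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)                  # opening base waits for its partner
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))   # innermost open base is the partner
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def dot_bracket_from_pairs(n: int, pairs) -> str:
    """Inverse conversion for a structure of chain length n."""
    symbols = ["."] * n
    for i, j in pairs:
        symbols[i], symbols[j] = "(", ")"
    return "".join(symbols)


print(pairs_from_dot_bracket("((((...))))..((...))"))
# [(0, 10), (1, 9), (2, 8), (3, 7), (13, 19), (14, 18)]
```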
RNA Secondary Structures and their Properties
RNA secondary structures are listings of Watson-Crick and
GU wobble base pairs, which are free of knots and pseudoknots.
Secondary structures are folding intermediates in the
formation of full three-dimensional structures.
D.Thirumalai, N.Lee, S.A.Woodson, and D.K.Klimov.
Annu.Rev.Phys.Chem. 52:751-762 (2001)
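The defining conditions can be written down as a checker. A minimal sketch, assuming 0-based pair lists and a minimum hairpin loop of three unpaired bases; the hairpin rule is a common convention in folding programs that is not stated on the slide.

```python
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def is_secondary_structure(seq: str, pairs, min_hairpin: int = 3) -> bool:
    """Check Watson-Crick/GU pairing, one partner per base, hairpin size, and no pseudoknots."""
    partner = {}
    for i, j in pairs:
        i, j = min(i, j), max(i, j)
        if (seq[i], seq[j]) not in ALLOWED:
            return False                  # neither a Watson-Crick nor a GU wobble pair
        if i in partner or j in partner:
            return False                  # a base can take part in one pair only
        if j - i - 1 < min_hairpin:
            return False                  # hairpin loop too small (assumed minimum)
        partner[i], partner[j] = j, i
    ordered = sorted((min(p), max(p)) for p in pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                return False              # crossing pairs (i, j), (k, l): pseudoknot
    return True


print(is_secondary_structure("GGGAAAUCC", [(0, 8), (1, 7), (2, 6)]))   # True
print(is_secondary_structure("GAAGAACAAC", [(0, 6), (3, 9)]))          # False
```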
RNA Minimum Free Energy Structures
Efficient algorithms based on dynamic programming are
available for computing the secondary structures of given
sequences. Inverse folding algorithms compute sequences
for given secondary structures.
M.Zuker and P.Stiegler. Nucleic Acids Res. 9:133-148 (1981)
Vienna RNA Package: http://www.tbi.univie.ac.at (includes
inverse folding, suboptimal structures, kinetic folding, etc.)
I.L.Hofacker, W. Fontana, P.F.Stadler, L.S.Bonhoeffer,
M.Tacker, and P. Schuster. Mh.Chem. 125:167-188 (1994)
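The dynamic-programming idea can be illustrated with the much simpler Nussinov recursion, which maximizes the number of allowed base pairs instead of minimizing a free energy. This is a swapped-in, simplified technique: a sketch of the principle, not an implementation of the Zuker-Stiegler algorithm or of the Vienna RNA Package. The minimum hairpin size of 3 is an assumption.

```python
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold_max_pairs(seq: str, min_hairpin: int = 3) -> tuple[int, str]:
    """Nussinov-style DP: maximal number of non-crossing allowed pairs and one optimal structure."""
    n = len(seq)
    N = [[0] * n for _ in range(n)]            # N[i][j]: best pair count on seq[i..j]

    def pair_score(i, k, j):
        # score when base k pairs with base j inside the segment i..j
        left = N[i][k - 1] if k > i else 0
        return left + 1 + N[k + 1][j - 1]

    for span in range(min_hairpin + 1, n):
        for i in range(n - span):
            j = i + span
            best = N[i][j - 1]                  # case 1: j stays unpaired
            for k in range(i, j - min_hairpin): # case 2: j pairs with k
                if (seq[k], seq[j]) in ALLOWED:
                    best = max(best, pair_score(i, k, j))
            N[i][j] = best

    symbols = ["."] * n
    stack = [(0, n - 1)]                        # traceback with an explicit stack
    while stack:
        i, j = stack.pop()
        if j - i <= min_hairpin:
            continue
        if N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_hairpin):
            if (seq[k], seq[j]) in ALLOWED and N[i][j] == pair_score(i, k, j):
                symbols[k], symbols[j] = "(", ")"
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return (N[0][n - 1] if n else 0), "".join(symbols)


print(fold_max_pairs("GGGAAAUCC"))   # (3, '(((...)))')
```

Real energy models replace the constant "+1 per pair" by stacking and loop energies, but the decomposition of a segment into "last base unpaired" and "last base paired with k" is the same kind of recursion.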
Criterion of Minimum Free Energy
[Figure: Sequences in sequence space are mapped onto structures in shape space. Example sequences:]
UUUAGCCAGCGCGAGUCGUGCGGACGGGGUUAUCUCUGUCGGGCUAGGGCGC
GUGAGCGCGGGGCACAGUUUCUCAAGGAUGUAAGUUUUUGCCGUUUAUCUGG
UUAGCGAGAGAGGAGGCUUCUAGACCCAGCUCUCUGGGUCGUUGCUGAUGCG
CAUUGGUGCUAAUGAUAUUAGGGCUGUAUUCCUGUAUAGCGAUCAGUGUCCG
GUAGGCCCUCUUGACAUAAGAUUUUUCCAAUGGUGGGAGAUGGCCAUUGCAG
Many sequences form the same minimum free energy secondary structure
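One reason why many sequences can share a structure is visible from a simple count. A sequence is compatible with a structure if every base pair of the structure can be formed (AU, UA, GC, CG, GU, UG); with p base pairs and u unpaired positions there are 6^p · 4^u compatible sequences. Compatibility is necessary but not sufficient for a sequence to have that structure as its minimum free energy structure, so this number is only an upper bound. The sketch below checks the formula by brute force on a small, made-up structure.

```python
from itertools import product

ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs_from_dot_bracket(structure):
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            pairs.append((stack.pop(), pos))
    return pairs


def count_compatible_formula(structure):
    """6 allowed pairs per base pair, 4 nucleotides per unpaired position."""
    return 6 ** structure.count("(") * 4 ** structure.count(".")


def is_compatible(seq, pairs):
    return all((seq[i], seq[j]) in ALLOWED for i, j in pairs)


def count_compatible_brute_force(structure):
    pairs = pairs_from_dot_bracket(structure)
    return sum(is_compatible(seq, pairs) for seq in product("AUGC", repeat=len(structure)))


s = "((...))"
print(count_compatible_formula(s), count_compatible_brute_force(s), 4 ** len(s))
# 2304 2304 16384
```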
S_k = ψ(I_k): sequence space → phenotype space
f_k = f(S_k): phenotype space → non-negative numbers
Mapping from sequence space into phenotype space and into fitness values
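Composing the two maps I_k → S_k → f_k is straightforward once ψ and f are given. In the sketch below, ψ is a hypothetical lookup table with three made-up sequence-structure assignments (a real study would call a folding program), and f is an illustrative non-negative fitness that decreases with the base-pair distance between S_k and an assumed target structure. Both choices are assumptions made only to show the composition.

```python
def base_pairs(structure: str) -> set:
    """Set of base pairs (i, j) encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def bp_distance(s1: str, s2: str) -> int:
    """Base-pair distance: number of pairs present in exactly one of the two structures."""
    return len(base_pairs(s1) ^ base_pairs(s2))


# Hypothetical psi: made-up lookup table standing in for a folding algorithm
PSI = {
    "GGGAAAUCC": "(((...)))",
    "GGGAAAUCA": "((....)).",
    "AAAAAAAAA": ".........",
}
TARGET = "(((...)))"   # assumed target structure


def psi(sequence: str) -> str:
    return PSI[sequence]


def fitness(structure: str, target: str = TARGET) -> float:
    """Illustrative fitness f(S) = 1 / (1 + d(S, target)), always non-negative."""
    return 1.0 / (1.0 + bp_distance(structure, target))


for I_k in PSI:
    S_k = psi(I_k)
    print(I_k, S_k, round(fitness(S_k), 3))
```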
[Figure: A connected neutral network.]
[Figure: A multi-component neutral network with its giant component.]
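Neutral networks can be explored exhaustively for very short chains. The sketch below rests on several assumptions: folding is done by a simplified base-pair-maximization (Nussinov-style) recursion rather than a free-energy model, the chain length is n = 8, the minimum hairpin size is 3, and ties between optimal structures are broken by a fixed traceback rule. It folds all 4^8 sequences, groups them into neutral sets by structure, and counts the connected components of each set when sequences that differ by one point mutation are neighbours. Results for real energy models and realistic chain lengths will differ.

```python
from collections import defaultdict, deque
from itertools import product

ALPHABET = "AUGC"
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3  # assumed minimum hairpin loop size


def fold(seq):
    """Stand-in for psi: one structure with a maximal number of non-crossing allowed pairs."""
    n = len(seq)
    N = [[0] * n for _ in range(n)]

    def score(i, k, j):
        return (N[i][k - 1] if k > i else 0) + 1 + N[k + 1][j - 1]

    for span in range(MIN_HAIRPIN + 1, n):
        for i in range(n - span):
            j = i + span
            N[i][j] = max([N[i][j - 1]] + [score(i, k, j) for k in range(i, j - MIN_HAIRPIN)
                                            if (seq[k], seq[j]) in ALLOWED])
    out, stack = ["."] * n, [(0, n - 1)]
    while stack:                                   # deterministic traceback
        i, j = stack.pop()
        if j - i <= MIN_HAIRPIN:
            continue
        if N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
            continue
        k = next(k for k in range(i, j - MIN_HAIRPIN)
                 if (seq[k], seq[j]) in ALLOWED and score(i, k, j) == N[i][j])
        out[k], out[j] = "(", ")"
        stack += [(i, k - 1), (k + 1, j - 1)]
    return "".join(out)


def neutral_sets(n):
    """Group all 4^n sequences by the structure they fold into."""
    sets = defaultdict(set)
    for letters in product(ALPHABET, repeat=n):
        seq = "".join(letters)
        sets[fold(seq)].add(seq)
    return sets


def components(members):
    """Sizes of connected components under single point mutations, largest first."""
    unseen, sizes = set(members), []
    while unseen:
        queue, size = deque([unseen.pop()]), 1
        while queue:
            seq = queue.popleft()
            for pos in range(len(seq)):
                for base in ALPHABET:
                    neighbour = seq[:pos] + base + seq[pos + 1:]
                    if neighbour in unseen:
                        unseen.remove(neighbour)
                        queue.append(neighbour)
                        size += 1
        sizes.append(size)
    return sorted(sizes, reverse=True)


neutral = neutral_sets(8)
for structure, members in sorted(neutral.items(), key=lambda kv: -len(kv[1])):
    sizes = components(members)
    print(structure, len(members), "components:", len(sizes), "largest:", sizes[0])
```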
[Figure: Free energy landscape with the minimum free energy structure S0 and suboptimal structures S1 to S10. Minimum free energy structure: T = 0 K, t → ∞; suboptimal structures: T > 0 K, t → ∞; kinetic structures: T > 0 K, t finite.]
Different notions of RNA structure including suboptimal conformations
Partition Function of RNA Secondary Structures
John S. McCaskill. The equilibrium partition function and base pair binding probabilities for
RNA secondary structure. Biopolymers 29 (1990), 1105-1119
Ivo L. Hofacker, Walter Fontana, Peter F. Stadler, L. Sebastian Bonhoeffer, Manfred
Tacker, Peter Schuster. Fast folding and comparison of RNA secondary structures.
Monatshefte für Chemie 125 (1994), 167-188
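McCaskill's algorithm computes the partition function Q = Σ_S exp(-ΔG(S)/RT) over all secondary structures S by dynamic programming. The sketch below keeps only the skeleton of such a recursion and uses a deliberately crude, hypothetical energy model in which every allowed pair contributes the same energy `pair_energy` (no stacking or loop terms), so its numbers are not those of McCaskill's energy model or the Vienna RNA Package. Each structure is counted exactly once because the last base j of a segment is either unpaired or paired with a unique partner k.

```python
import math

ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
R = 1.98720425864083e-3   # gas constant in kcal/(mol K)


def partition_function(seq, pair_energy=-1.0, temperature=310.15, min_hairpin=3):
    """Toy partition function: every allowed pair contributes `pair_energy` kcal/mol."""
    n = len(seq)
    if n == 0:
        return 1.0
    boltzmann_pair = math.exp(-pair_energy / (R * temperature))
    # Q[i][j] = 1 for segments too short to contain a pair (only the open chain)
    Q = [[1.0] * n for _ in range(n)]

    def q(i, j):
        return Q[i][j] if j >= i else 1.0          # empty segment contributes 1

    for span in range(min_hairpin + 1, n):
        for i in range(n - span):
            j = i + span
            total = q(i, j - 1)                     # j unpaired
            for k in range(i, j - min_hairpin):     # j paired with k
                if (seq[k], seq[j]) in ALLOWED:
                    total += q(i, k - 1) * boltzmann_pair * q(k + 1, j - 1)
            Q[i][j] = total
    return Q[0][n - 1]


Q = partition_function("UUGGAGUACACAACCUGUACACUCUUUC")
print(Q, "probability of the open chain:", 1.0 / Q)
```

With `pair_energy=0.0` every structure has weight 1, so Q simply counts the secondary structures; for "GGAAACC" this gives 6.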
Example of a small RNA molecule with two low-lying suboptimal conformations which contribute substantially to the partition function
Example of a small RNA molecule: n = 28
5'-UUGGAGUACACAACCUGUACACUCUUUC-3'
[Figure: Minimum free energy configuration, ΔG0 = -5.39 kcal/mole; first suboptimal configuration, ΔE0(1) = 0.50 kcal/mole; second suboptimal configuration, ΔE0(2) = 0.55 kcal/mole.]
"Dot plot" of the minimum free energy structure (lower triangle) and the partition function
(upper triangle) of a small RNA molecule (n = 28) with low-energy suboptimal configurations
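How strongly the two suboptimal conformations contribute can be estimated from the energies on this slide. A minimal sketch, assuming that the quoted values are free energy differences above the minimum free energy configuration, that T = 37 °C (the slide gives no temperature), and, as a crude three-state approximation, that all other conformations can be ignored. The probabilities follow from p_i = exp(-ΔE_i/RT) / Σ_j exp(-ΔE_j/RT).

```python
import math

R = 1.98720425864083e-3   # gas constant, kcal/(mol K)
T = 310.15                # 37 degrees Celsius (assumed)

# Energies relative to the minimum free energy configuration, kcal/mol (from the slide)
relative_energies = {
    "minimum free energy": 0.0,
    "first suboptimal": 0.50,
    "second suboptimal": 0.55,
}


def boltzmann_probabilities(energies, temperature=T):
    """Boltzmann weights normalized by the (three-state) partition function."""
    weights = {name: math.exp(-e / (R * temperature)) for name, e in energies.items()}
    Z = sum(weights.values())
    return {name: w / Z for name, w in weights.items()}


for name, p in boltzmann_probabilities(relative_energies).items():
    print(f"{name:>20}: {p:.2f}")
```

Under these assumptions the minimum free energy configuration accounts for only about half of the population, which is why the partition function (upper triangle of the dot plot) differs visibly from the minimum free energy structure.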
[Figure: Sequence, secondary structure, and symbolic notation of phenylalanyl-tRNA.]
Phenylalanyl-tRNA as an example of the computation of the partition function
[Figure: Dot plot of tRNAphe without modified bases; first suboptimal configuration, ΔE0(1) = 0.43 kcal/mole.]
[Figure: Dot plot of tRNAphe with modified bases; first suboptimal configuration, ΔE0(1) = 0.94 kcal/mole.]
Kinetic Folding of RNA at Elementary Step Resolution
The RNA folding process is resolved into base pair closure, base pair cleavage,
and base pair shift. The kinetic folding behavior is determined by computing
a sufficiently large ensemble of individual folding trajectories and averaging
over them. The folding behavior is illustrated by barrier trees, which show the
lowest saddle point (energy barrier) on the paths connecting two local minima of free energy.
C.Flamm, W.Fontana, I.L.Hofacker and P.Schuster. RNA, 6:325-338 (2000)
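The construction behind a barrier tree can be sketched as a flooding procedure: states are visited in order of increasing free energy; a state with no already visited neighbour starts a new basin (a local minimum), and a state that touches two basins is the saddle point T_k at which they merge. The sketch below works on a small, hypothetical landscape given as an explicit graph with made-up energies (state names S0 to S2 and T1, T2 are illustrative); real conformation spaces can only be enumerated this way for short chains.

```python
def barrier_merges(energies, edges):
    """Flood a landscape; return local minima and (saddle energy, deeper min, shallower min)."""
    adjacency = {s: set() for s in energies}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    parent = {}                                    # union-find over visited states

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]          # path halving
            s = parent[s]
        return s

    deepest_min = {}                               # basin root -> its lowest minimum
    minima, merges = [], []
    for state in sorted(energies, key=energies.get):
        roots = {find(nb) for nb in adjacency[state] if nb in parent}
        parent[state] = state
        if not roots:                              # no lower neighbour: new local minimum
            minima.append(state)
            deepest_min[state] = state
            continue
        ordered = sorted(roots, key=lambda r: energies[deepest_min[r]])
        main = ordered[0]
        for other in ordered[1:]:                  # basins meet: `state` is their saddle
            merges.append((energies[state], deepest_min[main], deepest_min[other]))
            parent[other] = main
        parent[state] = main
    return minima, merges


# Hypothetical one-dimensional "reaction coordinate": S0 - T1 - S1 - T2 - S2
energies = {"S0": -5.0, "T1": -2.0, "S1": -4.0, "T2": -1.0, "S2": -3.0}
edges = [("S0", "T1"), ("T1", "S1"), ("S1", "T2"), ("T2", "S2")]
minima, merges = barrier_merges(energies, edges)
for saddle, deeper, shallower in merges:
    print(f"{shallower} -> {deeper}: saddle {saddle:.1f}, barrier {saddle - energies[shallower]:.1f}")
```

The merge events, read from the lowest saddle upwards, are exactly the internal nodes of the barrier tree; the barrier height of a minimum is its saddle energy minus its own energy.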
[Figure: Move set for elementary steps in kinetic RNA folding: base pair closure, cleavage, and shift.]
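The move set can be stated operationally on the list of base pairs. A minimal sketch, assuming Watson-Crick and GU pairs, a minimum hairpin loop of three, and the non-crossing condition; the shift move is implemented as moving one end of an existing pair to a new partner while the other end stays fixed, which is our reading of "base pair shift" rather than a specification taken from the cited paper.

```python
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3  # assumed minimum hairpin loop size


def can_add(seq, pairs, i, j):
    """True if the pair (i, j), i < j, can be added to the secondary structure `pairs`."""
    if j - i - 1 < MIN_HAIRPIN or (seq[i], seq[j]) not in ALLOWED:
        return False
    for k, l in pairs:
        if k in (i, j) or l in (i, j):
            return False                      # one of the bases is already paired
        if k < i < l < j or i < k < j < l:
            return False                      # crossing pairs would form a pseudoknot
    return True


def neighbors(seq, pairs):
    """Structures reachable in one elementary step: closure, cleavage, or shift."""
    pairs = frozenset(pairs)
    n = len(seq)
    result = set()
    for i in range(n):                        # closure: add one base pair
        for j in range(i + 1, n):
            if can_add(seq, pairs, i, j):
                result.add(pairs | {(i, j)})
    for p in pairs:                           # cleavage: open one base pair
        result.add(pairs - {p})
    for i, j in pairs:                        # shift: keep one end, move the other
        rest = pairs - {(i, j)}
        for k in range(n):
            for a, b in ((i, k), (k, j)):
                new = (min(a, b), max(a, b))
                if a != b and new != (i, j) and can_add(seq, rest, *new):
                    result.add(rest | {new})
    return result


for structure in sorted(neighbors("GGGAAAUCC", {(0, 8)}), key=sorted):
    print(sorted(structure))
```

A kinetic folding simulation then chooses among these neighbours with rates that depend on the free energy change of each step.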
Mean folding curves for three small RNA molecules with n=15 and very different folding behavior
[Figure: Search for local minima in conformation space. Suboptimal conformations S_h with free energy ΔG above the minimum ΔG0; local minima S_k are connected through saddle points T_k along a "reaction coordinate" and summarized in a "barrier tree".]
I1 = ACUGAUCGUAGUCAC
Example of an inefficiently folding small RNA molecule with n = 15
I2 = AUUGAGCAUAUUCAC
Example of an easily folding small RNA molecule with n = 15
I3 = CGGGCUAUUUAGCUG
Example of an easily folding and especially stable small RNA molecule with n = 15
[Figure: Barrier trees of I1, I2, and I3 with the open chain O and the local minima S0, S1, and so on.]
Folding dynamics of the sequence GGCCCCUUUGGGGGCCAGACCCCUAAAAAGGGUC
[Figure: Minimum free energy conformation S0 and suboptimal conformation S1 of this sequence: one sequence is compatible with two structures.]
[Figure: Barrier tree of a sequence with two conformations, S0 and S1.]
Is there experimental evidence for structural multiplicity of RNA sequences?
Are there RNA molecules with multiple functions?
How can RNA molecules with multiple functions be designed?
[Figure: The "hammerhead" ribozyme with its cleavage site.]
The smallest known catalytically active RNA molecule
A ribozyme switch
E.A.Schultes, D.P.Bartel, One sequence, two ribozymes: Implications for the emergence of
new ribozyme folds. Science 289 (2000), 448-452
Two ribozymes of chain length n = 88 nucleotides: an artificial ligase (A) and a natural cleavage
ribozyme of the hepatitis-δ virus (B)
The sequence at the intersection:
An RNA molecule which is 88 nucleotides long and can form both structures
Reference for the definition of the intersection
and the proof of the intersection theorem
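The existence of a sequence that can form two given structures can be made plausible by a short construction. Every base takes part in at most one pair of each structure, so the graph whose edges are the pairs of both structures consists only of paths and of cycles that alternate between the two structures and therefore have even length. Such a graph can be two-coloured, and writing G and C on the two colour classes turns every pair of both structures into a GC or CG pair. The sketch below implements this idea under our own conventions (dot-bracket input, A at positions unpaired in both structures); it demonstrates compatibility with both structures, not that the sequence actually folds into either of them, and it is not taken from the cited reference.

```python
from collections import deque

ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs_from_dot_bracket(structure):
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            pairs.append((stack.pop(), pos))
    return pairs


def sequence_in_intersection(s1, s2):
    """Return a sequence compatible with both dot-bracket structures s1 and s2."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    n = len(s1)
    adjacency = [[] for _ in range(n)]
    for i, j in pairs_from_dot_bracket(s1) + pairs_from_dot_bracket(s2):
        adjacency[i].append(j)
        adjacency[j].append(i)
    colour = [None] * n
    for start in range(n):
        if colour[start] is not None or not adjacency[start]:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:                               # breadth-first two-colouring
            v = queue.popleft()
            for w in adjacency[v]:
                if colour[w] is None:
                    colour[w] = 1 - colour[v]
                    queue.append(w)
                elif colour[w] == colour[v]:
                    raise ValueError("union graph is not bipartite")
    return "".join("A" if c is None else "GC"[c] for c in colour)


def compatible(seq, structure):
    return all((seq[i], seq[j]) in ALLOWED for i, j in pairs_from_dot_bracket(structure))


s1 = "((((....))))...."
s2 = "..((((....)))).."
seq = sequence_in_intersection(s1, s2)
print(seq, compatible(seq, s1), compatible(seq, s2))
```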
Two neutral walks through sequence space with conservation of structure and catalytic activity
Sequence of mutants from the intersection to both reference ribozymes
Reference for postulation and in silico verification of neutral networks
From RNA secondary structures to full three-dimensional structures.
Example: Phenylalanyl-transfer-RNA
What perspectives do RNA structure modelling and elaborate sequence-structure analysis have?
Secondary structures are based on the identification of base pairs with defined and
only marginally varying geometries that fit into A- or A’-type helices. By now,
a great variety of other classifiable base pairs has been found by crystallography
and NMR. They can readily be included in structure prediction methods which are
similar to the current algorithms for conventional secondary structures. What is
needed, however, is the determination of thermodynamic parameters for these
unconventional base-base interactions, as was done in the nineteen-seventies for
DNA and RNA double helical and loop structures. So far such data are scarce,
except for H-type pseudoknots and end-to-end stacking of helices.
It seems that the prediction of RNA structures will be an easier task than that of
proteins.
Classification of purine-pyrimidine base pairs
Classification of purine-purine base pairs
Classification of pyrimidine-pyrimidine base pairs
General classification of base pairs
N.B.Leontis and E. Westhof, RNA 7:499-512 (2001)
Coworkers
Walter Fontana, Santa Fe Institute, NM
Christian Reidys, Christian Forst, Los Alamos National Laboratory, NM
Peter Stadler, Universität Leipzig, GE
Ivo L.Hofacker, Christoph Flamm, Universität Wien, AT
Bärbel Stadler, Andreas Wernitznig, Universität Wien, AT
Michael Kospach, Ulrike Langhammer, Ulrike Mückstein, Stefanie Widder
Jan Cupal, Kurt Grünberger, Andreas Svrček-Seiler, Stefan Wuchty
Ulrike Göbel, Institut für Molekulare Biotechnologie, Jena, GE
Walter Grüner, Stefan Kopp, Jaqueline Weber