Modeling RNA motifs by graph-grammars
François Major
www.iric.ca
MC-Tools: Functions
• ( MC-Annotate 3-D ) -> graph
• ( MC-Cycles graph ) -> [ NCM ]
• ( MC-Seq graph ) -> [ sequence ]
• ( MC-Fold sequence ) -> [ graph ]
• ( MC-Cons [ ( sequence, [ graph ] ) ] ) -> [ graph ]
• ( MC-Search ( graph, [ 3-D ] ) ) -> [ 3-D ]
• ( MC-Sym graph ) -> [ 3-D ]
07.05 - Madison (ROC)
MC-Tools: Objects
(rat 28S rRNA sarcin/ricin stem-loop)
Sequence: GGGUGCUCAGUACGAGAGGAACCGCACCC
[Figure: the graph, its nucleotide cyclic motifs, and the 3-D structure of this stem-loop; the 3-D structure is obtained from the graph by ( MC-Sym graph ) -> [ 3-D ]]
Szewczak et al. PNAS(USA) 1993
Lemieux & Major NAR 2006
Parisien, Thibault & Major (in prep.)
Graph
( MC-Annotate 3-D ) -> graph
Gendron, Lemieux & Major JMB 2001
Lemieux & Major NAR 2002
Leontis & Westhof RNA 2001
Shortest Cycle Basis
( MC-Cycles graph ) -> [ NCM ]
[Figure: a graph of nucleotides X1 to X4 and Y1 to Y3, with the 5’ and 3’ ends marked, and its shortest cycle basis, cycles C1 to C5]
Horton SIAM J Comp 1987
St-Onge et al. NAR 2007
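The shortest cycle basis step can be illustrated with a Horton-style construction (Horton 1987): for every vertex v and edge (x, y), form the candidate cycle made of the shortest path from v to x, the edge (x, y), and the shortest path from y back to v; sort the candidates by length; then keep each one that is linearly independent (over GF(2)) of those already kept. The sketch below is a minimal, unweighted version, assuming an undirected simple graph in which nodes stand for nucleotides and edges for base pairs or backbone links. `shortest_cycle_basis` and its helpers are hypothetical names; this is not the MC-Cycles implementation.

```python
from collections import deque


def _edge(u, v):
    """Store undirected edges with their endpoints in sorted order."""
    return (u, v) if u <= v else (v, u)


def _bfs_paths(adj, root):
    """Shortest path (list of nodes) from `root` to every reachable node, via BFS."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in parent:
                parent[w] = u
                queue.append(w)
    paths = {}
    for node in parent:
        path = [node]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        paths[node] = path[::-1]
    return paths


def _path_edges(path):
    return {_edge(a, b) for a, b in zip(path, path[1:])}


def _components(adj):
    """Number of connected components (needed for the cyclomatic number)."""
    seen, count = set(), 0
    for start in adj:
        if start not in seen:
            count += 1
            stack = [start]
            seen.add(start)
            while stack:
                for w in adj[stack.pop()]:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
    return count


def shortest_cycle_basis(edges):
    """Return a minimum-length cycle basis as a list of frozensets of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    edge_list = sorted({_edge(u, v) for u, v in edges})
    bit = {e: i for i, e in enumerate(edge_list)}
    dimension = len(edge_list) - len(adj) + _components(adj)  # cyclomatic number

    # Horton candidates: P(v, x) + (x, y) + P(y, v) when the two paths meet only at v
    candidates = set()
    for v in adj:
        paths = _bfs_paths(adj, v)
        for x, y in edge_list:
            if x in paths and y in paths and set(paths[x]) & set(paths[y]) == {v}:
                tree = _path_edges(paths[x]) | _path_edges(paths[y])
                if (x, y) not in tree:  # skip degenerate "cycles" that reuse a tree edge
                    candidates.add(frozenset(tree | {(x, y)}))

    basis, pivots = [], {}
    for cycle in sorted(candidates, key=lambda c: (len(c), sorted(c))):
        if len(basis) == dimension:
            break
        vec = sum(1 << bit[e] for e in cycle)
        # Gaussian elimination over GF(2): keep the cycle only if it is independent
        while vec:
            top = vec.bit_length() - 1
            if top not in pivots:
                pivots[top] = vec
                basis.append(cycle)
                break
            vec ^= pivots[top]
    return basis
```

For a square with one diagonal, this returns the two triangles rather than the four-edge outer cycle, since the basis minimizes total cycle length.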
The Nucleotide Cyclic Motifs (NCM)
i. Include all base-pairing types indiscriminately (Watson-Crick and others).
ii. Designate precisely how each nucleotide in the sequence relates to the others.
iii. Are joined through a common base pair (context). This helps us predict coherent chains of NCMs and project them in 3-D. Tentative definition of a motif: an “ordered” chain of NCMs.
iv. Recur within and across all RNAs.
v. Are short (fewer than 10 nts; most have 3 to 5 nts).
vi. Compose the classical motifs (e.g. the GNRA tetraloop, the sarcin/ricin motif). There are exceptions (e.g. the AA platform).
Lemieux & Major (2006) NAR 34:2340
Parisien, Thibault & Major (in prep.)
Aim
We want a computational model that can
encode the valid sequences and structural
features of RNA motifs.
Hypothesis: A relation between the sequence
and the structure of RNA motifs exists.
Graph Grammars
• A graph grammar is to a set of graphs what a formal generative
grammar is to a set of strings, i.e. a precise and formal description
of that set.
• A graph-grammar consists of a set of rules or productions for
transforming graphs.
• Formally, a graph grammar, $H = \{N, \Sigma, P\}$, consists of a set of
non-terminal symbols, $N$, a set of terminal symbols, $\Sigma$, and a set of
production rules, $P$.
Hypothesis: NCMs are “independent” building blocks.
Nagl Computing 1976
Nagl In H. Ehrig et al., eds 1987
St-Onge et al. NAR 2007
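As a toy illustration of $H = \{N, \Sigma, P\}$, the sketch below encodes a deliberately simplified node-relabeling grammar: each production rewrites a node labeled with a non-terminal into a node labeled with a terminal, and edges are left untouched. General graph grammars (Nagl) also rewrite edges and whole subgraphs, so this is only a sketch under that restrictive assumption; the class `GraphGrammar` and its fields are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class GraphGrammar:
    """Hypothetical H = {N, Sigma, P} restricted to node relabeling."""
    nonterminals: frozenset  # N
    terminals: frozenset     # Sigma
    productions: dict        # P: non-terminal -> tuple of terminals it may become

    def __post_init__(self):
        # Every production must rewrite a non-terminal into known terminals
        for lhs, rhs in self.productions.items():
            if lhs not in self.nonterminals:
                raise ValueError(f"{lhs!r} is not a non-terminal")
            if not set(rhs) <= self.terminals:
                raise ValueError(f"production for {lhs!r} uses unknown terminals")

    def derive(self, labels):
        """Yield every fully terminal labelling of a graph's nodes (node -> label).

        Only node labels are rewritten; the graph's edges stay as they are.
        """
        nodes = sorted(labels)
        choices = [
            self.productions.get(labels[n], ()) if labels[n] in self.nonterminals
            else (labels[n],)
            for n in nodes
        ]
        for combo in product(*choices):
            yield dict(zip(nodes, combo))


if __name__ == "__main__":
    h = GraphGrammar(frozenset({"C1", "C2"}), frozenset({"a", "b", "c"}),
                     {"C1": ("a", "b"), "C2": ("c",)})
    for result in h.derive({1: "C1", 2: "C2", 3: "c"}):
        print(result)
    # prints {1: 'a', 2: 'c', 3: 'c'} then {1: 'b', 2: 'c', 3: 'c'}
```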
Sarcin/Ricin Graph Grammar
$N = \{C_1, C_2, \ldots, C_5\}$, the set of NCMs;
$\Sigma = \{S_1, S_2, \ldots, S_5\}$, the sets of sequences for each NCM;
$P$ is a set of consistent assignments of the sequences in $\Sigma$ to the NCMs in $N$ (production rules).
[Figure: three production rules (⇒), with examples from yeast tRNA, H. marismortui 23S rRNA and E. coli 16S rRNA]
St-Onge et al. NAR 2007
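One way to read “consistent assignment” is as a constraint problem: each NCM covers a fixed tuple of positions, neighboring NCMs share positions (their common base pair), and an assignment is consistent when shared positions receive the same nucleotide. The backtracking sketch below enumerates such assignments under that assumption; the function name, the position layout and the toy sequence sets are hypothetical and are not the sarcin/ricin data.

```python
def consistent_assignments(ncms, sequences):
    """Yield position -> nucleotide maps for every consistent choice of NCM sequences.

    ncms: hypothetical layout, NCM name -> tuple of sequence positions it covers.
    sequences: NCM name -> iterable of strings, one nucleotide per covered position.
    """
    names = sorted(ncms)

    def extend(i, partial):
        if i == len(names):
            yield dict(partial)
            return
        positions = ncms[names[i]]
        for seq in sorted(sequences[names[i]]):
            if len(seq) != len(positions):
                raise ValueError(f"{names[i]}: {seq!r} does not fit {len(positions)} positions")
            # A shared position (common base pair) must keep the nucleotide already assigned
            if all(partial.get(p, nt) == nt for p, nt in zip(positions, seq)):
                added = [p for p in positions if p not in partial]
                partial.update(zip(positions, seq))
                yield from extend(i + 1, partial)
                for p in added:  # backtrack
                    del partial[p]

    yield from extend(0, {})


if __name__ == "__main__":
    # Toy example: NCMs "A" and "B" share position 2.
    ncms = {"A": (0, 1, 2), "B": (2, 3, 4)}
    sequences = {"A": {"GAU", "GAC"}, "B": {"UGA", "CCA", "GGG"}}
    for a in consistent_assignments(ncms, sequences):
        print("".join(a[p] for p in sorted(a)))
    # prints GACCA, then GAUGA
```

Because each NCM is checked against the positions already fixed by the previous ones, a branch is abandoned as soon as a shared position disagrees, rather than after building the full product of all sequence sets.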
Sarcin/Ricin Building Blocks
NCM           Theoretical      IMs (isostericity matrices)   PDB
C1            256 (16 x 16)    120 (10 x 12)                 7
C2            64 (16 x 4)      40 (10 x 4)                   5
C3            64 (16 x 4)      56 (14 x 4)                   2
C4            256 (16 x 16)    160 (16 x 10)                 3
C5            64 (16 x 4)      40 (10 x 4)                   8
(unlabeled)   16               10                            15
St-Onge et al. NAR 2007
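The factors in the Theoretical column are consistent with counting 16 (4 x 4) combinations per base pair and 4 per unpaired nucleotide, so that C1 = 16 x 16 = 256 and C2 = 16 x 4 = 64; under the same assumption, the unlabeled block (16) would be a single base pair. The sketch below reproduces that arithmetic and multiplies out the IMs factors exactly as given in the table; it does not derive the IMs factors themselves, and the per-NCM composition is an assumption read off the factors.

```python
from math import prod

PAIR_COMBINATIONS = 4 * 4    # assumption: any of 4 bases on each side of a base pair
UNPAIRED_COMBINATIONS = 4    # assumption: any of 4 bases at an unpaired position


def theoretical_count(n_pairs, n_unpaired):
    """Number of base combinations for a block with the given composition."""
    return PAIR_COMBINATIONS ** n_pairs * UNPAIRED_COMBINATIONS ** n_unpaired


# Composition implied by the Theoretical factors (16 x 16 -> 2 pairs, 16 x 4 -> 1 pair + 1 base)
composition = {"C1": (2, 0), "C2": (1, 1), "C3": (1, 1), "C4": (2, 0), "C5": (1, 1)}
theoretical = {name: theoretical_count(*c) for name, c in composition.items()}

# IMs factors copied from the table; only their products are computed here
ims_factors = {"C1": (10, 12), "C2": (10, 4), "C3": (14, 4), "C4": (16, 10), "C5": (10, 4)}
ims = {name: prod(f) for name, f in ims_factors.items()}

if __name__ == "__main__":
    for name in sorted(theoretical):
        print(name, theoretical[name], ims[name])
    # prints C1 256 120, C2 64 40, C3 64 56, C4 256 160, C5 64 40 (one per line)
```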
( MC-Seq sarcin-ricin-graph ) -> [ sequence ]
Sequences supported by the NCMs in the PDB:
AGUA-GAA
AGUA-AAA
GGUA-GAA
GGUA-AAA
If we remove the instances of the sarcin/ricin motif found by
( MC-Search ( sarcin-ricin-graph, [ PDB ] ) ) -> [ 3-D ],
the same four sequences are still supported.
=> The NCMs also occur outside the sarcin/ricin context.
Larose et al. (in prep.)
St-Onge et al. NAR 2007
Graph Grammar Parsing
806 sequences aligned according to the E. coli 23S rRNA structure; site: positions 204-207 / 189-191.
Westhof (personal comm.)
St-Onge et al. NAR 2007
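Parsing here amounts to asking whether an aligned sequence can be produced by the grammar. Assuming each NCM reads a fixed set of positions from the site (written as on the slides, two strands joined by “-”, without alignment gaps), a minimal check extracts each NCM's window and tests membership in that NCM's sequence set. The functions `parses` and `accepted`, the layout and the sets in the usage are hypothetical toy values, not those of the 23S rRNA site.

```python
def parses(site, ncms, sequences):
    """Return True if every NCM window of `site` belongs to that NCM's sequence set.

    site: strands written as on the slides, e.g. "CAGU-ACG" ("-" separates strands).
    ncms: hypothetical layout, NCM name -> positions in the concatenated strands.
    sequences: NCM name -> set of accepted strings (one letter per position).
    """
    nts = site.replace("-", "")
    if any(p >= len(nts) for positions in ncms.values() for p in positions):
        return False  # site too short for the assumed layout
    return all(
        "".join(nts[p] for p in positions) in sequences[name]
        for name, positions in ncms.items()
    )


def accepted(aligned_sites, ncms, sequences):
    """Keep the aligned sites that the grammar can produce."""
    return [s for s in aligned_sites if parses(s, ncms, sequences)]


if __name__ == "__main__":
    # Toy grammar: NCM "X" reads positions 0, 1, 6 and NCM "Y" reads 2 to 5.
    ncms = {"X": (0, 1, 6), "Y": (2, 3, 4, 5)}
    sequences = {"X": {"CAG"}, "Y": {"GUAC"}}
    print(accepted(["CAGU-ACG", "CAGU-ACC", "CAG-ACG"], ncms, sequences))
    # prints ['CAGU-ACG']
```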
Validation
(MC-Seq vs. PDB vs. Alignment)
[Figure: comparison of the sequence sets from the isostericity matrices, MC-Seq, the PDB, and the 5S, 16S and 23S rRNA alignments; one set is labeled “10 000 sequences”]
Sequences shown:
GGUA-AAA
AGUA-AAA
AGUA-GAA
GGUA-GAA
AAUA-AAA
AAUA-GAA
ACUA-AAA
ACUA-GAA
ACUA-GAC
AGUA-AAC
AGUA-CAA
AGUA-GAC
AGUA-GAU
AGUA-GCC
AGUA-GGG
AGUA-GUG
AGUC-GAA
AUUA-GAA
CGUA-GAA
GAUA-GAA
GGUA-GAU
GUUA-GAA
UGUA-GAA
UGUA-GAC
St-Onge et al. NAR 2007
Perspectives
• We want to develop a version of MC-Seq that would
be useful during the alignment process.
• The PDB does not seem to contain enough structural
information yet.
• To avoid generating too many sequences, the NCM context (the base pairs
shared between NCMs) is necessary.
• Two more things need to be considered…
Sarcin/Ricin
(Sequence/Structure Space Is Not Simple)
St-Onge et al. (in prep.)
Modeling In 3-D Might Be Necessary
MC-Fold: CAUU-AAG (2.1 Å)
Alignment: AUUA-GAA (0.9 Å)
St-Onge et al. NAR 2007
Acknowledgments
Martin Larose (Res. assistant)
Philippe Thibault (Res. assistant)
Patrick Gendron (Res. assistant)
Romain Rivière (Postdoc, CS)
Véronique Lisi (Ph.D. Molecular Biology)
Marc Parisien (Ph.D. Computer Science)
Emmanuelle Permal (Ph.D. Bioinformatics)
Karine St-Onge (Ph.D. Computer Science)
Louis-Philippe Lavoie (M.Sc. Bioinformatics)
Maxime Caron (M.Sc. Bioinformatics)
Caroline Louis-Jeune (M.Sc. Bioinformatics)
Montréal:
Pascal Chartrand
Gerardo Ferbeyre
Sylvie Hamel
Sébastien Lemieux
Pascale Legault
Luc Desgroseillers
Kathy Borden
Daniel Lamarre
Éric Westhof (Strasbourg)
Alain Denise (Paris)
Dave Mathews (Rochester)