iasl.iis.sinica.edu.tw

Jia-Ming Chang 0508
Graph Algorithms and Their Applications to Bioinformatics
1/38
Determine Protein Structure

X-ray
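• Wavelength of about 1 Å, close to the distance between atoms; used to study the behavior of molecules in the crystalline state and to determine crystal structures, including protein structures.
• X-ray and structural biology
• X-ray diffraction is used to determine the spatial position of every group and atom in a highly purified, crystallized protein.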

Nuclear magnetic resonance (NMR)
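• NMR involves absorption by atomic nuclei. Certain nuclei have spin and a magnetic moment, so when exposed to a strong magnetic field they absorb electromagnetic radiation; this results from the energy-level splitting induced by the field. Scientists also found that the molecular environment affects how nuclei in a magnetic field absorb radio waves, and this property is used to analyze molecular structure.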
AVANCE 800 AV IBMS, Sinica
2/38
NMR – Nuclear Spin (1/5)
3/38
NMR – Nuclear Spin (2/5)
4/38
NMR - Magnetic Field (3/5)
5/38
NMR – Resonance (4/5)
6/38
NMR – Chemical Shift (5/5)
7/38
Chemical Shift Assignment (1/2)
Find the chemical shift of each atom
• Backbone: Ca, Cb, C’, N, NH
• Experiments: HSQC, CBCANH, CBCA(CO)NH
[Figure: one amino acid, with atoms labeled N, H, Ca, H, CO, CbH2, CgH2, CdH3]
8/38
Chemical Shift Assignment (2/2)
[Figure: peptide backbone (-N-C-C-N-C-C-N-C-C-N-C-C-) with CH3 and CH2 side-chain groups, annotated with chemical-shift ranges in ppm: 16-20, 17-23, 18-23, 19-24, 30-35, 31-34, 55-60]
9
HSQC Spectra

HSQC peaks (1 chemical shift for each amino acid)
H      N       Intensity
8.109  118.60  65920032
10
CBCA(CO)NH Spectra

CBCA(CO)NH peaks (2 chemical shifts for one amino acid)
H      N       C      Intensity
8.116  118.25  16.37  79238811
8.109  118.60  36.52  65920032
11
CBCANH Spectra

CBCANH peaks (4 chemical shifts for one amino acid)
• Ca (+), Cb (-)
H      N       C      Intensity
8.116  118.25  16.37  79238811
8.109  118.60  36.52  -65920032
8.117  118.90  61.58  -51223894
8.119  117.25  57.42  109928374
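The sign rule on this slide (Ca positive, Cb negative) can be sketched in a few lines of Python. The `Peak` type and the `carbon_type` function are hypothetical names introduced only for illustration; the peak values are the four rows listed on the slide.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    """One CBCANH peak (hypothetical container for illustration)."""
    h: float
    n: float
    c: float
    intensity: float


def carbon_type(peak: Peak) -> str:
    """Apply the slide's rule: positive intensity is Ca, negative is Cb."""
    return "Ca" if peak.intensity > 0 else "Cb"


# The four CBCANH peaks listed on the slide.
peaks = [
    Peak(8.116, 118.25, 16.37, 79238811),
    Peak(8.109, 118.60, 36.52, -65920032),
    Peak(8.117, 118.90, 61.58, -51223894),
    Peak(8.119, 117.25, 57.42, 109928374),
]

labels = [carbon_type(p) for p in peaks]
print(labels)
```

The sign alone does not say whether a carbon belongs to residue i or residue i-1; that distinction needs the CBCA(CO)NH peaks.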
12
A Dataset Example
[Figure: numbered peaks from the HSQC, HNCACB, and CBCA(CO)NH spectra, plotted with axis labels N and H]
13/38
A Perfect Spin System Group
CBCA(CO)NH
N        H      C       Intensity      Residue
113.293  7.897  56.294  1.64325e+008   i-1
113.293  7.897  27.853  1.08099e+008   i-1
CBCANH
N        H      C       Intensity      Atom
113.293  7.92   62.544  8.52851e+007   Ca
113.293  7.92   56.294  4.71331e+007   Ca
113.293  7.92   68.483  -8.54121e+007  Cb
113.293  7.92   28.165  -3.49346e+007  Cb
Spin system
Cai-1   Cbi-1   Cai     Cbi
56.294  28.165  62.544  68.483
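One way to read this slide is as a small procedure: a CBCANH carbon that also appears in CBCA(CO)NH belongs to residue i-1, the rest belong to residue i, and the intensity sign separates Ca from Cb. The sketch below follows that reading. The function name `build_spin_system` and the 0.5 ppm matching tolerance are assumptions, not values from the slides.

```python
def build_spin_system(cbcaconh_carbons, cbcanh_peaks, tol=0.5):
    """Assemble one spin system from a perfect peak group.

    cbcaconh_carbons: carbon shifts of the two CBCA(CO)NH peaks (residue i-1).
    cbcanh_peaks: (carbon shift, intensity) pairs of the four CBCANH peaks.
    tol: matching tolerance in ppm (assumed value).
    """
    spin_system = {}
    for shift, intensity in cbcanh_peaks:
        atom = "Ca" if intensity > 0 else "Cb"  # sign rule: Ca (+), Cb (-)
        # A carbon also seen in CBCA(CO)NH belongs to the previous residue.
        is_prev = any(abs(shift - ref) <= tol for ref in cbcaconh_carbons)
        key = f"{atom}(i-1)" if is_prev else f"{atom}(i)"
        spin_system[key] = shift
    return spin_system


# Values from the slide.
cbcaconh = [56.294, 27.853]
cbcanh = [
    (62.544, 8.52851e+007),
    (56.294, 4.71331e+007),
    (68.483, -8.54121e+007),
    (28.165, -3.49346e+007),
]

spin_system = build_spin_system(cbcaconh, cbcanh)
print(spin_system)
```

This only works for a perfect group. With missing or extra peaks, a key could be overwritten or left empty, which is where the ambiguities discussed on the following slides come from.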
14
Coding

Translate the target protein sequence
and spin systems into coding sequences
based on the following table.
Atreya, H.S., K.V.R. Chary, and G. Govil, Automated NMR assignments of proteins for high
throughput structure determination: TATAPRO II. Current Science, 2002. 83(11): p. 1372-1376.
15/38
Backbone Assignment

Goal
• Assign chemical shifts to N, NH, Ca (and Cb) along the protein backbone.

General approaches
• Generate spin systems
○ A spin system: an amino acid with known chemical shifts on its N, NH, Ca (and Cb).
• Link spin systems
16/38
Ambiguities
• All 4-point experiments are mixed together
• All 2-point experiments are mixed together
• Each spin system can be mapped to several amino acids in the protein sequence
• False positives, false negatives

17/38
Ambiguous Spin System
N      H     C      Intensity
106.9  8.87  54.92  423879
106.9  8.87  40.35  524522

N       H     C      Intensity
106.91  8.85  59.7   235673
106.92  8.86  54.93  346234
106.91  8.86  61.5   432432
106.91  8.85  40.31  -335759
106.92  8.86  30.5   -483759

Two possible spin systems
N      H     Cai-1  Cbi-1  Cai   Cbi
106.1  8.85  54.93  40.31  59.7  30.5
106.1  8.85  61.5   40.31  59.7  30.5
18
Multiple Candidates
One spin system may be assigned to many positions in a protein sequence.
• Spin system (SS)
N      H     Cai-1  Cbi-1  Cai   Cbi
119.7  8.84  58.4   32.7   56.3  40.8

Protein Sequence:
AKFERQHMDSSTSRNLTKDR
[Figure: four possible positions of the spin system (SS) marked along the sequence]
19
False Positives and False Negatives

False positives
• Noise with high intensity
• Produce fake spin systems

False negatives
• Peaks with low intensity
• Missing peaks

In real wet-lab data, nearly 50% of the data is noise (false positives).
20/38
Spin System Group
[Figure: the numbered peaks from the HSQC, HNCACB, and CBCA(CO)NH spectra, with example groups marked Perfect, False Negative, and False Positive; axis labels N and H]
21/38
Spin System Linking

Goal
• Link spin systems into segments that are as long as possible.

Constraints
• Each spin system is assigned to exactly one position of the target protein sequence.
• Two spin systems are linked only if the differences between their intra-residue and inter-residue chemical shifts are below predefined thresholds.
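The linking constraint can be sketched as a pairwise test: spin system A may precede spin system B when A's own Ca and Cb shifts agree with B's i-1 shifts. The `SpinSystem` type, the `can_link` name, and the 0.5 ppm thresholds are assumptions; the slides only say the thresholds are predefined. The numbers are borrowed from the positioning example in these slides, with the value printed as 55356 read as 55.356 (also an assumption).

```python
from typing import NamedTuple


class SpinSystem(NamedTuple):
    """Carbon shifts of one spin system (hypothetical container)."""
    ca_prev: float  # Ca of residue i-1
    cb_prev: float  # Cb of residue i-1
    ca: float       # Ca of residue i
    cb: float       # Cb of residue i


def can_link(first, second, ca_tol=0.5, cb_tol=0.5):
    """True if `first` may directly precede `second`.

    The thresholds (in ppm) are assumed values, not taken from the slides.
    """
    return (abs(first.ca - second.ca_prev) < ca_tol
            and abs(first.cb - second.cb_prev) < cb_tol)


a = SpinSystem(55.266, 38.675, 44.555, 0.0)
b = SpinSystem(44.417, 0.0, 55.043, 30.04)
c = SpinSystem(44.417, 0.0, 30.665, 28.72)
d = SpinSystem(55.356, 29.782, 60.044, 37.541)  # 55356 read as 55.356

print(can_link(a, b), can_link(b, d), can_link(c, d))
```

Under these thresholds a links to b and b links to d, while c cannot be followed by d, so only one of the two candidates sharing the same i-1 shifts extends into a longer chain.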
22/38
Previous Approaches

Constrained bipartite matching problem*
[Figure: a legal matching and a matching that is illegal under the constraints]
• Cannot handle ambiguous links
*Xu Y, Xu D, Kim D, Olman V, Razumovskaya J, Jiang T. Automated assignment of backbone NMR peaks using constrained bipartite matching. Computing in Science & Engineering 2002;4(1):50-62.
23/38
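Natural Language Processing
─ Noise or Ambiguity?

Speech recognition: homophone selection
台北市一位小孩走失了 (A child in Taipei City has gone missing)
Candidate words:
台北市 (Taipei City), 台北 (Taipei)
小孩 (child)
走失 (gone missing)
適宜 (suitable), 事宜 (matters)
一位 (one person), 一味 (blindly), 移位 (to shift)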
24/38
An Error-Tolerant Algorithm
25
Phrase, Sentence Combination
26
Spin System Positioning
• We assign spin system groups to a protein sequence according to their codes.
Residue codes: D 50, G 10, R 40, I 50|51
Spin system (Cai-1 Cbi-1 Cai Cbi) => codes
55.266 38.675 44.555 0 => 50 10
44.417 0 55.043 30.04 => 10 40
44.417 0 30.665 28.72 => 10 40
55.356 29.782 60.044 37.541 => 40 50
[Figure: each spin system placed under the residue pair of the sequence D G R I that matches its codes]
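Positioning by code can be sketched as a scan over adjacent residue pairs. Only the four residue codes shown on this slide are used, and "I 50|51" is read as "isoleucine accepts code 50 or 51", which is an interpretation. `RESIDUE_CODES` and `candidate_positions` are hypothetical names.

```python
# Residue codes taken from the slide; "50|51" is read as "either code".
RESIDUE_CODES = {"D": {50}, "G": {10}, "R": {40}, "I": {50, 51}}


def candidate_positions(sequence, prev_code, own_code):
    """Return 0-based indices i where residue i-1 accepts prev_code
    and residue i accepts own_code."""
    hits = []
    for i in range(1, len(sequence)):
        if (prev_code in RESIDUE_CODES[sequence[i - 1]]
                and own_code in RESIDUE_CODES[sequence[i]]):
            hits.append(i)
    return hits


sequence = "DGRIG"
print(candidate_positions(sequence, 50, 10))  # spin system coded 50 10
print(candidate_positions(sequence, 10, 40))  # spin system coded 10 40
print(candidate_positions(sequence, 40, 50))  # spin system coded 40 50
```

On the five-residue fragment DGRIG, the spin system coded 50 10 fits both D-G and I-G, which is the kind of multiple-candidate ambiguity that the later conflict handling has to resolve.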
27/38
Link Spin System groups
Sequence: D G R I
Segment 1: 55.266 38.675 44.555 0
Segment 2: 44.417 0 55.043 30.04
Segment 3: 55.356 29.782 60.044 37.541
Other candidate: 44.417 0 30.665 28.72
28/38
Iterative Concatenation
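[Figure: iterative concatenation along the sequence DGRI….FKJJREKL. Step 1 places spin systems (labeled 1, 2, ..., 47, 56) on the sequence; Step 2 through Step n concatenate them into segments (Segment 1, Segment 2, ..., Segment 31, ..., Segment 78, Segment 79, ..., Segment 99).]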
29/38
Conflict Segments
DGRIGEIKGRKTLATPAVRRLAMENNIKLS
[Figure: Segments 71, 78, 79, 97, 98, and 99 laid out along the sequence]
Two kinds of conflict segments
• Overlap (e.g. segment 71 and segment 99)
• Use of the same spin system (e.g. segment 78 and segment 79 both contain spin system 1)
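Both kinds of conflict can be checked with a short predicate. In this sketch a segment is modeled as a start position on the sequence plus the ordered spin system ids it contains. That representation, the names `Segment` and `in_conflict`, and the example values are assumptions made for illustration.

```python
from typing import NamedTuple, Tuple


class Segment(NamedTuple):
    """A chain of linked spin systems placed on the sequence (hypothetical)."""
    start: int                     # 0-based position of the first residue
    spin_systems: Tuple[int, ...]  # spin system ids, one per residue


def in_conflict(a: Segment, b: Segment) -> bool:
    """True if the segments overlap on the sequence or share a spin system."""
    a_end = a.start + len(a.spin_systems)
    b_end = b.start + len(b.spin_systems)
    overlap = a.start < b_end and b.start < a_end
    shared = bool(set(a.spin_systems) & set(b.spin_systems))
    return overlap or shared


# Illustrative segments, not taken from the slides.
seg_a = Segment(0, (1, 5, 9))
seg_b = Segment(10, (1, 7))   # shares spin system 1 with seg_a
seg_c = Segment(2, (4, 6))    # overlaps seg_a at position 2

print(in_conflict(seg_a, seg_b), in_conflict(seg_a, seg_c),
      in_conflict(seg_b, seg_c))
```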
30/38
Independent Set
Subset S of vertices such that no two vertices in S are connected
www.cs.rochester.edu/~stefanko/Teaching/06CS282/06-CSC282-17.ppt
31/38
A Graph Model for Spin System Linking

G(V,E)
• V: a set of nodes (segments).
• E: the pairs (u, v), with u, v ∈ V, such that u and v are in conflict.

Goal
• Assign as many non-conflicting segments as possible => find the maximum independent set of G.
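For a handful of segments, the maximum independent set can be found exactly by brute force, which makes the graph model concrete. This is only a sketch: the problem is NP-hard, so this approach does not scale, and it is not the method used in the slides. The two edges below are the shared-spin-system conflicts (SP13 and SP15) readable from the four-segment example in these slides; the overlap edges are left out because their endpoints cannot be recovered from the transcript.

```python
from itertools import combinations


def max_independent_set(vertices, edges):
    """Exact maximum independent set by exhaustive search (small graphs only)."""
    edge_set = {frozenset(e) for e in edges}
    # Try the largest subsets first; the first conflict-free one is maximum.
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            pairs = combinations(subset, 2)
            if all(frozenset(p) not in edge_set for p in pairs):
                return set(subset)
    return set()


vertices = ["Seg1", "Seg2", "Seg3", "Seg4"]
edges = [("Seg1", "Seg2"), ("Seg3", "Seg4")]  # shared SP13, shared SP15

best = max_independent_set(vertices, edges)
print(best)
```

Here no three segments are mutually conflict-free, so the result has two segments, one from each conflicting pair.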
33
An Example of G
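Segment1: SP12->SP13->SP14
Segment2: SP9->SP13->SP20->SP4
Segment3: SP8->SP15->SP21
Segment4: SP7->SP1->SP15->SP3
Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE
[Figure: conflict graph on Seg1 to Seg4. The edge Seg1-Seg2 is labeled SP13 and the edge Seg3-Seg4 is labeled SP15 (shared spin systems); two further edges are labeled Overlap. The four segments are also drawn along the sequence.]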
34/38
Segment weight
• The longer a segment is, the higher its weight.
• The less frequently a segment occurs, the lower its weight.

35/38
Find Maximum Weight Independent Set of G (1/2)
[Figure labels: V, N(v), Head_N(v)]

Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).
36
Find Maximum Weight Independent Set of G (2/2)
[Figure label: V]

Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).
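The cited paper describes a recursive "Ramsey" procedure that splits the graph around a vertex v into its neighbors and non-neighbors and returns both a clique and an independent set. The sketch below follows that scheme. Comparing candidate sets by total weight, and using segment length as the weight, are assumptions: the slides do not say how weights enter the algorithm. The graph is the two shared-spin-system conflicts from the four-segment example in these slides.

```python
def ramsey(vertices, adj, weight):
    """Return (clique, independent_set) found by the recursive split on a vertex.

    vertices: set of vertex names still under consideration.
    adj: dict mapping each vertex to the set of its neighbors.
    weight: dict mapping each vertex to its weight (assumed: segment length).
    """
    if not vertices:
        return set(), set()
    v = min(vertices)  # deterministic pivot choice
    neighbors = vertices & adj[v]
    non_neighbors = vertices - adj[v] - {v}
    c1, i1 = ramsey(neighbors, adj, weight)
    c2, i2 = ramsey(non_neighbors, adj, weight)

    def total(s):
        return sum(weight[u] for u in s)

    # v extends any clique among its neighbors and any independent set
    # among its non-neighbors.
    clique = max(c1 | {v}, c2, key=total)
    independent = max(i1, i2 | {v}, key=total)
    return clique, independent


adj = {"Seg1": {"Seg2"}, "Seg2": {"Seg1"},
       "Seg3": {"Seg4"}, "Seg4": {"Seg3"}}
weight = {"Seg1": 3, "Seg2": 4, "Seg3": 3, "Seg4": 4}  # number of spin systems

clique, independent = ramsey(set(adj), adj, weight)
print(independent, sum(weight[u] for u in independent))
```

On this graph the sketch returns Seg1 and Seg4 with total weight 7, while the best possible choice is Seg2 and Seg4 with weight 8, which illustrates that the procedure is an approximation rather than an exact solver.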
37
An Iterative Approach


We perform spin system generation
and linking iteratively.
Three stages.
38/38