De novo glycan structure search with CID MS/MS spectra of native N-glycopeptides
18.12.2008
Hannu Peltoniemi
De novo vs database matching
Database matching
Unknown glycan -> MS2 spectrum -> matching against glycan database -> Best scoring glycan(s) in the DB
• Only structures that are in the DB can be found
• OK if the DB is comprehensive
• If the glycan is not in the DB, the result may be the closest matching
(wrong) structure or no result at all
De novo
Unknown glycan -> MS2 spectrum -> On-the-fly structure generation and matching -> Best scoring glycans
• No database -> new structures can also be found!
• Computationally intensive, requires high-quality spectra
• Typically no definite answer, but a set of high-scoring
structures.
De novo structure search
Part of the N-glycopeptide workflow:
Joenväärä et al., N-Glycoproteomics - An automated workflow approach.
Glycobiology 2008, 18(4):339-349.
Input: Protonated, deconvoluted MS2 spectra
Steps:
1) identification of peptides
2) identification of N-glycan compositions
3) identification of de novo N-glycan structures
(branching, no linkage)
Input data
Spectrum with annotated glycopeptide and glycan
composition fragments.
Example data
Peptide: QDQCIYNTTYLNVQR
Glycan composition: 6 Hex 5 HexNAc 3 NeuAc
Same data, different view:
Composition: 6 Hex 5 HexNAc 3 NeuAc
[Figure: measured fragment compositions on Hex (0-6) vs HexNAc (0-5) grids, one panel per NeuAc count (NeuAc=0, 1, 2, 3), marking glycan fragments attached to peptide and free glycans.]
The puzzle
[Figure: measured fragment compositions -> unknown structure (?)]
• All the measured fragment compositions of an
unknown structure with the given total composition
are known
• Some theoretical fragments may be missing
• Some measured fragments may be false
Which structure best explains the data?
Solution
The problem is split into two phases
1) Generation of possible structures:
Structures are grown starting from the N-glycan core.
The population size is limited by removing the
structures with the lowest fit to the peptide+glycan
fragments
2) Scoring:
The set of structures is scored with the full data.
The final glycopeptide score is the sum of the
peptide and glycan structure scores.
Initialization
Example data: peptide + 5 Hex 4 HexNAc
[Figure: measured vs theoretical fragments]
The misfit (cost) between a theoretical structure and the measured
data is defined as the number of theoretical and measured
fragments that do not match.
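The cost definition above can be illustrated with a minimal sketch. It assumes that "not matching" means fragments present in exactly one of the two sets (a symmetric difference); the slides do not state whether both directions are counted. The function name misfit_cost and the (Hex, HexNAc, NeuAc) tuple encoding are hypothetical, chosen only for illustration.

```python
from typing import AbstractSet, Tuple

# Assumed encoding: a fragment composition is a (Hex, HexNAc, NeuAc) count tuple.
Composition = Tuple[int, int, int]


def misfit_cost(theoretical: AbstractSet[Composition],
                measured: AbstractSet[Composition]) -> int:
    """Count fragments that occur in exactly one of the two sets."""
    return len(theoretical ^ measured)


theoretical = frozenset({(3, 2, 0), (4, 2, 0), (3, 3, 0)})
measured = frozenset({(3, 2, 0), (3, 3, 0), (5, 4, 0)})

# (4, 2, 0) is predicted but not measured; (5, 4, 0) is measured but not predicted.
cost = misfit_cost(theoretical, measured)
print(cost)  # 2
```

A cost of 0 means the theoretical fragments and the measured fragments coincide exactly.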
Growing structures
Start (core) -> add unit -> add unit -> add unit -> add unit -> End (final composition)
If the population grows too large, the structures with the
highest cost are removed.
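The growing step reads like a beam search: add one unit at a time and keep only the lowest-cost candidates. The sketch below is a toy version under several assumptions that are not from the slides: only Hex and HexNAc are modelled, the slot limits in MAX_CHILDREN are made up, only peptide-attached fragments are used, and partial structures are compared only against measured fragments that are no larger than themselves.

```python
from itertools import product

# Assumed toy model: a tree is (name, children), children being a sorted tuple
# of trees; a composition is a (Hex, HexNAc) count pair. Limits are hypothetical.
UNIT = {"Hex": (1, 0), "HexNAc": (0, 1)}
MAX_CHILDREN = {"Hex": 2, "HexNAc": 1}


def rooted_compositions(tree):
    """Compositions of all fragments that keep the root (peptide-attached end)."""
    name, children = tree
    # Per child branch: cut it off entirely, or keep any rooted part of it.
    options = [[(0, 0)] + list(rooted_compositions(c)) for c in children]
    return {
        (UNIT[name][0] + sum(c[0] for c in choice),
         UNIT[name][1] + sum(c[1] for c in choice))
        for choice in product(*options)
    }


def composition(tree):
    """Total composition: the uncut tree is the largest rooted fragment."""
    return max(rooted_compositions(tree), key=sum)


def add_unit(tree, unit):
    """Yield every tree made by attaching one `unit` leaf to a node with a free slot."""
    name, children = tree
    if len(children) < MAX_CHILDREN[name]:
        yield (name, tuple(sorted(children + ((unit, ()),))))
    for i, child in enumerate(children):
        for grown in add_unit(child, unit):
            yield (name, tuple(sorted(children[:i] + (grown,) + children[i + 1:])))


def cost(tree, measured):
    """Misfit against the measured fragments a structure of this size could explain."""
    size = sum(composition(tree))
    reachable = {m for m in measured if sum(m) <= size}
    return len(rooted_compositions(tree) ^ reachable)


def grow(core, target, measured, max_population):
    """Grow from `core` to the `target` composition, pruning high-cost structures."""
    population = {core}
    for _ in range(sum(target) - sum(composition(core))):
        grown = set()
        for tree in population:
            have = composition(tree)
            # UNIT keys are in the same order as the composition tuple.
            for index, unit in enumerate(UNIT):
                if have[index] < target[index]:
                    grown.update(add_unit(tree, unit))
        ranked = sorted(grown, key=lambda t: cost(t, measured))
        population = set(ranked[:max_population])
    return sorted(population, key=lambda t: cost(t, measured))


arm = ("Hex", ())
core = ("HexNAc", (("HexNAc", (("Hex", (arm, arm)),)),))
true_arm = ("Hex", (("HexNAc", ()),))
true_structure = ("HexNAc", (("HexNAc", (("Hex", (true_arm, true_arm)),)),))

measured = rooted_compositions(true_structure)  # noise-free toy "spectrum"
ranked = grow(core, target=(3, 4), measured=measured, max_population=2)
print(ranked[0] == true_structure, cost(ranked[0], measured))
```

In this noise-free toy case the structure with one HexNAc on each arm is the only candidate with cost 0, so it ranks first even with a population limit of 2.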
Scoring
[Figure: candidate structures ranked from highest scoring to lowest scoring]
The score is calculated as -log10(P), where P is the (binomial)
probability that a random set of fragments would match as
well as or better than the ranked structure.
The final glycopeptide score is the sum of the peptide and
structure scores.
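A minimal sketch of such a score, assuming P is the upper tail of a binomial distribution: the chance that at least n_matched of n_fragments fragments match when each matches at random with probability p_random. How the slides' method chooses the number of trials and the per-fragment probability is not stated, so all three parameters and the function names here are assumptions.

```python
import math


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def structure_score(n_fragments: int, n_matched: int, p_random: float) -> float:
    """-log10 of the chance that a random match is as good as or better."""
    # Note: for extremely small tails the probability can underflow to 0.0,
    # in which case math.log10 raises ValueError.
    return -math.log10(binomial_tail(n_fragments, n_matched, p_random))


def glycopeptide_score(peptide_score: float, glycan_structure_score: float) -> float:
    """Final score as the sum of the peptide and structure scores."""
    return peptide_score + glycan_structure_score


# All 10 fragments matched, each with an assumed 10 % chance of a random match.
score = structure_score(n_fragments=10, n_matched=10, p_random=0.1)
print(round(score, 6))  # 10.0
```

Requiring zero matches gives P = 1 and therefore a score of 0, so the score grows only as the match becomes harder to explain by chance.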
Assumptions
• All glycosidic bonds can be broken
• Unlimited number of cuts
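Under these two assumptions, every connected piece of the glycan tree is a possible fragment. The sketch below enumerates the fragment compositions for a toy tree, assuming a (name, children) tuple encoding and (Hex, HexNAc) count pairs; both the encoding and the function names are hypothetical. Fragments that keep the root are treated as attached to the peptide, and connected pieces without the root as free glycans.

```python
from itertools import product

# Assumed encoding: tree = (name, tuple of child trees); composition = (Hex, HexNAc).
UNIT = {"Hex": (1, 0), "HexNAc": (0, 1)}


def rooted_compositions(tree):
    """Compositions of connected pieces that contain the root of `tree`."""
    name, children = tree
    # Each child branch is either cut off or contributes one of its rooted pieces.
    options = [[(0, 0)] + list(rooted_compositions(c)) for c in children]
    return {
        (UNIT[name][0] + sum(c[0] for c in choice),
         UNIT[name][1] + sum(c[1] for c in choice))
        for choice in product(*options)
    }


def subtrees(tree):
    """Yield the tree itself and every subtree below it."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)


def free_compositions(tree):
    """Compositions of connected pieces that do not contain the root."""
    found = set()
    for child in tree[1]:
        for sub in subtrees(child):
            found |= rooted_compositions(sub)
    return found


arm = ("Hex", ())
core = ("HexNAc", (("HexNAc", (("Hex", (arm, arm)),)),))  # 3 Hex, 2 HexNAc

attached = rooted_compositions(core)  # peptide + glycan fragments
free = free_compositions(core)        # free glycan fragments
print(sorted(attached))
print(sorted(free))
```

Because the number of cuts is unlimited, internal pieces such as a single core Hex also appear among the free fragments.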
Options
• Monosaccharide names
• Number of possible connections with each
monosaccharide
• Accepted connections between
monosaccharides
• Start structures (N-glycan cores)
• Max population size when growing structures
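The options listed above could be collected in one configuration object. The sketch below is purely illustrative: the class name, field names and every default value (connection counts, accepted pairs, core label, population size) are placeholders, not settings taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class SearchOptions:
    """Hypothetical container for the search options; all defaults are placeholders."""
    monosaccharides: Tuple[str, ...] = ("Hex", "HexNAc", "NeuAc")
    # Maximum number of units that may be attached to each monosaccharide.
    max_connections: Dict[str, int] = field(
        default_factory=lambda: {"Hex": 3, "HexNAc": 2, "NeuAc": 0})
    # Accepted (parent, child) connections.
    accepted_connections: FrozenSet[Tuple[str, str]] = frozenset({
        ("HexNAc", "HexNAc"), ("HexNAc", "Hex"),
        ("Hex", "Hex"), ("Hex", "HexNAc"), ("Hex", "NeuAc"),
    })
    start_structures: Tuple[str, ...] = ("N-glycan core",)
    max_population: int = 1000

    def allows(self, parent: str, child: str, used_slots: int) -> bool:
        """True if `child` may be attached to a `parent` with `used_slots` taken."""
        return ((parent, child) in self.accepted_connections
                and used_slots < self.max_connections[parent])


options = SearchOptions()
print(options.allows("Hex", "NeuAc", used_slots=0))  # True
print(options.allows("NeuAc", "Hex", used_slots=0))  # False
```

Keeping the chemistry rules in data rather than in code would let the same growing routine be reused with other monosaccharide sets or cores.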
Testing with in silico generated data
structure -> fragmentation -> theoretical spectrum -> randomly removing fragments and adding noise fragments -> randomized spectrum -> input to the de novo algorithm
[Figure: theoretical and randomized spectra on Hex vs HexNAc grids for NeuAc=0, 1, 2, 3, showing peptide+glycan and glycan fragments marked x.]
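The randomization used for the in silico test (remove some theoretical fragments, add some noise fragments) can be sketched as follows. This is a simplified illustration: it removes a single fraction of all fragments, whereas the test results distinguish reducing and non reducing end fragments, and the noise pool of all (Hex, HexNAc) pairs on a small grid is an assumption. The function and parameter names are hypothetical.

```python
import random


def randomize_spectrum(theoretical, remove_fraction, n_noise, noise_pool, rng):
    """Drop a fraction of the theoretical fragments and add noise fragments."""
    n_keep = round(len(theoretical) * (1 - remove_fraction))
    # Sort before sampling so the result depends only on the seed.
    kept = rng.sample(sorted(theoretical), n_keep)
    # Noise fragments are drawn only from compositions the structure cannot produce.
    candidates = sorted(set(noise_pool) - set(theoretical))
    noise = rng.sample(candidates, min(n_noise, len(candidates)))
    return set(kept) | set(noise)


# Toy theoretical spectrum as (Hex, HexNAc) compositions.
theoretical = {(0, 1), (0, 2), (1, 2), (2, 2), (2, 3), (3, 2), (3, 3), (3, 4)}
noise_pool = {(h, n) for h in range(7) for n in range(6)}

spectrum = randomize_spectrum(theoretical, remove_fraction=0.5, n_noise=2,
                              noise_pool=noise_pool, rng=random.Random(0))
print(len(spectrum), len(spectrum & theoretical))  # 6 4
```

Passing a seeded random.Random makes each run reproducible, which helps when repeating a condition many times.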
Results of the in silico tests
[Figure: two panels, "Correct structure with rank 1" and "Correct structure with rank 3". Y axis: results matching the criteria (%), percentage of runs, 0 to 100. X axis: removed reducing end fragments (%), 20 to 80; a second label row gives removed reducing, non reducing end fragments (%) as (20,40), (40,60), (60,80), (80,100). Series: no noise, 2 noise fragments, 4 noise fragments. Each mark is the result of 100 runs.]
If about ½ of the theoretical fragments are present
=> the correct structure is among the few
highest-scoring ones.
Testing with serum sample
• Very complex wet lab data set, i.e. a human
serum specimen
• Removal of the high-abundance proteins prior to
LC-MS/MS
• 80 spectra with identified peptide and
glycan compositions
• 62 spectra with putative structures
• Mostly typical structures
• Mostly small structures; large ones seem to
be hard to catch
Example serum spectrum
Serum, m/z=1194.93, z=4
Three best scoring structures.
Glycan is attached to peptide
QDQCIYNTTYLNVQR
(Alpha-1-acid glycoprotein 1).
Scores: 73.2, 72.8, 72.6
Measured and theoretical fragments for the best scoring structure.
[Figure: Hex (0-6) vs HexNAc (0-5) grids for NeuAc=0, 1, 2, 3, shown for reducing end fragments (attached to peptide) and non reducing end fragments (free glycans). X: theoretical, O: measured.]
Structures found from the serum sample
ANT3(224,187), FIBG(78), THRB(121), A1AG1(56), FETUA(156), HPT(241), HRG(344), FIBB(394),
TRFE(630), IGHA1(144), A1AT(70,107,271), { VINEX(102), HPTR(126) }
FIBG(78), HRG(344), IGHA1(144)
VTNC(169)
IGHG1(180), IGHG2(176)
IGHA1(144)
A1AG1(93)
IGHG2(176)
IGHA1(144)
CO2(621), CO3(85)
IGHG2(176)
IGHA1(144)
CO3(85)
Conclusions
• De novo glycan structure identification of
intact glycopeptides is possible
• High-quality spectra are necessary
• Typically no definite answer but a few
structures matching equally well => biological
insight still needed if one identified structure
needs to be picked