Faster, More Sensitive Peptide ID by Sequence DB Compression


Proteomics and Glycoproteomics (Bio-)Informatics of Protein Isoforms
Nathan Edwards
Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center
Outline

Tandem mass-spectrometry of peptides
Detection of alternative splicing protein isoforms
Phyloproteomics using top-down mass-spec.
Characterization of glycoprotein microheterogeneity by mass-spectrometry
Mass Spectrometer
Sample
Ionizer
• MALDI
• Electro-Spray Ionization (ESI)
Mass Analyzer
• Time-Of-Flight (TOF)
• Quadrupole
• Ion-Trap
Detector
• Electron Multiplier (EM)
Mass Spectrum
Mass is fundamental
Sample Preparation for MS/MS
Enzymatic Digest and Fractionation
Single Stage MS
Tandem Mass Spectrometry (MS/MS)
Precursor selection
Tandem Mass Spectrometry (MS/MS)
Precursor selection + collision induced dissociation (CID)
Why Tandem Mass Spectrometry?

MS/MS spectra provide evidence for the amino-acid sequence of functional proteins.

Key concepts:
  • Spectrum acquisition is unbiased
  • Direct observation of amino-acid sequence
  • Sensitive to small sequence variations
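To make "direct observation of amino-acid sequence" concrete, here is a minimal sketch of the fragment ladder a peptide is expected to produce under CID. It assumes an unmodified peptide, singly charged fragments, and standard monoisotopic residue masses; the function name by_ions is illustrative and not taken from the talk.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, added to every y-ion
PROTON = 1.007276   # charge carrier for singly charged fragments


def by_ions(peptide):
    """Return (b_ions, y_ions): singly charged m/z lists for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    total = 0.0
    for m in masses[:-1]:            # prefixes b1 .. b(n-1)
        total += m
        b_ions.append(total + PROTON)
    total = 0.0
    for m in reversed(masses[1:]):   # suffixes y1 .. y(n-1)
        total += m
        y_ions.append(total + WATER + PROTON)
    return b_ions, y_ions


# Example: a peptide that appears later in the talk.
b, y = by_ions("NVVFVIDK")
```

A single residue substitution shifts every b-ion after it and every y-ion before it, which is one way to read "sensitive to small sequence variations".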
Unannotated Splice Isoform

Human Jurkat leukemia cell-line
  • Lipid-raft extraction protocol, targeting T cells
  • von Haller, et al. MCP 2003.
LIME1 gene:
  • LCK interacting transmembrane adaptor 1
LCK gene:
  • Leukocyte-specific protein tyrosine kinase
  • Proto-oncogene
  • Chromosomal aberration involving LCK in leukemias.
Multiple significant peptide identifications
Translation start-site correction

Halobacterium sp. NRC-1
  • Extreme halophilic Archaeon, insoluble membrane and soluble cytoplasmic proteins
  • Goo, et al. MCP 2003.
GdhA1 gene:
  • Glutamate dehydrogenase A1
  • Multiple significant peptide identifications
Observed start is consistent with Glimmer 3.0 prediction(s)
Halobacterium sp. NRC-1
ORF: GdhA1
  • K-score E-value vs PepArML @ 10% FDR
  • Many peptides inconsistent with annotated translation start site of NP_279651
[Figure: horizontal axis from 0 to 440]
What if there is no "smoking gun" peptide…
HER2/Neu Mouse Model of Breast Cancer
Paulovich, et al. JPR, 2007
Study of normal and tumor mammary tissue by LC-MS/MS
  • 1.4 million MS/MS spectra
Peptide-spectrum assignments
  • Normal samples (Nn): 161,286 (49.7%)
  • Tumor samples (Nt): 163,068 (50.3%)
4270 proteins identified in total
  • 2-unique generalized protein parsimony
Nascent polypeptide-associated complex subunit alpha
7.3 x 10^-8
Pyruvate kinase isozymes M1/M2
2.5 x 10^-5
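The slides do not say which statistic produced the small values quoted for each protein. As an illustration only, the sketch below shows one way a normal-versus-tumor difference in spectral counts could be scored: an exact two-sided binomial test against the overall split of peptide-spectrum assignments (Nn and Nt from the HER2/Neu slide). The choice of test, the function names, and the example counts are assumptions, not the author's method.

```python
import math

N_NORMAL = 161_286   # peptide-spectrum assignments, normal samples (from the slide)
N_TUMOR = 163_068    # peptide-spectrum assignments, tumor samples (from the slide)
P_NORMAL = N_NORMAL / (N_NORMAL + N_TUMOR)   # null fraction, about 0.497


def _log_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def binomial_two_sided(k, n, p):
    """Exact two-sided p-value: total probability of all outcomes that are
    no more likely than the observed count k."""
    observed = _log_pmf(k, n, p)
    total = sum(math.exp(lp)
                for lp in (_log_pmf(i, n, p) for i in range(n + 1))
                if lp <= observed + 1e-9)
    return min(1.0, total)


# Hypothetical counts for one peptide: 12 spectra in normal, 48 in tumor.
p_value = binomial_two_sided(12, 12 + 48, P_NORMAL)
```

Working in log space keeps the binomial coefficients from overflowing when a peptide has thousands of spectra.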
Phyloproteomics

Fragment intact proteins (top-down MS)

Match the spectra to protein sequences

Place the organism phylogenetically

Works even for unknown microorganisms without any available sequences
CID Protein Fragmentation Spectrum from Y. rohdei
[Figure: chromatogram of yr_inclusion, Time (min) 19.5-25.0, and the averaged FTMS ESI MS/MS spectrum (scans #1937-2437, RT 19.45-24.36, m/z 195.00-2000.00); precursor 756.70, +8, MW 6044.11]
CID Protein Fragmentation Spectrum from Y. rohdei
Match to Y. pestis 50S Ribosomal Protein L32
[Figure: the yr_inclusion chromatogram and MS/MS spectrum repeated (scans #1937-2437, RT 19.45-24.36, m/z 195.00-2000.00); precursor 756.70, +8, MW 6044.11]
Exact match sequence…
Phylogeny: Protein vs DNA
Protein Sequence
16S-rRNA Sequence
What about mixtures?
Identified E. herbicola proteins

DNA-binding protein HU-alpha


m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128
Eight proteins identified with "large" |Δ|
Extract N- and C-terminus sequence supported by at least 3 b- or y-ions
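The slide quotes a precursor m/z, a charge, and a mass difference Δ. As a small worked example, the sketch below converts an observed m/z and charge into a neutral mass and compares it with a mass computed from a candidate sequence. It assumes positive-mode protonated ions; the helper names are illustrative, and the slides do not state whether the quoted m/z is monoisotopic or average.

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Neutral mass of a positive ion observed at m/z with the given charge."""
    return charge * (mz - PROTON)


def mass_delta(mz, charge, sequence_mass):
    """Observed neutral mass minus the mass computed from a database sequence."""
    return neutral_mass(mz, charge) - sequence_mass


# The HU-alpha precursor from the slide: m/z 732.71 at 13+.
observed = neutral_mass(732.71, 13)
```

A "large" |Δ| in this sense means the observed protein differs from the closest database sequence, which is what makes extracting the well-supported terminal sequence worthwhile.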
E. herbicola protein sequences
Phylogenetic placement of E. herbicola
Phylogram
Cladogram
phylogeny.fr – "One-Click"
Glycoprotein Microheterogeneity

Glycosylation is important, but our analytic tools are rather rudimentary
  • Detach glycans (PNGase-F) and analyze glycans
      • Get glycan structures, but no association with protein or protein site, or
  • Detach glycans (PNGase-F) and analyze peptides
      • Get glycosylation sites, but no association with glycan structures.
We analyze glycopeptides directly…
  • Challenges all facets of glycoproteomics
Altered N-Glycosylation in Cancer
Glycosyltransferase Expression or Glycan Analyses
  • Fut-VI (α1-3 Fuc): Higai, 2008
  • Fut-VIII (α1-6 Fuc): Comunale, 2010
  • ST-VI Gal1 (α 2-6 NeuAc): Hedlund, 2008
  • GnT-V (β1-6 GlcNAc): Wang, 2007
[Figure: N-glycan attached at an N-X-S/T site between the NH3+ and COO- termini; legend: GalNAc, Sialic Acid, Gal, GlcNAc, Man]
K. Chandler
The informatics challenge

Identify glycopeptides in large-scale tandem mass-spectrometry datasets
  • Many glycopeptide enriched fractions
  • Many tandem mass-spectra / fraction
Good, but not great, instrumentation
  • QStar Elite – CID, good MS1/MS2 resolution
Strive for hypothesis-generating analysis
  • Site-specific glycopeptide characterization
  • Glycoform occupancy in differentiated samples
CID Glycopeptide Spectrum
Observations

Oxonium ions (204, 366) help distinguish glycopeptides from peptides…
  • …but do little to identify the glycopeptide
Few peptide b/y-ions to identify peptides…
  • …but intact peptide fragments are common
If the peptide can be guessed, then…
  • …the glycan's mass can be determined
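Two of these observations translate directly into arithmetic. The sketch below checks a peak list for the 204 and 366 oxonium ions and, given a guessed peptide mass, subtracts it from the precursor to leave the glycan mass. The exact ion masses (HexNAc 204.0867, Hex-HexNAc 366.1395), the tolerance, and the function names are assumptions for illustration, not details from the talk.

```python
PROTON = 1.007276  # mass of a proton in Da

# Singly charged oxonium ions, rounded on the slide to 204 and 366.
OXONIUM_MZ = {"HexNAc": 204.0867, "HexHexNAc": 366.1395}


def oxonium_ions_present(peak_mzs, tol=0.02):
    """Names of oxonium ions that have a peak within tol (in m/z units)."""
    return [name for name, mz in OXONIUM_MZ.items()
            if any(abs(p - mz) <= tol for p in peak_mzs)]


def glycan_mass(precursor_mz, charge, peptide_mass):
    """Mass left over once a guessed peptide's neutral mass is subtracted.

    The result is the sum of monosaccharide residue masses, because the
    glycan is attached to the peptide with the loss of one water.
    """
    return charge * (precursor_mz - PROTON) - peptide_mass


# Hypothetical peak list from a glycopeptide spectrum.
found = oxonium_ions_present([204.09, 366.14, 512.20])
```

The leftover mass can then be matched against sums of monosaccharide residue masses to propose a composition.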
Haptoglobin (HPT_HUMAN)
Haptoglobin Standard
MVSHHNLTTGATLINE
NLFLNHSE*NATAK
VVLHPNYSQVDIGLIK
• N-glycosylation motif (NX/ST)
* Site of GluC cleavage
Pompach et al. Journal of Proteome Research 11.3 (2012): 1728–1740.
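The NX/ST motif is easy to locate programmatically. The sketch below scans the three haptoglobin peptides from the slide (with the GluC cleavage marker removed) and reports the 1-based position of each motif asparagine. It follows the usual convention that X is any residue except proline, which the slide does not spell out; the function name is illustrative.

```python
import re

# N followed by any residue except P, then S or T. The lookahead keeps
# overlapping motifs from hiding each other.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(peptide):
    """1-based positions of N in N-X-S/T motifs (X is any residue except P)."""
    return [m.start() + 1 for m in SEQUON.finditer(peptide)]


for pep in ("MVSHHNLTTGATLINE", "NLFLNHSENATAK", "VVLHPNYSQVDIGLIK"):
    print(pep, sequon_positions(pep))
# prints:
# MVSHHNLTTGATLINE [6]
# NLFLNHSENATAK [5, 9]
# VVLHPNYSQVDIGLIK [6]
```

The middle peptide carries two motifs, which is why the GluC cleavage site between them matters for assigning a glycan to a single site.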
Tuning the filters…

We estimate the number of false-positives…
…so that the user can tune the search parameters
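The slides do not say how the number of false positives is estimated. One widely used approach in peptide identification is target-decoy counting, sketched below as an illustration of how such an estimate lets a user pick a score threshold. The data layout, the function names, and the choice of target-decoy itself are assumptions, not a description of the tool shown in the talk.

```python
def fdr_at_threshold(psms, threshold):
    """Estimated FDR among target matches with score >= threshold.

    psms is a list of (score, is_decoy) pairs; higher scores are better.
    Decoy matches above the threshold stand in for false target matches.
    """
    targets = sum(1 for s, d in psms if s >= threshold and not d)
    decoys = sum(1 for s, d in psms if s >= threshold and d)
    return decoys / targets if targets else 0.0


def loosest_threshold(psms, max_fdr):
    """Lowest observed score whose estimated FDR stays within max_fdr."""
    passing = [s for s, _ in psms if fdr_at_threshold(psms, s) <= max_fdr]
    return min(passing) if passing else None


# Hypothetical scored matches: (score, is_decoy).
example = [(10, False), (9, False), (8, True), (7, False), (6, False)]
cutoff = loosest_threshold(example, 0.25)
```

Reporting the estimate at several thresholds, rather than fixing one, is what leaves the tuning decision with the user.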
Application of Exoglycosidases to locate Fucose

At ITIH4 site N517
LPTQNITFQTE
K. Chandler
NVVFVIDK ITIH4 Glycopeptide
K. Chandler
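Similar Glycopeptide Spectra (mass Δ ~ +162 Da)
[Figure: MVSHHNLTTGATLINE glycopeptide spectrum and an unidentified (?) spectrum +162 Da heavier]
Fragmented Glycopeptides (mass Δ ~ +162 Da)
[Figure: two MVSHHNLTTGATLINE glycopeptide spectra, one unidentified (?), differing by +162 Da]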
Propagating Annotations
• MVS+A2G2
• MVS+A1G1
• MVS+A2G2
• VVL+A1G1
• MVS+A2G2
• VVL+A2G2
• MVS+A1G1
G. Berry
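A mass difference of about +162 Da corresponds to one hexose residue, so spectra whose precursor masses differ by that amount are candidates for sharing a peptide. The sketch below links such pairs and copies a peptide label across each link. It is a single-pass illustration under assumed inputs (neutral precursor masses and a label list with None for unidentified spectra); the slides do not give the actual propagation rule.

```python
HEXOSE = 162.0528  # monoisotopic residue mass of one hexose (Da)


def hexose_pairs(masses, tol=0.05):
    """Index pairs (i, j) where masses[j] is one hexose heavier than masses[i]."""
    return [(i, j)
            for i, a in enumerate(masses)
            for j, b in enumerate(masses)
            if abs((b - a) - HEXOSE) <= tol]


def propagate_peptide(labels, masses, tol=0.05):
    """Fill in missing peptide labels (None) from neighbours one hexose away."""
    result = list(labels)
    for i, j in hexose_pairs(masses, tol):
        if result[i] is None and result[j] is not None:
            result[i] = result[j]
        elif result[j] is None and result[i] is not None:
            result[j] = result[i]
    return result


# Hypothetical neutral precursor masses; only the first spectrum is identified.
masses = [3000.0, 3162.0528, 3500.0]
labels = propagate_peptide(["MVS", None, None], masses)
```

Repeating the pass until no label changes would extend an annotation along a chain of several hexose steps.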
Summary
Mass-spectrometry coupled with protein chemistry and good informatics can look beyond the obvious to the unexpected...
…and there is plenty to find!
Acknowledgements

Edwards lab
  • Kevin Chandler
  • Gwenn Berry
Fenselau lab (UMD)
  • Colin Wynne
  • Avantika Dhabaria
Goldman lab (GU)
  • Kevin Chandler
  • Petr Pompach
Funding: NCI
NSF Graduate Fellowship (Chandler)