MS/MS Libraries of Identified Peptides and Recurring Spectra in Protein Digests
Lisa Kilpatrick, Jeri Roth, Paul Rudnick, Xiaoyu Yang, Steve Stein
Mass Spectrometry Data Center
Library searching is not new
Organize for Reuse
MS Library Searching
• Hertz, Hites and Biemann, Anal. Chem. (1971).
• PBM: McLafferty, Hertel, Villwock, Org. Mass Spectrom. (1974).
• SISCOM: Damen, Henneberg, Weimann, Anal. Chim. Acta (1978).
• INCOS: Sokolow, Karnofsky, Gustafson, Finnigan Application Report 2 (March 1978).
• Stein, Scott, J. Amer. Soc. Mass Spectrom. (1994).
‘Dot Product’
(cosine of ‘angle’ between a pair of spectra)
Sum over all peaks in common, then normalize (M = measured, R = reference):
$\cos\theta = \dfrac{\sum M R}{\lVert M \rVert \, \lVert R \rVert}$
• Measured = f(m/z abundance)
• Reference = f(m/z abundance)
• f(abundance) : Weight as you like
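The dot product described above can be sketched in a few lines. This is a minimal illustration, not the scoring used by any particular search program: spectra are assumed to be dictionaries keyed by a binned m/z value, and the square-root weighting is only one possible choice for f(abundance). The function name and arguments are hypothetical.

```python
import math


def dot_product_score(measured, reference, weight=math.sqrt):
    """Cosine of the angle between two spectra.

    Each spectrum is a dict {binned m/z: abundance}.
    `weight` plays the role of f(abundance); sqrt is an assumed default.
    """
    m = {mz: weight(a) for mz, a in measured.items()}
    r = {mz: weight(a) for mz, a in reference.items()}

    # Numerator: sum over all peaks the two spectra have in common.
    shared = m.keys() & r.keys()
    numerator = sum(m[mz] * r[mz] for mz in shared)

    # Denominator: product of the vector lengths (normalization).
    norm_m = math.sqrt(sum(v * v for v in m.values()))
    norm_r = math.sqrt(sum(v * v for v in r.values()))
    if norm_m == 0 or norm_r == 0:
        return 0.0
    return numerator / (norm_m * norm_r)
```

Identical spectra score 1, spectra with no peaks in common score 0, and any other pair falls in between.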
Traditional GC/MS Library Search
Variability Depends on S/N
[Figure: medians for ~7,000 Radiodurans peptides, LCQ (PNNL/NCRR)]
Library Searching for Peptides
• LIBQUEST (Yates)
– Yates et al, Anal. Chem., 1998, 70, 3557
• X!Hunter (Beavis)
– Craig et al, J. Proteome Res., 2006, 5, 1843
• BiblioSpec (MacCoss)
– Frewen et al., Anal. Chem. 2006, 78, 5678
• Spectral Comparison (Kearney)
– Liu et al, Proteome Science 2007, 5:3
• SpectraST (Aebersold)
– Lam et al., Proteomics 2007, 7, 655-667
• NIST Peptide Ion Fragmentation Library
– June 2006 release (US-HUPO – March 2004)
Why Spectrum Libraries?
• More sensitive
• Better scoring
• Faster
• Annotation
• Unrestricted precursor ion
Identification by Spectrum Matching is More
Sensitive than by Spectrum/Sequence Matching
[Figure: Fraction of MS/MS Spectra Identified vs S/N, Simple Protein Mix. Series: All Peptides, HSA Peptides, HSA-OMSSA. Axes: Fraction IDed (0.001 to 1) vs. S/N (1 to 10000)]
Spectrum/Spectrum Scores are More Robust than
Sequence/Spectrum Scores
[Figure: 99% Confidence threshold; axis: Sequence score]
Matching Spectra is Faster than
Matching Sequence
0.005 s vs. 6.2 s per query spectrum (spectrum matching vs. sequence matching)
Reference Library Building
• Extract identified spectra from sequence search
– Multiple search engines
– Instrument-class specific
• Create ‘consensus’ spectra
– Two or more matching spectra, also save best
• Assign probability of being correct
– Refine confidence starting from decoy FDR
– Classify peptides – tryptic, missed cleavage, semi, mods
• Create searchable spectral library
– Resolve conflicts, add annotation
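The "consensus spectra" step above can be illustrated with a small sketch. The slide gives no details of the actual algorithm, so everything here is an assumption: peaks are grouped into fixed-width m/z bins, each replicate is scaled to its own base peak, and a peak is kept only if it appears in at least half of the replicates. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def consensus_spectrum(replicates, bin_width=1.0, min_fraction=0.5):
    """Build a consensus from two or more replicate spectra.

    replicates: list of spectra, each a list of (mz, intensity) pairs.
    Returns {bin center m/z: mean relative intensity} for peaks seen in
    at least `min_fraction` of the replicates (an assumed rule).
    """
    if len(replicates) < 2:
        raise ValueError("a consensus needs two or more matching spectra")

    sums = defaultdict(float)
    counts = defaultdict(int)
    for spectrum in replicates:
        base_peak = max((inten for _, inten in spectrum), default=0.0)
        if base_peak <= 0:
            continue
        # Keep the largest relative intensity per bin within one replicate.
        seen = {}
        for mz, inten in spectrum:
            b = round(mz / bin_width)
            seen[b] = max(seen.get(b, 0.0), inten / base_peak)
        for b, rel in seen.items():
            sums[b] += rel
            counts[b] += 1

    needed = min_fraction * len(replicates)
    return {
        b * bin_width: sums[b] / counts[b]
        for b in sorted(sums)
        if counts[b] >= needed
    }
```

Peaks that occur in only one replicate are treated as noise and dropped. The "also save best" part of the step would keep the single highest-quality replicate alongside this consensus.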
Three Classes of Libraries
I. Conventional Target Identification
– Peptides (Proteins)
II. Identifiable
– By unconventional searching
III. Not Identifiable
– Account for all recurring spectra
– QA/QC
I. OMSSA overlap with MS/MS Library Search
[Figure: Venn diagrams of identified spectra (1% FDR) for 1-D Yeast, NCI/CPTAC – Vanderbilt. Region counts as extracted: 747, 1350, 353 and 318, 1752, 833. Library sizes: 34K (6/06) and 78K (6/07)]
Identified Spectra: Yeast - 1 D
[Pie chart categories: Tryptic; Tryptic missed cleavage; Tryptic bad miss; Semitryptic]
II. Identify What we Can
Derive Class-specific FDR
• Tryptic
– Simple
– Expected missed cleavages
– Unexpected missed cleavages
• Semitryptic (cleaved tryptic)
– No missed cleavage
• In source (with parent at same retention)
• In sample
– Missed cleavage
• In source (with parent)
• In sample (obey rules)
• Uncommon – reject
• Others …
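A class-specific FDR can be estimated separately for each peptide class listed above. The sketch below assumes a target/decoy search in which the decoy database is the same size as the target database, so that FDR is approximated as decoy hits divided by target hits above a score threshold. The input layout and function name are hypothetical.

```python
from collections import Counter


def class_specific_fdr(hits, threshold):
    """Estimate FDR per peptide class from target/decoy hits.

    hits: iterable of (peptide_class, score, is_decoy) tuples.
    Returns {peptide_class: decoys / targets} for hits scoring at or
    above `threshold`; None where a class has no target hits.
    """
    targets = Counter()
    decoys = Counter()
    for peptide_class, score, is_decoy in hits:
        if score >= threshold:
            if is_decoy:
                decoys[peptide_class] += 1
            else:
                targets[peptide_class] += 1

    fdr = {}
    for peptide_class in set(targets) | set(decoys):
        n_targets = targets[peptide_class]
        fdr[peptide_class] = decoys[peptide_class] / n_targets if n_targets else None
    return fdr
```

With the same score threshold, a rarely correct class such as semitryptic peptides will usually show a much higher FDR than simple tryptic peptides, which is the reason for deriving the rate per class.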
Atypical Peptide Ions
use Sequence Search Method
• Tryptic only with many mods
• Less common: Methylation, Phosphorylation, …
• Artifacts: Na, K, Carbamyl
• InsPecT/Pevzner (Unidentified, +70)
• High charge states, >2 missed cleavages
• Use class specific score thresholds
HSA/Fibrinogen/Transferrin Mix
6124 Consensus Peptide Spectra, IT, Qtof, TofTof
Ion Trap Peptide Ions: 1300 HSA, 1100 Fibrinogen, 700 Transferrin
Identified Peptide Spectra - Simple Protein Mix
[Pie chart categories as extracted: Missed; Bad miss; Unknown mod; Insource; Simple; 'Insample'; Missed; Bad miss. Contiguous slices = tryptic, exploded slices = semitryptic]
III. Library of Recurring, Unidentified Spectra
• Create consensus spectra
– From similar spectra from an experiment
• Combine from multiple experiments
• Identify spectra in other experiments
– QA/QC: Artifacts, in standards, …
– Apply other sequencing methods
Assign all Spectra
• Identified Spectrum
– Matches library peptide or unidentified spectrum
– Subset of peaks match library spectrum (impure)
– Similar to a matched spectrum (cluster)
• Not a Peptide
– Low S/N
• Maximum/Median <15
– High charge state (many large peaks)
• Proteins, large fragments, …
– One dominant peak
• Stable ion, not peptide
– Singly charged (high/low abund < 1.2)
• Probable artifact, lower probability of identification
– Narrow m/z range
• Peptide?
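The "Not a Peptide" rules above lend themselves to a simple filter. In this sketch only the low S/N test (Maximum/Median < 15) comes from the slide; the cutoffs for a dominant peak and for a narrow m/z range are placeholder assumptions, and the charge-state rules are omitted. The function name and labels are hypothetical.

```python
from statistics import median


def classify_spectrum(peaks, dominant_fraction=0.9, min_mz_span=200.0):
    """Label a spectrum using simple 'not a peptide' rules.

    peaks: list of (mz, abundance) pairs.
    dominant_fraction and min_mz_span are assumed cutoffs, not values
    taken from the slides.
    """
    peaks = [(mz, a) for mz, a in peaks if a > 0]
    if not peaks:
        return "low S/N"

    abundances = [a for _, a in peaks]
    # Rule from the slide: Maximum/Median < 15 means low S/N.
    if max(abundances) / median(abundances) < 15:
        return "low S/N"

    # One peak carrying nearly all the signal: stable ion, not a peptide.
    if max(abundances) / sum(abundances) > dominant_fraction:
        return "one dominant peak"

    # Fragments confined to a small m/z window.
    mzs = [mz for mz, _ in peaks]
    if max(mzs) - min(mzs) < min_mz_span:
        return "narrow m/z range"

    return "candidate peptide"
```

Spectra that pass every rule are only candidates; they still need a library or sequence match to count as identified.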
Spectrum Classification - Yeast - 1D
[Pie chart categories: Low S/N; NoID Lib/Impure; NoID Lib; Other; 1+ No ID; Peptide?; Peptide/Impure; Peptide. Exploded slices = identified, contiguous slices = unidentified]
Spectrum Classification - Simple Protein Mix
[Pie chart categories: Narrow; Complex; Dominant Peak; Low S/N; NoID Lib/Cluster; NoID Lib/Impure; 1+ NoID; NoID lib; Pep/Cluster; Pep/Impure; Peptide?; Peptide. Exploded slices = identified, contiguous slices = unidentified]
Library Pipeline of the Future
[Diagram stages as extracted: Mass spectrometer; Garbage filter; Pep. Lib; Unass. Lib; Sequence Search, De Novo, Theoretical Spec, Similarity, ... Spectra with "No ID" at one stage pass to the next; outputs are labeled assigned or unassigned]
NCI/NIH - CPTAC:
Clinical Proteomic Technology Assessment
for Cancer
http://proteomics.cancer.gov
Technology assessment; develop standard protocols and clinical
reference sets; and evaluate methods to ensure data reproducibility.
Broad Institute of MIT and Harvard, Memorial Sloan-Kettering Cancer Center, Purdue University,
University of California, San Francisco, and Vanderbilt University School of Medicine.
NCI grants (U24CA126476-01, U24CA126485-01, U24CA126480-01, U24CA126477-01, and
U24CA126479-01).
Run-to-Run Chromatographic Reproducibility
[Figure: two total ion chromatograms (TIC, ITMS, Full ms [300.00-2000.00]). Run 1: RT 9.99 - 70.13, NL 5.53E6. Run 2: RT 10.01 - 70.06, NL 6.73E6. x-axis: Time (min)]
Lab-to-Lab Chromatography
[Figure: chromatograms of sample 1B acquired in different laboratories, RT 0.00 - 100.00. Panels: Vandy Orbitrap, Vandy LTQ, Purdue LTQ, NYU Orbitrap, NIST LTQ, INCAPS LTQ, Broad Orbitrap. Axes: Time (min) vs. Relative Abundance. Insets: full-scan mass spectra (m/z axis); peptide label: YICENQDSISSK]
Measures of Reproducibility
• Identified ions
– Unique peptides, Ions, Spectrum counts
• Unidentified components
– Classify by type, link to origin
• Ion cluster analysis
– MS1 linked to MS2
• Chromatography
– Time evolution of ion clusters
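The identified-ion measures above (unique peptides, ions, spectrum counts) can be tallied directly from a list of identifications. The sketch below assumes each identified spectrum is reported as a (peptide sequence, charge) pair, and it adds a simple shared-peptide fraction between two runs as one possible reproducibility number. Both function names and the choice of overlap measure are assumptions, not taken from the slides.

```python
from collections import Counter


def identification_counts(ids):
    """Summarize one run.

    ids: list of (peptide_sequence, charge), one entry per identified spectrum.
    """
    spectrum_counts = Counter(seq for seq, _ in ids)
    return {
        "unique_peptides": len(spectrum_counts),
        "unique_ions": len(set(ids)),   # peptide + charge state
        "spectra": len(ids),
        "spectrum_counts": spectrum_counts,
    }


def shared_peptide_fraction(run_a, run_b):
    """Fraction of all peptides seen in either run that appear in both."""
    a = {seq for seq, _ in run_a}
    b = {seq for seq, _ in run_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Spectrum counts per peptide are kept because two runs can agree on which peptides are present while sampling them very differently.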
Ion Component Analysis
Ion Component Analysis (Yeast)
[Figure: Counts (10 to 1000) vs. Relative Component Intensity (1E-3 to 0.1). Labels as extracted: Components, All MS2, Sampled, Peptides. Annotations: Undersampling, Oversampling]
Components in Replicate Runs
[Figure: Number of Components (10 to 1000) vs. Component Intensity (1E-4 to 0.1). Series: total, sampled, identified. Markers: ▲▼ run 1, 2; ■ in both]