Transcript Slide 1

Protein Sequencing Research Group (PSRG): Results of the PSRG 2011 Study: Sensitivity Assessment of Edman and Mass Spectrometric Terminal Sequencing
of an Undisclosed Protein
H.A. Remmer1, J.S. Smith2, W. Sandoval3, B. Xiang4, K. Mawuenyega5, D. Suckau6, V. Katta3, J.J. Walters7, P. Hunziker8
1University of Michigan, Ann Arbor, MI, United States, 2University of Texas Medical Branch, Galveston, TX, United States, 3Genentech, Inc., South San Francisco, CA, United States, 4Monsanto Company, St. Louis, MO,
United States, 5Washington University School of Medicine, St. Louis, MO, United States, 6Bruker Daltonics, Bremen, Germany, 7Sigma-Aldrich, St. Louis, MO, United States, 8University of Zurich, Zurich, Switzerland
INTRODUCTION
Establishing the N-terminal sequence of intact proteins plays a critical role in biochemistry and drug
development. Edman degradation and top-down and bottom-up mass spectrometry methods for
N-terminal sequence analysis have been used for that task. In this study, we proposed to determine the
ability of these sequencing techniques to deal with various sample formats and to assay sensitivity.
For the 2011 study, the PSRG distributed three kinds of sample sets (designated A, B or C) of 3 tubes
each. Each tube contained the same artificial recombinant (unknown) protein in varying amounts
and formats (see table below). Participants chose which of the three sample sets, or any combination of
sets, they would like to receive.
Participants obtained the following information: (a) the protein MW is ~52 kDa, (b) the sequence is
NOT in a public database, (c) tube 1, with the lowest sample amount, contains ~5 pmol protein in the
selected format, (d) the potential presence of a copurified E. coli protein at <20 kDa in Sample Set A
is known, but of no interest to the current study, and (e) Sample Set A is soluble in 0.1% TFA,
0.1% TFA/20% acetonitrile or 25 mM AMBIC. Study participants
were directed to a website to anonymously upload sequences and supporting data. The analysis of
the results of the 2011 study focuses on the length and accuracy of the sequence calls as a function of
increasing amounts of protein. A total of 38 participants requested 74 sample sets.
Study Results: Edman Sequencing
Sample A1
Participant 004
REFERENCE: T. Kishimoto, J. Kondo, T. Takako-Igarashi and H. Tanaka. A novel method for
analyzing protein terminals. Poster presented at the ASMS conference, Salt Lake City, 2010.
ACKNOWLEDGEMENTS
Dr. Robert English (University of Texas Medical Branch) for accumulation and anonymization of
data; Sigma-Aldrich for donation of the study sample; the Executive Board of the ABRF for
support and scrutiny of the study proposal; Dr. Jack Simpson (National Cancer Institute,
Frederick, MD) for functioning as liaison to the ABRF Executive Board; and participating labs for
analyzing samples and returning data.
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
L
R
V
F
D
E
F
K
P
L
V
E
E
P
Q
N
L
I
R
V
F
D
E
F
K
P
L
V
K
P
E
E
P
Q
N
L
I
R
V
F
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
P
L
V
E
X
P
Q
N
L
I
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
33
34
35
36
37
38
39
40
41
42
0.1%TFA/30%IPA
2.8
93.60%
G
A
L
X
V
F
D
E
F
K
Procise 494
0.1%TFA/20% ACCN
0.7
95.60%
G
A
L
R
V
F
D
E
F
K
Procise 494HT
0.1% TFA
na
na
note 1
PSRG002
Procise 494HT
0.1% TFA/50% ACCN
3.7
91.30%
G
R
Solvent
Initial yield
Rep. Yield
A
L
V
F
D
E
F
K
P
L
V
E
E
P
Q
N
L
I
R
V
F
D
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Participant 020
Procise 494HT
na
na
na
X
A
L
X
V
F
D
E
F
K
Participant 024
Procise 494
0.1%TFA/20% ACCN
1.2
86.10%
G
A
L
R
V
F
D
E
F
K
Participant 058
Procise 494HT
0.1% TFA
PSRG002
Procise 494HT
R
Sample A2
Instrument
na
note 1
0.1% TFA/50% ACCN
9.4
94.80%
G
A
L
V
F
D
E
F
K
P
L
V
E
E
P
Q
N
L
I
R
V
F
D
E
F
K
P
L
V
K
P
Sample A3
Instrument
Solvent
Initial yield
Rep. Yield
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
Participant 024
Procise 494
0.1%TFA/20% ACCN
7.8
98.60%
G
A
L
R
V
F
D
E
F
K
Participant 058
Procise 494HT
0.1% TFA
na
na
note 1
PSRG002
Procise 494HT
na
R
G
42
0.1% TFA/50% ACCN
29.3
95.60%
A
L
V
F
D
E
F
K
P
L
V
E
E
P
Q
N
L
I
R
V
F
D
E
F
K
P
L
V
K
P
E
E
P
Q
N
L
I
R
V
F
Instrument
Solvent
Initial yield
Rep. Yield
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
Procise 494HT
na
na
na
X
A
L
R
V
F
D
E
F
K
Initial yield
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
P
L
V
K
k
e
e
X
q
n
l
i
note 2
Sample B2
Participant 020
Sample C1
Instrument
Solvent
Rep. Yield
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Procise 494
na
na
na
G
Participant 014
Procise HT
na
0.5
94.60%
X
A
L
X
V
F
D
E
F
K
X
L
Participant 016
Procise 494cLC
na
na
na
X
A
L
R
V
F
D
E
F
K
P
L
V
E
E
P
Q
N
L
I
R
V
F
D
E
R
E
P
Participant 006
Participant 036
CONCLUSION
Edman degradation was successfully employed in this study to obtain N-terminal sequence
information of an unknown protein, not present in public databases, independent of the sample
format. The most frequently selected sample format was the PVDF membrane, followed
by the lyophilized sample. A slight dependency of read length on sample amount was found,
but the variation within each group was much higher. Bottom-up work applied to the study samples typically
yielded sequences of another protein; however, the correct sequence was called as well. One
participant also called the 70 C-terminal residues. In this study, top-down sequencing was
attempted by MALDI-ISD from samples A without any success. Investigation of the sample by the PSRG
showed that the protein amount in samples A (lyophilized) accessible to the analysis was only ~5%
of what was determined by AAA, potentially due to poor solubility. Only much higher amounts
of sample A than were distributed allowed de novo sequences to be retrieved, and several bacterial heat
shock proteins (15-16 kDa range) were identified in that sample after LC protein separation. Taken
together, Edman sequencing, with its strict dependence on the sample material, operated robustly and
reliably, in particular when the sample was applied on a membrane after SDS-PAGE.
All mass spectrometric methods, if not linked strictly to an intact protein MW, can easily
identify “non-target” sequences. Here the solubility and the homogeneity of the sample play a
much greater role, in particular for the top-down approaches, which have the highest requirements
for sample amount and quality; this should be given particular attention in future studies.
Rep. Yield
3
Procise 494HT
Participant 024
Initial yield
2
A
Participant 058
The PSRG prepared the 3 sample sets for distribution as follows: The study protein (95% purity by
SEC) was dissolved in 50% acetonitrile/0.1% TFA and lyophilized, and the protein content was determined
by AAA. The sample was then aliquoted based on protein content to achieve the desired
amounts (5 pmol, 15 pmol and 45 pmol, respectively). Samples A were lyophilized; samples B were
subjected to SDS-PAGE, and samples C to SDS-PAGE and subsequent electroblotting. During test analyses for
validation, the presence of contaminating proteins was acknowledged and found to mimic a client
sample in a core facility setting. The validation analysis by ISD was performed on an UltrafleXtreme
MALDI-TOF/TOF instrument after samples were shipped and showed that much less protein was
available for analysis than anticipated from the original protein quantification. Participants obtained
instructions for dissolution of samples in set A. However, valid ISD data were only obtained for a nominal
100 pmol of the sample. The participants were asked to use their code number to report their data in
Survey Monkey (www.surveymonkey.com).
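As a rough illustration of what this shortfall means for the distributed tubes, the sketch below scales the nominal amounts by an assumed accessible fraction. The 5% default is the PSRG estimate quoted in the conclusion for sample set A. The assignment of 5, 15 and 45 pmol to tubes 1, 2 and 3, the uniform application of one fraction to all tubes, and the function name are assumptions made here for illustration only.

```python
# Nominal protein amounts per tube (pmol); the tube-to-amount mapping is assumed.
NOMINAL_PMOL = {"tube 1": 5.0, "tube 2": 15.0, "tube 3": 45.0}


def accessible_amounts(nominal_pmol, accessible_fraction=0.05):
    """Scale nominal amounts by the fraction assumed to be accessible to analysis."""
    if not 0.0 <= accessible_fraction <= 1.0:
        raise ValueError("accessible_fraction must be between 0 and 1")
    return {tube: amount * accessible_fraction
            for tube, amount in nominal_pmol.items()}


if __name__ == "__main__":
    for tube, pmol in accessible_amounts(NOMINAL_PMOL).items():
        print(f"{tube}: ~{pmol:.2f} pmol accessible")
```

Under these assumptions even the largest tube would offer only a few pmol to a solution-based method, which is consistent with the report that valid ISD data required a nominal 100 pmol.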
Edman Degradation: Most participants performed the analysis on a Procise 494HT sequencer using
standard reagents and protocols. The majority of participants used the sample as provided. For
sample set C, the PVDF membrane was loaded directly onto the instrument; for set A, the sample was
dissolved in 0.1% TFA containing 20%-50% acetonitrile and applied onto a ProSorb filter. Initial yields
and repetitive yields were reported (see table).
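The relation between the two reported quantities can be sketched as follows. This assumes the common simplified model in which the signal decays by a constant factor per cycle, so that the expected yield at cycle n is the initial yield multiplied by the repetitive yield raised to the power (n - 1). The source does not state how participants computed their values, and the function names are hypothetical.

```python
def expected_yield(initial_pmol, rep_yield, cycle):
    """Expected yield (pmol) at a 1-based cycle, assuming constant per-cycle decay."""
    if cycle < 1:
        raise ValueError("cycle numbering starts at 1")
    return initial_pmol * rep_yield ** (cycle - 1)


def estimate_repetitive_yield(yield_a, cycle_a, yield_b, cycle_b):
    """Estimate the per-cycle repetitive yield from the yields at two cycles."""
    if cycle_b <= cycle_a:
        raise ValueError("cycle_b must be later than cycle_a")
    return (yield_b / yield_a) ** (1.0 / (cycle_b - cycle_a))


if __name__ == "__main__":
    # Illustrative inputs in the range of the reported values (2.8 pmol, 93.6%).
    for cycle in (1, 10, 20):
        print(cycle, round(expected_yield(2.8, 0.936, cycle), 2))
```

Under this model, a repetitive yield in the low 90% range roughly halves the signal within about ten cycles, which helps explain why read length depends on both the initial yield and the repetitive yield.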
Bottom-up MS Method: Sample sets A and B were used for this analysis; samples A were dissolved in
ammonium bicarbonate and digested, usually with trypsin and 1-2 additional enzymes. The analysis
was mostly performed on an LTQ or LTQ Orbitrap, and the MS/MS data were subjected to database
search using Thermo Proteome Discoverer, or manual de novo Mascot searches were performed.
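To illustrate why additional enzymes can matter for terminal sequencing, the sketch below performs a simplified in-silico tryptic digest. It assumes the textbook rule (cleavage after K or R, but not before P) with no missed cleavages. The input is the first 21 residues as called consistently in the Edman results, used here as an assumed reference, and the function name is hypothetical.

```python
def tryptic_peptides(sequence):
    """Split a sequence after K or R, except when the next residue is P."""
    if not sequence:
        return []
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Whatever remains after the last cleavage site is the final peptide.
    peptides.append(sequence[start:])
    return peptides


# Assumed reference: N-terminal residues 1-21 taken from the Edman calls.
N_TERMINAL_CALL = "GALRVFDEFKPLVEEPQNLIR"

if __name__ == "__main__":
    print(tryptic_peptides(N_TERMINAL_CALL))
```

With this rule the N-terminal tryptic peptide would be only four residues long (GALR), and the K at position 10 is not cleaved because it precedes P. A short terminal peptide of this kind may be one reason to combine trypsin with other enzymes, although the source does not give the participants' rationale.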
Top Down MS Method: The majority of participants utilized an Ultraflex MALDI-TOF/TOF instrument
and performed in-source decay (ISD) using 1,5-diaminonaphthalene (DAN) or 2,5-dihydroxybenzoic acid (DHB) as matrix.
Solvent
1
G
Participant 024
STUDY METHODS:
TYPICAL PARTICIPANT METHODS
Instrument
N-terminal Sequence
Procise 494
na
0.5
95.80%
G
A
L
V
F
D
E
F
K
Procise 494HT
na
0.6
90.80%
X
X
L
X
V
F
X
E
F
X
P
L
V
E
Participant 040
Procise
na
na
na
X
A
L
X
V
F
D
E
F
K
X
L
V
E
Participant 058
Procise 494HT
na
1.7
94.80%
X
A
L
R
V
F
D
E
F
K
P
L
V
E
PSRG001
Procise 494cLC
93.00%
G
A
PSRG002
g
L
R
V
F
D
E
E
E
P
Q
N
L
I
R
V
F
D
E
F
Procise 494HT
na
2.3
88.00%
A
L
R
V
F
D
E
F
X
P
X
V
Sample C2
Instrument
Solvent
Initial yield
Rep. Yield
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Participant 006
Procise 494
na
na
0.6
na
na
G
A
L
r
V
F
D
E
F
f
K
Participant 014
Procise HT
na
1.2
96.40%
G
A
L
R
V
F
D
E
F
K
P
L
V
E
E
P
Q
N
L
I
R
V
F
D
X
F
K
Participant 016
Procise 494cLC
na
3.5
P
L
V
E
E
P
Q
N
L
I
R
V
F
D
E
F
K
Participant 020
P
L
V
E
E
P
Q
N
L
I
X
F
95.50%
G
A
L
R
V
F
D
E
F
K
Procise 494HT
na
na
na
G
A
L
R
V
F
D
E
F
K
Participant 024
Procise 494
na
1.4
99.20%
G
A
L
R
V
F
D
E
F
K
Participant 036
Procise 494HT
na
2.3
92.70%
G
A
L
X
V
F
D
E
F
K
Participant 058
Procise 494HT
95.80%
L
F
PSRG001
Procise 494HT
PSRG002
na
4.2
G
A
R
V
D
E
95.70%
G
A
L
R
V
F
D
E
I
R
V
F
D
E
F
K
P
N
X
X
P
E
E
X
Q
X
N
D
I
G
na
3.5
92.20%
G
A
L
R
V
F
D
E
F
K
P
L
V
E
E
P
Q
N
L
Instrument
Solvent
Initial yield
Rep. Yield
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
Participant 006
Procise 494
na
na
na
G
A
L
R
V
F
D
E
F
K
P
L
V
E
E
p
Q
Participant 024
Procise 494
na
10.8
97.70%
G
A
L
R
V
F
D
E
F
K
Participant 036
X
E
D
Q
N
F
K
P
V
Procise 494HT
3
K
L
Sample C3
na
F
P
P
L
V
L
E
V
E
E
E
P
Q
N
L
Procise 494HT
na
4.8
93.60%
G
A
L
X
V
F
D
E
F
K
P
L
V
E
E
P
Q
N
L
I
X
F
Participant 040
Procise
na
11.5
93.50%
G
A
L
R
V
F
D
E
F
K
P
L
V
E
E
P
Q
N
L
I
R
V
F
D
E
F
K
P
L
V
K
P
E
Participant 058
Procise 494HT
na
18.1
96.30%
G
A
L
R
V
F
D
E
F
K
P
L
V
E
E
PSRG001
Procise 494HT
na
5.5
96.60%
G
A
L
R
V
F
D
E
F
K
P
L
V
E
E
P
Q
N
L
I
R
V
F
D
E
E
P
N
L
H
P
X
E
na
11.3
89.80%
G
A
L
R
V
F
D
E
F
K
P
L
V
E
E
P
Q
N
L
I
R
V
F
D
6
F
7
D
PSRG002
Procise 494HT
Note 1: no sequence detected. Participant suspects the sample is not soluble in 0.1% TFA.
Note 2: a total of 50 amino acid residues were sequenced.
Legend: a correct N-terminal call is color coded; a tentative call is denoted with a lower-case letter;
no call is marked with "X"; a wrong call is denoted with a letter that is not color coded.
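The notation in the legend above lends itself to simple tallying. The sketch below classifies each position of a reported call against a reference string. The category names and the function are hypothetical, and the reference used in the example is the ten-residue N-terminal stretch on which most participants agree, taken here as an assumption rather than as the disclosed sequence.

```python
from collections import Counter


def classify_calls(called, reference):
    """Tally call categories position by position (stops at the shorter input)."""
    counts = Counter()
    for call, ref in zip(called, reference):
        if call == "X":
            counts["no_call"] += 1
        elif call.islower():
            # Lower case marks a tentative call; compare it case-insensitively.
            key = "tentative_correct" if call.upper() == ref else "tentative_wrong"
            counts[key] += 1
        elif call == ref:
            counts["correct"] += 1
        else:
            counts["wrong"] += 1
    return dict(counts)


if __name__ == "__main__":
    # A call with one "X" at position 4, as seen in several table rows.
    print(classify_calls("GALXVFDEFK", "GALRVFDEFK"))
```

Counting categories per sample amount in this way is one possible route to the length and accuracy comparison described in the introduction.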
450
D
12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
L V
E
E
P
Q
N
L
I
R
V
F
D
E
F
45
1 452 453 454 455 456 457 458 459 460 461 462 463 464 465
D K
A
L
V
A
L
H
V
H
H
H
H
H
H
MS
Terminal sequence (de novo)
4700 Proteomics Analyzer N-terminus
C-terminus
1
G
X
2
A
X
3
L
X
4
R
X
5
V
X
6
F
X
7
D
X
8
E
X
9
F
X
10
K
X
11
P
X
12 13
L V
X X
14
E
X
15
E
X
16 17
note 1
18
19
20
21
22
23
24
25
26
27
MS
LTQ Orbitrap Velos ETD
Terminal sequence (de novo)
N-terminus
C-terminus
N-terminus
C-terminus
N-terminus
C-terminus
1
X
X
X
X
G
2
X
X
X
X
A
3
X
X
X
X
L
4
X
X
X
X
R
5
X
X
X
X
V
6
X
X
X
X
F
7
X
X
X
X
D
8
X
X
X
X
E
9
X
X
X
X
F
10
X
X
X
X
K
11
X
X
X
X
X
12 13
X X
X X
X X
X X
X X
14
X
X
X
X
X
A
15
X
X
X
X
X
L
16
18
19
20
21
22
23
24
25
26
27
L
H
V
H
H
H
H
H
H
Terminal sequence (de novo)
1
AcM
E
X
X
2
3
4
5
6
7
8
9
10
11
12 13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
F
T
X
X
M
N
X
X
D
L
X
X
D
Y
X
X
F
F
X
X
A
Q
X
X
A
G
X
X
F
D
V
D
E
D
K
D
A
A
V
L
K
V
F
A
V
L
V
H
P
V
R
H
A
H
L
H
E
H
L
H
L
H
F
N-terminal Sequence
Study Results: Bottom-Up Sequencing
C-terminal Sequence
Sample A1
Participant #040
Sample Processing
10 mM AmBiC, Trypsin and Glu-C
LC
N/A
Sample A2
Participant #048
Sample Processing
Trypsin and Chymotrypsin
LC
Not provided
Participant #034
Trypsin, Glu-C and Lys-C
N/A
100 mM AmBiC, Lys-C and Lys-N
Eksigent NanoLC-2D
Samples B1, B2, B3
Sample Processing
LC
Participant # 048*
Trypsin and Chymotrypsin
Not provided
LTQ Orbitrap Velos ETD
Trypsin
Not provided
LTQ MS
PSRG003
Participant #026
Ultraflex TOF/TOF
LTQ MS
MS
N-terminus
C-terminus
N-terminus
C-terminus
1
G
2
A
3
L
4
R
5
V
8
E
9
F
10
K
11
P
440 441 442 443 444 445 446 447 448 449
E
T
N
L
Y
F
Q
G
D
D
V
K
17
note 2
V
A
27
K
* Participant #048 sequenced more than 200 amino acids by manual spectra interpretation.
Note 1: Participant 040 also sequenced by Edman degradation and had the opportunity to search MS/MS data for the correct N-terminal peptide.
Note 2: Participant PSRG003 used Lys-C and Lys-N in combination according to a published procedure for N-terminal sequencing (see reference section).
Legend: correct N-terminal and correct C-terminal calls are color coded; no call is marked with "X"; an incorrect call is denoted with a letter that is not color coded.
Study Results: Top-Down Sequencing
Sample A
Participant 016: Instrument: UltraFlex III; Matrix: DHB, DAN; Methods: MALDI-ISD; Sample Prep: used sample as provided
Participant 028: Instrument: Ultraflex II Flex control; Matrix: DHB, DAN; Methods: Intact MW, ISD; Sample Prep: C4 Zip Tip, eluted with 75% ACN, 0.1% TFA
Participant 002: Instrument: Information not provided; Matrix: no details provided; Methods: Intact MW; Sample Prep: used sample as provided
Participant 034: Instrument: Ultraflex TOF/TOF; Matrix: DAN; Methods: ISD; Sample Prep: used sample as provided
PSRG001: Instrument: 4800 MALDI-TOF/TOF; Matrix: DAN; Methods: ISD/T3; Sample Prep: Cl-MeOH precip., reconst. in 0.1% TFA
Results
None of the participants was able to call an N-terminal or C-terminal sequence when analyzing
sample set A. Investigation of the sample by the PSRG showed that the protein amount
in samples A (lyophilized) accessible to the analysis was significantly less than was determined by AAA, due to
poor solubility of the sample in purely aqueous solvents. The validation analysis by ISD was
performed on an UltrafleXtreme MALDI-TOF/TOF instrument after samples were shipped.
Participants obtained instructions for dissolution of samples in set A. However, valid ISD data were only
obtained for a nominal 100 pmol of the sample after LC purification.