www.abrf.org


PROTEIN SEQUENCING RESEARCH
GROUP (PSRG): RESULTS OF THE PSRG
2012/13 STUDY YEAR 2
Terminal Sequencing of Standard
Proteins in a Mixture
PSRG Members
Current Members
Greg Cavey, Southwest Michigan Innovation Center
Robert English (Co-Chair), University of Texas Medical Branch
Mark Garfield, NIH/NIAID
Pegah Jalili, Sigma-Aldrich
Sara McGrath (Co-Chair), FDA
Ejvind Mortz, Alphalyse
Outgoing Members
Henriette Remmer (Co-Chair), University of Michigan
Jack Simpson (EB liaison), United States Pharmacopeia
Detlev Suckau, Bruker Daltonics
Jim Walters (ad-hoc), Sigma-Aldrich
Viswanatham Katta, Genentech, Inc
Staying on as ad-hoc member for the coming year: Henriette Remmer
Study Background and Design
Status of Terminal Sequencing

N-terminal sequencing is in the midst of a technology transition from classical Edman
sequencing to mass spectrometry (MS)-based sequencing.

Edman sequencing and MS-based techniques both have strengths and weaknesses.

With this complementary role recognized, the PSRG attempts to push the capabilities of the
various sequencing techniques, specifically toward terminal sequencing of proteins in a mixture.
Concept of the Study - Terminal Sequencing of Proteins in a Mixture

Sequencing proteins in a mixture typically requires separation of proteins prior to
analysis for both Edman sequencing and MS-based technology platforms.

Edman Sequencing :
SDS-PAGE and electroblotting of the separated proteins
– well established everywhere

MS-based sequencing:
LC separation necessary prior to analysis

– not well established in most core facilities
PSRG designed a 2-year study

YEAR 1: Terminal sequencing and I.D. of three separated standard proteins

YEAR 2: Proteins (+ one new protein) distributed in a Mixture
Last Year’s Study Objective
To obtain N-terminal sequence information
on three standard proteins supplied as
separated samples
Last Year’s Study Design – The Samples
Protein Name | Amount Provided | N-terminally blocked? | Fusion Protein? | Comments
BSA | 1 mg | No | No | Reference protein/calibrant
Protein A | 3 x 100 pmol | Yes | Yes | Fusion protein with modified N-term
Endostatin | 3 x 100 pmol | No | No | Contains two N-term variants

Participants were asked to analyze samples for terminal sequencing using
any technology available.

Participants received all three proteins with ID in sufficient amounts to
sequence each protein utilizing all three technologies. Feasibility of
analysis had been previously validated by PSRG members.

Participants also filled out a survey; all responses were kept anonymous.
Last Year’s Participation & Survey Results

25 laboratories from 12 countries requested samples for Edman sequencing and
most of the labs (23) also for MS sequencing.

14 of the 25 participating laboratories (56%) completed the survey.

7 of the 14 labs utilized Edman sequencing, 6 top-down MS and 1 bottom-up MS (5
used bottom-up for confirmation).

Out of 14 respondents,

9 labs analyzed the reference protein BSA; 8 correctly determined the N-terminus.

13 labs analyzed Protein A; 4 correctly determined the N-terminus (methyl-Met).

14 labs analyzed Endostatin; 12 labs correctly determined the N-terminus, but only 7
identified the presence of the second N-terminus.
Last Year’s Edman Summary & Observations
Edman sequencing allows for direct determination of
the N-terminal sequence of a protein.

All labs returned N-terminal data which correlated well with
published protein sequences.

Edman can produce data with and without separation (SDS PAGE
and chromatography).

No C-terminal data is produced with Edman.

If the protein is N-terminally blocked, the reaction will not proceed
for most (but not all) modifications.

Reagents for Edman sequencing can be very expensive.
MS Lessons Learned from Last Year
• Top-Down with ETD or ISD provided reliable N-term sequences
• Top-Down CID was most easily misinterpreted
• Edman and Top-Down complement each other very well: Edman for the first ~10 residues, Top-Down for the inexpensive extension of calls (e.g. through the fusion site of Protein A)
• Validation of the N-term by either T³-sequencing or Bottom-Up
• Efficient use of Top-Down MS requires good software support
• Bottom-Up was great to confirm N-term results, but not to generate them
• Use of protein HPLC resulted in shortened readouts
• Protein A: Successful analysis of the fusion protein required high experience
• Endostatin: ragged N-termini were recognized by those who determined the intact molecular weight(s) or detected heterogeneity by HPLC or Edman
• Top-Down by ETD or ISD permitted the detection of the C-terminal removal of Lysine; intact MW determination allowed validation of the finding
PSRG 2012/13 (Year 2/2) Study Objective
To obtain N-terminal sequence information on
three standard proteins supplied
in a mixture
PSRG 2012/13 Study Design – The Samples
Protein Name | Amount (pmol/vial) | N-terminally blocked? | Fusion Protein? | Comments
αS1-casein* | 800 | No | Yes | His-tagged protein w/ free N-term
Protein A | 60 | Yes | Yes | Fusion protein w/ modified N-term
Endostatin (collagen frag) | 300 | No | No | Contains two N-term variants

Each participant received 2 vials of the mixture (except one late participant), and each vial contained
the protein amounts listed above with buffer components including 4M urea and PBS. Each participant
also received a third vial containing 750 pmol BSA.

Participants were asked to a) separate the proteins, and b) analyze samples for terminal sequencing
using any technology available.

All protein components showed solubility in traditional proteomic buffers, including water, 0.1% FA, and
0.1% TFA. Specifically, endostatin had shown solubility in 20% ACN, 0.1% TFA, 50% pyridine, and
buffers compatible with 1D gel electrophoresis.

Participants also filled out an online survey (responses were kept anonymous).
* Protein not included in last year’s study.
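The per-vial amounts above determine how much of each protein is nominally available when a vial is split between workflows. As a rough illustration, here is a minimal Python sketch; the function name and the assumption of loss-free aliquoting are ours and are not part of the study design.

```python
# Nominal protein amounts per vial (pmol), taken from the sample table.
VIAL_PMOL = {"alphaS1-casein": 800, "Protein A": 60, "Endostatin": 300}


def aliquot_pmol(fraction):
    """Return the nominal pmol of each protein in an aliquot of one vial.

    `fraction` is the share of the vial that is used (0..1). Handling and
    separation losses are ignored, which is an assumption of this sketch.
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    return {name: round(amount * fraction, 1) for name, amount in VIAL_PMOL.items()}


if __name__ == "__main__":
    for label, fraction in [("50% of a vial", 0.5), ("9% of a vial", 0.09)]:
        print(label, aliquot_pmol(fraction))
```

Real recoveries after HPLC, gel elution, or blotting will be lower than these nominal values.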
PSRG 2012/13 Participation Results

15 laboratories from 6 countries requested samples.

5 of the 15 participating laboratories (33%) completed the online survey
(Labs 05S, 16X-”A”, 08D, 12P, and 20M)


Edman sequencing (2 labs)
Top-down MS (3 labs)
Bottom-up MS (2 labs)

Out of 5 respondents,
αS1-casein: 2 labs correctly determined the N-terminus
Protein A: 3 labs correctly determined the N-terminus
Endostatin: 2 labs correctly determined the N-terminus, and 1 lab identified the presence of the second N-terminus
N-Terminal Techniques:
Edman Degradation
presented by Henriette A. Remmer
Protein N-termini
Protein | N-terminal Sequence
Endostatin (300 pmol), Variant 1 | D-F-Q-P-V-L-H-L-V-A-L-N-S-P-L-S-G-G-M-R-G-I-R-G-A
Endostatin, Variant 2 | H-S-H-R-D-F-Q-P-V-L-H-L-V-A-L-N-S-P-L-S-G-G-M-R-G
Protein A (60 pmol) | Methyl-M-L-R-P-V-E-T-P-T-R-E-I-K-K-L-D-G-L-A-Q-H-D-E-A-Q
αS1-Casein (800 pmol) | M-H-H-H-H-H-H-S-S-G-L-V-P-R-G-S-G-M-K-E-T-A-A-A-K
Note: His-Tag and S-Tag sequences highlighted; fusion protein; Arg45 of the
recombinant protein corresponds to Arg1 in the human protein
• Two participants, 16X and 20M, submitted Edman sequencing results.
• Participant 16X tried different techniques for protein separation.
• The PSRG used Edman sequencing without prior separation of proteins for comparison.
Edman Sample Preparation Workflows
PSRG 2013 Samples
• Used sample as provided, no separation (PSRG)
• SDS-PAGE, blotting on PVDF (16X, 20M)
• HPLC (16X)
• Gel Eluted Liquid Fractionation Entrapment Electrophoresis (GELFrEE) (16X)
Sequencers: ABI Procise (2 - 494 HT, 1 - 492 HT)
Edman Results: 16X
Entire sample tube A used for HPLC; 50% of fractions 7, 10 and 11 used for Edman sequencing:
30 pmol Protein A, 150 pmol Endostatin, 400 pmol αS1-Casein
Fraction 7
Protein A (expected): M L R P V E T P T R
Fraction 7 (calls as listed on the slide): X L L X X P X E T Y X H X
Fraction 10: no amino acid assignments
Fraction 11
Endostatin (expected): D F Q P V L H L V A
Fraction 11 (calls as listed on the slide): X F Q P A V G L A T Q L V G A
Edman Results: 16X
90% of sample tube B separated on GELFrEE tube gel eluter
GELFrEE System: Gel Eluted Liquid Fractionation Entrapment Electrophoresis
• Disposable cartridges contain SDS-polyacrylamide gel matrix
• Proteins are solubilized and electrophoresed
• Size-based separation and liquid-phase recovery
Sample preparation after GELFrEE separation:
• Fractions evaporated to dryness
• Konigsberg acetone precipitation [1] (3x) followed by 2 acetone washes
• Dissolved precipitate in 0.1% TFA
• 50% of solution for Edman sequencing, applied to glass fiber filter with polybrene treatment: 27 pmol Protein A, 135 pmol Endostatin, 360 pmol αS1-Casein
No meaningful Edman data obtained, only Gly and Tris artifact peaks
[1] LE Henderson, S Oroszlan, W Konigsberg; Anal. Biochem. 1979, 93(1), 153-157
Edman Results: 16X
9% of sample tube B used for SDS-PAGE/PVDF blotting:
5.4 pmol Protein A, 27 pmol Endostatin, 72 pmol αS1-Casein
αS1-Casein (expected): M H H H H H H S S G
Band 1: m H H H H S X A X G
αS1-Casein (expected): M H H H H H H S S G
Band 2+3: M H H h H s p X y X
Endostatin (expected, variant 1 / variant 2 per cycle): D/H F/S Q/H P/R V/D L/F H/Q L/P V/V A/L
Band 4 (two calls per cycle): D/H F/S Q/H P/R V/D L/F H/Q L/p V/V A/L
Edman Results: 20M
30% of one sample tube used with sample preparation by SDS-PAGE/PVDF blotting:
20 pmol Protein A, 100 pmol Endostatin, 250 pmol αS1-Casein
Protein A (expected): M* L R P V E T P T R E I K K L D G L A Q
Reported sequence: F L X P V E T P T X E I K K L - - - - -
12 of 15 residues correctly assigned
Endostatin (expected): D F Q P V L H L V A L N S P L S G G M R
Reported sequence: D F Q P V L H L V A L N S P L - - - - -
15 residues correctly assigned
αS1-Casein (expected): M H H H H H H S S G L V P R G S G M K E
Reported sequence: M H H H H H H S S G L V P R G S G M K E
25 residues correctly assigned
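The "N of M residues correctly assigned" figures can be reproduced by comparing each reported call with the expected residue at the same cycle. The following is a minimal Python sketch, not the scoring procedure the PSRG actually used. The function name is hypothetical, `X` is taken to mean an unassigned cycle, `-` a cycle that was not reported, and the case-insensitive comparison (counting lower-case tentative calls) is our choice.

```python
def count_correct(expected, reported):
    """Return (correct, compared) for two aligned one-letter sequences.

    Cycles reported as '-' are skipped; 'X' (no assignment) is compared
    but never counts as correct.
    """
    correct = 0
    compared = 0
    for exp, rep in zip(expected, reported):
        if rep == "-":
            continue  # cycle not reported at all
        compared += 1
        if rep.upper() != "X" and rep.upper() == exp.upper():
            correct += 1
    return correct, compared


if __name__ == "__main__":
    # Protein A as reported by lab 20M (N-terminal methyl-Met written as M).
    print(count_correct("MLRPVETPTREIKKLDGLAQ", "FLXPVETPTXEIKKL-----"))
    # Endostatin variant 1 as reported by lab 20M.
    print(count_correct("DFQPVLHLVALNSPLSGGMR", "DFQPVLHLVALNSPL-----"))
```

For the two 20M examples this gives 12 of 15 and 15 of 15, matching the counts on the slide.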
Edman Results: PSRG
Sequencing of protein mix without prior separation
Started with 30% of the mix from one tube: 15 pmol Protein A, 80 pmol Endostatin, 200 pmol αS1-casein
Calls as listed on the slide (two cycles shown):
Major Sequence: D, F
Sequence 2: M, X/S
Sequence 3: H, S/X
Repetitive Yield: 95%
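A repetitive yield figure indicates how quickly the Edman signal decays from cycle to cycle. As an illustration only, the sketch below uses the common geometric model in which each cycle retains a fixed fraction of the previous cycle's signal. The function name and the 100 pmol starting value are hypothetical, and lag, background, and residue-specific recoveries are ignored.

```python
def expected_yield(initial_pmol, repetitive_yield, cycle):
    """Expected signal (pmol) at a given Edman cycle under a geometric model.

    Cycle 1 returns `initial_pmol`; every later cycle is multiplied by
    `repetitive_yield` once more.
    """
    if cycle < 1:
        raise ValueError("cycle numbering starts at 1")
    return initial_pmol * repetitive_yield ** (cycle - 1)


if __name__ == "__main__":
    # Hypothetical 100 pmol initial yield with the 95% repetitive yield quoted above.
    for cycle in (1, 10, 20):
        print(cycle, round(expected_yield(100.0, 0.95, cycle), 1))
```

Under this model a minor component starting at a few pmol drops toward background within a handful of cycles, which fits the observation that only the major sequence could be followed far into the unseparated mixture.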
Effectiveness of HPLC for sample preparation compared to SDS-PAGE/blotting
[Bar chart: number of correct assignments for the 10 N-terminal amino acids (scale 0-10) for αS1-casein, Endostatin variant 1, Endostatin variant 2 and Protein A; series: after SDS-PAGE/blot, after HPLC separation, without separation (mix)]
Sample | Lab ID | Sample Preparation Technique and Sample Amount | Amino Acid Sequence
αS1-Casein | 20M | SDS-PAGE/PVDF: 250 pmol | MHHHHHHSSGLVPRGSGMKET
αS1-Casein | 16X | HPLC: 400 pmol | No sequence reported (“not clear why it did not work”)
αS1-Casein | 16X | GELFrEE tube gel eluter: 360 pmol | No sequence reported (interfering buffer components)
αS1-Casein | 16X | SDS-PAGE/PVDF: 72 pmol | MHHHHSXAXG
αS1-Casein | PSRG | Mix (no separation): 200 pmol | MXHXXXXXXXXXXXXXXXXXXX
Protein A | 20M | SDS-PAGE/PVDF: 20 pmol | F L X P V E T P T X E I K K L
Protein A | 16X | HPLC: 30 pmol | X L X P X E T X H X
Protein A | 16X | GELFrEE tube gel eluter: 27 pmol | No sequence reported (interfering buffer components)
Protein A | 16X | SDS-PAGE/PVDF: 5 pmol | No sequence reported (low protein amount)
Protein A | PSRG | Mix (no separation): 15 pmol | X X X P V X T P X R X X X X X X
Endostatin Sequence Variant 1 | 20M | SDS-PAGE/PVDF: 100 pmol | D F Q P V L H L V A L N S P L
Endostatin Sequence Variant 1 | 16X | HPLC: 150 pmol | X F Q P V L T L V A
Endostatin Sequence Variant 1 | 16X | GELFrEE tube gel eluter: 135 pmol | No sequence reported (interfering buffer components)
Endostatin Sequence Variant 1 | 16X | SDS-PAGE/PVDF: 27 pmol | D F Q P V L H L V A
Endostatin Sequence Variant 1 | PSRG | Mix (no separation): 100 pmol | D F Q P V L H L V A L N S P L S G G M R G I R
Endostatin Sequence Variant 2 | 20M | SDS-PAGE/PVDF | No sequence reported
Endostatin Sequence Variant 2 | 16X | HPLC | No sequence reported
Endostatin Sequence Variant 2 | 16X | GELFrEE tube gel eluter | No sequence reported (interfering buffer components)
Endostatin Sequence Variant 2 | 16X | SDS-PAGE/PVDF | H S H R D F Q P V X
Endostatin Sequence Variant 2 | PSRG | Mix (no separation) | H S H R X X X X X X X
PSRG 2013 Edman Conclusions & Observations

None of the participating labs detected all 4 N-termini by Edman sequencing of
the separated proteins.

SDS-PAGE/electroblotting to PVDF proved to be the most successful and easiest-to-use
technology for sample preparation.

HPLC is associated with significant preparative losses, and electroelution with
buffer interference.

Edman sequencing can produce data with and, for simple mixtures, also without
prior separation of proteins. For complex mixtures, protein separation by SDS-PAGE/blotting or HPLC is necessary.

Without separation, a single major component can be sequenced within a protein
mixture but sequence assignment for components apparently present in similar
amounts is more challenging.

If the protein is N-terminally modified, the reaction will not proceed for most but
not all modifications.
N-Terminal Techniques:
MALDI-ISD Top-Down Sequencing
presented by Detlev Suckau
What is Top-Down Mass Spectrometry?
Why is it Most Appropriate for Terminal Protein Sequencing?
[Graphic: protein drawn as N-TERM SEQUENCE C-TERM]
• All MS analysis is based on the intact, undigested protein
• Intact molecular weight: gross structure validation
• Targeted sequencing of the N- and C-terminus
• Bottom-up analysis tackles the termini only arbitrarily
Most terminal sequencing employed in PSRG studies is based on MALDI In-Source Decay (MALDI-ISD)
MALDI N & C-Terminal Sequencing
Matrix: 1,5-DAN
MALDI intact sequencing: 30 seconds!
[Graphic: N- and C-terminal fragment ladders illustrated with the letters of "N-TERM SEQUENCE C-TERM"]
• Confirm N & C terminal sequences
• Identify truncations/terminal PTMs
• Generate sequence information without proteolytic digestion
• In less than 1 minute
MALDI TOF/TOF, 40,000 resolution
MALDI-ISD Process is most affected by the choice
of MALDI Matrix
Intact Panitumumab N & C-Terminal Sequencing – 1 ISD Spectrum > 2 Protein Sequences
Sequence match of LC confirming the sequence of both termini
[Spectrum annotations: 64, 21, K]
Intact Panitumumab N & C-Terminal Sequencing – 1 ISD Spectrum > 2 Protein Sequences
Sequence match of HC confirming N-term pyro-glutamylation and C-term lysine truncation
[Spectrum annotations: 72, pyroGlu-, 50, K, X, Lys truncation]
Panitumumab Sequence Covered in a Single MALDI-ISD Spectrum
[Sequence coverage map; legend: c-ions; z,y-ions; Q/pE (-18 Da); 18 C-C (-36 Da); N295ST N-glycosylation; 0, 1, 2 K; domain labels Hi, CH2, CH3]
Heavy Chain
1-50: QVQLQESGPGLVKPSETLSLTCTVSGGSVSSGDYYWTWIRQSPGKGLEWI
51-100: GHIYYSGNTNYNPSLKSRLTISIDTSKTQFSLKLSSVTAADTAIYYCVRD
101-150: RVTGAFDIWGQGTMVTVSSASTKGPSVFPLAPCSRSTSESTAALGCLVKD
151-200: YFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSNFGTQTY
201-250: TCNVDHKPSNTKVDKTVERKCCVECPPCPAPPVAGPSVFLFPPKPKDTLM
251-300: ISRTPEVTCVVVDVSHEDPEVQFNWYVDGVEVHNAKTKPREEQFNSTFRV
301-350: VSVLTVVHQDWLNGKEYKCKVSNKGLPAPIEKTISKTKGQPREPQVYTLP
351-400: PSREEMTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPMLDSDG
401-445: SFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK
Light Chain
1-50: DIQMTQSPSSLSASVGDRVTITCQASQDISNYLNWYQQKPGKAPKLLIYD
51-100: ASNLETGVPSRFSGSGSGTDFTFTISSLQPEDIATYFCQHFDHLPLAFGG
101-150: GTKVEIKRTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKV
151-200: DNALQSGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQG
201-214: LSSPVTKSFNRGEC
Sequence Covered after LC-Separation
MALDI-TDS Middle-Down Increases Sequence Coverage for mAbs
Fast and Extensive mAb Product Characterization
Middle-Down Panitumumab Analysis
1. Remove glycan (Endoglycosidase F2)
2. Cleave HC (IdeS cleavage)
3. Reduce (full TCEP reduction)
[Diagram labels: HC, Fd, LC, Fc/2]
• IdeS cleaves the HC of mAbs specifically at a conserved Gly-Gly motif in the hinge region
• Works for most mammalian antibodies in their native form
Middle-Down Analysis of IdeS Digest
Panitumumab: LC-MALDI
LC-MALDI-TDS Analysis of Panitumumab Fabricator Digest
Column: Zorbax C8; Matrix: sDHB
[Figure: LC-MALDI survey view over retention time R(t) showing the Fd, LC and Fc fragments with their (M+H)1+, (M+2H)2+ and (M+3H)3+ signals]
Middle-Down Analysis of IdeS Digest
Panitumumab IdeS+GlycoZERO: TDS of Fc-Fragment
LC-MALDI-TDS
Fc Fragment
Localization of glycan
C-terminal K truncation
Coverage: 62%
Middle-Down Analysis of IdeS Digest
Panitumumab: TDS of LC
LC-MALDI-TDS Light Chain:
90 AA from N-terminus Coverage: 70 %
Middle-Down Analysis of IdeS Digest
Panitumumab: TDS of Fd-Fragment
LC-MALDI-TDS Fd Fragment
91 AA of variable N-terminus
Coverage: 58 %
Considerations for Top-Down LC-MALDI-ISD Analysis
• ≥ 50 pmol protein per LC fraction is desired
• ≥ 100 pmol of each protein needs to be applied to the column, i.e., a 10-protein mix loads > 1 nmol on column
• Suitable columns: C8 capLC, monolithic columns
• Severe protein losses, typical during LC, decrease signal
• Reduction/alkylation and oxidation typically increase noise
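The loading guideline is simple arithmetic that can be checked before committing a scarce sample. The sketch below is illustrative only; the function names are hypothetical, and the recovery value is an assumed parameter (the two guideline numbers suggest roughly 50% if a protein elutes in a single fraction, but that reading is ours).

```python
# Guideline value from the slide: at least 100 pmol of each protein on column.
MIN_PMOL_ON_COLUMN_PER_PROTEIN = 100


def required_column_load_pmol(n_proteins, per_protein_pmol=MIN_PMOL_ON_COLUMN_PER_PROTEIN):
    """Total pmol to load so that every protein meets the per-protein minimum."""
    return n_proteins * per_protein_pmol


def expected_fraction_pmol(loaded_pmol, recovery):
    """pmol expected in the collected fraction for an assumed LC recovery (0..1)."""
    if not 0 <= recovery <= 1:
        raise ValueError("recovery must be between 0 and 1")
    return loaded_pmol * recovery


if __name__ == "__main__":
    print(required_column_load_pmol(10))        # 10-protein mix, total pmol on column
    print(expected_fraction_pmol(100, 0.5))     # assumed 50% recovery of one protein
```

For the study mixture, the 60 pmol of Protein A per vial falls below the 100 pmol per-protein guideline even before any loss.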
Software for Top-Down Sequence Analysis
Functionality
• Assign expected sequence to TDS spectrum (BioPharma QC)
• ID proteins through TD standard Mascot searches (Discovery, ID)
• Manual/automatic de novo sequencing (Edman like)
• Manual/automatic de novo sequencing + BLAST (Sequencing + ID)
• Generate test hypothesis to explain Δm's (terminal mods, truncations)
Available software
• BioTools 3.2 (Bruker Daltonik; ECD/ETD/MALDI-ISD)
• ProSight PTM (Kelleher Group; ETD)
• ProSightPC 2.0 (Thermo; ETD)
MALDI Top-Down and Middle-Down Analysis in routine BioPharma QC
• Intact protein MW < 10 ppm for MW up to 30 kDa
• Middle-down antibody work ideally suited for MALDI
• Screen for PTMs or processing errors on the domain level
• Validate processing errors by MALDI-TDS
• Automated screening/validation in BioPharmaCompass
• MWs in minutes
• TDS in seconds under full automation
MALDI-ISD Fragments and Database Searching
y - (z+2) = 15.01 Da, c - a = 45.02 Da
Top-Down Search principle
• Select m/z range of ISD fragments as “virtual precursor ions”
• Lower mass fragments are used as dependent fragments
• Standard MS/MS Mascot search with MALDI-ISD as fragment ion set
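The two mass offsets quoted for ISD fragments follow from elemental composition: a c-ion differs from the a-ion of the same cleavage site by CO plus NH3, and a y-ion differs from the corresponding z+2 ion by NH3 minus two hydrogens. The sketch below recomputes both values from monoisotopic atomic masses and shows one way to look for such satellite pairs in a peak list. The helper name, tolerance, and example peaks are hypothetical.

```python
# Monoisotopic atomic masses in Da.
H, C, N, O = 1.007825, 12.0, 14.003074, 15.994915

CO = C + O
NH3 = N + 3 * H

# c-ion minus a-ion (same cleavage site): a = b - CO, c = b + NH3.
C_MINUS_A = CO + NH3
# y-ion minus (z+2)-ion: z = y - NH3, and z+2 carries two extra hydrogens.
Y_MINUS_Z2 = NH3 - 2 * H


def find_satellite_pairs(peaks, offset, tol=0.05):
    """Return (low, high) m/z pairs whose spacing equals `offset` within `tol` Da."""
    ordered = sorted(peaks)
    return [
        (low, high)
        for i, low in enumerate(ordered)
        for high in ordered[i + 1:]
        if abs((high - low) - offset) <= tol
    ]


if __name__ == "__main__":
    print(round(C_MINUS_A, 2), round(Y_MINUS_Z2, 2))
    # Hypothetical peak list: only the first two peaks form an a/c pair.
    print(find_satellite_pairs([1000.00, 1045.02, 1100.00], C_MINUS_A))
```

Pairs found this way can help decide whether a peak belongs to the c-series (with an a-ion satellite 45.02 Da below) or to the y/z+2 series.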
MALDI Top-Down Sequencing: Identification from the protein DB in seconds
[Spectrum: intensity (a.u., x10^5) vs. m/z, 50000-350000; MH+ at 116429; other labeled peaks at 58168, 77627, 174834, 232995, 349227]
MALDI-TDS Result from β-Galactosidase
Terminal Sequences Confirmed in Seconds
N-term Methionine truncation confirmed
Manual de novo Sequencing
[Spectrum: intensity (a.u., x10^5) vs. m/z, 3600-4200]
Labeled peaks (m/z): 3562.83, 3636.73, 3725.90, 3839.82, 3881.95, 3970.76, 4010.03, 4041.85, 4178.16, 4235.16
Mass-difference assignments (deviation in Da): Y + 0.01, R - 0.05, K - 0.01, AP + 0.03, G - 0.02
c-ion series assigned by:
• High intensity
• -45 Da a-ion satellites
• Proline gaps
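Manual de novo reading of a c-ion ladder amounts to matching the spacing between neighbouring peaks against amino acid residue masses. The following Python sketch illustrates this under stated assumptions. The choice of the six peaks as the c-ion ladder is our reading of the slide labels, the 0.06 Da tolerance is arbitrary, single residues are preferred over residue pairs, and `L` stands for the isobaric pair Leu/Ile. At this tolerance K and Q both fit a gap of about 128.08 Da, so the closest match is reported.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses in Da ("L" also covers the isobaric Ile).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def match_gap(delta, tol=0.06):
    """Best residue (or residue pair, e.g. across a proline gap) for a mass gap."""
    singles = [(abs(delta - mass), aa) for aa, mass in RESIDUE_MASS.items()
               if abs(delta - mass) <= tol]
    if singles:
        return min(singles)[1]
    pairs = []
    for a, b in combinations_with_replacement(sorted(RESIDUE_MASS), 2):
        error = abs(delta - (RESIDUE_MASS[a] + RESIDUE_MASS[b]))
        if error <= tol:
            pairs.append((error, a + b))
    return min(pairs)[1] if pairs else "?"


def read_ladder(peaks, tol=0.06):
    """Translate consecutive peak spacings of one ion series into residue calls."""
    ordered = sorted(peaks)
    return [match_gap(high - low, tol) for low, high in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    # Peaks taken from the slide; selection as the c-ion ladder is an assumption.
    c_ion_ladder = [3562.83, 3725.90, 3881.95, 4010.03, 4178.16, 4235.16]
    print(read_ladder(c_ion_ladder))
```

With these six peaks the calls come out as Y, R, K, AP, G, the same labels shown on the slide. A pair call such as AP gives composition only, not order.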
MALDI-TDS De Novo Analysis of the N-terminus
[Figure: zoomed a- and c-ion signals (abs. int. x 1000 vs. m/z) for fragments c19, c21, c24, c28, c35, c39, c45, c50, c55, c60 and c64; residue labels on the slide: L G W G R Q A S L W S N E C S L F]
Resemann et al. 2010 Anal Chem 82:3283-92
Top-down de novo protein sequencing of a 13.6 kDa camelid single heavy chain antibody by MALDI-TOF/TOF MS.
Cap-LC (monolithic column, 10 % sample loaded)
Dionex PS-DVB 500
UV LC traces allow quantification of proteins in unknown samples
[UV chromatogram peak labels: Protein A intact, His-Tag Casein, Protein A truncation forms, Endostatins]
LC-MALDI MS Analysis of the PSRG2013 Sample
Prototype
[Peak labels: Endostatins, His-Tag Casein, Protein A intact, Protein A truncation forms]
Difficulties to Tackle in This Year's Study when Using Top-Down MS
• Separate proteins by LC
• Determine protein MWs after LC separation
• Assign protein IDs based on MW (sequences known)
• Establish top-down sequence analysis online or offline
• Map experimental terminal sequences to the known protein sequences
MS Strategies Used in 2013 Study
What types of analyses did you perform?
[Bar chart, scale 0-3: Top-down MS 3, Intact MW 3, In Source Decay 3; remaining categories: T3 Sequencing, Tandem MS, ECD/ETD]
Most respondents
• HPLC of intact proteins with fraction collection
• LC-UV or ESI-MS to determine fractions of interest
• Intact MW by MALDI or ESI
• Sequence determination by ISD (top-down)
Additional work by some
• Bottom-up MS
  – Trypsin digest of fractions containing proteins
  – CID or ISD for sequence determination (bottom-up)
• Some needed more sample cleanup or SDS-PAGE visualization of protein fractions
Goal 1 = good separation, detect expected proteins
• Data provided shows good protein separation
• First use intact MW to detect known proteins
  – MALDI MW usually very accurate
  – ESI MW determination possible with deconvolution software
  – Poor S/N or modifications complicate interpretation
• Most intact MW data provided is good enough to indicate protein variants
  – Cannot ID specific differences
  – Probably not sufficient for samples with unknown proteins
Successful Protein Separation
08D
• HPLC system: Agilent 1200
• Column: Agilent Poroshell 300SB-C8, 2.1x75 mm 5-micron
• Solvent A: 0.1% TFA in water, Solvent B: 0.1% TFA in ACN
• Gradient: 25-60% B in 9.5 min, Flow: 0.5 ml/min, Column temperature: 70°C
12P
• HPLC system: Thermo Surveyor
• Column: Waters Biosuite Phenyl 1000 2.0x75 mm 10-micron
• Solvent A: 0.1% FA in water, Solvent B: 0.1% FA in ACN
• Gradient: 5-95% B in 60 min, Flow: 0.1 ml/min
16X
• HPLC system: Agilent 1260-Dionex Chromeleon
• Column: Zorbax 300SB-C8 2.1x150 mm
• Solvent A: 0.1% TFA in water, Solvent B: 0.1% TFA in ACN
• Gradient: 5-95% B in 40 min, Flow: 0.2 ml/min, Column temperature: 40°C
16X
• HPLC system: Agilent ChipCube
• Acquisition conditions unknown
• Agilent does offer Intact Protein Chip
• C-8 SB-ZORBAX, 300Å, 75 μm x 43 mm
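The three fully specified gradients differ considerably in steepness and solvent volume, which is one way to compare the methods side by side. The sketch below is only an illustration of that arithmetic; the function name is hypothetical, and the gradients are assumed to be linear over the stated time.

```python
def gradient_summary(start_pct_b, end_pct_b, minutes, flow_ml_min):
    """Return (slope in %B per min, gradient volume in mL) for a linear gradient."""
    slope = (end_pct_b - start_pct_b) / minutes
    volume = minutes * flow_ml_min
    return slope, volume


# Values copied from the method descriptions above: (start %B, end %B, min, mL/min).
METHODS = {
    "08D Poroshell 300SB-C8": (25, 60, 9.5, 0.5),
    "12P Biosuite Phenyl 1000": (5, 95, 60, 0.1),
    "16X Zorbax 300SB-C8": (5, 95, 40, 0.2),
}

if __name__ == "__main__":
    for name, params in METHODS.items():
        slope, volume = gradient_summary(*params)
        print(f"{name}: {slope:.2f} %B/min over {volume:.2f} mL")
```

A steeper slope shortens the run but leaves less room to resolve closely eluting species such as the two endostatin variants.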
08D
Identification by Intact MW – Endostatin
MALDI of fractions 6-9 shows protein corresponding to Endostatin in all of them
Variant 2 only appears in fraction 6
[UV chromatogram with numbered peaks 1-9]
[MALDI spectra, intensity (a.u.) vs. m/z 18000-22000; labeled peaks: LP20-50 0:E6 MS: 19456.1 and 20005.5; LP20-50 0:F5 MS: 19445.3; LP20-50 0:F6 MS: 19442.4; LP20-50 0:G5 MS: 19520.8]
• MS: Bruker Autoflex Speed
• MALDI/DHB matrix for intact MW
• HPLC system: Agilent 1200
• Column: Agilent Poroshell 300SB-C8, 2.1x75 mm 5-micron
• Solvent A: 0.1% TFA in water
• Solvent B: 0.1% TFA in ACN
• Gradient: 25-60% B in 9.5 min
• Flow: 0.5 ml/min
• Column temperature: 70°C
Identification by Intact MW – Endostatin
ESI-MS chromatogram highlighting 3rd peak
Multiply-charged ESI envelope with good S/N: easy to interpret
Instrument software allows deisotoping, intact MW determination, and determination of variants
• HPLC system: Agilent ChipCube
• MS: Agilent 6210 ESI-TOF
• Acquisition conditions unknown
16X
Identification by Intact MW – Protein A
ESI-MS chromatogram highlighting 1st peak
Multiply-charged ESI envelope with poor S/N: harder to see by eye
Instrument software deisotoping and intact MW determination still good for this protein
• HPLC system: Thermo Surveyor
• Column: Waters Biosuite Phenyl 1000 2.0x75 mm 10 μm
• Solvent A: 0.1% FA in water, Solvent B: 0.1% FA in ACN
• Gradient: 5-95% B in 60 min, Flow: 0.1 ml/min
• MS: LTQ-FT Ultra
• ESI infusion for intact MW
12P
Accurate N-term sequence – Protein A
[ISD spectrum with residue labels, including G and D]
• Sample: 0.1% TFA + 50 mg/mL TCEP
• Mix 1:1 with 10 mg/mL DAN matrix
• MS: Bruker Ultraflex TOF/TOF ("A")
• MALDI/ISD top-down
16X
Accurate N-term sequence – Protein A
Methylation of N-terminal methionine
• MS: Bruker Autoflex Speed
• MALDI /DHB matrix
• Fractions for ISD selected based on intact MW results
08D
Sample Manipulation
• Sample simplified by reduction/alkylation
  – αS1 Casein observed as a dimer (3 Cys protein)
  – Reduction of the sample resulted in simplified chromatogram
• Some respondents use reduction as a matter of protocol
• However, less sample manipulation is better
  – Sample loss at every step
  – Oxidation or other artificial modifications to the proteins can occur
Identification by Intact MW – αS1 Casein
ESI-MS chromatogram highlighting 3rd peak
Multiply-charged ESI envelope and deconvolution to intact MW
Protein appears as a dimer: greater chance of misidentification?
• HPLC system: Thermo Surveyor
• Column: Waters Biosuite Phenyl 1000 2.0x75 mm 10 μm
• Solvent A: 0.1% FA in water, Solvent B: 0.1% FA in ACN
• Gradient: 5-95% B in 60 min, Flow: 0.1 ml/min
• MS: LTQ-FT Ultra with ESI infusion
• Deconvolution for intact MW by ThermoFisher ProMass software
12P
08D
Chromatogram Simplified by Reduction
[UV chromatograms of the native and the reduced and alkylated sample, with numbered peaks]
The native fraction 5 peak (RT = 6.7 min) disappears after reduction and the
RT = 5.3 min peak increases. The reduction cleaves the dimer.
• HPLC system: Agilent 1200, Column: Agilent Poroshell 300SB-C8, 2.1x75 mm 5-micron
• Solvent A: 0.1% TFA in water, Solvent B: 0.1% TFA in ACN
• Gradient: 25-60% B in 9.5 min, Flow: 0.5 ml/min, Column temperature: 70°C
• UV detection
CID or ISD Identification – S1 Casein
08D
MALDI-ISD (reduced), intact protein
Correct N-term identification possible
on native protein by top-down MS
ESI-CID (native), tryptic peptide
Bottom-up MS provides
confirmation that N-term
identification is correct
Goal 2 = Accurate N-terminal sequence
Expected intact MW
Endostatin 1 C-ox: 19445.871 Da
Endostatin 2 C-ox: 19963.414 Da
Protein A: 44612.0 Da
αS1-casein dimer: 50011.2 Da

Lab 05S. MS details: HPLC, fraction collect; Axima Resonance QIT-TOF; intent to perform MALDI of fractions for intact MW
Protein | Intact MW | Amino acid sequence
αS1-Casein | -- | No sequence reported
Protein A | -- | No sequence reported
Endostatin Seq. 1 | -- | No sequence reported
Endostatin Seq. 2 | -- | No sequence reported

Lab 08D. MS details: HPLC, fraction collect; Bruker Autoflex Speed; MALDI/DHB matrix for intact MW; ISD top-down on intact proteins; ISD bottom-up on tryptic fragments
Protein | Intact MW | Amino acid sequence
αS1-Casein | -- | MHHHHHHSSGLVPRGSGMKETAAAKFERQHMD
Protein A | -- | (methyl-M L R P V E T P) T R E I K K L D G L A Q H
Endostatin Seq. 1 | 19445.3 (-0.5 Da) | No sequence reported
Endostatin Seq. 2 | 20005.5 (+42 Da) | No sequence reported

Lab 12P. MS details: HPLC, fraction collect; LTQ-FT Ultra; ESI infusion for intact MW; intent to perform top-down CID on intact proteins with ESI infusion
Protein | Intact MW | Amino acid sequence
αS1-Casein | 53622 (+3611 Da?) | No sequence reported
Protein A | 44611 (-1 Da) | No sequence reported
Endostatin Seq. 1 | 19433.4 (-12 Da) | No sequence reported
Endostatin Seq. 2 | 19951.0 (-12 Da) | No sequence reported

Lab 16X. MS details: HPLC, fraction collect; Bruker Ultraflex TOF/TOF ("A"), MALDI/DAN matrix for intact MW, ISD top-down on intact proteins; Agilent 6210 LC-Chip-TOF ("B"), HPLC-ESI of intact proteins, online ESI for intact MW
Protein | Intact MW | Amino acid sequence
αS1-Casein | 50233 (+222 Da?) | No sequence reported
Protein A | 44613 (+1 Da) | (methyl-M L R P V E T P) T R E (I/L) K K (I/L) D G (L A Q) H D
Endostatin Seq. 1 | 19447 (+2 Da) | XFQPVLXLVAXXXPX
Endostatin Seq. 2 | 19464 (+1 Da) | No sequence reported
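Assigning a measured intact MW to one of the known proteins, and computing the deviation shown in parentheses, can be sketched in a few lines of Python. This is only an illustration: the function name and the 50 Da acceptance window are hypothetical choices, and the nearest-mass rule works here only because the expected masses are well separated.

```python
# Expected intact MWs (Da) as listed on the slide.
EXPECTED_MW = {
    "Endostatin 1": 19445.871,
    "Endostatin 2": 19963.414,
    "Protein A": 44612.0,
    "alphaS1-casein dimer": 50011.2,
}


def assign_by_mw(measured, tolerance_da=50.0):
    """Return (nearest protein, deviation in Da, within_tolerance) for a measured MW."""
    name, expected = min(EXPECTED_MW.items(), key=lambda item: abs(measured - item[1]))
    delta = measured - expected
    return name, round(delta, 1), abs(delta) <= tolerance_da


if __name__ == "__main__":
    # Intact MW values reported by lab 12P.
    for mw in (53622, 44611, 19433.4, 19951.0):
        print(mw, assign_by_mw(mw))
```

With the 12P values, Protein A and both endostatin variants fall within about 12 Da of the expected mass, while the casein reading is flagged as far outside the window, in line with the question mark on the slide.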
Mass Spec - Techniques, Problems, Comments
Proteins collected in fractions may precipitate
• Several respondents mentioned poor results for ESI/CID analysis of samples that had been HPLC-separated and fraction-collected
• Led to fewer confident N-terminal identifications
• Problem not observed by participants using MALDI-ISD analysis (more experience with the workflow?)
Mass Spec - Techniques, Problems, Comments
Not enough material?
• “There was not much sample for method development. There was no room for mistakes, which was very challenging.”
• In 2012 study, 100 pmol provided by PSRG, 30-100 pmol used by respondents
Was enough material provided?
[Bar chart: percent of respondents voting “Yes” (scale 0-100%) for Protein A, Endostatin and αS1-casein; series: 2012 and 2013 (quantity provided by PSRG)]
Mass Spec - Techniques, Problems, Comments
Not enough material?
• In 2012 study, 30-100 pmol used by respondents
• However, in 2013 study, all participants needed protein separation
• Significant loss due to separation techniques, multiple injections needed?
What methods, if any, did you use to separate and purify the sample?
[Bar chart comparing the number of responses in 2012 and 2013]
Mass Spec - Techniques, Problems, Comments
Analysis
• All participants provided intact MW information
• Only repeat participants provided correct sequence information
  – One of the lessons from last year:
  – Top-down MALDI-ISD appears to be best to sequence the N-terminus
  – Bottom-up if needed to confirm the N- or C-terminus
Skill level may be a factor in successful analysis
• No information on the skill level of participants
• How many participants determine N-terminal sequences on a regular basis?
• We did not get comments on how these study samples compare to regular client samples
• Intact protein separation in MS-based workflows is still uncommon in most core labs. These skills need to be developed for top-down protein analysis and top-down proteomics, which is on the rise: a challenge for the coming years.
Study Conclusions

Edman sequencing allows for direct determination of the N-terminus; prior protein
separation is necessary for a protein mixture.

Top-down MS with MALDI-ISD provided reliable N-terminal sequences, which
were further validated by bottom-up MS. Prior protein separation by HPLC works
reasonably well.

Edman and Top-Down complement each other very well: Edman for the first ~10
residues, Top-Down for the inexpensive extension of calls (e.g. through the fusion
site of Protein A).

Protein sequencing subsequent to HPLC separation posed a significant challenge to
all labs, for Edman sequencing as well as for top-down mass spectrometry. It is
clear that this workflow needs work before it is established in core labs, to avoid losses and
chemistry artifacts due to oxidation or reduction/alkylation.

Sequences of protein variants present in smaller amounts in a mixture are easily
overlooked by both Edman and top-down MS. Endostatin’s ragged N-termini were
recognized by those that determined the intact molecular weight(s) or used Edman
sequencing with prior protein separation.
Study Conclusions (continued)
• αS1-Casein was the most abundant protein in the mixture and presented weak sequence results in Edman and ISD. The expected MW was not accurately determined by any participant. Typically much higher MW readings were obtained.
• PSRG MALDI-TOF measurements of the pure casein provided the correct MW (with heterogeneity up to +200 Da!) and the expected terminal sequences.
[Spectrum annotation: 50011 + ~16 Da]
Acknowledgments
Sponsors of Study Proteins:
• ABRF
• Biomedical Research Core Facilities, University of Michigan
• Repligen
• Bruker Daltonics
• Sigma Aldrich
Anonymizer: Xuemei Luo, University of Texas Medical Branch
Edman Sequencing: Steve Smith, University of Texas Medical Branch
MALDI-ISD: Anja Resemann, Bruker Daltonics
...and study participants!
Proposal for 2014 Study

Focus on protein separation techniques for terminal sequencing of a simple
protein mixture

ABRF-wide Edman sequencing survey

Other ideas?????
Introduction Appendix
Slides
What types of analyses did you perform on the sample? Check all that apply.

Answer Options                         Response Percent   Response Count
Top-down mass spectrometry analysis    60.0%              3
Bottom-up mass spectrometry analysis   40.0%              2
Edman degradation                      40.0%              2
In Source Decay                        60.0%              3
T3 Sequencing                          0.0%               0
ECD/ETD                                0.0%               0
Tandem MS                              20.0%              1
Intact MW Measurement                  60.0%              3
Electroelution                         0.0%               0
Biochemical Labeling or Capture        0.0%               0
Other (please specify)                                    1

[Bar chart: response percent for each analysis type listed above]
What methods, if any, did you use to separate and purify the sample?

Answer Options             Response Percent   Response Count
Used sample as provided    0.0%               0
SDS-PAGE                   25.0%              1
HPLC                       75.0%              3
Other (please specify)                        1

Answered question: 4
Skipped question: 1

Other (please specify):
1. Jan 10, 2013 4:12 PM: Micro-Biospin 6 column (Bio-Rad)
If YES to question #4, how much material did you use in order to
confidently call the N-terminus of the protein? Please specify units.

Answer Options                   Response Percent   Response Count
a-S1-Casein                      75.0%              3
Protein A                        100.0%             4
Endostatin (Collagen fragment)   75.0%              3

Answered question: 4
Skipped question: 1

Responses:
1. Jan 25, 2013 5:14 PM
   a-S1-Casein: ISD analysis not yet completed - results TBD
   Protein A: ISD analysis not yet completed - results TBD
   Endostatin (Collagen fragment): ISD analysis not yet completed - results TBD
2. Jan 11, 2013 10:56 PM
   Protein A: estimate 15 pmol if 50% recovery from RPLC
   Endostatin (Collagen fragment): estimate 75 pmol if 50% recovery from RPLC
3. Jan 11, 2013 10:10 AM
   a-S1-Casein: 50 pmol
   Protein A: 50 pmol
4. Dec 23, 2012 8:56 AM
   a-S1-Casein: 250 pmol
   Protein A: 20 pmol
   Endostatin (Collagen fragment): 100 pmol
If NO to #4, how much more material would you need in order to
confidently call the N-terminus of the protein? Please specify units.

Answer Options                   Response Percent   Response Count
a-S1-Casein                      66.7%              2
Protein A                        33.3%              1
Endostatin (Collagen fragment)   66.7%              2

Answered question: 3
Skipped question: 2

Responses:
1. Jan 11, 2013 10:56 PM
   a-S1-Casein: not clear why this did not work by Edman or ISD as it is most abundant - can't answer
2. Jan 11, 2013 10:10 AM
   Endostatin (Collagen fragment): ?
3. Jan 10, 2013 4:12 PM
   a-S1-Casein: 1600 pmol
   Protein A: 120 pmol
   Endostatin (Collagen fragment): 600 pmol
Please list the amino acid sequence (in single letter code) that you
were able to CONFIDENTLY call for the N-terminus of the protein

Answer Options                      Response Percent   Response Count
a-S1-Casein                         75.0%              3
Protein A                           100.0%             4
Endostatin (Collagen fragment) #1   75.0%              3
Endostatin (Collagen fragment) #2   25.0%              1

Answered question: 4
Skipped question: 1

Responses:
1. Jan 25, 2013 5:14 PM
   a-S1-Casein: no results
   Protein A: no results
   Endostatin (Collagen fragment) #1: no results
   Endostatin (Collagen fragment) #2: no results
2. Jan 11, 2013 10:56 PM
   Protein A: (methyl-MLRPVETP)TRE(I/L)KK(I/L)DG(LAQ)HD
   Endostatin (Collagen fragment) #1: XFQPVLXLVAXXXPX
3. Jan 11, 2013 10:10 AM
   a-S1-Casein: MHHHHHHSSGLVPRGSGMKETAAAKFERQHMD
   Protein A: (MLRPVETP)TREIKKLDGLAQH
4. Dec 23, 2012 8:56 AM
   a-S1-Casein: MHHHHHHSSGLVPRGSGMKETAAAK
   Protein A: FLXPVETPTXEIKKL
   Endostatin (Collagen fragment) #1: DFQPVLHLVALNSPL
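The N-terminal calls listed for this question can be checked against each other position by position. The following is a minimal Python sketch, assuming that "X" marks an uncalled residue and is compatible with any amino acid, and that parentheses in a call can simply be stripped. The function name `conflicting_positions` is hypothetical and was not part of the study.

```python
def conflicting_positions(call_a: str, call_b: str, wildcard: str = "X") -> list[int]:
    """Return 1-based positions where two N-terminal calls disagree.

    Assumption: a wildcard residue (uncalled cycle) never counts as a conflict.
    Only the overlapping length of the two calls is compared.
    """
    conflicts = []
    for pos, (a, b) in enumerate(zip(call_a, call_b), start=1):
        if a == wildcard or b == wildcard:
            continue  # an uncalled residue is compatible with anything
        if a != b:
            conflicts.append(pos)
    return conflicts


# Endostatin #1 calls from two survey respondents.
endostatin_calls = ("XFQPVLXLVAXXXPX", "DFQPVLHLVALNSPL")

# Protein A calls from two respondents; parentheses are stripped
# (an assumption made only for this comparison).
protein_a_calls = (
    "(MLRPVETP)TREIKKLDGLAQH".replace("(", "").replace(")", ""),
    "FLXPVETPTXEIKKL",
)

print("Endostatin #1 conflicts:", conflicting_positions(*endostatin_calls))
print("Protein A conflicts:", conflicting_positions(*protein_a_calls))
```

Under these assumptions the two Endostatin #1 calls are fully compatible, while the two Protein A calls differ only at the first residue (M versus F).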
Successful Protein Separation
[Figure: separation methods shown are HPLC-ESI, HPLC-UV and HPLC-DAD; numeric labels in the figure: 5, 4, 8, 6, 1, 7, 23, 9]
Goal 2 = Accurate N-terminal sequence

Sample Description             Lab ID   Intact MW reported   Amino acid sequence
α-S1 Casein (25006.6 Da)       05S      --                   No sequence reported
                               08D      --                   MHHHHHHSSGLVPRGSGMKETAAAKFERQHMD
                               12P      53622                No sequence reported
                               16X-A    --                   No sequence reported
                               16X-B    50233                No sequence reported
Protein A (44612.0 Da)         05S      --                   No sequence reported
                               08D      --                   (methyl-MLRPVETP)TREIKKLDGLAQH
                               12P      44611                No sequence reported
                               16X-A    --                   (methyl-MLRPVETP)TRE(I/L)KK(I/L)DG(LAQ)HD
                               16X-B    44613                No sequence reported
Endostatin Seq. 1 (19445.9 Da) 05S      --                   No sequence reported
                               08D      19456.1              No sequence reported
                               12P      19433.4              No sequence reported
                               16X-A    --                   XFQPVLXLVAXXXPX
                               16X-B    19447                No sequence reported
Endostatin Seq. 2 (19963.4 Da) 05S      --                   No sequence reported
                               08D      20005.5              No sequence reported
                               12P      19951.0              No sequence reported
                               16X-A    --                   No sequence reported
                               16X-B    19464                No sequence reported
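The intact MW readings reported for Goal 2 are easier to compare when each is expressed as a deviation from the expected mass. The following is a minimal Python sketch using values from the Goal 2 data; the function name `mass_deviation` and the use of ppm are illustrative assumptions, not something the study itself reports.

```python
def mass_deviation(observed_da: float, expected_da: float) -> tuple[float, float]:
    """Return (difference in Da, difference in ppm) of observed vs. expected mass."""
    delta_da = observed_da - expected_da
    delta_ppm = delta_da / expected_da * 1e6
    return delta_da, delta_ppm


# Expected masses (Da) as given in the sample description column.
expected = {
    "a-S1 Casein": 25006.6,
    "Protein A": 44612.0,
    "Endostatin Seq. 1": 19445.9,
    "Endostatin Seq. 2": 19963.4,
}

# (lab ID, sample, reported intact MW in Da), copied from the Goal 2 data.
reported = [
    ("12P", "a-S1 Casein", 53622.0),
    ("16X-B", "a-S1 Casein", 50233.0),
    ("12P", "Protein A", 44611.0),
    ("16X-B", "Protein A", 44613.0),
    ("08D", "Endostatin Seq. 1", 19456.1),
    ("12P", "Endostatin Seq. 1", 19433.4),
    ("08D", "Endostatin Seq. 2", 20005.5),
    ("12P", "Endostatin Seq. 2", 19951.0),
]

for lab, sample, observed in reported:
    delta_da, delta_ppm = mass_deviation(observed, expected[sample])
    print(f"{lab:6s} {sample:18s} {delta_da:+10.1f} Da {delta_ppm:+12.0f} ppm")
```

Printed this way, the Protein A readings fall within about 1 Da of the expected mass, while the casein readings are roughly twice the expected value, in line with the conclusion that much higher MW readings were typically obtained for α-S1 Casein.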