Peptide Mass Fingerprinting for Protein identification


Peptide Mass Fingerprinting
Manimalha Balasubramani
Genomics and Proteomics Core Laboratories
Genomics and Proteomics Core Lab website: www.genetics.pitt.edu
GPCL Inventory
ABI Voyager DE PRO, user operated
ABI 4700 Proteomics Analyzer
Thermoelectron LCQ Deca with Surveyor HPLC
ABI Qstar Elite with Ultimate 3000 HPLC
Bruker micrOTOF with Ultimate 3000 HPLC
Bruker 12 Tesla FTMS with Ultimate 3000 HPLC
[Photos: 4700 Proteomics Analyzer (ABI); Voyager DE PRO (ABI); micrOTOF (Bruker); LCQ Deca XP (Thermo Fisher); Qstar Elite (ABI); 12T FT MS (Bruker)]
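Peptide mass fingerprinting (PMF) is a technique for protein and peptide identification: the measured masses of a protein's proteolytic peptides are compared with peptide masses predicted from a sequence database.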
Outline
• PMF Workflow:
– Sample preparation
– Mass spectra: MS, and MS/MS
– Database searches
• Examples, hands-on exercises
• Contaminants, post-translational modifications, enzyme digestions
• Evaluating PMF analysis
PMF: Sample preparation
[Figure: sample preparation workflow]
Protein → gel separation (1D or 2D) → excise spot → trypsin digest → peptides → peptide fingerprint
Mass spectra are acquired with:
MALDI TOF MS (Voyager DE PRO, ABI)
MALDI TOF/TOF MS (4700 Proteomics Analyzer, ABI)
MALDI – Matrix Assisted Laser Desorption Ionization
TOF – Time Of Flight
MS – Mass Spectrometry
Mass Spectrum: MS
[Figure: 4700 reflector-mode MS spectrum; % intensity versus mass-to-charge ratio (m/z) over m/z 699–3000; base peak at m/z 1479.9]
FWHM: full width at half maximum of a peak
Source: wiki
Resolution and mass accuracy
Δm measured at 50% of peak height is the full width at half maximum (FWHM).

$$R = \frac{M}{\Delta m}$$

where R = resolution, M = mass of the peak of interest, and Δm = width of the peak in daltons.
Ubiquitin ESI spectra on the 12T FT-ICR [figures]:
• Mass error > 0.56 ppm
• Mass error < 0.56 ppm
• Resolution > 175,000
Mass accuracy is expressed in parts per million (ppm):

$$\text{ppm} = \frac{10^6\,\Delta m}{M} = \frac{10^6}{R}$$
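The resolution relation R = M / Δm and the ppm relation can be checked numerically. A minimal Python sketch, assuming masses in daltons; the 0.1 Da FWHM is a hypothetical value, and the ppm-error example uses the Mr(expt) and Mr(calc) values reported for VLTPELYAELR in the Mascot peptide table later in this document. When ppm is used to report the error of a match, Δm is the difference between observed and calculated mass rather than the peak width.

```python
def resolution(mass: float, fwhm: float) -> float:
    """R = M / delta_m, where delta_m is the peak width (Da) at half maximum."""
    return mass / fwhm


def ppm_from_resolution(r: float) -> float:
    """ppm = 10^6 / R, the relation given on the slide."""
    return 1e6 / r


def ppm_error(observed: float, calculated: float) -> float:
    """Signed mass error in ppm: 10^6 * (observed - calculated) / calculated."""
    return 1e6 * (observed - calculated) / calculated


if __name__ == "__main__":
    # Hypothetical FWHM of 0.1 Da for the base peak at m/z 1479.88
    r = resolution(1479.88, 0.1)
    print(f"R = {r:.0f}, i.e. {ppm_from_resolution(r):.1f} ppm")
    # Mr(expt) vs Mr(calc) for VLTPELYAELR from the Mascot peptide table
    print(f"{ppm_error(1302.70, 1302.72):.1f} ppm")
```

The VLTPELYAELR match is off by about −15 ppm, which sits inside the 25 ppm and 100 ppm tolerances used in the hands-on exercise.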
Peptide Mass Fingerprint
[Figure: 4700 reflector-mode MS spectrum used as the peptide mass fingerprint; % intensity versus m/z over m/z 699–3000; base peak at m/z 1479.9]
Mass spectrum processing, calibration
• External calibration
• Internal calibration
– trypsin autodigestion peaks
– Keratin peaks
– Spiking with an internal standard
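Internal calibration corrects the m/z scale using peaks of known mass that are already present in the spectrum. A minimal sketch of a linear recalibration, assuming a straight-line correction in m/z is adequate over the mass range (instrument software may use a different calibration function) and using commonly cited [M+H]+ values for two porcine trypsin autolysis peptides as the internal standards; verify these against your own enzyme. The measured positions in the usage example are hypothetical.

```python
# Commonly cited [M+H]+ values for porcine trypsin autolysis peptides
# VATVSLPR and LGEHNIDVLEGNEQFINAAK (assumed reference masses)
TRYPSIN_REFERENCE = [842.509, 2211.104]


def fit_linear_calibration(observed, reference):
    """Least-squares fit of reference = slope * observed + intercept.

    Needs at least two distinct observed values.
    """
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, reference))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def recalibrate(peaks, slope, intercept):
    """Apply the fitted correction to every m/z in a peak list."""
    return [slope * mz + intercept for mz in peaks]


if __name__ == "__main__":
    # Hypothetical measured positions of the two autolysis peaks
    measured = [842.55, 2211.20]
    slope, intercept = fit_linear_calibration(measured, TRYPSIN_REFERENCE)
    print(recalibrate([842.55, 1479.93, 2211.20], slope, intercept))
```

With only two standards the line passes exactly through them; more standards (for example keratin peaks or a spiked peptide) turn this into a true least-squares fit.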
Peak List
• Spectrum viewer
• Compiled from the mass spectra
– Mass list
– Mass list and intensity
• Peak list is submitted for database searching
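A peak list is compiled by locating the peaks in the spectrum. A minimal sketch, assuming the spectrum is available as parallel m/z and intensity arrays and that a peak is a local maximum at or above a hypothetical 5% of the base-peak intensity; real peak pickers also centroid, deisotope and estimate noise. It returns the "mass list and intensity" form; keeping only the first element of each pair gives the plain mass list.

```python
def pick_peaks(mz, intensity, min_rel_intensity=0.05):
    """Return (m/z, intensity) pairs for local maxima above a relative threshold."""
    base = max(intensity)                      # base-peak intensity
    peaks = []
    for i in range(1, len(mz) - 1):
        y = intensity[i]
        is_local_max = y > intensity[i - 1] and y >= intensity[i + 1]
        if is_local_max and y >= min_rel_intensity * base:
            peaks.append((mz[i], y))
    return peaks


if __name__ == "__main__":
    # Hypothetical profile data with two peaks
    mz = [1000.0, 1000.1, 1000.2, 1000.3, 1000.4, 1000.5, 1000.6]
    intensity = [0, 5, 100, 6, 2, 30, 1]
    peak_list = pick_peaks(mz, intensity)
    print(peak_list)                           # mass list and intensity
    print([m for m, _ in peak_list])           # mass list only
```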
Database searching
[Figure: PMF database-search workflow]
Experiment: protein (gel separation, 1D or 2D) → excise spot → trypsin digest → peptides → mass spectrum (MS) → peak list: 820.7, 842.5, 1012.6, 1296.6, 1555.7, ……...
Database (e.g. protein databases: non-redundant NCBI, Swiss-Prot, IPI, etc.) → in silico digest → peak list: 820.7, 842.5, 1012.6, 1296.6, 1555.7, ……...
An algorithm compares the peak lists and reports the protein identification.
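The in silico digest step can be sketched as follows. This is a minimal illustration, not Mascot's implementation, assuming trypsin cleaves C-terminal to K or R except before P, unmodified residues with standard monoisotopic masses, and a hypothetical m/z window standing in for the acquired MS range. The masses it gives for DLFDPIIEDR (1231.61) and VLTPELYAELR (1302.72) agree with the Mr(calc) column of the Mascot peptide table in this document, and HGGYKPSDEHK stays intact because its K is followed by P.

```python
# Standard monoisotopic residue masses (Da)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_peptides(seq, missed=1):
    """Cleave after K/R unless followed by P; allow up to `missed` missed cleavages."""
    sites = [i + 1 for i in range(len(seq) - 1)
             if seq[i] in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j - i - 1 internal cleavage sites are skipped (missed cleavages)
        for j in range(i + 1, min(i + missed + 2, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


def peptide_mass(peptide):
    """Unmodified monoisotopic neutral mass (Mr)."""
    return sum(MONO[aa] for aa in peptide) + WATER


def in_silico_digest(seq, missed=1, mz_range=(700.0, 4000.0)):
    """Theoretical [M+H]+ peak list within a hypothetical acquisition window."""
    low, high = mz_range
    peaks = []
    for pep in tryptic_peptides(seq, missed):
        mh = peptide_mass(pep) + PROTON
        if low <= mh <= high:
            peaks.append((round(mh, 2), pep))
    return sorted(peaks)


if __name__ == "__main__":
    # Residues 87-107 of the creatine kinase B sequence shown later
    for mh, pep in in_silico_digest("DLFDPIIEDRHGGYKPSDEHK", missed=1):
        print(mh, pep)
```

The mass window reflects the "peptide lengths vs. MS range" point: very short or very long peptides fall outside the acquired range and can never be matched.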
Description of database searching using the Mascot program
- At GPCL, 4800 Proteomics Analyzer data is presented to the Mascot web server through ProteinPilot
- Mascot can be accessed through the web: http://www.matrixscience.com
Mascot scoring
A frequency factor matrix, F, is created, in which each row represents an interval of 100 Da in peptide mass, and each column an interval of 10 kDa in intact protein mass. As each sequence entry is processed, the appropriate matrix elements $f_{i,j}$ are incremented so as to accumulate statistics on the size distribution of peptide masses as a function of protein mass. The elements of F are then normalised by dividing the elements of each 10 kDa column by the largest value in that column to give the Mowse factor matrix M:

$$m_{i,j} = \frac{f_{i,j}}{\max_{k} f_{k,j}}$$

After searching the experimental mass values against a calculated peptide mass database, the score for each entry is calculated according to:

$$\text{Score} = \frac{50\,000}{M_{\text{Prot}} \times \prod m_{i,j}}$$

where $M_{\text{Prot}}$ is the molecular weight of the entry and the product term is calculated from the Mowse factor elements for each match between the experimental data and peptide masses calculated from the entry.
Source: http://www.matrixscience.com/
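The original Mowse scheme quoted above can be sketched on a toy database. A minimal Python illustration, assuming each database entry is given as its protein mass plus the peptide masses from its in silico digest, and using the published Mowse score 50,000 / (M_Prot × ∏ m_ij); all masses in the example are hypothetical, and the score Mascot reports is the probability-based variant described under the search results.

```python
from collections import defaultdict

PEPTIDE_BIN = 100.0     # row width: 100 Da in peptide mass
PROTEIN_BIN = 10000.0   # column width: 10 kDa in intact protein mass


def cell(peptide_mass, protein_mass):
    """Row i (peptide-mass bin) and column j (protein-mass bin)."""
    return int(peptide_mass // PEPTIDE_BIN), int(protein_mass // PROTEIN_BIN)


def build_mowse_matrix(entries):
    """entries: iterable of (protein_mass, [peptide masses from its digest])."""
    f = defaultdict(float)                       # frequency factor matrix F
    for prot_mass, pep_masses in entries:
        for pm in pep_masses:
            f[cell(pm, prot_mass)] += 1
    col_max = defaultdict(float)                 # largest f_ij in each column j
    for (i, j), value in f.items():
        col_max[j] = max(col_max[j], value)
    return {(i, j): value / col_max[j] for (i, j), value in f.items()}


def mowse_score(m, prot_mass, matched_peptide_masses):
    """Score = 50000 / (M_prot * product of m_ij over matched peptides)."""
    product = 1.0
    for pm in matched_peptide_masses:
        # The cell exists because the peptide came from this entry's digest
        product *= m[cell(pm, prot_mass)]
    return 50000.0 / (prot_mass * product)


if __name__ == "__main__":
    # Toy three-entry database with hypothetical masses
    db = [(42000.0, [1200.0, 1300.0, 1350.0]),
          (45000.0, [1210.0, 2500.0]),
          (43000.0, [1220.0, 1310.0])]
    m = build_mowse_matrix(db)
    print(mowse_score(m, 45000.0, [1210.0, 2500.0]))
```

A match to a rare peptide mass (here the lone 2500 Da peptide) has a small m_ij and therefore raises the score more than a match to a common mass.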
PMF search page [screenshot]
Parameters used in database searching
• Database searched
• Taxonomy
• Enzyme
• Missed cleavages
• Fixed versus variable modifications (PTMs)
• MW and pI
• Mass tolerance
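The mass tolerance decides which observed peaks count as matches. A minimal sketch, assuming observed values are singly protonated [M+H]+ ions and theoretical values are neutral masses (Mr); the example uses the Observed and Mr(calc) values for DLFDPIIEDR and VLTPELYAELR from the Mascot peptide table in this document, plus 842.51 as a peak that matches neither. At 25 ppm both peptides match; at 10 ppm VLTPELYAELR (about −21 ppm) drops out.

```python
PROTON = 1.007276


def match_peaks(observed_mh, theoretical_mr, tol_ppm):
    """Match observed [M+H]+ values to theoretical neutral masses within tol_ppm."""
    matches = []
    for mh in observed_mh:
        mr_expt = mh - PROTON                  # Observed -> Mr(expt)
        for name, mr_calc in theoretical_mr.items():
            err = 1e6 * (mr_expt - mr_calc) / mr_calc
            if abs(err) <= tol_ppm:
                matches.append((mh, name, round(err, 1)))
    return matches


if __name__ == "__main__":
    theoretical = {"DLFDPIIEDR": 1231.61, "VLTPELYAELR": 1302.72}
    observed = [842.51, 1232.62, 1303.70]
    for tol in (25.0, 10.0):
        print(tol, match_peaks(observed, theoretical, tol))
```

Widening the tolerance recovers matches from a poorly calibrated spectrum but also lets more random peptide masses match, which is why the exercise uses 100 ppm for one file and 25 ppm for the other.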
Oxidation of methionine in proteins and peptides: +16 Da (one oxygen) or +32 Da (two oxygens)
(From Ionsource.com)
S-carboxymethylation of cysteine with the alkylating agent iodoacetic acid (+58 Da), or S-carbamidomethylation with iodoacetamide (+57 Da)
(From Ionsource.com)
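A fixed modification is applied to every occurrence of its residue, whereas a variable modification may or may not be present, so each possibility has to be tested. A minimal sketch, assuming the base mass is the unmodified Mr and that only the +16 Da form of methionine oxidation is considered (the +32 Da form is not modelled); the mass shifts are standard monoisotopic values. In the Mascot peptide table in this document, LGFSEVELVQMVVDGVK is matched at Mr(calc) 1847.97 and 1863.97, a difference consistent with one oxidized methionine.

```python
OXIDATION = 15.994915        # Met oxidation, +16 Da nominal
CARBAMIDOMETHYL = 57.021464  # Cys + iodoacetamide, +57 Da nominal
CARBOXYMETHYL = 58.005479    # Cys + iodoacetic acid, +58 Da nominal


def candidate_masses(base_mr, sequence, fixed_cys=None, max_oxidized=None):
    """Fixed Cys modification on every C; variable oxidation on 0..n Met."""
    mr = base_mr
    if fixed_cys is not None:
        mr += fixed_cys * sequence.count("C")
    n_met = sequence.count("M")
    if max_oxidized is not None:
        n_met = min(n_met, max_oxidized)
    return [mr + k * OXIDATION for k in range(n_met + 1)]


if __name__ == "__main__":
    print(candidate_masses(1847.97, "LGFSEVELVQMVVDGVK"))
    print(candidate_masses(1457.67, "GFCLPPHCSRGER", fixed_cys=CARBAMIDOMETHYL))
```

Each variable modification multiplies the number of candidate masses, which is why declaring many variable modifications increases the chance of random matches.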
Databases:
• NCBI nr (nr.*tar.gz): non-redundant protein sequence database with entries from GenPept, Swissprot, PIR, PDF, PDB, and NCBI RefSeq
• Swiss-Prot, IPI, others
Submit a peak list to Mascot
1075.513062
1086.581177
1090.547241
1092.517822
1100.630249
1103.572754
1106.553223
1107.529663
1118.498779
1119.519531
1121.509644
1129.604492
1141.572388
1156.586792
1166.537231
1170.607422
1172.612183
1179.590332
1194.604126
1217.567749
1232.610474
1252.583740
1308.654297
1312.705811
1314.744385
1337.672485
1401.651245
1424.745728
1427.830566
1435.718872
1475.762695
1479.710327
1493.734131
1502.774780
1530.834717
1575.850952
1607.807007
1629.868408
1639.935425
1752.863892
1753.904663
1754.915161
1791.744507
1792.805054
1794.820801
1816.801392
1875.976196
1902.006104
1940.941650
1960.053345
1962.928955
2211.118652
2225.130371
2233.105225
2249.076660
http://matrixscience.com/cgi/search_form.pl?FORMVER=2&SEARCH=PMF
Mascot PMF report
Hands-on exercise
• Go to the Desktop
– Open the .txt file
• Copy and paste the peak list into the Mascot search page
– Specify search parameters
» Allow 100 ppm error for PMFal_100.txt
» Allow 25 ppm error for PMFgd_25.txt
Not all peaks are matched: why?
[Figure: matched peptides]
• Theoretical peptide list
– peptides lengths vs. MS range
– Enzyme – missed/non-specific cleavage
– Incorrect ORF
– Amino acid substitutions
– Ion suppression/efficiency
Not all peaks are matched: why?
• Experimental peptide list
– Contaminants
• Trypsin autolysis peptides
• Hair, skin keratins
• Matrix molecules, clusters
• Unknown contaminants
– Modifications
• PTMs: known and unknown, of biological origin
• Oxidized methionines: gel-induced artifacts
• Chemical: cysteine carbamidomethylation, introduced during sample handling
• Adducts
• Amino acid substitutions
• Splice variants
The database search can take contaminants and modifications into account.
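Known contaminant masses can also be removed from a peak list before searching. A minimal sketch, assuming an exclusion list of commonly cited porcine trypsin autolysis [M+H]+ values and a hypothetical 50 ppm window; keratin or matrix-cluster masses would be added to the list the same way. The peak list submitted to Mascot earlier contains 2211.118652, within about 7 ppm of the 2211.104 autolysis value.

```python
# Commonly cited porcine trypsin autolysis [M+H]+ values (assumed; verify
# against your enzyme). Keratin and matrix-cluster masses could be appended.
EXCLUSION_LIST = [842.509, 2211.104]


def remove_contaminants(peaks, exclusions=EXCLUSION_LIST, tol_ppm=50.0):
    """Keep only peaks farther than tol_ppm from every exclusion mass."""
    kept = []
    for mz in peaks:
        if all(abs(mz - ex) / ex * 1e6 > tol_ppm for ex in exclusions):
            kept.append(mz)
    return kept


if __name__ == "__main__":
    print(remove_contaminants([842.51, 1232.62, 2211.118652]))
```

Removing these peaks before the search keeps them from being matched to unrelated database peptides; alternatively, they can be kept and used for internal calibration.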
Evaluating PMF analysis
• Acceptable hit
– High score
– Major peaks accounted for
• No hit
– Insufficient data – low intensity MS
– Single gel band contains >2-3 proteins
– Protein not represented in database – ORF/genome
• Further analysis
– MS/MS confirmation of a few major peaks and of unaccounted peaks: ideal
– Low score, good spectrum: LC MS/MS
– Low score, low-intensity spectrum: concentrate sample, reacquire
– High score, some unaccounted peaks: MS/MS
MS/MS
• Plot of m/z versus intensity
• At GPCL:
– MALDI TOF/TOF MS
– ESI QqTOF MS
– ESI IT MS
– MALDI/ESI FT ICR MS
Tandem MS
4700 Proteomics Analyzer, Applied Biosystems
MS, followed by precursor ion selection
[Figure: 4700 reflector-mode MS spectrum; % intensity versus m/z over m/z 800–2700; base peak at m/z 1570.7]
Tandem MS: fragment ion spectrum
[Figure: 4700 MS/MS spectrum of precursor m/z 1570.7; % intensity versus m/z over m/z 69–1658; base peak at m/z 175.1]
[Figure: tandem mass spectrum; source: http://qbab.aber.ac.uk]
Tandem mass spectra (MS/MS) can be used for peptide sequencing
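Peptide sequencing from MS/MS relies on ladders of fragment ions along the peptide backbone. A minimal sketch, assuming singly charged b and y ions and unmodified residues with standard monoisotopic masses; VLTPELYAELR (from the Mascot table in this document) is used only as an example, not as the identity of the precursor shown above. For tryptic peptides ending in R, y1 is at m/z 175.12, which is consistent with the intense 175.1 peak in the fragment spectrum.

```python
# Standard monoisotopic residue masses (Da)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Singly charged b ions (b1..b(n-1)) and y ions (y1..y(n-1))."""
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    total = 0.0
    for m in masses[:-1]:                  # N-terminal fragments
        total += m
        b_ions.append(total + PROTON)
    total = 0.0
    for m in reversed(masses[1:]):         # C-terminal fragments keep the water
        total += m
        y_ions.append(total + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("VLTPELYAELR")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} {bi:9.4f}   y{i} {yi:9.4f}")
```

Mass differences between consecutive b (or y) ions equal residue masses, which is what lets a sequence tag or a de novo sequence be read from the spectrum; I and L have identical masses and cannot be told apart this way.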
Database searching
• Peptide mass fingerprinting
• Sequence tag approach
De novo sequencing
• Inspect raw data
Source: http://qbab.aber.ac.uk
Mascot Search Results
Search title: SampleSetID: 362, AnalysisID: 567, MaldiWellID: 15790, SpectrumID: 17225, Path=\Mani\102004\New Analysis 1
Database: NCBInr 20040606 (1846720 sequences; 611532004 residues)
Timestamp: 20 Oct 2004 at 14:52:50 GMT
Top Score: 681 for gi|180570, creatine kinase [Homo sapiens]
Probability Based Mowse Score
Score is -10*Log(P), where P is the probability that the observed match is a random event. Protein scores greater than 75 are significant (p<0.05).
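The score-to-probability conversion can be written out directly. A minimal sketch; the threshold function assumes the significance level p = 0.05 is divided by the number of sequences compared (a Bonferroni-style correction), which is an assumption here because the count Mascot actually uses depends on taxonomy and other search filters. With all 1846720 NCBInr sequences it gives about 75.7, close to the reported threshold of 75, and the top score of 681 corresponds to P = 10^-68.1.

```python
import math


def score_from_probability(p):
    """Probability-based Mowse score: -10 * log10(P)."""
    return -10.0 * math.log10(p)


def probability_from_score(score):
    """Inverse conversion: P = 10^(-score / 10)."""
    return 10.0 ** (-score / 10.0)


def significance_threshold(n_sequences, alpha=0.05):
    """Score above which a random match has probability < alpha / n_sequences."""
    return score_from_probability(alpha / n_sequences)


if __name__ == "__main__":
    print(f"P for score 681: {probability_from_score(681):.1e}")
    print(f"Threshold for 1846720 sequences: {significance_threshold(1846720):.1f}")
```

Because the threshold grows with the logarithm of the number of sequences searched, restricting the taxonomy lowers the score needed for significance.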
Top hits from the Mascot search: there are multiple accession numbers for the same protein

 #  Accession    Mass (Da)  Score  Description
 1  gi|180570      42591     681   creatine kinase [Homo sapiens]
 2  gi|21536286    42617     681   brain creatine kinase; creatine kinase-B [Homo sapiens]
 3  gi|33304149    42730     681   creatine kinase, brain [synthetic construct]
 4  gi|125292      42674     568   CREATINE KINASE, B CHAIN (B-CK) [Canis familiaris]
 5  gi|180572      42658     538   creatine kinase-B
 6  gi|125295      42636     514   CREATINE KINASE, B CHAIN (B-CK)
 7  gi|180555      42460     507   creatine kinase-B
 8  gi|203476      40598     473   creatine kinase-B
 9  gi|31542401    42685     471   creatine kinase, brain [Rattus norvegicus]
10  gi|203474      42699     471   creatine kinase
11  gi|40807002    44540     469   Unknown (protein for IMAGE:5598839) [Rattus norvegicus]
12  gi|47477783    44782     469   Ckb protein [Rattus norvegicus]
13  gi|13096153    42551     441   Chain A, Crystal Structure Of Bovine Retinal Creatine Kinase
14  gi|12852054    42700     427   unnamed protein product [Mus musculus]
15  gi|10946574    42686     427   creatine kinase, brain [Mus musculus]
16  gi|47213348    42953     237   unnamed protein product [Tetraodon nigroviridis]
17  gi|627264      40353     236   creatine kinase (EC 2.7.3.2) isozyme IV - African clawed frog
18  gi|27503418    42214     235   Ckb-prov protein [Xenopus laevis]
19  gi|45384340    42844     209   B-creatine kinase [Gallus gallus]
20  gi|6573489     42713     201   Chain A, Crystal Structure Of Chicken Brain-Type Creatine Kinase
Search returns a cluster of proteins with the same matching peptides
1. gi|180570   Mass: 42591   Score: 681   creatine kinase [Homo sapiens]

Observed  Mr(expt)  Mr(calc)  Delta  Start-End  Miss  Ions  Peptide
1232.62   1231.61   1231.61    0.00  87-96      0     45    DLFDPIIEDR
1232.62   1231.61   1231.61    0.00  87-96      0     -     DLFDPIIEDR
1254.57   1253.56   1253.58   -0.02  97-107     0     -     HGGYKPSDEHK
1303.70   1302.70   1302.72   -0.02  33-43      0     -     VLTPELYAELR
1303.70   1302.70   1302.72   -0.02  33-43      0     54    VLTPELYAELR
1458.70   1457.69   1457.67    0.02  139-151    1     -     GFCLPPHCSRGER
1586.81   1585.80   1585.83   -0.03  157-172    0     81    LAVEALSSLDGDLAGR
1586.81   1585.80   1585.83   -0.03  157-172    0     -     LAVEALSSLDGDLAGR
1656.79   1655.79   1655.82   -0.03  367-381    0     -     LEQGQAIDDLMPAQK
1657.80   1656.79   1656.83   -0.04  224-236    0     47    TFLVWVNEEDHLR
1657.80   1656.79   1656.83   -0.04  224-236    0     -     TFLVWVNEEDHLR
1848.94   1847.93   1847.97   -0.04  342-358    0     -     LGFSEVELVQMVVDGVK
1864.93   1863.92   1863.97   -0.04  342-358    0     -     LGFSEVELVQMVVDGVK
1964.88   1963.88   1963.92   -0.05  321-341    0     -     GTGGVDTAAVGGVFDVSNADR
1964.88   1963.88   1963.92   -0.05  321-341    0     139   GTGGVDTAAVGGVFDVSNADR
2120.98   2119.97   2120.02   -0.05  320-341    1     -     RGTGGVDTAAVGGVFDVSNADR
2120.98   2119.97   2120.02   -0.05  320-341    1     27    RGTGGVDTAAVGGVFDVSNADR
2169.91   2168.91   2168.96   -0.05  14-32      0     -     FPAEDEFPDLSAHNNHMAK
2225.06   2224.05   2224.17   -0.12  157-177    1     -     LAVEALSSLDGDLAGRYYALK
2439.08   2438.07   2438.14   -0.07  12-32      1     31    LRFPAEDEFPDLSAHNNHMAK
2439.08   2438.07   2438.14   -0.07  12-32      1     -     LRFPAEDEFPDLSAHNNHMAK
2518.10   2517.09   2517.16   -0.07  108-130    0     92    TDLNPDNLQGGDDLDPNYVLSSR
2518.10   2517.09   2517.16   -0.07  108-130    0     -     TDLNPDNLQGGDDLDPNYVLSSR
3753.61   3752.60   3752.73   -0.13  97-130     1     -     HGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSR
3753.61   3752.60   3752.73   -0.13  97-130     1     55    HGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSR

4. gi|125292   Mass: 42674   Score: 568   CREATINE KINASE, B CHAIN (B-CK)

Observed  Mr(expt)  Mr(calc)  Delta  Start-End  Miss  Ions  Peptide
1254.57   1253.56   1253.58   -0.02  97-107     0     -     HGGYKPSDEHK
1303.70   1302.70   1302.72   -0.02  33-43      0     -     VLTPELYAELR
1303.70   1302.70   1302.72   -0.02  33-43      0     54    VLTPELYAELR
1458.70   1457.69   1457.67    0.02  139-151    1     -     GFCLPPHCSRGER
1586.81   1585.80   1585.83   -0.03  157-172    0     81    LAVEALSSLDGDLAGR
1586.81   1585.80   1585.83   -0.03  157-172    0     -     LAVEALSSLDGDLAGR
1624.76   1623.75   1623.85   -0.10  367-381    0     -     LEQGQAIDDLVPAQK
1848.94   1847.93   1847.97   -0.04  342-358    0     -     LGFSEVELVQMVVDGVK
1864.93   1863.92   1863.97   -0.04  342-358    0     -     LGFSEVELVQMVVDGVK
1964.88   1963.88   1963.92   -0.05  321-341    0     -     GTGGVDTAAVGGVFDVSNADR
1964.88   1963.88   1963.92   -0.05  321-341    0     139   GTGGVDTAAVGGVFDVSNADR
2120.98   2119.97   2120.02   -0.05  320-341    1     -     RGTGGVDTAAVGGVFDVSNADR
2120.98   2119.97   2120.02   -0.05  320-341    1     27    RGTGGVDTAAVGGVFDVSNADR
2169.91   2168.91   2168.96   -0.05  14-32      0     -     FPAEDEFPDLSAHNNHMAK
2225.06   2224.05   2224.17   -0.12  157-177    1     -     LAVEALSSLDGDLAGRYYALK
2439.08   2438.07   2438.14   -0.07  12-32      1     31    LRFPAEDEFPDLSAHNNHMAK
2439.08   2438.07   2438.14   -0.07  12-32      1     -     LRFPAEDEFPDLSAHNNHMAK
2518.10   2517.09   2517.16   -0.07  108-130    0     92    TDLNPDNLQGGDDLDPNYVLSSR
2518.10   2517.09   2517.16   -0.07  108-130    0     -     TDLNPDNLQGGDDLDPNYVLSSR
3753.61   3752.60   3752.73   -0.13  97-130     1     -     HGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSR
3753.61   3752.60   3752.73   -0.13  97-130     1     55    HGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSR
Creatine kinase B is the highest-scoring protein
Match to: gi|21536286; Score: 681
Creatine kinase-B [Homo sapiens]
Nominal mass (Mr): 42591; Calculated pI value: 5.34
Observed mass & pI: 43 kDa, 6.2–6.27
Sequence coverage: 46%
1 MPFSNSHNAL KLRFPAEDEF PDLSAHNNHM AKVLTPELYA ELRAKSTPSG
51 FTLDDVIQTG VDNPGHPYIM TVGCVAGDEE SYEVFKDLFD PIIEDRHGGY
101 KPSDEHKTDL NPDNLQGGDD LDPNYVLSSR VRTGRSIRGF CLPPHCSRGE
151 RRAIEKLAVE ALSSLDGDLA GRYYALKSMT EAEQQQLIDD HFLFDKPVSP
201 LLSASGMARD WPDARGIWHN DNKTFLVWVN EEDHLRVISM QKGGNMKEVF
251 TRFCTGLTQI ETLFKSKDYE FMWNPHLGYI LTCPSNLGTG LRAGVHIKLP
301 NLGKHEKFSE VLKRLRLQKR GTGGVDTAAV GGVFDVSNAD RLGFSEVELV
351 QMVVDGVKLL IEMEQRLEQG QAIDDLMPAQ K
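The 46% coverage can be reproduced from the Start-End ranges in the peptide table for hit 1 (gi|180570). A minimal sketch, assuming residue positions are 1-based and inclusive and the protein is the 381-residue sequence shown above; overlapping peptides are counted once.

```python
# (start, end) residue ranges, 1-based and inclusive, from the hit 1
# peptide table; duplicate rows are listed once
HIT1_RANGES = [
    (87, 96), (97, 107), (33, 43), (139, 151), (157, 172), (367, 381),
    (224, 236), (342, 358), (321, 341), (320, 341), (14, 32), (157, 177),
    (12, 32), (108, 130), (97, 130),
]


def sequence_coverage(ranges, length):
    """Fraction of residues covered by at least one matched peptide."""
    covered = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    return len(covered) / length


if __name__ == "__main__":
    print(f"Sequence coverage: {sequence_coverage(HIT1_RANGES, 381):.0%}")
    # prints: Sequence coverage: 46%
```

The matched peptides cover 177 of 381 residues; the uncovered stretches are where further MS/MS or a different enzyme could add evidence.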
GPCL resources for bioinformatic analysis
• Mascot version 2.1.0, Matrix Science Ltd
– Mascot Daemon
• ProteinPilot software 2.0, Applied Biosystems/MDS Sciex
– Paragon algorithm
– And Mascot algorithm
• Sequest, Thermoelectron
Resources (selected list)
http://www.hsls.pitt.edu/guides/genetics/obrc/proteomics
…it's high-throughput…
1st Dimension - Isoelectric focussing
2nd Dimension – SDS PAGE
Spot picking
Trypsin gel digest
Sample separation…
• In-solution isoelectric focussing
• HPLC
• 1D or 2D LC MALDI
GPCL services..
• Fee for service model
• Support investigators
– Scientific expertise
– Technical expertise
– Grant submission
Genomics and Proteomics Core Laboratories
Paul Wood
Director
Janette Lamb
Assistant Director
Proteomics Lab
Chris Bolcato
John Cardamone
Emanuel M Schreiber
Guy Ueichi
James Porter
Robert Wolfe
Jason Sun
Billy W. Day
Scientific Director
A mass spectrum
• Plot of m/z versus intensity
• MALDI TOF (/TOF) MS
• ESI TOF MS
• ESI QqTOF MS
• ESI IT MS
• MALDI/ESI FT ICR MS
Mass analyzers: several designs
(Aebersold and Mann, Nature 422, p. 198, 2003, review)
QqTOF MS/MS
Each search engine scores differently
Each search engine identifies about the same number of spectra, but the overlap is surprisingly small: different search engines match different spectra.
[Figure: Venn diagram of spectra identified by SEQUEST, X!Tandem and Mascot; region sizes 9%, 22%, 4%, 34%, 19%, 7%, 5%]
Courtesy: Proteome Software Inc.
James Lyons-Weiler
Scientific Director
Bioinformatics Analysis Core
(412) 393-2087 (office)
(412) 728-8743 (cell)
Fax: 412-648-1891