Transcript www.i2bio.org
Introduction to Protein Chemistry
Gustavo de Souza IMM, OUS October 2013
Relevance of the Proteome
«The recipe of life»
Chocolate cake:
- Egg
- Flour
- Sugar
- Baker’s yeast
- Chocolate
Biological relevance lies in how genes are expressed and translated into proteins, not in whether genes are present.
Amino acid structure
AA side chains
Protein Translation
Peptide Bond
Primary Structure
>sp|F2Z333|CA233_HUMAN Fibronectin type-III domain-containing transmembrane protein C1orf233
MRAPPLLLLLAACAPPPCAAAAPTPPGWEPTPDAPWCPYKVLPEGPEAGGGRLCFRSPAR
GFRCQAPGCVLHAPAGRSLRASVLRNRSVLLQWRLAPAAARRVRAFALNCSWRGAYTRFP
CERVLLGASCRDYLLPDVHDSVLYRLCLQPLPLRAGPAAAAPETPEPAECVEFTAEPAGM
QDIVVAMTAVGGSICVMLVVICLLVAYITENLMRPALARPGLRRHP
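The record above is in FASTA format: a header line starting with `>` followed by the sequence, usually wrapped over several lines. A minimal Python sketch of reading such a record, assuming the UniProt-style `db|accession|entry_name` header seen on the slide. The function name `parse_fasta_record` is illustrative, and only the first two sequence lines are included for brevity.

```python
def parse_fasta_record(text):
    """Split one FASTA record into (accession, entry_name, sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header, seq_lines = lines[0], lines[1:]
    if not header.startswith(">"):
        raise ValueError("FASTA header must start with '>'")
    # UniProt-style header: >db|accession|entry_name description
    db, accession, rest = header[1:].split("|", 2)
    entry_name = rest.split()[0]
    # The sequence is wrapped over several lines; join them back together.
    sequence = "".join(seq_lines)
    return accession, entry_name, sequence


record = """>sp|F2Z333|CA233_HUMAN Fibronectin type-III domain-containing transmembrane protein C1orf233
MRAPPLLLLLAACAPPPCAAAAPTPPGWEPTPDAPWCPYKVLPEGPEAGGGRLCFRSPAR
GFRCQAPGCVLHAPAGRSLRASVLRNRSVLLQWRLAPAAARRVRAFALNCSWRGAYTRFP
"""
accession, entry_name, sequence = parse_fasta_record(record)
print(accession, entry_name, len(sequence))
```

The primary structure is then just a string, which is what database search tools work on.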
Folding
Primary Structure - Folding
>sp|F2Z333|CA233_HUMAN Fibronectin type-III domain-containing transmembrane protein C1orf233
MR[APPLLLLLAACAPPPCAAAAPTPPGW]EPTPDAPWCPYKVLPEGPEAGGGRLCFRSPAR
GFRCQAPGCVLHAPAGRSLRASVLRNRSVLLQWRLAPAAARRVRAFALNCSWRGAYTRFP
CERVLLGASCRDYLLPDVHDSVLYRLCLQPLPLRAGPAAAAPETPEPAECVEFTAEPAGM
QD[IVVAMTAVGGSICVMLVVICLLVAYITENLM]RPALARPGLRRHP
(Brackets mark the two segments that were set apart on the slide.)
Folding Proteins can adopt only a limited number of different protein folds
Secondary Structure
Tertiary Structure
Quaternary Structure
Primary to Quaternary
What is a «protein sample» in proteomics?
RNA-binding protein modules
Take home message
1. Proteins are the functionally active molecules in a cell.
2. They possess a high degree of chemical and structural heterogeneity.
3. This heterogeneity affects how a protein sample can be analyzed.
Challenges in Protein and Proteomic Analysis
Gustavo de Souza IMM, OUS October 2013
A dangerous idea… One gene, one protein
Homo sapiens
Complexity of Protein Samples in Eukaryotes
A less dangerous idea One gene, some proteins (let’s say average 5 per gene)
Homo sapiens
Complexity of Protein Samples in Eukaryotes PTMs (modifications that control conformation changes in histones)
An even less dangerous idea One protein, 8 possible modification sites
Homo sapiens
An even less dangerous idea
But in reality… One specific cell does NOT express all genes at once!
- Several transcriptomics studies indicated that the cells under study have ~14,000 transcripts at a certain time.
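The back-of-the-envelope counting on these slides can be written out explicitly. The sketch below uses the figures quoted on the slides (~14,000 transcripts, an average of 5 protein forms per gene, 8 modification sites per protein). Treating every transcript as one expressed gene, and every site as independently modified or unmodified, are added assumptions made only to illustrate how fast the number of distinct protein species grows.

```python
expressed_genes = 14_000   # ~14,000 transcripts in the cells under study (slides)
forms_per_gene = 5         # "let's say average 5 per gene" (slides)
modification_sites = 8     # "8 possible modification sites" (slides)

# Assumption for illustration: each site is either modified or not,
# independently of the others, giving 2**8 modification states.
states_per_protein = 2 ** modification_sites

protein_forms = expressed_genes * forms_per_gene
protein_species = protein_forms * states_per_protein

print(f"protein forms without PTMs: {protein_forms:,}")
print(f"modification states per protein: {states_per_protein}")
print(f"protein species with PTMs: {protein_species:,}")
```

Not every combination will exist in a real cell, so this is an upper bound under the stated assumptions rather than an estimate.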
Homo sapiens
Proteome Dynamics (panels A, B, C)
The genome is a relatively static element of an organism; the proteome changes according to cell type, developmental stage, response to stress, etc.
Proteome dynamics within the same cell
The proteome can change in response to even the slightest stimulus within a cell.
Proteome chemical heterogeneity
DNA - Negatively charged molecule. It has the same physico-chemical features regardless of its nucleotide sequence, its tissue source, its donor source, the species of the donor, etc.
Amino acid structure
AA side chains
Proteome chemical heterogeneity Membrane proteins
Proteome dynamic range
Figure labels: Genome, Transcription/Translation
- Mostly, individual genes are present in equimolar amounts in a DNA molecule.
- Protein concentration within a cell is unique to each individual protein.
- Difference between the most and least abundant molecule = dynamic range.
Proteome dynamic range
Proteome dynamic range
The dynamic range of a proteome is estimated to be around 10^8 (in serum it is believed to be over 10^10). Geiger et al., MCP 2012
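Dynamic range is usually quoted in orders of magnitude, that is, the base-10 logarithm of the ratio between the most and the least abundant species. A small sketch of that arithmetic follows; the copy numbers are hypothetical values chosen for illustration, not measurements from the cited study.

```python
import math


def dynamic_range_orders(abundances):
    """Orders of magnitude between the most and least abundant entries."""
    positive = [a for a in abundances if a > 0]
    return math.log10(max(positive) / min(positive))


# Hypothetical copy numbers per cell, chosen only for illustration.
copies_per_cell = {
    "abundant structural protein": 1e8,
    "mid-abundance enzyme": 1e5,
    "rare transcription factor": 1e1,
}
orders = dynamic_range_orders(copies_per_cell.values())
print(f"dynamic range: about 10^{orders:.0f}")
```

An instrument whose detector spans fewer orders of magnitude than the sample will miss the least abundant proteins.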
Proteome dynamic range
Difference between the most and least abundant proteins:
- Cytoskeleton (actin, tubulin, vimentin)
- Chaperones (hsp60, hsp70, calreticulin)
- Metabolism (glycolysis, ribosomal)
- Mitochondria (respiratory chain)
- Nucleus (histones)
- Signalling pathway proteins, transcription factors, etc.
Figure labels: Structure, Organelles, Protein GO classification
Instrumentation Aebersold & Mann, Nature 2003
Instrumentation
- Instruments with different hardware generate different types of raw data.
- Different vendors developed different file formats, each needing its own library to read the files.
- This led to the development of many specialized software tools, each using its own computational protocol.
- There is no standard routine.
Take home message
1. Proteomic composition is at least 6x more complex than the genomic composition of a cell, if only the number of entities is considered.
2. It is an ever-changing feature, limited by spatial and temporal constraints.
3. Chemical properties and dynamic range have a relevant impact on the success rate of identification using proteomic methods.
4. Instrumentation and analysis are not standardized.
Introduction to Mass Spectrometry Interpreting peptide/protein data
Gustavo de Souza IMM, OUS October 2013
Let's talk about… physics
3D Quadrupole ion trap Linear Quadrupole ion trap
What is it?
- Instrument which can detect the mass-to-charge ratio (m/z) of ions (or ionized molecules).
a) Ionization must generate ions in the gas phase
b) Ion signal is proportional to the ion's abundance in the sample
c) MS operates at extremely low pressures (vacuum)
- Any molecule is ionizable: small organic/inorganic chemicals (less than 300 Da), average sized peptides or DNA fragments, intact proteins.
Mass Spectrometry Scheme
Inlet (LC) → Ion Source (MALDI, ES) → Mass Analyzer (Time-of-Flight, Quadrupole, Ion Trap) → Detector
Ion Intensity = Ion abundance
Isotopes Normally observed in nature.
Mass difference = 1 Da
What to expect from a mass spectrum
Avogadro number = 6.022 x 10^23 /mol
(x-axis: m/z)
Peptide mass spectrum
[Figure: FTMS full MS scan (m/z 300.00-2000.00) from 100331_Gustavo_Tuberculosis_179rif_Rep1_07, scan 2435, RT 38.32, NL 4.95E5. Isotope cluster with labeled peaks at m/z 1034.49, 1034.99, 1035.49 and 1035.99; x-axis m/z from 1033.0 to 1037.5, y-axis scaled 0 to 100.]
- Isotopes (12C, 13C, 14N, 15N)
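Because isotope peaks differ by about 1 Da in mass, their spacing on the m/z axis is about 1/z, so the charge can be read off the spectrum. A minimal sketch using the peak labels from the spectrum above; the function name is illustrative and the 1 Da spacing is the approximation used on the slides.

```python
def charge_from_isotope_spacing(mz_peaks):
    """Estimate charge as the inverse of the mean spacing between isotope peaks."""
    peaks = sorted(mz_peaks)
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(1.0 / mean_spacing)


# Peak labels read from the peptide mass spectrum on the slide.
isotope_peaks = [1034.49, 1034.99, 1035.49, 1035.99]
charge = charge_from_isotope_spacing(isotope_peaks)
print(charge)
```

A spacing of 0.5 on the m/z axis therefore points to a doubly charged ion.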
Mass Spectrometry Scheme
Inlet (LC) → Ion Source (MALDI, ES) → Mass Analyzer (Time-of-Flight, Quadrupole, Ion Trap) → Detector
How is a sample ionized?
- Electron ionization
- Chemical ionization
- Fast Atom/Ion Bombardment
- Field desorption
- Plasma Desorption
- Laser Desorption and MALDI
- Thermospray
- Electrospray
- Atmospheric pressure chemical ionization
Matrix Assisted Laser Desorption Ionization
Peptide spectrum on MALDI
Protein spectrum on MALDI
A little history…
1985 – First use: up to a 3 kDa peptide could be ionized
1987 – Method to ionize intact proteins (up to 34 kDa) described
  Instruments have no sequence capability
1989 – ESI is used for biomolecules (peptides)
  Sequence capability, but low sensitivity
1994 – Term «Proteome» is coined
1995 – LC-MS/MS is implemented
  «Gold standard» of proteomic analysis
A little history…
- Laborious
- Low reproducibility
- Time consuming
- Low sensitivity
- Limited amount of identifications
Electrospray Ionization
- Column (75 µm) / spray tip (8 µm)
- Reverse-phase C18 beads, 3 µm
- No precolumn or split
- Platinum wire, 2.0 kV
- 15 cm
- Sample loading: 500 nl/min
- Gradient elution: 200 nl/min
ESI: Fenn et al., Science 246:64-71, 1989.
ESI multiple charged elements
Peptides and proteins carry several positive charges (+), e.g. on -NH2 groups.
ESI multiple charged elements
Example for a 1000 Da species: m/z 500.5 (+2), 334.0 (+3), 250.75 (+4)
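The m/z values in this example follow from adding one proton per charge and dividing by the charge. The sketch below is one way to reproduce them. It assumes, as an interpretation of the slide, that the 1000 Da figure is the singly protonated species and that nominal masses are used (neutral mass 999 Da, proton 1 Da). With the real proton mass of about 1.007276 Da the values shift slightly.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz(neutral_mass, charge, proton_mass=PROTON_MASS):
    """m/z of a molecule carrying `charge` extra protons: (M + z*p) / z."""
    return (neutral_mass + charge * proton_mass) / charge


# Slide example under nominal-mass assumptions: neutral mass 999 Da,
# proton mass 1 Da, so the singly charged ion sits at m/z 1000.
for z in (1, 2, 3, 4):
    print(z, mz(999.0, z, proton_mass=1.0))
```

The same molecule therefore shows up at several positions in an ESI spectrum, one per charge state.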
Peptides on ESI
[Figure: FTMS full MS scan (m/z 300.00-2000.00) from 100331_Gustavo_Tuberculosis_179rif_Rep1_09, scan 3828, RT 56.72, NL 1.53E7, x-axis m/z, with two zoomed panels:
- 2+ ion: isotope peaks at m/z 1149.07, 1149.57, 1150.07, 1150.57, 1151.08; spacing 0.5 Da (+2)
- 3+ ion: isotope peaks at m/z 766.38, 766.72, 767.05, 767.38, 767.72; spacing 0.33 Da (+3)
Mr = 2297.14 Da]
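The two charge states in this spectrum belong to the same peptide, which can be checked by converting each m/z back to a neutral mass. A minimal sketch, assuming the charge carriers are protons and using the most intense isotope peak labeled for each charge state (1149.57 for 2+, 766.72 for 3+):

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass(mz_value, charge, proton_mass=PROTON_MASS):
    """Neutral mass from an observed m/z and its charge: M = z * (m/z - p)."""
    return charge * (mz_value - proton_mass)


# Most intense isotope peak of each charge state, as labeled on the slide.
observed = {2: 1149.57, 3: 766.72}
for z, peak in observed.items():
    print(f"{z}+ at m/z {peak}: M = {neutral_mass(peak, z):.2f} Da")
```

Both charge states land within about 0.02 Da of the Mr = 2297.14 Da quoted on the slide.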
ESI of intact protein *
Mass Spectrometry Scheme
Inlet (LC) → Ion Source (MALDI, ES) → Mass Analyzer (Time-of-Flight, Quadrupole, Ion Trap) → Detector
How is an ion mass measured? Time-of-flight (m/z)
How is an ion mass measured? Quadrupoles (RF)
How is an ion mass measured? Orbitraps
Tandem Mass Spectrometry
MS: Inlet → Ion Source → Mass Analyzer → Detector
MS/MS: Ion Source → Mass Analyzer → Mass Analyzer (collision cell) → Mass Analyzer → Detector
Data Dependent Acquisition
MS1 (or MS): precursor ion at m/z 899.013 (marked *)
MS2 (or MS/MS): fragment spectrum of the m/z 899.013 precursor
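In data dependent acquisition the instrument decides which precursors to fragment based on what it sees in each MS1 scan, typically the most intense ions. The sketch below shows that selection logic under simple assumptions: a fixed "top N" rule and an exclusion list of recently fragmented m/z values. The function name, parameters and all peaks except 899.013 are hypothetical; real instrument methods have many more settings.

```python
def select_precursors(ms1_peaks, top_n=3, excluded=(), tolerance=0.01):
    """Pick the top_n most intense MS1 peaks not on the exclusion list.

    ms1_peaks: list of (mz, intensity) tuples from one survey scan.
    excluded:  m/z values fragmented recently (dynamic exclusion).
    """
    candidates = [
        (mz, inten) for mz, inten in ms1_peaks
        if all(abs(mz - ex) > tolerance for ex in excluded)
    ]
    candidates.sort(key=lambda p: p[1], reverse=True)
    return [mz for mz, _ in candidates[:top_n]]


# Hypothetical survey scan; only 899.013 comes from the slide.
survey = [(899.013, 9.5e6), (450.210, 2.0e6), (612.330, 7.1e6), (1020.55, 4.4e5)]
print(select_precursors(survey, top_n=2))
print(select_precursors(survey, top_n=2, excluded=[899.013]))
```

Excluding a precursor once it has been fragmented lets the next cycle reach less intense ions.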
Important Parameters in MS
- Resolution
- Sensitivity
- Dynamic Range…
High resolution in MS
1. Mass accuracy: expected mass versus observed mass of a 2+ ion on the m/z axis
[Figure: mass error (ppm) versus mass (Da) for four instrument setups]
- Ion trap (LTQ): Av. = 65.8 ppm ± 71.5
- qTOF (QSTAR): Av. = 16.5 ppm ± 11.2
- FTICR MS (LTQ-FT, 500K): Av. = 2.1 ppm ± 1.9
- FTICR MS SIM (LTQ-FT, 50K): Av. = 0.68 ppm ± 0.47
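Mass accuracy is normally expressed in parts per million (ppm) of the expected mass. The sketch below shows the conversion and what the average errors quoted for the different instruments mean in Da; the 2000 Da peptide is a hypothetical example.

```python
def ppm_error(observed_mass, expected_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - expected_mass) / expected_mass * 1e6


def mass_window(expected_mass, tolerance_ppm):
    """Lowest and highest mass accepted at a given ppm tolerance."""
    delta = expected_mass * tolerance_ppm / 1e6
    return expected_mass - delta, expected_mass + delta


# Hypothetical 2000 Da peptide; tolerances are the average errors quoted
# on the slide for each instrument type.
for name, ppm in [("Ion trap", 65.8), ("qTOF", 16.5), ("FTICR", 2.1)]:
    low, high = mass_window(2000.0, ppm)
    print(f"{name}: {ppm} ppm -> +/- {high - 2000.0:.4f} Da")
```

A narrower window means fewer candidate peptides match a measured mass by chance.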
High resolution in MS
2. Peak separation
[Figure: 2+ and 3+ ions plotted on retention time (RT) versus m/z axes]
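Whether two nearby ions show up as separate peaks depends on the resolving power, commonly defined as R = m / Δm with Δm the peak width at half maximum. A rough sketch of that relationship follows; the criterion "spacing larger than one peak width" and the example m/z values are simplifying assumptions for illustration.

```python
def peak_width(mz_value, resolving_power):
    """Peak width (FWHM) at a given m/z for resolving power R = m / delta_m."""
    return mz_value / resolving_power


def are_separated(mz_a, mz_b, resolving_power):
    """Rough criterion: peaks are separated if their spacing exceeds one FWHM."""
    width = peak_width((mz_a + mz_b) / 2, resolving_power)
    return abs(mz_a - mz_b) > width


# Hypothetical pair of ions 0.02 m/z apart near m/z 800.
print(are_separated(800.00, 800.02, resolving_power=10_000))
print(are_separated(800.00, 800.02, resolving_power=100_000))
```

At the lower resolving power the two ions merge into one peak, at the higher one they can be told apart.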
LC-MS/MS
With all we (hopefully) learned so far
1) Use strong detergent for cell lysis and protein solubilization (SDS, Triton, NP40, Tween)
2) LysC (cuts at the C-terminal side of K) and/or Trypsin (cuts at the C-terminal side of K and R)
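With all we (hopefully) learned so far
Before digestion: ADFFFSTTHAAS-R-MSHHHGTYYPPH-K-R-FSDDDDT
After digestion: ADFFFSTTHAASR, MSHHHGTYYPPHK, FSDDDDT
Peptides ending in Arg or Lys carry a positive charge (+) at the C-terminal residue.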
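The cleavage rules above are simple enough to apply in code. The sketch below performs an in silico digestion of the slide's example sequence, cutting after every K and R for trypsin and after every K for LysC. It is a simplified model: missed cleavages and the commonly applied exception for a proline following the cleavage site are deliberately ignored, and the function names are illustrative.

```python
def digest(sequence, cleave_after):
    """Cut a protein sequence on the C-terminal side of the given residues."""
    peptides, current = [], ""
    for residue in sequence:
        current += residue
        if residue in cleave_after:
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def trypsin(sequence):
    return digest(sequence, cleave_after="KR")


def lysc(sequence):
    return digest(sequence, cleave_after="K")


protein = "ADFFFSTTHAASRMSHHHGTYYPPHKRFSDDDDT"  # example sequence from the slide
print(trypsin(protein))
print(lysc(protein))
```

Under this rule the lone R between K and F is released as a single residue, which is too small to be useful for identification.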
With all we (hopefully) learned so far
3) Nano-LC (300 nL/min)
5) Quadrupole-Orbitrap (QExactive)
With all we (hopefully) learned so far
Mobile phase: A = 5% organic solvent in water, B = 95% organic solvent in water
C18 column, 25 cm long
[Figure: proportion of A and B over time; a 20 s interval is marked]
With all we (hopefully) learned so far
MS1 (or MS): 899.013
MS2 (or MS/MS): 899.013
With all we (hopefully) learned so far Orbitrap Quadrupole
With all we (hopefully) learned so far From Michalski et al., MCP 10, 2011.
172,800
Take home message
- Mass spectrometry is used to analyze the molecular mass of molecules.
- Great diversity of hardware and principles: different forms of ionization and mass measurement.
- For protein ID, the mass of an intact peptide and the masses of its fragments are enough to provide identification.
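To make the last point concrete, the sketch below computes the two kinds of mass a search engine compares against: the intact peptide mass and the masses of its fragments. It assumes standard monoisotopic residue masses (only a subset is listed), singly charged b and y ions, and no modifications. The peptide is the last tryptic peptide from the digestion example, and the function names are illustrative.

```python
# Monoisotopic residue masses in Da (standard values, subset for brevity).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "D": 115.02694,
    "K": 128.09496, "E": 129.04259, "F": 147.06841, "R": 156.10111,
}
WATER = 18.010565   # Da, added once per peptide for the free termini
PROTON = 1.007276   # Da, charge carrier


def peptide_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_ions(peptide):
    """Singly charged b ions (N-terminal) and y ions (C-terminal)."""
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        y = sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON
        b_ions.append(round(b, 4))
        y_ions.append(round(y, 4))
    return b_ions, y_ions


peptide = "FSDDDDT"  # last tryptic peptide from the digestion example
print(round(peptide_mass(peptide), 4))
b_ions, y_ions = fragment_ions(peptide)
print("b:", b_ions)
print("y:", y_ions)
```

Each b ion and its complementary y ion add up to the peptide mass plus two protons, which is a quick consistency check on a fragment spectrum.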