
Practical tips for cloning, expressing and
purifying proteins for structural biology
Aled Edwards
Banting and Best Department of Medical Research
University of Toronto, Canada
Affinium Pharmaceuticals
Toronto, Canada
Molecular biological approaches to structural biology
An excellent structural sample usually has the following properties
• Lack of conformational heterogeneity
• Soluble at high concentrations
• Pure
Molecular biology is probably the fastest way to transform a “poor”
sample into an “excellent” one.
Outline
• Historical perspective on engineering proteins for structural biology
• Practical advice for cloning/purification of structural samples
• Ancillary benefits of high-throughput studies
RNA polymerase II
From 15Å to 3Å by eliminating heterogeneity
Another source of sample heterogeneity
Eukaryotic proteins comprise multiple domains
• Conformational heterogeneity lowers the probability of crystallization
• Protein domains
  - Are resistant to proteolysis
  - Fold autonomously
  - Can usually be expressed in bacteria
  - Are between 15 and 30 kDa (NMR or X-ray size)
  - Are the fundamental unit of protein function
• Domains are often the only tractable targets for HTP crystallography
EBNA1 DNA-binding domain
(No sequence homologue in database)
RPA Domain Structure
A collection of OB-folds
[Figure: RPA subunits RPA70, RPA32 and RPA14; panels A and B]
RPA crystallization
• Start with full-length protein purified using baculovirus (Wold)
• Identify domain (aa 1-442) soluble in E. coli (Wold)
• Crystallize domain (7 Å)
• Use limited proteolysis to define smaller domain (aa 161-442)
  (3.5 Å, with the same cell as the 7 Å crystal)
• Create many constructs varying N- and C-termini to identify
  final construct (aa 181-422) (2.2 Å; structure solved)
Final tally: 15 different constructs
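The construct screen described above (varying the N- and C-termini around a proteolysis-defined core) can be enumerated programmatically. The following is a minimal sketch, not the procedure actually used for RPA70: the function name `enumerate_constructs`, the minimum-length cutoff, and the candidate boundary positions are all hypothetical.

```python
from itertools import product


def enumerate_constructs(n_starts, c_ends, min_length=50):
    """Return (start, end) residue pairs for every N-/C-terminus combination.

    Pairs shorter than min_length residues are skipped.
    """
    constructs = []
    for start, end in product(sorted(n_starts), sorted(c_ends)):
        if end - start + 1 >= min_length:
            constructs.append((start, end))
    return constructs


# Hypothetical boundaries bracketing the aa 161-442 proteolytic fragment.
n_starts = [161, 171, 181, 191]
c_ends = [412, 422, 432, 442]
constructs = enumerate_constructs(n_starts, c_ends)
for start, end in constructs:
    print(f"aa {start}-{end} ({end - start + 1} residues)")
```

Each pair would then be turned into a primer pair and cloned; a 4 x 4 grid like this one gives 16 candidates, comparable in scale to the 15 constructs in the final tally.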
RPA70 Domains A and B
Two OB-folds bound to DNA
[Figure: domains A and B, with the L12 and L45 loops labelled]
How does one map domains?
Domain mapping using limited proteolysis
TFIIS
Protease
Integrative Proteomics
TFIIS Domain Structure
[Figure: domain map of TFIIS; residue positions marked at 1, 124, 131, 240, 264 and 309]
• Domain I: binds holoenzyme; similar to elongin, CRSP70
• Domain II: RNA polymerase binding
• Domain III: transcript cleavage and read-through (nucleic acid binding?)
DomainHunter™
Industrialized Domain Mapping
• Partial proteolysis in 96-well plates
• Optimized set of proteases
• Low protein requirement
• No SDS-PAGE
• No N-terminal sequencing
• Direct identification of domains by mass spectrometry
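[Figure: Protease Titration. Mass spectra across a protease titration (labels 0, 0.1, 0.25, 1.0, 2.5, 5, 25); x-axis m/z (ticks at 23000, 28000, 33000), y-axis relative intensity (r.i.). Labelled peaks: 20507, 21612, 21952, 23332, 25360, 31650, 33318, 35057.]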
DomainHunter™
DomainHunter Applied to NMR Sample
[Figure: cleavage map along the sequence (residue number 20 to 140, N-terminus marked) showing V8 cleavage sites, chymotrypsin sites, and the resulting fragments A-D]

Fragment Mass   Matching sequence   Expression
10324.0         G[44-133]R          +++
12352.0         G[44-150]D          no
9131.0          I[55-133]R          ++
11159.0         I[55-150]D          no

Solubility: ++ and ++
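Matching an observed fragment mass to a stretch of the parent sequence, as in the table above, amounts to searching all subsequences for one whose calculated mass agrees with the measurement. The sketch below assumes average residue masses and a flat mass tolerance. The function names, the 2 Da tolerance, and the toy sequence are hypothetical and are not taken from the DomainHunter software.

```python
from itertools import accumulate

# Average residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per free peptide (N-terminal H + C-terminal OH)


def prefix_masses(sequence):
    """Cumulative residue masses; entry i is the mass of the first i residues."""
    return [0.0] + list(accumulate(RESIDUE_MASS[aa] for aa in sequence))


def fragment_mass(sequence, start, end):
    """Average mass of residues start..end (1-based, inclusive)."""
    prefix = prefix_masses(sequence)
    return prefix[end] - prefix[start - 1] + WATER


def match_fragments(sequence, observed_mass, tolerance=2.0):
    """List (start, end, mass) for every fragment within tolerance of observed_mass."""
    prefix = prefix_masses(sequence)
    matches = []
    for start in range(1, len(sequence) + 1):
        for end in range(start, len(sequence) + 1):
            mass = prefix[end] - prefix[start - 1] + WATER
            if abs(mass - observed_mass) <= tolerance:
                matches.append((start, end, round(mass, 1)))
    return matches


# Toy sequence (hypothetical), with a simulated observation for residues 5-40.
toy = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"
observed = fragment_mass(toy, 5, 40)
print(match_fragments(toy, observed))
```

More than one fragment can fall inside the tolerance window, so in practice the known protease specificities (for example V8 or chymotrypsin sites) would be used to discard matches whose ends are not plausible cleavage sites.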
Structural Proteomics
MTH40
MTH1615
MTH152
MTH1184
MTH1175
MTH538
MTH150
MTH1790
MTH129
MTH1699
Nat. Struct. Biol. Oct/Nov 2000
MTH1048
5 more done
3 more soon
Molecular biology for crystallization and
for large-scale studies
1. Basic steps in creating expression vectors for E. coli
2. Practical tips for making fewer mistakes
3. Application of methods to higher-throughput
4. Alternate expression systems
5. Some results
E. coli is the first choice... why?
• Cost effective
• Easy to grow
• Abundance of expertise and reagents
• Easy to incorporate selenomethionine
• High yield
• Rapid doubling time and rapid scale-up
Factors involved in successful expression of recombinant proteins
in Escherichia coli cytoplasm
Expression vector
- Copy number (gene dosage; sometimes less is better than more)
- Promoter choice (T7, Ptac, Plac, Para)
- Little or no expression before induction
- Reliable and adjustable expression
- mRNA stability (RNase E-deficient mutant)
Translation
- Consensus SD sequence
- Proper spacing and sequence before the initiation codon
- Possible mRNA secondary structures that block ribosome binding, or
  internal ribosome binding sites
- Codon bias
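Codon bias can be checked before cloning by scanning the open reading frame for codons that are rare in E. coli. The sketch below is illustrative only: the rare-codon set is a commonly cited short list (AGA, AGG, CGA, CTA, ATA, CCC) and is an assumption here, not something stated in these slides, and the function name and example ORF are hypothetical.

```python
# Commonly cited rare codons in E. coli (assumption; adjust for your strain).
RARE_CODONS = {"AGA", "AGG", "CGA", "CTA", "ATA", "CCC"}


def rare_codon_report(orf):
    """Return (1-based codon positions of rare codons, fraction of rare codons)."""
    orf = orf.upper().replace("U", "T")
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    positions = [i + 1 for i, codon in enumerate(codons) if codon in RARE_CODONS]
    fraction = len(positions) / len(codons) if codons else 0.0
    return positions, fraction


# Hypothetical six-codon ORF: ATG AGA AGG CTG CCC TAA
positions, fraction = rare_codon_report("ATGAGAAGGCTGCCCTAA")
print(positions, fraction)
```

Runs of consecutive rare codons, such as positions 2 and 3 in this toy ORF, are usually of more concern than isolated ones, so the positions are reported as well as the overall fraction.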
But which E. coli?
BL21(DE3)
F- ompT hsdSB (rB-,mB-), gal, dcm, (DE3)
BL21-Star(DE3)
F- ompT hsdSB (rB-,mB-), gal, dcm, rne131, (DE3)
BL21-Gold(DE3)
F- ompT hsdS (rB- mB-) dcm+ Tetr gal endA (DE3)
Tuner(DE3)
F- ompT hsdSB (rB- mB-) gal dcm lacY1 (DE3)
Conventional cloning approach
1. Select vector of choice
2. Restriction digest the vector
3. PCR the insert
4. Restriction digest the insert
5. Ligate the vector and insert
6. Transform and plate
7. Pick colonies and screen for insert
8. Screen positive clones for protein expression
9. Sequence positive clones
Which vector/tag?
1. T7 RNA polymerase-based systems are the overwhelming choice
- Highly specific
- High yields
- Exquisitely controlled
2. Choice of vector
- Restriction sites (are there internal sites in gene?)
- Are there many possible sites?
- Are the enzymes commonly available?
- Do the enzymes cut near ends of DNA fragments?
3. Which tag?
- Relatively little data on which generates the best proteins for
  crystallization
- His-tag, GST and MBP are all effective for purification
- His-tag offers the advantage of being able to screen +/- tag
  for crystals (double bang for the buck)
- Make sure there is a protease site to remove the tag
Practical issues with cloning
1. Choice of protease?
- Thrombin (more difficult to get but highly effective)
- TEV, recombinant with His-tag, stable mutant with
  less autoproteolysis activity (Waugh), needs calcium,
  finicky
- Factor X, enterokinase: avoid
“I can’t use thrombin, it digests my protein”
Purification of Thrombin from Thrombostat
1. We started with 10,000 units of Thrombostat from
   Parke-Davis, dissolved in 10 ml of 50 mM NaPO4
   pH 6.5 and 5% glycerol.
2. The solution was then spun at 10,000 rpm for 10 min
   in an SS34 rotor to clarify it.
3. This was then loaded onto a Poros S column (7.5 mm x
   100 mm, Perseptive Biosystems) pre-equilibrated in the
   above buffer at 3 ml/min.
4. The column was then washed in the above buffer until
   the OD 280 reached zero.
5. The column was then washed with 100 mM NaPO4 pH 6.5
   and 5% glycerol until the absorbance went to zero.
6. Thrombin was then eluted from the column in 300 mM
   NaPO4 pH 8.5 and 5% glycerol at a flow rate of
   1 ml/min. 0.5 ml fractions were collected and run
   out on a 15% SDS-PAGE gel, and the 35 kDa protein (thrombin)
   was pooled and frozen in small aliquots.
7. Total protein yield was about 3 mg in 10 ml of
   buffer.
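The protocol gives the clarification spin as rpm in an SS34 rotor. To reproduce it in a different rotor, the speed can be converted to relative centrifugal force with the standard relation RCF = 1.118e-5 x r(cm) x rpm^2. The sketch below assumes a maximum radius of 10.7 cm for the SS34; that radius is an assumption to be checked against the rotor documentation, and the function names are hypothetical.

```python
import math


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_for_rcf(rcf, radius_cm):
    """Speed (rpm) needed to reach a given RCF at a given rotor radius."""
    return math.sqrt(rcf / (1.118e-5 * radius_cm))


SS34_RMAX_CM = 10.7  # assumed maximum radius; check the rotor manual

rcf = rcf_from_rpm(10_000, SS34_RMAX_CM)
print(f"10,000 rpm at r = {SS34_RMAX_CM} cm is about {rcf:.0f} x g")

# Final concentration from step 7: about 3 mg of protein in 10 ml of buffer.
print(f"Approximate thrombin concentration: {3 / 10:.1f} mg/ml")
```

Under that radius assumption the spin corresponds to roughly 12,000 x g, which is the figure to match when substituting another rotor.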
Schleiff, E., Khanna, R., Orlicky, S. and Vrielink, A.
Expression, purification, and in vitro characterization of
the human outer mitochondrial membrane receptor human
translocase of the outer mitochondrial membrane 20.
Arch. Biochem. Biophys. 367:95-103 (1999)
Practical issues with cloning
Restrict the plasmid
- Double digestion often leaves one end undigested,
  which in turn results in high background due to
  re-ligation
- Phosphatase treatment and gel purification of a
  large prep make life much easier in the long run
- Optimize the system to get no background
Practical issues with cloning
PCR the insert
- For HTP studies, need to optimize conditions for each genome or clone
- Order primers from reputable supplier (most common
problem is in deprotecting oligos)
- Have someone else double-check primer sequence
- Order primers with requisite overhang (be over-cautious)
- Use error-correcting polymerase
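The primer advice above (add the requisite overhang, be over-cautious, and have the sequence double-checked) is easier to follow when primers are generated and verified by a script rather than typed by hand. This is a minimal sketch under stated assumptions: the 20-nt annealing length, the 4-nt `GCGC` clamp placed outside the restriction site, the function names, and the example ORF are all hypothetical choices, not values from the slides.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(orf, fwd_site, rev_site, anneal_len=20, clamp="GCGC"):
    """Build forward/reverse primers as clamp + restriction site + annealing region.

    The sites are assumed to be palindromic, so rev_site is added as written.
    """
    orf = orf.upper()
    forward = clamp + fwd_site + orf[:anneal_len]
    reverse = clamp + rev_site + reverse_complement(orf[-anneal_len:])
    return forward, reverse


BAMHI = "GGATCC"
XHOI = "CTCGAG"

# Hypothetical ORF used only to exercise the function.
orf = "ATGGCTAGCAAAGGTGAAGAACTGTTCACCGGTGTTGTTCCGTAA"
forward, reverse = design_primers(orf, BAMHI, XHOI)
print("Forward:", forward)
print("Reverse:", reverse)
```

The clamp exists because many enzymes cut poorly when the site sits at the very end of a PCR product; how many extra bases a particular enzyme needs should be taken from the supplier's data rather than from this sketch.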
Practical issues with cloning
Digest the PCR insert
- Make sure that there are no internal sites
- Purify the restricted product
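Checking an insert for internal restriction sites is a simple string search and is worth automating before primers are ordered. The sketch below uses the recognition sequences of a few common enzymes. The function name and the example insert are hypothetical, and because these sites are palindromic a scan of one strand is sufficient.

```python
# Recognition sequences for a few commonly used (palindromic) enzymes.
SITES = {
    "NdeI": "CATATG",
    "NcoI": "CCATGG",
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
}


def internal_sites(insert, enzymes):
    """Map each enzyme to the 1-based positions where its site occurs in insert."""
    insert = insert.upper()
    hits = {}
    for name in enzymes:
        site = SITES[name]
        positions = []
        start = insert.find(site)
        while start != -1:
            positions.append(start + 1)
            start = insert.find(site, start + 1)
        if positions:
            hits[name] = positions
    return hits


# Hypothetical insert containing a BamHI and an EcoRI site.
insert = "ATGGGATCCAAAGAATTCTAA"
print(internal_sites(insert, ["BamHI", "EcoRI", "XhoI"]))
```

An enzyme that appears in the result cannot be used for cloning that insert without cutting it internally, so either a different pair of sites or a different vector is needed.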
Practical issues with cloning
Ligation and transformation
- If vector control background is low and the PCR product is
  purified, there should be no problem
- Use highly competent cells
Practical issues with cloning
Screen for positive clones
- PCR screen from colony
- Screen by protein expression
- Make note of expression, as well as solubility
Cloning (conventional method)
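[Figure: expression constructs, each carrying the gene of interest]
T7 - 6His - TEV - gene - STOP
T7 - gene - TEV - 6His - STOP
T7 - 6His - MBP - TEV - gene - STOP
T7 - 6His - TRX - TEV - gene - STOP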
Screening for inserts by PCR
Clones
TOPO cloning
GATEWAY™ Cloning System Technology - λ Phage
[Figure: λ phage integration and excision in E. coli. attB + attP recombine to attL + attR (IHF, Int), giving the E. coli lysogen; attL + attR recombine back to attB + attP (IHF, Int, Xis).]
GATEWAY™ Cloning System Technology - λ Phage
[Figure: Gateway adaptation of λ recombination using paired sites. attB1 x attP1 and attB2 x attP2 (IHF, Int) give attL1/attR1 and attL2/attR2; attR1 x attL1 and attR2 x attL2 (IHF, Int, Xis) give back attB1/attP1 and attB2/attP2.]
“Gateway type” cloning
Cloning and Test Expression
[Figure: cloning and test-expression workflow in 96-well format. Elements shown:]
- Ligate, transform, clones (x 96)
- PCR (x 96)
- 300 µl cultures (x 96; Kan, Amp)
- 24 x 3 ml LB (Kan, Amp); 37 °C, induce at OD600; grow O/N at 15 °C or 20 °C
- Spin, freeze, lyse with BugBuster™, spin again
- Supernatant; dissolve pellet in SDS
- SDS-PAGE
[Figure: bar chart for 1750 clones; y-axis 0 to 100; categories: cloned, expressed, soluble]
Expression systems for eukaryotic proteins
• Baculovirus infection of insect cells
  - Simple, relatively cost effective, selenomethionine-compatible,
    not fully able to replicate human post-translational modifications
• Viral infection of human cells
  - Viruses not as easy to work with, high yield, proper modification
• Stable transformation of human cells
  - Usually lower expression. After selection, transcription
    sometimes goes away. Low throughput due to selection process
• Transfection of human cells
  - High expression in few cells, uses up lots of DNA
Generation of recombinant baculoviruses and gene expression
with the Bac-To-Bac expression system
[Figure: Bac-To-Bac workflow. The pFastBacDual donor plasmid (pPolh and p10 promoters driving foreign genes 1 and 2, flanked by Tn7L and Tn7R) is transformed into competent DH10Bac E. coli cells carrying the bacmid (lacZ mini-attTn7) and a helper plasmid. Transposition and antibiotic selection give E. coli (LacZ-) containing recombinant bacmid. A mini-prep of high molecular weight DNA yields recombinant bacmid DNA, which is used for transfection of insect cells with CELLFECTIN reagent, followed by infection for viral amplification or recombinant gene expression. Timeline markers on the figure: Day 1, Days 2-3, Day 4, Day 8.]
Protein Purification
Parallel purification of proteins
[Figure: two gel panels, lanes 1-5 and 1'-5']
ProteoMax – Automated Protein
Purification and Concentration System
Affinium Pharmaceuticals
A few observations from our work
Structure determination strategy
• < 20 kDa: 15N-labeled and 15N/13C-labeled samples; 3-5 weeks of NMR data collection
• > 20 kDa: Se-methionine-labeled samples; synchrotron data
Orthologues

              Thermotoga maritima   Escherichia coli
Proteins      68                    68
Topt          80 °C                 37 °C
Genome size   1,860,725 bp          4,639,221 bp
ORFs          1,877                 4,288

Expressed & soluble: 62, 48
Concentratable to > 2 mg/ml: 50, 44
[Venn diagram values: 15, 35, 9]
9 proteins could not be purified from either species
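Read as a two-set Venn diagram, the counts on this slide are self-consistent: with 50 and 44 concentratable proteins and 35 concentratable from both species, 15 and 9 are unique to one species each and 9 of the 68 are obtained from neither. The sketch below just makes that arithmetic explicit. The function name is hypothetical, the Venn reading is an interpretation of the slide, and "a" and "b" stand for the two species in slide order because the extracted text does not say which count belongs to which organism.

```python
def venn_counts(total, count_a, count_b, both):
    """Split two overlapping counts into only-a, only-b, both and neither."""
    only_a = count_a - both
    only_b = count_b - both
    neither = total - (only_a + only_b + both)
    return {"only_a": only_a, "only_b": only_b, "both": both, "neither": neither}


# 68 orthologue pairs; 50 and 44 concentratable; 35 concentratable from both.
result = venn_counts(total=68, count_a=50, count_b=44, both=35)
print(result)

# Fraction of targets rescued by working on a second orthologue.
at_least_one = result["only_a"] + result["only_b"] + result["both"]
print(f"{at_least_one} of 68 targets yield a usable sample from at least one species")
```

On this reading, either species alone gives 50 or 44 usable samples, while the pair together gives 59, which is the quantitative case for pursuing orthologues.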
Total Crystals (30)
[Venn diagram: T. maritima, E. coli; values 11, 3, 13]
Total Good/Promising NMR spectra (14)
[Venn diagram: T. maritima, E. coli; values 4, 4, 2]
NMR & Crystallography: complementary!
24 small proteins for which both crystal trials and NMR data were collected
[Venn diagram: good/promising HSQC, crystals; values 10, 3, 6]
Of 32 proteins that gave poor HSQCs, 7 have crystallized
Data storage and Mining: Defined Vocabulary
Property                       Vocabulary
Expression level               0-5 (no expression to high expression)
Solubility (test expression)   0-5 (insoluble to highly soluble)
Concentratability              0-5 (or mg/ml)
Crystal trials                 clear, precipitate, crystal
Initial HSQC NMR               good, promising, poor
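A defined vocabulary like the one above is easiest to enforce when it is encoded in the data model, so that free-text entries cannot creep into the database. The following is a sketch of one way to do that, not the schema actually used: the class and field names are hypothetical, and only the allowed values come from the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CrystalTrial(Enum):
    CLEAR = "clear"
    PRECIPITATE = "precipitate"
    CRYSTAL = "crystal"


class HsqcQuality(Enum):
    GOOD = "good"
    PROMISING = "promising"
    POOR = "poor"


@dataclass
class TargetRecord:
    """One protein target scored with the controlled vocabulary."""
    target_id: str
    expression_level: int                      # 0 (none) to 5 (high)
    solubility: int                            # 0 (insoluble) to 5 (highly soluble)
    concentratability: Optional[int] = None    # 0 to 5, if scored
    crystal_trial: Optional[CrystalTrial] = None
    hsqc: Optional[HsqcQuality] = None

    def __post_init__(self):
        # Reject scores outside the 0-5 scale at creation time.
        for name in ("expression_level", "solubility", "concentratability"):
            value = getattr(self, name)
            if value is not None and not 0 <= value <= 5:
                raise ValueError(f"{name} must be between 0 and 5, got {value}")


record = TargetRecord(
    "target-1",  # hypothetical identifier
    expression_level=4,
    solubility=3,
    crystal_trial=CrystalTrial.CRYSTAL,
    hsqc=HsqcQuality.PROMISING,
)
print(record)
```

Because every field is either a bounded integer or an enumeration, records from different people and different years stay directly comparable, which is what makes later data mining possible.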
Expression/solubility testing
[Figure: example test-expression results labelled with scores from 5 down to 0]
Empirical Bioinformatics
Solubility tree based on 58 sequence properties
Kluger & Gerstein
[Figure: decision tree with branches labelled mostly soluble and mostly insoluble]
Efficiency through mining crystal screens
[Figure: grid of crystallization conditions against different proteins; outcomes coded as clear drop, precipitate, or crystal]
Crystal trial: Diminishing Returns
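One way to see diminishing returns in a crystal screen is to rank conditions greedily by how many not-yet-crystallized proteins each one adds. The sketch below is a generic greedy set-cover ranking and is offered only as an illustration: it is not the analysis behind the slide, and the function name, condition names and hit data are all hypothetical.

```python
def rank_conditions(hits):
    """Greedily order conditions by the number of new proteins each crystallizes.

    hits maps a condition name to the set of proteins that gave crystals in it.
    Returns a list of (condition, new_proteins, cumulative_proteins).
    """
    remaining = {cond: set(proteins) for cond, proteins in hits.items()}
    covered = set()
    order = []
    while remaining:
        # Sorting first makes ties resolve alphabetically and reproducibly.
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - covered))
        gain = len(remaining[best] - covered)
        if gain == 0:
            break  # no remaining condition adds a new protein
        covered |= remaining.pop(best)
        order.append((best, gain, len(covered)))
    return order


# Hypothetical screen results.
hits = {
    "cond1": {"P1", "P2", "P3"},
    "cond2": {"P3", "P4"},
    "cond3": {"P1"},
    "cond4": {"P4", "P5"},
}
ranking = rank_conditions(hits)
for condition, gain, total in ranking:
    print(f"{condition}: +{gain} new, {total} proteins crystallized so far")
```

In this toy data two of the four conditions already account for every protein that crystallized, and the other two add nothing, which is the pattern a diminishing-returns plot summarizes.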
Collaborators on Structural Proteomics
Lawrence McIntosh (UBC)
C. Mackereth, G. Lee
Thomas Szyperski*
(SUNY Buffalo)
Mike Kennedy (PNNL)*
J. Cort, T. Ramelot
Mark Gerstein (Yale) *
Yuval Kluger
Ning Lan
Kalle Gehring (McGill)
I. Ekiel
G. Kozlov
Dave Wishart (U. Alberta)
S. Bhattacharyya
Sherry Mowbray (Sweden)
Liang Tong (Columbia) *
John Hunt (Columbia) *
Andrzej Joachimiak (ANL)*
Weontae Lee (Yonsei U.)
Guy Montelione (Rutgers) *
Emil Pai (U. Toronto)
V. Saridakis, N. Wu
*Northeast Structural Genomics Consortium
*Midwest Structural Genomics Consortium