MrBUMP – Molecular Replacement with Bulk
Model Preparation
Automated search model discovery and
preparation for structure solution by
molecular replacement
Ronan Keegan, Martyn Winn
CCP4 group, Daresbury Laboratory
Leuven, August 8th 2006
The aim of MrBUMP
• An automation framework for Molecular Replacement.
• Particular emphasis on generating a variety of search models.
• Wraps Phaser and/or Molrep.
• Also uses a variety of helper applications (e.g. Chainsaw)
and bioinformatics tools (e.g. Fasta, Mafft)
• Uses on-line databases (e.g. PDB, Scop)
• In favourable cases, gives a “one-button” solution
• In unfavourable cases, will suggest likely search models
for manual investigation
The pipeline - first steps
Input: target MTZ & sequence
• Target details:
– Number of residues & molecular weight
– Matthews coefficient
– Estimated number of molecules in a.s.u.
• Template search:
– Generate a list of structures that are
possible templates for search models
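The target details step can be sketched as a short calculation. This is a minimal illustration, not MrBUMP's own code: it uses the standard Matthews relations (V_M = cell volume / (symmetry operators × copies × molecular weight), solvent fraction ≈ 1 − 1.23 / V_M). The 110 Da per residue average, the preferred V_M of 2.5 Å³/Da and all function names are assumptions made for the example.

```python
def matthews_coefficient(cell_volume, n_sym_ops, n_copies, mol_weight):
    """V_M in A^3/Da: cell volume divided by the protein mass in the cell."""
    return cell_volume / (n_sym_ops * n_copies * mol_weight)


def solvent_fraction(v_m):
    """Approximate solvent fraction from V_M (Matthews, 1968)."""
    return 1.0 - 1.23 / v_m


def estimate_copies_in_asu(cell_volume, n_sym_ops, n_residues,
                           da_per_residue=110.0, target_vm=2.5, max_copies=20):
    """Pick the copy number whose V_M is closest to target_vm.

    da_per_residue and target_vm are rule-of-thumb assumptions.
    """
    mol_weight = n_residues * da_per_residue
    best = min(
        range(1, max_copies + 1),
        key=lambda n: abs(
            matthews_coefficient(cell_volume, n_sym_ops, n, mol_weight) - target_vm
        ),
    )
    return best, matthews_coefficient(cell_volume, n_sym_ops, best, mol_weight)


# Hypothetical crystal: 1,000,000 A^3 cell, 4 symmetry operators, 200 residues
copies, v_m = estimate_copies_in_asu(1_000_000.0, 4, 200)
print(copies, round(v_m, 2), round(solvent_fraction(v_m), 2))
```

A real run would take the cell volume and symmetry from the MTZ file and the residue count from the target sequence.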
Search for homologous proteins
FASTA search of PDB
• Sequence based search using sequence of target structure.
• Can be run locally if user has fasta34 program installed or remotely
using the OCA web-based service hosted by the EBI.
• Local search is done against the complete list of PDB sequences
derived from ATOM records in the PDB structure files.
• All of the resulting PDB id codes are added to a list
• The alignment to the target is not of interest at this stage.
Search for additional similar structures
• Additional structure-based search (optional)
– Top hit from the FASTA search is used as the template structure
for a secondary structure based search.
– Uses the SSM webservice provided by the EBI (a.k.a. MSDfold)
– Any new structures found are
added to the list.
– Provides structural variation,
not based on direct sequence
similarity to target
• Manual addition
– Can add additional PDB id codes to the list, e.g. from FFAS
or PSI-BLAST searches
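The way the template list grows from the FASTA, SSM and manual sources can be sketched as an order-preserving merge. The function name and the example id lists are hypothetical, chosen only to illustrate the bookkeeping. They are not real search results.

```python
def merge_template_ids(*sources):
    """Merge PDB id lists, dropping duplicates but keeping first-seen order."""
    seen = set()
    merged = []
    for source in sources:
        for pdb_id in source:
            key = pdb_id.strip().lower()  # PDB ids are case-insensitive
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Illustrative inputs only
fasta_hits = ["1FQ0", "1eun"]
ssm_hits = ["1eun", "1eua"]
manual_ids = ["1ope "]
templates = merge_template_ids(fasta_hits, ssm_hits, manual_ids)
print(templates)
```

Keeping the first-seen order means the FASTA hits stay ahead of the structure-based and manual additions until the later scoring step reorders them.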
Multiple Alignment
• After the set of PDB ids has been collected in the FASTA and SSM
searches, their coordinate-based sequences are gathered and
put through a multiple alignment with the target sequence
• Aims:
– Extract pairwise alignment between template and target for
use in Chainsaw step. Multiple alignment should give a
better set of alignments than the original pair-wise FASTA
alignments
– Score template structures in a consistent manner, in order to
prioritise them for subsequent steps
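Extracting a pairwise alignment from a multiple alignment amounts to taking the two rows of interest and dropping the columns where both are gaps. The sketch below assumes the alignment is held as a dictionary of equal-length gapped strings. The function name and the toy sequences are hypothetical.

```python
def extract_pairwise(msa, target_id, template_id, gap="-"):
    """Return the (target, template) rows with all-gap columns removed."""
    target_row = msa[target_id]
    template_row = msa[template_id]
    if len(target_row) != len(template_row):
        raise ValueError("rows of a multiple alignment must have equal length")
    kept = [
        (t, m)
        for t, m in zip(target_row, template_row)
        if not (t == gap and m == gap)  # column only exists for other templates
    ]
    return "".join(t for t, _ in kept), "".join(m for _, m in kept)


# Toy alignment, not real sequences
msa = {
    "target": "MK-LV--A",
    "tmplA":  "MR-L-G-A",
    "tmplB":  "MKQLVGSA",
}
pair = extract_pairwise(msa, "target", "tmplA")
print(pair)
```

The resulting pair is the kind of input the Chainsaw step needs: one gapped target row and one gapped template row of the same length.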
Multiple Alignment
[Figure: multiple alignment of the target with the model templates, and a pairwise alignment taken from it, displayed in Jalview 2.08.1 (Barton group, Dundee)]
• ClustalW or MAFFT is currently supported for multiple alignment
Template Model Scoring
• Alignment scoring:
score = sequence identity × alignment quality
• Sequence identity:
– Ungapped sequence identity, i.e. sequence identity of aligned target
residues
• Alignment quality:
– Dependent on the alignment length, the number of gaps created in the
template alignment and the extent of each of these gaps.
– The penalties for the number and the size of gaps are biased so that
alignments that preserve domains of the structure score higher than
alignments that spread the aligned residues out.
The top scoring models are then used for further processing
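The scoring idea can be sketched as follows. The slides give only the product form (identity × quality), so the quality function here is an assumption invented for illustration: target coverage minus a per-gap opening penalty and a smaller per-position extension penalty. Making the opening penalty larger favours one long gap over many short ones, which is one way to reward alignments that keep domains intact.

```python
from itertools import groupby


def ungapped_identity(target_aln, template_aln, gap="-"):
    """Identity over columns where both target and template have a residue."""
    pairs = [(t, m) for t, m in zip(target_aln, template_aln)
             if t != gap and m != gap]
    if not pairs:
        return 0.0
    return sum(t == m for t, m in pairs) / len(pairs)


def gap_runs(row, gap="-"):
    """Lengths of consecutive gap stretches in one alignment row."""
    return [len(list(g)) for is_gap, g in groupby(row, key=lambda c: c == gap)
            if is_gap]


def alignment_quality(target_aln, template_aln, gap="-",
                      gap_open=0.05, gap_extend=0.005):
    """Hypothetical quality: coverage of the target minus gap penalties."""
    target_len = sum(c != gap for c in target_aln)
    if target_len == 0:
        return 0.0
    aligned = sum(t != gap and m != gap
                  for t, m in zip(target_aln, template_aln))
    runs = gap_runs(template_aln, gap)
    penalty = gap_open * len(runs) + gap_extend * sum(runs)
    return max(0.0, aligned / target_len - penalty)


def template_score(target_aln, template_aln):
    """score = sequence identity x alignment quality."""
    return (ungapped_identity(target_aln, template_aln)
            * alignment_quality(target_aln, template_aln))


target = "ACDEFGHIKL"
blocky = "ACDE----KL"   # one long gap: aligned residues stay together
spread = "A-D-F-H-KL"   # same coverage, aligned residues scattered
print(template_score(target, blocky) > template_score(target, spread))
```

Both toy templates cover six target residues with full identity, so the difference in score comes entirely from how the gaps are distributed.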
Domains
• Suitable templates for
individual target domains may
exist in isolation in PDB, or in
combination with dissimilar
domains
• In case of relative domain
motion, may want to solve
domains separately
Domains
• Domains search:
– Top scoring templates from multiple alignment are tested to see
if they contain any domains.
– Uses the SCOP database. This only lists domains that appear
more than once in the PDB.
– The database is scanned to see if domains exist for each of
the PDBs in the list of templates
– Domains are then extracted from the parent PDB structure file
and added to the list of template models as additional search
models for MR.
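Cutting a domain out of its parent structure can be sketched as a filter on the fixed-column ATOM records of a PDB file. This is a simplified illustration that assumes the domain is one contiguous residue range on one chain. The function names and the chain/range values are hypothetical, and insertion codes are ignored.

```python
def atom_line(serial, name, res_name, chain, res_seq):
    """Build a fixed-column PDB ATOM record with dummy coordinates (demo only)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, res_name, chain, res_seq, 0.0, 0.0, 0.0)


def extract_domain(pdb_lines, chain_id, start, end):
    """Keep ATOM records of one chain within an inclusive residue range."""
    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[21] != chain_id:          # column 22: chain identifier
            continue
        res_seq = int(line[22:26])        # columns 23-26: residue number
        if start <= res_seq <= end:
            kept.append(line)
    return kept


# Toy structure: three residues on chain A, one on chain B
structure = [
    atom_line(1, "CA", "MET", "A", 1),
    atom_line(2, "CA", "LYS", "A", 2),
    atom_line(3, "CA", "LEU", "A", 3),
    atom_line(4, "CA", "GLY", "B", 2),
]
domain = extract_domain(structure, "A", 2, 3)
print(len(domain))
```

Domains made of several segments would need the filter applied once per segment and the results concatenated.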
Multimers
• Multimer search:
– Search for quaternary structures that may be used as search models.
– Better signal-to-noise ratio than monomer, if assembly is correct for the
target.
– Multimeric structures based on top templates are retrieved using the
PQS service at the EBI, and added to the list of search models
– PQS will soon be replaced by the use of the PISA service at the EBI
(Eugene Krissinel)
[Figure: PQS results for 1n5a, 1n5b, 1n5c and 1n5d. Captions, in slide order: "SPLIT-ASU into 4 Oligomeric files of type TRIMERIC"; "SPLIT-ASU into 2 Oligomeric files of type DIMERIC"; "SYMMETRY-COMPLEX Oligomeric file of type DIMERIC"; "SYMMETRY-COMPLEX Oligomeric file of type DIMERIC"]
Pipeline: Target MTZ & Sequence -> Target Details -> Model Search -> Model Preparation
Raw template structures not usually appropriate
for MR. Edit to create search model.
Search Model Preparation
Search models prepared in four ways:
1. PDBclip
– Original PDB with waters removed, hydrogens removed, most
probable conformations for side chains selected and chain IDs added
if missing.
2. Molrep
– Molrep contains a model preparation function which will align the
template sequence with the target sequence and prune the non-conserved side chains accordingly.
3. Chainsaw
– Can be given any alignment between the target and template
sequences.
– Non-conserved residues are pruned back to the gamma atom.
4. Polyalanine
– Created by excluding all of the side chain atoms beyond the CB atom
using the Pdbset program
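The polyalanine preparation can be sketched as a filter on atom names. MrBUMP itself uses Pdbset for this. The code below is only an illustration of the idea, with hypothetical function names, and it leaves residue names unchanged.

```python
BACKBONE_PLUS_CB = {"N", "CA", "C", "O", "CB"}


def atom_line(serial, name, res_name, chain, res_seq):
    """Build a fixed-column PDB ATOM record with dummy coordinates (demo only)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, res_name, chain, res_seq, 0.0, 0.0, 0.0)


def to_polyalanine(pdb_lines):
    """Drop side chain atoms beyond CB from ATOM records.

    Other record types pass through untouched. Terminal OXT atoms are
    also dropped in this sketch.
    """
    out = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            atom_name = line[12:16].strip()   # columns 13-16: atom name
            if atom_name in BACKBONE_PLUS_CB:
                out.append(line)
        else:
            out.append(line)
    return out


# A single lysine residue with its full side chain
lysine_atoms = ["N", "CA", "C", "O", "CB", "CG", "CD", "CE", "NZ"]
lysine = [atom_line(i + 1, name, "LYS", "A", 1)
          for i, name in enumerate(lysine_atoms)]
trimmed = to_polyalanine(lysine + ["END"])
print([line[12:16].strip() for line in trimmed if line.startswith("ATOM")])
```

A Chainsaw-style preparation would apply a similar filter per residue, guided by the alignment, rather than uniformly to every residue.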
Search Model Preparation
Ensemble for Phaser:
• Top scoring search models are superposed to create an ensemble model.
• This may provide a better search model than any of the individual models
on their own.
• Currently the default is to use the top 5 scoring search models, but the
plan is to build the ensemble dynamically based on the MW and RMSDs of
the constituent search models
Pipeline: Target MTZ & Sequence -> Target Details -> Model Search -> Model Preparation -> Molecular Replacement & Refinement
Molecular Replacement and Refinement
• The search models can be processed with Molrep or Phaser or both.
• The resulting models from molecular replacement are passed to
Refmac for restrained refinement.
• The change in the Rfree value during refinement is used as a rough
estimate of how good the resulting model is:
– “success”: final Rfree < 0.35, or final Rfree < 0.5 and dropped by 20%
– “marginal”: final Rfree < 0.5 and dropped by 5%
– “failure”: otherwise
• MR scores and un-refined models available for later inspection.
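The Rfree thresholds can be written as a small classifier. The slides do not say whether "dropped by 20%" is a relative or an absolute drop. The sketch below assumes a relative drop (initial minus final, divided by initial), and the function name is hypothetical.

```python
def classify_solution(initial_rfree, final_rfree):
    """Label a refined MR result from its initial and final Rfree.

    Assumption: the percentage drop is relative to the initial Rfree.
    """
    if initial_rfree <= 0:
        raise ValueError("initial Rfree must be positive")
    drop = (initial_rfree - final_rfree) / initial_rfree
    if final_rfree < 0.35 or (final_rfree < 0.5 and drop >= 0.20):
        return "success"
    if final_rfree < 0.5 and drop >= 0.05:
        return "marginal"
    return "failure"


# Hypothetical initial / final Rfree pairs
for start, end in [(0.52, 0.33), (0.55, 0.40), (0.53, 0.49), (0.55, 0.52)]:
    print(start, end, classify_solution(start, end))
```

The label is only a rough guide: a result classed as a failure may still be a correct placement that refines poorly.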
Serial mode
Target MTZ & Sequence -> Target Details -> Model Search -> Model Preparation -> Molecular Replacement & Refinement
Check scores and exit, or select the next model
Parallel mode
Target MTZ & Sequence -> Target Details -> Model Search -> Model Preparation -> several Molecular Replacement & Refinement jobs in parallel
Start multiple MR jobs and exit when one finds a solution
MrBUMP on compute clusters
• MrBUMP can take advantage of a
compute cluster to farm out the
Molecular Replacement jobs.
• Currently Sun Grid Engine
enabled clusters are supported,
but support will be added for LSF,
Condor and other types
of queuing system if there is
enough demand.
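The difference between the serial and parallel modes can be sketched with Python's standard thread pool. This is a local stand-in for illustration only: MrBUMP farms jobs out to a cluster queue, whereas here `run_job` is a hypothetical callable that returns True when a search model yields a solution.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_serial(models, run_job):
    """Try search models one at a time; stop at the first solution."""
    for model in models:
        if run_job(model):
            return model
    return None


def run_parallel(models, run_job, max_workers=4):
    """Start jobs concurrently; return the first model that reports a solution."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_job, model): model for model in models}
        for future in as_completed(futures):
            if future.result():
                for other in futures:
                    other.cancel()  # only prevents jobs that have not started
                # Leaving the with-block still waits for jobs already running.
                return futures[future]
    return None


def fake_job(model):
    """Stand-in for an MR + refinement job (not a real calculation)."""
    return model == "1fq0_A_CHNSAW"


models = ["1fq0_C_MOLREP", "1fq0_A_CHNSAW", "1fq0_B_MOLREP"]
print(run_serial(models, fake_job), run_parallel(models, fake_job))
```

On a real queuing system, stopping early would also mean deleting the jobs that are still running, which a thread pool cannot do.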
Pre-release version of MrBUMP
• Pre-release made available in January 2006
• Simple installation
• Currently runs on Linux and OSX.
• Windows version almost ready.
• Comes with CCP4 GUI.
• Can also be run from the command line
with keyword input
• First citation in Obiero et al., Acta Cryst.
(2006). F62, 757-760
• Regular updates (currently version 0.3.1)
http://www.ccp4.ac.uk/MrBUMP
Example 1
1vlw: aldolase from T. maritima
3 chains of 205aa. Data in C2221 to
2.3Å. Using Molrep.
Solutions based on 1fq0:
Search model    Seq id. (%)  Contrast  CC             Refmac Rfree   Solution?
1fq0_C_CHNSAW   31.8         3.25      0.342 / 0.318  0.504 / 0.480  yes
1fq0_C_MOLREP   31.8         1.59      0.376 / 0.369  0.521 / 0.476  yes
1fq0_B_CHNSAW   31.8         2.28      0.336 / 0.320  0.523 / 0.499  yes
1fq0_B_MOLREP   31.8         1.37      0.358 / 0.357  0.530 / 0.529  2 chains only
1fq0_A_CHNSAW   31.8         4.53      0.345 / 0.308  0.526 / 0.466  yes
1fq0_A_MOLREP   31.8         1.61      0.352 / 0.350  0.527 / 0.479  yes
Other models based on 1eun and 1eua
Most, but not all, work.
Example 2
1k6d: alpha subunit of acetate CoA-transferase
from E. coli
Solutions based on 1ope:
Search model     Contrast  Refmac Rfree   Solution
1ope_a2_CHNSAW   5.84      0.533 / 0.396  yes
1ope_a2_MOLREP   3.10      0.528 / 0.391  yes
1ope_a2_PDBCLP   1.33      0.538 / 0.537  no
1ope_a2_POLYALA  7.09      0.548 / 0.452  yes
1ope_A_CHNSAW    5.73      0.540 / 0.451  yes
1ope_A_MOLREP    6.16      0.538 / 0.442  yes
1ope_a1_CHNSAW   fail      0.378 / 0.375  no
1ope_a1_MOLREP   1.30      0.565 / 0.579  no
Seq id. (%), in slide order (fewer values than rows): 40.0, 39.7, 36.8
CC / Z, in slide order (fewer values than rows): 0.464 / 0.410, 0.496 / 0.455, 0.424 / 0.421, 0.420 / 0.370, 0.439 / 0.385, 0.449 / 0.398
1ope - longer chains colinear with bacterial alpha and beta subunits
Domain 2 is therefore best model.
Whole chain also works because of alignment / model editing.
A few observations ...
• In difficult cases, success in MrBUMP may depend on
particular template, chain and model preparation method
• Nevertheless, may get several putative solutions
• Ease of subsequent model re-building, model completion may
depend on choice of solution
• First solution or check everything?
• Expectation was that a quick solution is required; in fact, most users
seem happy to let MrBUMP run for a long time (hours, days)
• Worth checking “failed” solutions!
Future developments
• Windows support (requires installer)
• Complexes (in progress)
• Processing of multiple target sequences
• Improved alignment:
– Multiple alignment against larger sequence database
– Alignment from profile-based search
– User-supplied alignment
• Incorporate PISA multimer determining service (in progress)
• Model generation:
– Identification of flexible loops
– Normal mode generated conformations
• Develop web-service version to allow CCP4i users to run jobs
on CCP4 cluster
Acknowledgements
• Ronan Keegan, CCP4 @ Daresbury
• Thanks to authors of all underlying programs and services
• Other suggestions from:
– Dave Meredith, Graeme Winter, Daresbury Laboratory.
– Eugene Krissinel, EBI, Cambridge.
– Eleanor Dodson, YSBL, York University
– Geoff Barton, Charlie Bond, University of Dundee
• Randy Read, Airlie McCoy, Cambridge
• Funding:
• BBSRC (e-HTPX, CCP4)
• See posters m34.p07 (Keegan, sic), m03.p03 (Vagin),
m32.p06 (Remacle)
http://www.ccp4.ac.uk/MrBUMP