Bulk Model Construction and Molecular Replacement

Bulk Model Construction and Molecular Replacement in CCP4 Automation

Ronan Keegan, Norman Stein, Martyn Winn

Overview

• Brute force search method for the best model for Molecular Replacement on a target structure.

• Python script utilising HPC resources.

• Can also run on a single machine.

• Two main parts:

– Model generation using a variety of methods.

– Feeding a selection of the best models into an MR program.

• User input requirements: target sequence and associated MTZ file.

Process Target Information

• Calculate the molecular weight.

• Estimate the number of molecules in the a.s.u.

• Parse MTZ file for any relevant parameters
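The first two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program's own code: the mean residue mass of 110 Da, the target Matthews coefficient of 2.5 Å³/Da, and all function names are assumptions made for the example. The cell dimensions and the number of symmetry operators would come from the MTZ header.

```python
import math

# Assumption: rough average mass of one amino-acid residue, in daltons.
MEAN_RESIDUE_MASS_DA = 110.0


def molecular_weight(sequence):
    """Approximate molecular weight (Da) from the sequence length."""
    return len(sequence) * MEAN_RESIDUE_MASS_DA


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume in cubic angstroms; angles are in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def estimate_copies(volume, n_symops, mw, target_vm=2.5, max_copies=24):
    """Pick the copy number whose Matthews coefficient is closest to target_vm.

    Vm = volume / (n_symops * n_copies * mw), in cubic angstroms per dalton.
    """
    return min(
        range(1, max_copies + 1),
        key=lambda n: abs(volume / (n_symops * n * mw) - target_vm),
    )
```

A probability-weighted choice over a range of Matthews coefficients would be more robust than a single target value; the single target is used here only to keep the sketch short.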

Searching for Homologous Structures

• Using the target sequence, the program queries services based at the EBI (OCA) for homologous structures found by sequence matching.

• The top match from the sequence based search is then used for a secondary structure based search using the MSDFold/SSM webservice.

• Using the results of the above searches, the program also queries PQS at the EBI for any related multimeric structures.

• As an additional option, the top hits from the search can be aligned using Superpose to construct an ensemble of models to be used at the Molecular Replacement stage.

Model Construction

• Once the search stage has been completed all of the associated PDB structure files are retrieved.

• These are then manipulated in several different ways to create a wide range of possible models:

1) PDB Clipping (Pdbcur, Pdbset, Coord_format)

• Waters and hydrogens are removed.

• Any anomalies in the structure file, such as empty fields (e.g. missing chain identifiers), are corrected.

• The most probable conformations are selected.

• Individual chains are extracted.
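The clipping steps can be illustrated with a short plain-Python filter over PDB-format lines. This is only a sketch of the idea; the program itself uses the CCP4 tools named above. The function name `clip_pdb` is hypothetical, and keeping the blank or "A" alternate location is a simplification standing in for "most probable conformation".

```python
def clip_pdb(lines, chain_id):
    """Return the atom records of one chain with waters, hydrogens and
    secondary alternate conformations removed.

    Relies on the fixed-column PDB layout: altLoc in column 17,
    residue name in 18-20, chain ID in 22, element symbol in 77-78.
    """
    kept = []
    for line in lines:
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue  # skip headers, TER, ANISOU, etc.
        if line[17:20] in ("HOH", "WAT"):
            continue  # water
        # Assumption: the element column is filled in (older files may leave it blank).
        if line[76:78].strip().upper() in ("H", "D"):
            continue  # hydrogen or deuterium
        if line[16] not in (" ", "A"):
            continue  # keep only the first alternate conformation
        if line[21] != chain_id:
            continue  # keep only the requested chain
        # Blank the altLoc flag so the output holds a single conformation.
        kept.append(line[:16] + " " + line[17:])
    return kept
```

Calling `clip_pdb(open("model.pdb").read().splitlines(), "A")` would return the cleaned records for chain A; the file name here is a placeholder.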

2) Molrep

• Uses own sequence alignment to prune the side chains.

• Side chains are stripped to lowest common parts.

3) Chainsaw (Norman Stein)

• An input sequence alignment is used to strip side chains.

• More severe pruning than Molrep: “mixed model”.

• Can be given many possible alignments to create different models from the same structure.

• Can use more sophisticated sequence alignment methods such as PSI-BLAST and FFAS.
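The idea of alignment-driven side-chain pruning can be sketched as follows. This is an illustration only, not the algorithm of Molrep or Chainsaw: the function names, the three actions, and the choice to truncate non-conserved residues at the C-beta atom are assumptions made for the example, since the slides do not state the truncation level.

```python
# Assumption: a "truncated" residue keeps the backbone plus C-beta.
BACKBONE_AND_CB = {"N", "CA", "C", "O", "CB"}


def pruning_plan(target_aln, model_aln):
    """Return one action per model residue from a pairwise alignment.

    Both strings are aligned sequences of equal length with '-' for gaps.
    'keep'     : residue identical in target and model
    'truncate' : residue differs, so the side chain is cut back
    'delete'   : model residue has no counterpart in the target
    """
    if len(target_aln) != len(model_aln):
        raise ValueError("aligned sequences must have equal length")
    plan = []
    for t, m in zip(target_aln, model_aln):
        if m == "-":
            continue  # gap in the model: there is no residue to act on
        if t == "-":
            plan.append("delete")
        elif t == m:
            plan.append("keep")
        else:
            plan.append("truncate")
    return plan


def atoms_to_keep(atom_names, action):
    """Filter the atom names of one residue according to its action."""
    if action == "keep":
        return list(atom_names)
    if action == "truncate":
        return [name for name in atom_names if name in BACKBONE_AND_CB]
    return []  # 'delete'
```

Feeding several alternative alignments of the same pair through `pruning_plan` yields several different plans, which is how one structure can give rise to multiple candidate models.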

Molecular Replacement

• A cluster or HPC resource spawns multiple MR jobs each taking one of the constructed models along with the target structure data.

• Phaser/Amore/Molrep can all be used to do the MR.

• Phaser is used for the ensemble of top hits.

• If and when the MR program fits the model structure to the target data, the resulting PDB file is processed using Refmac to assess whether it is likely to refine.

• Results are then provided to the user for all of the top scoring models.

• User can retrieve the refined structures along with any of the associated log files.
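The fan-out and ranking described above can be sketched with Python's standard library. This is a single-machine illustration, not the cluster submission code: `run_mr` is a hypothetical callable supplied by the caller that wraps one MR run plus refinement and returns either `None` (no solution) or a dictionary with `"model"` and `"rfree"` keys. Ranking by Rfree alone is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def rank_mr_results(models, run_mr, max_workers=4):
    """Run one MR job per model in parallel and rank the solutions.

    models : iterable of model identifiers (e.g. file paths)
    run_mr : hypothetical callable, model -> None or {"model": ..., "rfree": ...}
    Returns the successful results sorted by Rfree, lowest (best) first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and waits for every job to finish.
        results = list(pool.map(run_mr, models))
    solved = [r for r in results if r is not None]
    return sorted(solved, key=lambda r: r["rfree"])
```

Threads are adequate here because each job would spend its time waiting on an external program; on a cluster the same pattern applies with the pool replaced by batch submissions.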

e-HTPX

Jobs can be submitted via the e-HTPX portal to the Daresbury e-HTPX computational resources (cluster or Condor pool) or, if the user has a Grid Certificate, to the UK National Grid Resources.

Users can monitor the job results as they are produced via a web page hosted on the e-HTPX server machine and they are notified by email when their job is complete.

Refined structure files are made available to the user for download upon completion.

First external user as of a couple of days ago!

JCSG Targets

N.B. good homologues available.

Target 1vrd

• Resolution (Å): 2.2

• a.s.u.: 2 x 482

• Hits from OCA / SSM: 37 / 0

• Top hit (JCSG model): 57% (57%)

• Phaser LLG: 1eep_A PDBCLP 1634 (1zfj_A PDBCLP 964)

• Refmac Rfree: 1eep_A PDBCLP 33.6; 1eep_B MOLREP 40.7 (1zfj_A PDBCLP 40.8)

Target 1vrg

• Resolution (Å): 2.9

• a.s.u.: 6 x 515

• Hits from OCA / SSM: 30 / 0

• Top hit (JCSG model): 61% (58%)

• Phaser LLG: 1xnv_A PDBCLP 9498 (1on3 hexamer 7269)

• Refmac Rfree: 1xo6_F PDBCLP 34.8; 1xnv MULTMR 35.0; 1xnv_B MOLREP 51.9 (1on3 hexamer 37.9)

RMSD final model: no values given.

Currently working through more challenging examples …

Other points

• Program can also be run on a single machine in a scaled-down fashion.

• Can be run from the command line.

• Easy to swap out Phaser and run Amore, Molrep or other MR program instead.

• Modularised: model construction can be run on its own.

• Other model generating methods can easily be inserted.

Future Plans

• Make it smarter and quicker.

• Use better sequence alignment methods such as PSI-BLAST and FFAS.

• Use Norman’s Chainsaw program as an extra model creation method.

• Incorporate Norman’s Amore wrapper.

• Integrate it into Graeme’s XIA project – make use of scheduler code wrappers & provide a Model Generation module for XIA-MR.