SGX LIMS and XML export


PSI Data Management and Reporting: Expectations, Standards and Utility

J. Michael Sauder, Director, Bioinformatics; NYSGXRC Project Leader

NIGMS Expectations

• http://grants.nih.gov/grants/guide/rfa-files/RFA-GM-05-001.html

• “… a database for deposition of information on experimental outcome data (both successful and unsuccessful).”

• “These data include … cDNA cloning, expression vector construction, protein production and purification, protein biochemical characterizations, crystallization screening, synchrotron and NMR data collection, etc.”

• “The PSI Research Network centers will be required to provide plans for the collection, maintenance, and transfer of experimental results into this central data repository.”

• PepcDB will contain information on these important results and provide a platform for cross-center data mining to capitalize on the PSI investment

Protocols vs Results

• General protocols are reported by each PSI Center in PepcDB

• General protocols have been published in the literature by several Centers

• However, one of the real values of PepcDB lies in the detailed experimental trial results for each target
– Which clones were made? (PSI-MR)
– Which constructs yield soluble protein? (which don’t?)
– What are the fermentation conditions? Purification?
– What was the protein yield? The final concentration? The experimental molecular weight?
– What conditions gave crystals? How many crystal forms? What was the cryoprotectant? Which conditions led to diffraction data? To the structure?

TargetDB/PepcDB Data Mining

• TargetDB status is informative, but far more useful would be data about
– Small scale expression/solubility testing
– Large scale purification yield, concentration, oligomeric state
– Conditions that yielded diffracting crystals

• Publications
– Overton et al (2008) Bioinformatics 24:901-907 “ParCrys: a Parzen window density estimation approach to protein crystallization propensity prediction” (PDB, TargetDB, PepcDB)
– Martin-Galiano et al (2008) Proteins 70:1243-1256 “Predicting experimental properties of integral membrane proteins by a naive Bayes approach” (TargetDB)
– Bannen et al (2007) J Struct Funct Genomics 8:217-226 “Effect of low-complexity regions on protein structure determination” (TargetDB/PepcDB)
– Smialowski et al (2007) Bioinformatics 23:2536-2542 “Protein solubility: sequence based prediction and experimental verification” (TargetDB)
– Slabinski et al (2007) Bioinformatics 23:3403-3405 “XtalPred: a web server for prediction of protein crystallizability” (TargetDB)
– Nair & Rost (2004) Nucl Acids Res 32:W517-W521 “LOCnet and LOCtarget: sub cellular localization for structural genomics targets” (TargetDB)

Process vs Reporting

[Figure: flowchart of internal target status codes, numbered from 10 to 950. Status labels shown: Selected; Active; Mol biol in progress; Fail PCR; Cloning failed; Failed transform; Failed expression; Failed solubility; Clone completed to ferm; Fermentation waiting; Fermentation on hold; Fermentation voided; Soluble; Purification waiting; Purification on hold; Purification in progress; Purification technical error; Purification failed; Purification research unsuccessful; Purification research marginal; Purification research successful; Purified; completed to collaborator; Crystallization admitted; Cryst in screening; Screening grainy ppt; Cryst in optimization; Optimization grainy ppt; Optimization microcrystals; Optimization crystals; Crystal waiting collection; Crystal examined; Crystal abandoned; Dataset collected; Structure deposited; Structure.]

Need to Consider the Future… Now

• How much data are we capturing in our databases compared to how much we are reporting?

• What will happen to Center data after PSI-2?

• We should ensure that as much as possible of our Center data is publicly accessible in PepcDB

Trial Data Reporting by Center

Center: Experimental trial details reported to PepcDB

• JCSG: Protein sequence, cloning vector, fermentation media, purification method, crystallization conditions

• MCSG: Protein sequence, cloning vector, expression host, temperature, media

• NESGC: Protein sequence

• NYSGXRC: DNA and protein sequence, construct boundaries, cloning vector, small scale expression/solubility scores, media, MW, large scale media, volume, induction time/temp, pellet weight, harvest date, SeMet Y/N, purification yield, concentration, purity, MW, oligomeric state, start/end dates, mass spec pass/fail, analysis comments, MW, crystallization conditions, protein concentration, temperature, cryo, harvest/collection dates, anomalous scatterer, diffraction resolution

PepcDB Trial Schema

NYSGXRC

SGX_MOLBIO_PCR ### Molecular Biology - PCR ####
PCR start date: 03/20/2007
PCR last updated: 04/16/2007
Notebook #: 1358
Page: 13
DNA source? Primers?

SGX_MOLBIO_TOPO_TRANSFORM ### Molecular Biology - cloning ####
SGX clonename: 10001b2BSt5p1
Vector: pSGX4 (BS)

SGX_MOLBIO_EXPR_SOL ### Small scale expression/solubility ###
Expression score: HIGH
Solubility rating: HIGH
Predicted molecular weight (kDa): 44.95
Growth Media (small scale): ZYP-5052
Observed molecular weight (kDa): 46
Sonication buffer: PLB1
Host cells? Antibiotic resistance?
Purification steps? Buffers?

SGX_FERM_ECOLI_ZYP ### Fermentation ###
SGX PID: 11732
Growth Media (large scale): ZYP-5052
Total volume (L): 1
Induction time (hr): 21
Induction temp. (C): 22
Pellet weight (g): 19
Harvest date: 05/17/2006
Selenomet: N

SGX_PURIF_ECOLI_BACT ### Purification ###
SGX PID: 11732
SGX pool: 1
Selenomet: N
Start date: 06/21/2006
Yield (mg): 52.3
Final concentration (mg/ml): 52.3
Observed molecular weight (kDa): 33
Notebook #: 1136
Page: 115
End date: 06/23/2006
Purity (%): 98
Oligomeric state: monomer (1 subunit)

NYSGXRC

SGX_MALDI ### Mass Spec - MALDI ###
Mass Spec Status: Passed

SGX_ESI-MS ### Mass Spec - ESI-MS ###
Mass Spec Status: Passed
Observed MW: 32528

SGX_XTAL ### Crystallization ###
SGX XID: 27611
Tray barcode: N0081969
Temperature: 21
Protein concentration (mg/ml): 26
Well location: G 12
Well conditions: [100mM] 1M Hepes pH 7.5 + [25%] 50% PEG 3350 + [200mM] 1M Magnesium Chloride hexahydrate
Cryoprotectant comment: [20%] 80% Glycerol
Harvest date: 09/05/2006
Collection date: 09/05/2006
APS resolution: 2.3
Crystal status: D-DATASET COLLECTED
Crystal morphology? Space group?
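Trial descriptions like these are free text made of "label: value" pairs, so anyone mining PepcDB has to split them back into fields. The following is a minimal sketch of one way to do that, assuming the set of labels for a trial type is known in advance. The function name `parse_trial_text` and the `FERM_FIELDS` list are illustrative, not part of the SGX LIMS or PepcDB.

```python
import re

def parse_trial_text(text, field_names):
    """Split 'Label: value Label: value ...' text into a dict.

    Assumes every label in the text is listed in field_names and that
    no value contains another label followed by a colon.
    """
    # Longest labels first, so overlapping labels match the longer one.
    names = sorted(field_names, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(n) for n in names) + r"):\s*")
    # re.split with one capture group returns
    # [prefix, label1, value1, label2, value2, ...]
    parts = pattern.split(text)
    return {label: value.strip()
            for label, value in zip(parts[1::2], parts[2::2])}

# Labels taken from the fermentation trial shown above.
FERM_FIELDS = [
    "SGX PID", "Growth Media (large scale)", "Total volume (L)",
    "Induction time (hr)", "Induction temp. (C)", "Pellet weight (g)",
    "Harvest date", "Selenomet",
]

text = ("SGX PID: 11732 Growth Media (large scale): ZYP-5052 "
        "Total volume (L): 1 Induction time (hr): 21 "
        "Induction temp. (C): 22 Pellet weight (g): 19 "
        "Harvest date: 05/17/2006 Selenomet: N")

record = parse_trial_text(text, FERM_FIELDS)
for label, value in record.items():
    print(f"{label} = {value}")
```

The same function works whether the pairs arrive on one line or one per line, because surrounding whitespace is stripped from each value.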

Proposed Data Reporting

• Molecular biology
– DNA source, primers, vector, PSI-MR clone ID, host, antibiotic resistance
– Expression and solubility rating (small scale), media, predicted and observed molecular weight

• Fermentation
– Media, volume, induction time, temp, selenoMet?

• Purification
– Purification steps, final buffer, yield, concentration, molecular weight, purity, oligomeric state
– Accurate MW if mass spec done

• Crystallization
– Temperature, protein concentration, well conditions, cryoprotectant and resolution, if applicable
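A list like this can double as a completeness check before a Center submits an update. The sketch below is only an illustration: the stage and field names are copied from the bullets above, while the dictionary layout, the `missing_fields` helper, and the sample records are assumptions, not an existing PepcDB or LIMS interface.

```python
# Proposed fields per stage, transcribed from the list above.
PROPOSED_FIELDS = {
    "molecular biology": [
        "DNA source", "primers", "vector", "PSI-MR clone ID",
        "host", "antibiotic resistance",
    ],
    "fermentation": [
        "media", "volume", "induction time", "temp", "selenoMet",
    ],
    "purification": [
        "purification steps", "final buffer", "yield", "concentration",
        "molecular weight", "purity", "oligomeric state",
    ],
    "crystallization": [
        "temperature", "protein concentration", "well conditions",
    ],
}

def missing_fields(stage, record):
    """Return the proposed fields that are absent or empty in record."""
    return [field for field in PROPOSED_FIELDS[stage]
            if record.get(field) in (None, "")]

# Hypothetical records built from the NYSGXRC example values.
cloning = {"vector": "pSGX4 (BS)"}
fermentation = {"media": "ZYP-5052", "volume": "1", "induction time": "21",
                "temp": "22", "selenoMet": "N"}

print(missing_fields("molecular biology", cloning))
print(missing_fields("fermentation", fermentation))
```

Cryoprotectant and resolution are left out of the crystallization list here because the slide marks them as reported only "if applicable".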

<…Value>

• Alternative mechanism to report experimental data
– molecular weight
– 32475
– Da

• Examples
– Molecular weight
– Isoelectric point
– Phosphorylation
– Methylation
– Element analysis / stoichiometry
– etc.
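The example above amounts to a name, a value, and a unit. As a rough illustration of how such a triple could be serialized to XML, here is a sketch using Python's standard library. The element names `measurement`, `name`, `value`, and `units` are placeholders chosen for this example; the actual PepcDB tag names are not given in this transcript and should be taken from the PepcDB schema.

```python
import xml.etree.ElementTree as ET

def measurement_element(name, value, units=None):
    """Build a name/value/units element (tag names are hypothetical)."""
    elem = ET.Element("measurement")           # placeholder tag name
    ET.SubElement(elem, "name").text = name
    ET.SubElement(elem, "value").text = str(value)
    if units is not None:                       # units are optional
        ET.SubElement(elem, "units").text = units
    return elem

# The molecular weight example from the slide.
xml_text = ET.tostring(
    measurement_element("molecular weight", 32475, "Da"),
    encoding="unicode",
)
print(xml_text)
```

A generic triple like this lets a Center report properties such as isoelectric point or stoichiometry without a dedicated tag for each one.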

Optional tags

• http://mmcif.pdb.org/sg-data/protprod.html

• PDB-proposed mmCIF-like tags to describe cloning, expression, purification, crystallization, etc.

• Examples
– _entity_src_gen_pure.protein_concentration
– _entity_src_gen_pure.protein_yield
– _entity_src_gen_pure.protein_oligomeric_state
– _pdbx_buffer_components.name
– _pdbx_buffer_components.conc
– _exptl_crystal_grow.temp
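To show how LIMS fields might be carried over to these tags, the sketch below maps three purification labels from the NYSGXRC example onto tag names listed above and prints tag/value lines. The pairing of labels to tags is an assumption made for illustration, and units are not checked or converted; the dictionary page linked above is the authority on what each tag expects.

```python
# Assumed pairing of LIMS labels to the proposed mmCIF-like tags.
TAG_FOR_LABEL = {
    "Final concentration (mg/ml)": "_entity_src_gen_pure.protein_concentration",
    "Yield (mg)": "_entity_src_gen_pure.protein_yield",
    "Oligomeric state": "_entity_src_gen_pure.protein_oligomeric_state",
}

def to_tag_lines(record):
    """Return 'tag  value' lines for the labels that have a known tag."""
    lines = []
    for label, value in record.items():
        tag = TAG_FOR_LABEL.get(label)
        if tag is None:
            continue                    # no tag assumed for this label
        text = str(value)
        if " " in text:                 # quote values containing spaces
            text = "'" + text + "'"
        lines.append(f"{tag}  {text}")
    return lines

purification = {
    "Yield (mg)": 52.3,
    "Final concentration (mg/ml)": 52.3,
    "Purity (%)": 98,
    "Oligomeric state": "monomer (1 subunit)",
}

for line in to_tag_lines(purification):
    print(line)
```

Labels without an assumed tag, such as purity here, are skipped rather than guessed.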

Recommendation

NYSGXRC plans to further improve our reporting of trial results in 2008

We encourage all PSI Centers to utilize the PepcDB or tags to report as many experimental trial results as possible in their PepcDB XML updates

See associated poster

Acknowledgements

• SGX LIMS development team
– Ryan Allis
– Chris Hansen
– Peter Hillier
– Ken Schwinn

• AECOM - Veena Venkatagiriyappa (Fiser lab)

• Andrei Kouranov (PDB)

• LIMS improvements suggested by SGX protein production, crystallization, and beamline staff

• This work was supported by SGX Pharmaceuticals, Inc., and NIH Grant U54 GM074945