Transcript Slide 1
www.wwpdb.org
September 29, 2008
Worldwide Protein Data Bank
www.wwpdb.org
Agenda
10:00 am   Welcome and Introductions
10:15      Overview of recent wwPDB progress
10:35      Outreach
10:55      NMR Task Force
11:15      Improvements in Data Deposition and Processing
11:45      New Projects
Noon       Working Lunch
1:00 pm    Funding Update
1:30       Matters Arising
           – Committee membership
           – Next meeting
2:00       Discussion
3:00       Executive Session
3:15       Feedback
3:30       Adjourn
Presenters (in slide order): KH, HB, HN, JM, KH, HB, All
Overview
Helen Berman
wwPDBAC 2007 (on wwPDB Intranet)
wwPDBAC 2007 Recommendations
Structure factors and/or NMR restraints should be a
prerequisite for receiving a PDB ID
– Done
Inform the relevant journals of this new policy
– Done; adopted by some but not all
Validation
Establish additional X-ray crystallography and NMR validation procedures
– In progress
Results should be made available to depositors immediately after
submission. Upon depositor request, the validation reports should be
made available to designated scientific journal editors
– Possible now, journal policies have not as yet changed
Work to establish recommendations for additional experimental data
deposition and release requirements
– In progress
wwPDB Achievements
October 2007 - September 2008
Continued growth of archive – now
more than 50,000 structures
Website updates
Download statistics available
Publications and presentations
Enhanced complex molecule
annotation
New Format document
Initiation of Common Annotation Tool
development
Depositions

          Deposited To              Processed By
Month     RCSB    PDBj    PDBe     RCSB    PDBj    PDBe     Total Depositions
Oct 07     456      69      74      361     164      74      599
Nov 07     408      69     113      265     212     113      590
Dec 07     447      53      80      324     176      80      580
Jan 08     460      57      87      340     177      87      604
Feb 08     362      82      81      313     131      81      525
Mar 08     427      60      81      333     154      81      568
Apr 08     407      96      73      323     180      73      576
May 08     458      35      73      353     140      73      566
Jun 08     450      18      79      308     160      79      547
Jul 08     554      28      84      408     174      84      666
Aug 08     459      63      75      362     160      75      597
TOTAL     4888     630     900     3690    1828     900     6418
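The depositions figures above can be cross-checked with a short script. This is only a sketch: the numbers are copied from the slide, while the names `ROWS`, `TOTALS`, and `find_inconsistencies` are illustrative and not part of any wwPDB tool. It verifies that each month's "Deposited To" and "Processed By" counts both add up to the monthly total, and that the columns add up to the TOTAL row.

```python
# Monthly depositions copied from the slide, as
# (deposited RCSB, PDBj, PDBe, processed RCSB, PDBj, PDBe, total).
ROWS = {
    "Oct 07": (456, 69, 74, 361, 164, 74, 599),
    "Nov 07": (408, 69, 113, 265, 212, 113, 590),
    "Dec 07": (447, 53, 80, 324, 176, 80, 580),
    "Jan 08": (460, 57, 87, 340, 177, 87, 604),
    "Feb 08": (362, 82, 81, 313, 131, 81, 525),
    "Mar 08": (427, 60, 81, 333, 154, 81, 568),
    "Apr 08": (407, 96, 73, 323, 180, 73, 576),
    "May 08": (458, 35, 73, 353, 140, 73, 566),
    "Jun 08": (450, 18, 79, 308, 160, 79, 547),
    "Jul 08": (554, 28, 84, 408, 174, 84, 666),
    "Aug 08": (459, 63, 75, 362, 160, 75, 597),
}
TOTALS = (4888, 630, 900, 3690, 1828, 900, 6418)


def find_inconsistencies(rows, totals):
    """Return a list of human-readable problems; empty if the table adds up."""
    problems = []
    for month, values in rows.items():
        deposited, processed, total = values[0:3], values[3:6], values[6]
        if sum(deposited) != total:
            problems.append(f"{month}: deposited sum {sum(deposited)} != {total}")
        if sum(processed) != total:
            problems.append(f"{month}: processed sum {sum(processed)} != {total}")
    # Column-wise sums over all months should reproduce the TOTAL row.
    column_sums = tuple(sum(column) for column in zip(*rows.values()))
    if column_sums != totals:
        problems.append(f"column sums {column_sums} != TOTAL row {totals}")
    return problems


problems = find_inconsistencies(ROWS, TOTALS)
print(problems or "table is internally consistent")
```

For the figures on the slide, the check reports no inconsistencies: every entry deposited at a site is accounted for in the processing counts.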
[Figures: Number of released entries. Depositions to the PDB by decade (axis: Year). Depositor locations and download locations (RCSB PDB, PDBe, PDBj).]
PDB File Downloads
Last 12 months
FTP: 256,753,220
HTTP: 47,102,103
Total: 303,855,323
Outreach
Haruki Nakamura
Outreach
wwPDB website
Simultaneous updating of PDB archives
Publications
Professional society meetings
– Presentations
– Exhibit booth
wwPDB website
– Deposition and Release Policies
– Format Description
– Meeting information and preliminary recommendations
– Deposition and download statistics
Simultaneous weekly update of PDB archive
In the past, the PDBj site began copying the latest data and loading them into its local database system only after the RCSB PDB archive had been updated on Wednesday. The PDBj database was therefore updated with some delay, which frustrated potential PDBj users, who preferred to access RCSB PDB.
Since Sept. 2008, PDBj copies the latest data directly from the internal database at RCSB PDB and pre-constructs the PDBj database at midnight on Saturday.
RCSB PDB automatically sends a mail after updating its public FTP site every Wednesday; on receiving it, PDBj updates its own FTP site with little delay.
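The mail-triggered step can be pictured with a small sketch. Everything specific here is an assumption made for illustration: the slides do not give the format of the notification mail, so the subject marker `UPDATE_SUBJECT`, the sender address, and the `sync_ftp_site` callback are hypothetical. The sketch only shows the general pattern of publishing the pre-built data when the notification arrives, not the actual PDBj implementation.

```python
from email import message_from_string

# Hypothetical subject marker; the real notification format is not given in the slides.
UPDATE_SUBJECT = "PDB FTP UPDATE COMPLETE"


def handle_notification(raw_mail, sync_ftp_site):
    """Call sync_ftp_site() if the mail announces a finished public update.

    Returns True when the synchronization was triggered, False otherwise.
    """
    message = message_from_string(raw_mail)
    subject = message.get("Subject", "")
    if UPDATE_SUBJECT in subject.upper():
        sync_ftp_site()
        return True
    return False


# Usage with a stand-in for the real synchronization step.
calls = []
mail = (
    "From: notifier@example.org\n"
    "Subject: PDB FTP update complete\n"
    "\n"
    "The weekly release is now public.\n"
)
triggered = handle_notification(mail, lambda: calls.append("synced"))
```

Passing the synchronization step in as a callable keeps the trigger logic separate from whatever copying mechanism a site actually uses.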
Joint publications
K. Henrick, Z. Feng, W. Bluhm, D. Dimitropoulos, J.F. Doreleijers, S. Dutta, J.L. Flippen-Anderson, J. Ionides, C. Kamada, E. Krissinel, C.L. Lawson, J.L. Markley, H. Nakamura, R. Newman, Y. Shimizu, J. Swaminathan, S. Velankar, J. Ory, E.L. Ulrich, W. Vranken, J. Westbrook, R. Yamashita, H. Yang, J. Young, M. Yousufuddin, and H. Berman (2008) Remediation of the Protein Data Bank Archive. Nucleic Acids Res. 36(Database issue): D426-D433.
J.L. Markley, E.L. Ulrich, H. Berman, K. Henrick, H. Nakamura, and H. Akutsu (2008) BioMagResBank (BMRB) as a partner in the Worldwide Protein Data Bank (wwPDB): New policies affecting biomolecular NMR depositions. J. Biomol. NMR 40: 153-155.
S. Dutta, K. Burkhardt, G.J. Swaminathan, T. Kosada, K. Henrick, H. Nakamura, and H.M. Berman (2008) Data deposition and annotation at the Worldwide Protein Data Bank, in Methods in Molecular Biology 426: Structural Proteomics: High-Throughput Methods, B. Kobe, M. Guss, and T. Huber, Editors. Humana Press: Totowa, NJ.
C.L. Lawson, S. Dutta, J.D. Westbrook, K. Henrick, and H.M. Berman (2008) Representation of viruses in the remediated PDB archive. Acta Cryst. D64: 874-882.
Interactions
Exchange visits
– PDBe/RCSB PDB
– PDBj/RCSB PDB
– PDBj/BMRB
– BMRB/RCSB PDB
– BMRB/PDBe
wwPDB Retreat 2007
Phone conference with site directors twice a year
VTCs among staff
– BMRB/RCSB PDB twice a month (ADIT-NMR)
– MSD/RCSB PDB weekly
– RCSB PDB/PDBj and BMRB/PDBj
– BMRB/PDBe
Daily emails among staff
– PDBe/RCSB PDB
– PDBj/RCSB PDB
– BMRB/RCSB PDB, PDBj, PDBe
wwPDB Retreat
IUCr Osaka 2008
Joint exhibition stand
Presentations
– Keynote lecture, What the Protein Data Bank tells us about the past, present and future of structural biology
– Validation talk, Data Quality in the PDB Archive
Q&A at the Commission on Biological Macromolecules
Specialized Participation
– Small Angle Commission
– Workshop on New Routes to Crystallographic Data Publication
– COMCIFS
http://www.eccb08.org
A demonstration describing the wwPDB highlighting
the collaboration as well as services offered by member
sites
NMR Update
John Markley
NMR structure depositions
Number of NMR structures deposited through ADIT-NMR (09/01/07-08/31/08)
– BMRB -> RCSB PDB: 461
– PDBj-BMRB -> PDBj: 112
Restraints remediation
– Processing is virtually complete
– Will be released as soon as it can be
made consistent with the remediated
chemical components dictionary
wwPDB policies and rules on NMR entries
Two types of NMR experiments will be
distinguished in the PDB entries
– Solution NMR
– Solid-state NMR
NMR entries will have new PDB records
– MDLTYP to indicate MINIMIZED AVERAGE
– NUMMDL to specify number of models in entry
These changes are reflected in Format
Guide 3.2
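As an illustration of the two new records, the sketch below reads NUMMDL and MDLTYP from the header lines of a PDB-format file. The column positions (model count in columns 11-14, MDLTYP text in columns 11-80) reflect my reading of the fixed-column PDB format and should be checked against the Format Guide; the function name `read_model_records` and the sample header are made up for the example.

```python
def read_model_records(lines):
    """Collect NUMMDL and MDLTYP information from PDB-format header lines."""
    info = {"nummdl": None, "mdltyp": []}
    for line in lines:
        record = line[:6].strip()
        if record == "NUMMDL":
            # Assumed layout: number of models in columns 11-14.
            info["nummdl"] = int(line[10:14])
        elif record == "MDLTYP":
            # Assumed layout: free-text model type in columns 11-80.
            info["mdltyp"].append(line[10:80].strip())
    return info


# Hypothetical header of an NMR entry that includes a minimized average model.
header = [
    "NUMMDL    20",
    "MDLTYP    MINIMIZED AVERAGE",
]
info = read_model_records(header)
print(info)
```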
wwPDB policies and rules on NMR entries
The numbering of models is sequential, beginning
with 1
All models in a deposition (ensemble members and
minimized average, if provided) should be
superimposed in an appropriate author determined
manner, and only one superposition method should
be used.
All models in an NMR ensemble and the minimized
average structure, if provided, should have the same
sequence and covalent structure (exact same
number and type of atoms: hydrogens and heavy
atoms), and chemistry (e.g., protonation state)
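Two of these rules lend themselves to an automated check: sequential model numbering starting at 1, and identical atom content in every model. The following is a minimal sketch under stated assumptions, not a wwPDB validation tool. The helpers `atom_line` and `model_line` are hypothetical and only build simplified fixed-column records (coordinates omitted) for the demonstration, and the superposition rule is not checked at all.

```python
def atom_line(serial, name, res, chain, resseq):
    """Build a simplified fixed-column ATOM record (coordinates omitted)."""
    return f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}"


def model_line(serial):
    """Build a MODEL record with the model number in columns 11-14."""
    return f"MODEL     {serial:4d}"


def check_models(lines):
    """Return a list of problems found in the MODEL blocks of a PDB-format file."""
    models = []  # list of (model number, [atom identity tuples])
    current = None
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = (int(line[10:14]), [])
            models.append(current)
        elif record in ("ATOM", "HETATM") and current is not None:
            # Identity = atom name, residue name, chain ID, residue number.
            current[1].append(
                (line[12:16].strip(), line[17:20].strip(), line[21], line[22:26].strip())
            )
        elif record == "ENDMDL":
            current = None

    problems = []
    numbers = [number for number, _ in models]
    if numbers != list(range(1, len(models) + 1)):
        problems.append(f"model numbers are not sequential from 1: {numbers}")
    reference = models[0][1] if models else []
    for number, atoms in models[1:]:
        if atoms != reference:
            problems.append(f"model {number} differs from the first model in atom content")
    return problems


good = [
    model_line(1), atom_line(1, "N", "ALA", "A", 1), atom_line(2, "CA", "ALA", "A", 1), "ENDMDL",
    model_line(2), atom_line(1, "N", "ALA", "A", 1), atom_line(2, "CA", "ALA", "A", 1), "ENDMDL",
]
bad = [
    model_line(1), atom_line(1, "N", "ALA", "A", 1), atom_line(2, "CA", "ALA", "A", 1), "ENDMDL",
    model_line(3), atom_line(1, "N", "ALA", "A", 1), "ENDMDL",
]
print(check_models(good))
print(check_models(bad))
```

Comparing atom lists in order is stricter than the rule requires (it also flags reordered atoms), which seems a reasonable default for a pre-deposition check.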
Policies clarified by NMR Task Force
August 26, 2008
PDB will accept minimized average structures
only if they meet the above criteria for alignment
and covalent structure
The number of models will not be limited in a
PDB file
Chemical shifts deposition will become
mandatory
Depositors are encouraged to avail themselves of
third-party validation software prior to
deposition of NMR structures
Improvements in Data Deposition
and Annotation
Kim Henrick
A year of VTCs and
discussions
PDB Contents Guide Version 3.2
The goal was to further clarify all formats
and procedures so as to create a more
uniform archive
Process
Every record was reviewed for scientific
correctness and clarity by wwPDB
annotators
Some records were added and others
expanded
Task Force members were consulted
where appropriate
Added PDB Format Records
SPLIT            For large structures, to indicate number of PDB entries
NUMMDL           Number of MODELs in an entry
MDLTYP           Model types and if C-alpha only chains
REMARK 0         Re-refinement notice
REMARK 475       Residues modeled with zero occupancy
REMARK 480       Polymer atoms modeled with zero occupancy
REMARK 620       Metal coordination
REMARK 630       Inhibitor description
DBREF1 / DBREF2  To match very long UniProt identifiers
DBREF            (standard format still used)
Internal Documentation
Results
Complete new Format document produced
and released to public September 15, 2008
Files will be processed according to this
specification starting November 15, 2008
All files in archive will be brought up to
this standard in Q1 2009
X-ray Validation Task Force Workshop
April 14-16, 2008 EBI, Hinxton, UK
www.wwpdb.org/workshop/2008/index.html
Randy Read (Chair), Paul Adams, Axel Brunger, Paul Emsley, Robbie Joosten, Gerard
Kleywegt, Eugene Krissinel, Thomas Luetteke, Zbyszek Otwinowski, Tassos Perrakis, Jane
Richardson, Will Sheffler, Janet Smith, Ian Tickle, Gert Vriend
wwPDB Validation Task Force
This meeting of the X-ray Validation Task Force was
held to collect recommendations and develop
consensus on additional validation that should be
performed on PDB entries, and to identify software
applications to perform validation tasks.
Preliminary Outcomes:
Workshop report to be published in Fall 2008
Candidate global and local validation measures
were identified
These measures were reviewed in terms of the
requirements of depositors, reviewers, and users
Remediation and Curation of
Complex Chemistry in the PDB
SCOPE
Inhibitor molecules: annotate the chem
comp dictionary and migrate details to PDB
entries
Ribosomal (post-translational modifications)
and non-ribosomal cyclic, modified and
conjugated peptides: consistently given a
SEQRES, SOURCE; annotate an entity look-up
table and transfer to PDB entries
2VUM
AMANITIN
Mapping to UNIPROT
Recently shown to be a gene product, e.g. AMATX_AMAPH (P85421)
2VUM is cyclically permuted and needs to be corrected from
SEQRES 1 M 8 ASN HYP ILX TRX GLY ILE GLY CSX
to
SEQRES 1 M 8 ILX TRX GLY ILE GLY CSX ASN HYP
to align with the gene sequence for beta-amanitin from Amanita phalloides and alpha-amanitin from Amanita bisporigera.
The encoded sequence would be:
Ile-Trp-Gly-Ile-Gly-Cys-Asn-Pro
Needs MODRES to match gene product
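The correction amounts to finding the rotation of the cyclic peptide whose parent residues match the encoded sequence. The sketch below shows that idea; it is an illustration, not the procedure the annotators used. The `PARENT` table is inferred from the encoded sequence given on the slide (HYP to Pro, ILX to Ile, TRX to Trp, CSX to Cys), and the function name `align_cyclic` is invented for the example.

```python
# Modified residue -> standard parent residue, inferred from the encoded
# sequence on the slide (this is the information MODRES records would carry).
PARENT = {"HYP": "PRO", "ILX": "ILE", "TRX": "TRP", "CSX": "CYS"}


def align_cyclic(seqres, gene):
    """Return the rotation of a cyclic SEQRES whose parents match the gene sequence.

    Returns None if no rotation matches.
    """
    for shift in range(len(seqres)):
        rotated = seqres[shift:] + seqres[:shift]
        if [PARENT.get(residue, residue) for residue in rotated] == gene:
            return rotated
    return None


deposited = "ASN HYP ILX TRX GLY ILE GLY CSX".split()
gene = "ILE TRP GLY ILE GLY CYS ASN PRO".split()
aligned = align_cyclic(deposited, gene)
print(" ".join(aligned))  # ILX TRX GLY ILE GLY CSX ASN HYP
```

Because the peptide is cyclic, every rotation describes the same molecule; choosing the one that follows the gene order only makes the entry comparable with the UniProt sequence.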
AMANITIN
Cyclic, Modified and Conjugated Peptides
May be Ribosomal or Non-Ribosomal
Non-gene peptides e.g. actinomycin D
i.e. require a gene cluster
Nonribosomal peptides
http://bioinfo.lifl.fr/norine/
or
Novel Antibiotics DataBase
http://www.nih.go.jp/~jun/NADB/search.html
Value to users
To understand unique and shared aspects of a
particular occurrence
To find a specific system : Some components of a
PDB file, such as inhibitors and antibiotic
peptides, might not be found or even be apparent
To study related ligands across different proteins
Challenges
Inclusion of non-standard amino acids,
nucleotides, or other chemical groups in
sequence
Non-linear (cyclic or branched) sequences
Microheterogeneity (some cases)
Non-uniform annotation of the same
molecule in different PDB entries
Lack of annotation regarding the source
and function of these molecules
Solutions
Analysis and classification
– Identify antibiotics and inhibitors and group them into
polymeric molecules or single components
Dictionary updates
– Build single chemical components for appropriate cases
– Update dictionary with source, function and other
details
Remediation and future processing
– Edit/revise files to include compound name, sequence,
source and function for all antibiotics and inhibitors
– Establish rules and procedures to make new
annotations consistent
Single component vs. Polymeric
Single component antibiotics or inhibitors
– Build component and retain subcomponent information;
annotate dictionary with details about molecule
– Migrate details from dictionary to entry files in specific remarks
– e.g. D-Phenylalanyl-L-prolyl-L-arginine chloromethyl ketone
(PPACK)
Polymeric (peptide-like) antibiotics or
inhibitors
– Present sequence, compound name, and source information
as any regular polymer
– Include details about functions in specific remarks
– e.g. post-translationally modified ribosomal peptides, nonribosomal cyclic, modified or conjugated peptides
How many?
~1300 identified PDB entries
Antibiotics
– Single component: ~1000
– Polymeric: ~300
Inhibitors
– Natural and synthetic
inhibitors of enzymes and
other cellular processes
– Single component: ~350
– Polymeric: ~350
Others
– Toxins: ~120
Antibiotic categories: Antibacterial, Antiviral, Antimicrobial, Antifungal, Antibiotic
Overlap with: Anticancer, Anti-inflammatory, Immunosuppressant, Herbicide
THIOSTREPTON
4 PDB entries with 4 different representations
Entries: 1e9w, 2jq7, 1oln, 3cf5
Representations found:
– SEQRES THR ILE ALA DHA ALA DHA PYT
– SEQRES ILE ALA DHA ALA
– LINKed HETs ROP incorrectly used
– Single molecule TXX
SEQRES should be
TZO THR TZB TSI TZO XAA QUA ILE ALA DHA ALA XBB TZO DHA PYT
Now matched in all 4 entries, TXX obsolete
THIOSTREPTON
THIOSTREPTON
_entity.pdbx_description
; Thiostrepton is a complex bacterial natural product containing thiazole rings
that is used as a topical veterinary antibiotic and also has promising
antimalarial and anticancer activity. First isolated from bacteria in 1955,
thiostrepton has an unusual type of antibiotic activity: it disables protein
biosynthesis by binding to ribosomal RNA and one of its associated proteins,
and interacts directly with 23S rRNA nucleotides 1067A and 1095A.
;
_entity.type “Polypeptide, sulfur containing antibiotic”
_entity.details
; Thiostrepton is a macrocyclic antibiotic incorporating thiazoles and other
atypical amino acids. Patented in 1961, thiostrepton has been used as an
antibiotic and acts by binding to ribosomes to prevent the binding of the
EF-G elongation factor and GTP to the 50S ribosomal subunit. Thiostrepton is
an inducer of tipA, a gene that controls the bacterial transcription regulators
TipAL and TipAS, members of the MerR proteins that are central regulators
in multidrug resistance. Closely related to siomycin, a recently discovered
inhibitor of the oncogenic transcription factor FoxM1. The thiostrepton-resistant
gene is also commonly used as a selective marker for recombinant
DNA/plasmid technologies.
;
THIOSTREPTON
1 “CAS” “1393-48-2” ?
1 “PUBCHEM” “16130278” ?
1 “Merck Index” “11:9295 ; 14:9364” ?
1 “RTECS” “XN6300100” ?
1 “MDL number” “MFCD00135828” http://www.mdli.com/
1 “EG/EC Number” “215-734-9” ?
1 “ChemSpider” 10469505 http://www.chemspider.com/
1 “URL” http://www.fermentek.co.il/Thiostrepton.htm ?
1 “URL” http://www.tebu-bio.com/file/product/170BIA-T1158-1/ ?
1 “URL” http://www.bioaustralis.com/pdfs/thiostrepton.pdf ?
1 “Sigma Aldrich” “T8902” http://www.sigmaaldrich.com/
1 “Chemical Class” “macrolide” ?
1 “MESH” “Peptides, Cyclic [D04.345.566]” ?
1 “Pharm. Action” “Anti-Bacterial Agent” ?
1 “Image” http://pubs.acs.org/cen/images/8239/8239notw4image.gif ?
1 “Image” http://en.wikipedia.org/wiki/Image:Thiostrepton.png ?
Alert - New Protein Modifications
Thu, September 25, 2008 1:17 pm
John S. Garavelli UniProt/RESID database
micrococcin P1
SCTTCVCTCSCCT
Bacillus cereus strain ATCC 14579
UniProt:Q812G9_BACCR,
Incorrectly annotated as a Putative lantibiotic peptide
Now believe that all the pyridinyl polythiazole
antibiotics, including micrococcin P1, thiostrepton,
thiocillin, GE2270 A and sulfamycin B, are genetically
encoded directly.
THIOSTREPTON
SEQRES
TZO THR TZB TSI TZO XAA QUA ILE ALA DHA ALA XBB TZO DHA PYT
QUA ILE ALA SER ALA SER CYS THR THR CYS ILE CYS THR CYS SER CYS SER SER NH2
Inhibitors
CHYMOSTATIN
5 PDB entries with 3 representations
Entries: 1ke2, 1bcs, 1m21, 1wvm, 1sgc
Representations found:
– SEQRES CSI LEU PHA (2 entries)
– Single HET group CHY (2 entries)
– Single HET group CST (1 entry)
All cases bound to Serine-OG
CHY C31 H41 N7 O6 (OG missing aldehyde)
CST C31 H41 N7 O7 (OG present carboxylic acid)
Convert all to pseudo SEQRES with BIOLOGICAL SOURCE
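The two HET groups differ only in their oxygen count, which a few lines of code can confirm from the formulas on the slide. This is a sketch for illustration; `parse_formula` is a made-up helper that handles simple element-count formulas like the ones shown here, not general chemical notation.

```python
import re
from collections import Counter


def parse_formula(text):
    """Parse a simple formula such as 'C31 H41 N7 O6' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", text):
        counts[element] += int(number) if number else 1
    return counts


chy = parse_formula("C31 H41 N7 O6")
cst = parse_formula("C31 H41 N7 O7")

# Counter subtraction keeps only positive differences.
extra_in_cst = cst - chy
extra_in_chy = chy - cst
print(dict(extra_in_cst), dict(extra_in_chy))  # {'O': 1} {}
```

The single extra oxygen in CST matches the slide's note that CST includes the OG atom while CHY does not.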
Border-line ?
PDB ID 1qr3
FR901277
Inhibitor of human leukocyte elastase from
Streptomyces resistomycificus
Should this be single component or polymeric?
Sequence: AIB ORN THR AA3 AA4 PHE AA6 VAL
Miri Hirshberg
Hyunmi Sun
Shuchismita Dutta
John Westbrook
Jasmine Young
Kim Henrick
John S. Garavelli UniProt
New Projects
Helen Berman
Small Angle Scattering
Two-member annotator team reviewing
possible SAXS and SANS templates
Attendance at SAS Commission to
discuss deposition and publication
requirements
Template recommendation expected in
2009
Common Deposition and
Annotation Tool
Selected as the most important project going
forward by participants of the 2007 wwPDB Retreat
Project timeline: Concept in 2008, design and
development 2009 - 2011 with delivery by 2012
Progress
– wwPDB Directors adopted role of Steering Committee and initiated
the project Concept Phase
– Concept Team, representing the 4 partner sites, meet to create
Scope Document (December 2008)
– Steering Committee approved the Scope Document in May 2008
– Core Team Kick Off meeting July 2008
Scope
wwPDB-wide project
Will allow full sharing of data load worldwide and
eliminate individual points of failure
Will implement recommendations of NMR and X-ray Validation Task Forces
Will allow for data acquisition of coordinate,
experimental and metadata for all methods
Will ensure quality, consistency and efficiency of
data processing and annotation process
Assumptions
The deposition tools must be able to handle all
current, agreed upon, data entry formats from the
user community
The underlying system design will not be driven by
existing formats
The product must provide an extensible framework
enabling support for new experimental methods
over its ten-year life span
The project technical level will be set at a
“reasonable” standard. Technology should not be
bleeding edge nor declining.
Core Team Kick Off
Core Team Meeting Outcome
1. Establish Project Management Strategy for this project
2. Draft a conceptual design for the solution and identify
critical components that need to be investigated
3. Identify the top three challenges and initiate study
groups
• Future system data model (John Westbrook
and Tom Oldfield)
• Technologies and strategies for data and “state”
management (John Westbrook)
• Technologies and strategies for automation of the
validation and annotation pipeline (Sameer
Velankar)
Path Forward
Adapt Agile Development to our environment as appropriate.
[Diagram: Initial Requirements & Conceptual design -> Probe: Development & Testing of critical component solutions -> repeated Develop and Test cycles, each yielding Incremental Products -> Acceptance and release]
Final design and Full Requirements realized through
incremental deliveries, using lessons learned along the way.
Archiving of Raw Diffraction Data
Discussion at Commission on Biological
Macromolecules
Outcome
Appoint working group to study
requirements for archiving raw
experimental data (Chair: Judith L.
Flippen-Anderson)
Funding Update
RCSB has received approval from NSB for funding through 2013
BMRB is currently funded through Aug 2009 and has submitted a competitive renewal application to the National Library of Medicine (U.S. National Institutes of Health); even if successful, the current budget will be reduced by 30%
PDBj will be reviewed this November, at the midpoint of the current project, which runs until Mar 2011
EMBL-EBI (PDBe) has 6 months of bridging funds from the Wellcome Trust to cover the transition of team leader; 6 staff are funded until 1-Dec-09
Matters Arising
Committee membership
HPUB proposed revision
Industrial structures
Validation guidelines