Protein-Nucleic Acid Interactions Helen M. Berman


Data Management in the
http://www.pdb.org/ • [email protected]
History of the PDB
1970s
Community discussions about how to establish a PDB
Cold Spring Harbor meeting in protein crystallography
PDB established at Brookhaven (October 1971; 7 structures)
1980s
Number of structures increases as technology improves
Community discussions about requiring depositions
IUCr guidelines established
Number of structures deposited increases
Independent biological databases established – e.g., the NDB
1990s
mmCIF project completed
Structural genomics begins
PDB moves to RCSB
2000s
RCSB PDB renewed
wwPDB established
PDB Mission
To provide the most accurate, well-annotated data about macromolecular structure in the most timely and efficient way possible to facilitate new discoveries and advances in science
[Figure: number of released entries by year]
The Data Pipeline
Structure Determination Pipeline (X-ray)
Hypothesis-Driven Target Selection
Crystallomics: Isolation, Expression, Purification, Crystallization
Data Collection
Structure Determination
Data Deposition
Publication
Data Release
Data Processing Data Flow
System for Data Collection and Archiving
[Diagram components: Depositor, ADIT (AutoDep Input Tool), Validation, MAXIT, Data, Reports, Final Files, Data Views, Metadata, Dictionaries, Database Loader]
Data Processing System Features
Different dictionaries without software changes
Simple customization of both functionality and content
Automatically scales with changes in content
Can be distributed to multiple deposition sites
Reference data and standard nomenclature (ERFs)
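One way to read "different dictionaries without software changes": the rules for each data item live in a dictionary that the software loads as data, so the checking code stays the same when the dictionary changes. The sketch below is a minimal illustration of that idea under stated assumptions; it is not the ADIT or MAXIT implementation. `ItemRule`, `validate`, the two sample dictionaries and their item names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ItemRule:
    """One dictionary entry describing a single data item (hypothetical layout)."""
    mandatory: bool = False
    value_type: type = str
    allowed: Optional[frozenset] = None  # enumeration of permitted values, if any


def validate(entry: dict, dictionary: dict) -> list:
    """Check an entry against a dictionary and return a list of problem messages."""
    problems = []
    for name, rule in dictionary.items():
        if name not in entry:
            if rule.mandatory:
                problems.append(f"missing mandatory item: {name}")
            continue
        value = entry[name]
        try:
            typed = rule.value_type(value)
        except (TypeError, ValueError):
            problems.append(
                f"{name}: cannot read {value!r} as {rule.value_type.__name__}"
            )
            continue
        if rule.allowed is not None and typed not in rule.allowed:
            problems.append(f"{name}: {value!r} not in allowed values")
    # Items the dictionary does not define are reported rather than silently kept.
    for name in entry:
        if name not in dictionary:
            problems.append(f"unknown item: {name}")
    return problems


# Two illustrative dictionaries (invented item names, not the real PDB dictionaries).
XRAY_DICT = {
    "method": ItemRule(mandatory=True, allowed=frozenset({"X-RAY DIFFRACTION"})),
    "resolution": ItemRule(mandatory=True, value_type=float),
}
NMR_DICT = {
    "method": ItemRule(mandatory=True, allowed=frozenset({"SOLUTION NMR"})),
    "conformers_submitted": ItemRule(mandatory=True, value_type=int),
}
```

Calling `validate(entry, XRAY_DICT)` or `validate(entry, NMR_DICT)` runs the same function against different rules; supporting a new experimental method would mean supplying another dictionary, not editing `validate`.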
Data Content of Each PDB Entry
• 1970s
  • Name, source, reference, resolution, sequence, secondary structure, crystal data, coordinates, unstructured remarks
• 1990s
  • Name, source, reference, resolution, refinement details, data collection and processing details, symmetry details, biological unit information, missing residues, related entries, sequence, ligands and ions, secondary structure, crystal data, coordinates, few unstructured remarks
Annotation and Validation
• ADIT
  • Reviewing, adding, correcting entry information
• MAXIT
  • File format conversions
• BLAST Automation Tool results
• Validation Server Reports
• Ligand Depot, ChemDraw
• RasMol for Visualization
• PubMed, Citation Tracker, Citation Tool
Extending Data Dictionaries for Deposition
• X-ray
  • Structure determination data items
  • http://deposit.pdb.org/mmcif/sg-data/xstal.html
• NMR
  • Structure determination data items
  • http://deposit.pdb.org/mmcif/sg-data/nmr.html
• Protein Production
  • http://deposit.pdb.org/mmcif/sg-data/protprod.html
Growth of Molecular Complexity
Deposition of X-ray, NMR & EM structures by year
[Figure: structures deposited per year, 1969 to 2003, plotted separately for X-ray, NMR and EM; vertical axis 0 to 2500; dated March 2005]
Cryo-EM Dictionary Proposal
Stages: Sample Description, Biochemical Preparation, EM Specimen Preparation, EM Data Collection, Image Processing, Structure Analysis
Categories:
em_sample_preparation
em_vitrification
em_assembly
em_sample_support
em_entity_assembly
em_array_formation
em_entity_assembly_list
em_solution_composition
em_virus_entity
em_stain
em_cryo_stain
em_embedding_agent
em_filaments
em_imaging
em_detector
em_image_scans
em_microscope
em_micrographs
em_electron_diffraction
em_icos_virus_shells
em_single_particle
em_singleparticle_selection
em_electron_diffraction_phase
em_3d_fitting
em_electron_diffraction_pattern
em_3d_reconstruction
em_2d_crystal
em_3d_fitting_list
em_particle_picking
em_classes
em_particle_picking_list
em_refinement
em_filament_selection
em_fsc_curve
em_filament_reconstruction
New categories recommended at the Oct 2004 workshop are in pink
Target Registration Database
TargetDB • http://targetdb.pdb.org/
• All targets downloadable in XML (~51,000 targets)
• Targets downloaded from 18 centers weekly
• Target search by:
  • Sequence (FASTA), project target ID, project site, status (selected, cloned, expressed, … in PDB), update date, protein name, source organism
• Report output in HTML, FASTA, and XML
• Integrates PDB entry sequences (~55,600 sequences)
• Includes PDB pre-release sequence data
• Provides links to related sequence databases
• Open to all Structural Genomics projects
• Summary reports of target or project progress
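As a rough illustration of the XML download, status search and FASTA report output listed above, the sketch below filters targets by status and writes them as FASTA records. The XML layout (`targets`, `target`, `id`, `status`, `name`, `sequence`), the target IDs and the sequences are invented placeholders; the real TargetDB schema is not shown in these slides and may differ.

```python
import xml.etree.ElementTree as ET

# Invented sample data in a hypothetical layout (not the real TargetDB schema).
SAMPLE_XML = """\
<targets>
  <target id="EX0001" status="cloned">
    <name>hypothetical protein A</name>
    <sequence>MKTAYIAKQR</sequence>
  </target>
  <target id="EX0002" status="in PDB">
    <name>hypothetical protein B</name>
    <sequence>MSDNELQKAL</sequence>
  </target>
</targets>
"""


def targets_to_fasta(xml_text, status=None, width=60):
    """Return FASTA text for all targets, or only those with the given status."""
    root = ET.fromstring(xml_text)
    records = []
    for target in root.iter("target"):
        if status is not None and target.get("status") != status:
            continue
        # Remove any whitespace or line breaks inside the sequence element.
        seq = "".join((target.findtext("sequence") or "").split())
        name = target.findtext("name", default="")
        header = f">{target.get('id')} {name} [{target.get('status')}]"
        # Wrap the sequence at a fixed line width, as FASTA files usually are.
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append("\n".join([header] + lines))
    return "\n".join(records)
```

For example, `targets_to_fasta(SAMPLE_XML, status="cloned")` keeps only the first placeholder target, while omitting `status` returns every target in the file.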
Protein Expression, Purification and Crystallization Database (PepcDB)
• Extends content of TargetDB
• All protocols for cloning, expression, and purification are stored and searchable
• Reports provide links to status history, related protocols, project, sequence and domain databases
Tracking, Assembling and Archiving Data
Target Tracking
TargetDB
Target and Protocol Tracking
Protocols
Target Selection
Sample Preparation
PepcDB
Data Collection
Data Processing
Structure Solution
Refinement
PDB
Merging and Integration
Incremental Assembly
Current Query System
Reengineered Web Site
pdbbeta.rcsb.org
• Built on curated data
• Three-tier architecture
  • Database tier
  • Middle tier
  • Presentation tier
• Feedback from users
  • Help desk
  • Usability engineering
  • Focus groups
• Went into public beta testing in July 2004
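The three tiers listed above can be pictured with a toy example in which each tier is one function and only adjacent tiers talk to each other. The slides do not say which technologies the site uses; in this sketch an in-memory SQLite table stands in for the database tier, and the table layout, function names and entries are all hypothetical.

```python
import sqlite3
from html import escape


# Database tier: owns storage. Here, a throwaway in-memory table of placeholder entries.
def open_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry (id TEXT PRIMARY KEY, title TEXT, method TEXT)")
    conn.executemany(
        "INSERT INTO entry VALUES (?, ?, ?)",
        [
            ("XXX1", "Placeholder entry one", "X-RAY DIFFRACTION"),
            ("XXX2", "Placeholder entry two", "SOLUTION NMR"),
        ],
    )
    return conn


# Middle tier: turns a user-level question into a query and returns plain data.
def find_by_method(conn, method):
    rows = conn.execute(
        "SELECT id, title FROM entry WHERE method = ? ORDER BY id", (method,)
    )
    return [{"id": row[0], "title": row[1]} for row in rows]


# Presentation tier: formats results for display and knows nothing about SQL.
def render_html(results):
    items = "".join(
        f"<li>{escape(r['id'])}: {escape(r['title'])}</li>" for r in results
    )
    return f"<ul>{items}</ul>"
```

A request would flow as `render_html(find_by_method(open_database(), "SOLUTION NMR"))`. Because the presentation tier only sees lists of dictionaries, the storage behind the middle tier could be replaced without touching the page rendering.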
Navigation and Query
Persistent Search Box
Integrated Help (Context-sensitive)
Persistent Navigation Bar
Site Search
Getting Started
Hierarchical Menu Items
• Worldwide PDB (wwPDB)
  • RCSB (Research Collaboratory for Structural Bioinformatics)
  • PDBj (Osaka University)
  • Macromolecular Structure Database (EBI)
• To ensure that PDB files remain in a single archive to best serve the worldwide community of depositors and users
http://www.wwpdb.org/
Acknowledgements
Operated by the Research Collaboratory for Structural Bioinformatics
Supported by:
NIGMS
RCSB PDB Team: Ken Addess, Helen M. Berman, Wolfgang F. Bluhm, Phil Bourne, Kyle Burkhardt, Li Chen, Sharon Cousin,
Jim Croker, Nita Deshpande, Shuchismita Dutta, Zukang Feng, Lew-Christiane Fernandez, Judith L. Flippen-Anderson, Gary
Gilliland, Rachel Kramer Green, Vladimir Guranovic, Shri Jain, Ann Kagehiro, Charlie Knezevich, Andrei Kouranov, Kevin
Lwinmoe, Jeff Merino-Ott, Irina Persikova, Suzanne Richman, Melcoir Rosas, Kathryn Rosecrans, Bohdan Schneider, Wayne
Townsend-Merino, Susan Van Arnum, Elizabeth Walker, John Westbrook, Alice Xenachis, Huanwang Yang, Jasmin Yang,
Christine Zardecki, Cindy Zhang