Challenges in visualization of complexes and the PDB

Growth of Molecular Complexity
[Figure: number of released entries per year, and number of structures containing a given number of chains, by year.]
Visualization workshop, October 2003

HEADER    COMPLEX (ACETYLATION/ACTIN-BINDING)     30-MAY-97   1HLU
TITLE     STRUCTURE OF BOVINE BETA-ACTIN-PROFILIN COMPLEX WITH ACTIN
TITLE    2 BOUND ATP PHOSPHATES SOLVENT ACCESSIBLE
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: BETA-ACTIN;
COMPND   3 CHAIN: A;
COMPND   4 MOL_ID: 2;
COMPND   5 MOLECULE: PROFILIN;
COMPND   6 CHAIN: P
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: BOS TAURUS;
SOURCE   3 ORGANISM_COMMON: BOVINE;
SOURCE   4 ORGAN: THYMUS;
SOURCE   5 MOL_ID: 2;
SOURCE   6 ORGANISM_SCIENTIFIC: BOS TAURUS;
SOURCE   7 ORGANISM_COMMON: BOVINE;
SOURCE   8 ORGAN: THYMUS
KEYWDS    COMPLEX (ACETYLATION/ACTIN-BINDING), ACTIN, PROFILIN,
KEYWDS   2 CONFORMATIONAL CHANGES, CYTOSKELETON
EXPDTA    X-RAY DIFFRACTION
AUTHOR    J.K.CHIK,U.LINDBERG,C.E.SCHUTT
REVDAT   1   15-OCT-97 1HLU    0
JRNL        AUTH   J.K.CHIK,U.LINDBERG,C.E.SCHUTT
JRNL        TITL   THE STRUCTURE OF AN OPEN STATE OF BETA-ACTIN AT
JRNL        TITL 2 2.65 A RESOLUTION
JRNL        REF    J.MOL.BIOL.                   V. 263   607 1996
JRNL        REFN   ASTM JMOBAK  UK ISSN 0022-2836                 0070
What is PDB’s role in molecular visualization?
Coordinate files of molecules + Visualization software = Visualization of molecules
What is PDB’s responsibility with respect to molecular visualization?
Providing complete and correctly annotated coordinate files in multiple formats (PDB, mmCIF, XML)
+ providing links and explanations for visualization software
= visualization for the user community: general users, researchers, annotators, students, educators, databases, bioinformaticians
The PDB format
JRNL        TITL   COMPARISON OF THE THREE-DIMENSIONAL STRUCTURES OF
JRNL        TITL 2 RECOMBINANT HUMAN H AND HORSE L FERRITINS AT HIGH
JRNL        TITL 3 RESOLUTION
JRNL        REF    J.MOL.BIOL.                   V. 268   424 1997
JRNL        REFN   ASTM JMOBAK  UK ISSN 0022-2836                 0070
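The JRNL records above are fixed-column text, so reading them means slicing by position and joining continuation lines. The following is a minimal sketch, assuming the sub-record name sits in columns 13-16 and the free text starts at column 20; the function name `parse_jrnl` is made up for illustration and is not part of any PDB software.

```python
# Sample JRNL records laid out in fixed columns (taken from the slide above).
JRNL_LINES = [
    "JRNL        TITL   COMPARISON OF THE THREE-DIMENSIONAL STRUCTURES OF",
    "JRNL        TITL 2 RECOMBINANT HUMAN H AND HORSE L FERRITINS AT HIGH",
    "JRNL        TITL 3 RESOLUTION",
    "JRNL        REF    J.MOL.BIOL.                   V. 268   424 1997",
    "JRNL        REFN   ASTM JMOBAK  UK ISSN 0022-2836                 0070",
]


def parse_jrnl(lines):
    """Group JRNL sub-records (TITL, REF, ...) and join continuation lines."""
    fields = {}
    for line in lines:
        if not line.startswith("JRNL"):
            continue
        key = line[12:16].strip()   # assumed columns 13-16: sub-record name
        text = line[19:].strip()    # assumed column 20 onward: free text
        fields.setdefault(key, []).append(text)
    return {key: " ".join(parts) for key, parts in fields.items()}


citation = parse_jrnl(JRNL_LINES)
print(citation["TITL"])
```

Note how the reader has to know the column layout and the continuation convention in advance; nothing in the file itself names the fields.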
The mmCIF format
mmCIF: macromolecular Crystallographic Information File
This is an extension of the Crystallographic Information File (CIF) data representation (used for describing small-molecule structures) to describe macromolecules.
_citation.id                        primary
_citation.title
;Comparison of the three-dimensional structures
of recombinant human H and horse L ferritins at
high resolution.
;
_citation.journal_abbrev            J.Mol.Biol.
_citation.journal_volume            268
_citation.page_first                424
_citation.page_last                 448
_citation.year                      1997
_citation.journal_id_ASTM           JMOBAK
_citation.country                   UK
_citation.journal_id_ISSN           0022-2836
_citation.journal_id_CSD            0070
_citation.book_publisher            ?
_citation.pdbx_database_id_PubMed   9159481
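Because every mmCIF value is labeled with its item name, the same citation can be read without knowing any column positions. The sketch below handles only the simple case shown on this slide: non-looped name/value pairs and semicolon-delimited text blocks. It does not handle `loop_` tables or quoted values, and `parse_cif_items` is a hypothetical helper, not an existing library call.

```python
CIF_TEXT = """\
_citation.id                        primary
_citation.title
;Comparison of the three-dimensional structures
of recombinant human H and horse L ferritins at
high resolution.
;
_citation.journal_abbrev            J.Mol.Biol.
_citation.journal_volume            268
_citation.page_first                424
_citation.page_last                 448
_citation.year                      1997
"""


def parse_cif_items(text):
    """Parse non-looped 'name value' items, including ;-delimited blocks."""
    items = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("_"):
            parts = line.split(None, 1)
            name = parts[0]
            if len(parts) == 2:
                # Value is on the same line as the item name.
                items[name] = parts[1].strip()
            elif i + 1 < len(lines) and lines[i + 1].startswith(";"):
                # Text block: starts after a leading ';', ends at the next ';' line.
                i += 1
                block = [lines[i][1:]]
                i += 1
                while i < len(lines) and not lines[i].startswith(";"):
                    block.append(lines[i])
                    i += 1
                items[name] = " ".join(p.strip() for p in block).strip()
        i += 1
    return items


cif_citation = parse_cif_items(CIF_TEXT)
print(cif_citation["_citation.title"])
```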
Complete and correctly annotated coordinate files
• Do PDB files conform to uniform standards?
– Yes. Remediated mmCIF files are available. They can be converted to PDB format using CIFTr (available at http://deposit.pdb.org/software/).
• Do PDB files contain coordinates for the complete biological unit?
– Yes. Both coordinates and pictures of the biological unit are now available for all PDB files.
What is a biological unit?
• PDB has coordinates of molecules determined by:
– X-ray crystallography
– NMR
– Electron microscopy
– Theoretical modeling
• Primary coordinate files for crystallographic structures generally contain one asymmetric (unique) unit.
• The biological molecule (also called a biological unit) is the macromolecule that has been shown to be, or is believed to be, functional. It could include one, a part of, or multiple asymmetric units.
Concept of the biological unit
A biological unit could include one, a part of, or multiple asymmetric units.
Downloading biological unit images/coordinate files from the PDB
PDB ID 1AEW
Information for constructing the biological unit is contained in REMARK 350 of the PDB file.
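To make the REMARK 350 idea concrete, here is a minimal sketch of reading BIOMT rows and applying them to a coordinate. The records in `REMARK_350` are invented for illustration (an identity plus one 2-fold operator) and are not taken from 1AEW. The parser splits on whitespace rather than fixed columns, which assumes the numeric fields never run together.

```python
# Invented example records: operator 1 is the identity, operator 2 is a
# 2-fold rotation about z combined with a shift of 50 along x.
REMARK_350 = """\
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000
REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000
REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000       50.00000
REMARK 350   BIOMT2   2  0.000000 -1.000000  0.000000        0.00000
REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        0.00000
"""


def parse_biomt(text):
    """Return a list of 3x4 operators (rotation rows plus translation)."""
    ops = {}
    for line in text.splitlines():
        tokens = line.split()
        if (len(tokens) == 8 and tokens[:2] == ["REMARK", "350"]
                and tokens[2].startswith("BIOMT")):
            row = int(tokens[2][5]) - 1      # BIOMT1/2/3 -> row 0/1/2
            serial = int(tokens[3])          # operator number
            values = [float(v) for v in tokens[4:]]
            ops.setdefault(serial, [None, None, None])[row] = values
    return [ops[serial] for serial in sorted(ops)]


def apply_op(op, xyz):
    """Apply one operator to a point: x' = R x + t."""
    return tuple(sum(row[k] * xyz[k] for k in range(3)) + row[3] for row in op)


ops = parse_biomt(REMARK_350)
copies = [apply_op(op, (10.0, 5.0, 2.0)) for op in ops]
print(copies)
```

Applying every operator to every atom of the deposited chains would give the coordinates of the full biological unit. Secondary structure records would also need to be duplicated for each copy.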
Visualizing the biological unit
Some visualization tools fail to duplicate the secondary structure records for symmetry-related molecules in the biological unit.
[Figure: biological unit of 1AEW viewed in RasMol]
Large macromolecular assembly 1: Viruses
• For viruses, usually the coordinates of the icosahedral asymmetric unit are deposited to the PDB. Transformation matrices for generating the complete virus are also provided.
• Sometimes additional matrices are provided to generate the icosahedral asymmetric unit from the given coordinates.
Virus particles have high symmetry (5-, 3-, and 2-fold axes)
Generating the biological unit of a virus
[Diagram: coordinates in crystallographic symmetry axes and coordinates in icosahedral symmetry axes, linked by a conversion matrix; asymmetric unit with NCS applied or not applied; recipe and matrices; crystallographic symmetry operations; 60 matrices leading to the biological unit.]
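One way to see why the conversion matrix matters: an operator defined in the icosahedral frame cannot be applied directly to coordinates stored in the crystallographic frame. The sketch below assumes the conversion matrix C maps crystal-frame coordinates to icosahedral-frame coordinates, so that an icosahedral operator M becomes C⁻¹·M·C in the crystal frame. The direction of C and all names here are assumptions for illustration, and the toy matrices are invented.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(m):
    """Invert a rigid-body transform [R | t] as [R^T | -R^T t]."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    new_t = [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [rt[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def to_crystal_frame(op_ico, conv):
    """Re-express an icosahedral-frame operator in the crystal frame."""
    return matmul(rigid_inverse(conv), matmul(op_ico, conv))


def transform_point(m, xyz):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    v = list(xyz) + [1.0]
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


# Toy conversion: the icosahedral origin sits at x = -10 in the crystal frame.
conv = [[1.0, 0.0, 0.0, 10.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]

# A 2-fold rotation about the z axis of the icosahedral frame.
twofold_z = [[-1.0, 0.0, 0.0, 0.0],
             [0.0, -1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0]]

op_cryst = to_crystal_frame(twofold_z, conv)
print(transform_point(op_cryst, (0.0, 0.0, 0.0)))
```

With all 60 operators expressed in the same frame as the deposited coordinates, the particle can be built by applying each operator to the asymmetric unit. Without a documented conversion matrix this step has to be guessed.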
Virus: problems and solutions
Problems:
- matrices for generating the biological unit?
- improper nomenclature of the 60 matrices for generating the icosahedral virus particle?
- placeholder for the NCS matrices for completing the icosahedral asymmetric unit?
- conversion matrix between crystallographic and icosahedral axes?
Possible solutions:
- uniform representation of matrices for generating the biological unit?
- change in the nomenclature of the 60 matrices for generating the virus?
- conversion matrix between crystallographic and icosahedral axes always available?
- other?
Large macromolecular assembly 2: Ribosomes
The current PDB format can hold:
• a maximum of 99,999 atom records, and
• up to 62 different polymer chains.
Since there is no way to represent structures that exceed either of these restrictions in a single PDB file, we have divided such structures into multiple PDB entries. Although this is not a perfect solution, we have done this to support existing software that relies on the current format.
The mmCIF/XML formats have no such restrictions.
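The two limits can be expressed as a simple check. In the sketch below, the figure of 62 is assumed to come from single-character chain identifiers (A-Z, a-z, 0-9). The greedy packing in `split_chains` is only an illustration of dividing a structure across entries; it is not the procedure the PDB used, which split the ribosome by subunit.

```python
import string

MAX_ATOMS = 99_999
# Assumption: one character per chain ID gives 26 + 26 + 10 = 62 choices.
CHAIN_IDS = string.ascii_uppercase + string.ascii_lowercase + string.digits


def fits_single_pdb_file(n_atoms, n_chains):
    """True if a structure respects both PDB-format limits."""
    return n_atoms <= MAX_ATOMS and n_chains <= len(CHAIN_IDS)


def split_chains(chain_sizes, max_atoms=MAX_ATOMS, max_chains=len(CHAIN_IDS)):
    """Greedily pack whole chains into entries that respect both limits.

    chain_sizes is a list of (chain_name, atom_count) pairs.
    """
    entries, current, atoms = [], [], 0
    for name, size in chain_sizes:
        if size > max_atoms:
            raise ValueError(f"chain {name} alone exceeds {max_atoms} atoms")
        if current and (atoms + size > max_atoms or len(current) == max_chains):
            entries.append(current)
            current, atoms = [], 0
        current.append(name)
        atoms += size
    if current:
        entries.append(current)
    return entries


# Hypothetical chain sizes, chosen only to exercise the limits.
example = [("rna1", 60_000), ("rna2", 50_000), ("prot1", 30_000)]
print(fits_single_pdb_file(140_000, 3))
print(split_chains(example))
```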
Ribosomes
1GIX: Small subunit of the ribosome
1GIY: Large subunit of the ribosome
Ribosome: problems and solutions
Problems:
- restrictions in the PDB file format?
- size of file?
- scaling?
- docking?
- visualization of nucleic acids?
Possible solutions:
- use of mmCIF/XML format?
- better way to represent nucleic acids?
- other?
What is PDB’s responsibility with respect to molecular visualization?
Providing complete and correctly annotated coordinate files in multiple formats (PDB, mmCIF, XML)
+ providing links and explanations for visualization software
= visualization for the user community: general users, researchers, annotators, students, educators, databases, bioinformaticians
Annotator needs
• Currently we use RasMol and NdbView for quick visualization and checking.
• For annotation, the visualization software should be capable of:
– Quick display (especially for large complexes)
– Displaying secondary structure
– Selecting atoms or residues for display or rendering
– Showing symmetry-related molecules
– Coloring all or selected residues or chains
– Computing distances
– Displaying standard and unusual ligands
Educator needs
• Visualization programs for educators and students should:
– Be free
– Be open source
– Be capable of running on multiple platforms (without browser dependence)
– Be portable
– Be easy to install and use
– Have a user-friendly interface
Common themes from a survey of ~25 educators from all over the world.
Educator requests and visions
• Visualization software for educators and students should:
– Be more interactive, so that students can, for example, make mutations in the structure
– Have better control over superposition of structures
– Have an undo command
– Be able to import and export more file formats
– Have both a menu-driven and a command-line interface
– Have different interfaces for research and education, and perhaps a tunable interface
– Be a multifunctional suite of programs that can all read the same or related formats
Summary: how do we proceed from here?
• How do we ensure that large biological molecules like viruses and ribosomes are uniformly represented in the PDB file?
• How do we create a channel of communication between the user community and visualization software developers in order to develop better visualization resources?