I am not a PDBid I am a
Biological Macromolecule
Philip E. Bourne
University of California San Diego
[email protected]
Striving to be Recognized
• The “identity” of a
macromolecular structure
– functional and structural
features and its broad
role in a living system – is
not easily established by
the majority of biologists.
Given the technology
available to us today, surely
it is time that this situation
changed?
This is Not to Say that the Identity
has not Improved
• Improved chemical description of polymers
and monomers
• Removal of sequence and taxonomic
inconsistencies
• Improved representation of viruses
• Primary citation assignments
• REMARKS, SF files, NMR restraints….
Henrick et al. NAR 2008 36: D426-D433
For Example…
• Chemical Components Dictionary:
– Model and idealized coordinates
– Chemical descriptors (e.g. SMILES) and systematic names
– Stereochemical assignments and aromatic bond assignments
– IUPAC nomenclature for standard amino acids and nucleotides
with the exception of the well-established convention for C-terminal atoms OXT and HXT
– More conventional atom labeling
– Removal of redundant ligands
– Additional description of protonation states
This now sets the stage for the
next stage of identity
development
The Problem Can be Defined as
A Need to Change the Workflow
Workflow
Entry Point
Sequence
Structure
Function
Pathway…
Literature
The best way to change the
workflow is to remove the barrier
between the literature
(knowledge) and the PDB (data)
How Can This Happen?
Possibility 1 – Proteopedia
A Completely New Beginning
• Advantages
– Anyone can contribute simply
– Community consensus seems to support
quality (e.g. Wikipedia)
• Disadvantages
– Where is the reward?
– Wiki format limited for providing a structural
identity
Eran Hodis, Eric Martz, Jaime Prilusky, Joel L. Sussman
http://www.proteopedia.org
Possibility 2 - iSee
• Advantages
– High quality annotation
• Disadvantages
– Time consuming
– Does not scale
http://www.sgc.ox.ac.uk/iSee
Possibility 3 – Database and
Literature Integration
• Advantages
– Reward through publication
– Potentially comprehensive
– Retains full power of the database and
literature
• Disadvantages
– Literature accessibility
– Harder to do
The Disadvantage of Literature
Accessibility is Disappearing
Slowly
• The NIH Public Access Policy is a Term
and Condition of Award for all grants and
cooperative agreements active in Fiscal
Year 2008 (October 1, 2007- September
30, 2008) or beyond, and for all contracts
awarded after April 7, 2008.
So What is the Policy for NIH
Sponsored Research?
• You can only agree to a journal copyright
policy if that policy allows you to deposit
the paper in PubMed Central (PMC)
• The paper must be deposited in PMC
• How this happens depends on the journal
BioLit http://biolit.ucsd.edu
Our Effort at Database-Literature Integration
• J.L.Fink, S. Kushch, P.
Williams & P.E.Bourne 2008
BioLit: Integrating Biological
Literature with Databases NAR
36(S2) W385-389
• P.E.Bourne, J.L.Fink,
M.Gerstein 2008 Open
Access: Taking Full Advantage
of the Content PLoS Comp.
Biol. (Editorial) 4(3) e1000037
BioLit: Tools for New Modes of Scientific Dissemination
The Knowledge and Data Cycle
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
• BioLit integrates
biological literature
and biological
databases and
includes:
– A database of journal
text
– Authoring tools to
facilitate database
storage of journal text
– Tools to make static
tables and figures
interactive
http://biolit.ucsd.edu
How Much of the Structure Literature is
Currently Found in the Accessible PMC?
[Figure: PMC Growth. Articles deposited (axis from 0 to 16000) by year, 1974 to 2008]
• 74127 articles
• 17161 were not
parsable
• 7% - 3814 PDBids out
of 51633 referenced
in ?? PMC articles
• 338 Figures have
legends that include
PDBids
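A minimal sketch of how PDBid mentions might be counted in article text or figure legends. This is not the BioLit parser: the pattern, the function name find_pdb_ids and the sample legend are assumptions made for illustration. It only accepts a four-character identifier (a digit 1-9 followed by three alphanumerics) that appears right after the token "PDB", which avoids most false hits such as years but will miss IDs cited without that cue.

```python
import re

# Assumption: an ID is cited next to the token "PDB", e.g. "PDB ID 4hhb"
# or "PDB: 1A3N". Illustrative pattern only, not the BioLit implementation.
PDB_MENTION = re.compile(
    r"\bPDB(?:\s*(?:ID|code|entry))?\s*[:#]?\s*([1-9][A-Za-z0-9]{3})\b",
    re.IGNORECASE,
)


def find_pdb_ids(text):
    """Return the distinct PDB IDs mentioned in text, upper-cased and sorted."""
    return sorted({m.group(1).upper() for m in PDB_MENTION.finditer(text)})


# Hypothetical figure legend used only to exercise the function.
legend = "Figure 2. Hemoglobin (PDB ID 4hhb) compared with PDB: 1A3N."
print(find_pdb_ids(legend))
```

Running the same function over every legend in a corpus and counting the non-empty results would give a figure comparable in spirit to the "legends that include PDBids" count, under the stated assumptions.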
ICTP Trieste, December 10, 2007
Where Can we Go From Here
with BioLit?
The Ideal Situation is to Capture
Relationships as the Paper is
Written
BioLit Plugin Project
Rather than Post-processing the Document the
Author Controls the Semantic Tagging
BioLit Plugin Project
Author
Publisher
Paper
Word File in Docx format
Plugin Architecture
Context-Sensitive Data Access
• Display of information of
database entries when
the user clicks on the ID
in the document
• Display of ontology
terms related to terms in
the document text, using
local database search
Ontologies are Stored in a Local Database
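A rough sketch of what a local ontology lookup could look like, assuming the terms sit in a SQLite table on the user's machine. The table layout, the DEMO: identifiers and the function names are hypothetical; the slides do not describe the plugin's actual schema.

```python
import sqlite3


def build_demo_ontology_db():
    """Create an in-memory stand-in for the local ontology database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE terms (term_id TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    # Hypothetical rows; real ontology identifiers would be loaded instead.
    conn.executemany(
        "INSERT INTO terms VALUES (?, ?)",
        [
            ("DEMO:0001", "oxygen transport"),
            ("DEMO:0002", "heme binding"),
            ("DEMO:0003", "DNA binding"),
        ],
    )
    return conn


def related_terms(conn, word):
    """Return (term_id, name) pairs whose name contains the given word."""
    rows = conn.execute(
        "SELECT term_id, name FROM terms WHERE name LIKE ? ORDER BY term_id",
        (f"%{word}%",),
    )
    return rows.fetchall()


conn = build_demo_ontology_db()
print(related_terms(conn, "binding"))
```

Because the query runs against a local file or in-memory database, no document text leaves the desktop, which matches the point that all searches occur on the user's computer.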
User Configurable Selection
• Fully user-configurable
ontology and database
identifier selection
• All searches occur within
the user’s desktop
computer
• Desired ontologies are
downloaded and
installed automatically,
and updated periodically
• BioLit installer XML file
provides the application
with the information
needed to download and
install ontologies.
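A sketch of how an installer XML file could drive ontology installation. The element and attribute names (installer, ontology, name, url, update_days) and the example.org URLs are invented for illustration; the real BioLit installer format is not shown in the slides. The code only reads the file into a download plan and does not fetch anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical installer file; the actual BioLit format may differ.
INSTALLER_XML = """
<installer>
  <ontology name="Gene Ontology" url="http://example.org/go.obo" update_days="30"/>
  <ontology name="Demo Ontology" url="http://example.org/demo.obo" update_days="7"/>
</installer>
"""


def read_install_plan(xml_text):
    """Return one dict per ontology listed in the installer XML."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "name": node.get("name"),
            "url": node.get("url"),
            "update_days": int(node.get("update_days")),
        }
        for node in root.findall("ontology")
    ]


for entry in read_install_plan(INSTALLER_XML):
    print(entry["name"], entry["url"], entry["update_days"])
```

A periodic updater could compare each entry's update_days against the date of the last local download to decide what to refresh.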
Possibility 4. SciVee - A Different
Kind of Learning Experience
Why not listen to the enthusiastic
author talk about the structure
while you see the structure
respond to their dialog?
YouTube for Scientists
www.scivee.tv
Motivation
Pubcast – Video Integrated
with the Full Text of the Paper
Pubcast - Making
PSP Washington DC Feb. 2008
Channels – Just Like TV
Professional Profile
Create & Join Communities
and Discussion Groups
Finding What you Want
• Tag clouds generated
automatically from
MeSH headings
• Full text of the
papers indexed
• Browsing by audience
type, subject,
language etc.
SciVee – Viral Projects
• Sweetwater School District
• “Postercasts”
• Science video competitions
• “Pubumentaries”
Summary
• New modes of
learning about
structure are possible
• Number 6 never did
get identified
• Time will tell whether
a PDBid will become
more than a number
Acknowledgements
• SciVee Team
– Apryl Bailey
– Tim Beck
– Leo Chalupa
– Marc Friedman
– Alex Ramos
– Willy Suwanto
CT Watch 2007, 3(3) 26-31
• BioLit Team
– J. Lynn Fink
– Sergey Kushch
– Parker Williams
– Greg Quinn
[email protected]
Questions?